PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Wednesday, October 26, 2022

[FIXED] How do I scrape a full instagram page in python?

 October 26, 2022     instagram, python, python-3.x, python-requests     No comments   

Issue

Long story short, I'm trying to create an Instagram python scraper, that loads the entire page and grabs all the links to the images. I have it working, only problem is, it only loads the original 12 photos that Instagram shows. Is there anyway I can tell requests to load the entire page?

Working code;

import json
import requests
from bs4 import BeautifulSoup
import sys

r = requests.get('https://www.instagram.com/accountName/')
soup = BeautifulSoup(r.text, 'lxml')

script = soup.find('script', text=lambda t: t.startswith('window._sharedData'))
page_json = script.text.split(' = ', 1)[1].rstrip(';')
data = json.loads(page_json)
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)

for post in data['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['edges']:
    image_src = post['node']['display_url']
    print(image_src)

Solution

As Scratch already mentioned, Instagram uses "infinite scrolling" which won't allow you to load the entire page. But you can check the total amount of messages at the top of the page (within the first span with the _fd86t class). Then you can check if the page already contains all of the messages. Otherwise, you'll have to use a GET request to get a new JSON response. The benefit to this is that this request contains the first field, which seems to allow you to modify how many messages you get. You can modify this from its standard 12 to get all of the remaining messages (hopefully).

The necessary request looks similar to the following (where I've anonymised the actual entries, and with some help from the comments):

https://www.instagram.com/graphql/query/?query_hash=472f257a40c653c64c666ce877d59d2b&variables={"id":"XXX","first":12,"after":"XXX"}


Answered By - Martijn
Answer Checked By - Dawn Plyler (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing