Issue

I am trying to scrape some data that sits inside a <script> tag in an HTML source.

The situation: I can get to the appropriate <script></script> tag, but inside that tag there is one big string, which needs to be converted and then parsed so I can get the precise data that I need.

The problem: I don't know how to do that, and I can't find a clear and satisfying answer.

My goal is to get this value: "xe7fd4c285496ab91", which is the identification number of the content, also called "contentId".

Here is the code:

import re

import bs4
import requests

url = 'https://www.khanacademy.org/computing/computer-programming/programming/drawing-basics/pt/making-drawings-with-code'
response = requests.get(url)
# By the way, I am not sure if this is the right way to parse the page
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# With this line I can get directly to the exact script string that I need
item = soup.find(string=re.compile('contentId'))

# As you can see, it's a pretty big string, and I need to parse it to get
# the desired data. The desired value "xe7fd4c285496ab91" is in it.
print(item)

I tried to use json.parse() but it is not working:

import json
jsonparsed=json.parse(item)

I get this error:

AttributeError: 'NavigableString' object has no attribute 'json'

My question is: How can I get the desired data? Is there a function to convert the string into JavaScript so I can parse it? Or a way to convert this string into a JSON file?

(Keep in mind that I will do this on multiple links with similar HTML/JavaScript).

Solution

You could just run a regex on the response text alone, without searching for the script tag first:

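import re
import requests

r = requests.get('https://www.khanacademy.org/computing/computer-programming/programming/drawing-basics/pt/making-drawings-with-code')
# Capture the value that follows contentId":"
p = re.compile(r'contentId":"((?:(?!").)*)')
# findall returns the captured groups; [0] raises IndexError if nothing matched
i = p.findall(r.text)[0]
print(i)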
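
Since the same extraction has to run over several pages, the regex can be wrapped in a small helper. The following is a minimal sketch under stated assumptions: the function name extract_content_id and the sample HTML string are made up for illustration, and the simpler character class [^"]* is swapped in for the lookahead used in the pattern above (it behaves the same here, except that it would also match across newlines). It returns None instead of raising when a page has no contentId.

```python
import re
from typing import Optional

# Hypothetical helper; the pattern captures the quoted value after contentId":"
CONTENT_ID_RE = re.compile(r'contentId":"([^"]*)')


def extract_content_id(html: str) -> Optional[str]:
    """Return the first contentId found in the page text, or None if absent."""
    match = CONTENT_ID_RE.search(html)
    return match.group(1) if match else None


# Made-up sample standing in for response.text; the real page is much larger
sample_html = '<script>{"contentId":"xe7fd4c285496ab91","kind":"Scratchpad"}</script>'
print(extract_content_id(sample_html))
print(extract_content_id('<script>var x = 1;</script>'))
```

With requests, each page could be passed in as extract_content_id(requests.get(url).text) inside a loop over the URLs.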

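Regex: the pattern matches the literal text contentId":" and then captures every character up to, but not including, the next double quote.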
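
On the question of converting the string: Python's json module offers json.loads() as its counterpart to JavaScript's JSON.parse(). If the script text contains a JSON object, it can be decoded and then searched by key. The following is only a sketch under assumptions: the sample script text, the variable name window.__DATA__, and the helper names load_first_json_object and find_key are invented for illustration, and the real page may embed its data differently (for example as a JavaScript literal that is not valid JSON, in which case the regex approach is likely the safer choice).

```python
import json
from typing import Any, Optional


def load_first_json_object(script_text: str) -> Any:
    """Decode the JSON object that starts at the first '{' in the text."""
    start = script_text.index('{')  # raises ValueError if there is no '{'
    # raw_decode parses one JSON value and ignores whatever follows it
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Made-up script content; the real string from soup.find() will differ
script_text = 'window.__DATA__ = {"content": [{"kind": "Scratchpad", "contentId": "xe7fd4c285496ab91"}]};'
data = load_first_json_object(script_text)
print(find_key(data, "contentId"))
```

In the question's code, str(item) would be the value to pass to load_first_json_object. Searching by key rather than by a fixed path keeps the sketch independent of how deeply the page nests its data.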

Answered By - QHarr

Answer Checked By - Terry (PHPFixing Volunteer)

Wednesday, September 7, 2022

[FIXED] How to parse JavaScript code in html source in Python?
