Issue
I'm trying to request all the sizes in stock from Zalando. I can not quite figure out how to do it since the video I'm watching showing how to request sizes look different than min. The video that I watch was this. Video - 5.30
Does anyone know how to request the sizes in stock and print the sizes that in stock?
The site in trying to request sizes of: here
My code looks like this:
import requests
from bs4 import BeautifulSoup as bs
session = requests.session()
def get_sizes_in_stock():
global session
endpoint = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
response = session.get(endpoint)
soup = bs(response.text, "html.parser")
I have tried to go to the View page source and look for the sizes, but I could not see the sizes in the page source.
I hope someone out there can help me what to do.
Solution
The sizes for the default color of shoe are shown in html. Alongside this are the urls for the other colors. You can extract these into a dictionary and loop, making requests and pulling the different colors and their availability, which I think is what you are actually requesting, as follows (note: I have kept quite generic to avoid hardcoding keys which change across requests):
import requests, re, json
def get_color_results(link):
headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get(link, headers=headers).text
data = json.loads(re.search(r'(\{"enrichedEntity".*size.*)<\/script', r).group(1))
results = []
color = ""
for i in data["graphqlCache"]:
if "ern:product" in i:
if "product" in data["graphqlCache"][i]["data"]:
if "name" in data["graphqlCache"][i]["data"]["product"]:
results.append(data["graphqlCache"][i]["data"]["product"])
if (
color == ""
and "color" in data["graphqlCache"][i]["data"]["product"]
):
color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
return (color, results)
link = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
final = {}
color, results = get_color_results(link)
colors = {
j["node"]["color"]["name"]: j["node"]["uri"]
for j in [
a
for b in [
i["family"]["products"]["edges"]
for i in results
if "family" in i
if "products" in i["family"]
]
for a in b
]
}
final[color] = {
j["size"]: j["offer"]["stock"]["quantity"]
for j in [i for i in results if "simples" in i][0]["simples"]
}
for k, v in colors.items():
if k not in final:
color, results = get_color_results(v)
final[color] = {
j["size"]: j["offer"]["stock"]["quantity"]
for j in [i for i in results if "simples" in i][0]["simples"]
}
print(final)
Explanatory notes from chat:
Use chrome browser to navigate to link
Press Ctrl + U to view page source
Press Ctrl + F to search for 38.5 in html
The first match is the long string you already know about. The string is long and difficult to navigate in page source and identify which tag it is part of. There are a number of ways I could identify the right script from these, but for now, an easy way would be:
from bs4 import BeautifulSoup as bs
link = 'https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html'
headers = {'User-Agent':'Mozilla/5.0'}
r = requests.get(link, headers = headers)
soup = bs(r.text, 'lxml')
for i in soup.select('script[type="application/json"]'):
if '38.5' in i.text:
print(i)
break
Slower method would be:
soup.find("script", text=re.compile(r'.*38.5.*'))
Whilst I used bs4 to get the right script tag contents, this was so I knew the start and end of the string denoting the JavaScript object I wanted to use
re
to extract, and then to deserialize into a JSON object withjson
; this in a re-write to usere
rather thanbs4
i.e. usere
on entire response text, from the request, and pass a regex pattern which would pull out the same stringI put the entire page source in a regex tool and wrote a regex to return that same string as identified above. See that regex here
Click on right hand side, match 1 group 1, to see highlighted the same string being returned from regex as you saw with BeautifulSoup. Two different ways of getting the same string containing the sizes
That is the string which I needed to examine, as JSON, the structure of. See in json viewer here
You will notice the JSON is very nested with some keys to dictionaries that are likely dynamic, meaning I needed to write code which could traverse the JSON and use certain more stable keys to pull out the colours available, and for the default shoe colour the sizes and availability
There is an expand all button in that JSON viewer. You can then search with Ctrl + F for 38.5 again
10a) I noticed that size and availability were for the default shoe colour
10b) I also noticed that within JSON, if I searched by one of the other colours from the dropdown, I could find URIs for each colour of show listed
- I used Wolf as my search term (as I suspected less matches for that term within the JSON)
You can see one of the alternate colours and its URI listed above
I visited that URI and found the availability and shoe sizes for that colour in same place as I did for the default white shoes
I realised I could make an initial request and get the default colour and sizes with availability. From that same request, extract the other colours and their URIs
I could then make requests to those other URIs and re-use my existing code to extract the sizes/availability for the new colours
This is why I created my
get_color_results()
function. This was the re-usable code to extract the sizes and availability from each pageresults
holds all the matches within the JSON to certain keys I am looking for to navigate to the right place to get the sizes and availabilities, as well as the current colourThis code traverses the JSON to get to the right place to extract data I want to use later
results = []
color = ""
for i in data["graphqlCache"]:
if "ern:product" in i:
if "product" in data["graphqlCache"][i]["data"]:
if "name" in data["graphqlCache"][i]["data"]["product"]:
results.append(data["graphqlCache"][i]["data"]["product"])
if (
color == ""
and "color" in data["graphqlCache"][i]["data"]["product"]
):
color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
- The following pulls out the sizes and availability from results:
{
j["size"]: j["offer"]["stock"]["quantity"]
for j in [i for i in results if "simples" in i][0]["simples"]
}
- For the first request only, the following gets the other shoes colours and their URIs into a dictionary to later loop:
colors = {
j["node"]["color"]["name"]: j["node"]["uri"]
for j in [
a
for b in [
i["family"]["products"]["edges"]
for i in results
if "family" in i
if "products" in i["family"]
]
for a in b
]
}
- This bit gets all the other colours and their availability:
for k, v in colors.items():
if k not in final:
color, results = get_color_results(v)
final[color] = {
j["size"]: j["offer"]["stock"]["quantity"]
for j in [i for i in results if "simples" in i][0]["simples"]
}
- Throughout, I update the dictionary final with the found colour and associated size and availabilities
Answered By - QHarr Answer Checked By - Senaida (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.