PHPFixing

Tuesday, August 2, 2022

[FIXED] How to modify code to scrape data off of 2nd table on this webpage

Tags: beautifulsoup, dataframe, html-table, pandas-groupby, python-3.8

Issue

I am trying to scrape data from a table on the following website: https://www.eliteprospects.com/league/nhl/stats/2021-2022

This is the code I found that successfully scrapes the data from the first table (skater stats):

import requests
import pandas as pd
from bs4 import BeautifulSoup

dfs = []
for page in range(1,10):
    url = f"https://www.eliteprospects.com/league/nhl/stats/2021-2022?sort=tp&page={page}"
    print(f"Loading {url=}")
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    df = (
        pd.read_html(str(soup.select_one(".player-stats")))[0]
        .dropna(how="all")
        .reset_index(drop=True)
    )
    dfs.append(df)

df_final = pd.concat(dfs).reset_index(drop=True)
print(df_final)
df_final.to_csv("data.csv", index=False)

But I am having difficulty scraping the goalie stats from the bottom table. Any idea how to modify the code to get the stats from that table? I tried changing the selector on line 13 from ".player-stats" to ".goalie-stats", but the code raised an error when I ran it.

Thank you!!


Solution

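I found a way to get the data, but it isn't perfect: the resulting DataFrame contains a lot of unnamed columns. Still, it gets the data, so I hope it's helpful. Compared with your code, it uses the goalie table's own query parameters (sort-goalie-stats and page-goalie), selects .goalie-stats, and strips '%' from the HTML before passing it to pd.read_html: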

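from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

dfs = []
# The goalie table has its own sort and page query parameters
# (sort-goalie-stats, page-goalie), separate from the skater table's.
for page in range(1, 3):  # pages 1 and 2
    url = f"https://www.eliteprospects.com/league/nhl/stats/2021-2022?sort-goalie-stats=svp&page-goalie={page}#goalies"
    print(f"Loading {url=}")
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    # Strip every '%' from the table markup before pandas parses it.
    html = str(soup.select_one(".goalie-stats")).replace("%", "")
    df = (
        # Wrapping the markup in StringIO avoids the warning newer pandas
        # versions emit when read_html is given a literal HTML string.
        pd.read_html(StringIO(html))[0]
        .dropna(how="all")
        .reset_index(drop=True)
    )
    dfs.append(df)

# Combine the per-page tables into one DataFrame and save it.
df_final = pd.concat(dfs).reset_index(drop=True)
print(df_final)
df_final.to_csv("data.csv", index=False)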
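
The answer notes that the result contains many unnamed columns. One possible cleanup, sketched here under assumptions, is to drop the columns that pandas auto-named "Unnamed: ..." when they hold no values at all. The helper name drop_unnamed_columns is hypothetical, and the sample frame uses made-up placeholder values, not data from the site; whether the real unnamed columns are in fact empty would need to be checked against the scraped output.

```python
import pandas as pd


def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns whose label starts with 'Unnamed' and that are entirely empty.

    Hypothetical helper: it assumes the unwanted columns carry pandas'
    auto-generated 'Unnamed: N' labels and contain only missing values.
    """
    unnamed = [col for col in df.columns if str(col).startswith("Unnamed")]
    # Only drop unnamed columns with no data, so real values are never lost.
    empty_unnamed = [col for col in unnamed if df[col].isna().all()]
    return df.drop(columns=empty_unnamed)


# Toy frame with placeholder values standing in for a scraped table.
sample = pd.DataFrame(
    {
        "Player": ["Goalie A", "Goalie B"],
        "Unnamed: 1": [None, None],
        "GP": [50, 42],
        "Unnamed: 3": [float("nan"), float("nan")],
    }
)

cleaned = drop_unnamed_columns(sample)
print(list(cleaned.columns))  # ['Player', 'GP']
```

In the script above, this could be applied as df_final = drop_unnamed_columns(df_final) just before the to_csv call. An unnamed column that does contain values is left in place, so it can be inspected and renamed by hand.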


Answered By - Jet_Mouse
Answer Checked By - Candace Johnson (PHPFixing Volunteer)