Thursday, July 21, 2022

[FIXED] What is the easiest way to append data to Pandas DataFrame?

Tags: append, beautifulsoup, dataframe, pandas, python

Issue

I am trying to append scraped data to a dataframe:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import requests
import csv
url="https://en.wikipedia.org/wiki/List_of_German_football_champions"
page=requests.get(url).content
soup=BeautifulSoup(page,"html.parser")

seasons=[]
first_places=[]
runner_ups=[]
third_places=[]
top_scorrers=[]

tbody=soup.find_all("tbody")[7]
trs=tbody.find_all("tr")
for tr in trs:
    season = tr.find_all("a")[0].text
    first_place = tr.find_all("a")[1].text
    runner_up = tr.find_all("a")[2].text
    third_place = tr.find_all("a")[3].text
    top_scorer = tr.find_all("a")[4].text
    seasons.append(season)
    first_places.append(first_place)
    runner_ups.append(runner_up)
    third_places.append(third_place)
    top_scorrers.append(top_scorer)

tuples=list(zip(seasons,first_places,runner_ups,third_places,top_scorrers))
df=pd.DataFrame(tuples,columns=["Season","FirstPlace","RunnerUp","ThirdPlace","TopScorrer"])
df

Is there an easier way to append data directly to an empty dataframe without creating lists and then zipping them?


Solution

While still using pandas, the "simplest" way to create your DataFrame is pandas.read_html():

import pandas as pd

df = pd.read_html('https://en.wikipedia.org/wiki/List_of_German_football_champions')[7]

To rename the columns and drop the [7] from the header:

df.columns = ['Season', 'Champions', 'Runners-up', 'Third place',
   'Top scorer(s)', 'Goals']

Output:

   Season   Champions                   Runners-up         Third place               Top scorer(s)                 Goals
0  1963–64  1. FC Köln (2)              Meidericher SV     Eintracht Frankfurt       Uwe Seeler                    30
1  1964–65  Werder Bremen (1)           1. FC Köln         Borussia Dortmund         Rudi Brunnenmeier             24
2  1965–66  TSV 1860 Munich (1)         Borussia Dortmund  Bayern Munich             Friedhelm Konietzka           26
3  1966–67  Eintracht Braunschweig (1)  TSV 1860 Munich    Borussia Dortmund         Lothar Emmerich, Gerd Müller  28
4  1967–68  1. FC Nürnberg (9)          Werder Bremen      Borussia Mönchengladbach  Hannes Löhr                   27

...
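
Hard-coding the full list of column names breaks if the table gains or loses a column. As a minimal sketch of an alternative, assuming the [7] is a footnote marker carried over from the page header, the marker can be stripped with a regular expression instead. The small DataFrame below is a made-up stand-in for the scraped table, not the real data:

```python
import pandas as pd

# Made-up stand-in for the scraped table; "Goals[7]" mimics a header
# that still carries a footnote marker (an assumption about the source page).
df = pd.DataFrame(
    [["1963–64", "1. FC Köln (2)", 30]],
    columns=["Season", "Champions", "Goals[7]"],
)

# Remove any "[number]" marker from each column name, then trim whitespace.
df.columns = df.columns.str.replace(r"\[\d+\]", "", regex=True).str.strip()

print(list(df.columns))
```

This keeps whatever names the page provides and only removes the bracketed numbers, so it does not depend on the number or order of columns.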


An alternative that avoids all these lists, is cleaner to process, and uses BeautifulSoup directly is to build more structured data: a single list of dicts.

data = []

# Skip header rows (those containing <th>) and build one dict per data row
for tr in soup.select('table:nth-of-type(8) tr:not(:has(th))'):
    links = tr.find_all("a")  # look the links up once per row
    data.append({
        'season': links[0].text,
        'first_place': links[1].text,
        'runner_up': links[2].text,
        'third_place': links[3].text,
        'top_scorer': links[4].text,
    })

pd.DataFrame(data)

Example

import pandas as pd
from bs4 import BeautifulSoup
import requests

url="https://en.wikipedia.org/wiki/List_of_German_football_champions"
page=requests.get(url).content
soup=BeautifulSoup(page,"html.parser")

data = []

# Skip header rows (those containing <th>) and build one dict per data row
for tr in soup.select('table:nth-of-type(8) tr:not(:has(th))'):
    links = tr.find_all("a")  # look the links up once per row
    data.append({
        'season': links[0].text,
        'first_place': links[1].text,
        'runner_up': links[2].text,
        'third_place': links[3].text,
        'top_scorer': links[4].text,
    })

pd.DataFrame(data)
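
Indexing the links by position assumes every data row contains at least five links; a row with fewer raises an IndexError. The following is a minimal sketch of a guarded variant of the same list-of-dicts idea. The inline HTML is a made-up stand-in for the Wikipedia page so the snippet runs without a network request, and the `keys` list is an illustrative helper, not part of the original answer:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up stand-in for the scraped page: one header row, one complete
# data row, and one incomplete row that lacks some links.
html = """
<table>
  <tr><th>Season</th><th>Champions</th><th>Runners-up</th><th>Third place</th><th>Top scorer(s)</th></tr>
  <tr><td><a>1963–64</a></td><td><a>1. FC Köln</a></td><td><a>Meidericher SV</a></td><td><a>Eintracht Frankfurt</a></td><td><a>Uwe Seeler</a></td></tr>
  <tr><td><a>1964–65</a></td><td><a>Werder Bremen</a></td><td>no link here</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Illustrative helper: the dict keys in the order the links appear in a row.
keys = ["season", "first_place", "runner_up", "third_place", "top_scorer"]

data = []
for tr in soup.select("tr:not(:has(th))"):
    links = [a.get_text(strip=True) for a in tr.find_all("a")]
    if len(links) < len(keys):
        continue  # skip rows that do not have a link for every field
    # zip stops at the shorter sequence, so extra links are ignored
    data.append(dict(zip(keys, links)))

# Build the DataFrame once, after all rows are collected.
df = pd.DataFrame(data)
print(df)
```

Collecting the rows first and constructing the DataFrame in one call also avoids growing a DataFrame row by row, which is slow; DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.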


Answered By - HedgeHog
Answer Checked By - Robin (PHPFixing Admin)