PHPFixing

Tuesday, June 28, 2022

[FIXED] How to use groupby on a dataframe

Tags: dataframe, graph, pandas, python

Issue

I have a dataframe (survey) that I need to group by 2 columns. One of the 2 columns is a ranking (5 options: Very Poor, Poor, Average, Good and Excellent) and the second one is a list of times. I need to group by both of those columns like this:

Ranking   |   Time   |  Count (how many times that time appears in the "Time" column for the ranking)
-------------------------------------
Very poor |  0.0     |   6
          |  1.0     |   2    
          |  2.0     |   9             
-------------------------------------                              
Poor      |  0.0     |   3                           
          |  1.0     |   12                          
...

I need to show the results of this table in 5 graphs (one for each ranking), with x=Time and y=Count.

I've been stuck for a few hours now, can someone help???


Solution

Set up an MRE (minimal reproducible example):

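import numpy as np
import pandas as pd

# Random sample data: 100 rows, one ranking and one time per row
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking':  np.random.choice(rank, 100),
                   'Time': np.random.randint(1, 50, 100)})
print(df)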

# Output:
      Ranking  Time
0   Excellent    28
1        Poor    33
2   Excellent    28
3     Average    22
4   Very Poor    11
..        ...   ...
95  Very Poor    13
96    Average    26
97  Very Poor    23
98       Good    24
99       Good    36

[100 rows x 2 columns]

Use value_counts to count each (Ranking, Time) pair rather than groupby:

count = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()
print(count)

# Output:
      Ranking  Time  Count
0        Poor    41      3
1   Very Poor    46      3
2   Very Poor    49      2
3   Very Poor    17      2
4   Excellent    20      2
..        ...   ...    ...
81  Excellent    34      1
82  Excellent    32      1
83  Excellent    27      1
84  Excellent    26      1
85       Good    32      1

[86 rows x 3 columns]
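
Since the question asks about groupby specifically, the same table can also be built with groupby and size. This is only a sketch of an equivalent approach, not part of the original answer. It rebuilds the sample data with a fixed seed (an assumption, so the run is repeatable) and stores the result under the illustrative name count_gb. Unlike value_counts, the rows come out sorted by Ranking and Time instead of by Count.

```python
import numpy as np
import pandas as pd

# Same shape of sample data as the MRE above; the seed is an assumption
# added here so that repeated runs give the same frame.
rng = np.random.default_rng(0)
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})

# Group on both columns, count the rows in each group, and turn the
# resulting Series back into a flat table with a 'Count' column.
count_gb = (df.groupby(['Ranking', 'Time'])
              .size()
              .reset_index(name='Count'))
print(count_gb.head())
```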

To visualize the data, the easiest way is to use seaborn's displot, which draws one histogram per ranking directly from df (no pre-computed counts needed):

# Python env: pip install seaborn
# Anaconda env: conda install seaborn
import seaborn as sns
import matplotlib.pyplot as plt

sns.displot(df, x='Time', col='Ranking', binwidth=1)
plt.show()
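
If seaborn is not available, or the panels should follow the Very Poor to Excellent order, the counted table can be plotted with plain matplotlib instead. The following is a rough sketch under a few assumptions: the sample data is regenerated with a fixed seed, the counts are computed with groupby, and the figure size is an arbitrary choice. It draws one bar chart per ranking with x=Time and y=Count.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data as in the MRE; the seed is an assumption for repeatability.
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
rng = np.random.default_rng(0)
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})
count = df.groupby(['Ranking', 'Time']).size().reset_index(name='Count')

# One subplot per ranking, in the order given by `rank`, with shared axes
# so the five panels are directly comparable.
fig, axes = plt.subplots(1, len(rank), figsize=(20, 3),
                         sharex=True, sharey=True)
for ax, name in zip(axes, rank):
    sub = count[count['Ranking'] == name]
    ax.bar(sub['Time'], sub['Count'], width=1)
    ax.set_title(name)
    ax.set_xlabel('Time')
axes[0].set_ylabel('Count')
fig.tight_layout()
# plt.show()  # uncomment to display the figure interactively
```

With seaborn, passing col_order=rank to displot should give the same panel order.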




Answered By - Corralien
Answer Checked By - Gilberto Lyons (PHPFixing Admin)