Tuesday, June 28, 2022

[FIXED] How to use groupby on a dataframe

Issue

I have a dataframe (survey) that I need to group by two columns. One column is a ranking (5 options: Very Poor, Poor, Average, Good and Excellent) and the other is a list of times. I need to group by both columns like this:

Ranking   |  Time    |  Count (how many times the time appears in the "time" column for that ranking)
-------------------------------------
Very poor |  0.0     |   6
          |  1.0     |   2
          |  2.0     |   9
-------------------------------------
Poor      |  0.0     |   3
          |  1.0     |   12
...

I need to show the results of this table in 5 graphs (one for each ranking), with x=Time and y=Count.

I've been stuck for a few hours now. Can someone help?


Solution

Set up an MRE (minimal reproducible example):

import numpy as np
import pandas as pd

# Random sample data: 100 rows of (Ranking, Time)
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking':  np.random.choice(rank, 100),
                   'Time': np.random.randint(1, 50, 100)})
print(df)

# Output:
      Ranking  Time
0   Excellent    28
1        Poor    33
2   Excellent    28
3     Average    22
4   Very Poor    11
..        ...   ...
95  Very Poor    13
96    Average    26
97  Very Poor    23
98       Good    24
99       Good    36

[100 rows x 2 columns]

Use value_counts to count each (Ranking, Time) pair rather than groupby (DataFrame.value_counts requires pandas 1.1 or later):

count = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()
print(count)

# Output:
      Ranking  Time  Count
0        Poor    41      3
1   Very Poor    46      3
2   Very Poor    49      2
3   Very Poor    17      2
4   Excellent    20      2
..        ...   ...    ...
81  Excellent    34      1
82  Excellent    32      1
83  Excellent    27      1
84  Excellent    26      1
85       Good    32      1

[86 rows x 3 columns]
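
Since the question asked about groupby specifically, here is a minimal sketch of the equivalent groupby call. It is not part of the original answer. The seeded generator and the names count_gb and count_vc are assumptions made for illustration. Both approaches should give the same counts. value_counts sorts by Count in descending order, while groupby sorts by the group keys.

```python
import numpy as np
import pandas as pd

# Sample data in the same shape as the MRE; the fixed seed is an assumption
rng = np.random.default_rng(0)
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})

# groupby + size counts the rows in each (Ranking, Time) group
count_gb = (df.groupby(['Ranking', 'Time'])
              .size()
              .reset_index(name='Count'))

# value_counts produces the same counts, ordered by Count descending
count_vc = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()

print(count_gb.head())
```

If you skip reset_index, the result of size() is a Series with a (Ranking, Time) MultiIndex, which prints with the ranking shown once per group, close to the layout in the question.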

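To visualize the data, the easiest way is to use seaborn's displot on the original df. It draws one histogram of Time per ranking, and with binwidth=1 each bar height is the count for that time: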

# Python env: pip install seaborn
# Anaconda env: conda install seaborn
import seaborn as sns
import matplotlib.pyplot as plt

sns.displot(df, x='Time', col='Ranking', binwidth=1)
plt.show()
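
If you would rather plot the aggregated count table directly (x=Time, y=Count, one graph per ranking), a rough matplotlib-only sketch could look like the following. This is an alternative to the displot call, not the answerer's method. The helper name plot_counts, the fixed seed and the figure size are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']

# Sample data in the same shape as the MRE; the fixed seed is an assumption
rng = np.random.default_rng(0)
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})
count = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()

def plot_counts(count, order):
    """Draw one bar chart per ranking with x=Time and y=Count (hypothetical helper)."""
    fig, axes = plt.subplots(1, len(order), figsize=(4 * len(order), 3),
                             sharey=True, squeeze=False)
    for ax, name in zip(axes[0], order):
        # Keep only the rows for this ranking, ordered along the x axis
        sub = count[count['Ranking'] == name].sort_values('Time')
        ax.bar(sub['Time'], sub['Count'])
        ax.set_title(name)
        ax.set_xlabel('Time')
    axes[0][0].set_ylabel('Count')
    fig.tight_layout()
    return fig, axes[0]

fig, axes = plot_counts(count, rank)
```

Call plt.show() afterwards to display the figure. Passing rank as the order keeps the panels in the Very Poor to Excellent sequence instead of the order in which rankings happen to appear in the data.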


Answered By - Corralien
Answer Checked By - Gilberto Lyons (PHPFixing Admin)
