Issue
I have this CSV file:
ID,NAME,CITY,COUNTRY,CPERSON,EMPLCNT,CONTRCNT,CONTRCOST
00000001,Breadpot,Sydney,Australia,Sam.Keng@info.com,250,48,1024.00
00000002,Hoviz,Manchester,UK,harry.ham@hoviz.com,150,7,900.00
00000003,Hoviz,London,UK,hamlet.host@hoviz.com,1500,12800,10510.50
00000004,Grenns,London,UK,grenns@grenns.com,200,12800,128.30
00000005,Magnolia,Chicago,USA,man@info.com,1024,25600,512000.00
00000006,Dozen,San Francisco,USA,dozen@dozen.com,1000,5,1000.20
00000007,Sun,San Francisco,USA,sunny@sun.com,2000,2,10000.01
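To try the answers below, here is a minimal sketch that loads the sample rows into a DataFrame. It assumes the CSV text is pasted inline through io.StringIO. With a real file you would pass its path to pd.read_csv instead. Reading ID as a string is an extra assumption so that values like 00000001 keep their leading zeros.

```python
import io

import pandas as pd

# Sample rows from the question, pasted inline (assumption: no file on disk)
CSV_TEXT = """ID,NAME,CITY,COUNTRY,CPERSON,EMPLCNT,CONTRCNT,CONTRCOST
00000001,Breadpot,Sydney,Australia,Sam.Keng@info.com,250,48,1024.00
00000002,Hoviz,Manchester,UK,harry.ham@hoviz.com,150,7,900.00
00000003,Hoviz,London,UK,hamlet.host@hoviz.com,1500,12800,10510.50
00000004,Grenns,London,UK,grenns@grenns.com,200,12800,128.30
00000005,Magnolia,Chicago,USA,man@info.com,1024,25600,512000.00
00000006,Dozen,San Francisco,USA,dozen@dozen.com,1000,5,1000.20
00000007,Sun,San Francisco,USA,sunny@sun.com,2000,2,10000.01
"""

# Read ID as text so "00000001" is not turned into the integer 1
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"ID": str})

print(df[["COUNTRY", "CONTRCNT"]])
```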
What I want to do is find the COUNTRY with the largest total CONTRCNT. Some countries appear more than once in the dataframe, so I need the country whose CONTRCNT values add up to the largest sum.
I thought about summing the CONTRCNT for each country and then picking the largest, but I would rather not do that by brute force. Specifically, I want to know how to use pandas' groupby to solve this.
Solution
You can group by COUNTRY, sum CONTRCNT within each group, and then call idxmax to get the country with the largest total:
df.groupby('COUNTRY')['CONTRCNT'].sum().idxmax()
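A self-contained sketch of that one-liner on the sample data. To keep it short, it assumes a DataFrame with only the two columns the answer needs, copied from the CSV above.

```python
import pandas as pd

# Only the two columns the answer needs, copied from the sample CSV
df = pd.DataFrame({
    "COUNTRY": ["Australia", "UK", "UK", "UK", "USA", "USA", "USA"],
    "CONTRCNT": [48, 7, 12800, 12800, 25600, 5, 2],
})

# Total CONTRCNT per country; groupby sorts the country labels
totals = df.groupby("COUNTRY")["CONTRCNT"].sum()
print(totals)

# Label of the largest total; on a tie, idxmax returns the first such label
top_country = totals.idxmax()
print(top_country)  # UK
```

With this data, UK and USA both total 25607. idxmax therefore silently returns UK, the first of the tied labels in sorted order.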
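idxmax returns only one country even if several share the maximum. To keep every tied country, filter the summed Series against its max: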
s = df.groupby('COUNTRY')['CONTRCNT'].sum()
s[s==s.max()]
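A sketch of the tie-preserving version on the same two-column sample, where UK and USA tie. Series.nlargest with keep="all" is shown as an alternative that also keeps ties. It was not part of the original answer.

```python
import pandas as pd

# Same two columns from the sample CSV; UK and USA tie on total CONTRCNT
df = pd.DataFrame({
    "COUNTRY": ["Australia", "UK", "UK", "UK", "USA", "USA", "USA"],
    "CONTRCNT": [48, 7, 12800, 12800, 25600, 5, 2],
})

s = df.groupby("COUNTRY")["CONTRCNT"].sum()

# Boolean mask keeps every country whose total equals the maximum
winners = s[s == s.max()]
print(winners.index.tolist())  # ['UK', 'USA']

# Alternative: nlargest(1, keep="all") also retains every tied value
top = s.nlargest(1, keep="all")
print(top)
```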
Answered By - BENY Answer Checked By - Dawn Plyler (PHPFixing Volunteer)