Issue
I want to match a1
a2
from the row whose a3
is missing with the entire column of b1 b2 b3
where ever a1 a2
matches with any two b's
value we will grab the 3rd b
value i.e in row 2 a1=84
and a2=5
which is a match for b's
in row 3 where b1=5
and b2=84
so now we will grab the value b3=89
in this case. Similarly for row 5 we will grab the value of b2=66
.
this is just a small data set actual data contains millions of rows.
time | duration | a1 | a2 | a3 | b1 | b2 | b3 |
---|---|---|---|---|---|---|---|
2022-02-28 | 95 | 11 | 2 | 3 | 22 | 67 | 25 |
2022-02-27 | 85 | 84 | 5 | 72 | 23 | 15 | |
2022-02-26 | 87 | 6 | 7 | 8 | 5 | 84 | 89 |
2022-02-25 | 72 | 9 | 10 | 44 | 55 | 78 | 41 |
2022-02-24 | 66 | 19 | 57 | 50 | 60 | 51 | |
2022-02-23 | 88 | 20 | 48 | 67 | 19 | 66 | 57 |
Solution
You can get all permutations of b
columns, left join with original DataFrame filtered only rows with missing values in a3
for a3_
column wich match a1, a2
in list. Then join list to one Series, remove possible duplicates in index and replace missing values of a3
column in original DataFrame:
from itertools import permutations
cols = ['b1','b2','b3']
L = [df[df['a3'].isna()].merge(df.loc[:, x].set_axis(['a1','a2','a3'], axis=1),
how='left', on=['a1','a2'], suffixes=('','_'))['a3_'].dropna()
for x in permutations(cols, 3)]
final = pd.concat(L)
final = final[~final.index.duplicated()]
print (final)
4 66.0
1 89.0
Name: a3_, dtype: float64
df['a3'] = df['a3'].fillna(final)
print (df)
time duration a1 a2 a3 b1 b2 b3
0 2022-02-28 95 11 2 3.0 22 67 25
1 2022-02-27 85 84 5 89.0 72 23 15
2 2022-02-26 87 6 7 8.0 5 84 89
3 2022-02-25 72 9 10 44.0 55 78 41
4 2022-02-24 66 19 57 66.0 50 60 51
5 2022-02-23 88 20 48 67.0 19 66 57
Answered By - jezrael Answer Checked By - Dawn Plyler (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.