Issue
I have a function that I want to apply to a variable number of columns, depending on the input.
def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if not row[a]:
            combined.extend(row[a].split(delimiter))
    combined = list(set(combined))
    return combined
But I'm not sure how to apply this function to the df because of the *args. I'm not very familiar with *args and **kwargs in Python. I tried using partial with axis=1 as below, but I get the TypeError shown after it.
df['combined'] = df.apply(partial(split_and_combine, ['col1','col2']), axis=1)
TypeError: ('list indices must be integers or slices, not Series', 'occurred at index 0')
A dummy example for the above code. I want to be able to pass in a flexible number of columns to combine:
Index  col1       col2          combined
0      John;Mary  Sam;Bill;Eva  John;Mary;Sam;Bill;Eva
1      a;b;c      a;d;f         a;b;c;d;f
Thanks! If there's a better way of doing this without df.apply, please feel free to comment!
Solution
From the df.apply docs:
args : tuple
    Positional arguments to pass to func in addition to the array/series.
**kwds
    Additional keyword arguments to pass as keywords arguments to func.
df.apply(split_and_combine, args=('col1', 'col2'), axis=1)
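As for why the partial attempt raised the TypeError: partial fills positional parameters from the left, so the list of column names is bound to row, and the Series that df.apply passes in ends up inside args. The sketch below illustrates that binding. show_binding is a hypothetical helper that only reports what each parameter received, and a plain dict stands in for the pandas row.

```python
from functools import partial

def show_binding(row, *args, delimiter=';'):
    # Hypothetical helper (not from the question): report what landed where.
    return {'row': row, 'args': args}

# Same shape as the failing call: the list is bound to the first parameter.
bound = partial(show_binding, ['col1', 'col2'])

# df.apply(..., axis=1) would call bound(series); a dict stands in for the Series.
fake_row = {'col1': 'a;b', 'col2': 'c'}
result = bound(fake_row)

print(result['row'])   # the column list, not the row
print(result['args'])  # a 1-tuple holding the row
```

Inside the original function, row[a] therefore indexes a list with a Series, which is what the error message is complaining about. Passing the names through args= keeps the row as the first argument.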
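By the way, your function may have some bugs: the check "if not row[a]" skips every non-empty value, and the result is returned as a list rather than a delimiter-joined string as in your expected output. A fixed version: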
def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if row[a]:
            combined.extend(row[a].split(delimiter))
    combined = list(set(combined))
    return delimiter.join(combined)
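A quick end-to-end check of the fixed function, as a minimal sketch. It assumes pandas is installed and rebuilds the dummy data from the question; the cols list and the lambda variant are additions for illustration, not part of the original answer.

```python
import pandas as pd

def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if row[a]:  # skip empty cells
            combined.extend(row[a].split(delimiter))
    combined = list(set(combined))  # drop duplicates; order is not preserved
    return delimiter.join(combined)

# Dummy data taken from the question.
df = pd.DataFrame({
    'col1': ['John;Mary', 'a;b;c'],
    'col2': ['Sam;Bill;Eva', 'a;d;f'],
})

# Any number of column names can be supplied here.
cols = ['col1', 'col2']

# Option 1: forward the column names through args=.
df['combined'] = df.apply(split_and_combine, args=tuple(cols), axis=1)

# Option 2: unpack the names inside a lambda.
via_lambda = df.apply(lambda row: split_and_combine(row, *cols), axis=1)

print(df)
```

Because set() does not preserve order, the items in each combined cell can come out in a different order than in the expected table. If a stable order matters, one option is to replace list(set(combined)) with sorted(set(combined)).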
Answered By - Asish M. Answer Checked By - Terry (PHPFixing Volunteer)