Issue
We have the following dataframe (df):
print(df)
#Gene GSM772 GSM773 GSM774 GSM775 GSM776
0610007P14Rik 0.003485 0.003415 0.005431 0.003667 0.007146
0610009B22Rik 0.001220 0.001351 0.001762 0.001404 0.002177
0610009L18Rik 0.000055 0.000009 0.000152 0.000082 0.000179
0610009O20Rik 0.000000 0.006830 0.000000 0.006653 0.006907
0610010F05Rik 0.008310 0.008329 0.007091 0.006919 0.006915
We want to calculate the geometric mean of every row and append the result as the last column, named GeometricMean.
Some rows contain zero values. These should not raise an error; the geometric mean of such a row is simply regarded as zero.
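As a minimal sketch of the zero-value behaviour described above (the array `rows` and the name `row_gmeans` are illustrative and not part of the original script): the geometric mean is computed through logarithms, so a zero contributes log(0) = -inf and the mean of that row comes out as 0. The values are taken from the first and fourth rows of the table, assuming the zeros are plain numeric 0.0.

```python
import numpy as np
from scipy import stats

# Two rows from the table above: one without zeros, one containing zeros.
rows = np.array([
    [0.003485, 0.003415, 0.005431, 0.003667, 0.007146],
    [0.000000, 0.006830, 0.000000, 0.006653, 0.006907],
])

# log(0) is -inf; silence the divide-by-zero warning only around this call.
with np.errstate(divide="ignore"):
    row_gmeans = stats.gmean(rows, axis=1)

# The second entry is 0.0 because exp(-inf) == 0.
print(row_gmeans)
```

Using `np.errstate` as a context manager limits the suppressed warning to this one computation, whereas `np.seterr` changes the setting globally.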
We wrote the following Python script:
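import numpy as np
from scipy import stats

# log(0) is -inf, so silence NumPy's divide-by-zero warning for rows containing zeros
np.seterr(divide='ignore')

# Row-wise (axis=1) geometric mean over the sample columns GSM772..GSM776.
# The end of an iloc slice is exclusive, so 1:6 is needed to cover all five columns.
geometric_mean = stats.gmean(df.iloc[:, 1:6], axis=1)

# Append the result as the last column
results = df.assign(GeometricMean=geometric_mean)
print(results)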
The following error is encountered:
AttributeError: 'str' object has no attribute 'log'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "calculate_gmean.py", line 99, in <module>
    scipy.stats.gmean(df.iloc[:,1:5],axis=1) #calculates gmean rowwise, axis=1 for rowwise
  File "/home/.local/lib/python3.6/site-packages/scipy/stats/stats.py", line 402, in gmean
    log_a = np.log(np.array(a, dtype=dtype))
TypeError: loop of ufunc does not support argument 0 of type str which has no callable log method
Can anyone please suggest the best way to resolve this issue?
Thanks !!
Solution
Problem solved. The script above actually works without any issue. Sorry, this question was posted prematurely. Questions cannot be deleted, so it will stay here; hopefully the script is useful to someone.
Note that the script will not work if the columns passed to gmean include any column of strings, because np.log cannot be applied to str values. After removing those columns, the script generates the last column with the geometric mean of every row without any problem.
print(df.shape)
(5, 6)
print(df)
#Gene GSM772 GSM773 GSM774 GSM775 GSM776
0 0610007P14Rik 0.003485 0.003415 0.005431 0.003667 0.007146
1 0610009B22Rik 0.001220 0.001351 0.001762 0.001404 0.002177
2 0610009L18Rik 0.000055 0.000009 0.000152 0.000082 0.000179
3 0610009O20Rik 0.006369 0.006830 0.007176 0.006653 0.006907
4 0610010F05Rik 0.008310 0.008329 0.007091 0.006919 0.006915
print(results)
#Gene GSM772 GSM773 GSM774 GSM775 GSM776 GeometricMean
0 0610007P14Rik 0.003485 0.003415 0.005431 0.003667 0.007146 0.004424
1 0610009B22Rik 0.001220 0.001351 0.001762 0.001404 0.002177 0.001548
2 0610009L18Rik 0.000055 0.000009 0.000152 0.000082 0.000179 0.000064
3 0610009O20Rik 0.006369 0.006830 0.007176 0.006653 0.006907 0.006782
4 0610010F05Rik 0.008310 0.008329 0.007091 0.006919 0.006915 0.007484
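One possible way to keep string columns away from gmean without dropping them from the dataframe is to select only the numeric columns before the calculation. This is a hedged sketch, not the original script: the two-row frame and the name `numeric` are made up for illustration, and it assumes the sample columns were read in with a numeric dtype.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Small frame mirroring the layout above: one string column plus numeric samples.
df = pd.DataFrame({
    "#Gene": ["0610007P14Rik", "0610009B22Rik"],
    "GSM772": [0.003485, 0.001220],
    "GSM773": [0.003415, 0.001351],
    "GSM774": [0.005431, 0.001762],
    "GSM775": [0.003667, 0.001404],
    "GSM776": [0.007146, 0.002177],
})

# Keep only numeric columns so that "#Gene" never reaches np.log.
numeric = df.select_dtypes(include="number")

# Rows containing zeros would yield a geometric mean of 0; hide the log(0) warning.
with np.errstate(divide="ignore"):
    results = df.assign(GeometricMean=stats.gmean(numeric.to_numpy(), axis=1))

print(results)
```

If a sample column turns out to have dtype object (for example because of a malformed entry in the input file), `select_dtypes` would silently skip it; inspecting `df.dtypes` first, or converting with `pd.to_numeric(..., errors="coerce")`, may help in that case.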
Answered By - TraPS-VarI Answer Checked By - Mary Flores (PHPFixing Volunteer)