Issue

I have a dataset like below. In this dataset, there are different colored thermometers, and given a 'True' or reference temperature, how different they measure according to some measurement methods 'Method 1' and 'Method 2'.

I am having trouble calculating two important parameters that I need which are Mean Absolute Error (MAE) and Mean Signed Error (MSE). I want to use the non-NaN values for each method and print the result.

I was able to get the to a point where I can return a two column series of index and sum, but the problem in this case is that I need to divide by the number of method values summed, which changes depending on how many NaN's there are in a row. And I do NOT want to just skip an entire row just because there is an NaN in it.

number	date	Thermometer	True Temperature	Method 1	Method 2
0	1/1/2021	red	0.2	0.2	0.5
1	1/1/2021	red	0.6	0.6	0.3
2	1/1/2021	red	0.4	0.6	0.23
3	1/1/2021	green	0.2	0.4	NaN
4	1/1/2021	green	1	1	0.23
5	1/1/2021	yellow	0.4	0.4	0.32
6	1/1/2021	yellow	0.1	NaN	0.4
7	1/1/2021	yellow	1.3	0.5	0.54
8	1/1/2021	yellow	1.5	0.5	0.43
9	1/1/2021	yellow	1.5	0.5	0.43
10	1/1/2021	blue	0.4	0.3	NaN
11	1/1/2021	blue	0.8	0.2	0.11

My Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('default'

data = pd.read_csv('data.txt', index_col=0)
data

data["M1_ABS_Error"]= abs(data["True_Temperature"]-data["Method_1"])
data["M2_ABS_Error"]= abs(data["True_Temperature"]-data["Method_2"])

MAE_Series=data[['Name', 'M1_ABS_Error', 'M2_ABS_Error' ]]
MAE_Series.sum(axis=1, skipna=True)

but output is something like this at the moment, which doesn't specify which color thermometer this belongs to, and I would like this to print out in a way that is easy to associate it with which it belongs. Also, as I mentioned, this does not yet account for how to divide by the number of values/methods in the given row to account for NaN. :

0       4.94
1       3.03
2      11.88
3       3.28
4       8.14
5       7.80
6       2.76
7       2.71

I would appreciate your help on this. Thanks!

Solution

Edit

I think I understand now, let me know if this is what you want

MAE:

df['MAE'] = df[['M1_ABS_Error','M2_ABS_Error']].mean(axis = 1)
df

produces


    date      Thermometer      True_Temperature    Method_1    Method_2    M1_ABS_Error    M2_ABS_Error    MAE
--  --------  -------------  ------------------  ----------  ----------  --------------  --------------  -----
 0  1/1/2021  red                           0.2         0.2        0.5              0              0.3   0.15
 1  1/1/2021  red                           0.6         0.6        0.3              0              0.3   0.15
 2  1/1/2021  red                           0.4         0.6        0.23             0.2            0.17  0.185
 3  1/1/2021  green                         0.2         0.4      nan                0.2          nan     0.2
 4  1/1/2021  green                         1           1          0.23             0              0.77  0.385
 5  1/1/2021  yellow                        0.4         0.4        0.32             0              0.08  0.04
 6  1/1/2021  yellow                        0.1       nan          0.4            nan              0.3   0.3
 7  1/1/2021  yellow                        1.3         0.5        0.54             0.8            0.76  0.78
 8  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035
 9  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035
10  1/1/2021  blue                          0.4         0.3      nan                0.1          nan     0.1
11  1/1/2021  blue                          0.8         0.2        0.11             0.6            0.69  0.645

and for MSE (Signed error)

df["MSE"]= df[['Method_1','Method_2']].mean(axis = 1)- df['True_Temperature']

produces

    date      Thermometer      True_Temperature    Method_1    Method_2    M1_ABS_Error    M2_ABS_Error    MAE     MSE
--  --------  -------------  ------------------  ----------  ----------  --------------  --------------  -----  ------
 0  1/1/2021  red                           0.2         0.2        0.5              0              0.3   0.15    0.15
 1  1/1/2021  red                           0.6         0.6        0.3              0              0.3   0.15   -0.15
 2  1/1/2021  red                           0.4         0.6        0.23             0.2            0.17  0.185   0.015
 3  1/1/2021  green                         0.2         0.4      nan                0.2          nan     0.2     0.2
 4  1/1/2021  green                         1           1          0.23             0              0.77  0.385  -0.385
 5  1/1/2021  yellow                        0.4         0.4        0.32             0              0.08  0.04   -0.04
 6  1/1/2021  yellow                        0.1       nan          0.4            nan              0.3   0.3     0.3
 7  1/1/2021  yellow                        1.3         0.5        0.54             0.8            0.76  0.78   -0.78
 8  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035  -1.035
 9  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035  -1.035
10  1/1/2021  blue                          0.4         0.3      nan                0.1          nan     0.1    -0.1
11  1/1/2021  blue                          0.8         0.2        0.11             0.6            0.69  0.645  -0.645

Original answer

It is not entirely clear what you want, but somewhat guessing here, is this what you are after? If you groupby by color and apply mean to the `ABS columns within each group

data.groupby('Thermometer', sort = False)[['M1_ABS_Error','M2_ABS_Error']].mean()

you get this


        M1_ABS_Error    M2_ABS_Error
Thermometer     
red     0.066667    0.256667
green   0.100000    0.770000
yellow  0.700000    0.656000
blue    0.350000    0.690000

Here, for example, the first top left number '0.066667is the average of theM1_ABS_Errorcolumn for those Thermometers that arered`. Similar to others. NaNs are skipped within each color/column

to get MSE (which normally means Mean Squared Error so I assume this is what you are after) you can do

import numpy as np
data["M1_Sqr_Error"]= (data["True_Temperature"]-data["Method_1"])**2
data["M2_Sqr_Error"]= (data["True_Temperature"]-data["Method_2"])**2
data.groupby('Thermometer', sort = False)[['M1_Error','M2_Error']].apply(lambda v: np.sqrt(np.mean(v)))

to get


        M1_Error    M2_Error
Thermometer     
red     0.115470    0.263881
green   0.141421    0.770000
yellow  0.812404    0.769909
blue    0.430116    0.690000

Answered By - piterbarg

Answer Checked By - Marilyn (PHPFixing Volunteer)

Friday, October 7, 2022

[FIXED] How to calculate Mean Absolute Error (MAE) and Mean Signed Error (MSE) using pandas/numpy/python math libray?

Issue

Solution

Edit

Original answer

0 Comments:

Post a Comment

Total Pageviews

Featured Post

Why Learn PHP Programming

Friday, October 7, 2022

Issue

Solution

Edit

Original answer

0 Comments:

Post a Comment

Total Pageviews

Featured Post

Why Learn PHP Programming

Subscribe To