PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Friday, October 7, 2022

[FIXED] How to calculate Mean Absolute Error (MAE) and Mean Signed Error (MSE) using pandas/numpy/python math libray?

 October 07, 2022     math, numpy, pandas, python, statistics     No comments   

Issue

I have a dataset like below. In this dataset, there are different colored thermometers, and given a 'True' or reference temperature, how different they measure according to some measurement methods 'Method 1' and 'Method 2'.

I am having trouble calculating two important parameters that I need which are Mean Absolute Error (MAE) and Mean Signed Error (MSE). I want to use the non-NaN values for each method and print the result.

I was able to get the to a point where I can return a two column series of index and sum, but the problem in this case is that I need to divide by the number of method values summed, which changes depending on how many NaN's there are in a row. And I do NOT want to just skip an entire row just because there is an NaN in it.

number date Thermometer True Temperature Method 1 Method 2
0 1/1/2021 red 0.2 0.2 0.5
1 1/1/2021 red 0.6 0.6 0.3
2 1/1/2021 red 0.4 0.6 0.23
3 1/1/2021 green 0.2 0.4 NaN
4 1/1/2021 green 1 1 0.23
5 1/1/2021 yellow 0.4 0.4 0.32
6 1/1/2021 yellow 0.1 NaN 0.4
7 1/1/2021 yellow 1.3 0.5 0.54
8 1/1/2021 yellow 1.5 0.5 0.43
9 1/1/2021 yellow 1.5 0.5 0.43
10 1/1/2021 blue 0.4 0.3 NaN
11 1/1/2021 blue 0.8 0.2 0.11

My Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('default'

data = pd.read_csv('data.txt', index_col=0)
data

data["M1_ABS_Error"]= abs(data["True_Temperature"]-data["Method_1"])
data["M2_ABS_Error"]= abs(data["True_Temperature"]-data["Method_2"])

MAE_Series=data[['Name', 'M1_ABS_Error', 'M2_ABS_Error' ]]
MAE_Series.sum(axis=1, skipna=True)

but output is something like this at the moment, which doesn't specify which color thermometer this belongs to, and I would like this to print out in a way that is easy to associate it with which it belongs. Also, as I mentioned, this does not yet account for how to divide by the number of values/methods in the given row to account for NaN. :

0       4.94
1       3.03
2      11.88
3       3.28
4       8.14
5       7.80
6       2.76
7       2.71

I would appreciate your help on this. Thanks!


Solution

Edit

I think I understand now, let me know if this is what you want

MAE:

df['MAE'] = df[['M1_ABS_Error','M2_ABS_Error']].mean(axis = 1)
df

produces


    date      Thermometer      True_Temperature    Method_1    Method_2    M1_ABS_Error    M2_ABS_Error    MAE
--  --------  -------------  ------------------  ----------  ----------  --------------  --------------  -----
 0  1/1/2021  red                           0.2         0.2        0.5              0              0.3   0.15
 1  1/1/2021  red                           0.6         0.6        0.3              0              0.3   0.15
 2  1/1/2021  red                           0.4         0.6        0.23             0.2            0.17  0.185
 3  1/1/2021  green                         0.2         0.4      nan                0.2          nan     0.2
 4  1/1/2021  green                         1           1          0.23             0              0.77  0.385
 5  1/1/2021  yellow                        0.4         0.4        0.32             0              0.08  0.04
 6  1/1/2021  yellow                        0.1       nan          0.4            nan              0.3   0.3
 7  1/1/2021  yellow                        1.3         0.5        0.54             0.8            0.76  0.78
 8  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035
 9  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035
10  1/1/2021  blue                          0.4         0.3      nan                0.1          nan     0.1
11  1/1/2021  blue                          0.8         0.2        0.11             0.6            0.69  0.645

and for MSE (Signed error)

df["MSE"]= df[['Method_1','Method_2']].mean(axis = 1)- df['True_Temperature']

produces

    date      Thermometer      True_Temperature    Method_1    Method_2    M1_ABS_Error    M2_ABS_Error    MAE     MSE
--  --------  -------------  ------------------  ----------  ----------  --------------  --------------  -----  ------
 0  1/1/2021  red                           0.2         0.2        0.5              0              0.3   0.15    0.15
 1  1/1/2021  red                           0.6         0.6        0.3              0              0.3   0.15   -0.15
 2  1/1/2021  red                           0.4         0.6        0.23             0.2            0.17  0.185   0.015
 3  1/1/2021  green                         0.2         0.4      nan                0.2          nan     0.2     0.2
 4  1/1/2021  green                         1           1          0.23             0              0.77  0.385  -0.385
 5  1/1/2021  yellow                        0.4         0.4        0.32             0              0.08  0.04   -0.04
 6  1/1/2021  yellow                        0.1       nan          0.4            nan              0.3   0.3     0.3
 7  1/1/2021  yellow                        1.3         0.5        0.54             0.8            0.76  0.78   -0.78
 8  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035  -1.035
 9  1/1/2021  yellow                        1.5         0.5        0.43             1              1.07  1.035  -1.035
10  1/1/2021  blue                          0.4         0.3      nan                0.1          nan     0.1    -0.1
11  1/1/2021  blue                          0.8         0.2        0.11             0.6            0.69  0.645  -0.645

Original answer

It is not entirely clear what you want, but somewhat guessing here, is this what you are after? If you groupby by color and apply mean to the `ABS columns within each group

data.groupby('Thermometer', sort = False)[['M1_ABS_Error','M2_ABS_Error']].mean()

you get this


        M1_ABS_Error    M2_ABS_Error
Thermometer     
red     0.066667    0.256667
green   0.100000    0.770000
yellow  0.700000    0.656000
blue    0.350000    0.690000

Here, for example, the first top left number '0.066667is the average of theM1_ABS_Errorcolumn for those Thermometers that arered`. Similar to others. NaNs are skipped within each color/column

to get MSE (which normally means Mean Squared Error so I assume this is what you are after) you can do

import numpy as np
data["M1_Sqr_Error"]= (data["True_Temperature"]-data["Method_1"])**2
data["M2_Sqr_Error"]= (data["True_Temperature"]-data["Method_2"])**2
data.groupby('Thermometer', sort = False)[['M1_Error','M2_Error']].apply(lambda v: np.sqrt(np.mean(v)))

to get


        M1_Error    M2_Error
Thermometer     
red     0.115470    0.263881
green   0.141421    0.770000
yellow  0.812404    0.769909
blue    0.430116    0.690000


Answered By - piterbarg
Answer Checked By - Marilyn (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing