PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Saturday, October 8, 2022

[FIXED] How to prevent overflow in MLE method for large data

 October 08, 2022     numpy, python, scipy, scipy-optimize, statistics     No comments   

Issue

I am trying to do a manual MLE estimation using scipy. My dataset is not that large so it surprises me that my values get very large very fast and scipy.optimize.minimize seems to get into NaNs for my density extremely quickly. I've tried to use the sum of logarithms instead of the product of the densities but that did not make things better at all.

This is my data:

[
        1.000, 1.000, 1.000, 1.004, 1.005, 1.008, 1.014, 1.015, 1.023, 1.035, 1.038,
        1.046, 1.048, 1.050, 1.050, 1.052, 1.052, 1.057, 1.063, 1.070, 1.070, 1.076,
        1.087, 1.090, 1.091, 1.096, 1.101, 1.102, 1.113, 1.114, 1.120, 1.130, 1.131,
        1.150, 1.152, 1.154, 1.155, 1.162, 1.170, 1.177, 1.189, 1.191, 1.193, 1.200,
        1.200, 1.200, 1.200, 1.205, 1.210, 1.218, 1.238, 1.238, 1.241, 1.250, 1.250,
        1.256, 1.257, 1.272, 1.278, 1.289, 1.299, 1.300, 1.316, 1.331, 1.349, 1.374,
        1.378, 1.382, 1.396, 1.426, 1.429, 1.439, 1.443, 1.446, 1.473, 1.475, 1.478,
        1.499, 1.506, 1.559, 1.568, 1.594, 1.609, 1.626, 1.649, 1.650, 1.669, 1.675,
        1.687, 1.715, 1.720, 1.735, 1.750, 1.755, 1.787, 1.797, 1.805, 1.898, 1.908,
        1.940, 1.989, 2.010, 2.012, 2.024, 2.047, 2.081, 2.085, 2.097, 2.136, 2.178,
        2.181, 2.193, 2.200, 2.220, 2.301, 2.354, 2.359, 2.382, 2.409, 2.418, 2.430,
        2.477, 2.500, 2.534, 2.572, 2.588, 2.591, 2.599, 2.660, 2.700, 2.700, 2.744,
        2.845, 2.911, 2.952, 3.006, 3.021, 3.048, 3.059, 3.092, 3.152, 3.276, 3.289,
        3.440, 3.447, 3.498, 3.705, 3.870, 3.896, 3.969, 4.000, 4.009, 4.196, 4.202,
        4.311, 4.467, 4.490, 4.601, 4.697, 5.100, 5.120, 5.136, 5.141, 5.165, 5.260,
        5.329, 5.778, 5.794, 6.285, 6.460, 6.917, 7.295, 7.701, 8.032, 8.142, 8.864,
        9.263, 9.359, 10.801, 11.037, 11.504, 11.933, 11.998, 12.000, 14.153, 15.000,
        15.398, 19.793, 23.150, 27.769, 28.288, 34.325, 42.691, 62.037, 77.839
]

How can I perform the MLE algorithm without running into overlow issues?

In case you are wondering, I am trying to fit the function f(x) = alpha * 500000^(alpha) / x^(alpha+1). So what I have is

import numpy as np
from scipy.optimize import minimize

# data = the given dataset
log_pareto_pdf = lambda alpha, x: np.log(alpha * (5e5 ** alpha) / (x ** (alpha + 1)))
ret = minimize(lambda alpha: -np.sum([log_pareto_pdf(alpha, x) for x in data]), x0=np.array([1]))

Solution

You need to avoid the giant exponentiations. One way to do this is to actually simplify your function:

log_pareto_pdf = lambda alpha, x: np.log(alpha) + alpha*np.log(5e5) - (alpha + 1)*np.log(x)

Without simplifying, your program still needs to try to calculate the 5e5**alpha term, which will get really big really fast (overflow around 55).

You'll also need to supply the bounds argument to minimize to prevent any negatives inside the log's.



Answered By - AJ Biffl
Answer Checked By - Terry (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing