PHPFixing

Saturday, October 8, 2022

[FIXED] What is the most efficient way to bootstrap the mean of a list of numbers?

Tags: numpy, python, statistics, statistics-bootstrap

Issue

I have a list of numbers (floats) and I would like to estimate its mean, along with the variability of that estimate. My goal is to resample the list 100 times with replacement, so the output would be an array of length 100 in which each element is the mean of one resampled list.

Here is a simple working example of what I would like to achieve:

import numpy as np
data = np.linspace(0, 4, 5)
ndata, boot = len(data), 100
output = np.mean(np.array([data[k] for k in np.random.uniform(high=ndata, size=boot*ndata).astype(int)]).reshape((boot, ndata)), axis=1)

This is, however, quite slow when I have to repeat it for many lists, each with a large number of elements. The method also seems clunky and un-Pythonic. What would be a better way to achieve my goal?

P.S. I am aware of scipy.stats.bootstrap, but I am having trouble upgrading SciPy to 1.7.1 in Anaconda, which I would need in order to import it.


Solution

Use np.random.choice:

import numpy as np

data = np.linspace(0, 4, 5)
ndata, boot = len(data), 100
output = np.mean(
    np.random.choice(data, size=(boot, ndata)),
    axis=1)

If I understood correctly, this expression (in your question's code):

np.array([data[k] for k in np.random.uniform(high=ndata, size=boot*ndata).astype(int)]).reshape((boot, ndata))

draws samples from data with replacement, which is exactly what np.random.choice does by default (replace=True).
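For comparison, here is a minimal sketch of the same resampling written against NumPy's newer Generator interface (np.random.default_rng). The helper name bootstrap_means and the seed value are assumptions made for illustration; they do not come from the original answer. It draws a (boot, ndata) array of random indices and uses fancy indexing, which is the vectorised equivalent of the list comprehension in the question.

```python
import numpy as np

def bootstrap_means(data, n_boot, rng):
    """Return n_boot means, each computed from a resample of data drawn with replacement.

    Hypothetical helper for illustration; not part of NumPy.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    # Each row holds n random indices into data (sampling with replacement).
    idx = rng.integers(0, n, size=(n_boot, n))
    # Fancy indexing builds the (n_boot, n) array of resamples in one step.
    return data[idx].mean(axis=1)

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility only
data = np.linspace(0, 4, 5)
means = bootstrap_means(data, 100, rng)
print(means.shape)
```

Passing the generator in explicitly keeps the function reproducible and lets the same rng be reused across many lists.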

Here are some timings for reference:

%timeit np.mean(np.array([data[k] for k in np.random.uniform(high=ndata, size=boot*ndata).astype(int)]).reshape((boot, ndata)), axis=1)
133 µs ± 3.96 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.mean(np.random.choice(data, size=(boot, ndata)),axis=1)
41.1 µs ± 538 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As can be seen, np.random.choice is roughly 3x faster on this example (41.1 µs versus 133 µs).
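Since the question also asks about the variability of the mean, the array of bootstrap means can be summarised directly. The following is a sketch under a few assumptions: the variable names, the seed, and the choice of a 95% percentile interval are illustrative and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
data = np.linspace(0, 4, 5)
boot = 100

# One bootstrap mean per row; rng.choice samples with replacement by default.
boot_means = rng.choice(data, size=(boot, data.size)).mean(axis=1)

# The spread of the bootstrap means estimates the standard error of the mean.
std_error = boot_means.std(ddof=1)

# Percentile interval: the middle 95% of the bootstrap distribution.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean = {data.mean():.3f}, standard error = {std_error:.3f}")
print(f"95% percentile interval = ({ci_low:.3f}, {ci_high:.3f})")
```

With only 100 resamples the interval endpoints are fairly noisy; a larger boot value gives more stable estimates at proportionally higher cost.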



Answered By - Dani Mesejo
Answer Checked By - Marilyn (PHPFixing Volunteer)