I am trying to sample random values between 0 and 1, with weights provided by datasets like the one above. I have found a partial solution to this problem using scipy.stats.gaussian_kde and its .resample(n) method. My main issue is that, because the bulk of my data is so close to 0, resampling returns a number of negative values that break my later calculations.
Is there a way to restrict my resampled values to be greater than zero without otherwise changing the sample space? I have considered taking the absolute value of everything to get rid of the negatives, but I don't know whether the result would still reflect the distribution weights.
To clarify, each of the n values I resample corresponds to a specific variable in my code, so I can't simply delete the numbers that are less than zero.

    # Here is a little sample dataset if you need something to work this out!
    import numpy as np
    data = np.array([0.147, 0.066, 0.017, 0.011, 0.040, 0.087, 0.024, 0.127, 0.071, 0.127,
                     0.027, 0.008, 0.067, 0.032, 0.247, 0.028, 0.122, 0.304, 0.074, 0.119])
    # Thank you!

You could use a distribution whose support does not include negative numbers. For example, sampling from an exponential distribution might work for the example array you provided:

    import numpy as np
    from scipy.stats import expon
    import matplotlib.pyplot as plt

    data = np.array([0.147, 0.066, 0.017, 0.011, 0.040, 0.087, 0.024, 0.127, 0.071, 0.127,
                     0.027, 0.008, 0.067, 0.032, 0.247, 0.028, 0.122, 0.304, 0.074, 0.119])

    # fit exponential model using data
    loc, scale = expon.fit(data)

    # plot histogram and model
    fig, ax = plt.subplots()
    ax.hist(data, density=True)
    x = np.linspace(0.01, 1, 200)
    ax.plot(x, expon.pdf(x, loc, scale), 'k-')
    plt.show()

    # sample from your modelled distribution using your fitted loc and scale parameters
    sample = expon.rvs(loc, scale)

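If you would rather keep the gaussian_kde approach from the question, one possible alternative is to redraw any values that land outside the range, so you still end up with exactly n samples. The sketch below is only an illustration, not part of the original answer: the helper name resample_within, its low/high bounds, and the max_rounds safety limit are all hypothetical. Note that discarding out-of-range draws effectively truncates and renormalizes the KDE, so the density near 0 will differ somewhat from the untruncated estimate.

```python
import numpy as np
from scipy.stats import gaussian_kde

data = np.array([0.147, 0.066, 0.017, 0.011, 0.040, 0.087, 0.024, 0.127, 0.071, 0.127,
                 0.027, 0.008, 0.067, 0.032, 0.247, 0.028, 0.122, 0.304, 0.074, 0.119])


def resample_within(kde, n, low=0.0, high=1.0, seed=None, max_rounds=1000):
    """Hypothetical helper: draw n values from a 1-D kde, redrawing any outside [low, high]."""
    rng = np.random.default_rng(seed)
    kept = np.empty(0)
    for _ in range(max_rounds):
        if kept.size >= n:
            break
        # Oversample to make up for the draws that will be rejected.
        draws = kde.resample(2 * (n - kept.size), seed=rng)[0]
        in_range = draws[(draws >= low) & (draws <= high)]
        kept = np.concatenate([kept, in_range])
    if kept.size < n:
        raise RuntimeError("could not collect enough in-range samples")
    return kept[:n]


kde = gaussian_kde(data)
samples = resample_within(kde, 1000, seed=0)
```

Because the function always returns exactly n values, each one can still be mapped to a specific variable, which was the concern with simply deleting the negative draws.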
Answered By – Ben DeVries
Answer Checked By – David Goodson (BugsFixing Volunteer)