I am trying to establish what people would loosely refer to as a homemade KDE – I suppose. I am trying to evaluate a density of a rather huge set of datapoints. In particular, having many data points for a scatter, I want to indicate the density using a color gradient (see link below).
For exemplification, I provide a random pair of (x,y) data below. The real data will be spread on different scales, hence the difference in X and Y grid point spacing.
import numpy as np from matplotlib import pyplot as plt def homemadeKDE(x, xgrid, y, ygrid, sigmaX = 1, sigmaY = 1): a = np.exp( -((xgrid[:,None]-x)/(2*sigmaX))**2 ) b = np.exp( -((ygrid[:,None]-y)/(2*sigmaY))**2 ) xweights = np.dot(a, x.T)/np.sum(a) yweights = np.dot(b, y.T)/np.sum(b) return xweights, yweights x = np.random.rand(10000) x.sort() y = np.random.rand(10000) xGrid = np.linspace(0, 500, 501) yGrid = np.linspace(0, 10, 11) newX, newY = homemadeKDE(x, xGrid, y, yGrid)
What I am stuck with is, how to project these values back to the original x and y vector so I can use it for plotting a 2D scatter plot (x,y) with a z value for the density colored by a given color map like so:
plt.scatter(x, y, c = z, cmap = "jet")
Plotting and KDE approach is in fact inspired by this great answer
To smooth out some confusion, the idea is to do a gaussian KDE, which would be on a much coarser grid. SigmaX and sigmaY reflect the bandwidth of the kernel in x and y directions, respectively.
I was actually- with a little bit of thinking -able to solve the problem on my own. Also thanks to the help and insightful comments.
import numpy as np from matplotlib import pyplot as plt def gaussianSum1D(gridpoints, datapoints, sigma=1): a = np.exp( -((gridpoints[:,None]-datapoints)/sigma)**2 ) return a #some test data x = np.random.rand(10000) y = np.random.rand(10000) #create grids gridSize = 100 xedges = np.linspace(np.min(x), np.max(x), gridSize) yedges = np.linspace(np.min(y), np.max(y), gridSize) #calculate weights for both dimensions seperately a = gaussianSum1D(xedges, x, sigma=2) b = gaussianSum1D(yedges, y, sigma=0.1) Z = np.dot(a, b.T).T #plot original data fig, ax = plt.subplots() ax.scatter(x, y, s = 1) #overlay data with contours ax.contour(xedges, yedges, Z, cmap = "jet")
Answered By – Fourier
Answer Checked By – Senaida (BugsFixing Volunteer)