[SOLVED] Create a subset of a data frame from the columns that are 99% zero values

Issue

My data has 1081 columns, and in some of them 99% of the values are zero. I want to separate those columns and store them in a new data frame.
This is as far as I have got. Can anyone help me write the rest of the code?

for cl, ro in df.items():  # iteritems() was removed in pandas 2.0; items() is the replacement
    n_zeros = (ro == 0).sum()  # number of zeros in this column
    percent_zero = n_zeros / len(df) * 100

Solution

You could select the columns whose fraction of zeros is greater than 0.99:

df.loc[:, df.eq(0).sum().div(df.shape[0]).gt(0.99)]
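To make the chained call easier to follow, here is a minimal sketch that splits it into named steps. The toy frame and the intermediate variable names are assumptions for illustration only. It also shows that taking the mean of the boolean frame should give the same mask, since the mean of a boolean column is its fraction of True values.

```python
import pandas as pd

# Hypothetical toy frame: "a" is all zeros, "b" is half zeros, "c" has no zeros
df = pd.DataFrame({"a": [0, 0, 0, 0], "b": [0, 1, 0, 2], "c": [1, 2, 3, 4]})

is_zero = df.eq(0)                        # boolean frame, True where a cell equals 0
zero_count = is_zero.sum()                # number of zeros per column
zero_share = zero_count.div(df.shape[0])  # fraction of rows that are zero, per column
mask = zero_share.gt(0.99)                # True for columns that are more than 99% zero

mostly_zero = df.loc[:, mask]             # keep only the columns flagged by the mask

# Equivalent shortcut: the mean of a boolean column is its fraction of True values
same_mask = df.eq(0).mean().gt(0.99)
```

Note that gt is a strict comparison, so a column that is exactly 99% zeros is not selected; ge(0.99) would include it.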

Example (here with a 95% threshold):

import numpy as np
import pandas as pd

np.random.seed(0)
a = np.random.choice([0, 1], size=(100, 10), p=[0.95, 0.05])
df = pd.DataFrame(a)

# True for each column in which more than 95% of the rows are zero
mask = df.eq(0).sum().div(df.shape[0]).gt(0.95)
out = df.loc[:, mask]  # use out = df.loc[:, ~mask] to drop the columns instead

Output:

    1  3  6  7
0   0  0  0  0
1   0  0  0  0
2   0  0  0  0
3   0  0  0  0
4   0  0  0  0
.. .. .. .. ..
95  1  0  0  0
96  0  0  0  0
97  0  0  0  0
98  0  0  0  0
99  0  0  0  0

[100 rows x 4 columns]
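A loop-based variant, closer to the attempt in the question, might look like the sketch below. The function name split_by_zero_share, its threshold parameter, and the demo data are hypothetical and are not part of the answer above. It returns both the mostly-zero columns and the remaining ones, so the two data frames can be stored separately.

```python
import numpy as np
import pandas as pd

def split_by_zero_share(df, threshold=0.99):
    """Return (mostly_zero, rest): columns whose share of zeros exceeds threshold, and the others."""
    mostly_zero_cols = []
    for name, column in df.items():        # items() yields (column label, Series) pairs
        n_zeros = (column == 0).sum()      # count of zeros in this column
        if n_zeros / len(df) > threshold:  # compare as a fraction, not a percentage
            mostly_zero_cols.append(name)
    return df[mostly_zero_cols], df.drop(columns=mostly_zero_cols)

# Hypothetical demo data: six columns that are roughly 90% zeros
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.choice([0, 1], size=(200, 6), p=[0.9, 0.1]))
demo[0] = 0  # force one column to be entirely zero

zero_df, rest_df = split_by_zero_share(demo, threshold=0.99)
```

For a frame with around a thousand columns, the vectorised mask is likely to be faster, but the loop makes each step explicit.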

Answered By – mozway

Answer Checked By – Robin (BugsFixing Admin)
