Issue
I have a data set with 780 columns and 87529 rows, and it contains lots of zero values.
I am using the code below, but it prints 780*2 lines of output, which is really difficult to read and understand, so I want to export the result to Excel. Can anyone help me construct the code?
for column_name in df.columns:
    column = df[column_name]
    count = (column == 0).sum()
    percent_zero = (column == 0).sum() / 87529 * 100
    print('Count of zeros in column ', column_name, ' is : ', count)
Solution
Try this. (Replace the sample df with your own DataFrame.)
import pandas as pd
# Use your own dataframe.
df = pd.DataFrame([
    {'col1': 0, 'col2': 0},
    {'col1': 1, 'col2': 0},
    {'col1': 1, 'col2': 1},
])
temp = 'Count of zeros in column "{col}" is : {n_zeros} (Percentage: {percent_zero:.1f}%)'
n_rows = len(df)
seeds = []
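# df.items() yields (column name, Series) pairs; iteritems() was removed in pandas 2.0.
for col, ser in df.items():
    n_zeros = (ser == 0).sum()
    percent_zero = n_zeros / n_rows * 100
    print(temp.format(col=col, n_zeros=n_zeros, percent_zero=percent_zero))
    # Collect one record per column for the output table.
    seeds.append({'column_name': col, 'number_of_zero': n_zeros, 'percent_of_zero': percent_zero})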
df_out = pd.DataFrame(seeds)
df_out.to_excel('out.xlsx', index=False)
If you get an error related to the export, install openpyxl with this command:
pip install openpyxl
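With 780 columns, the same table can probably be built without an explicit loop. The following is only a sketch of a vectorized alternative, not part of the original answer: it reuses the sample df from above, and the names is_zero and summary are made up for illustration. The sort step is an optional extra that puts the columns with the most zeros first.

```python
import pandas as pd

# Sample data; replace with your own DataFrame.
df = pd.DataFrame([
    {'col1': 0, 'col2': 0},
    {'col1': 1, 'col2': 0},
    {'col1': 1, 'col2': 1},
])

# Boolean DataFrame: True wherever a cell equals 0.
is_zero = df.eq(0)

# sum() counts the True values per column; mean() gives the fraction of True values.
summary = pd.DataFrame({
    'column_name': df.columns,
    'number_of_zero': is_zero.sum().to_numpy(),
    'percent_of_zero': (is_zero.mean() * 100).to_numpy(),
})

# Optional: list the columns with the most zeros first.
summary = summary.sort_values('number_of_zero', ascending=False, kind='stable')
summary = summary.reset_index(drop=True)

print(summary)

# To export, uncomment the next line (writing .xlsx requires openpyxl):
# summary.to_excel('out.xlsx', index=False)
```

The resulting summary has one row per column of df, so for the 780-column data set it would be a 780-row sheet that can be sorted or filtered in Excel.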
Answered By – quasi-human
Answer Checked By – Timothy Miller (BugsFixing Admin)