[SOLVED] For loop that cycles through all columns and collects the returned data in a list

Issue

I have a small issue that I don't know how to solve. I have done some analysis on each dependent column in a data frame, as shown below. For each concentration column I fit a normal distribution (I know the data does not look normal; I just made some up to show here) and computed the full width at half maximum (FWHM). I want to plot the FWHM against the concentration number, i.e. (1, concentration 1's FWHM), (2, concentration 2's FWHM), and so on. Right now I can do this manually, but I have a much larger data frame and would like to learn how to code it so that I get two lists: one with the concentration numbers (1, 2, 3) and another with the FWHMs returned by the function, so I can make the plots. The picture at the bottom shows one of the results, which prints the y value (3.47).

If you run this, you will see that the results I want for this example are
list_x_values = [1, 2, 3] and list_y_values = [3.47, 3.39, 3.39].

import numpy as np                                # import packages and give them quick handles since they will be used often
import matplotlib.pyplot as plt
from scipy import optimize 
import pandas as pd
import math 
from scipy import stats
from scipy.interpolate import UnivariateSpline
from array import array

d = {'distance': [1, 2, 3, 4, 5,6,7,8,9,10], 'concentration_1': [1,2,3,4,5,6,7,8,8,10], 'concentration_2': [1,2,3,4.4,5,6,7,8,8,10],
     'concentration_3': [1,2,3,4,5,6,7,8,8,1]}
df = pd.DataFrame(data=d)



Function

def get_plot(data_sample):

    def test_func(x, a, b):
        return stats.norm.pdf(x, a, b)

    x_dist = (data_sample['x'][-1] - data_sample['x'][0]) / (len(data_sample['x']) - 1)
    normalization_factor = sum(data_sample['1']) * x_dist
    params, pcov = optimize.curve_fit(test_func, data_sample['x'], data_sample['1'] / normalization_factor)
    
    # print("offset:", params[0], "+-", np.sqrt(pcov[0][0]))  # print value and one std error of the first fit parameter
    # print("offset:", params[1], "+-", np.sqrt(pcov[1][1]))
    
    
    plt.scatter(data_sample['x'], data_sample['1'], clip_on=False, label='Data')
    x_detailed = np.linspace(data_sample['x'][0] - 3, data_sample['x'][-1] + 3, 200)
    plt.plot(x_detailed, test_func(x_detailed, params[0], params[1]) * normalization_factor,
             color='crimson', label='Fitted function')

    y = 2*(math.sqrt(2*math.log(2, math.e)))*(params[1])
    plt.axhline(y, color='r', linestyle='--')


    x_1 , x_2 = np.argwhere(np.diff(np.sign(y- test_func(x_detailed, params[0], params[1]) * normalization_factor))).flatten()


    plt.axvline(x= x_detailed[x_1], color='g', linestyle='--', alpha=0.1)
    plt.axvline(x= x_detailed[x_2], color='g', linestyle='--', alpha=0.1)
    
    FWHM_long = x_detailed[x_2]-x_detailed[x_1]
    FWHM = "{:.2f}".format(FWHM_long)

    print('y_value is ' + str(FWHM))
    
    
    plt.axvspan(x_detailed[x_1],x_detailed[x_2] , alpha=0.1, color='g', label='FWHM =' + str(FWHM))
    
    plt.xlabel("Distance")
    plt.ylabel("Concentration")
    plt.legend(loc='best')
    plt.margins(x=0)
    plt.ylim(ymin=0)
    plt.tight_layout()
    plt.show()
    
 

Result 1

data_sample = {'x': np.array(df['distance']),
              '1': np.array(df['concentration_1'])}
    
get_plot(data_sample)

Result 2

data_sample = {'x': np.array(df['distance']),
              '1': np.array(df['concentration_2'])}
    
get_plot(data_sample)

Result 3

data_sample = {'x': np.array(df['distance']),
              '1': np.array(df['concentration_3'])}
    
get_plot(data_sample)

[Image: one of the resulting plots; the function prints "y_value is 3.47"]

Solution

You just have to remove the irrelevant parts used only for plotting and collect all results in a dataframe:

#import packages and give them quick handles since they will be used often
import numpy as np                                
from scipy import optimize 
import pandas as pd
from scipy import stats
d = {'distance': [1, 2, 3, 4, 5,6,7,8,9,10], 
     'concentration_1': [1,2,3,4,5,6,7,8,8,10], 
     'concentration_2': [1,2,3,4.4,5,6,7,8,8,10],
     'concentration_3': [1,2,3,4,5,6,7,8,8,1]}
df = pd.DataFrame(data=d)

#collect results in this dataframe
df_res = pd.DataFrame({"FWHM": np.zeros(len(df.columns[1:]))}, index=df.columns[1:])

#calculate constants repeatedly used in the loop
x_dist = (df.iloc[-1, 0] - df.iloc[0, 0]) / (df.shape[0] - 1)
x_detailed = np.linspace(df.iloc[0, 0] - 3, df.iloc[-1, 0] + 3, 200)

#cycle through the columns except for the first column "distance"
for col in df.columns[1:]:

    def test_func(x, a, b):
        return stats.norm.pdf(x, a, b)
    
    #the following calculations have not been changed;
    #there may be some potential for improvement
    normalization_factor = sum(df[col]) * x_dist
    params, pcov = optimize.curve_fit(test_func, df['distance'], df[col] / normalization_factor)
    
    y = 2*(np.sqrt(2*np.log(2)))*(params[1])

    x_1 , x_2 = np.argwhere(np.diff(np.sign(y- test_func(x_detailed, params[0], params[1]) * normalization_factor))).flatten()
    
    #calculate FWHM and store it in the dataframe
    FWHM = x_detailed[x_2]-x_detailed[x_1]
    df_res.loc[col, "FWHM"] = FWHM

print(df_res)

Sample output:

                     FWHM
concentration_1  3.467337
concentration_2  3.391960
concentration_3  3.391960
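
If you need the two plain lists from the question rather than a dataframe, a minimal sketch could look like the following. It assumes the column names end in "_<number>", as they do in this example; the names fwhm_by_column and column_number are made up for illustration, and the values are copied from the sample output above.

```python
# FWHM values copied from the sample output above (illustrative input).
fwhm_by_column = {
    "concentration_1": 3.467337,
    "concentration_2": 3.391960,
    "concentration_3": 3.391960,
}


def column_number(name):
    """Return the integer after the last underscore, e.g. 'concentration_2' -> 2."""
    return int(name.rsplit("_", 1)[1])


# Dicts preserve insertion order, so both lists stay aligned.
list_x_values = [column_number(name) for name in fwhm_by_column]
list_y_values = [round(value, 2) for value in fwhm_by_column.values()]

print(list_x_values)  # [1, 2, 3]
print(list_y_values)  # [3.47, 3.39, 3.39]
```

With the df_res built in the loop, df_res["FWHM"].to_dict() should give a dict of this shape, so the same two list comprehensions can be applied to it.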

I have taken your fit and FWHM calculations at face value and have not checked their validity. The math module was not necessary since you already imported numpy; in general, avoid using the math module with arrays.
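
For reference when checking the calculation: the expression 2*sqrt(2*ln 2)*sigma is the textbook FWHM of a normal curve with standard deviation sigma. The sketch below only illustrates that relation on scalars with made-up values for mu and sigma; it is not the crossing-based calculation from the code above and is not expected to reproduce the numbers in the sample output. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fwhm_from_sigma(sigma):
    """Full width at half maximum of a normal curve: 2*sqrt(2*ln 2)*sigma."""
    return 2 * math.sqrt(2 * math.log(2)) * sigma


# Illustrative values only; in the question these would be the fitted parameters.
mu, sigma = 5.0, 1.5

fwhm = fwhm_from_sigma(sigma)
half_max = normal_pdf(mu, mu, sigma) / 2

# The curve should sit at half of its peak height at mu - fwhm/2 and mu + fwhm/2.
left = normal_pdf(mu - fwhm / 2, mu, sigma)
right = normal_pdf(mu + fwhm / 2, mu, sigma)

print(math.isclose(left, half_max), math.isclose(right, half_max))
```

The math module is fine here because everything is a scalar; for arrays, the numpy equivalents (np.sqrt, np.log, np.exp) are the safer choice.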

Answered By – Mr. T

Answer Checked By – Mildred Charles (BugsFixing Admin)
