[SOLVED] Compare two values and add them to list if they match – code improvement


I am working with medical history data and need to figure out how to find the sequence of diagnoses each patient has had.
I have a large database that includes a unique patient ID, diagnosis, time of contact with healthcare, and so on.

I made some dummy data here to illustrate:

import pandas as pd
import numpy as np

columns = ["ID","DIAG","TYPE","IN","OUT","GENDER","DOB"]
diags = pd.DataFrame(np.random.randint(0,100,size=(2000,7)),columns=columns)
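# number of diagnoses per patient; reset_index() turns ID back into an ordinary column
diags_counter = diags.groupby("ID")["DIAG"].count().to_frame().reset_index()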

I reset the index because the real IDs in the database are more complex than these integers, and using .loc on them wouldn’t work.
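To show what resetting the index does, here is a minimal sketch on a small hand-made frame. The string IDs are made up and only stand in for the "more complex" real IDs; after `reset_index()`, `ID` is a normal column again and can be accessed as `diags_counter.ID`.

```python
import pandas as pd

# Made-up string IDs standing in for the more complex real ones
diags = pd.DataFrame({
    "ID": ["p-001", "p-002", "p-001", "p-003", "p-001"],
    "DIAG": [45, 12, 41, 7, 92],
})

# count() returns a Series indexed by ID; to_frame().reset_index() moves
# ID back into an ordinary column next to the per-patient count
diags_counter = diags.groupby("ID")["DIAG"].count().to_frame().reset_index()

print(diags_counter["ID"].tolist())    # ['p-001', 'p-002', 'p-003']
print(diags_counter["DIAG"].tolist())  # [3, 1, 1]
```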

My idea was to build a list (or dictionary) holding one DataFrame per patient, since one patient might have only a single diagnosis while another might have two or more.
The following code works, but it is extremely slow, and since I have over half a million patients, it will not scale:

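diags_seq = []
for i in range(len(diags_counter)):
    X = []  # rows belonging to patient i
    for j in range(len(diags)):
        # compares every patient with every row: O(patients x rows), hence the slowness
        if diags_counter.ID.iloc[i] == diags.ID.iloc[j]:
            X.append(diags.iloc[j])
    diags_seq.append(pd.DataFrame(X))
    print(f"\r{i+1} of {len(diags_counter)} found", end="")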

Any help on how to approach this differently would be greatly appreciated 🙂
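For comparison, a minimal sketch of the same per-patient split using `groupby` (a substitution, not the asker's code). The nested loop compares every patient ID with every row, so its cost grows with patients × rows, while `groupby` splits the frame in one pass and each group is already a DataFrame with only that patient's rows. The result here is a dictionary keyed by ID rather than a list, which is an assumption about what the later analysis needs.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the question (random, so results differ per run)
columns = ["ID", "DIAG", "TYPE", "IN", "OUT", "GENDER", "DOB"]
diags = pd.DataFrame(np.random.randint(0, 100, size=(2000, 7)), columns=columns)

# One pass: groupby yields (ID, sub-DataFrame) pairs, one per patient
diags_seq = {patient_id: group for patient_id, group in diags.groupby("ID")}

# Example: look up the rows of one patient
first_id = next(iter(diags_seq))
print(first_id, len(diags_seq[first_id]))
```

If a list is really needed, `list(diags_seq.values())` gives the DataFrames in sorted ID order.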


I think this will be fine:

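unique_id = diags.ID.unique()
dict_of_specifics_id = {}
for patient_id in unique_id:
    dict_of_specifics_id[patient_id] = {}
    dict_of_specifics_id[patient_id]['id_counter'] = 0
    dict_of_specifics_id[patient_id]['diag_list'] = []
for index, row in diags.iterrows():
    # single pass over the rows: count the diagnosis and append it to that patient's list
    dict_of_specifics_id[row['ID']]['id_counter'] += 1
    dict_of_specifics_id[row['ID']]['diag_list'].append(row['DIAG'])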
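
With the dummy data this produces a dictionary like the following (first entry shown; the values depend on the random data):
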
{21: {'id_counter': 16,
  'diag_list': [45, 41, 92, 91, 62, 54, 16, 18, 23, 18, 0, 47, 9, 45, 2, 61]},

Here 21 is the patient ID, diag_list is the list of diagnoses for that ID, and id_counter is simply the length of diag_list.
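The loop above still visits every row with `iterrows`, which tends to be slow on large frames. A minimal sketch of the same nested dictionary built with `groupby(...).agg(list)` instead, assuming that `IN` holds the start time of each contact, so sorting by it first puts each patient's diagnoses in chronological order (the "sequence of diagnoses" from the question). `groupby` keeps the row order within each group, so the sort carries through.

```python
import numpy as np
import pandas as pd

columns = ["ID", "DIAG", "TYPE", "IN", "OUT", "GENDER", "DOB"]
diags = pd.DataFrame(np.random.randint(0, 100, size=(2000, 7)), columns=columns)

# Assumption: IN is the contact start time; sort so each patient's rows are chronological
ordered = diags.sort_values(["ID", "IN"])

# Collect each patient's diagnoses into a list in one vectorised step
diag_lists = ordered.groupby("ID")["DIAG"].agg(list)

# Rebuild the answer's layout: {ID: {'id_counter': n, 'diag_list': [...]}}
dict_of_specifics_id = {
    patient_id: {"id_counter": len(diag_list), "diag_list": diag_list}
    for patient_id, diag_list in diag_lists.items()
}
```

If the order of contacts does not matter, the `sort_values` step can be dropped, and `diag_lists` on its own (a Series of lists indexed by ID) may already be enough without the nested dictionary.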

Answered By – Dawid Cimoch

Answer Checked By – Senaida (BugsFixing Volunteer)
