[SOLVED] How to check in R if the name of the list element contains "this text" in it and pass to the next element in a for loop?

Issue

I'm new to R and have a large list of 30 elements, each of which is a dataframe with a few hundred rows and around 20 columns (this varies between dataframes). Each dataframe is named after the original .csv filename (for example "experiment data XYZ QWERTY 01"). How can I go through the whole list, filter only those dataframes whose name does not include a specific text, AND add a unique id column to those filtered dataframes (the id value would be the first three characters of the filename)? For example, all the elements/dataframes/files in the list that include "XYZ QWERTY" in their name shouldn't be filtered and don't need a unique id. I had this pseudo style code:

for(i in 1:length(list_of_dataframes)){
  if 
  list_of_dataframes[[i]] contains "this text" then don't filter
  else
  list_of_dataframes[[i]] <- filter(list_of_dataframes[[i]], rule) AND add unique.id.of.first.three.char.of.list_of_dataframes[[i]]
}

Sorry if the terminology used here is a bit awkward; I'm just starting out with programming and this is my first time posting here, so there's still a lot to learn (as a bonus, if you have any good resources/websites for learning to automate and do similar things with R, I would be more than glad to get some recommendations! :-))

EDIT:

The code I tried for the filtering part was:

for(i in 1:length(tbl)){
  if (!(str_detect (tbl[[i]], "OLD"))){
    tbl[[i]] <- filter(tbl[[i]], age < 50)
  }
}

However, there was an error message stating "argument is not an atomic vector; coercing" and "the condition has length > 1 and only the first element will be used". Is there any way to get this code working?

Solution

Let there be a directory called files containing these csv files:

'experiment 1.csv'  'experiment 2.csv'  'experiment 3.csv'
'OLDexperiment 1.csv'  'OLDexperiment 2.csv'

The code below reads the files into a named list of data frames and keeps only those whose file name does not contain the substring OLD. Remove the ! to keep only the old experiments instead. A new column id holding the file path is added to each remaining data frame:

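library(tidyverse)

# Show which files are in the directory
list.files("files")

# Read every file into a list named by its file path
paths <- list.files("files", full.names = TRUE)
names(paths) <- paths
list_of_dataframes <- paths %>% map(read_csv)

list_of_dataframes %>%
  # one row per data frame: name = file path, value = data frame
  enframe() %>%
  # drop the data frames whose file name contains "OLD"
  filter(!str_detect(name, "OLD")) %>%
  # add the file path as an id column to each remaining data frame
  mutate(value = map2(name, value, ~ mutate(.y, id = .x))) %>%
  pull(value)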
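
The question asks for something slightly different from dropping data frames: keep every data frame, but filter rows and add an id only where the name lacks a given text. A minimal sketch of that with purrr::imap(), assuming in-memory example data, the age < 50 rule from the question's edit, and a hypothetical helper called process_one:

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical stand-in for the list read from the csv files
list_of_dataframes <- list(
  "experiment 1"    = tibble(age = c(20, 60, 45)),
  "OLDexperiment 1" = tibble(age = c(30, 70))
)

# Hypothetical helper: df is one data frame, name is its list name
process_one <- function(df, name) {
  if (str_detect(name, "OLD")) {
    return(df)  # name contains the text: leave the data frame untouched
  }
  df %>%
    filter(age < 50) %>%              # example rule, taken from the question
    mutate(id = str_sub(name, 1, 3))  # first three characters of the name
}

# imap() passes each element together with its name
result <- imap(list_of_dataframes, process_one)
```

If the list names are full file paths, as they are when built with full.names = TRUE, basename(name) would probably be needed before taking the first three characters.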

A good resource to start with is the free book R for Data Science.

A much simpler approach skips the list entirely and builds one big combined table from all files matching the same condition:

list.files("files", full.names = TRUE) %>%
  tibble(id = .) %>%
  # discard old experiments
  filter(! id %>% str_detect("OLD")) %>%
  # read the csv table for every matching file
  mutate(data = id %>% map(read_csv)) %>%
  # combine the tables into one big one
  unnest(data)
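
The loop from the question's edit likely fails because str_detect() is given the data frame tbl[[i]] rather than its name, so the condition is no longer a single TRUE or FALSE. A sketch of a corrected loop, assuming tbl is a named list and using made-up example data:

```r
library(dplyr)
library(stringr)

# Made-up example data; the names play the role of the csv file names
tbl <- list(
  "experiment 1"    = tibble(age = c(20, 60, 45)),
  "OLDexperiment 1" = tibble(age = c(30, 70))
)

for (i in seq_along(tbl)) {
  name_i <- names(tbl)[i]
  # Test the element's name, which is a single string
  if (!str_detect(name_i, "OLD")) {
    tbl[[i]] <- tbl[[i]] %>%
      filter(age < 50) %>%
      mutate(id = str_sub(name_i, 1, 3))
  }
}
```

seq_along(tbl) is used instead of 1:length(tbl) so that the loop body is skipped when the list is empty.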

Answered By – danlooo

Answer Checked By – Robin (BugsFixing Admin)
