I have a column that looks something like this:
```
col1
"business"
"BusinesS"
"education"
"some BUSINESS ."
"business of someone, that is cool"
" not the b word"
"busi ness"
"busines."
"businesses"
"something else"
```
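For a reproducible example, the column above can be built as a data frame. The name `df` is an assumption chosen to match the answer's code, and `stringsAsFactors = FALSE` keeps `col1` as character on older R versions too.

```r
# Sample data from the question; the name `df` is assumed, not given in the post.
# Note the leading space in " not the b word".
df <- data.frame(
  col1 = c("business", "BusinesS", "education", "some BUSINESS .",
           "business of someone, that is cool", " not the b word",
           "busi ness", "busines.", "businesses", "something else"),
  stringsAsFactors = FALSE
)
```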
and I need an efficient way to flag all of these variants in a new column, like this:
```
col1                col2
NA                  1
NA                  1
"education"         NA
NA                  1
NA                  1
" not the b word"   NA
NA                  1
NA                  1
NA                  1
"something else"    NA
```
So the common denominator is "busines", but I don't know how to efficiently handle all the spaces, punctuation, upper/lower case, surrounding words, etc. in one `mutate` that creates the new column.
```r
library(dplyr)
library(stringr)

df %>%
  # 1 where some variant of "busines" occurs, NA otherwise
  mutate(col2 = ifelse(str_detect(col1, "(?i)busi\\s?ness?"), 1, NA))
```
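Run on the question's sample data, the answer's `mutate` reproduces the expected `col2`. The expected output also shows `col1` as `NA` in the matched rows; assuming that is intended (the answer does not address it), a second assignment in the same `mutate` can do it. The `col1` line below is an illustrative addition, not part of the original answer.

```r
library(dplyr)
library(stringr)

# Sample data from the question (assumed to be stored in `df`)
df <- data.frame(
  col1 = c("business", "BusinesS", "education", "some BUSINESS .",
           "business of someone, that is cool", " not the b word",
           "busi ness", "busines.", "businesses", "something else"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(
    # The answer's flag: 1 on a match, NA otherwise
    col2 = ifelse(str_detect(col1, "(?i)busi\\s?ness?"), 1, NA),
    # Optional (assumed intent): blank out col1 wherever col2 flagged a match.
    # mutate() evaluates in order, so col2 already exists here.
    col1 = if_else(is.na(col2), col1, NA_character_)
  )

res$col2
# [1]  1  1 NA  1  1 NA  1  1  1 NA
```

`if_else()` is used for the `col1` step because it is type-strict, so the column stays character even if every row happens to match.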
We can use `ifelse` to set `col2` to `1` if `str_detect` detects any form of "busines" and to `NA` if it doesn't. Note that `(?i)` makes the match case-insensitive and `?` makes the preceding item optional; so `\\s?` matches an optional whitespace character (such as a space) and `s?` matches an optional literal `s`.
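To see what each part of the pattern does, the sketch below runs it against a few hypothetical strings (not from the question). It also checks that the inline `(?i)` flag behaves the same as stringr's `regex(..., ignore_case = TRUE)` modifier, and shows a `\\s*` variant (not part of the answer) for the case where several spaces should also count.

```r
library(stringr)

# Hypothetical test strings chosen to probe each part of the pattern
x <- c("BUSINESS",    # (?i): case is ignored
       "busi ness",   # \\s?: one optional whitespace character
       "busi  ness",  # two spaces: not matched, \\s? allows at most one
       "busines",     # s?: the final "s" may be missing
       "busness")     # "busi" is required literally: not matched

inline <- str_detect(x, "(?i)busi\\s?ness?")
# Equivalent spelling using stringr's regex() modifier instead of (?i)
modifier <- str_detect(x, regex("busi\\s?ness?", ignore_case = TRUE))

inline
# [1]  TRUE  TRUE FALSE  TRUE FALSE
identical(inline, modifier)
# [1] TRUE

# Variant: \\s* accepts any number of whitespace characters
str_detect("busi  ness", "(?i)busi\\s*ness?")
# [1] TRUE
```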
Answered By – Chris Ruehlemann
Answer Checked By – Gilberto Lyons (BugsFixing Admin)