[SOLVED] How to make a new variable that takes 1 if the string in another column contains a word with varying punctuation and letter case?

Issue

I have a column that looks something like this:

col1 
"business"
"BusinesS"
"education"
"some BUSINESS ."
"business of someone, that is cool"
" not the b word"
"busi ness"
"busines." 
"businesses"
"something else"

I need an efficient way to turn this string data into a new column, col2, like this:

col1                col2
NA                  1
NA                  1
"education"         NA
NA                  1
NA                  1
" not the b word"   NA
NA                  1
NA                  1
NA                  1
"something else"    NA

So the common denominator is "busines", but I don’t know how to efficiently handle all the spaces, punctuation, upper/lowercase letters, surrounding words, etc. in a single mutate call that creates the new column.

Solution

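library(dplyr)
library(stringr)

df %>%
  # (?i): case-insensitive; \\s?: optional whitespace; s?: optional final "s"
  mutate(col2 = ifelse(str_detect(col1, "(?i)busi\\s?ness?"),
                       1,   # some form of "business" was found
                       NA)) # no match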

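We can use ifelse to return 1 if str_detect detects any form of "business" and NA if it doesn’t. (?i) makes the match case-insensitive, and the ? in \\s? and s? makes the preceding item optional: \\s? matches an optional single whitespace character (such as the space in "busi ness"), and s? matches an optional literal s, so both "busines" and "business" match. Since str_detect looks for the pattern anywhere in the string, surrounding words and punctuation such as "." or "," do not prevent a match.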
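To check the pattern against every sample string, and to also produce the question's desired output where col1 is set to NA on matching rows (the answer's code leaves col1 unchanged), here is a base-R sketch. It assumes df has a character column col1 and uses grepl() with perl = TRUE, whose PCRE engine understands the same (?i) flag and \\s class as the pattern above.

```r
# Sample data from the question (assumed to be a character column named col1)
df <- data.frame(
  col1 = c("business", "BusinesS", "education", "some BUSINESS .",
           "business of someone, that is cool", " not the b word",
           "busi ness", "busines.", "businesses", "something else")
)

# Same pattern as the answer: case-insensitive, optional whitespace, optional final "s"
is_business <- grepl("(?i)busi\\s?ness?", df$col1, perl = TRUE)

df$col2 <- ifelse(is_business, 1, NA)  # 1 for matches, NA otherwise
df$col1[is_business] <- NA             # blank the matched strings, as in the desired table
print(df)
```

Inside the answer's dplyr pipeline, the same effect should come from adding col1 = ifelse(col2 %in% 1, NA, col1) after col2 in the same mutate() call, because mutate() evaluates its arguments in order.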

Answered By – Chris Ruehlemann

Answer Checked By – Gilberto Lyons (BugsFixing Admin)
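A different technique, not part of the accepted answer, is to normalise each string first (lowercase it and delete every character that is not a letter) and then search for the fixed substring "busines". The base-R sketch below assumes the same col1 values as the question and uses a hypothetical helper, has_busines().

```r
# Hypothetical helper: TRUE when the letters of a string, ignoring case,
# spaces and punctuation, contain "busines".
has_busines <- function(x) {
  letters_only <- gsub("[^a-z]", "", tolower(x))  # lowercase, then keep only a-z
  grepl("busines", letters_only, fixed = TRUE)    # plain substring search
}

col1 <- c("business", "BusinesS", "education", "some BUSINESS .",
          "business of someone, that is cool", " not the b word",
          "busi ness", "busines.", "businesses", "something else")

col2 <- ifelse(has_busines(col1), 1, NA)
print(data.frame(col1, col2))
```

The trade-off: this tolerates any amount of spacing or punctuation inside the word (for example "bu-si ness", which the regex above does not match), but deleting spaces also joins neighbouring words, so a string like "bus inessential" becomes a false positive.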
