Issue
I am new to R and have a very large irregular column in a data frame like this:
x <- data.frame(section = c("BOOK I: Introduction", "Page one: presentation", "Page two: acknowledgments", "MAGAZINE II: Considerations", "Page one: characters", "Page two: index", "BOOK III: General Principles", "BOOK III: General Principles", "Page one: invitation"))
section
BOOK I: Introduction
Page one: presentation
Page two: acknowledgments
MAGAZINE II: Considerations
Page one: characters
Page two: index
BOOK III: General Principles
BOOK III: General Principles
Page one: invitation
I need to concatenate this column to look like this:
section
BOOK I: Introduction
BOOK I: Introduction / Page one: presentation
BOOK I: Introduction / Page two: acknowledgments
MAGAZINE II: Considerations
MAGAZINE II: Considerations / Page one: characters
MAGAZINE II: Considerations / Page two: index
BOOK III: General Principles
BOOK III: General Principles
BOOK III: General Principles / Page one: invitation
Basically, the goal is to take the value of the header row above, identified by some condition such as a regex, and prepend it to each row below it, updating that value whenever a new header row appears. I really don't know how to do it.
Thanks in advance.
Solution
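You can split the column into groups that each start at a header row, then paste the header onto the remaining rows of its group:
# A header row is one that starts with three capital letters; cumsum() turns
# those matches into a running group id for split().
unlist(lapply(split(x$section, cumsum(grepl('^[A-Z]{3}', x$section))),
              function(y) {
                if (length(y) == 1) return(y)
                c(y[1], paste(y[1], y[-1], sep = " / "))
              }), use.names = FALSE)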
#> [1] "BOOK I: Introduction"
#> [2] "BOOK I: Introduction / Page one: presentation"
#> [3] "BOOK I: Introduction / Page two: acknowledgments"
#> [4] "MAGAZINE II: Considerations"
#> [5] "MAGAZINE II: Considerations / Page one: characters"
#> [6] "MAGAZINE II: Considerations / Page two: index"
#> [7] "BOOK III: General Principles"
#> [8] "BOOK III: General Principles"
#> [9] "BOOK III: General Principles / Page one: invitation"
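For a very large column, the same grouping idea can also be written without split() and lapply(). The following is only a sketch under a few assumptions: the first row is a header, headers are exactly the rows matching '^[A-Z]{3}', and the column is character rather than factor. The names is_header, group, header and result are made up for illustration.

```r
# Same sample data as in the question. stringsAsFactors = FALSE keeps the
# column as character on R versions before 4.0.
x <- data.frame(section = c("BOOK I: Introduction", "Page one: presentation",
                            "Page two: acknowledgments",
                            "MAGAZINE II: Considerations",
                            "Page one: characters", "Page two: index",
                            "BOOK III: General Principles",
                            "BOOK III: General Principles",
                            "Page one: invitation"),
                stringsAsFactors = FALSE)

# TRUE for rows that start with three capital letters (the header rows).
is_header <- grepl("^[A-Z]{3}", x$section)

# Running count of headers seen so far, i.e. the group id of every row.
group <- cumsum(is_header)

# Header text belonging to each row's group.
# Assumption: the first row is a header, so group never contains 0.
header <- x$section[is_header][group]

# Header rows stay unchanged; all other rows get their header prepended.
result <- ifelse(is_header, x$section,
                 paste(header, x$section, sep = " / "))
```

Assigning result back with x$section <- result would overwrite the original column. If the data could start with a non-header row, that case would need handling first, because a group id of 0 silently drops elements when used as an index.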
Answered By – Allan Cameron
Answer Checked By – David Marino (BugsFixing Volunteer)