[SOLVED] Extract Text Sections from a Cell

Table of Contents

Issue

Given a cell with text pulled from HTML in a format like this:

OA-1

Interpret products of whole numbers, e.g., interpret 5 × 7 as the
total number of objects in 5 groups of 7 objects each. For example,
describe a context in which a total number…

  • More

OA-2

Interpret whole-number quotients of whole numbers, e.g., interpret 56
÷ 8 as the number of objects in each share when 56 objects are
partitioned equally into 8 shares, or as a number …

Goal: extract a list of header identifiers so that output looks like this:
OA-1,OA-2…

I’ve pulled data via the =importhtml function as shown in the two examples on this MWE sheet.

Noting that char(10) is a return character, I’m thinking of code something like this
psuedocode:

Left(Cell_with_text,number_of_characters = find(first char(10))-1)&","&"find_next_heade"+\r

Another approach might me to make a library of all the headers (eg, "OA-1,OA-2…") and somehow find each instance in the cell, maybe with a find function in an array?

Assumptions

  • The headers could have between 3 and 7 characters.

  • Headers do not always start with the same letter.

  • Headers do always have a dash, but it could be anywhere from the 2nd to the 2nd to last position.

  • There is always a char(10) after each header.

Solution

This formula Splits all of them at once, then just keeps the 1st column (which is your desired output). Then do the JOIN().

=JOIN(", ",INDEX(SPLIT(importhtml("https://contentexplorer.smarterbalanced.org/target/m-g3-c1-ta","list",3),CHAR(10)),,1))

Here is a sample sheet, viewable to all in perpetuity.

Answered By – MattKing

Answer Checked By – Mildred Charles (BugsFixing Admin)

Leave a Reply

Your email address will not be published. Required fields are marked *