[SOLVED] Find a regular expression to get substrings with file extensions

Issue

There are several variants of the strings:

  1. "txt files (*.txt)|*.txt|All files (*.*)|*.*"
  2. "Image Files|*.jpg;*.jpeg;*.png;"
  3. "Excel Files (*.xls, *.xlsx)|*.xls;*.xlsx|CSV Files (*.csv)|*.csv"

The substring can end with any character (space, ',', '.', '|', ';') - it doesn’t matter.

I tried the following patterns, but neither worked: "[^*].{3,4}(.?);", "[^*]+.(.?);".

I need a regular expression to get string[] = {.jpg, .jpeg, ...}, preferably without duplicate elements.

Solution

Do you really need a regular expression?

First off, if you split by |, the entries alternate between a description and a list of extensions, so every odd-indexed entry (counting from zero) is a list of extensions. You can then split each of those by ; to get the individual patterns, flatten them into a single sequence, and trim the leading * from each element. Finally, take the distinct values and put them into an array.

This can all be accomplished with Split and LINQ:

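// filter holds one of the filter strings from the question.
var extensions = filter.Split('|', StringSplitOptions.RemoveEmptyEntries)
                       .Where((x, i) => i % 2 != 0)      // keep the odd-indexed entries (the pattern lists)
                       .SelectMany(x => x.Split(';', StringSplitOptions.RemoveEmptyEntries))
                       .Select(x => x.TrimStart('*'))    // "*.jpg" becomes ".jpg"
                       .Distinct()                       // drop duplicates
                       .ToArray();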

Passing StringSplitOptions.RemoveEmptyEntries to Split ensures that a trailing separator (such as the final ; in the second example) is simply ignored instead of producing an empty element.
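
To see how the chain behaves on the three strings from the question, here is a minimal, self-contained sketch. The class name FilterParser and the method name GetExtensions are made up for this illustration and are not part of the original answer; the sample strings are the ones from the question with the stray backticks removed.

```csharp
using System;
using System.Linq;

public static class FilterParser
{
    // Hypothetical helper: wraps the Split/LINQ chain shown above.
    public static string[] GetExtensions(string filter) =>
        filter.Split('|', StringSplitOptions.RemoveEmptyEntries)
              .Where((part, index) => index % 2 != 0)   // odd indices hold the pattern lists
              .SelectMany(part => part.Split(';', StringSplitOptions.RemoveEmptyEntries))
              .Select(pattern => pattern.TrimStart('*')) // "*.jpg" becomes ".jpg"
              .Distinct()
              .ToArray();

    public static void Main()
    {
        string[] filters =
        {
            "txt files (*.txt)|*.txt|All files (*.*)|*.*",
            "Image Files|*.jpg;*.jpeg;*.png;",
            "Excel Files (*.xls, *.xlsx)|*.xls;*.xlsx|CSV Files (*.csv)|*.csv",
        };

        // Print one comma-separated line of extensions per filter string.
        foreach (var filter in filters)
        {
            Console.WriteLine(string.Join(", ", GetExtensions(filter)));
        }
    }
}
```

Note that for the first string the "All files" pattern *.* comes through as ".*", since only the leading * is trimmed. If that wildcard entry is not wanted, it would have to be filtered out separately.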

See it in action on .NET Fiddle.
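
If a regular expression is still preferred, as the question originally asked, the following is one possible sketch. It is not part of the original answer. The pattern \*(\.\w+) and the names RegexSketch and GetExtensions are assumptions chosen for illustration. Unlike the Split approach, it scans the whole string, including the descriptions in parentheses, and it skips the *.* wildcard because * is not a word character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // A literal '*' followed by a dot and one or more word characters.
    // Group 1 captures the extension including its leading dot.
    private static readonly Regex ExtensionPattern = new Regex(@"\*(\.\w+)");

    // Hypothetical helper: collects the distinct captured extensions in order of appearance.
    public static string[] GetExtensions(string filter) =>
        ExtensionPattern.Matches(filter)
                        .Cast<Match>()
                        .Select(match => match.Groups[1].Value)
                        .Distinct()
                        .ToArray();

    public static void Main()
    {
        var result = GetExtensions(
            "Excel Files (*.xls, *.xlsx)|*.xls;*.xlsx|CSV Files (*.csv)|*.csv");
        Console.WriteLine(string.Join(", ", result));
    }
}
```

One limitation of this pattern: a multi-part extension such as *.tar.gz would be captured only as ".tar", because \w+ stops at the second dot.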

Answered By – Etienne de Martel

Answer Checked By – Timothy Miller (BugsFixing Admin)
