So I'm a big fat noob at regular expressions


#1

I’m not very experienced at coding and want to know how to use regular expressions. If anybody’s wondering, I’m using JS, but knowing how to use them in the Find and Search bar would help too (I don’t know if this makes a difference, but yeah).


#2

Regex is pretty simple once you break it into parts. First of all, if you want to test a regular expression, try regex101.com. It allows you to write the expressions, test them under different runtimes, export code, and write unit tests.

There are a few basic concepts in regex. Let’s start with groups. Opening and closing parentheses denote the beginning and end of a “group.” The most common kind is the “capturing group,” which saves the text it matched so you can read it afterwards. Say you want to find all occurrences of “cloud9” in a paragraph, and see what the next word is. Here’s our paragraph:

Cloud9 is available at c9.io. AWS Cloud9 was also recently released.

You can use the following regular expression to find each “cloud9” and the word after it:

/cloud9\s(\w*)/ig

This looks complicated, but we can break it down into pieces:

/
cloud9\s
(\w*)
/ig

At the beginning we have a /. This simply denotes the beginning of our regular expression. Then, we look for cloud9, followed by \s, which matches one whitespace character (a space, tab, or line break). So “cloud9.” (with a period) would not match, because there is no whitespace right after it.

Then, we create a capturing group with (. This is the part of the text we would like to know something about. We use \w to match a “word” character: a letter, a digit, or an underscore. Then, we use *, which means “zero or more of the previous thing,” so the group keeps capturing word characters for as long as it finds them. Once we reach the space after the word, it will stop, since a space isn’t a word character. (Because * also allows zero characters, the group can come back empty if something other than a word character follows the whitespace. Use + instead if you need at least one.) We only want to capture the word after cloud9, so we’ll stop capturing now with ).

To end our regular expression, we use /. Then, we add our “flags.” In this case, I want the regex to be case insensitive, so that it will match “Cloud9”, “cloud9”, and “CLOUD9”, plus any other capitalization. We’ll use the i flag to ignore case. You’ll notice that in our paragraph, we have two occurrences of “cloud9.” In order to tell our regex to continue searching after it’s found the first occurrence, we’ll add the g, or global, flag.

Try putting the above into regex101.com, to see how it highlights the matches.
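
Since you mentioned JS, here is a minimal sketch of running the same expression in code. It assumes a runtime that supports String.prototype.matchAll (modern browsers and Node). The variable names (paragraph, pattern, nextWords) are just made up for this example.

```javascript
// The example paragraph and regex from above.
const paragraph = "Cloud9 is available at c9.io. AWS Cloud9 was also recently released.";
const pattern = /cloud9\s(\w*)/ig;

// matchAll requires the g flag and yields one match object per occurrence.
const nextWords = [];
for (const match of paragraph.matchAll(pattern)) {
  // match[0] is the whole match ("Cloud9 is"), match[1] is the first capturing group ("is").
  nextWords.push(match[1]);
}

console.log(nextWords); // [ 'is', 'was' ]
```

Without the g flag, matchAll throws a TypeError, and paragraph.match(pattern) would only return the first occurrence.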

Hopefully what I wrote isn’t too confusing. You can find a lot of detailed explanations and quick starts for regular expressions at regular-expressions.info. The quickstart is a good place to start. Just keep in mind a few things:

  • Regular expressions are designed to process a predictable input. If you can guarantee the input will follow a certain format, regex will work. Things like well-written sentences in any language work somewhat well because there are rules, but keep in mind the exceptions to those rules and account for them.
  • DO NOT try to process HTML (or any XML) with regex. HTML is exceptionally difficult to process this way because of how unpredictable it is (for instance, nested tags and self-closing tags). An HTML or XML parser is the better tool for that job.
  • Keep in mind that regex isn’t for everything, even if the input is predictable. For instance, email seems fairly simple. It’s just alphanumeric characters followed by “@,” followed by a domain name, with dot something at the end, right? Well, you might start with that, then consider the different specifications for email, and end up with something like the patterns at https://www.regular-expressions.info/email.html. Then, even worse, you realize that it still doesn’t correctly validate all email formats, so you decide to fix it so it works with all of them, and end up with this. See this StackOverflow post for some terrible regex.
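  • Regex takes time to process: the longer or more complicated the regex or the input, the longer it takes. Some patterns, with nested quantifiers like (a+)+ being the classic case, can get dramatically slower on certain inputs.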
  • If you’re stuck, and need help, you might try the regex subreddit, which isn’t the most active, but still has people that might be having the same problems.
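
To make the email point concrete, here is a rough sketch of the “simple” pattern described above (alphanumeric characters, “@,” a domain name, dot something). The pattern and the looksLikeEmail helper are hypothetical and deliberately naive. They are only meant to show how quickly valid addresses fall through, not to be used for real validation.

```javascript
// Hypothetical, deliberately naive pattern: alphanumerics, "@", domain, dot, letters.
const naiveEmail = /^[a-z0-9]+@[a-z0-9]+\.[a-z]+$/i;

// Hypothetical helper: returns true if the whole string fits the naive pattern.
function looksLikeEmail(text) {
  return naiveEmail.test(text);
}

console.log(looksLikeEmail("user@example.com"));       // true
console.log(looksLikeEmail("first.last@example.com")); // false, although the address is valid
console.log(looksLikeEmail("user@mail.example.com"));  // false, subdomains are rejected too
```

Each fix for a case like these makes the pattern longer and harder to read, which is how the monster expressions come about.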

#3

That wasn’t too confusing, thanks!