Keshav Mohta

All About Regex


Regular Expression or Regex

Regular expressions are useful for parsing text and validating or replacing parts of it as needed. A few common use cases are

  • email address validation
  • number validation
  • password validation and strength check

In this article we will explore regex in more depth.

Syntax Variation

There are 2 ways to write a regex pattern in JavaScript.

Using forward slashes / /

  • when the pattern is written between forward slashes (a regex literal), we do not need to wrap the expression in quotes

Using a string pattern

  • when the pattern is written as a string, we do not wrap it in / /; instead we create the regex with new RegExp('pattern')

For example:

const rex = /[a-zA-Z0-9]/;
const stringPattern = "[a-zA-Z0-9]";
const stringRegex = new RegExp(stringPattern);
// here rex and stringRegex are equivalent

There is a subtle difference between these 2 variations: when we use an escape such as \d (digit only) or \b (word boundary) in a string pattern, we need an extra \ because the backslash is also the escape character inside a string. We can read the resulting regex pattern back from a RegExp object using its source property.

const rex = /\d*\bcolor\b/;
const stringPattern = "\\d*\\bcolor\\b";
const stringRegex = new RegExp(stringPattern);
console.log(stringRegex.source); // logs \d*\bcolor\b

Flags

Flags change how a regex searches. The most commonly used flags are

  • /g: global search (find every match, not just the first)
  • /i: case-insensitive search
  • /m: multiline search (^ and $ also match at line breaks)

This is how we write flags in both regex syntaxes:

  • in a regex literal, write them after the closing /, e.g. /[a-z0-9]/gi
  • in a string pattern, pass them as the second argument, e.g. new RegExp('[a-z0-9]', 'gi')
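As a quick check, here is a minimal sketch showing that both ways of setting flags end up with the same regex. The sample string and variable names are made up for illustration.

```javascript
// Both forms produce a regex with the same source and the same flags.
const literal = /[a-z0-9]+/gi;
const constructed = new RegExp("[a-z0-9]+", "gi");

console.log(literal.flags);     // "gi"
console.log(constructed.flags); // "gi"

// With g and i together, every run of letters or digits is matched, ignoring case.
const found = "Regex 101, REGEX 202".match(literal);
console.log(found); // ["Regex", "101", "REGEX", "202"]
```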

Apart from these common flags, there are a few other useful flags, which we cover here.

/d flag

Do not confuse this with the \d character class.

This flag is useful with capture groups: the match result gets an extra indices array holding the start and end index of the whole match and of each capture group.

Note that the /d flag works on its own and does not require the /g flag; add /g only when you also want every match.
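A small sketch of what the indices array looks like. The date pattern and the sample string are assumptions chosen only for illustration.

```javascript
// The d flag adds an `indices` array to the match result.
const dateRegex = /(\d{4})-(\d{2})/d;
const dateMatch = dateRegex.exec("released on 2024-06-15");

console.log(dateMatch[0]);         // "2024-06"
console.log(dateMatch.indices[0]); // [12, 19] start and end of the whole match
console.log(dateMatch.indices[1]); // [12, 16] first capture group (year)
console.log(dateMatch.indices[2]); // [17, 19] second capture group (month)
```

Each pair is [start, end), so it can be passed straight to str.slice(start, end).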

/y flag

This flag makes the search sticky. Normally we cannot tell a regex to match only at a specific position, for example right after a previous match.

For example, I want to capture all properties of a CSS declaration block: first search for the opening bracket {, then match a property and value right after it, and so on. An ordinary regex cannot do this, because it scans ahead from where it starts until it finds a match anywhere in the rest of the string.

Here the sticky flag /y comes in handy: we set the lastIndex property of the regex, and the match is attempted only at that exact position.
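The CSS idea above could be sketched roughly like this. The sample CSS string and the declaration pattern are simplified assumptions, not a real CSS parser.

```javascript
const css = "a { color: red; margin: 0; }";

// Find where the declaration block opens.
const open = css.indexOf("{");

// Sticky regex: it must match exactly at lastIndex, never further ahead.
const declaration = /\s*([\w-]+)\s*:\s*([^;]+);/y;
declaration.lastIndex = open + 1;

const properties = [];
let m;
// Each successful match moves lastIndex to the end of that declaration.
while ((m = declaration.exec(css)) !== null) {
  properties.push({ property: m[1], value: m[2] });
}
console.log(properties);
// [{ property: "color", value: "red" }, { property: "margin", value: "0" }]

// After the failed attempt lastIndex is back at 0, where no declaration starts.
const matchesAtStart = declaration.test(css);
console.log(matchesAtStart); // false
```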

/u flag

This is the Unicode flag. Use it when the string contains emoji or other Unicode characters outside the basic range, so that each one is treated as a single character.

Also, when we use the \p character class, the /u flag is required.
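Here is a minimal sketch of the difference the flag makes. The sample string is made up, and \p{L} and \p{Emoji_Presentation} are just two of the available Unicode properties.

```javascript
const greeting = "Hello Привет 😀";

// Without u, "." sees the emoji as two code units; with u it is one character.
console.log(/^.$/.test("😀"));  // false
console.log(/^.$/u.test("😀")); // true

// \p{L} matches a letter from any script.
const letters = greeting.match(/\p{L}+/gu);
console.log(letters); // ["Hello", "Привет"]

// \p{Emoji_Presentation} matches characters displayed as emoji by default.
const emoji = greeting.match(/\p{Emoji_Presentation}/gu);
console.log(emoji); // ["😀"]
```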

\p is a very useful character class, so let's talk about character classes.

Character Classes

The most common character classes are

  • \d matches a digit
  • \w matches a word character (letter, digit or underscore)
  • \s matches a whitespace character
  • \p{...} matches a Unicode property, such as \p{L} for letters; the /u flag is mandatory when we use this
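To see the first three classes side by side, here is a short sketch on a made-up sample string.

```javascript
const sample = "Order 66 shipped to Zone_7 on 12 May";

// \d+ : runs of digits
const digits = sample.match(/\d+/g);
console.log(digits); // ["66", "7", "12"]

// \w+ : runs of word characters (letters, digits, underscore)
const words = sample.match(/\w+/g);
console.log(words); // ["Order", "66", "shipped", "to", "Zone_7", "on", "12", "May"]

// \s+ : runs of whitespace, handy as a separator for split
const parts = sample.split(/\s+/);
console.log(parts.length); // 8
```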

RegEx return

Traditionally we call regex.exec() in a while loop to get every match, but ES2020 introduced a new method, .matchAll(), which is easier to work with.

With a while loop

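const stringPattern = "[a-zA-Z0-9]*";
const stringRegex = new RegExp(stringPattern, "g");
const str = "there are 33 states and 7 union territory in india.";
const matches = str.match(stringRegex); // every match at once
let result;
// exec() returns one match per call and remembers its position in lastIndex
while ((result = stringRegex.exec(str)) !== null) {
  // an empty match does not advance lastIndex, so move it manually to avoid an infinite loop
  if (result[0] === "") stringRegex.lastIndex++;
  doSomethingWith(result); // placeholder for your own handler
}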

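Note: str.match above returns Array(21), containing every word plus empty strings. Why?

Because we are using *, which means 0 or more times, so the pattern also matches the empty string at every position where there is no letter or digit.

To match only words, change * to +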

const stringPattern = "[a-zA-Z0-9]+";

Now it returns Array(10).

With matchAll

const stringPattern = "[a-zA-Z0-9]+";
const stringRegex = new RegExp(stringPattern, "g");
const str = "there are 33 states and 7 union territory in india.";
const matches = str.matchAll(stringRegex);
console.log({matches}); // this will be RegExpStringIterator
const output = Array.from(matches);
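The iterator can also be consumed directly with for...of, without converting it to an array first. A small sketch, using the same sentence but a digits-only pattern chosen for illustration:

```javascript
const sentence = "there are 33 states and 7 union territory in india.";

// matchAll requires the g flag; without it a TypeError is thrown.
const numbers = [];
for (const match of sentence.matchAll(/\d+/g)) {
  // each item is a full match result, so the position is available too
  numbers.push({ value: match[0], index: match.index });
}
console.log(numbers);
// [{ value: "33", index: 10 }, { value: "7", index: 24 }]
```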

Capture Groups and Named Group

This is useful when you want to capture 2 or more values, because in a plain array it is hard to tell which entry matches what.

For example, I have to find both the browser and its version from a user agent string.

navigator.userAgent returns a string like the one below:

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"

const str = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36';
const pattern = '(\\w+)/(\\d+)\\.(\\d+)'; // name, "/", major version, literal dot, minor version
const rx = new RegExp(pattern, 'gi');
const matches = str.matchAll(rx);
console.log([...matches]);

which gives the result below

[
[
"Mozilla/5.0",
"Mozilla",
"5",
"0"
],
[
"AppleWebKit/537.36",
"AppleWebKit",
"537",
"36"
],
[
"Chrome/126.0",
"Chrome",
"126",
"0"
],
[
"Safari/537.36",
"Safari",
"537",
"36"
]
]

The result above is copied from the console, which shows each entry as Array(4). The actual structure of each entry is an array with extra named properties. PHP would call this an associative array; in JavaScript it is simply an array object that also carries the properties groups, index and input.

0: "Mozilla/5.0"
1:"Mozilla"
2:"5"
3:"0"
groups: undefined
index: 0
input: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"

The numbered entries are accessed by index, e.g. match[1], while the other properties are accessed by name using dot notation, e.g. match.index.

Now we have every result, but we want each entry separated like this:

{name: Chrome, version: 126.0.0.0}
{name: Safari, version: 537.36}

and so on.

We can achieve this using a named capture group, (?<name>...). Let's change the regex:

const pattern = '(?<Browser>\\w+)/(?<Version>\\d+)\\.(\\d+)';

Now look at the result: each entry has its values under the groups key.

0: "Mozilla/5.0"
1: "Mozilla"
2: "5"
3: "0"
groups: {Browser: 'Mozilla', Version: '5'}
index: 0
input: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
length: 4

Now you can use .matchAll() and read each value from the groups property; the index property gives the position of each match.
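A minimal sketch of reading the named groups into plain objects. The variable names ua, productRegex and products are made up for illustration, and the pattern is written as a regex literal rather than a string.

```javascript
const ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36";
const productRegex = /(?<Browser>\w+)\/(?<Version>\d+)\.(\d+)/g;

// Array.from accepts the iterator plus a mapping function.
const products = Array.from(ua.matchAll(productRegex), (match) => ({
  name: match.groups.Browser,
  version: match.groups.Version, // only the major version at this stage
  index: match.index,
}));

console.log(products.map((p) => p.name));    // ["Mozilla", "AppleWebKit", "Chrome", "Safari"]
console.log(products.map((p) => p.version)); // ["5", "537", "126", "537"]
```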

We are still not getting the complete version, including minor and patch: we get 5, but the value we need is 5.0.

So modify the regex again and create one group for the whole version:

const pattern = '(?<Browser>\\w+)/(?<Version>(\\d+)\\.(\\d+))';

Task: this works when the version has two parts, but for Chrome we have 126.0.0.0 and we want to capture it completely. What would be the proper regex?
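One possible answer, offered as a sketch rather than the only solution: let the version be one number followed by any count of ".number" parts, using a non-capturing group (?:...) so the numbered groups stay clean. The names agent, fullVersion and browsers are hypothetical.

```javascript
const agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36";

// \d+(?:\.\d+)* : a number, then zero or more ".number" parts
const fullVersion = /(?<Browser>\w+)\/(?<Version>\d+(?:\.\d+)*)/g;

const browsers = Array.from(agent.matchAll(fullVersion), (match) => ({
  name: match.groups.Browser,
  version: match.groups.Version,
}));

console.log(browsers);
// [
//   { name: "Mozilla", version: "5.0" },
//   { name: "AppleWebKit", version: "537.36" },
//   { name: "Chrome", version: "126.0.0.0" },
//   { name: "Safari", version: "537.36" }
// ]
```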