
MongoDB & RegEx: Searching Documents & Excluding Special Characters



Problem

Summary

You need to find documents in MongoDB whose field value does not contain a particular special character.

Example

Here we have two basic documents. We want to exclude _id: 1 from our search results because its myField value contains an opening parenthesis.

{
  _id: 1,
  myField: "exclude (me"
},
{
  _id: 2,
  myField: "find me"
}

Solution

Summary

Run a regular expression with a negative lookahead against the field contents, so that only values without the unwanted character match.

Code Example

db.getCollection('myCollection').find({
  myField: { $regex: "^(?!.*\\(.*)" }
})
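To try the pattern without a database, the filter can be approximated in plain JavaScript. This is only a sketch: MongoDB evaluates $regex with Perl-compatible regular expressions on the server, while the code below uses JavaScript's own RegExp engine, on the assumption that both treat this particular pattern the same way. The docs array simply mirrors the two example documents.

```javascript
// Sample documents mirroring the example above.
const docs = [
  { _id: 1, myField: "exclude (me" },
  { _id: 2, myField: "find me" },
];

// Same pattern string as the $regex value in the query.
const pattern = new RegExp("^(?!.*\\(.*)");

// Keep only the documents whose myField passes the pattern.
const matches = docs.filter((doc) => pattern.test(doc.myField));

console.log(matches.map((doc) => doc._id)); // [ 2 ]
```

Only the document with _id: 2 survives, which is the result we want from the MongoDB query.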

Explanation

Let’s break down the operators within the Regular Expression that give us the desired result, moving from left to right.

^ + ( + ?! + .* + \\( + .* + )

  1. ^

Anchors the match at the beginning of the string, so the lookahead that follows is evaluated once, from the first character.

  2. (

Opens the group that contains the lookahead expression.

  3. ?!

Marks the group as a ‘negative lookahead’. A negative lookahead asserts that the expression inside the group cannot be matched from the current position; any string where it can be matched is excluded from our results.

  4. .*

Instructs RegEx to match any character any number of times (including zero).

  5. \\(

Represents the character we want to exclude from our search (an opening parenthesis). An opening parenthesis, like a number of other characters, has a special reserved function in RegEx, so it must be ‘escaped’ by a preceding \. Because the pattern is written inside a string, where \ is itself an escape character, we must also escape the \, giving \\( to find a (.

  6. .*

A repeat of step 4. Combining steps 4, 5 & 6 together - .*\\(.* - represents searching for an opening parenthesis with any characters before or after it. In practice, this means looking for a ( anywhere within the string. The trailing .* is harmless but not strictly needed: once .*\\( has found a (, the lookahead has already done its job.

  7. )

This is the closing parenthesis of the negative lookahead group. The negative lookahead applies to the expression inside these parentheses.

N.B. You can see parentheses being used here for one of their special RegEx functions: grouping an expression.

  8. Our final expression: ^(?!.*\\(.*)
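Two of the points in the breakdown, the double backslash and the leading ^, can be checked in plain JavaScript. This is a sketch that uses JavaScript's RegExp rather than MongoDB's server-side engine, so it illustrates the reasoning rather than MongoDB itself; the variable names are only for illustration.

```javascript
// Inside a string literal, "\\(" is two characters: a backslash and "(".
const escaped = "\\(";
console.log(escaped.length); // 2

// So the regex engine receives \( and treats it as a literal parenthesis.
const pattern = new RegExp("^(?!.*\\(.*)");
console.log(pattern.source); // ^(?!.*\(.*)

// With a single backslash the string literal drops it, leaving an
// unescaped "(" that opens a group which is never closed.
let singleBackslashFails = false;
try {
  new RegExp("^(?!.*\(.*)");
} catch (err) {
  singleBackslashFails = err instanceof SyntaxError;
}
console.log(singleBackslashFails); // true

// Without ^ the lookahead may be tried at any position, including one
// after the "(", where it succeeds, so the string wrongly matches.
const unanchored = new RegExp("(?!.*\\(.*)");
console.log(unanchored.test("exclude (me")); // true
console.log(pattern.test("exclude (me")); // false
```

The last two lines show why the anchor matters: the lookahead alone is not enough to reject a string that contains the character.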

This matters in MongoDB because, in versions before 4.0.7, the $not operator cannot be combined with a $regex operator expression. On those versions the negative lookahead syntax within the regex is what makes this search possible.
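For comparison, $not does accept a regular expression literal (as opposed to the $regex operator), so a similar filter could be sketched as below. The matchesNot helper is hypothetical: it only mimics the filter in plain JavaScript for string fields so the query document can be tried without a server. One difference to keep in mind is that $not also matches documents where the field is missing, whereas the lookahead pattern only matches documents that have a string value.

```javascript
// Query document using $not with a regex literal instead of $regex.
const query = { myField: { $not: /\(/ } };

// In the mongo shell this would be passed to find():
//   db.getCollection('myCollection').find(query)

// Hypothetical stand-in for the server-side filter (string fields only).
function matchesNot(doc, q) {
  return Object.entries(q).every(([field, condition]) => {
    const value = doc[field];
    if (value === undefined) return true; // $not also matches missing fields
    return !condition.$not.test(value);
  });
}

const docs = [
  { _id: 1, myField: "exclude (me" },
  { _id: 2, myField: "find me" },
];

const found = docs.filter((doc) => matchesNot(doc, query)).map((doc) => doc._id);
console.log(found); // [ 2 ]
```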


Problem Solved

If you have any questions you’d like to ask me about this post, feel free to reach me on Twitter or GitHub.