Groovy: Enhancement to String class to add a regexp find method

2009/03/28

I just submitted a Groovy patch that enhances the String class with a “find” method that makes working with regular expressions much easier.

One of the most common use cases is to search a string for a regular expression pattern. If a match is found, then do something with the matched value.

Currently in Groovy, the recommended way to do this is to create a matcher and then use indexes to work with any matches that might be found:

assert "10292" == ("New York, NY 10292" =~ /\d{5}/)[0]

If you try that on a string that doesn’t actually match the regular expression, you’ll get an IndexOutOfBoundsException. To be safe, you need to check matcher.find() (not “matches”, as that requires the entire string to match!) to see if the pattern is actually in there:

def m = "New York, NY" =~ /\d{5}/   // keep the matcher itself; indexing here would throw
def zip
if (m.find()) {
   zip = m[0]
}
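
As a minimal sketch of the two pitfalls just described (the variable names are only for illustration), the following shows the exception thrown by indexing a matcher that found nothing, and the difference between a full match (`==~`) and a partial find:

```groovy
// Indexing a matcher that found nothing throws IndexOutOfBoundsException
def matcher = "New York, NY" =~ /\d{5}/
def failed = false
try {
    def first = matcher[0]
} catch (IndexOutOfBoundsException e) {
    failed = true
}
assert failed

// "matches" semantics (==~) require the whole string to match the pattern,
// while find() only needs the pattern to occur somewhere in the string
def address = "New York, NY 10292"
def wholeStringMatches = (address ==~ /\d{5}/)
def patternFound = (address =~ /\d{5}/).find()
assert !wholeStringMatches
assert patternFound
```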

It also behaves inconsistently if the regular expression happens to have capture groups in it. In that case, indexing the matcher returns a list containing the match and the capture groups, forcing you to index into that list to get the value you want:

assert "c" == ("foo car baz" =~ /(.)ar/)[0][1]
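
To make the inconsistency concrete, here is a small sketch comparing the same search with and without a capture group. The variable names are illustrative only:

```groovy
// Without capture groups, indexing the matcher yields the matched String
def noGroups = ("foo car baz" =~ /.ar/)[0]
assert noGroups instanceof String
assert noGroups == "car"

// With a capture group, the same index yields a List: [fullMatch, group1, ...]
def withGroups = ("foo car baz" =~ /(.)ar/)[0]
assert withGroups instanceof List
assert withGroups == ["car", "c"]
```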

Groovy has already added a closure-aware replace method to the String class. The patch adds a complementary find method to String that returns the text matched by the regular expression, without needing to worry about matcher objects and array indexes.
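
For comparison, the closure-aware replacement mentioned above is presumably String.replaceAll called with a closure (an assumption; the exact method is not named here). A minimal sketch of how it passes the match and capture groups to the closure:

```groovy
// Assumption: the "closure aware replace" is replaceAll(regex, closure).
// The closure receives the full match followed by each capture group.
def result = "foo bar baz".replaceAll(/(.)ar/) { match, firstLetter ->
    firstLetter.toUpperCase() + "AR"
}
assert result == "foo BAR baz"
```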

You can call it without a closure to get the full match back (even if the pattern has groups in it):

assert "10292" == "New York, NY 10292".find(/\d{5}/)

It safely returns null if no match is found, which can clean up boilerplate safety checks quite a bit. You can check for null using Groovy truth if you want to:

def zip = "New York, NY".find(/\d{5}/)   // returns null
 
if (zip) { ... }
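
Because a failed search yields null, the result also combines with Groovy’s Elvis and safe-navigation operators. The sketch below assumes a Groovy version that includes the find method from this patch; `zipOf` is a hypothetical helper, not part of the patch:

```groovy
// Hypothetical helper: fall back to a default when no zip code is present
def zipOf = { String address -> address.find(/\d{5}/) ?: "unknown" }
assert zipOf("New York, NY 10292") == "10292"
assert zipOf("New York, NY") == "unknown"

// Safe navigation skips the conversion when find returns null
def zipNumber = "New York, NY".find(/\d{5}/)?.toInteger()
assert zipNumber == null
```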

If you want to work with capture groups, or manipulate the value, you can pass a closure to the find method. The closure is passed the full match as well as any capture groups (just as the collection-based regular expression methods work):

// no capture groups, only the match is passed to the closure
assert "bar" == "foo bar baz".find(/.ar/) { match -> return match }
 
// one capture group
assert "b" == "foo bar baz".find(/(.)ar/) { match, firstLetter -> return firstLetter }
 
// many capture groups, all passed to the closure after the full match
assert "2339999" == "adsf 233-9999 adsf".find(/(\d{3})?-?(\d{3})-(\d{4})/) { match, areaCode, exchange, stationNumber ->
    assert "233-9999" == match
    assert null == areaCode
    assert "233" == exchange
    assert "9999" == stationNumber
    return "$exchange$stationNumber"
}

If you think this would be a valuable addition to Groovy, you can vote for it in JIRA.

There are 4 comments in this article:

  1. 2009/03/29 maxwell says:

    Great work.. =)

    The current way to work with regex is really boring…

    What is the JIRA link?

  2. 2009/03/29 tednaleid says:

    Thanks! The link’s up above on the words “groovy patch”, and I’ve added a link at the end of the article. My link color is way too close to the text color so it’s hard to see. I’ve been meaning to redesign this blog for a while now :). Here’s the explicit link:

    http://jira.codehaus.org/browse/GROOVY-3443

  3. 2009/04/6 Colin Harrington says:

    This is much more Groovy. Excellent work!!

    I noticed that it was added to the 1.6.1 and 1.7-beta-1 streams! http://jira.codehaus.org/browse/GROOVY-3443

    I’m looking forward to using it!

  4. 2009/04/7 Ted Naleid » Groovy 1.6.1 released with new find and findAll regexp methods on String says:

    [...] 1.6.1 was released today, and it includes a patch I submitted a few weeks ago to make working with regular expressions much more groovy. Thanks to everyone that voted for the patch in the Groovy [...]
