MyNixOS website logo
Description

A lensy interface to regular expressions.

lens-regex-pcre

Hackage and Docs

Based on pcre-heavy; so it should support any regexes or options which it supports.

Performance is equal, sometimes better than that of pcre-heavy alone.

Which module should you use?

If you need unicode support, use Control.Lens.Regex.Text, if not then Control.Lens.Regex.ByteString is faster.

Working with Regexes in Haskell kinda sucks; it's tough to figure out which libs to use, and even after you pick one it's tough to figure out how to use it; lens-regex-pcre hopes to replace most other solutions by being fast, easy to set up, more adaptable with a more consistent interface.

It helps that there are already HUNDREDS of combinators which interop with lenses :smile:.

As it turns out; regexes are a very lens-like tool; Traversals allow you to select and alter zero or more matches; traversals can even carry indexes so you know which match or group you're working on.

Examples

txt :: Text
txt = "raindrops on roses and whiskers on kittens"

-- Search
>>> has [regex|whisk|] . match txt
True

-- Get matches
>>> txt ^.. [regex|\br\w+|] . match
["raindrops","roses"]

-- Edit matches
>>> txt & [regex|\br\w+|] . match %~ T.intersperse '-' . T.toUpper
"R-A-I-N-D-R-O-P-S on R-O-S-E-S and whiskers on kittens"

-- Get Groups
>>> txt ^.. [regex|(\w+) on (\w+)|] . groups
[["raindrops","roses"],["whiskers","kittens"]]

-- Edit Groups
>>> txt & [regex|(\w+) on (\w+)|] . groups %~ reverse
"roses on raindrops and kittens on whiskers"

-- Get the third match
>>> txt ^? [regex|\w+|] . index 2 . match
Just "roses"

-- Match integers, 'Read' them into ints, then sort them in-place
-- dumping them back into the source text afterwards.
>>> "Monday: 29, Tuesday: 99, Wednesday: 3" 
   & partsOf ([regex|\d+|] . match . unpacked . _Show @Int) %~ sort
"Monday: 3, Tuesday: 29, Wednesday: 99"

Basically anything you want to do is possible somehow.

Performance

See the benchmarks.

Summary

Caveat: I'm by no means a benchmarking expert; if you have tips on how to do this better I'm all ears!

  • Searchlens-regex-pcre is marginally slower than pcre-heavy, but well within acceptable margins (within 0.6%)
  • Replacelens-regex-pcre beats pcre-heavy by ~10%
  • Modifypcre-heavy doesn't support this operation at all, so I guess lens-regex-pcre wins here :)

How can it possibly be faster if it's based on pcre-heavy? lens-regex-pcre only uses pcre-heavy for finding the matches, not substitution/replacement. After that it splits the text into chunks and traverses over them with whichever operation you've chosen. The nature of this implementation makes it a lot easier to understand than imperative implementations of the same thing. This means it's pretty easy to make edits, and is also the reason we can support arbitrary traversals/actions. It was easy enough, so I went ahead and made the whole thing use ByteString Builders, which sped it up a lot. I suspect that pcre-heavy can benefit from the same optimization if anyone feels like back-porting it; it could be (almost) as nicely using simple traverse without any lenses. The whole thing is only about 25 LOC.

I'm neither a benchmarks nor stats person, so please open an issue if anything here seems fishy.

Without pcre-light and pcre-heavy this library wouldn't be possible, so huge thanks to all contributors!

Here are the benchmarks on my 2013 Macbook (2.6 Ghz i5)

benchmarking static pattern search/pcre-heavy ... took 20.78 s, total 56 iterations
benchmarked static pattern search/pcre-heavy
time                 375.3 ms   (372.0 ms .. 378.5 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 378.1 ms   (376.4 ms .. 380.8 ms)
std dev              3.747 ms   (922.3 μs .. 5.609 ms)

benchmarking static pattern search/lens-regex-pcre ... took 20.79 s, total 56 iterations
benchmarked static pattern search/lens-regex-pcre
time                 379.5 ms   (376.2 ms .. 382.4 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 377.3 ms   (376.5 ms .. 378.4 ms)
std dev              1.667 ms   (1.075 ms .. 2.461 ms)

benchmarking complex pattern search/pcre-heavy ... took 95.95 s, total 56 iterations
benchmarked complex pattern search/pcre-heavy
time                 1.741 s    (1.737 s .. 1.746 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.746 s    (1.744 s .. 1.749 s)
std dev              4.499 ms   (3.186 ms .. 6.080 ms)

benchmarking complex pattern search/lens-regex-pcre ... took 97.26 s, total 56 iterations
benchmarked complex pattern search/lens-regex-pcre
time                 1.809 s    (1.736 s .. 1.908 s)
                     0.996 R²   (0.991 R² .. 1.000 R²)
mean                 1.757 s    (1.742 s .. 1.810 s)
std dev              42.83 ms   (11.51 ms .. 70.69 ms)

benchmarking simple replacement/pcre-heavy ... took 23.32 s, total 56 iterations
benchmarked simple replacement/pcre-heavy
time                 423.8 ms   (422.4 ms .. 425.3 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 424.0 ms   (422.9 ms .. 426.2 ms)
std dev              2.684 ms   (1.239 ms .. 4.270 ms)

benchmarking simple replacement/lens-regex-pcre ... took 20.84 s, total 56 iterations
benchmarked simple replacement/lens-regex-pcre
time                 382.8 ms   (374.3 ms .. 391.5 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 378.2 ms   (376.3 ms .. 381.0 ms)
std dev              3.794 ms   (2.577 ms .. 5.418 ms)

benchmarking complex replacement/pcre-heavy ... took 24.77 s, total 56 iterations
benchmarked complex replacement/pcre-heavy
time                 448.1 ms   (444.7 ms .. 450.0 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 450.8 ms   (449.5 ms .. 453.9 ms)
std dev              3.129 ms   (947.0 μs .. 4.841 ms)

benchmarking complex replacement/lens-regex-pcre ... took 21.99 s, total 56 iterations
benchmarked complex replacement/lens-regex-pcre
time                 399.9 ms   (398.4 ms .. 402.2 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 399.6 ms   (399.0 ms .. 400.4 ms)
std dev              1.135 ms   (826.2 μs .. 1.604 ms)

Benchmark lens-regex-pcre-bench: FINISH

Behaviour

Precise Expected behaviour (and examples) can be found in the test suites:

Metadata

Version

1.1.0.0

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows