Regex based parsers.
parser-regex
Features
- Parsers based on regular expressions, capable of parsing regular languages. There are no extra features that would make parsing non-regular languages possible.
- Regexes are composed using combinators.
- Resumable parsing of sequences of any type containing values of any type.
- Special support for Text and String in the form of convenient combinators and operations like find and replace.
- Parsing runtime is linear in the length of the sequence being parsed. No exponential backtracking.
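As a rough illustration of the find and replace operations mentioned above, the sketch below assumes that Regex.Text exports find and replaceAll with the types noted in the comments. Those signatures are recalled from the Hackage documentation rather than stated in this README, so check them there before relying on them. The helper names firstDigits and collapseSpaces are made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Regex.Text as R

-- Assumed signature: R.find :: REText a -> Text -> Maybe a
-- Looks for the first run of ASCII decimal digits anywhere in the input.
-- The string literal is read as a character set via OverloadedStrings.
firstDigits :: Text -> Maybe Text
firstDigits = R.find (R.someTextOf "0123456789")

-- Assumed signature: R.replaceAll :: REText Text -> Text -> Text
-- Replaces every run of spaces and tabs with a single space.
-- The regex returns the replacement text for each match.
collapseSpaces :: Text -> Text
collapseSpaces = R.replaceAll (" " <$ R.someTextOf " \t")
```

Both helpers are ordinary functions on Text, so they can be tried directly in GHCi once the parser-regex package is available.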
Example
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (optional)
import Data.Text (Text)
import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS
data URI = URI
  { scheme :: Maybe Text
  , authority :: Maybe Text
  , path :: Text
  , query :: Maybe Text
  , fragment :: Maybe Text
  } deriving Show
-- ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
-- A non-validating regex to extract parts of a URI, from RFC 3986
-- Translated:
uriRE :: REText URI
uriRE = URI
  <$> optional (R.someTextOf (CS.not ":/?#") <* R.char ':')
  <*> optional (R.text "//" *> R.manyTextOf (CS.not "/?#"))
  <*> R.manyTextOf (CS.not "?#")
  <*> optional (R.char '?' *> R.manyTextOf (CS.not "#"))
  <*> optional (R.char '#' *> R.manyText)
>>> R.reParse uriRE "https://github.com/meooow25/parser-regex?tab=readme-ov-file#parser-regex"
Just (URI { scheme = Just "https"
          , authority = Just "github.com"
          , path = "/meooow25/parser-regex"
          , query = Just "tab=readme-ov-file"
          , fragment = Just "parser-regex" })
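The example above relies on REText being an Applicative and Alternative (it uses optional), so the same approach extends to choice and repetition with the standard <|> and many combinators. The following is a minimal sketch using only functions that already appear in the example. The Setting type and the names settingRE, settingsRE and parseSettings are hypothetical and exist only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Text (Text)
import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

-- Hypothetical type for this sketch: a "key=value" pair or a bare flag.
data Setting
  = Pair Text Text
  | Flag Text
  deriving (Eq, Show)

-- One setting: a non-empty key, optionally followed by '=' and a value.
settingRE :: REText Setting
settingRE = pairRE <|> flagRE
  where
    key    = R.someTextOf (CS.not "=;")
    pairRE = Pair <$> key <* R.char '=' <*> R.manyTextOf (CS.not ";")
    flagRE = Flag <$> key

-- Zero or more settings, each terminated by ';'.
settingsRE :: REText [Setting]
settingsRE = many (settingRE <* R.char ';')

-- Succeeds only if the whole input matches.
parseSettings :: Text -> Maybe [Setting]
parseSettings = R.reParse settingsRE
```

With these definitions, parseSettings "a=1;verbose;b=;" gives Just [Pair "a" "1",Flag "verbose",Pair "b" ""], while an input missing the final ';' gives Nothing. The result is a structured value rather than a list of submatches.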
Documentation
Please find the documentation on Hackage: parser-regex
Already familiar with regex patterns? See the Regex pattern cheat sheet.
Alternatives
regex-applicative
regex-applicative is the primary inspiration for this library, and provides a similar set of features. parser-regex attempts to be a more fully-featured library built on the ideas of regex-applicative.
Traditional regex libraries
Other alternatives are more traditional regex libraries that use regex patterns, like regex-tdfa and regex-pcre/regex-pcre-builtin.
Reasons to use parser-regex over traditional regex libraries:
- You prefer parser combinators over regex patterns
- You need more powerful parsing capabilities than just submatch extraction
- You need to parse a sequence type that is not supported by these regex libraries
Reasons to use traditional regex libraries over parser-regex:
- The terseness of regex patterns is better suited for your use case
- You need something very fast, and adversarial input is not a concern. Use regex-pcre/regex-pcre-builtin.
For a more detailed comparison of regex libraries, see here.
Contributing
Questions, bug reports, documentation improvements, code contributions welcome! Please open an issue as the first step.