Description

A highly generic parser combinator library.

Please see the README on GitHub at https://github.com/nasso/comparse#readme

comparse


comparse is a parser combinator library supporting arbitrary input types. Combinators do not care about the exact type of the input stream, so you can use them on your own data source as long as it supports a specific set of operations. The ParserT monad transformer can wrap another monad, which makes it easy to implement features such as tracing, context-sensitive parsing, state machines, and so on. comparse does its best to provide meaningful error messages without sacrificing performance or the developer experience when writing parsers.
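
To make the idea of layering a parser over another monad concrete, here is a minimal sketch that uses only base. It is not comparse's ParserT: MiniParserT, lift', satisfy, traced and Trace are hypothetical names invented for this illustration, and the real library's types and error handling may look quite different. The underlying monad is the tuple monad ((,) [String]), which accumulates log lines like a simple writer.

```haskell
import Control.Monad (ap)
import Data.Char (isDigit)

-- Hypothetical toy transformer (NOT comparse's ParserT): a parser over
-- String whose every step runs inside an underlying monad `m`.
newtype MiniParserT m a = MiniParserT
  { runMiniParserT :: String -> m (Maybe (a, String)) }

instance Functor m => Functor (MiniParserT m) where
  fmap f (MiniParserT p) =
    MiniParserT $ \s -> fmap (fmap (\(a, rest) -> (f a, rest))) (p s)

instance Monad m => Applicative (MiniParserT m) where
  pure a = MiniParserT $ \s -> pure (Just (a, s))
  (<*>) = ap

instance Monad m => Monad (MiniParserT m) where
  MiniParserT p >>= f = MiniParserT $ \s -> do
    r <- p s
    case r of
      Nothing -> pure Nothing
      Just (a, rest) -> runMiniParserT (f a) rest

-- Run an action of the underlying monad without consuming any input.
lift' :: Functor m => m a -> MiniParserT m a
lift' m = MiniParserT $ \s -> fmap (\a -> Just (a, s)) m

-- Consume one character if it satisfies the predicate.
satisfy :: Applicative m => (Char -> Bool) -> MiniParserT m Char
satisfy ok = MiniParserT $ \s -> pure $ case s of
  (c : cs) | ok c -> Just (c, cs)
  _ -> Nothing

-- The tuple monad from base acts as a simple log accumulator.
type Trace = (,) [String]

-- Record a log line before running the wrapped parser.
traced :: String -> MiniParserT Trace a -> MiniParserT Trace a
traced label p = lift' (["enter " ++ label], ()) *> p

digit :: MiniParserT Trace Char
digit = traced "digit" (satisfy isDigit)

twoDigits :: MiniParserT Trace String
twoDigits = (\a b -> [a, b]) <$> digit <*> digit

main :: IO ()
main = do
  let (logLines, result) = runMiniParserT twoDigits "42!"
  mapM_ putStrLn logLines -- prints "enter digit" twice
  print result            -- prints Just ("42","!")
```

Swapping Trace for a state or reader monad would give the parser access to mutable context in the same way, which is the kind of extension the transformer design is meant to allow.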

How?

The main difference between comparse and other equivalent libraries is that the parsers you write are generic over the exact type of the input stream. That means you can write parsers that work with String, Text, or any other source of Chars with no modification. The combinators in comparse also work with anything that's an instance of Stream, even if it's not a Stream of Chars (e.g. a TokenStream!).
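
The general technique of writing functions that are generic over the input type can be sketched with a type class and an associated type family. The sketch below uses only base and is an illustration of the idea, not comparse's actual Stream class: InputStream, Elem, unconsS, Offset, takeWhileS and digits are all hypothetical names.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Char (isDigit)

-- Hypothetical stand-in for a stream class; comparse's real `Stream`
-- class may be defined differently.
class InputStream s where
  type Elem s
  unconsS :: s -> Maybe (Elem s, s)

-- Any list is a stream of its elements.
instance InputStream [a] where
  type Elem [a] = a
  unconsS [] = Nothing
  unconsS (x : xs) = Just (x, xs)

-- A second, non-list source: a String that also tracks its read offset.
data Offset = Offset Int String

instance InputStream Offset where
  type Elem Offset = Char
  unconsS (Offset _ []) = Nothing
  unconsS (Offset n (c : cs)) = Just (c, Offset (n + 1) cs)

-- Take the longest prefix whose items satisfy the predicate.
takeWhileS :: InputStream s => (Elem s -> Bool) -> s -> ([Elem s], s)
takeWhileS p s = case unconsS s of
  Just (x, rest)
    | p x -> let (xs, s') = takeWhileS p rest in (x : xs, s')
  _ -> ([], s)

-- Works unchanged on every stream whose items are Chars.
digits :: (InputStream s, Elem s ~ Char) => s -> Maybe (Int, s)
digits s = case takeWhileS isDigit s of
  ([], _) -> Nothing
  (ds, rest) -> Just (read ds, rest)

main :: IO ()
main = do
  print (fst <$> digits "123abc")             -- prints Just 123
  print (fst <$> digits (Offset 0 "42 rest")) -- prints Just 42
```

The same digits function runs on both a plain String and the custom Offset source because it only relies on the class operations and on the items being Chars.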

Example

The following example shows how to use the comparse library to parse a simplified subset of the JSON data format:

-- TypeFamilies is required by the `CharParser` constraint
{-# LANGUAGE TypeFamilies #-}

import Control.Monad (void)
import Control.Monad.Parser
import Data.Char (isAlpha, isDigit)
import Data.Text (Text)
import qualified Data.Text as T

data JValue
  = JString String
  | JNumber Int
  | JBool Bool
  | JNull
  | JObject [(String, JValue)]
  | JArray [JValue]
  deriving (Show)

main :: IO ()
main = do
  input <- getContents
  putStrLn "Parsing as String:"
  print (parseString input)
  putStrLn "Parsing as Text:"
  print (parseText $ T.pack input)

parseString :: String -> Maybe JValue
parseString s =
  case runStringParser (json <* eof) s of
    Parsed v _ _ -> Just v
    _ -> Nothing

parseText :: Text -> Maybe JValue
parseText t =
  case runTextParser (json <* eof) t of
    Parsed v _ _ -> Just v
    _ -> Nothing

-- Run a parser, skipping any whitespace before and after it.
lexeme :: CharParser p => p a -> p a
lexeme p = spaces *> p <* spaces
  where
    spaces = void $ many $ oneOf " \n\r\t"

symbol :: CharParser p => String -> p String
symbol = lexeme . string

json :: CharParser p => p JValue
json =
  JString <$> stringLiteral
    <|> JNumber <$> number
    <|> JBool <$> bool
    <|> JNull <$ symbol "null"
    <|> JObject <$> object
    <|> JArray <$> array

-- A double-quoted string; escape sequences are not supported.
stringLiteral :: CharParser p => p String
stringLiteral = lexeme $ like '"' *> many (unlike '"') <* like '"'

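-- An optionally signed integer. The letter check runs after `lexeme`
-- has already skipped any trailing whitespace.
number :: CharParser p => p Int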
number =
  lexeme
    ( read <$> ((:) <$> like '-' <*> many1 digit)
        <|> read <$> (optional (like '+') *> many1 digit)
    )
    <* notFollowedBy (match isAlpha)
  where
    digit = match isDigit

bool :: CharParser p => p Bool
bool = True <$ symbol "true" <|> False <$ symbol "false"

object :: CharParser p => p [(String, JValue)]
object =
  symbol "{"
    *> sepBy ((,) <$> (stringLiteral <* symbol ":") <*> json) (symbol ",")
    <* symbol "}"

array :: CharParser p => p [JValue]
array = symbol "[" *> sepBy json (symbol ",") <* symbol "]"

The CharParser p constraint lets us easily write parsers specialised to character input streams. It's actually defined as a type alias for a pair of constraints, built on the more general ParserOf:

type ParserOf i p = (MonadParser p, Item (Input p) ~ i)

type CharParser p = ParserOf Char p

This design makes it very easy to write reusable parsers and combinators.
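
To see how a constraint alias of this shape behaves in isolation, here is a minimal sketch using only base. Source, Token, toTokens, SourceOf, CharSource and shout are hypothetical names standing in for MonadParser, Item, Input, ParserOf and CharParser; only the shape of the alias mirrors the definitions above.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Char (toUpper)

-- Hypothetical class playing the role of the parser class and its
-- associated item type.
class Source s where
  type Token s
  toTokens :: s -> [Token s]

instance Source [a] where
  type Token [a] = a
  toTokens = id

-- Same shape as ParserOf / CharParser: a class constraint bundled with
-- a type equality on the associated type.
type SourceOf t s = (Source s, Token s ~ t)

type CharSource s = SourceOf Char s

-- The alias keeps signatures short...
shout :: CharSource s => s -> String
shout = map toUpper . toTokens

-- ...and is interchangeable with its expansion.
shout' :: (Source s, Token s ~ Char) => s -> String
shout' = shout

main :: IO ()
main = do
  putStrLn (shout "hello")            -- prints HELLO
  print (shout' "abc" == shout "abc") -- prints True
```

Because the alias expands to ordinary constraints, functions written against it compose freely with functions that spell the constraints out in full.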

For more examples, please refer to the tests.

Metadata

Version

0.2.0.0

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows