Type level string parser combinators.
Please see README.md.
symbol-parser
Type-level string (Symbol
) parser combinators.
Features
- Define parsers compositionally, largely as you would on the term level.
- Pretty parse errors.
- Hopefully decent performance.
- No runtime cost (you shall find no term-level code here).
Examples
ghci> import Data.Type.Symbol.Parser
ghci> :k! Run (Drop 3 :*>: Isolate 2 NatDec :<*>: (Drop 3 :*>: NatHex)) "___10___FF"
...
= Right '( '(10, 255), "")
Why?
Via GHC.Generics
, we may inspect Haskell data types on the type level. Constructor names are Symbols
. Ever reify these, then perform some sort of checking or parsing on the term level? symbol-parser does the parsing on the term level instead. Catch bugs earlier, get faster runtime.
Design
Parser basics
(Note that I may omit "type-level" when referring to type-level operations.)
Until GHC 9.2, Symbol
s were largely opaque. You could get them, but you couldn't decompose them like you could String
s with uncons :: [a] -> Maybe (a, [a])
. Some Haskellers took this as a challenge (see symbols). But realistically, we needed a bit more compiler support.
As of GHC 9.2, Symbol
s may be decomposed via UnconsSymbol :: Symbol -> Maybe (Char, Symbol)
. We thus design a Char
-based parser:
type ParserCh s r = Char -> s -> Result s r
data Result s r = Cont s | Done r | Err E
A parser is a function which takes a Char
, the current state s
, and returns some result:
Cont s
: keep going, here's the next states
Done r
: parse successful with valuer
Err E
: parse error, details in theE
(a structured error)
Run
handles calling the parser Char
by Char
, threading the state through, and does some bookkeeping for nice errors.
This is a good first step, but we have some outstanding issues:
- What do we do when we reach the end of the string?
- How do we initialize parser state?
- How do we pass parsers around? (As of 2024-04-20, GHC HEAD does not support unsaturated type families.)
As always, types are our salvation.
type ParserEnd s r = s -> Either E r
type Parser s r = (ParserChSym s r, ParserEndSym s r, s)
type ParserChSym s r = Char ~> s ~> Result s r
type ParserEndSym s r = s ~> Either E r
We define a parser as a tuple of a character parser, an end handler, and an initial state. The types ending in Sym
are defunctionalization symbols, which enable us to pass our parsers around as type-level functions. The plumbing is provided Oleg's fantastic library defun-core.
Example: NatDec
Let's write a parser that parses a natural decimal.
type NatDec = '(NatDecChSym, NatDecEndSym, 0)
type NatDecCh ch acc = NatDecCh' acc (ParseDecimalDigitSym @@ ch)
type family NatDecCh' acc mDigit where
NatDecCh' acc (Just digit) = Cont (acc * 10 + digit)
NatDecCh' acc Nothing = Err -- ...
type NatDecEnd acc = Right acc
-- boring defunctionalization symbol definitions
At each Char
, we attempt to parse as a digit. If it's valid, we multiply the current accumulator by 10 (a left shift) and add the digit value. At the end of the string, we simply emit the current accumulator.
It can be that easy to define a parser with symbol-parser! But it isn't always. Combinators get weird thanks to state handling. Take a look at Isolate
, then Then
.
Pitfall: Character parsers always consume
There is no backtracking or lookahead, that you do not implement yourself. This keeps the parser execution extremely simple, but breaks null parsers such as Drop 0
, so these must be handled specially (unless you don't getting mind stuck type families on misuse).
For concrete examples, see the implementation of Drop
and Literal
.
Pitfall: Not all parsers are definable
- No changing parser state. Thus, parsers such as
Try :: Parser s r -> Parser (s, [Char]) r
are not definable. Parsers may not backtrack.- Combinators such as
<|>
can emulate backtracking, but they are complex and hard to reason about (they may have bugs!).
- Combinators such as
Contributing
I would gladly accept further combinators or other suggestions. Please add an issue or pull request, or contact me via email or whatever (I'm raehik everywhere).
License
Provided under the MIT license. See LICENSE
for license text.