MyNixOS website logo
Description

Simple implementation of Earley parsing.

A little Earley parser.

Also some utilities for visualizing parse trees.

See README.

A little Earley parser

Parser for context-free grammars.

Example

The following module defines a grammar for arithmetic expressions (like 1+2*3). It defines:

  • A type of non-terminal symbols Ns.
  • A type of terminal symbols Ts.
  • The production rules of the grammar arithRules.
  • A matching function, which interprets terminal symbols Ts as sets of lexemes, of type Char here.
  • A grammar arithG packaging up all of the above.
import Data.Char (isDigit)
import Little.Earley

data Ns = SUM | PRODUCT | FACTOR | NUMBER deriving (Eq, Ord, Enum, Bounded, Show)
data Ts = Digit | OneOf [Char] deriving (Eq, Ord, Show)

arithRules :: Ns -> [Rule Ns Ts]
arithRules n = case n of
  SUM ->
    [ [ N PRODUCT ]
    , [ N SUM, T (OneOf ['+', '-']), N PRODUCT ] ]
  PRODUCT ->
    [ [ N FACTOR ]
    , [ N PRODUCT, T (OneOf ['*', '/']), N FACTOR ] ]
  FACTOR ->
    [ [ N NUMBER ]
    , [ T (OneOf ['(']), N SUM, T (OneOf [')']) ] ]
  NUMBER ->
    [ [ T Digit ]
    , [ T Digit, N NUMBER ] ]

matchTs :: Ts -> Char -> Bool
matchTs Digit = isDigit
matchTs (OneOf s) = (`elem` s)

arithG :: Grammar Ns Ts Char
arithG = mkGrammar arithRules matchTs

Load that file in GHCi:

ghci -package little-earley example.hs

Parse a string using that grammar:

> pparse arithG SUM "1+2*3"

Output:

     +-----+--SUM #1---+
     |     |           |
  SUM #0   |      +PRODUCT #1-+
     |     |      |     |     |
PRODUCT #0 | PRODUCT #0 |     |
     |     |      |     |     |
 FACTOR #0 |  FACTOR #0 | FACTOR #0
     |     |      |     |     |
 NUMBER #0 |  NUMBER #0 | NUMBER #0
     |     |      |     |     |
-----------------------------------
     1     +      2     *     3

The module Little.Earley.Examples contains more examples of grammars using this library.

Links

  • This Earley parser implementation is based on Loup Vaillant's tutorial.
  • See also the Earley library also in Haskell, with much fancier types to encode arbitrary semantic actions instead of clumsy trees.
Metadata

Version

0.2.0.0

License

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows