Description

Parser combinators for trees in the Penn Treebank format.

This Haskell package provides parsers for syntactic trees annotated in the Penn Treebank format, powered by Megaparsec.

penntreebank-megaparsec :: Megaparsec parsers for trees in the Penn Treebank format

It supports vertical composition of custom label parsers with tree parsers: you can supply your own label parser, and it parses the labels in the same pass that parses the trees. For example, given the following categorial grammar tree,

(A/B (B He)
     (B\<A/B> thinks))

the result of tree parsing is:

Node "A/B" [
    Node "B" [
        Node "He" []
    ], 
    Node "B\<A/B>" [
        Node "thinks" []
    ]
] 

which you can follow with a secondary parsing of the labels (= categories):

Node (CatRight (Atom B) (Atom A)) [
    Node (Atom B) [
        Node (Lex "He") []
    ], 
    Node (CatLeft (Atom B) (CatRight (Atom B) (Atom A))) [
        Node (Lex "thinks") []
    ]
] 
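
The secondary label parsing above can be sketched as a standalone second pass. This is only an illustration under assumptions, not this package's API: it assumes the first pass yields a `Tree String` from `Data.Tree` (containers), and the names `Cat`, `pCat`, `pOperand` and `relabel` are hypothetical. `Atom` holds a `String` here, so the output shows `Atom "B"` rather than `Atom B`. Leaves are treated as lexical items, and every other label is parsed as a category with plain Megaparsec.

```haskell
import Data.Tree (Tree (..))
import Data.Void (Void)
import Text.Megaparsec
  (Parsec, between, eof, errorBundlePretty, optional, parse, some, (<|>))
import Text.Megaparsec.Char (char, upperChar)

-- Hypothetical category type mirroring the constructors shown above.
data Cat
  = Atom String
  | CatRight Cat Cat -- CatRight argument result, written result/argument
  | CatLeft Cat Cat  -- CatLeft argument result, written with a backslash
  | Lex String
  deriving (Show, Eq)

type Parser = Parsec Void String

-- An operand is a bracketed category such as <A/B> or an uppercase atom.
pOperand :: Parser Cat
pOperand = between (char '<') (char '>') pCat <|> (Atom <$> some upperChar)

-- A category is one operand, optionally followed by a slash and another operand.
pCat :: Parser Cat
pCat = do
  left <- pOperand
  rest <- optional ((,) <$> (char '/' <|> char '\\') <*> pOperand)
  pure $ case rest of
    Nothing -> left
    Just ('/', right) -> CatRight right left -- the argument follows the slash
    Just (_, right) -> CatLeft left right    -- the argument precedes the backslash

-- Second pass: leaves become Lex, all other labels are parsed as categories.
relabel :: Tree String -> Either String (Tree Cat)
relabel (Node label []) = Right (Node (Lex label) [])
relabel (Node label kids) =
  case parse (pCat <* eof) "label" label of
    Left err -> Left (errorBundlePretty err)
    Right cat -> Node cat <$> traverse relabel kids

main :: IO ()
main =
  print . relabel $
    Node "A/B" [Node "B" [Node "He" []], Node "B\\<A/B>" [Node "thinks" []]]
```

A two-pass design like this keeps the label grammar independent of the bracket syntax, at the cost of walking the tree twice. The vertical composition described above avoids that extra walk.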

Usage

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Tree.Parser.Penn.Megaparsec.Char as TC
import Data.Text (Text)
import Text.Megaparsec as MegaP

-- define the node type
data PennNode = NonTerm Text | Term Text deriving (Show)

-- define the node parsers
instance {-# OVERLAPS #-} TC.ParsableAsTerm Text PennNode where
    pNonTerm = NonTerm <$> MegaP.takeRest
    pTerm = Term <$> MegaP.takeRest

-- specify and disambiguate the type of the input stream 
parser :: (Monad m) => TC.PennTreeParserT Text m PennNode
parser = TC.pTree

main :: IO ()
main = MegaP.parseTest parser "(A (B C (D E)))"

Metadata

Version

0.2.0

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows