A Microformats 2 parser.
A parser for Microformats 2 (http://microformats.org/wiki/microformats2), a simple way to describe structured information in HTML.
microformats2-parser
Microformats 2 parser for Haskell! #IndieWeb
- parses items, rels, rel-urls
- resolves relative URLs (with support for the <base> tag), including inside of html for e-* properties
- parses the value-class-pattern, including date and time normalization
- handles malformed HTML (the actual HTML parser is tagstream-conduit)
- can also convert to JF2
- high performance
- extensively tested
Also check out http-link-header because you often need to read links from the Link header!
DEMO PAGE
Usage
Look at the API docs on Hackage for more info. Here's a quick overview:
{-# LANGUAGE OverloadedStrings #-}
import Data.Microformats2.Parser
import Data.Default
import Network.URI
parseMf2 def $ documentRoot $ parseLBS "<body><p class=h-entry><h1 class=p-name>Yay!</h1></p></body>"
parseMf2 (def { baseUri = parseURI "https://where.i.got/that/page/from/" }) $ documentRoot $ parseLBS "<body><base href=\"base/\"><link rel=micropub href='micropub'><p class=h-entry><h1 class=p-name>Yay!</h1></p></body>"
The def is the default configuration.
The configuration includes:
- htmlMode, an HTML parsing mode (Unsafe | Escape | Sanitize)
- baseUri, the Maybe URI that represents the address you retrieved the HTML from, used for resolving relative addresses (you should set it)
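To make the effect of baseUri concrete, here is a minimal sketch of standard RFC 3986 reference resolution using only Network.URI. It is not the parser's internal code. It only illustrates how the page address, a <base href="base/"> tag and a relative href would be expected to combine for the second call in the overview. The resolve helper is a hypothetical name introduced for this example.

```haskell
import Network.URI (URI, parseURI, parseURIReference, relativeTo)

-- | Hypothetical helper: resolve a possibly relative reference against
-- a base URI. Returns Nothing if the reference is not a valid URI reference.
resolve :: URI -> String -> Maybe URI
resolve base ref = (`relativeTo` base) <$> parseURIReference ref

main :: IO ()
main = do
  let resolved = do
        -- The address the HTML was retrieved from (the baseUri setting).
        page    <- parseURI "https://where.i.got/that/page/from/"
        -- <base href="base/"> is itself relative to the page address.
        docBase <- resolve page "base/"
        -- <link rel=micropub href='micropub'> is relative to the document base.
        resolve docBase "micropub"
  print (fmap show resolved)
  -- prints: Just "https://where.i.got/that/page/from/base/micropub"
```

Without a baseUri, a relative <base> tag or href has nothing absolute to resolve against, which is why setting it is recommended.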
parseMf2 will return an Aeson Value structured like canonical microformats2 JSON. lens-aeson is a good way to navigate it.
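As a rough sketch of what navigating the result with lens-aeson can look like, the example below works on a hand-written Value in the canonical microformats2 JSON shape (items, rels, rel-urls). The sample document is an assumption made for illustration, not captured parser output, and firstName and itemTypes are hypothetical helper names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens ((^..), (^?))
import Data.Aeson (Value, decode)
import Data.Aeson.Lens (key, nth, values, _String)
import Data.Text (Text)

-- Hand-written stand-in for a parseMf2 result (assumed shape).
sample :: Maybe Value
sample = decode
  "{\"items\":[{\"type\":[\"h-entry\"],\"properties\":{\"name\":[\"Yay!\"]}}],\"rels\":{},\"rel-urls\":{}}"

-- | First value of the "name" property of the first item, if present.
firstName :: Value -> Maybe Text
firstName v =
  v ^? key "items" . nth 0 . key "properties" . key "name" . nth 0 . _String

-- | Every type string of every top-level item.
itemTypes :: Value -> [Text]
itemTypes v = v ^.. key "items" . values . key "type" . values . _String

main :: IO ()
main = case sample of
  Nothing -> putStrLn "sample JSON failed to decode"
  Just v  -> do
    print (firstName v)  -- prints: Just "Yay!"
    print (itemTypes v)  -- prints: ["h-entry"]
```

Because every traversal step is partial, a missing key or index simply yields Nothing or an empty list instead of an error, which suits the loosely structured data found in real pages.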
Development
Use stack to build.
Use ghci to run tests quickly with :test (see the .ghci file).
$ stack build
$ stack test
$ stack ghci
License
This is free and unencumbered software released into the public domain.
For more information, please refer to the UNLICENSE file or unlicense.org.