A parser for web documents according to the HTML5 specification.
mangrove provides HTML parsing for the Willow web browser suite. As such, it has not necessarily been written with a broader audience in mind, but the resulting data structures should still be generic enough to serve as a general parsing library should you need HTML5 compatibility (most likely for its codified error-recovery algorithms). If you do use it in other projects, please share any issues, or even just discomforts, that broader usage reveals. Notably, however, mangrove makes no attempt to parse CSS or JavaScript, or to access linked files; it leaves those tasks to other parts of the suite and merely generates a simple document tree from the markup.
About
mangrove provides an HTML5-compatible parser for web documents, implemented in Haskell. In keeping with the language's immutable-data paradigm, the emphasis has been placed on avoiding side effects and mutable structures rather than on strictly following the official algorithms. The resulting document tree can be returned to willow to be styled and rendered.
This readme is rather sparse, as it has been written for a subfolder of the complete repository; for full info on the project, see the primary readme in either this directory, its parent, or the online host, whichever of those links may work.
Coverage reporting
Unfortunately, the invocation of hpc by cabal-install <= 3.4.0.0 doesn't work properly when multiple packages are developed as part of the same project.
Until the next version is released, I recommend that you don't enable coverage reports for mangrove, in order for the tests themselves to run correctly.
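If coverage is switched on project-wide, one way to leave it off for this package alone might be a per-package stanza in the project file. The fragment below is only a sketch: it assumes the package is named mangrove in its .cabal file and that coverage is enabled through `cabal.project` (or `cabal.project.local`) rather than on the command line. How it interacts with a command-line `--enable-coverage` has not been checked here.

```
-- cabal.project.local (assumed location; cabal.project works the same way)
-- Turn coverage instrumentation off for this one package only.
package mangrove
  coverage: False
```

Other packages in the project keep whatever coverage setting they already had.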