Express portable, composable and reusable data tasks and pipelines.
porcupine is centered around the PTask datatype, which represents a computation that requests access to some resources (in both read and write modes) and requires some options (parameters, with docstrings and default values). PTasks compose both sequentially and in parallel into a pipeline of tasks. The resources and parameters are organized in a tree which is automatically exposed to the outside world. This makes the pipeline self-documenting: any option or file required at some point by any task can be visualized, then set or remapped (via a combination of a YAML or JSON config file and command-line arguments) before the pipeline runs.

PTasks are therefore completely agnostic of their data inputs, and new data sources can be added without changing any of the tasks' logic or even their types. This is done via the LocationAccessor typeclass. `porcupine-core` only provides access to local files (via resourcet); other location accessors are in separate packages. See for instance the https://hackage.haskell.org/package/porcupine-http package to access HTTP URLs.

PTasks also provide caching thanks to the funflow package. See the README at https://github.com/tweag/porcupine#README.md and the examples in the `porcupine-core` package.
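
The idea described above (tasks that declare their options and resources up front, compose, and are only bound to concrete locations at run time) can be illustrated with a small toy model. This is a sketch using only `base`, and it is not porcupine's actual API: `Task`, `Requirements`, `Config`, `getOpt`, `readVirtual`, `thenTask`, `bothTasks` and the `mem://` locations are all hypothetical names invented for this illustration. Write access and caching are left out for brevity.

```haskell
module Main where

import Data.Maybe (fromMaybe)

-- Toy model only: none of these names come from porcupine itself.

-- | An option a task declares: name, docstring and default value.
data OptionSpec = OptionSpec
  { optName    :: String
  , optDoc     :: String
  , optDefault :: String
  } deriving (Show, Eq)

-- | Everything a task needs from the outside world.
data Requirements = Requirements
  { reqOptions :: [OptionSpec]
  , reqReads   :: [String]  -- virtual paths the task reads from
  } deriving (Show, Eq)

instance Semigroup Requirements where
  Requirements o1 r1 <> Requirements o2 r2 = Requirements (o1 <> o2) (r1 <> r2)

instance Monoid Requirements where
  mempty = Requirements [] []

-- | Runtime configuration, supplied only when the pipeline is executed.
data Config = Config
  { cfgOptions  :: [(String, String)]  -- option overrides
  , cfgMappings :: [(String, String)]  -- virtual path -> physical location
  , cfgLoad     :: String -> IO String -- how to read a physical location
  }

-- | A task pairs its static requirements with the code that runs it.
data Task a b = Task
  { taskReqs :: Requirements
  , runTask  :: Config -> a -> IO b
  }

-- | Lift a pure function into a task with no requirements.
pureTask :: (a -> b) -> Task a b
pureTask f = Task mempty (\_ a -> pure (f a))

-- | Sequential composition: requirements of both tasks are merged.
thenTask :: Task a b -> Task b c -> Task a c
thenTask (Task r1 f) (Task r2 g) =
  Task (r1 <> r2) (\cfg a -> f cfg a >>= g cfg)

-- | Side-by-side composition on the same input (run one after the other here).
bothTasks :: Task a b -> Task a c -> Task a (b, c)
bothTasks (Task r1 f) (Task r2 g) =
  Task (r1 <> r2) (\cfg a -> (,) <$> f cfg a <*> g cfg a)

-- | Declare an option and read its value, falling back to the default.
getOpt :: String -> String -> String -> Task () String
getOpt name doc def =
  Task (Requirements [OptionSpec name doc def] [])
       (\cfg () -> pure (fromMaybe def (lookup name (cfgOptions cfg))))

-- | Declare a virtual input; where it really lives is decided by the Config.
readVirtual :: String -> Task () String
readVirtual vpath =
  Task (Requirements [] [vpath])
       (\cfg () -> cfgLoad cfg (fromMaybe vpath (lookup vpath (cfgMappings cfg))))

-- | A two-step pipeline: gather an option and an input, then combine them.
greeting :: Task () String
greeting =
  bothTasks (getOpt "greeting" "Word placed before the name" "Hello")
            (readVirtual "/inputs/name")
    `thenTask` pureTask (\(g, n) -> g ++ ", " ++ n ++ "!")

main :: IO ()
main = do
  -- The requirements can be inspected before anything runs.
  mapM_ print (reqOptions (taskReqs greeting))
  print (reqReads (taskReqs greeting))
  -- Override the option and remap the input to an in-memory source.
  let store = [("mem://alice", "Alice")]
      cfg = Config
        { cfgOptions  = [("greeting", "Bonjour")]
        , cfgMappings = [("/inputs/name", "mem://alice")]
        , cfgLoad     = \loc -> pure (fromMaybe "" (lookup loc store))
        }
  runTask greeting cfg () >>= putStrLn
```

In this sketch the `greeting` pipeline never mentions where `/inputs/name` is stored. Swapping the in-memory loader for one that reads local files or fetches URLs only changes the `Config`, which is the kind of separation the LocationAccessor typeclass is described as providing.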