Protein Data Bank file format library.
The Protein Data Bank (PDB) file format is one of the most popular formats for holding biological macromolecular data.
This is a very fast sequential parser:
below 7s for the largest entry in the PDB, 1HTQ, which is over 70MB, as compared with
11s for RASMOL 2.7.5,
or 2m15s for BioPython with the Python 2.6 interpreter.
In its parallel incarnation it is most probably the fastest parser for the PDB format.
It aims to deliver not only an event-based interface, but also a high-level data structure for manipulating data in the spirit of BioPython's PDB parser.
hPDB
Haskell PDB file format parser.
The Protein Data Bank (PDB) file format is one of the most popular formats for holding biomolecule data.
This is a very fast parser:
- below 7s for the largest entry in the PDB, 1HTQ, which is over 70MB,
- as compared with 11s for RASMOL 2.7.5,
- or 2m15s for BioPython with the Python 2.6 interpreter.
It aims to deliver not only an event-based interface, but also a high-level data structure for manipulating data in the spirit of BioPython's PDB parser.
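
To illustrate the difference between an event-based interface and a high-level data structure, here is a minimal, self-contained sketch. It is not hPDB's actual API: the `Event` type and the `field`, `parseLine` and `atomsPerChain` helpers are hypothetical names invented for this example. It relies only on the fixed-column layout of ATOM/HETATM records in the PDB format (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-38, 39-46 and 47-54).

```haskell
module Main where

import Data.Char (isSpace)
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- | Hypothetical event type for this sketch; not hPDB's actual API.
data Event
  = AtomEvent
      { atomSerial :: Int
      , atomName   :: String
      , resName    :: String
      , chainId    :: Char
      , resSeq     :: Int
      , coord      :: (Double, Double, Double)
      }
  | OtherEvent String -- any record this sketch does not decode
  deriving (Eq, Show)

-- | Trimmed text of 1-based, inclusive columns; short lines are padded.
field :: Int -> Int -> String -> String
field from to line = trim (take (to - from + 1) (drop (from - 1) padded))
  where
    padded = line ++ replicate 80 ' '
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Event-based view: every input line becomes exactly one event.
parseLine :: String -> Event
parseLine line
  | field 1 6 line `elem` ["ATOM", "HETATM"] = fromMaybe (OtherEvent line) atom
  | otherwise = OtherEvent line
  where
    atom = do
      serial <- readMaybe (field 7 11 line)
      rseq   <- readMaybe (field 23 26 line)
      x      <- readMaybe (field 31 38 line)
      y      <- readMaybe (field 39 46 line)
      z      <- readMaybe (field 47 54 line)
      let chain = case drop 21 line of
            (c : _) -> c
            []      -> ' '
      pure (AtomEvent serial (field 13 16 line) (field 18 20 line) chain rseq (x, y, z))

-- | High-level view: fold the event stream into a per-chain atom count.
atomsPerChain :: [Event] -> Map.Map Char Int
atomsPerChain = foldl' step Map.empty
  where
    step m (AtomEvent {chainId = c}) = Map.insertWith (+) c 1 m
    step m (OtherEvent _)            = m

-- | Made-up sample records, built field by field so column widths are explicit.
sampleLines :: [String]
sampleLines =
  [ "HEADER    HYPOTHETICAL SAMPLE"
  , concat ["ATOM  ", "    1", " ", " N  ", " ", "MET", " ", "A", "   1", "    ", "  27.340", "  24.430", "   2.614"]
  , concat ["ATOM  ", "    2", " ", " CA ", " ", "GLY", " ", "B", "  10", "    ", "  -1.500", "   0.000", "   3.250"]
  ]

main :: IO ()
main = do
  let events = map parseLine sampleLines
  mapM_ print events
  print (atomsPerChain events)
```

A consumer that only needs a single pass can work on the event list directly, while anything that needs random access would fold the events into a richer structure than the simple `Map` used here.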
Details on official releases are on Hackage.
This package is also a part of Stackage - a stable subset of Hackage.
Projects for the future:
Please let me know if you would be willing to push the project further.
In particular, one may consider these features:
- Implement basic spatial operations: RMS superposition (with SVD) and affine transforms on a substructure.
- Use lens to facilitate access to the data structures.
- Torsion angles within a protein/RNA chain.
- Add Octree to the default data structure (with automatic update).
- Migrate out of text-format, since it gives portability trouble and slows things down when printing.
- Write a combinator library for generic fast parsing.
- Check whether GHC 7.8 improved the efficiency of fixed-point arithmetic, since PDB coordinates have a dynamic range of only about 2^20 (roughly 20 bits), with a smallest step of 0.001.
- Class-based wrappers exposing a Structure-Model-Chain-Residue-Atom interface, possibly wrapping Repa/Accelerate arrays for fast computation.
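
The fixed-point idea in the list above can be sketched as follows. This is only an illustration under stated assumptions, not code from hPDB: the `Milli` type and the `toMilli`, `fromMilli`, `sqDist`, `distance` and `point` helpers are hypothetical names. Coordinates are stored as whole thousandths of an angstrom in an `Int32`. The coordinate fields of an ATOM record are 8 characters wide with 3 decimals, so the representable values run from -999.999 to 9999.999, which fits comfortably in 32 bits.

```haskell
module Main where

import Data.Int (Int32, Int64)

-- | One coordinate in thousandths of an angstrom (hypothetical type).
newtype Milli = Milli Int32
  deriving (Eq, Ord, Show)

-- | Convert a parsed coordinate to fixed point, rounding to the nearest step.
toMilli :: Double -> Milli
toMilli x = Milli (round (x * 1000))

-- | Convert back to floating point, e.g. for printing.
fromMilli :: Milli -> Double
fromMilli (Milli n) = fromIntegral n / 1000

type Point = (Milli, Milli, Milli)

point :: Double -> Double -> Double -> Point
point x y z = (toMilli x, toMilli y, toMilli z)

-- | Squared distance in units of 1e-6 square angstrom.
-- Differences are widened to Int64 so the squares cannot overflow.
sqDist :: Point -> Point -> Int64
sqDist (x1, y1, z1) (x2, y2, z2) = sq x1 x2 + sq y1 y2 + sq z1 z2
  where
    sq (Milli a) (Milli b) =
      let d = fromIntegral a - fromIntegral b :: Int64
      in d * d

-- | Distance in angstroms, leaving integer arithmetic only at the last step.
distance :: Point -> Point -> Double
distance p q = sqrt (fromIntegral (sqDist p q)) / 1000

main :: IO ()
main = do
  let p = point 0 0 0
      q = point 3 4 0
  print (toMilli 27.340)
  print (fromMilli (toMilli (-999.999)))
  print (sqDist p q)
  print (distance p q)
```

Comparisons such as distance cutoffs can stay entirely in integers by comparing `sqDist` against a squared threshold. Whether this is actually faster than `Double` arithmetic depends on the compiler version and would need to be measured.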
Please ask me any questions on Gitter.