Description
Stream data to or from LMDB databases using the streamly library.
Please see the README on GitHub at https://github.com/shlok/streamly-lmdb#readme
streamly-lmdb
Stream data to or from LMDB databases using the Haskell streamly library.
Requirements
Install LMDB on your system:
- Debian Linux: sudo apt-get install liblmdb-dev
- macOS: brew install lmdb
Quick start
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Function ((&))
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream.Prelude as S
import Streamly.External.LMDB
  ( Limits (mapSize),
    WriteOptions (writeTransactionSize),
    defaultLimits,
    defaultReadOptions,
    defaultWriteOptions,
    getDatabase,
    openEnvironment,
    readLMDB,
    tebibyte,
    writeLMDB,
  )

main :: IO ()
main = do
  -- Open an environment. There should already exist a file or
  -- directory at the given path. (Empty for a new environment.)
  env <-
    openEnvironment "/path/to/lmdb-database" $
      defaultLimits {mapSize = tebibyte}

  -- Get the main database.
  -- Note: It is common practice with LMDB to create the database
  -- once and reuse it for the remainder of the program’s execution.
  db <- getDatabase env Nothing

  -- Stream key-value pairs into the database.
  let fold' = writeLMDB db defaultWriteOptions {writeTransactionSize = 1}
  let writeStream = S.fromList [("baz", "a"), ("foo", "b"), ("bar", "c")]
  _ <- S.fold fold' writeStream

  -- Stream key-value pairs out of the
  -- database, printing them along the way.
  -- Output:
  --     ("bar","c")
  --     ("baz","a")
  --     ("foo","b")
  let unfold' = readLMDB db Nothing defaultReadOptions
  let readStream = S.unfold unfold' undefined
  S.mapM print readStream
    & S.fold F.drain
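
The pairs above come back in a different order than they were written because LMDB keeps keys sorted (byte-wise, by default). The following is only a pure model of that behaviour, not part of this library: it uses Data.Map from the containers package as a stand-in for the database, and the names inserted and readBack are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.ByteString (ByteString)
import qualified Data.Map.Strict as M

-- The same pairs as in the quick start, in insertion order.
inserted :: [(ByteString, ByteString)]
inserted = [("baz", "a"), ("foo", "b"), ("bar", "c")]

-- A Map keeps its keys sorted; ByteString's Ord instance compares
-- bytes lexicographically, which is assumed here to mirror LMDB's
-- default key order.
readBack :: [(ByteString, ByteString)]
readBack = M.toAscList (M.fromList inserted)

main :: IO ()
main = mapM_ print readBack
```

Running this model prints the pairs in key order (bar, baz, foo), the same order as in the quick start output.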
Benchmarks
See bench/README.md. Summary (with rough figures from our machine†):
Reading:
- For iterating through a fully cached LMDB database, this library has roughly a 110 ns/pair overhead compared to C. (Plain Haskell IO code has roughly a 70 ns/pair overhead compared to C. The two preceding figures being similar fulfills the promise of streamly and stream fusion.)
  - By using unsafeReadLMDB instead of readLMDB, we can get the overhead down to roughly 100 ns/pair.
  - By additionally using the readUnsafeFFI option (to use unsafe FFI calls under the hood), we can get the overhead down to roughly 40 ns/pair.
Writing:
- For writing to an LMDB database, this library has roughly a 210 ns/pair overhead compared to C. (Plain Haskell IO code has roughly a 100 ns/pair overhead compared to C. The two preceding figures being similar fulfills the promise of streamly and stream fusion.)
  - By using the writeUnsafeFFI option (to use unsafe FFI calls under the hood), we can get the overhead down to roughly 140 ns/pair.
For most Haskell programs, these differences will not cause problems. (For instance, note that merely opening and reading 1 byte from a file with C already takes us tens of microseconds.)
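
To put the per-pair figures in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: the workload of ten million pairs is a hypothetical number chosen for illustration, and the helper extraSeconds is not part of this library. The overheads are the rough figures quoted above.

```haskell
module Main where

import Text.Printf (printf)

-- Rough per-pair overheads relative to C, in nanoseconds,
-- taken from the benchmark summary above.
overheadsNs :: [(String, Double)]
overheadsNs =
  [ ("readLMDB", 110),
    ("unsafeReadLMDB", 100),
    ("unsafeReadLMDB + readUnsafeFFI", 40),
    ("writeLMDB", 210),
    ("writeLMDB + writeUnsafeFFI", 140)
  ]

-- Total extra time in seconds for a given number of pairs.
extraSeconds :: Double -> Double -> Double
extraSeconds pairs nsPerPair = pairs * nsPerPair / 1e9

main :: IO ()
main = do
  -- Hypothetical workload: ten million key-value pairs.
  let pairs = 1e7 :: Double
  mapM_
    (\(name, ns) -> printf "%-32s %.2f s\n" name (extraSeconds pairs ns))
    overheadsNs
```

Under this assumption, reading ten million pairs with readLMDB costs roughly 1.1 seconds more than the equivalent C loop, and writing them roughly 2.1 seconds more.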
† May 2023; Linode; Debian 11, Dedicated 32GB: 16 CPU, 640GB SSD storage, 32GB RAM.