Flexible and easy deriving of type classes for finitary types.
Provides a collection of wrappers, allowing you to easily define (among others) Unbox, Storable, Hashable and Binary instances for finitary types with flexibility in terms of representation and efficiency. Never write an Unbox instance by hand again!
finitary-derive
What's this all about, then?
Have you ever written an Unbox
instance for a user-defined type? I hope not, because it's a uniquely tedious chore. If your type is more complex, this can be difficult, fiddly, and frustrating. Storable
is not much better. This is the kind of 'work' that we as Haskellers ought not to put up with.
Now, you don't have to! As long as your type is Finitary
, you can now get Unbox
and Storable
(as well as a whole bunch of other) instances almost automagically:
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingVia #-}
import Data.Finitary
import Data.Finitary.Finiteness
import Data.Finitary.PackInto
import Data.Word
import Data.Hashable
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Storable as VS
data Foo = Bar | Baz (Word8, Word8) | Quux Word16
deriving (Eq, Generic, Finitary)
deriving (Ord, Bounded, Hashable, NFData, Binary) via (Finiteness Foo)
someVector :: VU.Vector (PackInto Foo Word64)
someVector = VU.fromList . fmap Packed $ [Bar, Baz 0x0 0xf, Quux 0x134]
someStorableVector :: VS.Vector (PackInto Foo Word64)
someStorableVector = VS.fromList . fmap Packed $ [Bar, Baz 0x0 0xf, Quux 0x134]
If you don't have access to DerivingVia
, you can still get the benefits of this library -- just use Finitary a
instead of a
. As it is a newtype
, you can coerce
through it if you care about efficiency.
What's the deal with Unbox
and Storable
exactly? What's with all the Pack
types?
Essentially, being Finitary
means that there's a finite set of indexes, one for each inhabitant. That means we can essentially represent any inhabitant as a fixed-length number. It's on the basis of this that we can 'magic up' Storable
and Unbox
.
However, how we represent this fixed-length number isn't immediately obvious. We have a couple of options:
- A string of bits
- A string of bytes
- An array of machine words
Additionally, if we have another finitary type whose cardinality is not smaller, we could potentially 'borrow' its instances as well. Which of these choices is appropriate isn't obvious in general: it depends on whether you care about space or speed, the cardinality of the type, and a bunch of other things too. As we believe that the best people to judge tradeoffs like these are the people using our library, we provide all of these options for you to choose from, so that you can choose the one that best suits you.
So... what's the difference exactly?
PackBits
represents indexes as strings of bits. This is the most compact representation possible (honestly, maths says so), but the least efficient, as accessing individual bits is slower on most architectures than whole bytes or words. Unless you've got large Vector
s, you probably don't need this encoding, but if space is at an absolute premium, this is the best choice.
PackBytes
instead represents indexes as byte strings. This is a more efficient choice than a string of bits, but can still be slow for architectures which prefer whole-word access. It's also fairly compact, especially if your architecture has big Word
s.
PackWords
represents indexes as fixed-length arrays of Word
s. This is the most efficient encoding from the point of view of random reads and writes, but will likely waste a lot of space, unless your type is extremely large (as in, multiple copies of Word
large).
Lastly, PackInto
lets you choose another finitary type whose instances you want to 'borrow', and will use that type as a representation. This is the most flexible, and should be preferred whenever possible. However, it requires that a type of appropriate cardinality (at least as big as the one you want to encode) exists, and has the appropriate instances.
Why can't I DerivingVia
through these Pack
types?
For Unbox
, the short answer is 'role restrictions on unboxed vectors'. If you want a more detailed explanation, check out the GHC wiki on roles, as well as the implementation of Data.Vector.Unboxed
. You might also want to check out stuff about data families.
Additionally, there is some tension in the design. We could have made one of two choices: either define Pack
types as transparent newtype
s, and encode or decode whenever a type class method required it; or define Pack
types as opaque, and encode or decode only when the values were constructed or deconstructed. Ultimately, we went with the second option, as it makes the occurences of encodes and decodes explicit to the user. Had we gone with the first choice, it would be unclear where encodes and decodes occur, especially when using functions built from type class methods. We believe this clarity is worth the inability to use DerivingVia
to define Storable
instances.
Why do PackBytes
, PackWords
and PackInto
have Storable
instances, but not PackBits
?
Because it's not clear what this instance should look like. Let's suppose you want to bit-pack a type Giraffe
with cardinality 11 - what should sizeOf
for PackBits Giraffe
be? How about alignment
? The only obvious solution is padding, but in this case, you might as well use PackBytes
, PackWords
or PackInto
, since then you'll at least know what you're getting, and are explicit about it.
Sounds good! Can I use it?
Certainly - we've tested on the following (all x86_64 only):
- GNU/Linux: GHC 8.4.4, 8.6.5, 8.8.1
- macOS: GHC 8.8.1
- Windows: GHC 8.10.4, 9.0.1
If you would like support for any additional GHC versions, let us know. Unfortunately, while the library will build on 8.4.4, due to hedgehog-classes
being limited to 8.6+, tests cannot be run on this version.
If you build and use this library successfully on any other platforms, we'd like to know too - it'd be beneficial even if nothing breaks, and especially if something does.
License
This library is under the GNU General Public License, version 3 or later (SPDX code GPL-3.0-or-later
). For more details, see the LICENSE.md
file.