Recursive Arbitrary instances without headaches.
Write and derive Arbitrary instances for recursive data without worrying about termination.
safe-gen
Writing Arbitrary
instances is for recursive data is annoying, since it's easy to accidentally write non-terminating generators. A common pattern is to use Gen
's implicit size parameter to guide recursion, but this is tedious and error-prone. safe-gen
automates this process, making recursive instances painless. safe-gen
can be used both to manually write instances or to automatically derive them.
Example
Here's an example of an Arbitrary
instance that runs in unbounded time:
data Trie a
= Leaf a
| Branch (Trie a) (Trie a) (Trie a)
instance Arbitrary a => Arbitrary (Trie a) where
arbitrary = oneof
[ Leaf <$> arbitrary
, Branch <$> arbitrary <*> arbitrary <*> arbitrary
]
And here's how to write it manually using safe-gen
:
instance Arbitrary a => Arbitrary (Trie a) where
arbitrary = runSafeGen go
where go = Safe.oneof
[ Leaf <$> gen arbitrary,
Branch <$> go <*> go <*> go
]
Or, derive it automatically:
data Trie a
= Leaf a
| Branch (Trie a) (Trie a) (Trie a)
deriving stock Generic
deriving anyclass SafeArbitrary
deriving Arbitrary via FromSafeArbitrary (Trie a)
Usage
safe-gen
has a very small API. The two code samples above show it off almost all there is to it.
To manually build SafeGen
s, use gen :: Gen a -> SafeGen a
to construct a single value, compose product types using the Applicative
interface, and sum types using frequency
or oneof
. To run, use runSafeGen :: SafeGen a -> Gen a
. arb
is a convenient synonym for gen arbitrary
.
To derive Arbitrary
instances, you need to first derive a SafeArbitrary
instance, typically as an empty instance declaration. The FromSafeArbitrary
wrapper then provides a convenient way to derive an Arbitrary
instance from the SafeArbitrary
instance.
Design
As mentioned in the intro, a common pattern is to use Gen
's implicit size parameter to guide recursion, but this is tedious and error-prone. safe-gen
automates this by
- automatically dividing the size parameter between the branches of a product type, and
- only going down sufficiently shallow branches of a sum type.
This guarantees that if your generator can terminate in finite time, safe-gen
makes sure that it does.
The inverse is not true; (despite the name) safe-gen
does not make it impossible to write bottom/unbounded/diverging Gen
values. For example, nothing prevents you from writing something like runSafeGen (let go = go in go)
, which will cause an infinite loop. In some cases, such as runSafeGen (let f = oneof [f] in f)
, safe-gen
will be able to catch the infinite recursion and throw an exception, and in some cases it will give you a lazy infinite generator, but in general, (and unlike generic-arbitrary
, about which more below), these issues are left undefined and outside the scope of the library.
Implementation details
A SafeGen
value is a lazy, potentially infinite n-ary tree, whose branches are product or sum types, and leaves are Gen
values.
For every node in the tree, we define a shallowness value, or minimum depth, which is the smallest number of child nodes to traverse to find a leaf. More precisely, for sum types the shallowness is one plus the shallowness of its shallowest child, for product types it is one plus the shallowness of its least shallow child, and for leaves it is zero. Since trees are potentially infinite, we take care to calculate this value lazily as a Peano number.
When we "run" this tree, i.e. go from SafeGen
to Gen
, we start by asking for Gen
's size parameter. Then, with this size parameter in hand, we traverse the tree as follows: For sum branches, we only consider children whose shallowness is at most the size parameter, or if none exist, the shallowest branch(es). For product branches, we divide the size between the number of children, and recurse.
Comparison with other packages
safe-gen
exists in a similar space as generic-arbitrary
. Both make writing Arbitrary
instances less error-prone. The philosophical difference is that generic-arbitrary
does most of its heavy lifting upfront and at the type level, whereas safe-gen
works entirely at the value level.
The two benefits of generic-arbitrary
are that it can provide pretty good guarantees at compile time, and that it only requires a single derived instance. Its main drawback is that there are cases where it can't derive (useful) instances, and it won't always be able to warn you about that. Examples include when you have mutual recursion, tricky fixpoints, or invariants to maintain. safe-gen
won't be able to preclude certain known-bad generators the way generic-arbitrary
can, but on the flip side, it will allow certain valid generators that generic-arbitrary
doesn't, and it makes it easy to dive under the hood to tweak instances where necessary. The choice is yours, and the buy-in for either is generally low enough that you can't go wrong either way.
There is also less-arbitrary
. It attempts to address some of generic-arbitrary
's shortcomings by adding a heuristic based retry for generators that seem to loop and, like safe-gen
, provides a way to manually tweak generators. My main criticism is that, compared to safe-gen
, its API is complicated, and it's still pretty easy to shoot yourself in the foot and write an unbounded generator. By comparison, with safe-gen
, the naive implementation is almost always correct.