HMEP: Multi Expression Programming – a genetic programming variant.
A Multi Expression Programming implementation with a focus on speed.
Multi Expression Programming
You say, not enough Haskell machine learning libraries?
Here is yet another one!
History
There exist several other Genetic Algorithm (GA) Haskell packages. Personally, I have used simple genetic algorithm, GA, and moo for quite a long time. The last one was my preferred choice, but the other two are also great.
However, when I came across this MEP paper, to my surprise there was no MEP implementation in Haskell. I soon realized that the existing GA packages were too limited for the task, and that it would be more efficient to implement MEP from scratch.
That is how this package was started. I would also like to thank the authors of the moo GA library, which inspired the present hmep package.
About MEP
Multi Expression Programming is a genetic programming variant encoding multiple solutions in the same chromosome. A chromosome is a computer program, and each gene can reuse the results computed by previous genes (code reuse).
How is MEP different from other genetic programming (GP) methods? Consider a classical example of tree-based GP: the number of nodes needed to encode x^N using a binary tree is 2N-1. With MEP encoding, however, redundancies can be dramatically diminished, so that the shortest chromosome encoding the same expression has only N/2 nodes! This often results in significantly reduced computational costs when evaluating MEP chromosomes. Moreover, all the intermediate solutions such as x^(N/2), x^(N/4), etc. are provided by the chromosome as well.
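To make the encoding concrete, here is a minimal Haskell sketch (illustrative types only, not hmep's actual API) of a chromosome that computes x^8 by repeated squaring. Each gene refers to earlier genes, and the intermediate powers x^2 and x^4 come for free:

-- Illustrative MEP-style encoding (not hmep's actual types): each gene is
-- either a terminal or an operation over earlier genes, enabling code reuse.
data Gene = Terminal Double
          | Op (Double -> Double -> Double) Int Int

-- Evaluate a chromosome left to right, keeping every intermediate result.
evalChromosome :: [Gene] -> [Double]
evalChromosome genes = results
  where
    results = map eval genes
    eval (Terminal v) = v
    eval (Op f i j)   = f (results !! i) (results !! j)

-- x^8 in four genes: x, x^2, x^4, x^8 (a binary tree would need 15 nodes).
powChromosome :: Double -> [Gene]
powChromosome x = [Terminal x, Op (*) 0 0, Op (*) 1 1, Op (*) 2 2]

main :: IO ()
main = print (evalChromosome (powChromosome 2.0))  -- [2.0,4.0,16.0,256.0]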
For more details, please check http://mepx.org/papers.html and https://en.wikipedia.org/wiki/Multi_expression_programming.
MEP in open source
- By Mihai Oltean, C++
- By Mark Chenoweth, Go
- Current project, Haskell
The hmep features
- Works out of the box. You may use one of the provided examples to quickly tailor the library to your needs.
- Flexibility. The hmep package provides adjustable and composable building blocks such as selection, mutation, and crossover operators. You are also free to plug in your own operators.
- Versatility. hmep can be applied to regression problems with one or multiple outputs, which means you can approximate unknown functions or solve classification tasks. The only requirement is a custom loss function (a minimal loss sketch follows this list).
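To illustrate the last point, a loss for single-output regression can be as simple as a mean squared error between the values produced by a candidate expression and the target values. The sketch below is plain Haskell and only shows the idea; the exact types hmep expects for its loss function may differ:

-- A hedged sketch of a regression loss: mean squared error between the
-- outputs of a candidate expression and the target values.
mseLoss :: [Double]            -- target outputs
        -> (Double -> Double)  -- candidate expression as a function
        -> [Double]            -- input points
        -> Double
mseLoss targets f xs = sum (zipWith sqErr targets (map f xs)) / n
  where
    sqErr t y = (t - y) ^ 2
    n = fromIntegral (length xs)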
How to build
Use Stack.
$ git clone https://github.com/masterdezign/hmep.git && cd hmep
$ stack build --install-ghc
Example 1
Now that the package is built, run the first demo to express cos^2(x) through sin(x):
$ stack exec hmep-demo
Average loss in the initial population 15.268705681244962
Population 10: average loss 14.709728527360586
Population 20: average loss 13.497114190675477
Population 30: average loss 8.953185872653737
Population 40: average loss 8.953185872653737
Population 50: average loss 3.3219954564955856e-15
Interpreted expression:
v1 = sin x0
v2 = v1 * v1
result = 1 - v2
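For readability, the printed program corresponds to this ordinary Haskell function:

-- The evolved program from the run above, written as plain Haskell.
found :: Double -> Double
found x0 = 1 - v2
  where
    v1 = sin x0   -- gene reused below
    v2 = v1 * v1  -- sin^2 x0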
Effectively, the solution cos^2(x) = 1 - sin^2(x) was found. Of course, MEP is a stochastic method, so there is no guarantee of finding the globally optimal solution.
The unknown-function approximation problem can be illustrated by the following suboptimal solution for a given set of random data points (blue crosses). This example was produced by another run of the same demo, after 100 generations of 100 chromosomes. The obtained expression was y(x) = 3*0.31248786462471034 - sin(sin^2(x)). Interestingly, the approximating function lies symmetrically in between the extrema of the unknown function, which is approximately described by the blue crosses.
Example 2
A similar example is to approximate sin(x) using only addition and multiplication operators, i.e. with polynomials.
$ stack exec hmep-sin-approximation
The algorithm is able to figure out the powers of x automatically. That is where MEP really shines: we evaluate the 30 expressions encoded by each chromosome at practically no additional computational cost, and then we choose the best one. In this run, we automatically obtained a seventh-degree polynomial coded by 14 genes. Pretty cool, yeah?
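For reference only (these are not the demo's evolved coefficients), the degree-7 Taylor polynomial of sin around zero illustrates the kind of expression such a chromosome can encode:

-- The degree-7 Taylor polynomial of sin, for comparison with the evolved
-- polynomial; the demo's coefficients will generally differ.
sinTaylor7 :: Double -> Double
sinTaylor7 x = x - x^3 / 6 + x^5 / 120 - x^7 / 5040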
The result of approximation is visualized below:
Authors
This library is written and maintained by Bogdan Penkovsky.