MyNixOS website logo
Description

Please see the README on GitHub at https://github.com/codedownio/myers-diff#readme.

Please see the README on GitHub at https://github.com/codedownio/myers-diff#readme

Welcome to myers-diffHackage myers-diff

This is a fast Haskell implementation of the Myers text diff algorithm[^1]. It is heavily inspired by the Python version in this post, and should have the same $O(\min(len(a), len(b)))$ space complexity. The implementation uses unboxed mutable vectors for performance.

This repo also can also build a couple other versions for benchmarking comparison, gated behind flags.

  • -funi_myers will build the version from the uni-util package.
  • -fdiff will use the Diff package.

Comparison to other libraries

The Diff package also implements the Myers algorithm, but a less space-efficient variant. That package advertises $O(D^2)$ space complexity, where $D$ is the number of differences between the two inputs. In the worst case, $D = \max(len(a), len(b))$, so the space usage can be quadratic in the input length.

[^1]: E. Myers (1986). "An O(ND) Difference Algorithm and Its Variations". Algorithmica. 1 (2): 251–266. CiteSeerX 10.1.1.4.6927. doi:10.1007/BF01840446. S2CID 6996809.

Benchmarks

You can generate all the benchmarks by running ./run_all_benchmarks.sh. Full results can be found in ./benchmark_results. All benchmarks were run on an Intel i9-13900K.

TL;DR:

  • These benchmarks focus on inputs of different sizes, where a single ~30 character region is either inserted or deleted.
  • myers-diff is faster by around 2.5x, and the advantage grows with larger inputs (around 100k characters).
  • myers-diff is more space-efficient by 5x for tiny inputs, shrinking to 1.5x for 10k character inputs and 1.3x for 100k character inputs.

Other benchmarks could be run, of course. Future work could involve testing inputs with multiple separated edits, and/or edits of different sizes. Please file an issue if you'd like to discuss a particular workload.

Time: small inserts

Test scenario: generate a random input of $N$ characters, then insert a random string of $\leq 30$ characters somewhere to produce a modified input. Generate 100 such pairs and compare the diffing time of myers-diff with Diff using criterion.

Input size (chars)myers-diffDiffSpeedup
10408us1.07ms2.6x
100587us1.53ms2.6x
10001.81ms3.46ms1.9x
1000016.6ms40.8ms2.5x
100000188ms823ms4.4x

Time: small deletes

Test scenario: same as for small inserts, but this time delete $\leq 30$ characters from the second input.

These results are very similar to those for small inserts.

Space: small inserts

Test scenario: same as for the small inserts time test. In this test, we measure the bytes allocated by the diffing process using weigh.

Input size (chars)myers-diff (bytes)Diff (bytes)Diff / myers-diff
101,681,1208,904,1765.3x
1003,619,92813,833,5203.8x
100020,044,04831,669,4801.6x
10000171,103,240250,594,0801.5x
1000001,666,421,8242,172,753,5121.3x.
Metadata

Version

0.3.0.0

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows