MyNixOS website logo
Description

Record Linkage Toolkit.

Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, em-algorithm for estimating m- and u-probabilities (I. Fellegi & A. Sunter (1969) <doi:10.1080/01621459.1969.10501049>, T.N. Herzog, F.J. Scheuren, & W.E. Winkler (2007), "Data Quality and Record Linkage Techniques", ISBN:978-0-387-69502-0), forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage. Focus is on memory, CPU performance and flexibility.

reclin2

reclin2 is a package for record linkage an deduplication. The package is an update to the reclin package. As the package is not backwards compatible with reclin and reclin still has some features that are not present in reclin2 it was decided to release the package under a new name.

The focus of reclin2 is on performance, memory and CPU and flexibility. To get the performance reclin2 uses data.table for most of its computations and reclin2 has the ability to spread its computations over multiple CPU cores or machines. In principle record linkage can easily be sped up using parallelization and by using multiple machines using the snow package data can be distributed over multiple machines thereby making use of the memory available on those machines.

Each record linkage project often has its own idiosyncrasies. Therefore, it is important that users are able to customise parts of the linkage process. reclin2 is designed as a kind of toolkit for record linkage. It has functions and methods for different parts of the linkage process. Users are able to mix these different functions to get a custom record linkage process. Furthermore, reclin2 uses relatively simple data structures. The core data structure is a data.table with pairs and the properties of these pairs. Therefore, users can relatively easy manipulate this data and write custom functions that manipulate this data.

Many of the features can be found in the vignettes of the package:

Metadata

Version

0.5.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows