Description

Tests for Detecting Irregular Digit Patterns.

Provides statistical tests and support functions for detecting irregular digit patterns in numerical data. The package includes tools for extracting digits at various locations in a number, tests for repeated values, and (Bayesian) tests of digit distributions.


digitTests: Tests for Detecting Irregular Digit Patterns


digitTests is an R package providing statistical tests for detecting irregular digit patterns. Such patterns can indicate data manipulation or fraud, so the tests are useful in auditing (among other fields) for assessing whether data may have been tampered with. However, real data are never perfect, so the statistical decision metrics that the package provides should be interpreted with caution.

The package is also implemented with a graphical user interface in the Audit module of JASP, a free and open-source statistical software program.

Overview

For complete documentation of the digitTests package download the package manual.

  1. Installation
  2. Benchmarks
  3. Intended usage
  4. References

1. Installation

The most recently released version of digitTests can be downloaded from CRAN by running the following command in R:

install.packages('digitTests')

Alternatively, you can download the development version from GitHub using:

devtools::install_github('koenderks/digitTests')

After installation, the package can be loaded with:

library(digitTests)

2. Benchmarks

To validate the statistical results, digitTests's automated unit tests regularly verify the main output from the package against the following benchmarks:

3. Intended usage

Function: extract_digits()

The workhorse of the package is the extract_digits() function. This function takes a vector of numbers and returns the requested digits, either including or excluding zeros.

Full function with default arguments:

extract_digits(x, check = 'first', include.zero = FALSE)

Supported options for the check argument:

check       Returns
first       First digit
firsttwo    First and second digit
before      All digits before the decimal separator (.)
after       All digits after the decimal separator (.)
lasttwo     Last two digits
last        Last digit

Example:

x <- c(0.00, 0.20, 1.23, 40.00, 54.04)
extract_digits(x, check = 'first', include.zero = FALSE)
# [1] NA  2  1  4  5
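
To make the behaviour in the example above concrete, here is a minimal base-R sketch of first-digit extraction. It is not the package's implementation: the helper first_digit() is hypothetical, and it assumes that exact zeros should map to NA, as in the output shown above.

```r
# Hypothetical helper (not part of digitTests): first significant digit of
# each number, with exact zeros mapped to NA.
first_digit <- function(x) {
  # Scientific notation puts the first significant digit in position 1,
  # e.g. 54.04 -> "5.4040000000e+01" and 0.20 -> "2.0000000000e-01".
  s <- formatC(abs(x), format = "e", digits = 10)
  d <- as.integer(substr(s, 1, 1))
  # Zero has no non-zero leading digit, so report it as missing.
  d[!is.na(d) & d == 0L] <- NA_integer_
  d
}

x <- c(0.00, 0.20, 1.23, 40.00, 54.04)
first_digit(x)
# [1] NA  2  1  4  5
```

Working on the formatted string rather than on floor(x / 10^k) avoids floating-point surprises such as 0.3 / 0.1 evaluating to slightly less than 3.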

Functions: distr.test() & distr.btest()

The functions distr.test() and distr.btest() take a vector of numeric values, extract the requested digits, and compare the frequencies of these digits to a reference distribution. The function distr.test() performs a frequentist hypothesis test of the null hypothesis that the digits are distributed according to the reference distribution and produces a p value. The function distr.btest() performs a Bayesian hypothesis test of the same null hypothesis against the alternative hypothesis that the digits are not distributed according to the reference distribution (using the prior parameters specified in alpha) and produces a Bayes factor (Kass & Raftery, 1995). The options for the check argument are the same as those of extract_digits().

Full function with default arguments:

distr.test(x, check = 'first', reference = 'benford')
distr.btest(x, check = 'first', reference = 'benford', alpha = NULL, BF10 = TRUE, log = FALSE)

Supported options for the reference argument:

reference                 Returns
benford                   Benford's law
uniform                   Uniform distribution
Vector of probabilities   Custom distribution

Example:

Benford's law (Benford, 1938) is a principle that describes a pattern in many naturally occurring numbers. According to Benford's law, each possible leading digit d in a naturally occurring, or non-manipulated, set of numbers occurs with probability:

p(d) = log10(1 + 1/d), for d = 1, ..., 9
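
As a quick numerical check, the first-digit probabilities under Benford's law can be tabulated in base R using the standard form log10(1 + 1/d). The variable names are illustrative and not part of the package.

```r
# Leading digits 1 through 9 and their Benford probabilities.
d <- 1:9
benford <- log10(1 + 1 / d)
names(benford) <- d

# The probabilities decrease with d and sum to one.
round(benford, 3)
sum(benford)
```

The digit 1 is expected to lead about 30.1% of the values, while the digit 9 is expected to lead only about 4.6% of them.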

The distribution of leading digits in a data set of financial transaction values (e.g., the sinoForest data) can be extracted and tested against the expected frequencies under Benford's law using the code below.

# Frequentist hypothesis test
distr.test(sinoForest$value, check = 'first', reference = 'benford')

#
# 	Digit distribution test
#
# data:  sinoForest$value
# n = 772, X-squared = 7.6517, df = 8, p-value = 0.4682
# alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.

# Bayesian hypothesis test using default prior
distr.btest(sinoForest$value, check = 'first', reference = 'benford', BF10 = FALSE)

#
# 	Digit distribution test
#
# data:  sinoForest$value
# n = 772, BF01 = 6899678
# alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.
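
The two tests above can be approximated by hand in base R. The sketch below is not the package's implementation: it simulates Benford-conforming values instead of using the sinoForest data, runs a chi-squared goodness-of-fit test, and computes a Bayes factor under the assumption of a uniform Dirichlet prior on the digit probabilities (whether this matches the package's default for alpha is an assumption). The helper bf01() and all variable names are hypothetical.

```r
set.seed(1)

# Simulated values that are log-uniform over four decades, so their
# leading digits follow Benford's law.
x <- 10^runif(1000, min = 0, max = 4)
digits <- as.integer(substr(formatC(x, format = "e", digits = 10), 1, 1))
observed <- tabulate(digits, nbins = 9)
benford <- log10(1 + 1 / (1:9))

# Frequentist: chi-squared goodness-of-fit test with 9 - 1 = 8 degrees
# of freedom.
freq <- chisq.test(observed, p = benford)
freq$parameter
freq$p.value

# Bayesian: Bayes factor in favour of the point null p0 against a
# Dirichlet(alpha) alternative (assumed prior, uniform by default).
bf01 <- function(counts, p0, alpha = rep(1, length(counts))) {
  # Log likelihood under the null; the multinomial coefficient cancels.
  log_m0 <- sum(counts * log(p0))
  # Log marginal likelihood under the Dirichlet-multinomial alternative.
  log_m1 <- lgamma(sum(alpha)) - sum(lgamma(alpha)) +
    sum(lgamma(alpha + counts)) - lgamma(sum(alpha) + sum(counts))
  exp(log_m0 - log_m1)
}

bf01(observed, benford)
```

Values of BF01 above 1 favour the null hypothesis that the digits follow the reference distribution; BF10 is its reciprocal.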

Function: rv.test()

The function rv.test() analyzes the frequency with which values are repeated within a set of numbers. Unlike Benford's law and its generalizations, this approach examines the entire number at once, not only the first or last digit. For the technical details of this procedure, see Simonsohn (2019). The options for the check argument are the same as those of extract_digits().

Full function with default arguments:

rv.test(x, check = 'last', method = 'af', B = 2000)

Supported options for the method argument:

method     Returns
af         Average frequency
entropy    Entropy

Example:

In this example we analyze a data set from a (retracted) paper that describes three experiments run in Chinese factories, where workers were nudged to use more hand sanitizer. These data were shown to exhibit two classic markers of data tampering: impossibly similar means and an uneven distribution of last digits (Yu, Nelson, & Simonsohn, 2018). We can use the rv.test() function to test whether these data also contain more repeated values than would be expected if the data had not been tampered with.

rv.test(sanitizer$value, check = 'lasttwo', B = 5000)

#
# 	Repeated values test
#
# data:  sanitizer$value
# n = 1600, AF = 1.5225, p-value = 0.0024
# alternative hypothesis: frequencies of repeated values are greater than for random data.
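
To illustrate the AF statistic reported above, here is a minimal sketch of the average frequency as described by Simonsohn (2019): the sum of squared value frequencies divided by the sum of frequencies. The function average_frequency() is a hypothetical helper, not the package's code, and it omits the resampling step that rv.test() uses to obtain a p value.

```r
# Hypothetical helper: average frequency of the values in a vector.
average_frequency <- function(values) {
  # How often each distinct value occurs.
  f <- as.vector(table(values))
  # Each observation contributes the frequency of its own value.
  sum(f^2) / sum(f)
}

average_frequency(c(1.25, 1.25, 2.50, 3.75))
# [1] 1.5
average_frequency(c(1, 2, 3))
# [1] 1
```

A data set without any repeated values has an average frequency of exactly 1, and the statistic grows as more observations share the same value.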

4. References

  • Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 551-572.
  • Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.
  • Simonsohn, U. (2019, May 25). Number-Bunching: A New Tool for Forensic Data Analysis.
  • Yu, F., Nelson, L., & Simonsohn, U. (2018, December 5). In Press at Psychological Science: A New 'Nudge' Supported by Implausible Data.
Metadata

Version

0.1.2

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows