MyNixOS website logo
Description

A Linear Model to 'SQL' Compiler.

This is a cross-platform linear model to 'SQL' compiler. It generates 'SQL' from linear and generalized linear models. Its interface consists of a single function, modelc(), which takes the output of lm() or glm() functions (or any object which has the same signature) and outputs a 'SQL' character vector representing the predictions on the scale of the response variable as described in Dunn & Smith (2018) <doi:10.1007/978-1-4419-0118-7> and originating in Nelder & Wedderburn (1972) <doi:10.2307/2344614>. The resultant 'SQL' can be included in a 'SELECT' statement and returns output similar to that of the glm.predict() or lm.predict() predictions, assuming numeric types are represented in the database using sufficient precision. Currently log and identity link functions are supported.

modelc

R buildstatus

modelc is an R model object to SQL compiler. It generates SQL select statements from linear and generalized linear models.

Its interface currently consists of a single function, modelc, which takes a single input, namely an lm or glm model object.

It currently supports Gaussian and gamma family distributions using log or identity link functions.

To import linear models directly to your SQL Server database, consider using Castpack, which depends on modelc.

Usage

Supposing the following data

a <- 1:10
b <- 2*1:10 + runif(1) * 1.5
c <- as.factor(1:10)
df <- data.frame(a,b,c)
formula = b ~ a + c

A vanilla linear model

linear_model <- lm(formula, data=df)
modelc(linear_model)

generates the following SQL

  0.231808555545287 + 2 * `a` + (
    CASE
      WHEN c = 2 THEN -0.00000000000000193216758587821 * c
      WHEN c = 3 THEN -0.000000000000000776180314897008 * c
      WHEN c = 4 THEN -0.000000000000000665297412768863 * c
      WHEN c = 5 THEN -0.00000000000000055441451064072 * c
      WHEN c = 6 THEN -0.000000000000000887620818362638 * c
      WHEN c = 7 THEN -0.000000000000000332648706384432 * c
      WHEN c = 8 THEN -0.00000000000000110994422395641 * c
      WHEN c = 9 THEN -0.00000000000000188723974152839 * c
      WHEN c = 10 THEN 0 * c
    END
  )

GLMs are also supported with log or identity link functions

glm_model <- glm(formula, data=df, family=Gamma(link="log"))
modelc(glm_model)
  EXP(
    0.557874070609732 + 0.244938197625494 * `a` + (
      CASE
        WHEN c = 2 THEN 0.394878990324516 * c
        WHEN c = 3 THEN 0.536977925025217 * c
        WHEN c = 4 THEN 0.570378881020516 * c
        WHEN c = 5 THEN 0.542936294999294 * c
        WHEN c = 6 THEN 0.476536561025273 * c
        WHEN c = 7 THEN 0.383038044594683 * c
        WHEN c = 8 THEN 0.269593156578649 * c
        WHEN c = 9 THEN 0.140849942185343 * c
        WHEN c = 10 THEN 0 * c
      END
    )
  )
glm_model_idlink <- glm(formula, data=df, family=Gamma(link="identity"))
modelc(glm_model_idlink)
  0.231808555545287 + 2 * `a` + (
    CASE
      WHEN c = 2 THEN 0.00000000000000139594865689472 * c
      WHEN c = 3 THEN -0.000000000000000581567338978993 * c
      WHEN c = 4 THEN -0.00000000000000111588502938831 * c
      WHEN c = 5 THEN 0.000000000000000967650035758108 * c
      WHEN c = 6 THEN -0.00000000000000149265067586469 * c
      WHEN c = 7 THEN -0.000000000000000100985345060517 * c
      WHEN c = 8 THEN -0.0000000000000000673235633736781 * c
      WHEN c = 9 THEN 0.00000000000000199047558220559 * c
      WHEN c = 10 THEN 0 * c
    END
  )

In order to avoid generating invalid SQL, modelc temporarily sets your scipen option to 999.

Installing

Using devtools:

install.packages("devtools")
install.packages("remotes")
remotes::install_github("sparkfish/modelc")

Precision

Note that you may encounter minor differences between the output of your R and generated SQL models depending on the precision with which your numeric types are represented in the database. To ensure parity between the two models, numeric types should have a precision of at least 17.

Tests

Tests are written using testthat. To run them, simply do

devtools::test()
Metadata

Version

1.0.0.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows