Description

Data Base Backend for 'mlr3'.

Extends the 'mlr3' package with backends to transparently work with databases such as 'SQLite', 'DuckDB', 'MySQL', 'MariaDB', or 'PostgreSQL'. The package provides two additional backends: 'DataBackendDplyr' relies on the abstraction of package 'dbplyr' to interact with most DBMS. 'DataBackendDuckDB' operates on 'DuckDB' databases and also on Apache Parquet files.

mlr3db

Extends the mlr3 package with a DataBackend to transparently work with databases. Two additional backends are currently implemented:

DataBackendDplyr: Relies internally on the abstraction of dplyr and dbplyr. This allows working on a broad range of DBMS, such as SQLite, MySQL, MariaDB, or PostgreSQL.
DataBackendDuckDB: Connector to duckdb. This includes support for Parquet files (see example below).

To construct the backends, you have to establish a connection to the DBMS yourself with the DBI package. For the serverless SQLite and DuckDB, we provide the converters as_sqlite_backend() and as_duckdb_backend().
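
As a minimal sketch of the manual route, the snippet below opens a DBI connection and wraps a remote table in a DataBackendDplyr. It assumes the RSQLite, dplyr, and dbplyr packages are installed; the in-memory SQLite database, the iris data, the table name "iris", and the row_id column are illustrative choices, not part of the package. For a server-based DBMS, only the dbConnect() call should need to change.

```r
library("mlr3db")

# Illustrative data: iris plus an integer row id that serves as primary key
data = iris
data$row_id = seq_len(nrow(data))

# Establish the connection yourself via DBI (in-memory SQLite, assumed here)
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "iris", data)

# Reference the table lazily through dplyr/dbplyr and wrap it in a backend
tbl = dplyr::tbl(con, "iris")
backend = DataBackendDplyr$new(tbl, primary_key = "row_id")

# Build a task on top of the backend, as with any other DataBackend
task = as_task_classif(backend, target = "Species")

# Queries are sent through the open connection on demand
n_rows = backend$nrow          # number of rows in the table
first_rows = backend$head(3)   # small preview fetched from the database

# Close the connection once the backend is no longer needed
DBI::dbDisconnect(con)
```

The connection has to stay open for as long as the backend is in use, which is why the disconnect comes last.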

Installation

You can install the released version of mlr3db from CRAN with:

install.packages("mlr3db")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mlr-org/mlr3db")

Example

DataBackendDplyr

library("mlr3db")
#> Loading required package: mlr3

# Create a classification task:
task = tsk("spam")

# Convert the task backend from an in-memory backend (DataBackendDataTable)
# to an out-of-memory SQLite backend via DataBackendDplyr.
# A temporary file is used here to store the database.
task$backend = as_sqlite_backend(task$backend, path = tempfile())

# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> <ResampleResult> of 3 iterations
#> * Task: spam
#> * Learner: classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations

DataBackendDuckDB

library("mlr3db")

# Get an example parquet file from the package install directory:
# spam dataset (tsk("spam")) stored as parquet file
file = system.file(file.path("extdata", "spam.parquet"), package = "mlr3db")

# Create a backend on the file
backend = as_duckdb_backend(file)

# Construct classification task on the constructed backend
task = as_task_classif(backend, target = "type")

# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> <ResampleResult> of 3 iterations
#> * Task: backend
#> * Learner: classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
