Description

R Toolkit for 'Databricks'.

A collection of utilities that improve the experience of using 'Databricks' from R. It consists primarily of functions that wrap specific 'Databricks' APIs (<https://docs.databricks.com/api>), 'RStudio' connection pane support, and quality-of-life functions that make 'Databricks' simpler to use.

brickster

Overview

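{brickster} is the R toolkit for Databricks. It includes functions that wrap specific Databricks APIs, RStudio connection pane support, and quality-of-life functions that make Databricks simpler to use.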

Quick Start

library(brickster)

# the first request opens a browser window to log in via OAuth user-to-machine (U2M)
Sys.setenv(DATABRICKS_HOST = "https://<workspace-prefix>.cloud.databricks.com")

# open RStudio/Positron connection pane to view Databricks resources
open_workspace()

# list all SQL warehouses
warehouses <- db_sql_warehouse_list()

Refer to the "Connect to a Databricks Workspace" article for more details on getting authentication configured.
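
Before making the first request, it can help to confirm that the host variable is set and that the `<workspace-prefix>` placeholder has been replaced. The following is a minimal sketch in base R; `check_databricks_host()` is a hypothetical helper written for illustration and is not part of {brickster}.

```r
# Hypothetical helper (not part of {brickster}): fail early if DATABRICKS_HOST
# is missing or still contains the placeholder from the Quick Start.
check_databricks_host <- function(host = Sys.getenv("DATABRICKS_HOST")) {
  if (!nzchar(host)) {
    stop("DATABRICKS_HOST is not set.", call. = FALSE)
  }
  if (grepl("[<>]", host)) {
    stop("DATABRICKS_HOST still contains a <placeholder>.", call. = FALSE)
  }
  # Return the validated value invisibly so the call can be used in a pipeline
  invisible(host)
}
```

Calling `check_databricks_host()` with no arguments reads the environment variable; passing a string checks that value instead.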

Usage

{DBI} Backend

library(brickster)
library(DBI)

# Connect to Databricks using DBI (assumes you followed quickstart to authenticate)
con <- dbConnect(
  DatabricksSQL(),
  warehouse_id = "<warehouse-id>"
)

# Standard {DBI} operations
tables <- dbListTables(con)
dbGetQuery(con, "SELECT * FROM samples.nyctaxi.trips LIMIT 5")

# Use with {dbplyr} for {dplyr} syntax
library(dplyr)
library(dbplyr)

nyc_taxi <- tbl(con, I("samples.nyctaxi.trips"))

result <- nyc_taxi |>
  filter(year(tpep_pickup_datetime) == 2016) |>
  group_by(pickup_zip) |>
  summarise(
    trip_count = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE),
    avg_distance = mean(trip_distance, na.rm = TRUE)
  ) |>
  collect()
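
When a connection is opened inside a function or script, it is usually worth closing it once the query has finished. The sketch below assumes the {brickster} backend supports the standard `DBI::dbDisconnect()` generic, as DBI backends normally do; `run_query()` is a hypothetical helper name, not a {brickster} function.

```r
# Hypothetical helper: open a connection, run one query, always disconnect.
run_query <- function(warehouse_id, sql) {
  con <- DBI::dbConnect(brickster::DatabricksSQL(), warehouse_id = warehouse_id)
  # on.exit() runs even if the query errors, so the connection is not leaked
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, sql)
}

# Example call (requires an authenticated workspace and a SQL warehouse):
# trips <- run_query("<warehouse-id>", "SELECT * FROM samples.nyctaxi.trips LIMIT 5")
```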

Download & Upload to Volume

library(readr)
library(brickster)

# upload `data.csv` to a volume
local_file <- tempfile(fileext = ".csv")
write_csv(x = iris, file = local_file)
db_volume_write(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  file = local_file
)

# read `data.csv` from a volume and write to a file
downloaded_file <- tempfile(fileext = ".csv")
file <- db_volume_read(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  destination = downloaded_file
)
volume_csv <- read_csv(downloaded_file)
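
Both calls above use the same `/Volumes/<catalog>/<schema>/<volume>/...` path layout. As a small convenience, the path can be assembled from its parts. This is a sketch in base R; `volume_path()` is a hypothetical helper, and the catalog, schema, and volume names are placeholders.

```r
# Hypothetical helper: build a volume path from its components.
volume_path <- function(catalog, schema, volume, ...) {
  # file.path() joins the pieces with "/" on every platform
  file.path("/Volumes", catalog, schema, volume, ...)
}

volume_path("main", "default", "landing", "data.csv")
#> [1] "/Volumes/main/default/landing/data.csv"
```

The result can be passed as the `path` argument of `db_volume_write()` or `db_volume_read()`.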

Databricks REPL

Run commands against an existing interactive Databricks cluster; see the REPL vignette for more details.

library(brickster)

# commands after this will run on the interactive cluster
# read the vignette for more details
db_repl(cluster_id = "<interactive_cluster_id>")

Installation

install.packages("brickster")

Development Version

# install.packages("pak")
pak::pak("databrickslabs/brickster")

API Coverage

{brickster} is deliberate about which APIs it wraps. It isn't intended to replace IaC tooling (e.g. Terraform) or to be used for account/workspace administration.

API                        Available   Version
DBFS                       Yes         2.0
Secrets                    Yes         2.0
Repos                      Yes         2.0
mlflow Model Registry      Yes         2.0
Clusters                   Yes         2.0
Libraries                  Yes         2.0
Workspace                  Yes         2.0
Endpoints                  Yes         2.0
Query History              Yes         2.0
Jobs                       Yes         2.1
Volumes (Files)            Yes         2.0
SQL Statement Execution    Yes         2.0
REST 1.2 Commands          Partially   1.2
Unity Catalog - Tables     Yes         2.1
Unity Catalog - Volumes    Yes         2.1
Unity Catalog              Partially   2.1

Metadata

Version

0.2.12

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows