Description

Interface for 'TensorFlow' 'TFRecord' Files with 'Apache Spark'.

A 'sparklyr' extension that enables reading and writing 'TensorFlow' TFRecord files via 'Apache Spark'.

sparktf

Overview

sparktf is a sparklyr extension that writes Spark DataFrames to TFRecord, the recommended format for persisting data used to train TensorFlow models.

Installation

You can install sparktf from CRAN with:

install.packages("sparktf")

You can install the development version of sparktf from GitHub with:

devtools::install_github("rstudio/sparktf")

Example

We first attach the required packages and establish a Spark connection.

library(sparktf)
library(sparklyr)
library(keras)
use_implementation("tensorflow")
library(tensorflow)
tfe_enable_eager_execution()
library(tfdatasets)

sc <- spark_connect(master = "local")

We then copy a sample dataset to Spark and write it to disk via spark_write_tfrecord().

data_path <- file.path(tempdir(), "iris")
iris_tbl <- sdf_copy_to(sc, iris)

iris_tbl %>%
  ft_string_indexer_model(
    "Species", "label",
    labels = c("setosa", "versicolor", "virginica")
  ) %>%
  spark_write_tfrecord(
    path = data_path,
    write_locality = "local"
  )
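
To make the label column concrete, here is a minimal base R sketch of the mapping implied by the explicit labels vector. It assumes, as Spark's StringIndexerModel does, that each string is replaced by the zero-based position of its entry in labels, stored as a double. The helper name index_labels is hypothetical and is not part of sparklyr or sparktf.

```r
# Hypothetical helper: mimic a string indexer with a fixed label order.
index_labels <- function(x, labels) {
  # match() returns 1-based positions; subtract 1 for zero-based indices.
  as.numeric(match(as.character(x), labels) - 1L)
}

labels <- c("setosa", "versicolor", "virginica")
label <- index_labels(iris$Species, labels)

# Count how many rows received each index.
table(label)
```

Because the index is written as a floating-point value, the parsing step further down reads label as tf$float32 and casts it to an integer before one-hot encoding.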

We now read the saved TFRecord files and parse their contents to create a dataset object. For details, refer to the tfdatasets package website.

dataset <- tfrecord_dataset(list.files(data_path, full.names = TRUE)) %>%
  dataset_map(function(example_proto) {
    features <- list(
      label = tf$FixedLenFeature(shape(), tf$float32),
      Sepal_Length = tf$FixedLenFeature(shape(), tf$float32),
      Sepal_Width = tf$FixedLenFeature(shape(), tf$float32),
      Petal_Length = tf$FixedLenFeature(shape(), tf$float32),
      Petal_Width = tf$FixedLenFeature(shape(), tf$float32)
    )

    features <- tf$parse_single_example(example_proto, features)
    x <- list(
      features$Sepal_Length, features$Sepal_Width,
      features$Petal_Length, features$Petal_Width
    )
    y <- tf$one_hot(tf$cast(features$label, tf$int32), 3L)
    list(x, y)
  }) %>%
  dataset_shuffle(150) %>%
  dataset_batch(16)
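
As a rough illustration of what the last steps of this pipeline produce, the base R sketch below reproduces the one-hot encoding of a single label with depth 3 and works out how many batches the 150 iris rows yield at a batch size of 16. The helper one_hot is hypothetical; it only mirrors what tf$one_hot() does for one zero-based index, and the batch arithmetic assumes the final partial batch is kept rather than dropped.

```r
# Hypothetical helper: one-hot encode a zero-based class index.
one_hot <- function(index, depth) {
  as.numeric(seq_len(depth) - 1L == index)
}

one_hot(1, 3)  # index 1 is "versicolor" under the label order used when writing
#> [1] 0 1 0

n_rows <- 150
batch_size <- 16

# Number of batches per epoch, counting the final partial batch.
n_batches <- ceiling(n_rows / batch_size)

# Rows left over for the final batch.
last_batch <- n_rows - (n_batches - 1) * batch_size

c(n_batches = n_batches, last_batch = last_batch)
```

With these numbers each epoch has 10 batches, the last of which holds 6 rows. A shuffle buffer of 150 covers the whole dataset, so every row can land in any position.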

Now, we can define a Keras model using the keras package and fit it by feeding the dataset object defined above.

model <- keras_model_sequential() %>%
  layer_dense(32, activation = "relu", input_shape = 4) %>%
  layer_dense(3, activation = "softmax")

model %>%
  compile(loss = "categorical_crossentropy", optimizer = tf$train$AdamOptimizer())

history <- model %>%
  fit(dataset, epochs = 100, verbose = 0)

Finally, we can use the trained model to make some predictions.

new_data <- tf$constant(c(4.9, 3.2, 1.4, 0.2), shape = c(1, 4))
model(new_data)
#> tf.Tensor([[0.69612664 0.13773003 0.1661433 ]], shape=(1, 3), dtype=float32)
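
The output is a vector of class probabilities. A small base R sketch of turning it back into a species name follows, assuming the output columns follow the same label order that was used when indexing Species. The probabilities are copied from the output above, and the variable names are illustrative only.

```r
labels <- c("setosa", "versicolor", "virginica")
probs <- c(0.69612664, 0.13773003, 0.1661433)  # copied from the printed tensor

# which.max() is 1-based, which lines up with R's indexing of labels.
predicted <- labels[which.max(probs)]
predicted
#> [1] "setosa"
```

The three values sum to approximately 1, as expected from a softmax output layer.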

Metadata

Version

0.1.0

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows