Description
Yet Another 'ARFF' Reader.
Description
A parser and a writer for 'WEKA' Attribute-Relation File Format <https://waikato.github.io/weka-wiki/arff_stable/> in pure R, with no dependencies. As opposed to other R implementations, this package can read standard (dense) as well as sparse files, i.e. those where each row lists only its nonzero components. Unlike 'RWeka', 'yarr' does not require a 'Java' installation or depend on any other external software. This implementation is generalized from those in packages 'mldr' and 'mldr.datasets'.
README.md
yarr
Yet Another ARFF Reader
So you need to read an ARFF file. You can use:
- foreign::read.arff, but only on dense format.
- farff::readARFF, but only on dense format.
- RWeka::read.arff, but you need to configure Java.

yarr::read.arff can read dense and sparse ARFF files, and it's implemented in pure R.
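To make the dense/sparse distinction concrete, here is a minimal sketch that writes the same toy dataset in both layouts using base R only. The relation name, attribute names and values are made up for illustration; only numeric attributes are used to keep the example simple.

```r
# Toy header shared by both files; all names here are illustrative.
header <- c(
  "@relation toy",
  "@attribute a numeric",
  "@attribute b numeric",
  "@attribute c numeric",
  "@data"
)

# Dense rows list every attribute value, in order.
dense_rows <- c("0,3,0", "5,0,1")

# Sparse rows list only the nonzero entries as {index value},
# with 0-based attribute indices.
sparse_rows <- c("{1 3}", "{0 5, 2 1}")

dense_file <- tempfile(fileext = ".arff")
sparse_file <- tempfile(fileext = ".arff")
writeLines(c(header, dense_rows), dense_file)
writeLines(c(header, sparse_rows), sparse_file)

# With yarr installed, either file can then be passed to the reader:
# yarr::read.arff(dense_file)
# yarr::read.arff(sparse_file)
```

Both files describe the same two instances; the sparse one simply omits the zeros.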
The implementations in R are derivatives of those in mldr and mldr.datasets, packages for management of multilabel learning datasets.
Usage
remotes::install_github("fdavidcl/yarr")
library(yarr)
download.file("https://www.openml.org/data/download/1681111/phpEUwA95", "dexter.arff")
dexter <- read.arff("dexter.arff")
dexter
# An ARFF dataset: dexter
# 20001 attributes and 600 instances
# V0: numeric (min = 0, mean = 0.12, max = 72) 0, 0, 0, 0, 0...
# V1: numeric (min = 0, mean = 0, max = 0) 0, 0, 0, 0, 0...
# V2: numeric (min = 0, mean = 0, max = 0) 0, 0, 0, 0, 0...
# V3: numeric (min = 0, mean = 1.165, max = 427) 0, 0, 0, 0, 0...
# V4: numeric (min = 0, mean = 0.57, max = 342) 0, 0, 0, 0, 0...
# V5: numeric (min = 0, mean = 0.386666666666667, max = 116) 0, 0, 0, 0, 0...
# V6: numeric (min = 0, mean = 0, max = 0) 0, 0, 0, 0, 0...
# V7: numeric (min = 0, mean = 0.448333333333333, max = 121) 0, 0, 0, 0, 0...
# V8: numeric (min = 0, mean = 0.358333333333333, max = 84) 0, 0, 0, 0, 0...
# ...
# class: {-1,1} (-1: 300, 1: 300) 1, 1, 1, -1, 1...
relation(dexter)
# [1] "dexter"
write.arff(dexter, "dexter-new.arff")
Supported features
- Dense format
- Sparse format ({index value}, e.g. {0 1, 20 1})
- Missing values with ?
- MEKA parameter c in @relation.
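The sparse notation and the ? marker can be illustrated with a small base R sketch. The helper `expand_sparse_row` is hypothetical and is not part of yarr or its actual implementation; it assumes values contain no quoted spaces or commas, and it only shows how a row such as {0 1, 20 1} maps onto a full-length vector.

```r
# Hypothetical helper, not yarr's code: expand one sparse ARFF row
# into a character vector with one entry per attribute.
expand_sparse_row <- function(row, n_attr) {
  # Strip the surrounding braces and any outer whitespace.
  body <- trimws(gsub("^\\s*\\{|\\}\\s*$", "", row))
  values <- rep("0", n_attr)  # entries not listed default to zero
  if (nzchar(body)) {
    pairs <- trimws(strsplit(body, ",", fixed = TRUE)[[1]])
    for (p in pairs) {
      parts <- strsplit(p, "\\s+")[[1]]
      idx <- as.integer(parts[1]) + 1L  # ARFF indices are 0-based, R's are 1-based
      values[idx] <- parts[2]
    }
  }
  values[values == "?"] <- NA  # "?" marks a missing value
  values
}

# The example row from the list above, for a file with 21 attributes.
row <- expand_sparse_row("{0 1, 20 1}", n_attr = 21)
as.numeric(row)
```

Values are kept as strings until the end because a real file mixes numeric and nominal attributes; converting per column, once the attribute types are known, is the safer order.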
Unsupported features
- Date attributes are read as mere strings
- Relational attributes are not supported
- Instance weights.
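Since date attributes arrive as plain strings, they can be converted after reading. The sketch below uses a stand-in data frame rather than a real file, and assumes the loaded dataset behaves like a data frame; the column name `timestamp` and the date layout are assumptions for illustration.

```r
# Stand-in for a dataset whose date attribute was read as strings.
dataset <- data.frame(
  timestamp = c("2021-03-01 12:00:00", "2021-03-02 08:30:00"),
  value = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# ARFF date attributes can declare a Java-style pattern; the matching
# strptime format is written by hand here for this assumed layout.
dataset$timestamp <- as.POSIXct(
  dataset$timestamp,
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

str(dataset)
```

Strings that do not match the given format become NA, so checking `anyNA(dataset$timestamp)` after the conversion is a cheap sanity test.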