MyNixOS website logo
Description

A Simple Data Management and Sharing Tool.

An interface to store, retrieve, search, join and share datasets, based on Elasticsearch (ES) API. As a decentralized, FAIR and collaborative search engine and database effort, it proposes a simple push/pull/search mechanism only based on ES, a tool which can be deployed on nearly any hardware. It is a high-level R-ES binding to ease data usage using 'elastic' package (S. Chamberlain (2020)) <https://docs.ropensci.org/elastic/>, extends joins from 'dplyr' package (H. Wickham et al. (2020)) <https://dplyr.tidyverse.org/> and integrates specific biological format importation with Bioconductor packages such as 'rtracklayer' (M. Lawrence and al. (2009) <doi:10.1093/bioinformatics/btp328>) <http://bioconductor.org/packages/rtracklayer>, 'Biostrings' (H. Pagès and al. (2020) <doi:10.18129/B9.bioc.Biostrings>) <http://bioconductor.org/packages/Biostrings>, and 'Rsamtools' (M. Morgan and al. (2020) <doi:10.18129/B9.bioc.Rsamtools>) <http://bioconductor.org/packages/Rsamtools>, but also a long list of more common ones with 'rio' (C-h. Chan and al. (2018)) <https://cran.r-project.org/package=rio>.

kibior: easy scientific data handling, searching and sharing with Elasticsearch

Project Status: Active Build Status

Version: 0.1.1

TL;DR

| | | |-|-| | What | kibior is a R package dedicated to ease the pain of data handling in science, and more notably with biological data. | | Where | kibior is using Elasticsearch as database and search engine. | | Who | kibior is built for data science and data manipulation, so when any data-related action or need is involved, notably sharing data. It mainly targets bioinformaticians, and more broadly, data scientists. | | When | Available now from this repository, or CRAN repository. | | Public instances | Use the $get_kibio_instance() method to connect to Kibio and access known datasets. See Kibio datasets at the end of this document for a complete list. | | Cite this package | In R session, run citation("kibior") | | Publication | coming soon. |

Main features

This package allows:

  • Pushing, pulling, joining, sharing and searching tabular data between an R session and one or multiple Elasticsearch instances/clusters.
  • Massive data query and filter with Elasticsearch engine.
  • Multiple living Elasticsearch connections to different addresses.
  • Method autocompletion in proper environments (e.g. R cli, RStudio).
  • Import and export datasets from an to files.
  • Server-side execution for most of operations (i.e. on Elasticsearch instances/clusters).

How

Install

# Get from CRAN
install.packages("kibior")

# or get the latest from Github
devtools::install_github("regisoc/kibior")

Run

# load
library(kibior)

# Get a specific instance
kc <- Kibior$new("server_or_address", port)

# Or try something bigger...
kibio <- Kibior$get_kibio_instance()
kibio$list()

Examples

Here is an extract of some of the features proposed by KibioR. See Introduction vignette for more advanced usage.

Example: push datasets

# Push data (R memory -> Elasticsearch)
dplyr::starwars %>% kc$push("sw")
dplyr::storms %>% kc$push("st")

Example: pull datasets

# Pull data with columns selection (Elasticsearch -> R memory)
kc$pull("sw", query = "homeworld:(naboo || tatooine)", 
              columns = c("name", "homeworld", "height", "mass", "species"))
# see vignette for query syntax

Example: copy datasets

# Copy dataset (Elasticsearch internal operation)
kc$copy("sw", "sw_copy")

Example: delete datasets


# Delete datasets
kc$delete("sw_copy")

Example: list, match dataset names

# List available datasets
kc$list()

# Search for index names starting with "s"
kc$match("s*")

Example: get columns names and list unique keys in values

# Get columns of all datasets starting with "s"
kc$columns("s*")

# Get unique values of a column
kc$keys("sw", "homeworld")

Example: some Elasticsearch basic statistical methods

# Count number of lines in dataset
kc$count("st")

# Count number of lines with query (name of the storm is Anita)
kc$count("st", query = "name:anita")

# Generic stats on two columns
kc$stats("sw", c("height", "mass"))

# Specific descriptive stats with query
kc$avg("sw", c("height", "mass"), query = "homeworld:naboo")

Example: join

# Inner join between:
#   1/ a Elasticsearch-based dataset with query ("sw"), 
#   2/ and a in-memory R dataset (dplyr::starwars) 
kc$inner_join("sw", dplyr::starwars, 
              left_query = "hair_color:black",
              left_columns = c("name", "mass", "height"),
              by = "name")
Metadata

Version

0.1.1

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows