Description
Pulls Information from Prov.
R functions to access provenance information collected by 'rdt' or 'rdtLite'. The information is stored inside a 'ProvInfo' object and can be accessed through a collection of functions that will return the requested data. The exact format of the JSON created by 'rdt' and 'rdtLite' is described in <https://github.com/End-to-end-provenance/ExtendedProvJson>.
README.md
provParseR
Parses the provenance collected by rdtLite or rdt and returns selected provenance as an R data frame.
Installation
Install from GitHub:
# install.packages("devtools")
devtools::install_github("End-to-end-provenance/provParseR")
Once installed, load the package:
library("provParseR")
Usage
The prov.parse function parses the prov.json file or string created by rdtLite or rdt and returns an R object of class ProvInfo. This object can then be queried to return a data frame containing the desired values. For example:
prov <- prov.parse("c:/prov/prov.json")
data.nodes <- get.data.nodes(prov)
This creates a ProvInfo object, "prov", from the provenance file at "c:/prov/prov.json", and a data frame, "data.nodes", that contains all data nodes in the provenance graph.
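A minimal sketch of inspecting the parsed result follows. It assumes the package is installed and that a provenance file exists at the path used above. `describe.frame` is a hypothetical helper written for this illustration and is not part of provParseR. It avoids assuming any particular column names in the returned data frame.

```r
# Hypothetical helper (not part of provParseR): summarize any data frame
# by its row count and column names, without assuming a particular layout.
describe.frame <- function(df) {
  list(rows = nrow(df), columns = names(df))
}

# Path taken from the example above; substitute the location of your own file.
prov.file <- "c:/prov/prov.json"

# Only run the provenance calls when the package and the file are available.
if (requireNamespace("provParseR", quietly = TRUE) && file.exists(prov.file)) {
  prov <- provParseR::prov.parse(prov.file)
  data.nodes <- provParseR::get.data.nodes(prov)

  # Show how many data nodes were found and which columns describe them.
  str(describe.frame(data.nodes))
}
```

Checking the column names this way is a reasonable first step before writing code that depends on specific fields.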
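The access functions below each return a data frame with the specified content. Each takes the ProvInfo object returned by prov.parse as its first argument, as in get.data.nodes(prov) above; the listings omit arguments for brevity. For more details, see the help pages for provParseR.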
ENVIRONMENT
# Computing environment
get.environment()
# Libraries used
get.libs()
# Provenance collection tool
get.tool.info()
# Arguments
get.args()
SCRIPTS
# Scripts executed
get.scripts()
# Location of saved scripts
get.saved.scripts()
NODES
# Procedure nodes
get.proc.nodes()
# Data nodes
get.data.nodes()
# Function nodes
get.func.nodes()
# Error nodes
get.error.nodes()
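The node functions can be applied to the same parsed object in one pass, for example to see how many nodes of each kind a script produced. The sketch below assumes each function accepts the ProvInfo object as its only required argument. `count.rows` is a hypothetical helper, not part of provParseR, and the file path is the one from the Usage example.

```r
# Hypothetical helper (not part of provParseR): call each access function on
# the same parsed object and record the number of rows it returns.
# A NULL result is counted as zero rows.
count.rows <- function(prov, getters) {
  vapply(getters, function(getter) {
    n <- nrow(getter(prov))
    if (is.null(n)) 0L else as.integer(n)
  }, integer(1))
}

# Path taken from the Usage example; substitute the location of your own file.
prov.file <- "c:/prov/prov.json"

if (requireNamespace("provParseR", quietly = TRUE) && file.exists(prov.file)) {
  prov <- provParseR::prov.parse(prov.file)

  # Named list of access functions; the names label the resulting counts.
  getters <- list(
    proc.nodes = provParseR::get.proc.nodes,
    data.nodes = provParseR::get.data.nodes,
    func.nodes = provParseR::get.func.nodes
  )
  print(count.rows(prov, getters))
}
```

Passing the functions as a named list keeps the helper independent of which access functions are included.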
EDGES
# Procedure-to-procedure edges
get.proc.proc()
# Data-to-procedure edges
get.data.proc()
# Procedure-to-data edges
get.proc.data()
# Function-to-procedure edges
get.func.proc()
# Function-library edges
get.func.lib()
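Edge data frames are most useful when joined back to the node data frames they refer to. The sketch below shows one way to do this with base R's `merge`. The column names `entity` and `id` are assumptions about the returned data frames and should be checked with `names()` before use. `attach.nodes` is a hypothetical helper, not part of provParseR.

```r
# Hypothetical helper (not part of provParseR): join an edge data frame to a
# node data frame. `edge.col` names the edge column holding node identifiers;
# `node.col` names the identifier column in the node data frame.
attach.nodes <- function(edges, nodes, edge.col, node.col = "id") {
  merge(edges, nodes, by.x = edge.col, by.y = node.col,
        suffixes = c(".edge", ".node"))
}

# Path taken from the Usage example; substitute the location of your own file.
prov.file <- "c:/prov/prov.json"

if (requireNamespace("provParseR", quietly = TRUE) && file.exists(prov.file)) {
  prov <- provParseR::prov.parse(prov.file)
  proc.data <- provParseR::get.proc.data(prov)
  data.nodes <- provParseR::get.data.nodes(prov)

  # Assumed column names: "entity" in the edges, "id" in the nodes.
  if ("entity" %in% names(proc.data) && "id" %in% names(data.nodes)) {
    print(head(attach.nodes(proc.data, data.nodes, edge.col = "entity")))
  }
}
```

The guard on column names means the join is skipped, rather than failing, if the assumed names do not match the actual output.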
INPUTS/OUTPUTS
# Files read
get.input.files()
# Files written
get.output.files()
# URLs read
get.urls()
# Standard output
get.stdout.nodes()
VARIABLES
# Variable data type
get.val.type()
# Variables with specified name
get.variable.named()
# Variables assigned
get.variables.set()
# Variables used
get.variables.used()
# Pre-existing variables
get.preexisting()