Familiarity with Dataframes Before Data Manipulation.
Overview
It is often said that data manipulation alone takes 50-70% of the time in a data science project.
Much of the duration of this seemingly never-ending activity can be attributed to our unfamiliarity with the datasets provided.
The functions in this package build familiarity with a data.frame before manipulation begins, which reduces coding errors and rework.
Installation
# The easiest way to get dataframeexplorer is to install it from CRAN:
install.packages("dataframeexplorer", dependencies = TRUE)
# Alternatively, you can install the development version from GitHub:
install.packages("devtools")
devtools::install_github("ashrithssreddy/dataframeexplorer")
Next Steps
Functions:
- [x] Percentiles
- [x] Level of dataset
- [ ] Univariate Analysis
- [ ] Bivariate Analysis
- [ ] Show progress bar for level_of_dataset
- [ ] Run the level_of_dataset code in parallel for performance
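To make the "level of dataset" item and the planned progress bar more concrete, here is a minimal base R sketch. It assumes that the level of a dataset means the smallest set of columns whose combined values identify each row uniquely; that reading, the helper name `find_level_sketch`, and its arguments `max_cols` and `show_progress` are all hypothetical and are not taken from the package.

```r
# Hypothetical helper, not part of dataframeexplorer's documented API.
# Assumption: the "level" of a dataset is the smallest set of columns
# whose combined values are unique for every row.
find_level_sketch <- function(dataset, max_cols = 3, show_progress = FALSE) {
  col_names <- names(dataset)
  for (k in seq_len(min(max_cols, length(col_names)))) {
    # All combinations of k column names, tried in order
    combos <- combn(col_names, k, simplify = FALSE)
    if (show_progress) {
      pb <- txtProgressBar(min = 0, max = length(combos), style = 3)
    }
    for (i in seq_along(combos)) {
      if (show_progress) setTxtProgressBar(pb, i)
      # anyDuplicated() returns 0 when no row repeats in the chosen columns
      if (anyDuplicated(dataset[, combos[[i]], drop = FALSE]) == 0) {
        if (show_progress) close(pb)
        return(combos[[i]])
      }
    }
    if (show_progress) close(pb)
  }
  NULL  # no unique combination found within max_cols columns
}

# Small illustrative data set (made up for this example)
orders <- data.frame(
  customer = c("a", "a", "b", "b"),
  month    = c(1, 2, 1, 2),
  amount   = c(10, 10, 20, 20)
)
level <- find_level_sketch(orders)
```

In this toy data no single column is unique, so the search moves on to pairs of columns. Trying smaller combinations first keeps the reported level as short as possible, at the cost of checking many combinations on wide data sets, which is where a progress bar or parallel execution would help.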
Changes:
- Return value for all functions to be included into documentation
- Message not printed in all codes
- Default filename not consistent
- Outputs not refined
- PEP 8 formatting examples not consistent
- Comments not consistent across all codes
- sink() to be run in glimpse_to_file upon an error
- Argument format to be used: dataset = dataset, output_filename = "dataset_glimpse.txt"
- Throw a warning when duplicate column names are found
- Level: Unsink when interrupted
- Add instructions to interpretation of output
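Two of the planned changes above, restoring console output after a failure and warning about duplicate column names, could look roughly like the following base R sketch. The function `glimpse_sketch` is a hypothetical stand-in written for illustration; it is not the package's `glimpse_to_file()` implementation, and it uses `str()` only as an assumed way to produce a glimpse.

```r
# Hypothetical helper illustrating two of the planned changes; this is
# not the package's actual glimpse_to_file() implementation.
glimpse_sketch <- function(dataset, output_filename = "dataset_glimpse.txt") {
  # Warn when column names are repeated
  dup_names <- unique(names(dataset)[duplicated(names(dataset))])
  if (length(dup_names) > 0) {
    warning("Duplicate column names found: ", paste(dup_names, collapse = ", "))
  }

  # Divert printed output to the file
  sink(output_filename)
  # on.exit() runs when the function exits, including after an error,
  # so console output is restored either way
  on.exit(sink(), add = TRUE)

  str(dataset)
  invisible(output_filename)
}

out_file <- glimpse_sketch(mtcars, file.path(tempdir(), "mtcars_glimpse.txt"))
glimpse_lines <- readLines(out_file)
```

Registering the cleanup with `on.exit()` immediately after `sink()` means there is no window in which an error can leave output diverted to the file.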
Usage
1. glimpse_to_file
glimpse_to_file(mtcars, "mtcars_glimpse.txt")
or, with a full output path:
glimpse_to_file(mtcars, "C://Users/Desktop/mtcars_glimpse.txt")
![Output](/man/figures/.png)
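If hard-coding a Windows-style path is not convenient, the output path can be assembled with base R's `file.path()`. The sketch below writes to the session's temporary directory, which is an arbitrary choice for illustration, and only calls `glimpse_to_file()` when the package is actually installed.

```r
# Build the output path from its parts rather than typing separators by hand
output_file <- file.path(tempdir(), "mtcars_glimpse.txt")

# Only call the package function when dataframeexplorer is installed
if (requireNamespace("dataframeexplorer", quietly = TRUE)) {
  dataframeexplorer::glimpse_to_file(mtcars, output_file)
}
```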
Contact
For suggestions, mail [email protected] with "dataframeexplorer" in the subject line.