Build Machine Learning Models in R the Way You Would with Python's Scikit-Learn Library
SuperML
The goal of SuperML is to bring scikit-learn's standard fit, predict, and transform workflow for building machine learning models to R. It is built on top of the latest R packages, which provide optimized implementations for training machine learning models.
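To make the fit/transform convention concrete, here is a minimal base R sketch that does not use superml. make_scaler() is a hypothetical standardizer built from closures: fit() learns column means and standard deviations from the data, and transform() applies those stored values to new data. superml's own trainers are objects created with $new(); this sketch only illustrates the learn-then-apply split that the convention is built on.

```r
# Hypothetical standardizer illustrating the fit/transform convention.
# Not part of superml; a plain base R closure-based sketch.
make_scaler <- function() {
  center <- NULL
  scale <- NULL
  list(
    # fit: learn per-column statistics and store them in the closure
    fit = function(x) {
      center <<- colMeans(x)
      scale <<- apply(x, 2, sd)
      invisible(NULL)
    },
    # transform: apply the stored statistics to (possibly new) data
    transform = function(x) {
      sweep(sweep(x, 2, center, "-"), 2, scale, "/")
    }
  )
}

scaler <- make_scaler()
X <- as.matrix(iris[, 1:4])
scaler$fit(X)
X_scaled <- scaler$transform(X)
round(colMeans(X_scaled), 10)  # each column now has mean ~0
```

Keeping the learned state inside the object means the same statistics can be reused on test data, which is what separating fit from transform is for.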
Installation
You can install the latest stable CRAN version (recommended) with:
install.packages("superml")
install.packages("superml", dependencies=TRUE) # to install all dependencies at once
You can install superml from GitHub with:
# install.packages("devtools")
devtools::install_github("saraswatmks/superml")
Description
In superml, every machine learning algorithm is called a trainer. The following trainers are currently available:
- LMTrainer: used to train linear, logistic, ridge, lasso models
- KNNTrainer: K-Nearest Neighbour Models
- KMeansTrainer: KMeans Model
- NBTrainer: Naive Bayes Model
- SVMTrainer: SVM Model
- RFTrainer: Random Forest Model
- XGBTrainer: XGBoost Model
In addition, there are other useful functions to support modeling tasks such as:
- CountVectorizer: Create Bag of Words model
- TfidfVectorizer: Create TF-IDF feature model
- LabelEncoder: Convert categorical features to numeric
- GridSearchCV: For hyperparameter optimization
- RandomSearchCV: For hyperparameter optimization
- kFoldMean: Target encoding
- smoothMean: Target encoding
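The encoding ideas listed above can be illustrated in base R. The sketch below is not superml's LabelEncoder or smoothMean: label_encode() and smooth_mean_encode() are hypothetical helpers, and the smoothing weight m is an assumed parameter taken from a common formulation of smoothed target encoding, which blends each category's mean target with the global mean as (n * category_mean + m * global_mean) / (n + m).

```r
# Base R sketch of label encoding and smoothed mean target encoding.
# Hypothetical helpers; not superml's implementation.

# Label encoding: map each category to an integer code (alphabetical order)
label_encode <- function(x) {
  as.integer(factor(x))
}

# Smoothed target encoding (assumed formulation): categories with few rows
# are pulled toward the global mean; m controls the strength of that pull
smooth_mean_encode <- function(x, y, m = 10) {
  global_mean <- mean(y)
  n <- tapply(y, x, length)
  cat_mean <- tapply(y, x, mean)
  smoothed <- (n * cat_mean + m * global_mean) / (n + m)
  as.vector(smoothed[as.character(x)])
}

df <- data.frame(
  city = c("a", "a", "b", "b", "b", "c"),
  target = c(1, 0, 1, 1, 1, 0)
)
df$city_label <- label_encode(df$city)
df$city_te <- smooth_mean_encode(df$city, df$target, m = 2)
df
```

Here city "c" has a single row with target 0, but its encoding stays close to the global mean rather than collapsing to 0. A k-fold variant would compute these means on out-of-fold rows only, to avoid leaking each row's own target into its feature.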
To compute text similarity, the following functions are available:
- bm_25: Computes BM25 relevance scores
- dot: Computes the dot product of two vectors
- dotmat: Computes the dot product between a vector and a matrix
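The dot-product similarities listed above can be sketched in base R as follows. dot_vec() and dot_mat() are hypothetical names and do not reproduce the signatures of superml's dot() and dotmat(); the optional normalize flag, which turns the dot product into cosine similarity, is likewise an assumption of this sketch.

```r
# Base R sketch of vector-vector and vector-matrix dot products.
# Hypothetical helpers; not superml's dot() / dotmat().

# Dot product of two vectors; cosine similarity if normalize = TRUE
dot_vec <- function(a, b, normalize = FALSE) {
  d <- sum(a * b)
  if (normalize) d <- d / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  d
}

# Dot product of vector a with each row of matrix B
dot_mat <- function(a, B, normalize = FALSE) {
  d <- as.vector(B %*% a)
  if (normalize) d <- d / (sqrt(sum(a^2)) * sqrt(rowSums(B^2)))
  d
}

a <- c(1, 2, 3)
B <- rbind(c(1, 0, 0), c(0, 1, 0), c(1, 2, 3))
dot_vec(a, c(4, 5, 6))     # 1*4 + 2*5 + 3*6 = 32
dot_mat(a, B)              # one score per row of B
dot_mat(a, B, normalize = TRUE)
```

Scoring one vector against every row of a matrix is the typical use in text similarity: a query's term vector compared with each document's row in a document-term matrix.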
Usage
Every model is trained with the same steps. For example, a random forest:
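library(superml)
data(iris)

# random forest with 100 trees
rf <- RFTrainer$new(n_estimators = 100)
# fit on the data frame, naming the target column
rf$fit(iris, "Species")
# predict (here on the training data itself, for illustration)
pred <- rf$predict(iris)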
Documentation
Full documentation is available in the SuperML Documentation.
Contributions & Support
SuperML is my ambitious effort to help people train machine learning models in R as easily as they do in Python. I encourage you to use this library and to report bugs and suggest features in the project's GitHub issue tracker.