Fitting Linear and Generalized Linear Models in "Divide and Recombine" Approach to Large Data Sets.
Description
To overcome the memory limitations of fitting linear models (LMs) and generalized linear models (GLMs) to large data sets, this package implements the Divide and Recombine (D&R) strategy. It divides the full data set into subsets of manageable size and fits the model to each subset. The results from the subsets are then aggregated to obtain the final estimate. The package also supports fitting GLMs to data sets that do not fit into memory, and provides methods for linear regression, binomial regression, Poisson regression, and multinomial logistic regression. The respective models are fitted using different D&R strategies, as described by Xi, Lin, and Chen (2009) <doi:10.1109/TKDE.2008.186>, Xi, Lin, and Chen (2006) <doi:10.1109/TKDE.2006.196>, Zuo and Li (2018) <doi:10.4236/ojs.2018.81003>, and Karim and Islam (2019) <doi:10.1007/978-981-13-9776-9>.
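The divide, fit, and recombine steps can be illustrated for the linear model with base R alone. The sketch below does not use the 'drglm' API and is not necessarily the exact algorithm the package implements; the simulated data, the number of subsets `k`, and all variable names are assumptions made for illustration. Each subset contributes its cross-products X'X and X'y, and summing them reproduces the full-data least-squares estimate.

```r
# Minimal D&R sketch for a linear model (base R only, hypothetical data).
set.seed(1)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

# Divide: assign rows to k subsets (k is an arbitrary illustrative choice).
k <- 10
chunks <- split(dat, rep(seq_len(k), length.out = n))

# Fit: compute the per-subset sufficient statistics X'X and X'y.
stats <- lapply(chunks, function(d) {
  X <- model.matrix(y ~ x1 + x2, data = d)
  list(xtx = crossprod(X), xty = crossprod(X, d$y))
})

# Recombine: sum the statistics and solve the normal equations once.
xtx <- Reduce(`+`, lapply(stats, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(stats, `[[`, "xty"))
beta_dr <- drop(solve(xtx, xty))
names(beta_dr) <- colnames(xtx)

# Reference fit on the full data, for comparison.
beta_full <- coef(lm(y ~ x1 + x2, data = dat))
print(cbind(beta_dr, beta_full))
```

For the linear model the recombined estimate agrees with the full-data fit up to numerical error, because the cross-products are additive over subsets; only one subset needs to be held in memory at a time.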
README.md
drglm
The package ‘drglm’ can be used to fit linear and generalized linear models to large and out-of-memory data sets using the “Divide and Recombine (D&R)” strategy. This is the first package in R that supports fitting multinomial logistic regression models to large data sets with the D&R method.
Installation
You can install the development version of drglm from GitHub with:
# install.packages("devtools")
devtools::install_github("NayemMH/drglm")
Example
For practical examples and code, users are referred to the vignettes of the ‘drglm’ paper.
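As a package-independent illustration of the D&R idea for a GLM, the following base R sketch fits a logistic regression to each subset with `glm()` and combines the subset estimates by weighting them with their estimated information matrices (the inverses of `vcov()`). This inverse-variance weighting is one common recombination rule and is an assumption here, not a statement of what 'drglm' does internally; the simulated data, `k`, and all names are hypothetical.

```r
# Minimal D&R sketch for logistic regression (base R only, hypothetical data).
set.seed(2)
n <- 2000
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$x))

# Divide: split the rows into k subsets.
k <- 4
chunks <- split(dat, rep(seq_len(k), length.out = n))

# Fit: one logistic regression per subset.
fits <- lapply(chunks, function(d) glm(y ~ x, family = binomial(), data = d))

# Recombine: weight each subset estimate by its estimated information matrix.
info <- lapply(fits, function(f) solve(vcov(f)))
info_sum <- Reduce(`+`, info)
weighted <- Reduce(`+`, Map(function(I, f) I %*% coef(f), info, fits))
beta_dr <- drop(solve(info_sum, weighted))
names(beta_dr) <- names(coef(fits[[1]]))
se_dr <- sqrt(diag(solve(info_sum)))

# Reference fit on the full data, for comparison.
beta_full <- coef(glm(y ~ x, family = binomial(), data = dat))
print(cbind(beta_dr, se_dr, beta_full))
```

Unlike the linear case, the combined GLM estimate is only an approximation to the full-data maximum likelihood estimate; the two are typically close when each subset is large relative to the number of parameters.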