Estimate Permutation p-Values for Random Forest Importance Metrics.
rfPermute
Description
rfPermute estimates the significance of importance metrics for a Random Forest model by permuting the response variable. It will produce null distributions of importance metrics for each predictor variable and p-values of observed importances. The package also includes several summary and visualization functions for randomForest and rfPermute results. See rfPermuteTutorial() in the package for a guide on running, summarizing, and diagnosing rfPermute and randomForest models.
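The permutation idea described above can be sketched directly with the randomForest package. This is a conceptual illustration only, not the package's internal code: the mtcars data, the three chosen predictors, the helper name imp_of, and the "+1" correction in the p-value are all assumptions made for the example.

```r
library(randomForest)

set.seed(1)
data(mtcars)
x <- mtcars[, c("wt", "hp", "qsec")]  # illustrative predictors
y <- mtcars$mpg                       # illustrative response

# Hypothetical helper: fit a forest and return the unscaled
# permutation importance (%IncMSE) of each predictor.
imp_of <- function(resp) {
  rf <- randomForest(x, resp, ntree = 100, importance = TRUE)
  importance(rf, type = 1, scale = FALSE)[, 1]
}

# Observed importances from the real response
obs <- imp_of(y)

# Null distribution: refit after shuffling the response, which
# breaks any real association between predictors and response.
num_rep <- 50
null_dist <- replicate(num_rep, imp_of(sample(y)))  # predictors x replicates

# p-value per predictor: share of null importances at least as large
# as the observed one (the +1 correction is an assumed convention).
pval <- (rowSums(null_dist >= obs) + 1) / (num_rep + 1)
print(pval)
```

A small p-value suggests that a predictor's observed importance is larger than would be expected if the response were unrelated to the predictors.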
Contact
- submit suggestions and bug-reports: https://github.com/ericarcher/rfPermute/issues
- send a pull request: https://github.com/ericarcher/rfPermute/
- e-mail: [email protected]
Installation
To install the stable version from CRAN:
install.packages('rfPermute')
To install the latest version from GitHub:
# make sure you have devtools installed
if (!require('devtools')) install.packages('devtools')
# install from GitHub
devtools::install_github('EricArcher/rfPermute')
Current Functions
Variable importance p-value estimation, summary, and visualization
- rfPermute: Estimate Permutation p-values for Random Forest Importance Metrics
- importance: Extract rfPermute Importance Scores and p-values
- plotNull: Plot Random Forest Importance Null Distributions
- plotImpPreds: Distribution of Important Variables
Random Forest model summary
- summary: Summarize rfPermute and randomForest models
- confusionMatrix: Confusion Matrix
- casePredictions: Return predictions and votes for training cases
- pctCorrect: Percent Correctly Classified
Random Forest model visualization and diagnostics
- plotInbag: Distribution of sample inbag rates
- plotPredictedProbs: Distribution of prediction assignment probabilities
- plotProximity: Plot Random Forest Proximity Scores
- plotTrace: Trace of cumulative error rates in forest
- plotVotes: Vote Distribution
Miscellaneous functions
- combineRP: Combine rfPermute models
- balancedSampsize: Balanced Sample Size
- cleanRFdata: Clean Random Forest Input Data
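As a rough illustration of how a few of the functions listed above fit together, the sketch below fits a small classification model and inspects it. The mtcars data and the choice of am as the response are assumptions made for the example, and the number of permutation replicates is left at the package default; check ?rfPermute for the arguments available in your installed version.

```r
library(rfPermute)

set.seed(1)
data(mtcars)

# Fit a classification model; the response is permuted internally
# to build the null distributions of the importance metrics.
rp <- rfPermute(factor(am) ~ ., data = mtcars, ntree = 100)

# Importance scores with their permutation p-values
imp <- importance(rp)
print(imp)

# Confusion matrix for the fitted model
print(confusionMatrix(rp))

# Null distributions of the importance metrics
plotNull(rp)
```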
Changelog
version 2.5.2 (devel)
- fixed bug in plotImportance heatmap to now properly choose top rather than bottom n predictors.
version 2.5.1 (CRAN)
- added pct.correct argument to plotTrace(). Default is now to have y-axis as 1 - OOB error rate.
version 2.5
NOTE: v2.5 is a large redevelopment of the package. The structure of rfPermute model objects has changed, making them incompatible with previous versions. Also, the names and functionality of several functions have changed to make them more consistent with one another. A tutorial (under construction) is available within the package as rfPermuteTutorial().
version 2.2 (on CRAN)
- moved value of OOB expected error rate to end of output vector in exptdErrRate
- changed default of threshold argument in classConfInt and confusionMatrix to NULL
- added new grouping and labelling options to proximityPlot()
- added binomial test for priors in exptdErrRate and confusionMatrix
version 2.1.81
- Fixed bug in pctCorrect
- Added casePredictions
- Updated parallel code
version 2.1.7
- Fixed bug in parallel processing code.
version 2.1.6
- Added plotConfMat, plotOOBtimes, plotRFtrace, plotInbag, and plotImpVarDist visualizations.
- Changed confusionMatrix so it will work when a randomForest model doesn't have a $confusion element, such as when the model is the result of combine-ing multiple models.
- Improved efficiency and stability of parallel processing code. Changed default value of num.cores to NULL.
version 2.1.5
- Added type argument to plotVotes to choose between area and bar charts.
- Changed plot.rfPermute to plotNull to avoid clashes and maintain functionality of randomForest::plot.randomForest.
- Changed name of proximity.plot to proximityPlot, exptd.err.rate to exptdErrRate, and clean.rf.data to cleanRFdata to make camelCase naming scheme more consistent in package.
- Changed plotNull from base graphics to ggplot2.
- Added symb.metab data set.
version 2.1.1
- Added n argument to impHeatmap.
- Added functions: classConfInt, confusionMatrix, plotVotes, pctCorrect.
version 2.0.1
- Fixed bug in plot.rfPermute that was reporting the p-value incorrectly at the top of the figure.
- Fixed multi-threading in rfPermute so it works on Windows too.
- Added impHeatmap function.
- Switched proximity.plot to use ggplot2 graphics.
version 2.0
- Fixed bug with calculation of p-values not respecting importance measure scaling (division by standard deviations). New format of output of rfPermute has separate $null.dist and $pval elements, each with results for unscaled and scaled importance measures. See ?rfPermute for more information.
- rp.importance and plot.rfPermute now take a scale argument to specify whether or not importance values should be scaled by standard deviations.
- If nrep = 0 for rfPermute, a randomForest object is returned.
version 1.9.3
- Fixed import declarations to avoid grid name clashes.
- Fixed logic error in clean.rf.data where fixed predictors were not removed.
- Fixed error in use of main argument in plot.rp.importance.
version 1.9.2
- Added this NEWS.md
- Added README.md
- Added num.cores argument to rfPermute to take advantage of multi-threading
version 1.9.1
- Added internal keyword to calc.imp.pval to keep it from indexing
- Updated imports to match new CRAN policies.