Isolation Forest-Based Presence-Only Species Distribution Modeling.
itsdm
Overview
itsdm
calls isolation forest and variations such as SCiForest and EIF to model species distribution. It provides features including:
- A few functions to download environmental variables.
- Outlier tree-based suspicious environmental outliers detection.
- Isolation forest-based environmental suitability modeling.
- Non-spatial response curves of environmental variables.
- Spatial response maps of environmental variables.
- Variable importance analysis.
- Presence-only model evaluation.
- Method to convert predicted suitability to presence-absence map.
- Variable contribution analysis for the target observations.
- Method to analyze the spatial impacts of changing environment.
Installation
Install the CRAN release of itsdm
with
install.packages("itsdm")
You can install the development version of itsdm from GitHub with:
# install.packages("remotes")
remotes::install_github("LLeiSong/itsdm")
Example
This is a basic example which shows you how to solve a common problem:
library(itsdm)
library(dplyr)
library(stars)
library(ggplot2)
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = obs_type)
# Get environmental variables
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 6, 12, 15))
# Train the model
mod <- isotree_po(
obs_mode = "presence_absence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 200,
sample_size = 0.8, ndim = 2,
seed = 123L)
# Check results
## Suitability
ggplot() +
geom_stars(data = mod$prediction) +
scale_fill_viridis_c('Predicted suitability',
na.value = 'transparent') +
coord_equal() +
theme_linedraw()
## Plot independent response curves
plot(mod$independent_responses,
target_var = c('bio1', 'bio12'))
The Shapley values-based analysis can apply to external models. Here is an example to analyze impacts of the bio12 decreasing 200 mm to species distribution based on Random Forest (RF) prediction:
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
filter(usage == "train")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12)) %>%
split()
model_data <- stars::st_extract(
env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)
mod_rf <- randomForest(
occ ~ .,
data = model_data,
ntree = 200)
pfun <- function(X.model, newdata) {
# for data.frame
predict(X.model, newdata, type = "prob")[, "1"]
}
# Use a fixed value
climate_changes <- detect_envi_change(
model = mod_rf,
var_occ = model_data %>% select(-occ),
variables = env_vars,
target_var = "bio12",
bins = 20,
var_future = -200,
pfun = pfun)
Contributor
- David Cortes, helps to improve the flexibility of calling
isotree
.
We are welcome any helps! Please make a pull request or reach out to [email protected] if you want to make any contribution.
Funding
This package is part of project "Combining Spatially-explicit Simulation of Animal Movement and Earth Observation to Reconcile Agriculture and Wildlife Conservation". This project is funded by NASA FINESST program (award number: 80NSSC20K1640).