Description

Partial Least-Squares Algorithm for Categorical and Scalar Functional Data.

Performs the Partial Least-Squares ('PLS') algorithm for functional data through the concept of active area integration. This approach builds upon the basis expansion methods for functional 'PLS' regression described in Aguilera et al. (2010) <doi:10.1016/j.chemolab.2010.09.007>. The package handles both Scalar Functional Data ('SFD') and Categorical Functional Data ('CFD'), providing interpretable regression curves even for discrete state changes. It was developed during a PhD thesis (2022-2026) carried out between 'DECATHLON' and the French research institute 'INRIA'. The 'SmoothPLS' method does not decompose the data itself into a basis; rather, it assumes the data is known as precisely as desired, and for every 'PLS' component it is the weight functions that are decomposed into the basis. The algorithm is implemented for a scalar response, for single-state and multi-state 'CFD' as well as 'SFD'. As baselines, a naive 'PLS' method on time-value functions and standard Functional 'PLS' are also implemented.

SmoothPLS


Overview

SmoothPLS is an R package designed for Hybrid Functional Data Analysis. It implements a novel approach to Functional Partial Least Squares (FPLS) by integrating categorical functional predictors through the concept of active area integration.

This work was developed as part of a PhD project at DECATHLON in collaboration with INRIA.

Key features

  • SmoothPLS: integration of categorical states as indicator functions for smoother regression curves.
  • Hybrid data: seamlessly handles both Scalar Functional Data (SFD) and Categorical Functional Data (CFD).
  • Interpretability: provides continuous regression curves.
  • Comparison suite: built-in functions to compare results with Naive (discretized) PLS and Standard Functional PLS.
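
To make the contrast with the naive (discretized) baseline concrete, here is a minimal base R sketch of what discretizing a categorical trajectory onto a regular grid looks like. It is not the package's implementation: the `transitions` data frame, its column names, and the 5-unit grid are illustrative assumptions.

```r
# Hypothetical trajectory: the state changes at these times and is held
# until the next transition (observation window assumed to be [0, 100])
transitions <- data.frame(time  = c(0, 10, 25, 40, 70),
                          state = c(0, 1, 0, 1, 0))

# Regular grid on which the naive approach would evaluate the trajectory
grid <- seq(0, 100, by = 5)

# For each grid point, find the last transition at or before it
idx <- findInterval(grid, transitions$time)
x_grid <- transitions$state[idx]

# Grid-based approximation of the time spent in state 1
approx_time_in_1 <- sum(x_grid) * 5
approx_time_in_1
```

A transition that falls between two grid points is only seen at the next grid point; integrating over the exact active intervals avoids this loss of information.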

Methodological background

SmoothPLS builds upon the fundamental principles of Functional PLS (FPLS) regression, specifically the approximation of functional predictors through basis expansions, as established by Aguilera, Preda et al. (2010) [1].

The primary contribution lies in modeling categorical state changes as functional indicator functions, $\mathbb{1}^s_t$. Instead of discretizing transitions, the model computes components via active area integration (illustrated in the package logo), effectively integrating basis functions over the specific intervals, $\tau_s$, where a state is active:

$$\Lambda_{s,j} = \int_{\mathcal{T}} \mathbb{1}^s_t \, \phi_j(t) \, dt = \int_{\tau_s} \phi_j(t) \, dt, \quad \text{with} \quad \tau_s = \lbrace t \in \mathcal{T} \mid X(t) = s \rbrace \quad \text{and} \quad \mathbb{1}^s_t = \begin{cases} 1 & \text{if } X(t) = s \\ 0 & \text{otherwise} \end{cases}$$

This formulation ensures that the smoothing process respects the continuous nature of state transitions within the Functional PLS framework.
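
The integral $\Lambda_{s,j}$ can be illustrated numerically with base R alone. The sketch below is an assumption-laden illustration, not the package's code: it builds a cubic B-spline basis with `splines::splineDesign`, uses a made-up set of active intervals (`active`), and integrates each basis function over those intervals with `integrate`. The knot placement and the helper `phi` are hypothetical.

```r
library(splines)  # ships with base R

# Cubic B-spline basis with 10 functions on [0, 100]
# (assumed: clamped boundary knots, uniformly spaced interior knots)
ord <- 4
nbasis <- 10
interior <- seq(0, 100, length.out = nbasis - ord + 2)
interior <- interior[-c(1, length(interior))]
knots <- c(rep(0, ord), interior, rep(100, ord))

# phi(t, j): value of the j-th basis function at times t
phi <- function(t, j) splineDesign(knots, t, ord = ord)[, j]

# Hypothetical intervals tau_s during which one individual is in state s
active <- data.frame(start = c(10, 40), end = c(25, 70))

# Lambda_{s,j}: integral of phi_j over the union of the active intervals
lambda <- sapply(seq_len(nbasis), function(j) {
  sum(mapply(function(a, b) integrate(phi, a, b, j = j)$value,
             active$start, active$end))
})

round(lambda, 3)
sum(lambda)  # B-splines sum to 1, so this approximates the total active time
```

Because the basis functions sum to one everywhere on the domain, the entries of `lambda` add up to the total time spent in the state (here 15 + 30 = 45, up to numerical integration error), which is a convenient sanity check.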


Installation

The package is currently in development. The latest stable version can be installed via:

# install.packages("devtools")
devtools::install_github("FrancoisBassac/SmoothPLS")

Documentation

The complete package documentation, including function references, detailed vignettes, and usage examples, is available online:

Explore the SmoothPLS documentation website


Documentation overview

  • Reference: comprehensive manual for all functions (including smoothPLS, funcPLS, and naivePLS).
  • Articles (vignettes): step-by-step tutorials, such as the comparison of PLS methods for CFD and multivariate functional data.
  • Getting started: quick installation guide and basic usage.

Quick start example

The following example demonstrates how to fit and compare models, based on the single-state CFD vignette:

library(SmoothPLS)

# 1. Generate Synthetic Data
df_x <- generate_X_df(nind = 100, curve_type = 'cat')
Y_df <- generate_Y_df(df_x, curve_type = 'cat', 
                      beta_real_func_or_list = beta_1_real_func)

# 2. Fit Smooth PLS Model
basis <- create_bspline_basis(start = 0, end = 100, nbasis = 10)
spls_model <- smoothPLS(df_list = df_x, Y = Y_df$Y_noised, 
                        basis_obj = basis, curve_type_obj = 'cat')

# 3. Predict and Visualize
preds <- smoothPLS_predict(df_x, spls_model$reg_obj, curve_type = 'cat')
plot(spls_model$reg_obj$CatFD_1_state_1, main="SmoothPLS Regression Curve")
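
When comparing SmoothPLS with the naive and standard functional PLS baselines, a common yardstick is the root mean squared error of the predictions. The structure of the object returned by `smoothPLS_predict` is not shown here, so the sketch below uses toy vectors; the helper `rmse` and the vectors `y_obs`, `pred_a`, and `pred_b` are hypothetical and only illustrate the comparison step.

```r
# Hypothetical helper: root mean squared error between observed and predicted values
rmse <- function(y, y_hat) {
  stopifnot(length(y) == length(y_hat))
  sqrt(mean((y - y_hat)^2))
}

# Toy vectors standing in for the observed response and two sets of predictions
y_obs  <- c(1.0, 2.0, 3.0, 4.0)
pred_a <- c(1.0, 2.0, 3.0, 4.0)  # matches the observations exactly
pred_b <- c(2.0, 3.0, 4.0, 5.0)  # off by 1 for every individual

# One RMSE per candidate model
sapply(list(model_a = pred_a, model_b = pred_b), rmse, y = y_obs)
```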

Performance tuning: parallel processing

When parallel = TRUE, SmoothPLS utilizes the future framework to parallelize the numerical integration steps (e.g., $\Lambda$ matrix evaluation).

To mitigate computational overhead on smaller datasets, the package implements dynamic load balancing. It calculates an optimal number of background workers required for the specific task to maximize efficiency.

The default threshold is 2,500 integral evaluations per core: the engine allocates one core for every 2,500 integrals (calculated as individuals $\times$ basis functions). For instance:

  • Under 2,500 integrals: the model executes sequentially (1 core) to avoid setup overhead.
  • 5,000 integrals: the engine allocates exactly 2 cores.
  • Large datasets (e.g., 50,000+ integrals): the engine recruits the maximum number of available cores, reserving 2 cores to maintain operating system stability.
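
The allocation rule described above can be sketched in a few lines of R. This is a hypothetical reconstruction from the description, not the package's internal code: the function name `sketch_n_workers`, its arguments, and the choice of `floor` for rounding are assumptions.

```r
# Hypothetical sketch of the allocation rule; not the package's internal code
sketch_n_workers <- function(n_ind, n_basis,
                             threshold = getOption("SmoothPLS.parallel_threshold", 2500),
                             available_cores = parallel::detectCores()) {
  n_integrals <- n_ind * n_basis               # individuals x basis functions
  wanted <- floor(n_integrals / threshold)     # assumed: one core per full block of integrals
  cap <- max(1, available_cores - 2)           # keep 2 cores free for the operating system
  max(1, min(wanted, cap))                     # never fewer than 1 (sequential execution)
}

# Examples on a machine assumed to have 8 cores
sketch_n_workers(100, 10, available_cores = 8)    # 1,000 integrals  -> 1
sketch_n_workers(500, 10, available_cores = 8)    # 5,000 integrals  -> 2
sketch_n_workers(5000, 10, available_cores = 8)   # 50,000 integrals -> 6
```

Passing `available_cores` explicitly keeps the sketch deterministic; `parallel::detectCores()` can return `NA` on some systems, which a production implementation would need to handle.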

This threshold can be manually adjusted based on specific hardware capabilities (e.g., lowered for UNIX systems with low forking overhead) by setting a global option before model execution:

# Lower the threshold to 500 evaluations per core
options(SmoothPLS.parallel_threshold = 500)
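
Since this is a global option, it can be useful to change it only for the duration of one fit and restore the previous value afterwards. The pattern below relies only on base R's `options()` and `getOption()`; the variable names `old` and `current` are illustrative.

```r
# options() returns the previous values, which makes the change easy to undo
old <- options(SmoothPLS.parallel_threshold = 500)

# Check the value that is now in effect
current <- getOption("SmoothPLS.parallel_threshold")
current

# ... fit the model here ...

# Restore whatever was set before (including "not set at all")
options(old)
```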

Affiliations and applications

Industrial partners

Research institutions

  • Inria – National Institute for Research in Digital Science and Technology.
  • Inria Datavers – The research team specialized in stochastic modeling and data analysis.

Roadmap and future releases

SmoothPLS is under active development. Upcoming updates will focus on computational efficiency and the expansion of theoretical capabilities:

  • [v0.1.4] Parallel processing: implementation of multicore computing to drastically reduce integration time for large datasets (e.g., thousands of Active Areas).
  • [v0.1.6] Hybrid data framework: support for integrating standard non-functional covariates (e.g., user age, weight) alongside Categorical and Scalar Functional Data.
  • [v0.2.0] Penalized functional regression (univariate): addition of roughness penalties to the B-spline coefficients to increase model robustness.
  • [v0.2.1] Penalized functional regression (multivariate): extension of the penalized framework to the full multivariate model.

Detailed Example: One-State Categorical Functional Data

This example illustrates how SmoothPLS processes CFD by modeling transitions as functional objects. For comprehensive details, refer to the full vignette.

1. Data visualization

We simulate a categorical time series where individuals alternate between state 0 and state 1 over time.

library(SmoothPLS)

df_x <- generate_X_df(nind = 100, start = 0, end = 100, curve_type = 'cat')
plot_CFD_individuals(df_x, by_cfda = TRUE)

Figure 1: Synthetic binary state trajectories for 5 individuals.
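
For readers who want to see what such a trajectory looks like as data, the following base R sketch simulates one binary path with exponentially distributed sojourn times. It is only an illustration: the function `simulate_binary_path`, the `time`/`state` column names, and the rate are assumptions, and the actual output format of `generate_X_df` may differ.

```r
set.seed(1)

# Hypothetical simulator: one individual alternating between states 0 and 1
simulate_binary_path <- function(end = 100, rate = 0.1) {
  times <- 0
  # Draw exponential sojourn times until the observation window is exceeded
  while (tail(times, 1) < end) {
    times <- c(times, tail(times, 1) + rexp(1, rate))
  }
  times <- times[times < end]               # keep transitions inside [0, end)
  states <- (seq_along(times) - 1) %% 2     # start in state 0, then alternate
  data.frame(time = times, state = states)
}

end_time <- 100
path <- simulate_binary_path(end = end_time)
head(path)

# Exact time spent in state 1: sum the lengths of the intervals where state == 1
interval_end <- c(path$time[-1], end_time)
time_in_1 <- sum((interval_end - path$time)[path$state == 1])
time_in_1
```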

2. Model fitting and prediction

The SmoothPLS model is fitted to a response variable with added noise, $Y$, and the resulting regression curve, $\beta(t)$, is compared against the ground truth.
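
For intuition, assume the usual functional linear model with the regression curve expanded in the same basis $\phi_1, \dots, \phi_K$. The intercept $\beta_0$ and the coefficients $b_j$ are notation introduced here for illustration; the package may parameterize the model differently. Under that assumption, the prediction for a single-state trajectory reduces to a finite sum over the active-area integrals $\Lambda_{1,j}$ defined in the methodological background:

$$\hat{Y} = \beta_0 + \int_{\mathcal{T}} \mathbb{1}^1_t \, \beta(t) \, dt = \beta_0 + \sum_{j=1}^{K} b_j \int_{\tau_1} \phi_j(t) \, dt = \beta_0 + \sum_{j=1}^{K} b_j \Lambda_{1,j}, \quad \text{where} \quad \beta(t) = \sum_{j=1}^{K} b_j \phi_j(t)$$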

# Define a B-spline basis
basis <- create_bspline_basis(start = 0, end = 100, nbasis = 10)
plot(basis)

Figure 2: Cubic B-spline basis with 10 functions.

# Generate response Y linked to the time spent in state 1
Y_df <- generate_Y_df(df_x, curve_type = 'cat', 
                      beta_real_func_or_list = beta_1_real_func)

# Fit the SmoothPLS model
spls_obj <- smoothPLS(df_list = df_x, Y = Y_df$Y_noised, 
                      basis_obj = basis, curve_type_obj = 'cat',
                      print_steps = FALSE, print_nbComp = FALSE, 
                      plot_rmsep = FALSE, plot_reg_curves = FALSE)
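
# Extract the estimated regression curve from the fitted model
delta <- spls_obj$reg_obj$CatFD_1_state_1
regul_time_0 <- seq(0, 100, length.out = length(delta))

# Range of both curves (kept from the original example)
y_lim <- eval_max_min_y(f_list = list(beta_1_real_func, 
                                      delta), 
                        regul_time = regul_time_0)

# Theoretical beta (red, dashed), then the SmoothPLS estimate (blue)
plot(regul_time_0, beta_1_real_func(regul_time_0), type = 'l',
     col = 'red', lty = 2, xlab = "Time", ylab = "Beta(t)",
     ylim = c(-2, 3.5))
plot(delta, add = TRUE, col = 'blue')
legend("topleft",
       legend = c("Theoretical beta", "delta_SmoothPLS"),
       col = c("red", "blue"),
       lty = c(2, 1),
       lwd = 1)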

Figure 3: The blue curve (SmoothPLS) successfully recovers the underlying red dashed curve (Theoretical Beta).


References

[1] Aguilera, A. M., Escabias, M., Preda, C., & Saporta, G. (2010). "Using basis expansions for estimating functional PLS regression. Applications with chemometric data". Chemometrics and Intelligent Laboratory Systems, 104(2), 289-305. https://doi.org/10.1016/j.chemolab.2010.09.007


Metadata

Version

0.1.5

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows