Creates a Scientific Project Skeleton as an R Package.
SCIproj 
An R package for the initialization and organization of a scientific project following reproducible research and FAIR principles.
Overview
SCIproj is an R package whose function create_proj() initializes a scientific project, which you then manage as an R package or research compendium. This approach combines structure (where files are located) with workflow (how analyses are reproduced or replicated).
The package is built on modern reproducibility standards and guidelines such as:
- FAIR Principles - Findable, Accessible, Interoperable, Reusable research outputs
- rOpenSci Reproducibility Guide - Best practices for reproducible research in R
- TIER Protocol - Standards for transparent and reproducible data analysis
- NASA TOPS - Open science and reproducibility guidelines
Defaults
By default, create_proj() sets up the following to support reproducibility:
- renv for dependency management: locks exact package versions for full reproducibility
- targets for pipeline-based workflow: allows for automated dependency tracking, caching, and selective re-execution
- CITATION.cff for machine-readable citation metadata (FAIR/open science)
- DATA_SOURCES.md for documenting data provenance (source, license, DOI)
- CONTRIBUTING.md for contribution guidelines
- Git version control
Project structure
your-project/
├── DESCRIPTION # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd # Top-level project description.
├── your-project.Rproj # RStudio project file (if use_rproj = TRUE, default).
├── CITATION.cff # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md # Contribution guidelines.
├── LICENSE.md # Full license text (optional, requires add_license).
├── NAMESPACE # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/ # Raw data files and pre-processing scripts.
│ ├── clean_data.R # Script template for data cleaning.
│ ├── DATA_SOURCES.md # Data provenance: source, license, DOI, download date.
│ └── ...
│
├── data/ # Cleaned datasets stored as .rda files.
│
├── R/ # Custom R functions and dataset documentation.
│ ├── function_ex.R # Template for custom functions.
│ ├── data.R # Template for dataset documentation.
│ └── ...
│
├── analyses/ # R scripts or R Markdown/Quarto documents for analyses.
│ ├── figures/ # Generated plots.
│ └── ...
│
├── docs/ # Publication-ready documents (article, report, presentation).
├── trash/ # Temporary files that can be safely deleted.
│
├── _targets.R # Pipeline definition for reproducible workflow (default).
├── renv/ # renv library and settings (default).
├── renv.lock # Lockfile for reproducible package versions (default).
└── Dockerfile # Container definition for full reproducibility (optional).
Why an R package as research compendium?
- Separation of concerns: Raw data, clean data, functions, analyses, and manuscripts each live in dedicated folders.
- Documented data: Clean datasets are stored as .rda files with roxygen documentation.
- Reproducible workflow: The targets pipeline tracks dependencies automatically, so only what changed is re-run.
- Locked dependencies: renv ensures the exact same package versions are used everywhere.
- Citable and FAIR: CITATION.cff makes the project machine-readable and citable, DATA_SOURCES.md documents data provenance.
- Load and go: devtools::load_all() instantly makes all clean datasets and custom functions available.
Installation and usage
Install the released version from CRAN:
install.packages("SCIproj")
Or install the development version from GitHub:
### Using remotes
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")
### Or better: using the new pak package
# install.packages("pak")
pak::pkg_install("saskiaotto/SCIproj")
Creating the project
library("SCIproj")
create_proj("my_research_project")
This creates a project with renv, targets, CITATION.cff, and DATA_SOURCES.md by default. Your working directory is also set to the new project (setwd_to_proj = TRUE), so you can start working immediately in RStudio, Positron, VSCode, or any terminal R session.
Customize with parameters:
### Full-featured project with GitHub, CI, and ORCID
create_proj("my_research_project",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  create_github_repo = TRUE,
  ci = "gh-actions"
)
### Minimal project without workflow tools
create_proj("my_research_project",
  use_renv = FALSE,
  use_targets = FALSE
)
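Because the orcid argument feeds into CITATION.cff, it can be worth sanity-checking the identifier before calling create_proj(). The following is a minimal base-R sketch under stated assumptions: `is_valid_orcid()` is a hypothetical helper, not part of SCIproj, and it assumes the standard ORCID layout of four hyphen-separated blocks of four characters whose last character is an ISO 7064 MOD 11-2 check character.

```r
# Hypothetical helper (not part of SCIproj): checks the layout and the
# ISO 7064 MOD 11-2 check character of a single ORCID iD string.
is_valid_orcid <- function(orcid) {
  # Layout: four blocks of four digits; only the last character may be "X".
  if (!grepl("^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$", orcid)) {
    return(FALSE)
  }
  chars <- strsplit(gsub("-", "", orcid), "")[[1]]
  digits <- as.integer(chars[1:15])

  # Running total over the first 15 digits, doubled at every step.
  total <- 0
  for (d in digits) {
    total <- (total + d) * 2
  }
  check <- (12 - total %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)

  identical(chars[16], expected)
}

is_valid_orcid("0000-0002-1825-0097")   # a well-formed iD with a matching check digit
is_valid_orcid("0000-0002-1825-00971")  # one digit too many, fails the layout check
```

A check like this only catches typos such as an extra or mistyped digit; it cannot tell whether the iD belongs to the intended person.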
Parameters
| Parameter | Default | Description |
|---|---|---|
| data_raw | TRUE | Add data-raw/ folder with templates |
| makefile | FALSE | Add makefile.R template |
| testthat | FALSE | Add testthat infrastructure |
| use_pipe | FALSE | Add magrittr pipe (native \|> recommended) |
| add_license | NULL | License type: "MIT", "GPL", "Apache", etc. |
| license_holder | "Your name" | License holder / project author |
| orcid | NULL | ORCID iD for CITATION.cff |
| use_git | TRUE | Initialize local git repo |
| create_github_repo | FALSE | Create GitHub repo (needs GITHUB_PAT) |
| ci | "none" | CI type: "none" or "gh-actions" |
| use_renv | TRUE | Initialize renv for dependency management |
| use_targets | TRUE | Add _targets.R pipeline template |
| use_docker | FALSE | Add Dockerfile template |
| use_rproj | TRUE | Create .Rproj file (disable for Positron/VSCode-only projects) |
| setwd_to_proj | TRUE | Set working directory to new project after creation |
| open_proj | FALSE | Open new project in a fresh RStudio or Positron session |
Developing the project
1. Create the project with create_proj().
2. Edit DESCRIPTION with project metadata: title, summary, contributors (with ORCID), license, dependencies.
3. Edit README.Rmd with project details: objectives, timeline, workflow.
4. Document your data provenance in data-raw/DATA_SOURCES.md: source, license, download date, DOI for each dataset.
5. Place original (raw) data in data-raw/. Use clean_data.R (or more scripts) for pre-processing. Store clean datasets with usethis::use_data().
6. Document clean datasets using roxygen in R/ (see template data.R). For details, see Documenting data.
7. Place custom functions in R/ with roxygen documentation. See the documentation chapter in the R Packages book.
8. Write tests for your functions in tests/ (set testthat = TRUE in create_proj()). See Testing basics.
9. Place analysis scripts/notebooks in analyses/. Save plots in analyses/figures/.
10. Place final manuscripts, reports, and presentations in docs/. Use R Markdown, Quarto, or templates from rticles, thesisdown, or Quarto journal extensions.
11. Keep dependencies in sync: usethis::use_package() for DESCRIPTION, renv::snapshot() for the lockfile.
12. Update CITATION.cff when you archive your project or publish.
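The raw data, cleaning, and documentation steps can be sketched as follows. This is only an illustration under stated assumptions: the file measurements.csv, its columns, and the object name `measurements` are made up, and a temporary directory stands in for the project root so the snippet runs anywhere. Inside a real project, `usethis::use_data(measurements, overwrite = TRUE)` would take the place of the `save()` call, since it writes the same kind of .rda file into data/.

```r
# Stand-in for the project root (assumption: a temp dir, so nothing is overwritten).
proj <- file.path(tempdir(), "demo_project")
dir.create(file.path(proj, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)

# A made-up raw file, as it might sit in data-raw/.
raw_file <- file.path(proj, "data-raw", "measurements.csv")
writeLines(c("Site,Temp C", "A,12.5", "B,NA", "A,14.1"), raw_file)

# --- what a data-raw/clean_data.R script might do ---
raw <- read.csv(raw_file, check.names = FALSE)
measurements <- data.frame(
  site = raw[["Site"]],      # consistent lower-case names
  temp_c = raw[["Temp C"]]
)
measurements <- measurements[!is.na(measurements$temp_c), ]  # drop incomplete rows
rownames(measurements) <- NULL

# In a real project: usethis::use_data(measurements, overwrite = TRUE)
save(measurements, file = file.path(proj, "data", "measurements.rda"))

# --- what the matching roxygen entry in R/data.R might look like ---
#' Temperature measurements (made-up example)
#'
#' @format A data frame with 2 rows and 2 columns:
#' \describe{
#'   \item{site}{Site identifier.}
#'   \item{temp_c}{Temperature in degrees Celsius.}
#' }
#' @source See data-raw/DATA_SOURCES.md.
"measurements"
```

Keeping the cleaning script next to the raw file means the .rda in data/ can always be regenerated from data-raw/, which is the point of separating the two folders.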
Workflow
- Load the project: devtools::load_all() or Ctrl/Cmd + Shift + L in RStudio.
- Build documentation: devtools::document() or Ctrl/Cmd + Shift + D.
- Run tests: devtools::test() or Ctrl/Cmd + Shift + T.
- Run the pipeline: targets::tar_make() to execute all targets. targets::tar_visnetwork() to visualize dependencies.
- Update lockfile: renv::snapshot() after installing or updating packages.
For a detailed introduction to targets, see the user manual.
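For orientation, a pipeline definition in _targets.R could look roughly like the sketch below. It is not the template that create_proj() generates. The file path and the functions `clean_measurements()` and `fit_model()` are hypothetical placeholders for custom functions that would live in R/.

```r
# _targets.R (illustrative sketch, not the SCIproj template)
library(targets)

# Load custom functions from R/. clean_measurements() and fit_model() are
# hypothetical and would have to be defined there.
tar_source("R")

list(
  # Track the raw file itself, so changes to it invalidate downstream targets.
  tar_target(raw_file, "data-raw/measurements.csv", format = "file"),
  tar_target(raw_data, read.csv(raw_file)),
  tar_target(clean_data, clean_measurements(raw_data)),
  tar_target(model, fit_model(clean_data))
)
```

With a file like this, targets::tar_make() builds all four targets on the first run. After a change that affects only fit_model(), a second run would skip the up-to-date upstream targets and rebuild just `model`.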
For maximum reproducibility, consider also using Docker (use_docker = TRUE). See the Rocker Project for R-specific Docker images.
Archiving and DOI
When your project is finalized:
- Archive the GitHub repo to make it read-only.
- Get a DOI via Zenodo (integrates directly with GitHub) or another DOI Registration Agency.
- Update CITATION.cff with the DOI.
- Optionally, generate a codemeta.json with codemetar::write_codemeta() for richer metadata.
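The DOI can be entered by hand in any text editor. As a small base-R sketch of the same edit, the hypothetical helper `set_cff_doi()` below (not part of SCIproj) replaces an existing top-level `doi:` line in a CITATION.cff file or appends one if it is missing. It assumes a simple file in which `doi:` appears at most once at the start of a line, and the DOI shown is a made-up placeholder.

```r
# Hypothetical helper (not part of SCIproj): set the top-level doi field.
set_cff_doi <- function(path, doi) {
  lines <- readLines(path)
  new_line <- sprintf('doi: "%s"', doi)

  # Only unindented lines match, so nested doi keys are left untouched.
  is_doi <- grepl("^doi:", lines)
  if (any(is_doi)) {
    lines[is_doi] <- new_line
  } else {
    lines <- c(lines, new_line)
  }

  writeLines(lines, path)
  invisible(lines)
}

# Demo on a throwaway file with made-up metadata.
cff <- tempfile(fileext = ".cff")
writeLines(c(
  "cff-version: 1.2.0",
  'message: "If you use this project, please cite it as below."',
  'title: "My research project"'
), cff)

set_cff_doi(cff, "10.5281/zenodo.0000000")  # placeholder DOI
readLines(cff)
```

Calling the helper a second time with a different DOI overwrites the existing line rather than adding a duplicate.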
Useful resources
Guidelines and standards
- FAIR Principles - Findable, Accessible, Interoperable, Reusable
- TIER Protocol - Teaching Integrity in Empirical Research
- NASA TOPS Open Science - Transform to Open Science
- The Turing Way - A guide to reproducible, ethical, and collaborative data science
- The rOpenSci guide: Reproducibility in Science
R packages and tools
- R Packages (2nd edition) by Hadley Wickham and Jenny Bryan
- The targets R Package User Manual by Will Landau
- renv: Project Environments
Research compendium concept
- Gentleman, R. & Temple Lang, D. (2004): Statistical Analyses and Reproducible Research. Bioconductor Project Working Papers. Working Paper 2
- Noble, W.S. (2009): A Quick Guide to Organizing Computational Biology Projects. PLoS Comput Biol 5(7): e1000424. https://doi.org/10.1371/journal.pcbi.1000424
- Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using R (and friends). The American Statistician 72(1), 80-88. https://doi.org/10.1080/00031305.2017.1375986
- Karthik Ram (2019): How To Make Your Data Analysis Notebooks More Reproducible. Presentation at the rstudio::conf(2019)
Credits
- Francisco Rodriguez-Sanchez and his template package
- Ben Marwick and his rrtools package
- The group of participants in the Reproducible Science Curriculum Workshop that created the rr-init package.