Description

Creates a Scientific Project Skeleton as an R Package.

Provides a template for new research projects structured as an R package-based research compendium. Everything - data, R scripts, custom functions and manuscript or reports - is contained within the same package to facilitate collaboration and promote reproducible research, following the FAIR principles.

SCIproj

License: MIT. Author: Saskia Otto.

An R package for the initialization and organization of a scientific project following reproducible research and FAIR principles.

Overview

SCIproj is an R package that lets users initialize a project with its function create_proj() and manage a scientific project as an R package or research compendium. This combines structure (where files are located) with workflow (how analyses are reproduced or replicated).

The package is built on modern reproducibility standards and guidelines, including the FAIR principles.

Defaults

The package has some default settings to ensure reproducibility. These include:

  • renv for dependency management: locks exact package versions for full reproducibility
  • targets for pipeline-based workflow: allows for automated dependency tracking, caching, and selective re-execution
  • CITATION.cff for machine-readable citation metadata (FAIR/open science)
  • DATA_SOURCES.md for documenting data provenance (source, license, DOI)
  • CONTRIBUTING.md for contribution guidelines
  • Git version control

Project structure

your-project/
├── DESCRIPTION             # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd              # Top-level project description.
├── your-project.Rproj      # RStudio project file (if use_rproj = TRUE, default).
├── CITATION.cff            # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md         # Contribution guidelines.
├── LICENSE.md              # Full license text (optional, requires add_license).
├── NAMESPACE               # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/               # Raw data files and pre-processing scripts.
│   ├── clean_data.R        # Script template for data cleaning.
│   ├── DATA_SOURCES.md     # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/                   # Cleaned datasets stored as .rda files.
│
├── R/                      # Custom R functions and dataset documentation.
│   ├── function_ex.R       # Template for custom functions.
│   ├── data.R              # Template for dataset documentation.
│   └── ...
│
├── analyses/               # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/            # Generated plots.
│   └── ...
│
├── docs/                   # Publication-ready documents (article, report, presentation).
├── trash/                  # Temporary files that can be safely deleted.
│
├── _targets.R              # Pipeline definition for reproducible workflow (default).
├── renv/                   # renv library and settings (default).
├── renv.lock               # Lockfile for reproducible package versions (default).
└── Dockerfile              # Container definition for full reproducibility (optional).
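
To confirm that a freshly created project matches the tree above, a small base R check can list the entries that are missing. This is only a sketch: missing_paths() is a hypothetical helper, not part of SCIproj, and the list of expected paths assumes the default create_proj() settings produce the layout shown (optional entries such as LICENSE.md and Dockerfile are left out).

```r
# Paths taken from the tree above; optional entries are deliberately omitted.
expected_paths <- c(
  "DESCRIPTION", "README.Rmd", "CITATION.cff", "CONTRIBUTING.md",
  "data-raw", "R", "analyses", "docs", "_targets.R", "renv.lock"
)

# Hypothetical helper (not part of SCIproj): return the expected entries
# that do not exist under `path`. file.exists() works for files and folders.
missing_paths <- function(path = ".", expected = expected_paths) {
  expected[!file.exists(file.path(path, expected))]
}

# Example: an empty temporary directory is missing every entry.
demo_dir <- tempfile("proj_")
dir.create(demo_dir)
missing_paths(demo_dir)
```

Run inside a real project, missing_paths(".") should return an empty character vector if nothing has been moved or deleted.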

Why an R package as research compendium?

  • Separation of concerns: Raw data, clean data, functions, analyses, and manuscripts each live in dedicated folders.
  • Documented data: Clean datasets are stored as .rda files with roxygen documentation.
  • Reproducible workflow: The targets pipeline tracks dependencies automatically, so only what changed is re-run.
  • Locked dependencies: renv ensures the exact same package versions are used everywhere.
  • Citable and FAIR: CITATION.cff makes the project machine-readable and citable, DATA_SOURCES.md documents data provenance.
  • Load and go: devtools::load_all() instantly makes all clean datasets and custom functions available.

Installation and usage

Install the released version from CRAN:

install.packages("SCIproj")

Or install the development version from GitHub:

### Using remotes
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")

### Or better: using the new pak package
# install.packages("pak")
pak::pkg_install("saskiaotto/SCIproj")

Creating the project

library("SCIproj")
create_proj("my_research_project")

This creates a project with renv, targets, CITATION.cff, and DATA_SOURCES.md by default. Unless setwd_to_proj = FALSE, your working directory is then set to the new project, so you can start working immediately in RStudio, Positron, VSCode, or any terminal R session.

Customize with parameters:

### Full-featured project with GitHub, CI, and ORCID
create_proj("my_research_project",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  create_github_repo = TRUE,
  ci = "gh-actions"
)

### Minimal project without workflow tools
create_proj("my_research_project",
  use_renv = FALSE,
  use_targets = FALSE
)
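
Because orcid is passed as a plain string, a typo is easy to miss. The sketch below checks a value before it is handed to create_proj(). It assumes the standard ORCID iD layout (four hyphen-separated blocks of four characters, with the last character a check digit computed by ISO 7064 MOD 11-2). is_valid_orcid() is a hypothetical helper and not part of SCIproj.

```r
# Hypothetical helper (not part of SCIproj): TRUE if `x` has the ORCID iD
# shape and its final character matches the ISO 7064 MOD 11-2 check digit.
is_valid_orcid <- function(x) {
  # Shape check: 0000-0000-0000-000X, where X is a digit or "X".
  if (!grepl("^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$", x)) {
    return(FALSE)
  }
  chars <- strsplit(gsub("-", "", x), "")[[1]]
  # Running total over the first 15 digits.
  total <- 0
  for (d in as.integer(chars[1:15])) {
    total <- (total + d) * 2
  }
  check <- (12 - total %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  identical(chars[16], expected)
}

is_valid_orcid("0000-0001-2345-6789")   # TRUE
is_valid_orcid("0000-0001-2345-678")    # FALSE: last block too short
```

The check only validates the format and checksum; it does not confirm that the iD is registered to anyone.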

Parameters

Parameter            Default       Description
data_raw             TRUE          Add data-raw/ folder with templates
makefile             FALSE         Add makefile.R template
testthat             FALSE         Add testthat infrastructure
use_pipe             FALSE         Add magrittr pipe (native |> recommended)
add_license          NULL          License type: "MIT", "GPL", "Apache", etc.
license_holder       "Your name"   License holder / project author
orcid                NULL          ORCID iD for CITATION.cff
use_git              TRUE          Initialize local git repo
create_github_repo   FALSE         Create GitHub repo (needs GITHUB_PAT)
ci                   "none"        CI type: "none" or "gh-actions"
use_renv             TRUE          Initialize renv for dependency management
use_targets          TRUE          Add _targets.R pipeline template
use_docker           FALSE         Add Dockerfile template
use_rproj            TRUE          Create .Rproj file (disable for Positron/VSCode-only projects)
setwd_to_proj        TRUE          Set working directory to new project after creation
open_proj            FALSE         Open new project in a fresh RStudio or Positron session

Developing the project

  1. Create the project with create_proj().

  2. Edit DESCRIPTION with project metadata: title, summary, contributors (with ORCID), license, dependencies.

  3. Edit README.Rmd with project details: objectives, timeline, workflow.

  4. Document your data provenance in data-raw/DATA_SOURCES.md: source, license, download date, DOI for each dataset.

  5. Place original (raw) data in data-raw/. Use clean_data.R (or more scripts) for pre-processing. Store clean datasets with usethis::use_data().

  6. Document clean datasets using roxygen in R/ (see template data.R). For details, see Documenting data.

  7. Place custom functions in R/ with roxygen documentation. See the documentation chapter in the R Packages book.

  8. Write tests for your functions in tests/ (set testthat = TRUE in create_proj()). See Testing basics.

  9. Place analysis scripts/notebooks in analyses/. Save plots in analyses/figures/.

  10. Place final manuscripts, reports, and presentations in docs/. Use R Markdown, Quarto, or templates from rticles, thesisdown, or Quarto journal extensions.

  11. Keep dependencies in sync: usethis::use_package() for DESCRIPTION, renv::snapshot() for the lockfile.

  12. Update CITATION.cff when you archive your project or publish.
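
Steps 5 and 6 can be sketched as follows. The dataset, its column names, and the helper clean_measurements() are made up for illustration and are not the templates shipped with SCIproj; only usethis::use_data() and the roxygen tags are existing tools. The use_data() call is left commented out because it only works inside a package directory.

```r
# --- data-raw/clean_data.R (sketch) ---

# Hypothetical cleaning helper: tidy column names and drop incomplete rows.
clean_measurements <- function(raw) {
  names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))
  raw[stats::complete.cases(raw), , drop = FALSE]
}

# Stand-in for reading a raw file from data-raw/, e.g. with read.csv().
raw <- data.frame(
  "Site ID" = c("A", "B", "C"),
  "Temp C"  = c(12.1, NA, 9.8),
  check.names = FALSE
)
measurements <- clean_measurements(raw)

# Inside the project, this would write data/measurements.rda:
# usethis::use_data(measurements, overwrite = TRUE)

# --- R/data.R (sketch) ---

#' Cleaned example measurements
#'
#' @format A data frame with 2 rows and 2 columns:
#' \describe{
#'   \item{site_id}{Site identifier.}
#'   \item{temp_c}{Temperature in degrees Celsius.}
#' }
#' @source Described in data-raw/DATA_SOURCES.md.
"measurements"
```

After running devtools::document(), the dataset documentation becomes available through ?measurements once the project is loaded.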

Workflow

  • Load the project: devtools::load_all() or Ctrl/Cmd + Shift + L in RStudio.
  • Build documentation: devtools::document() or Ctrl/Cmd + Shift + D.
  • Run tests: devtools::test() or Ctrl/Cmd + Shift + T.
  • Run the pipeline: targets::tar_make() to execute all targets. targets::tar_visnetwork() to visualize dependencies.
  • Update lockfile: renv::snapshot() after installing or updating packages.

For a detailed introduction to targets, see the user manual.
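
As a rough idea of what a pipeline definition looks like, here is a minimal _targets.R sketch. It is not the template that SCIproj generates: the file path data-raw/measurements.csv and the three helper functions are hypothetical, and in a real project the helpers would normally live in R/. It requires the targets package to be installed.

```r
# _targets.R (sketch)
library(targets)

# Hypothetical helpers; in a real project these would live in R/.
read_raw <- function(path) read.csv(path)
clean_raw <- function(raw) raw[stats::complete.cases(raw), , drop = FALSE]
summarise_clean <- function(clean) summary(clean)

# A pipeline is a list of targets; each target names a step and its command.
list(
  # Track the raw file itself, so edits to it invalidate downstream targets.
  tar_target(raw_file, "data-raw/measurements.csv", format = "file"),
  tar_target(raw_data, read_raw(raw_file)),
  tar_target(clean_data, clean_raw(raw_data)),
  tar_target(data_summary, summarise_clean(clean_data))
)
```

With a file like this in the project root, targets::tar_make() runs the four steps in dependency order and skips those whose inputs and code have not changed since the last run.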

For maximum reproducibility, consider also using Docker (use_docker = TRUE). See the Rocker Project for R-specific Docker images.

Archiving and DOI

When your project is finalized:

  1. Archive the GitHub repo to make it read-only.
  2. Get a DOI via Zenodo (integrates directly with GitHub) or another DOI Registration Agency.
  3. Update CITATION.cff with the DOI.
  4. Optionally, generate a codemeta.json with codemetar::write_codemeta() for richer metadata.
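
Step 3 can be done by hand, or with a few lines of base R. The sketch below assumes CITATION.cff stores the DOI under a top-level doi: key, as the Citation File Format allows. set_cff_doi() is a hypothetical helper, the three-line file content is a simplified stand-in for a real CITATION.cff, and the DOI shown is a placeholder.

```r
# Hypothetical helper (not part of SCIproj): set the top-level `doi:` key
# in the lines of a CITATION.cff file, replacing it if already present.
set_cff_doi <- function(lines, doi) {
  entry <- sprintf("doi: %s", doi)
  hit <- grep("^doi:", lines)  # unindented key only, so nested DOIs are left alone
  if (length(hit) > 0) {
    lines[hit[1]] <- entry
    lines
  } else {
    c(lines, entry)
  }
}

# Simplified stand-in for the content of a real CITATION.cff.
cff <- c(
  "cff-version: 1.2.0",
  "message: \"If you use this project, please cite it as below.\"",
  "title: \"My research project\""
)

# 10.5281/zenodo.0000000 is a placeholder, not a real DOI.
cff <- set_cff_doi(cff, "10.5281/zenodo.0000000")

# In a project, read and write the actual file instead:
# writeLines(set_cff_doi(readLines("CITATION.cff"), "<your DOI>"), "CITATION.cff")
```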

Metadata

Version

1.0.1

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows