MyNixOS website logo
Description

Stable Diffusion Image Generation.

Provides Stable Diffusion image generation in R using the 'ggmlR' tensor library. Supports text-to-image and image-to-image generation with multiple model versions (SD 1.x, SD 2.x, 'SDXL', Flux). Implements the full inference pipeline including CLIP text encoding, 'UNet' noise removal, and 'VAE' encoding/decoding. Unified sd_generate() entry point with automatic strategy selection (direct, tiled sampling, high-resolution fix) based on output resolution and available 'VRAM'. High-resolution generation (2K, 4K+) via tiled 'VAE' decoding, tiled diffusion sampling ('MultiDiffusion'), and classic two-pass refinement (text-to-image, then upscale with image-to-image). Multi-GPU parallel generation via sd_generate_multi_gpu(). Multi-GPU model parallelism via 'device_layout' in sd_ctx(): distribute diffusion, text encoders, and 'VAE' across separate 'Vulkan' devices. Built-in profiling (sd_profile_start(), sd_profile_summary()) for per-stage timing of text encoding, sampling, and 'VAE' decode. Interactive Shiny GUI via sd_app() with non-blocking asynchronous generation (C++ std::thread), live progress bar, auto-detection of model architecture, and ETA display. Supports CPU and 'Vulkan' GPU. No 'Python' or external API dependencies required. Cross-platform: Linux, macOS, Windows.

sd2R

R-hub check on the R Consortium cluster

sd2R is an R package that provides a native, GPU-accelerated Stable Diffusion pipeline by wrapping the C++ implementation from stable-diffusion.cpp and using ggmlR as the tensor backend.

Overview

sd2R exposes a high-level R interface for text-to-image and image-to-image generation, while all heavy computation (tokenization, encoders, denoiser, sampler, VAE, model loading) is implemented in C++. Supports SD 1.x, SD 2.x, SDXL, and Flux model families. Targets local inference on Linux with Vulkan-enabled AMD GPUs (with automatic CPU fallback via ggml), without relying on external Python or web APIs.

Architecture

Flux without Python:

R  →  sd2R  →  ggmlR  →  ggml  →  Vulkan  →  GPU
  • C++ core (src/sd/): tokenizers, text encoders (CLIP, Mistral, Qwen, UMT5), diffusion UNet/MMDiT denoiser, samplers, VAE encoder/decoder, and model loading for .safetensors and .gguf weights.
  • R layer: user-facing pipeline functions, parameter validation, image helpers, testing, and documentation-friendly API.
  • Backend: links against ggmlR (headers via LinkingTo) and libggml.a, reusing the same GGML/Vulkan stack that also powers llamaR and other ggmlR-based packages.

Key Features

  • Unified sd_generate() — single entry point for all generation modes. Automatically selects the optimal strategy (direct, tiled sampling, or highres fix) based on output resolution and available VRAM (vram_gb parameter in sd_ctx()). Users don't need to think about tiling at all.
  • CRAN-ready defaults: verbose = FALSE by default — no console output unless explicitly enabled. Cross-platform build system with configure/configure.win generating Makevars from templates.
  • VRAM-aware auto-routing: queries free GPU memory at runtime and routes to direct generation (fits in VRAM), highres fix (txt2img + upscale + tiled img2img, preferred for coherent large images), or tiled sampling (MultiDiffusion fallback). VAE tiling is also VRAM-aware — enabled automatically only when free memory is insufficient for the given resolution. Set vram_gb in sd_ctx() to override auto-detection.
  • Multi-GPU data parallelism: sd_generate_multi_gpu() distributes prompts across Vulkan GPUs via callr, one process per GPU, with progress reporting.
  • Multi-GPU model parallelism: device_layout parameter in sd_ctx() distributes sub-models across multiple Vulkan GPUs within a single process. Presets: "mono" (all on one GPU), "split_encoders" (CLIP/T5 on GPU 1, diffusion + VAE on GPU 0), "split_vae" (CLIP/T5 + VAE on GPU 1, diffusion on GPU 0), "encoders_cpu" (text encoders on CPU). Manual override via diffusion_gpu, clip_gpu, vae_gpu.
  • Profiling: built-in per-stage timing via sd_profile_start() / sd_profile_stop() / sd_profile_summary(). Tracks model loading, text encoding (with CLIP/T5 breakdown), sampling, and VAE decode/encode stages.
  • Text-to-image generation supporting Stable Diffusion 1.x, 2.x, SDXL, and Flux models with typical generations taking a few seconds on Vulkan-enabled GPUs.
  • Image-to-image workflows with noise strength control and reuse of the same denoising pipeline as text-to-image. Requires vae_decode_only = FALSE in context.
  • Optional upscaling using a dedicated upscaler context managed entirely in C++ and exposed to R through external pointers.
  • VRAM-aware Tiled VAE for high-resolution images (2K, 4K+) with bounded VRAM usage. vae_mode = "auto" (default) queries free GPU memory before VAE decode and enables tiling only when estimated peak usage exceeds available VRAM (with a 50 MB safety reserve). Falls back to a pixel-area threshold (vae_auto_threshold) when Vulkan memory query is unavailable (CPU backend, no GPU). Supports per-axis relative tile sizing (vae_tile_rel_x, vae_tile_rel_y) for non-square aspect ratios.
  • Tiled diffusion sampling (MultiDiffusion): at each denoising step the latent is split into overlapping tiles, each denoised independently, and merged with Gaussian weighting. VRAM usage scales with tile size, not output resolution.
  • Highres Fix: classic two-pass pipeline — generates base image at native model resolution, upscales (bilinear or ESRGAN), then refines with tiled img2img at low denoising strength. Produces coherent high-resolution images (2K, 4K+) with global composition preserved.
  • Image utilities in R: saving generated images to PNG, converting between internal tensors and R raw vectors, and simple inspection of output tensors.
  • System introspection via sd_system_info(), reporting GGML/Vulkan capabilities as detected by ggmlR at build time.
  • Pipeline graph API: sd_pipeline() + sd_node() for composable, sequential multi-step workflows (txt2img → upscale → img2img → save). Pipelines are serializable to JSON via sd_save_pipeline() / sd_load_pipeline().
  • Shiny GUI: sd_app() launches an interactive web interface with auto-detection of model architecture, non-blocking async generation (C++ std::thread), live progress bar with ETA, and automatic role assignment for multi-file models (Flux, SD3).

Shiny GUI

Launch an interactive web interface for image generation:

sd_app(model_dir = "/path/to/models")

Features:

  • Auto-detects model architecture (Flux, SD3, SDXL, SD1/2) and assigns component roles (diffusion, VAE, CLIP, T5)
  • Non-blocking generation with live progress bar and ETA
  • Prevents incompatible model combinations

Pipeline Example

pipe <- sd_pipeline(
  sd_node("txt2img", prompt = "a cat in space", width = 512, height = 512),
  sd_node("upscale", factor = 2),
  sd_node("img2img", strength = 0.3),
  sd_node("save", path = "output.png")
)

# Save / load as JSON
sd_save_pipeline(pipe, "my_pipeline.json")
pipe <- sd_load_pipeline("my_pipeline.json")

# Run
ctx <- sd_ctx("model.safetensors")
sd_run_pipeline(pipe, ctx, upscaler_ctx = upscaler)

Implementation Details

  • Rcpp bindings: src/sd2R_interface.cpp defines a thin bridge between R and the C API in stable-diffusion.h, returning XPtr objects with custom finalizers for correct lifetime management of sd_ctx_t and upscaler_ctx_t.
  • Build system: configure / configure.win generate Makevars from .in templates, resolving ggmlR paths, OpenMP, and Vulkan at configure time. Per-target -include r_ggml_compat.h applied only to sd/*.cpp sources to avoid macro conflicts with system headers.
  • Package metadata: DESCRIPTION declares Rcpp and ggmlR in LinkingTo, and NAMESPACE is generated via roxygen2 with useDynLib and Rcpp imports.
  • On load: .onLoad() initializes logging and registers constant values that mirror the underlying C++ enums using 0-based indices.

CRAN Readiness

  • verbose = FALSE by default — no output unless requested.
  • Per-target compiler flags for cross-platform compatibility (Linux, macOS, Windows).
  • All C++ warnings fixed (-Winconsistent-missing-override, deprecated codecvt).
  • Large tokenizer vocabularies (CLIP, Mistral, Qwen, UMT5) downloaded automatically during installation from GitHub Releases, keeping the source tarball small.

Installation

# Install ggmlR first (if not already installed)
remotes::install_github("Zabis13/ggmlR")

# Install sd2R
remotes::install_github("Zabis13/sd2R")

During installation, the configure script automatically downloads tokenizer vocabulary files (~128 MB total) from GitHub Releases. This requires curl or wget.

Offline / Manual Installation

If you don't have internet access during installation, download the vocabulary files manually and place them into src/sd/ before building:

# Download from https://github.com/Zabis13/sd2R/releases/tag/assets
# Files: vocab.hpp, vocab_mistral.hpp, vocab_qwen.hpp, vocab_umt5.hpp

wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_mistral.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_qwen.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_umt5.hpp -P src/sd/

R CMD INSTALL .

System Requirements

  • R ≥ 4.1.0, C++17 compiler
  • curl or wget (for downloading vocabulary files during installation)
  • Optional GPU: libvulkan-dev + glslc (Linux) or Vulkan SDK (Windows)
  • Platforms: Linux, macOS, Windows (x86-64, ARM64)

Benchmarks

FLUX.1-dev Q4_K_S — 10 steps

CLIP-L + T5-XXL text encoders, VAE. sample_steps = 10.

TestAMD RX 9070 (16 GB)Tesla P100 (16 GB)2x Tesla T4 (16 GB)
1. 768x768 direct44.2 s94.0 s133.1 s
2. 1024x1024 tiled VAE163.6 s151.4 s243.6 s
3. 2048x1024 highres fix309.7 s312.5 s492.2 s
4. img2img 768x768 direct29.6 s51.0 s73.5 s
5. 1024x1024 direct163.0 s152.2 s243.3 s
6. Multi-GPU 4 prompts----284.9 s (4 img)

FLUX.1-dev Q4_K_S — 25 steps

CLIP-L + T5-XXL (Q5_K_M) text encoders, VAE. sample_steps = 25.

TestAMD RX 9070 (16 GB)2x Tesla T4 (16 GB)
768x768 direct110.8 s--
1024x1024 direct--553.1 s

Model size comparison

SD 1.5Flux Q4_K_S
Diffusion params~860 MB~6.5 GB
Text encodersCLIP ~240 MBCLIP-L + T5-XXL ~3.9 GB
Sampling per step (768x768)~0.1–0.3 s~3.9 s
ArchitectureUNetMMDiT (57 blocks)

Examples

For a live, runnable demo see the Kaggle notebook: Stable Diffusion in R (ggmlR + Vulkan GPU).

See Also

  • llamaR — LLM inference in R
  • sd2R — Stable Diffusion in R
  • ggml — underlying C library

License

MIT.

Metadata

Version

0.1.9

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows