Description

Retrieval-Augmented Generation (RAG) Workflows in R with Local and Web Search.

Enables Retrieval-Augmented Generation (RAG) workflows in R by combining local vector search using 'DuckDB' with optional web search via the 'Tavily' API. Supports 'OpenAI'- and 'Ollama'-compatible embedding models, full-text and 'HNSW' (Hierarchical Navigable Small World) indexing, and modular large language model (LLM) invocation. Designed for advanced question-answering, chat-based applications, and production-ready AI pipelines. This package is the R equivalent of the 'Python' package 'RAGFlowChain' available at <https://pypi.org/project/RAGFlowChain/>.

RAGFlowChainR


Overview

RAGFlowChainR is an R package, inspired by LangChain, that brings Retrieval-Augmented Generation (RAG) to R. It retrieves relevant documents from a local DuckDB vector store, can optionally add results from web search, and passes the retrieved context to Large Language Models (LLMs).

Features include:

  • 📂 Ingest files and websites
  • 🔍 Semantic search using vector embeddings
  • 🧠 RAG chain execution with conversational memory and dynamic prompt construction
  • 🔌 Plug-and-play with OpenAI, Ollama, Groq, and Anthropic

Python version: RAGFlowChain (PyPI, <https://pypi.org/project/RAGFlowChain/>)
GitHub (Python): RAGFlowChain


Installation

install.packages("RAGFlowChainR")

Development version

To get the latest features or bug fixes, you can install the development version of RAGFlowChainR from GitHub:

# Install 'remotes' first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")

remotes::install_github("knowusuboaky/RAGFlowChainR")

See the full function reference or the package website for more details.


🔐 Environment Setup

Sys.setenv(TAVILY_API_KEY    = "your-tavily-api-key")
Sys.setenv(OPENAI_API_KEY    = "your-openai-api-key")
Sys.setenv(GROQ_API_KEY      = "your-groq-api-key")
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")

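To persist the keys across sessions, add them to your ~/.Renviron file instead, one NAME=value entry per line (for example, OPENAI_API_KEY=your-openai-api-key), then restart R.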
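Before running the examples, it can help to confirm which keys the current R session actually sees. The sketch below uses only base R (`Sys.getenv()`, `nzchar()`); the helper name `missing_api_keys()` is hypothetical and not part of RAGFlowChainR, and which keys you really need depends on the providers you call.

```r
# Hypothetical helper (not part of RAGFlowChainR): return the names of the
# given environment variables that are unset or empty in this R session.
missing_api_keys <- function(keys = c("TAVILY_API_KEY", "OPENAI_API_KEY",
                                      "GROQ_API_KEY", "ANTHROPIC_API_KEY")) {
  values <- Sys.getenv(keys, unset = "")  # one value per requested key
  keys[!nzchar(values)]                   # keep keys whose value is empty
}

missing_api_keys()
```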


Usage

1. Data Ingestion

library(RAGFlowChainR)

local_files <- c("tests/testthat/test-data/sprint.pdf", 
                 "tests/testthat/test-data/introduction.pptx",
                 "tests/testthat/test-data/overview.txt")
website_urls <- c("https://www.r-project.org")
crawl_depth <- 1

response <- fetch_data(
  local_paths = local_files,
  website_urls = website_urls,
  crawl_depth = crawl_depth
)
response
#>                                source                                      title ...
#> 1                 documents/sprint.pdf                                       <NA> ...
#> 2          documents/introduction.pptx                                       <NA> ...
#> 3               documents/overview.txt                                       <NA> ...
#> 4            https://www.r-project.org R: The R Project for Statistical Computing ...
#> ...

cat(response$content[1])
#> Getting Started with Scrum\nCodeWithPraveen.com ...
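`crawl_depth` presumably limits how many link hops `fetch_data()` follows from each seed URL; the package documentation is the authority on its exact semantics. To illustrate what a depth-limited crawl means, the sketch below runs a breadth-first traversal over a small in-memory link graph instead of fetching real pages. The `links` list, the example.org URLs, and the `crawl_bfs()` helper are hypothetical.

```r
# Hypothetical sketch: breadth-first crawl that follows at most `max_depth`
# link hops from `seed`. `links` maps each page to the pages it links to,
# standing in for real HTTP fetches.
crawl_bfs <- function(seed, links, max_depth = 1) {
  visited <- seed
  frontier <- seed
  depth <- 0
  while (length(frontier) > 0 && depth < max_depth) {
    # Collect every outgoing link from the pages in the current frontier
    next_pages <- unlist(lapply(frontier, function(p) links[[p]]), use.names = FALSE)
    # Keep only pages that have not been visited yet
    next_pages <- setdiff(unique(next_pages), visited)
    visited <- c(visited, next_pages)
    frontier <- next_pages
    depth <- depth + 1
  }
  visited
}

links <- list(
  "https://example.org"        = c("https://example.org/a", "https://example.org/b"),
  "https://example.org/a"      = "https://example.org/a/deep",
  "https://example.org/b"      = character(0),
  "https://example.org/a/deep" = character(0)
)

# Depth 1 reaches the seed and its direct links, but not /a/deep
crawl_bfs("https://example.org", links, max_depth = 1)
```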

2. Vector Store & Semantic Search

con <- create_vectorstore("tests/testthat/test-data/my_vectors.duckdb", overwrite = TRUE)

docs <- data.frame(head(response))  # first six rows of the fetch_data() result

insert_vectors(
  con = con,
  df = docs,
  embed_fun = embed_openai(),
  chunk_chars = 12000
)
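`chunk_chars` appears to cap the size of each text chunk before it is embedded. The exact splitting rule used by `insert_vectors()` is not described here, so the sketch below only illustrates fixed-size character chunking with an optional overlap; `chunk_text()` and its `overlap` argument are hypothetical.

```r
# Hypothetical sketch: split a string into chunks of at most `chunk_chars`
# characters, with consecutive chunks sharing `overlap` characters.
chunk_text <- function(text, chunk_chars = 12000, overlap = 0) {
  stopifnot(overlap >= 0, chunk_chars > overlap)
  n <- nchar(text)
  if (n == 0) return(character(0))
  step <- chunk_chars - overlap
  # Start positions; stop early so the last chunk is never fully
  # contained in the previous one
  starts <- seq(1, max(n - overlap, 1), by = step)
  substring(text, starts, pmin(starts + chunk_chars - 1, n))
}

chunk_text("abcdefghij", chunk_chars = 4)               # returns c("abcd", "efgh", "ij")
chunk_text("abcdefghij", chunk_chars = 4, overlap = 1)  # returns c("abcd", "defg", "ghij")
```

Overlap trades extra storage for fewer sentences cut in half at chunk boundaries.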

build_vector_index(con, type = c("vss", "fts"))

response <- search_vectors(con, query_text = "Tell me about R?", top_k = 5)
response
#>    id page_content                                                dist
#> 1   5 [Home]\nDownload\nCRAN\nR Project...\n...                0.2183
#> 2   6 [Home]\nDownload\nCRAN\nR Project...\n...                0.2183
#> ...

cat(response$page_content[1])
#> [Home]\nDownload\nCRAN\nR Project\nAbout R\nLogo\n...
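As the `dist` column suggests, `search_vectors()` ranks the stored chunks by their distance to the embedded query and returns the `top_k` closest ones. The metric is not stated here; assuming a cosine distance (1 minus cosine similarity), the ranking step can be sketched in base R on a toy embedding matrix. The matrix, the query vector, and `top_k_cosine()` are hypothetical.

```r
# Hypothetical sketch: rank the rows of an embedding matrix by cosine distance
# (1 - cosine similarity) to a query vector and keep the `top_k` closest rows.
top_k_cosine <- function(embeddings, query, top_k = 5) {
  # Cosine similarity: dot product divided by the product of the vector lengths
  sims <- as.vector(embeddings %*% query) /
    (sqrt(rowSums(embeddings^2)) * sqrt(sum(query^2)))
  dist <- 1 - sims
  # Smallest distance first; never ask for more rows than exist
  ord <- order(dist)[seq_len(min(top_k, length(dist)))]
  data.frame(id = ord, dist = dist[ord])
}

embeddings <- rbind(
  c(1, 0, 0),  # chunk 1
  c(0, 1, 0),  # chunk 2
  c(1, 1, 0)   # chunk 3
)
top_k_cosine(embeddings, query = c(1, 0.1, 0), top_k = 2)
```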

3. RAG Chain Querying

rag_chain <- create_rag_chain(
  llm = call_llm,
  vector_database_directory = "tests/testthat/test-data/my_vectors.duckdb",
  method = "DuckDB",
  embedding_function = embed_openai(),
  use_web_search = FALSE
)

response <- rag_chain$invoke("Tell me about R")
response
#> $input
#> [1] "Tell me about R"
#>
#> $chat_history
#> [[1]] $role: "human", $content: "Tell me about R"
#> [[2]] $role: "assistant", $content: "R is a programming language..."
#>
#> $answer
#> [1] "R is a programming language and software environment commonly used for statistical computing and graphics..."

cat(response$answer)
#> R is a programming language and software environment commonly used for statistical computing and graphics...
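Conceptually, each `invoke()` call retrieves relevant chunks, combines them with the running `chat_history`, and sends the assembled prompt to the LLM. The template that `create_rag_chain()` actually uses is not documented here, so the sketch below is an illustrative assumption: `build_rag_prompt()` and its wording are hypothetical, and the history entries mirror the `role`/`content` pairs shown in the output above.

```r
# Hypothetical sketch of dynamic prompt construction for one RAG turn.
build_rag_prompt <- function(question, retrieved, chat_history = list()) {
  # Number each retrieved chunk so the model can refer to it
  context <- paste0("[", seq_along(retrieved), "] ", retrieved, collapse = "\n")
  # Flatten earlier turns into "role: content" lines (conversational memory)
  history <- vapply(chat_history,
                    function(turn) paste0(turn$role, ": ", turn$content),
                    character(1))
  paste(
    "Answer the question using only the context below.",
    "Context:", context,
    "Conversation so far:", paste(history, collapse = "\n"),
    paste0("Question: ", question),
    sep = "\n"
  )
}

history <- list(
  list(role = "human", content = "Tell me about R"),
  list(role = "assistant", content = "R is a programming language...")
)
prompt <- build_rag_prompt(
  question = "Who maintains it?",
  retrieved = c("R is maintained by the R Core Team.", "CRAN hosts R packages."),
  chat_history = history
)
cat(prompt)
```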

LLM Support

call_llm(
  prompt = "Summarize the capital of France.",
  provider = "groq",
  model = "llama3-8b",
  temperature = 0.7,
  max_tokens = 200
)

📦 Related Package: chatLLM

The chatLLM package (now available on CRAN 🎉) offers a modular interface for interacting with LLM providers including OpenAI, Groq, Anthropic, DeepSeek, DashScope, and GitHub Models.

install.packages("chatLLM")

Features:

  • 🔄 Uniform API across providers
  • 🗣 Multi-message context (system/user/assistant roles)
  • 🔁 Retries & backoff with clear timeout handling
  • 🔈 Verbose control (verbose = TRUE/FALSE)
  • ⚙️ Discover models via list_models()
  • 🏗 Factory interface for repeated calls
  • 🌐 Custom endpoint override and advanced tuning
  • 🔌 Native integration with RAGFlowChainR
  • 🔐 .Renviron-based key management
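Retrying with backoff is a common pattern for flaky network calls such as LLM API requests. The sketch below shows a generic version in base R; it is not chatLLM's implementation, and `with_retries()`, its arguments, and the simulated `flaky()` function are hypothetical.

```r
# Hypothetical sketch: call `f()` up to `max_attempts` times, doubling the
# wait after each failure (exponential backoff).
with_retries <- function(f, max_attempts = 3, base_delay = 0.5) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_attempts) {
      delay <- base_delay * 2^(attempt - 1)  # base_delay, 2x, 4x, ...
      message(sprintf("Attempt %d failed (%s); retrying in %.1f s",
                      attempt, conditionMessage(result), delay))
      Sys.sleep(delay)
    }
  }
  stop("All ", max_attempts, " attempts failed: ", conditionMessage(result))
}

# Simulated call that fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary network error")
  "ok"
}
with_retries(flaky, max_attempts = 3, base_delay = 0.1)
```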

License

MIT © Kwadwo Daddy Nyame Owusu Boakye.

Metadata

Version

0.1.5

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows