Description

Literature-Based Discovery Tools for Biomedical Research.

A suite of tools for literature-based discovery in biomedical research. Provides functions for retrieving scientific articles from 'PubMed' and other NCBI databases, extracting biomedical entities (diseases, drugs, genes, etc.), building co-occurrence networks, and applying various discovery models including 'ABC', 'AnC', 'LSI', and 'BITOLA'. The package also includes visualization tools for exploring discovered connections.

LBDiscover

Lifecycle: experimental. License: GPLv3.

Overview

LBDiscover is an R package for literature-based discovery (LBD) in biomedical research. It provides a comprehensive suite of tools for retrieving scientific articles, extracting biomedical entities, building co-occurrence networks, and applying various discovery models to uncover hidden connections in the scientific literature.

The package implements several literature-based discovery approaches including:

  • ABC model (Swanson’s discovery model)
  • AnC model (improved version with better biomedical term filtering)
  • Latent Semantic Indexing (LSI)
  • BITOLA-style approaches

LBDiscover also includes visualization tools for exploring discovered connections as network graphs, heatmaps, and interactive HTML diagrams.

Installation

# Install from CRAN
install.packages("LBDiscover")

# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("chaoliu-cl/LBDiscover")

Key Features

LBDiscover provides a complete workflow for literature-based discovery:

  1. Data Retrieval: Query and retrieve scientific articles from PubMed and other NCBI databases
  2. Text Preprocessing: Clean and prepare text for analysis
  3. Entity Extraction: Identify biomedical entities in text (diseases, drugs, genes, etc.)
  4. Co-occurrence Analysis: Build networks of entity co-occurrences
  5. Discovery Models: Apply various discovery algorithms to find hidden connections
  6. Validation: Validate discoveries through statistical tests
  7. Visualization: Explore results through network graphs, heatmaps, and more
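
To make step 4 concrete, the following is a minimal base-R sketch of how a co-occurrence matrix can be derived from a table of entity mentions. It is an illustration only and does not reproduce LBDiscover's internals; the toy data and the names `toy_entities`, `incidence`, and `co` are assumptions made for this example.

```r
# Toy entity table: one row per (document, entity) mention.
# The data are invented; only the column names echo the Quick Start arguments.
toy_entities <- data.frame(
  doc_id = c(1, 1, 2, 2, 3, 3, 3),
  entity = c("migraine", "serotonin",
             "serotonin", "sumatriptan",
             "migraine", "serotonin", "magnesium"),
  stringsAsFactors = FALSE
)

# Document-by-entity incidence matrix: 1 if the entity occurs in the document.
tab <- table(toy_entities$doc_id, toy_entities$entity)
incidence <- matrix(
  as.integer(tab > 0),
  nrow = nrow(tab),
  dimnames = list(rownames(tab), colnames(tab))
)

# Entity-by-entity co-occurrence: number of documents mentioning both entities.
co <- crossprod(incidence)
diag(co) <- 0  # drop self-co-occurrence (the diagonal holds document frequencies)

print(co)
```

In this toy corpus, "migraine" and "serotonin" share two documents, while "migraine" and "sumatriptan" never appear together. That missing direct link is exactly what the discovery models look for.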

Quick Start Example

library(LBDiscover)

# Retrieve articles from PubMed
articles <- pubmed_search("migraine treatment", max_results = 100)

# Preprocess article text
preprocessed <- vec_preprocess(
  articles,
  text_column = "abstract",
  remove_stopwords = TRUE
)

# Extract biomedical entities
entities <- extract_entities_workflow(
  preprocessed,
  text_column = "abstract",
  entity_types = c("disease", "drug", "gene")
)

# Create co-occurrence matrix
co_matrix <- create_comat(
  entities,
  doc_id_col = "doc_id",
  entity_col = "entity",
  type_col = "entity_type"
)

# Apply the ABC model to find new connections
abc_results <- abc_model(
  co_matrix,
  a_term = "migraine",
  n_results = 50,
  scoring_method = "combined"
)

# Visualize the results
vis_abc_network(abc_results, top_n = 20)

Discovery Models

ABC Model

The ABC model is based on Swanson’s discovery paradigm. If concept A is related to concept B, and concept B is related to concept C, but A and C are not directly connected in the literature, then A may have a hidden relationship with C.

# Apply the ABC model
abc_results <- abc_model(
  co_matrix,
  a_term = "migraine",
  min_score = 0.1,
  n_results = 50
)

# Visualize as a network
vis_abc_network(abc_results)

# Or as a heatmap
vis_heatmap(abc_results)
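
The A-B-C reasoning described above can also be traced by hand on a small matrix. The sketch below is a base-R illustration under stated assumptions: the co-occurrence counts are invented, `abc_sketch()` and `set_pair()` are hypothetical helpers, and the score (the product of the A-B and B-C counts) is a deliberately simple stand-in rather than the scoring used by `abc_model()`.

```r
# Toy symmetric co-occurrence counts (invented for illustration).
terms <- c("migraine", "serotonin", "magnesium",
           "sumatriptan", "riboflavin", "caffeine")
co <- matrix(0, nrow = length(terms), ncol = length(terms),
             dimnames = list(terms, terms))
set_pair <- function(m, x, y, n) { m[x, y] <- n; m[y, x] <- n; m }
co <- set_pair(co, "migraine",  "serotonin",   4)
co <- set_pair(co, "migraine",  "magnesium",   2)
co <- set_pair(co, "migraine",  "caffeine",    1)
co <- set_pair(co, "serotonin", "sumatriptan", 3)
co <- set_pair(co, "serotonin", "caffeine",    2)
co <- set_pair(co, "magnesium", "riboflavin",  1)

# Hypothetical helper: enumerate A-B-C paths where A and C never co-occur.
abc_sketch <- function(co, a_term) {
  b_terms <- names(which(co[a_term, ] > 0))             # linked to A
  c_terms <- setdiff(colnames(co), c(a_term, b_terms))  # not linked to A
  rows <- list()
  for (b in b_terms) {
    for (c_term in c_terms) {
      if (co[b, c_term] > 0) {
        rows[[length(rows) + 1]] <- data.frame(
          a = a_term, b = b, c = c_term,
          score = co[a_term, b] * co[b, c_term],  # assumed toy score
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(rows) == 0) {
    return(data.frame(a = character(), b = character(),
                      c = character(), score = numeric()))
  }
  out <- do.call(rbind, rows)
  out[order(-out$score), ]
}

res <- abc_sketch(co, "migraine")
print(res)
```

Here "caffeine" co-occurs with "migraine" directly, so it can act as a B term but is never proposed as a C term. Only "sumatriptan" (via "serotonin") and "riboflavin" (via "magnesium") are returned as candidate hidden connections.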

AnC Model

The AnC model is an extension of the ABC model that uses multiple B terms to establish stronger connections between A and C.

# Apply the AnC model
anc_results <- anc_model(
  co_matrix,
  a_term = "migraine",
  n_b_terms = 5,
  min_score = 0.1
)
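
One way to picture "multiple B terms" is to pool the evidence from every intermediate term that bridges A and a given C. The base-R sketch below does that on invented counts. How `anc_model()` actually combines B terms is not described here, so the aggregation rule (sum of A-B times B-C counts, plus a minimum number of bridging B terms) and the names `min_bridges` and `set_pair()` are assumptions for illustration.

```r
# Toy symmetric co-occurrence counts (invented for illustration).
terms <- c("migraine", "serotonin", "magnesium",
           "inflammation", "sumatriptan", "riboflavin")
co <- matrix(0, nrow = length(terms), ncol = length(terms),
             dimnames = list(terms, terms))
set_pair <- function(m, x, y, n) { m[x, y] <- n; m[y, x] <- n; m }
co <- set_pair(co, "migraine",     "serotonin",    4)
co <- set_pair(co, "migraine",     "magnesium",    2)
co <- set_pair(co, "migraine",     "inflammation", 3)
co <- set_pair(co, "serotonin",    "sumatriptan",  3)
co <- set_pair(co, "inflammation", "sumatriptan",  1)
co <- set_pair(co, "magnesium",    "riboflavin",   1)

a_term  <- "migraine"
b_terms <- names(which(co[a_term, ] > 0))             # linked to A
c_terms <- setdiff(colnames(co), c(a_term, b_terms))  # not linked to A

a_b <- co[a_term, b_terms]                 # strength of each A-B link
b_c <- co[b_terms, c_terms, drop = FALSE]  # strength of each B-C link

pooled <- data.frame(
  c     = c_terms,
  n_b   = colSums(b_c > 0),   # how many B terms bridge A and this C
  score = drop(a_b %*% b_c),  # assumed rule: sum over B of (A-B) * (B-C)
  stringsAsFactors = FALSE
)

# Keep only C terms supported by at least `min_bridges` distinct B terms.
min_bridges <- 2
strong <- pooled[pooled$n_b >= min_bridges, ]
print(strong)
```

With these counts, "sumatriptan" is reached through both "serotonin" and "inflammation" and is retained, while "riboflavin" has a single bridge and is filtered out.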

LSI Model

The Latent Semantic Indexing model identifies semantically related terms using dimensionality reduction techniques.

# Create term-document matrix
tdm <- create_term_document_matrix(preprocessed)

# Apply LSI model
lsi_results <- lsi_model(
  tdm,
  a_term = "migraine",
  n_factors = 100
)
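
The dimensionality reduction behind LSI is typically a truncated singular value decomposition of the term-document matrix. The sketch below shows the idea with base R's `svd()` on an invented five-term, four-document matrix. It is not LBDiscover's implementation; the variable names, the toy counts, and the choice of `k = 2` factors are assumptions for this example.

```r
# Toy term-document matrix (rows = terms, columns = documents); counts invented.
terms <- c("migraine", "headache", "serotonin", "insulin", "glucose")
toy_tdm <- matrix(c(
  2, 1, 0, 0,   # migraine
  1, 2, 0, 0,   # headache
  1, 1, 0, 0,   # serotonin
  0, 0, 2, 1,   # insulin
  0, 0, 1, 2    # glucose
), nrow = 5, byrow = TRUE,
dimnames = list(terms, paste0("doc", 1:4)))

# Truncated SVD: keep the k largest singular values and vectors.
k <- 2
s <- svd(toy_tdm)
term_vecs <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)
rownames(term_vecs) <- rownames(toy_tdm)

# Cosine similarity between two term vectors in the reduced space.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Similarity of every term to the A term.
sims <- apply(term_vecs, 1, cosine, y = term_vecs["migraine", ])
print(round(sort(sims, decreasing = TRUE), 3))
```

In the reduced space, "headache" and "serotonin" sit next to "migraine", while "insulin" and "glucose" fall on a separate factor. Real corpora need far more factors, which is why the example above passes `n_factors = 100`.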

Visualization

The package offers multiple visualization options:

# Network visualization
vis_abc_network(abc_results, top_n = 25)

# Heatmap of connections
vis_heatmap(abc_results, top_n = 20)

# Export interactive HTML network
export_network(abc_results, output_file = "abc_network.html")

# Export interactive chord diagram
export_chord(abc_results, output_file = "abc_chord.html")

Comprehensive Analysis

For an end-to-end analysis:

# Run comprehensive discovery analysis
discovery_results <- run_lbd(
  search_query = "migraine pathophysiology",
  a_term = "migraine",
  discovery_approaches = c("abc", "anc", "lsi"),
  include_visualizations = TRUE,
  output_file = "discovery_report.html"
)

Documentation

For more detailed documentation and examples, please see the package vignettes:

# View package vignettes
browseVignettes("LBDiscover")

Citation

If you use LBDiscover in your research, please cite:

Liu, C. (2025). LBDiscover: Literature-Based Discovery Tools for Biomedical Research. 
R package version 0.1.0. https://github.com/chaoliu-cl/LBDiscover

License

This project is licensed under the GPL-3 License - see the LICENSE file for details.

Metadata

Version

0.1.0

License

GPL-3

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows