Description

Text Mining and Topic Modeling for Sport Science Literature.

A comprehensive toolkit for mining, analyzing, and visualizing scientific literature in sport science domains. Provides functions for retrieving abstracts from 'Scopus', preprocessing text data, performing advanced topic modeling using Latent Dirichlet Allocation ('LDA'), Structural Topic Models ('STM'), and Correlated Topic Models ('CTM'), and creating publication-ready visualizations including keyword co-occurrence networks and topic trends. For methodological details see Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993> for 'LDA', Roberts et al. (2014) <doi:10.1111/ajps.12103> for 'STM', and Blei and Lafferty (2007) <doi:10.1214/07-AOAS114> for 'CTM'.

SportMiner

Overview

SportMiner is a comprehensive toolkit for mining, analyzing, and visualizing scientific literature in sport science domains. It provides an end-to-end workflow from data retrieval to publication-ready visualizations.

Key Features

  • Data Retrieval: Search and download abstracts from Scopus
  • Text Processing: Preprocessing with stemming and stopword removal
  • Topic Modeling: Multiple algorithms (LDA, STM, CTM) with automated model selection
  • Visualizations: Publication-ready plots with a custom colorblind-friendly theme
  • Network Analysis: Keyword co-occurrence networks to reveal research connections
  • CRAN Compliant: Rigorous testing, proper API key handling, and offline-safe tests
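
To make the text-processing step concrete, here is a minimal base R sketch of tokenization, stopword removal, and building a document-term matrix. It illustrates the general idea only and is not SportMiner's implementation: the sample documents, the tiny stopword list, and the tokenize() helper are all assumptions made for this example, and stemming is omitted because base R has no stemmer.

```r
# Illustrative documents (made up for this sketch).
docs <- c(
  doc_1 = "Training load and injury risk in football",
  doc_2 = "Machine learning for injury prediction",
  doc_3 = "Training load monitoring with machine learning"
)

# Tiny illustrative stopword list; real lists are much longer.
stopwords <- c("and", "in", "for", "with", "the", "of")

# Hypothetical helper: lowercase, split on non-letters, drop stopwords.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[nzchar(tokens) & !tokens %in% stopwords]
}
tokens <- lapply(docs, tokenize)

# Document-term matrix: one row per document, one column per term.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(tk) as.integer(table(factor(tk, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab
print(dtm)
```

Each cell counts how often a term appears in a document; topic models such as LDA take a matrix of this shape as input.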

Installation

# From CRAN
install.packages("SportMiner")

# Development version from GitHub
devtools::install_github("praveenmaths89/SportMiner", subdir = "SportMiner")

Quick Start

library(SportMiner)

# 1. Set your Scopus API key
sm_set_api_key("your_key_here")

# 2. Search for papers
papers <- sm_search_scopus(
  query = 'TITLE-ABS-KEY("sport science" AND "machine learning")',
  max_count = 100
)

# 3. Preprocess text
processed <- sm_preprocess_text(papers)

# 4. Create document-term matrix
dtm <- sm_create_dtm(processed)

# 5. Find optimal number of topics
k_selection <- sm_select_optimal_k(dtm, k_range = seq(5, 20, by = 5))

# 6. Train topic model
lda_model <- sm_train_lda(dtm, k = k_selection$optimal_k)

# 7. Visualize results
sm_plot_topic_terms(lda_model, n_terms = 10)
sm_plot_topic_frequency(lda_model, dtm)

# 8. Create keyword network
sm_keyword_network(papers, min_cooccurrence = 2)
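
As a rough illustration of what a keyword co-occurrence threshold means, the base R sketch below counts how often pairs of keywords appear in the same paper and keeps only pairs at or above a minimum count. The keyword lists are invented, and how sm_keyword_network() computes its network internally is not documented here, so treat this as an assumption about the general technique.

```r
# Hypothetical author keywords for four papers (illustrative data only).
keywords <- list(
  c("injury", "machine learning", "football"),
  c("machine learning", "football"),
  c("injury", "rehabilitation"),
  c("machine learning", "injury", "football")
)

# Binary paper-by-keyword incidence matrix.
vocab <- sort(unique(unlist(keywords)))
incidence <- t(vapply(
  keywords,
  function(k) as.integer(vocab %in% k),
  integer(length(vocab))
))
colnames(incidence) <- vocab

# crossprod() yields keyword-by-keyword co-occurrence counts; the diagonal
# holds each keyword's own frequency, so set it to zero.
cooc <- crossprod(incidence)
diag(cooc) <- 0

# Keep each unordered pair once (upper triangle) if it meets the threshold.
min_cooccurrence <- 2
idx <- which(upper.tri(cooc) & cooc >= min_cooccurrence, arr.ind = TRUE)
edges <- data.frame(
  from   = vocab[idx[, "row"]],
  to     = vocab[idx[, "col"]],
  weight = cooc[idx],
  stringsAsFactors = FALSE
)
print(edges)
```

Raising the threshold prunes weak links, which usually makes a dense network easier to read.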

Advanced Usage

Compare Multiple Models

# Compare LDA, STM, and CTM
comparison <- sm_compare_models(dtm, k = 10)

# View metrics
print(comparison$metrics)
#>   model coherence exclusivity combined_score
#> 1   LDA     0.542       0.678          0.321
#> 2   STM     0.589       0.712          0.854
#> 3   CTM     0.521       0.645         -0.175

# Recommendation
print(comparison$recommendation)
#> [1] "STM"

Topic Trends Over Time

papers$doc_id <- paste0("doc_", seq_len(nrow(papers)))

sm_plot_topic_trends(
  model = lda_model,
  dtm = dtm,
  metadata = papers,
  year_filter = 2015:2025
)
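
The doc_id column gives topic assignments and paper metadata a shared key. The base R sketch below shows one plausible way such a join could feed a per-year topic share; the data frames and the dominant-topic assignment are invented for illustration, and this is an assumption about the general approach, not a description of what sm_plot_topic_trends() does internally.

```r
# Illustrative data only: dominant topic per document and publication year.
assignments <- data.frame(
  doc_id = paste0("doc_", 1:6),
  topic  = c(1, 2, 1, 2, 2, 1)
)
metadata <- data.frame(
  doc_id = paste0("doc_", 1:6),
  year   = c(2019, 2019, 2020, 2020, 2020, 2021)
)

# Join on the shared key, then count documents per year and topic.
joined <- merge(assignments, metadata, by = "doc_id")
counts <- table(year = joined$year, topic = joined$topic)

# Row-wise proportions: the share of each year's papers in each topic.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```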

Custom Visualizations

library(ggplot2)

# All plots use theme_sportminer() by default
p <- sm_plot_topic_frequency(lda_model, dtm)

# Customize further
p + labs(
  title = "Your Custom Title",
  subtitle = "Based on N papers"
) + theme_sportminer(base_size = 14, grid = FALSE)

Getting Your Scopus API Key

  1. Visit Elsevier Developer Portal
  2. Create an account or log in
  3. Navigate to "API Keys" and create a new key
  4. Add to your .Renviron file:
usethis::edit_r_environ()
# Add this line:
# SCOPUS_API_KEY=your_key_here
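
After restarting R, you can check that the variable is visible to the session. The sketch below uses only base R; get_scopus_key() is a hypothetical helper written for this example and is not part of SportMiner.

```r
# Hypothetical helper: read the key from the environment and fail early
# with a clear message if it is missing.
get_scopus_key <- function(var = "SCOPUS_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(
      "Environment variable ", var, " is not set; ",
      "restart R after editing .Renviron.",
      call. = FALSE
    )
  }
  key
}

# With SportMiner installed, the key could then be passed on, for example:
# sm_set_api_key(get_scopus_key())
```

Reading the key from the environment keeps it out of scripts that may be shared or committed to version control.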

Documentation

See the package vignette for detailed usage:

vignette("getting-started", package = "SportMiner")

Design Philosophy

CRAN Compliance

SportMiner adheres to strict CRAN standards:

  • No hardcoded keys: API keys via environment variables
  • Graceful failures: All API calls wrapped in tryCatch()
  • Proper messaging: Uses message() and warning(), not cat() or print()
  • Global variables: Uses .data pronoun from rlang to avoid R CMD check NOTEs
  • Offline tests: Mock API responses via httptest for CRAN checks
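
The graceful-failure and messaging points can be sketched in base R as follows. safe_call() is a hypothetical wrapper written for this example, not an exported SportMiner function, and the failing request is simulated, so no network access is involved.

```r
# Hypothetical wrapper: run a function, and on error emit a warning and
# return NULL instead of aborting the caller.
safe_call <- function(expr_fun, what = "API request") {
  tryCatch(
    expr_fun(),
    error = function(e) {
      # Use the condition system (warning), not cat()/print(), so callers
      # can silence or capture the report.
      warning(what, " failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Simulated failing request.
failed <- suppressWarnings(
  safe_call(function() stop("HTTP 401 Unauthorized"))
)

# A call that succeeds passes its value through unchanged.
ok <- safe_call(function() 42)
```

Because the failure is reported with warning(), callers can suppress it with suppressWarnings() or handle it with withCallingHandlers(), which is not possible for output written with cat().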

Visualization Standards

All plots use theme_sportminer(), which provides:

  • Clean, minimalist aesthetic
  • Colorblind-friendly palettes
  • Publication-ready fonts and spacing
  • Consistent styling across all functions

Bug Reports

For bug reports and feature requests, please contact the package maintainer.

Citation

If you use SportMiner in your research, please cite:

citation("SportMiner")

Related Packages

  • rscopus: Scopus API client (used by SportMiner)
  • topicmodels: LDA and CTM algorithms
  • stm: Structural Topic Models
  • tidytext: Text mining framework

License

MIT © 2026 Praveen D Chougale and Usha Ananthakumar

Acknowledgments

This package builds on the excellent work of:

  • The rscopus team for Scopus API access
  • The topicmodels and stm developers for topic modeling algorithms
  • The tidytext team for text mining infrastructure
Metadata

Version

0.1.0

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows