Description

Metabolic Pathway Completeness and Abundance Calculation.

Provides tools for analyzing metabolic pathway completeness, abundance, and transcripts using KEGG Orthology (KO) data from (meta)genomic and (meta)transcriptomic studies. Supports both completeness (presence/absence) and abundance-weighted analyses. Includes built-in KEGG reference datasets. For more details see Li et al. (2023) <doi:10.1038/s41467-023-42193-7>.


mclink: Metabolic Pathway Completeness and Abundance Analysis

Overview

mclink provides comprehensive tools for analyzing metabolic pathway completeness and abundance using KEGG Orthology (KO) data from (meta)genomic and (meta)transcriptomic studies. Key features include:

  • Dual analysis modes: Completeness (presence/absence) and abundance-weighted scoring
  • Flexible input: Works with built-in KEGG references or custom datasets
  • Smart KO handling: Specialized methods for plus-separated (subunits) and comma-separated (isoforms) KOs
  • Publication-ready outputs: Pathway coverage metrics and detailed KO detection reports

Description

Metabolic pathway coverage is calculated from the abundance or presence of KOs in a given module, following the KEGG Module Definition. The KOs of a module are first divided into distinct steps. The coverage of each step is calculated separately, and the step values are then summarized as the coverage of the module. Spaces or plus signs connecting KO numbers are interpreted as AND operators, while commas are interpreted as OR operators. For instance, to calculate the completeness of module M00020, “K00058 K00831 (K01079, K02203, K22305)”:

  • 1: Convert the abundance table of KOs into a 0-1 matrix.
  • 2: Consider K00058 as step 1, K00831 as step 2, and (K01079, K02203, K22305) as step 3.
  • 3: Calculate the maximum (or minimum, or mean) value of step 3 (K01079, K02203, K22305). If the presence of any KO indicates completeness, use the maximum; if all KOs must be present for completeness, use the minimum. For a moderate approach, use the mean.
  • 4: Use this value along with the values of step 1 and step 2 to calculate the mean value, representing the completeness of the module M00020.
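
The four steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the abundance values and the sample names (`sample_A`, `sample_B`) are hypothetical, and step 3 is collapsed with `max`.

```r
# Hypothetical KO abundance table: rows are KOs, columns are samples.
ko_abundance <- data.frame(
  sample_A = c(12.5, 3.1, 0, 7.8, 0),
  sample_B = c(4.2, 0, 0, 0, 0),
  row.names = c("K00058", "K00831", "K01079", "K02203", "K22305")
)

# Step 1: convert abundances to a 0-1 (presence/absence) matrix.
ko_presence <- (as.matrix(ko_abundance) > 0) * 1

# Step 2: K00058 and K00831 are single-KO steps.
step1 <- ko_presence["K00058", ]
step2 <- ko_presence["K00831", ]

# Step 3: the comma-separated group is collapsed with max here
# (swap in min or mean for a stricter or moderate score).
step3 <- apply(
  ko_presence[c("K01079", "K02203", "K22305"), , drop = FALSE],
  2, max
)

# Step 4: module completeness is the mean over the three steps.
m00020 <- colMeans(rbind(step1, step2, step3))
print(m00020)
```

With these toy values, `sample_A` covers all three steps, while `sample_B` covers only step 1 and scores one third.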

Key Features

Analysis Modes

  • Completeness analysis: Binary presence/absence scoring
  • Abundance analysis: Weighted by KO abundance levels

Specialized KO Handling: Plus-separated KOs (K1+K2+...)

Scaling Method | Description
--- | ---
mean (default) | Moderate approach - calculates average value of all components
min (conservative) | Strict requirement - uses lowest value (all components must be present)
max (liberal) | Lenient approach - uses highest value (any component indicates completeness)

Typical Use Cases: Protein complexes, Enzyme subunits
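
To make the difference between the three scaling methods concrete, here is a small base R sketch for one plus-separated complex. The presence values are hypothetical, and the vector `plus_scores` is an illustrative name, not part of the mclink API.

```r
# Hypothetical presence values (1 = detected, 0 = missing) for the four
# subunits of one plus-separated complex in a single genome.
subunits <- c(K00169 = 1, K00170 = 1, K00171 = 0, K00172 = 1)

plus_scores <- c(
  mean = mean(subunits),  # moderate: fraction of subunits detected
  min  = min(subunits),   # conservative: 0 unless every subunit is present
  max  = max(subunits)    # liberal: 1 as soon as any subunit is present
)
print(plus_scores)
```

With one of four subunits missing, the complex scores 0.75 under `mean`, 0 under `min`, and 1 under `max`.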

Specialized KO Handling: Comma-separated KOs (K1,K2,...)

Scaling Method | Description
--- | ---
max (default) | Completeness assessment - any component indicates functional pathway
sum (abundance analysis) | Abundance quantification - sums all functionally equivalent variants

Typical Use Cases: Gene isoforms, Alternative pathways
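
The two comma-separated scaling methods answer different questions, which a short base R sketch may help show. The abundance values are hypothetical, and the variable names are illustrative rather than taken from the package.

```r
# Hypothetical abundances for three functionally equivalent KOs
# that form one comma-separated group.
variants <- c(K01676 = 0, K01679 = 12.4, K01677 = 3.6)

# Completeness: binarize first, then take the max (is any variant present?).
completeness_score <- max(as.numeric(variants > 0))

# Abundance: sum the variants, since each one contributes to the same function.
abundance_score <- sum(variants)

print(c(completeness = completeness_score, abundance = abundance_score))
```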

Output Options

  • R list: Coverage and KO detection results as dataframes, along with a log string recording the analysis process.
  • File exports: Coverage and KO detection results in TSV format, along with a log file recording the analysis process.
  • Pathway-split exports: Optional per-pathway TSV files (if enabled).
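
For readers who want to post-process the R list themselves, the sketch below writes a coverage data frame to TSV and then splits it per module with base R. It does not reproduce mclink's own export code: the data frame, the output directory, and the file names are all hypothetical.

```r
# Hypothetical coverage table standing in for a coverage result.
coverage <- data.frame(
  Module_Entry = c("M00165", "M00173"),
  sample_A = c(0.45, 0.50),
  stringsAsFactors = FALSE
)

# Write into a throwaway directory under the session temp dir.
out_dir <- file.path(tempdir(), "mclink_export_demo")
dir.create(out_dir, showWarnings = FALSE)

# One combined TSV file.
write.table(coverage, file.path(out_dir, "coverage.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# One TSV file per module (assumed naming scheme: <module>_coverage.tsv).
for (module in unique(coverage$Module_Entry)) {
  write.table(coverage[coverage$Module_Entry == module, ],
              file.path(out_dir, paste0(module, "_coverage.tsv")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

exported <- list.files(out_dir)
print(exported)
```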

Installation

# Install from CRAN
install.packages("mclink")

# Install from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("LiuyangLee/mclink")

Quick Start

library(mclink)
# Using built-in datasets
data(KO_pathway_ref)
data(KO_Sample_wide)
# Analyze selected pathways
selected_modules <- c("M00176", "M00165", "M00173", "M00374")
results <- mclink(
  ref = KO_pathway_ref[KO_pathway_ref$Module_Entry %in% selected_modules, ],
  data = KO_Sample_wide,
  table_feature = "completeness"
)
# Access results
head(results$coverage)      # Pathway coverage metrics
head(results$detected_KOs)  # Detected KOs per pathway
head(results$log)           # log

Example for Input and Output Data

1 Input Data Preview (R dataframes KO_Sample_wide and KO_pathway_ref)

1.1 Genome/Sample KO abundance/transcript/prevalence dataframe (e.g., head(KO_Sample_wide))

KO | Marinobacter_salarius | Pseudooceanicola_nanhaiensis | Alteromonas_australica | Henriciella_pelagia
--- | --- | --- | --- | ---
K00001 | 61.44954 | 16.92329 | 6.854643 | 9.592472
K00002 | 0.00000 | 0.00000 | 5.983655 | 0.000000
K00014 | 49.84410 | 20.23131 | 17.343312 | 24.083303
K00015 | 0.00000 | 27.09820 | 0.000000 | 0.000000
K00018 | 43.10113 | 0.00000 | 19.115001 | 3.344125
K00019 | 49.84410 | 41.88975 | 13.879528 | 9.634129
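
Before running an analysis on a custom table, it can help to confirm that it has the same shape as this preview. The base R sketch below rebuilds the six preview rows by hand and runs a few sanity checks. The checks and the object names (`ko_sample`, `kos_per_genome`) are assumptions for illustration, not requirements documented by the package.

```r
# The six preview rows, typed in by hand: one KO column plus one numeric
# column per genome/sample.
ko_sample <- data.frame(
  KO = c("K00001", "K00002", "K00014", "K00015", "K00018", "K00019"),
  Marinobacter_salarius        = c(61.44954, 0, 49.84410, 0, 43.10113, 49.84410),
  Pseudooceanicola_nanhaiensis = c(16.92329, 0, 20.23131, 27.09820, 0, 41.88975),
  Alteromonas_australica       = c(6.854643, 5.983655, 17.343312, 0, 19.115001, 13.879528),
  Henriciella_pelagia          = c(9.592472, 0, 24.083303, 0, 3.344125, 9.634129),
  stringsAsFactors = FALSE
)

# Assumed sanity checks: well-formed, unique KO identifiers and
# non-negative numeric values in every sample column.
stopifnot(all(grepl("^K[0-9]{5}$", ko_sample$KO)))
stopifnot(!anyDuplicated(ko_sample$KO))
stopifnot(all(vapply(ko_sample[-1], is.numeric, logical(1))))
stopifnot(all(as.matrix(ko_sample[-1]) >= 0))

# Number of KOs detected (value > 0) in each genome.
kos_per_genome <- colSums(as.matrix(ko_sample[-1]) > 0)
print(kos_per_genome)
```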

1.2 (Optional) Pathway information from built-in or user-defined dataframe (e.g., head(KO_pathway_ref))

Orthology_Entry | Module_Type | Level_2 | Level_3 | Module_Entry | Module_Name | Definition | Orthology_Symbol | Orthology_Name | KO_Symbol
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
K00855 | Pathway modules | Energy metabolism | Carbon fixation | M00165 | Reductive pentose phosphate cycle (Calvin cycle) | K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783 | PRK, prkB | phosphoribulokinase [EC:2.7.1.19] | K00855; PRK, prkB
K01601 | Pathway modules | Energy metabolism | Carbon fixation | M00165 | Reductive pentose phosphate cycle (Calvin cycle) | K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783 | rbcL, cbbL | ribulose-bisphosphate carboxylase large chain [EC:4.1.1.39] | K01601; rbcL, cbbL
K01602 | Pathway modules | Energy metabolism | Carbon fixation | M00165 | Reductive pentose phosphate cycle (Calvin cycle) | K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783 | rbcS, cbbS | ribulose-bisphosphate carboxylase small chain [EC:4.1.1.39] | K01602; rbcS, cbbS

2 Output Data Preview (R list mc_list)

2.1 Pathway Coverage Matrix for Genomes or Samples (mc_list$coverage)

Module_Entry | Level_2 | Level_3 | Pathway Name | Definition (Key KOs) | Marinobacter salarius | Pseudooceanicola nanhaiensis | Alteromonas australica | Henriciella pelagia
--- | --- | --- | --- | --- | --- | --- | --- | ---
M00165 | Energy metabolism | Carbon fixation | Reductive pentose phosphate cycle (Calvin cycle) | K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783 | 0.4545455 | 0.6363636 | 0.6363636 | 0.6363636
M00173 | Energy metabolism | Carbon fixation | Reductive citrate cycle (Arnon-Buchanan cycle) | (K00169+K00170+K00171+K00172,K03737) ((K01007,K01006) K01595,K01959+K01960,K01958) K00024 (K01676,K01679,K01677+K01678) (K00239+K00240-K00241-K00242,K18556+K18557+K18558+K18559+K18560) (K01902+K01903) (K00174+K00175-K00177-K00176) K00031 (K01681,K27802,K01682) (K15230+K15231,K15232+K15233 K15234) | 0.5000000 | 0.7000000 | 0.5000000 | 0.5000000
M00376 | Energy metabolism | Carbon fixation | 3-Hydroxypropionate bi-cycle | (K02160+K01961+K01962+K01963) K14468 K14469 K15052 K05606 (K01847,K01848+K01849) (K14471+K14472) (K00239+K00240+K00241) K01679 | 0.2222222 | 0.5555556 | 0.0000000 | 0.3333333
M00375 | Energy metabolism | Carbon fixation | Hydroxypropionate-hydroxybutylate cycle | (K01964+K15037+K15036) K15017 K15039 K15018 K15019 K15020 K05606 (K01848+K01849) (K15038,K15017) K14465 (K14466,K18861,K25774) K14534 K15016 K00626 | 0.0000000 | 0.1428571 | 0.0000000 | 0.1428571
M00374 | Energy metabolism | Carbon fixation | Dicarboxylate-hydroxybutyrate cycle | (K00169+K00170+K00171+K00172) K01007 K01595 K00024 (K01677+K01678) (K00239+K00240-K00241-K18860) (K01902+K01903) (K15038,K15017) K14465 (K14467,K18861,K25774) K14534 K15016 K00626 | 0.2307692 | 0.3076923 | 0.2307692 | 0.3846154
M00377 | Energy metabolism | Carbon fixation | Reductive acetyl-CoA pathway (Wood-Ljungdahl) | K00198 (K05299-K15022,K22015+K25123+K25124) K01938 K01491-K01500 K00297-K25007-K25008 K15023 K14138+K00197+K00194 | 0.4285714 | 0.4285714 | 0.2857143 | 0.2857143

Interpretation Guide

  • Autotrophic potential: Genomes with >80% completeness in carbon fixation pathways may represent autotrophs. None of the tested genomes met the 80% completeness threshold for canonical carbon fixation pathways, suggesting they likely utilize heterotrophic metabolism.
  • Verification: Always check presence of pathway-specific marker genes (e.g., RuBisCO for Calvin cycle, aclAB for rTCA cycle)
  • Threshold adjustment: Some pathways have alternative enzyme implementations (e.g., Form II RuBisCO) requiring lower completeness thresholds
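
The 80% screen described in this guide can be applied to the coverage values shown in section 2.1 with a few lines of base R. This is a sketch, not a package function: the coverage values are retyped by hand from that table, and the object names (`coverage`, `best_coverage`, `candidates`) are illustrative.

```r
# Coverage values copied from the section 2.1 preview table.
coverage <- data.frame(
  Module_Entry = c("M00165", "M00173", "M00376", "M00375", "M00374", "M00377"),
  Marinobacter_salarius        = c(0.4545455, 0.5, 0.2222222, 0.0000000, 0.2307692, 0.4285714),
  Pseudooceanicola_nanhaiensis = c(0.6363636, 0.7, 0.5555556, 0.1428571, 0.3076923, 0.4285714),
  Alteromonas_australica       = c(0.6363636, 0.5, 0.0000000, 0.0000000, 0.2307692, 0.2857143),
  Henriciella_pelagia          = c(0.6363636, 0.5, 0.3333333, 0.1428571, 0.3846154, 0.2857143),
  stringsAsFactors = FALSE
)

threshold <- 0.8

# Best-covered carbon fixation module per genome, and its coverage.
best_coverage <- vapply(coverage[-1], max, numeric(1))
best_module <- coverage$Module_Entry[vapply(coverage[-1], which.max, integer(1))]

# Genomes whose best module exceeds the threshold (none in this example).
candidates <- names(best_coverage)[best_coverage > threshold]

print(data.frame(genome = names(best_coverage), best_module, best_coverage,
                 row.names = NULL))
print(length(candidates))
```

The highest value in the table is 0.7 (M00173 in Pseudooceanicola nanhaiensis), so no genome passes the 0.8 cutoff. A hit from such a screen would still need the marker-gene check described in the guide.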

2.2 Detected KOs per Genome/Sample (mc_list$detected_KOs)

Module_Entry | Pathway Name | Marinobacter salarius | Pseudooceanicola nanhaiensis | Alteromonas australica | Henriciella pelagia
--- | --- | --- | --- | --- | ---
M00165 | Calvin cycle | K00615 K00855 K01783 K01803 K01807 | K00134 K00615 K01623 K01783 K01803 K01808... | K00134 K00615 K00855 K01623 K01783 K01803 | K00615 K01623 K01783 K01803...
M00173 | Arnon-Buchanan cycle | K00031 K00239-K00242 K01007 K01595 K01676... | K00024 K00239-K00242 K01006 K01679 K01902... | K00024 K00239 K00241 K01007 K01595 K01676 | K00024 K00239-K00242 K01006...
M00376 | 3-Hydroxypropionate bi-cycle | K00239-K00241 K01847 K01962-K01963 K02160 | K00239-K00241 K01679 K01847 K01961-K01963... | K00239 K00241 K01962-K01963 K02160 | K00239-K00241 K01847 K01962...
M00375 | Hydroxypropionate-hydroxybutylate cycle | K01848 | K00626 K05606 | - | K00626 K05606
M00374 | Dicarboxylate-hydroxybutyrate cycle | K00239-K00241 K01007 K01595 K01902 | K00024 K00239-K00241 K00626 K01902-K01903 | K00024 K00239 K00241 K01007 K01595 K01902 | K00024 K00239-K00241 K00626...
M00377 | Wood-Ljungdahl pathway | K00297 K01491 K01938 K22015 | K00297 K01491 K01938 | K00297 K01491 | K00297 K01491
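
As a cross-check, the M00165 coverage values in section 2.1 can be recomputed from the detected KOs listed here. The base R sketch below does this for the two genomes whose M00165 lists are not truncated with "...". It assumes the scoring described earlier: minus KOs are dropped, each comma-separated group scores 1 if any member is detected (max), and the module score is the mean over its space-separated steps. The helper `module_coverage` is illustrative, not a package function.

```r
# Detected M00165 KOs for the two genomes with complete (untruncated) lists.
detected <- list(
  Marinobacter_salarius  = c("K00615", "K00855", "K01783", "K01803", "K01807"),
  Alteromonas_australica = c("K00134", "K00615", "K00855", "K01623", "K01783", "K01803")
)

# M00165 after omitting minus KOs: 11 space-separated steps, each holding
# its comma-separated alternatives.
steps <- list(
  "K00855", "K01601", "K00927",
  c("K05298", "K00150", "K00134"),
  "K01803",
  c("K01623", "K01624"),
  c("K03841", "K02446", "K11532", "K01086"),
  "K00615",
  c("K01100", "K11532", "K01086"),
  c("K01807", "K01808"),
  "K01783"
)

# A step scores 1 if any alternative is detected; the module is the mean.
module_coverage <- function(kos) {
  step_scores <- vapply(steps,
                        function(alts) as.numeric(any(alts %in% kos)),
                        numeric(1))
  mean(step_scores)
}

m00165 <- vapply(detected, module_coverage, numeric(1))
print(round(m00165, 7))
```

Marinobacter salarius covers 5 of the 11 steps and Alteromonas australica covers 7, which gives 0.4545455 and 0.6363636, the values reported in the coverage matrix.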

2.3 Process Log (mc_list$log)

[1] "[2025-08-21 17:35:12] mclink started!"
[2] "[2025-08-21 17:35:12] Input Sample-KO table type: completeness"
[3] "[2025-08-21 17:35:12] Scale method for plus: min"
[4] "[2025-08-21 17:35:12] Scale method for comma: max"
[5] "[2025-08-21 17:35:12] Pathway information dataframe successfully imported."
[6] "[2025-08-21 17:35:12] There are 7 Modules in the Pathway information dataframe: M00165 M00173 M00376 M00375 M00374 M00377 M00176"
[7] "[2025-08-21 17:35:12] Genome-KO File successfully imported."
[8] "[2025-08-21 17:35:12] There are 50 intersect KOs between the KO list and input dataframe: K00024 K00031 K00134 K00239 K00240 K00241 K00242 K00297 K00380 K00381 K00390 K00615 K00626 K00855 K00860 K00955 K00956 K00957 K00958 K01006 K01007 K01491 K01595 K01623 K01676 K01679 K01681 K01682 K01783 K01803 K01807 K01808 K01847 K01848 K01902 K01903 K01938 K01958 K01960 K01961 K01962 K01963 K02160 K02446 K03841 K05606 K08691 K09709 K11532 K22015"
[9] "[2025-08-21 17:35:12] Converting abundance table to completeness table..."
[10] "[2025-08-21 17:35:12] Starting Module: M00165"
[11] "[2025-08-21 17:35:12] Processing module steps: K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783"
[12] "[2025-08-21 17:35:12] After omitting minus KOs: K00855 K01601 K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783"
[13] "[2025-08-21 17:35:12] Nested steps include: "
[14] "[2025-08-21 17:35:12]     (K05298,K00150,K00134)"
[15] "[2025-08-21 17:35:12]     (K01623,K01624)"
[16] "[2025-08-21 17:35:12]     (K03841,K02446,K11532,K01086)"
[17] "[2025-08-21 17:35:12]     (K01100,K11532,K01086)"
[18] "[2025-08-21 17:35:12]     (K01807,K01808)"
[19] "[2025-08-21 17:35:12] Start processing nested steps..."
[20] "[2025-08-21 17:35:12] Analyzing M00165_1: (K05298,K00150,K00134)"
[21] "[2025-08-21 17:35:12] Bracket level: 1"
[22] "[2025-08-21 17:35:12]    Running KOs comma: M00165_1 = K05298"
[23] "[2025-08-21 17:35:12]    Running KOs comma: M00165_1 = K00150"
[24] "[2025-08-21 17:35:12]    Running KOs comma: M00165_1 = K00134"
[25] "[2025-08-21 17:35:12] Analyzing M00165_2: (K01623,K01624)"
[26] "[2025-08-21 17:35:12] Bracket level: 2"
[27] "[2025-08-21 17:35:12]    Running KOs comma: M00165_2 = K01623"
[28] "[2025-08-21 17:35:12]    Running KOs comma: M00165_2 = K01624"
[29] "[2025-08-21 17:35:12] Analyzing M00165_3: (K03841,K02446,K11532,K01086)"
[30] "[2025-08-21 17:35:12] Bracket level: 3"
[31] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K03841"
[32] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K02446"
[33] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K11532"
[34] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K01086"
[35] "[2025-08-21 17:35:12] Analyzing M00165_4: (K01100,K11532,K01086)"
[36] "[2025-08-21 17:35:12] Bracket level: 4"
[37] "[2025-08-21 17:35:12]    Running KOs comma: M00165_4 = K01100"
[38] "[2025-08-21 17:35:12]    Running KOs comma: M00165_4 = K11532"
[39] "[2025-08-21 17:35:12]    Running KOs comma: M00165_4 = K01086"
[40] "[2025-08-21 17:35:12] Analyzing M00165_5: (K01807,K01808)"
[41] "[2025-08-21 17:35:12] Bracket level: 5"
[42] "[2025-08-21 17:35:12]    Running KOs comma: M00165_5 = K01807"
[43] "[2025-08-21 17:35:12]    Running KOs comma: M00165_5 = K01808"
[44] "[2025-08-21 17:35:12] Start processing final step..."
[45] "[2025-08-21 17:35:12] Analyzing M00165: K00855 K01601 K00927 M00165_1 K01803 M00165_2 M00165_3 K00615 M00165_4 M00165_5 K01783"
[46] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K00855"
[47] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K01601"
[48] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K00927"
[49] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_1"
[50] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K01803"
[51] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_2"
[52] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_3"
[53] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K00615"
[54] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_4"
[55] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_5"
[56] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K01783"
[57] "[2025-08-21 17:35:12] Completed Module: M00165"
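
The definition handling recorded in this log (omitting minus KOs, listing nested steps, and substituting placeholders such as M00165_1) can be imitated with base R regular expressions. This is a sketch that reproduces the logged strings for M00165 only. It is not the package's parser, and it assumes a single level of brackets.

```r
definition <- paste(
  "K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803",
  "(K01623,K01624) (K03841,K02446,K11532,K01086) K00615",
  "(K01100,K11532,K01086) (K01807,K01808) K01783"
)

# Omit minus KOs, then drop brackets left around a single KO.
no_minus <- gsub("-K[0-9]{5}", "", definition)
no_minus <- gsub("\\((K[0-9]{5})\\)", "\\1", no_minus)

# Bracketed groups that contain no further brackets are the nested steps.
nested <- regmatches(no_minus, gregexpr("\\([^()]*\\)", no_minus))[[1]]

# Replace each nested step with a numbered placeholder to get the final step.
final_step <- no_minus
for (i in seq_along(nested)) {
  final_step <- sub(nested[i], paste0("M00165_", i), final_step, fixed = TRUE)
}

print(no_minus)
print(nested)
print(final_step)
```

For this module the three printed results match log entries [12], [14] to [18], and [45]. Definitions with brackets inside brackets, such as M00173, would need the extraction repeated from the innermost level outward.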

Documentation

Full function reference:

?mclink::mclink

Citation

If you use mclink in your research, please cite:

Li, L., Huang, D., Hu, Y., Rudling, N. M., Canniffe, D. P., Wang, F., & Wang, Y. "Globally distributed Myxococcota with photosynthesis gene clusters illuminate the origin and evolution of a potentially chimeric lifestyle." Nature Communications (2023), 14, 6450. https://doi.org/10.1038/s41467-023-42193-7

Dependencies

  • R (≥ 3.5)
  • data.table (≥ 1.17.0)
  • dplyr (≥ 1.1.4)
  • stringr (≥ 1.5.1)
  • tibble (≥ 3.2.1)

License

GPL-3 © Liuyang Li

Contact

  • Maintainer: Liuyang Li [email protected]
  • Bug reports: https://github.com/LiuyangLee/mclink/issues

Metadata

Version

1.1.1
