Description

Codon Usage Bias Analysis.

A suite of functions for rapid and flexible analysis of codon usage bias. It provides in-depth analysis at the codon level, including relative synonymous codon usage (RSCU), tRNA weight calculations, machine learning predictions for optimal or preferred codons, and visualization of codon-anticodon pairing. Additionally, it can calculate various gene-specific codon indices such as codon adaptation index (CAI), effective number of codons (ENC), fraction of optimal codons (Fop), tRNA adaptation index (tAI), mean codon stabilization coefficients (CSCg), and GC contents (GC/GC3s/GC4d). It also supports both standard and non-standard genetic code tables found in NCBI, as well as custom genetic code tables.
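
To make the RSCU definition concrete, here is a minimal base R sketch. It does not use cubar and is not cubar's implementation; the helper names `split_codons` and `rscu_sketch` and the two-family codon list are hypothetical and exist only for illustration. RSCU is taken here as the observed count of a codon divided by the mean count across its synonymous family, with no pseudocount.

```r
# Hypothetical helper (not part of cubar): split a coding sequence into codons.
# Assumes the sequence length is a multiple of 3 and at least 3.
split_codons <- function(cds) {
  starts <- seq(1, nchar(cds) - 2, by = 3)
  substring(cds, starts, starts + 2)
}

# Toy synonymous families from the standard genetic code (two amino acids only).
families <- list(
  Lys = c("AAA", "AAG"),
  Phe = c("TTT", "TTC")
)

# RSCU = observed codon count / mean count within the synonymous family.
# A family that is absent from the sequence yields NaN (0 / 0) in this sketch.
rscu_sketch <- function(cds, families) {
  codons <- split_codons(cds)
  per_family <- lapply(families, function(fam) {
    counts <- vapply(fam, function(cd) sum(codons == cd), numeric(1))
    counts / mean(counts)
  })
  unlist(unname(per_family))
}

rscu <- rscu_sketch("AAAAAGAAATTTTTC", families)
print(round(rscu, 3))
```

A value above 1 means the codon is used more often than expected under uniform usage within its family, and a value below 1 means it is used less often.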

cubar

Overview

cubar is a package for codon usage bias analysis in R. Main features are as follows:

Codon level analyses
- Calculate codon weights based on gene expression, tRNA availability, and mRNA stability;
- Calculate relative synonymous codon usage (RSCU);
- Machine learning-based inference of optimal codons;
- Visualization of codon-anticodon pairing relationships;
Gene level analyses
- Tabulate codon frequency of each coding sequence;
- Measure codon usage similarity to highly expressed genes with Codon Adaptation Index (CAI);
- Quantify the influence of codon usage on mRNA stability with Mean Codon Stabilization Coefficients (CSCg);
- Measure codon usage bias with the nonparametric index Effective number of codons (ENC);
- Measure the fraction of pre-determined optimal codons (Fop) in each sequence;
- Overall GC content (GC) or that of 3rd synonymous positions (GC3s) or 4-fold degenerate sites (GC4d);
- Quantify whether codon usage matches tRNA availability using tRNA Adaptation Index (tAI);
- Measure the deviation from proportionality (Dp) of viral synonymous codon usage from host tRNA supply;
Utilities
- Sliding window analysis of codon usage within a coding sequence;
- Optimize codon usage based on optimal codons for heterologous expression;
- Test differential usage of codons between two sets of sequences;
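
As an illustration of how a gene-level index such as CAI is put together, the following base R sketch computes the geometric mean of per-codon weights. It is independent of cubar and may differ from the package's own handling of edge cases; the function name `cai_sketch` and the weight values are hypothetical and made up for the example.

```r
# Hypothetical relative adaptiveness weights, invented for illustration.
# In practice such weights are derived from a set of highly expressed genes.
w <- c(AAA = 1.0, AAG = 0.5, TTT = 0.25, TTC = 1.0)

# CAI as the geometric mean of the weights of the codons in a sequence.
# Assumes the sequence length is a multiple of 3 and at least 3.
cai_sketch <- function(cds, w) {
  starts <- seq(1, nchar(cds) - 2, by = 3)
  codons <- substring(cds, starts, starts + 2)
  # Codons without a weight are skipped in this sketch.
  codons <- codons[codons %in% names(w)]
  exp(mean(log(w[codons])))
}

cai <- cai_sketch("AAAAAGTTT", w)
print(cai)
```

With the toy weights above, the three codons contribute 1, 0.5 and 0.25, so the geometric mean is 0.5.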

Main advantages of cubar are as follows:

Process large datasets (>100,000 sequences) efficiently using the Biostrings and data.table backends;
Support genetic codes cataloged by NCBI as well as custom ones;
Integrate with other data analysis or bioinformatic packages in the R ecosystem;
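
To show what a custom genetic code amounts to, the base R sketch below builds the standard code (NCBI translation table 1) as a named vector and derives the vertebrate mitochondrial code (NCBI table 2) by reassigning four codons. This does not use cubar's own interface, which is covered in the "Non-standard Genetic Code" tutorial; the names `standard`, `mito`, and `translate_sketch` are hypothetical.

```r
bases <- c("T", "C", "A", "G")
# Codons in NCBI order: first base varies slowest, third base fastest.
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)

# Standard genetic code (NCBI table 1) as a codon -> amino acid lookup.
aa <- "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
standard <- setNames(strsplit(aa, "")[[1]], codons)

# Vertebrate mitochondrial code (NCBI table 2) derived by reassignment.
mito <- standard
mito["TGA"] <- "W"             # stop -> Trp
mito["ATA"] <- "M"             # Ile  -> Met
mito[c("AGA", "AGG")] <- "*"   # Arg  -> stop

# Translate a coding sequence with a given lookup table.
# Assumes the sequence length is a multiple of 3 and at least 3.
translate_sketch <- function(cds, code) {
  starts <- seq(1, nchar(cds) - 2, by = 3)
  paste(code[substring(cds, starts, starts + 2)], collapse = "")
}

print(translate_sketch("ATGTGAATAAGA", standard))
print(translate_sketch("ATGTGAATAAGA", mito))
```

Because the synonymous families change with the code table, codon-level statistics such as RSCU depend on which table is in use.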

Dependencies

Depends

R (>= 4.1.0)

Imports

Biostrings (>= 2.60.0),
IRanges (>= 2.34.0),
data.table (>= 1.14.0),
ggplot2 (>= 3.3.5),
rlang (>= 0.4.11)

Installation

The latest release of cubar can be installed with:

install.packages("cubar")

The latest developmental version of cubar can be installed with:

devtools::install_github("mt1022/cubar", dependencies = TRUE)

Usage

Documentation can be found within R (by typing ?function_name). The following tutorials are available from our website:

Get Started: A brief introduction demonstrating the basic usage of cubar;
Non-standard Genetic Code: How to use cubar with non-standard genetic codes;
Theories behind cubar: The mathematical details behind the core functions in cubar;

Getting help

Please use GitHub issues for bug reports, questions, and feature requests.

Suggests

Biostrings for sequence input/output and manipulation;
Peptides for peptide- or protein-related indices;

Acknowledgements

GitHub Copilot was used to suggest code snippets in the development of this package. Thanks to the GitHub Education teacher program for providing free access to GitHub Copilot.
