Codon Usage Bias Analysis.
Description
A suite of functions for rapid and flexible analysis of codon usage bias. It provides in-depth analysis at the codon level, including relative synonymous codon usage (RSCU), tRNA weight calculations, machine learning predictions of optimal or preferred codons, and visualization of codon-anticodon pairing. It can also calculate various gene-specific codon indices, such as the codon adaptation index (CAI), effective number of codons (ENC), fraction of optimal codons (Fop), tRNA adaptation index (tAI), mean codon stabilization coefficients (CSCg), and GC content (GC, GC3s, GC4d). It supports both the standard and non-standard genetic code tables cataloged by NCBI, as well as custom genetic code tables.
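For orientation, the classic definitions behind three of these indices are sketched below. Here X_ij is the count of codon j for amino acid i, n_i is the number of synonymous codons for amino acid i, the relative adaptiveness w_ij is computed from a reference set of highly expressed genes, and the CAI product runs over the L codons of a gene (Sharp and Li, 1987). These are textbook forms given as background; cubar's exact conventions (for example, which codons are excluded or how zero counts are handled) may differ and are documented in its "Theories behind cubar" tutorial.

```latex
\begin{aligned}
\mathrm{RSCU}_{ij} &= \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}, \qquad
w_{ij} = \frac{\mathrm{RSCU}_{ij}}{\max_{j'} \mathrm{RSCU}_{ij'}}, \\
\mathrm{CAI} &= \Bigl(\prod_{k=1}^{L} w_k\Bigr)^{1/L}
  = \exp\Bigl(\frac{1}{L}\sum_{k=1}^{L} \ln w_k\Bigr), \\
\mathrm{Fop} &= \frac{N_{\text{optimal}}}{N_{\text{optimal}} + N_{\text{non-optimal}}}
\end{aligned}
```

An RSCU of 1 means a codon is used exactly as often as expected under uniform synonymous usage; values above 1 indicate preference. Fop counts only codons of amino acids for which optimal codons have been defined.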
README.md
cubar
Overview
cubar is a package for codon usage bias analysis in R. Main features are as follows:
- Codon level analyses
  - Calculate tRNA weights;
  - Calculate relative synonymous codon usage (RSCU);
  - Machine learning-based inference of optimal codons;
  - Visualization of codon-anticodon pairing relationships;
- Gene level analyses
  - Tabulate codon frequency of each coding sequence;
  - Measure codon usage similarity to highly expressed genes with the Codon Adaptation Index (CAI);
  - Quantify the influence of codon usage on mRNA stability with Mean Codon Stabilization Coefficients (CSCg);
  - Measure codon usage bias with the nonparametric index Effective Number of Codons (ENC);
  - Measure the fraction of pre-determined optimal codons (Fop) in each sequence;
  - Calculate overall GC content (GC), or that of 3rd synonymous positions (GC3s) or 4-fold degenerate sites (GC4d);
  - Quantify whether codon usage matches tRNA availability using the tRNA Adaptation Index (tAI);
- Utilities
  - Sliding window analysis of codon usage within a coding sequence;
  - Optimize codon usage based on optimal codons for heterologous expression;
  - Test differential usage of codons between two sets of sequences;
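To make the GC and GC3s definitions concrete, here is a minimal base-R sketch for a single coding sequence under the standard genetic code. It is illustrative only and does not call cubar; the function names gc_content() and gc3s() are hypothetical, and cubar's own GC functions may handle ambiguous bases, incomplete codons, or non-standard codes differently.

```r
# Hypothetical helper: overall GC content of a sequence (fraction of G or C)
gc_content <- function(cds) {
  bases <- strsplit(toupper(cds), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Hypothetical helper: GC content at 3rd positions of synonymous codons (GC3s)
gc3s <- function(cds) {
  cds <- toupper(cds)
  n_codons <- nchar(cds) %/% 3
  starts <- seq(1, by = 3, length.out = n_codons)
  codons <- substring(cds, starts, starts + 2)
  # Assumes the standard genetic code: drop Met (ATG) and Trp (TGG), which
  # have no synonymous alternatives, and the three stop codons
  excluded <- c("ATG", "TGG", "TAA", "TAG", "TGA")
  codons <- codons[!codons %in% excluded]
  third <- substr(codons, 3, 3)
  mean(third %in% c("G", "C"))
}

cds <- "ATGGCTGCCAAAGGGTTCTGGTAA"
gc_content(cds)  # 12 of 24 bases are G/C: 0.5
gc3s(cds)        # GCT GCC AAA GGG TTC -> third bases T C A G C: 0.6
```

Excluding Met, Trp, and stop codons matters because their third positions are not free to vary synonymously, so including them would mix selection on amino acid identity into a measure meant to capture synonymous bias.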
Main advantages of cubar are as follows:
- Process large datasets (>100,000 sequences) efficiently using the Biostrings and data.table backends;
- Support genetic codes cataloged by NCBI as well as custom ones;
- Integrate with other data analysis or bioinformatics packages in the R ecosystem;
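As an illustration of why these backends help with large inputs, Biostrings can tabulate codons for every sequence in a DNAStringSet with a single vectorized call, using trinucleotideFrequency() with step = 3 so that counting follows the reading frame starting at the first base. This is a sketch of the idea, not necessarily how cubar counts codons internally, and the two toy sequences are made up for the example.

```r
library(Biostrings)

# Two toy coding sequences (made up for illustration)
cds <- DNAStringSet(c("ATGGCTGCCAAATAA", "ATGGCAGCAAAGTGA"))

# Count non-overlapping trinucleotides starting at position 1, i.e. codons
# in frame 1; the result is a matrix with one row per sequence and one
# column per codon (64 columns, labeled "AAA", "AAC", ...)
counts <- trinucleotideFrequency(cds, step = 3)

dim(counts)                       # 2 sequences x 64 codons
counts[, c("GCT", "GCC", "GCA")]  # alanine codon counts per sequence
```

Because the result is a plain count matrix, it can be converted to a data.table or summed across rows for downstream per-codon statistics.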
Dependencies
Depends
R (>= 4.1.0)
Imports
Biostrings (>= 2.60.0), IRanges (>= 2.34.0), data.table (>= 1.14.0), ggplot2 (>= 3.3.5), rlang (>= 0.4.11)
Installation
The latest release of cubar can be installed with:
install.packages("cubar")
The latest development version of cubar can be installed with:
devtools::install_github("mt1022/cubar", dependencies = TRUE)
Usage
Documentation can be found within R by typing ?function_name. The following tutorials are available from our website:
- Get Started: A brief introduction demonstrating the basic usage of cubar;
- Non-standard Genetic Code: How to use cubar with non-standard genetic codes;
- Theories behind cubar: The mathematical details behind the core functions in cubar;
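As background for the kind of derivation such a tutorial covers, the classic effective number of codons of Wright (1990) for the standard genetic code is shown below. For an amino acid a observed N_a times, with p_aj the frequency of its j-th synonymous codon, the homozygosity F_a is computed and then averaged within each degeneracy class (2-, 3-, 4-, and 6-fold); the constants count the amino acids in each class, plus 2 for Met and Trp. cubar's own ENC calculation may use a refined variant, so treat this as textbook context rather than a description of its implementation.

```latex
\begin{aligned}
F_a &= \frac{N_a \sum_{j} p_{aj}^{2} - 1}{N_a - 1}, \qquad p_{aj} = \frac{x_{aj}}{N_a}, \\
\mathrm{ENC} &= 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}
\end{aligned}
```

With perfectly uniform synonymous usage each F̄_k equals 1/k, giving ENC = 2 + 18 + 3 + 20 + 18 = 61; with only one codon used per amino acid every F̄_k equals 1 and ENC = 20.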
Getting help
Please use GitHub issues for bug reports, questions, and feature requests.
Suggests
- Biostrings for sequence input/output and manipulation;
- Peptides for peptide- or protein-related indices;
Acknowledgements
GitHub Copilot was used to suggest code snippets in the development of this package. Thanks to the GitHub Education teacher program for providing free access to GitHub Copilot.