Description

Predict Antimicrobial Peptides.

A toolkit to predict antimicrobial peptides from protein sequences on a genome-wide scale. It incorporates two support vector machine models ("precursor" and "mature") trained on publicly available antimicrobial peptide data using calculated physico-chemical and compositional sequence properties described in Meher et al. (2017) <doi:10.1038/srep42362>. In order to support genome-wide analyses, these models are designed to accept any type of protein as input and calculation of compositional properties has been optimised for high-throughput use. For best results it is important to select the model that accurately represents your sequence type: for full length proteins, it is recommended to use the default "precursor" model. The alternative, "mature", model is best suited for mature peptide sequences that represent the final antimicrobial peptide sequence after post-translational processing. For details see Fingerhut et al. (2020) <doi:10.1093/bioinformatics/btaa653>. The 'ampir' package is also available via a Shiny based GUI at <https://ampir.marine-omics.net/>.

Introduction to ampir

License: GPLv2

The ampir (short for antimicrobial peptide prediction in R) package was designed to be a fast and user-friendly method to predict antimicrobial peptides (AMPs) from protein datasets of any size. ampir uses a supervised statistical machine learning approach to predict AMPs. It incorporates two support vector machine classification models, “precursor” and “mature”, that have been trained on publicly available antimicrobial peptide data. The default “precursor” model is best suited for full-length proteins, and the “mature” model is best suited for small mature proteins (<60 amino acids). ampir also accepts custom (user-trained) models based on the caret package; see the ampir “How to train your model” vignette for details.

ampir’s associated paper is published in the journal Bioinformatics as btaa653 (Fingerhut et al. 2020, doi:10.1093/bioinformatics/btaa653). Please cite this paper if you use ampir in your research.

ampir is also available via a Shiny-based GUI at https://ampir.marine-omics.net/. There, users can submit protein sequences in FASTA format and have them classified by either the “precursor” or the “mature” model. The prediction results can then be downloaded as a CSV file.

Installation

You can install the released version of ampir from CRAN with:

install.packages("ampir")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("Legana/ampir")
library(ampir)

Usage

Standard input to ampir is a data.frame with sequence names in the first column and protein sequences in the second column.

Read a FASTA-formatted file into a data.frame with read_faa():

my_protein_df <- read_faa(system.file("extdata/little_test.fasta", package = "ampir"))
seq_name          seq_aa
G1P6H5_MYOLU      MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT…
L5L3D0_PTEAL      MKPLLIVFVFLIFWDPALAGLNPISSEMYKKCYGNGICRLECYTS…
A0A183U1F1_TOXCA  LLRLYSPLVMFATRRVLLCLLVIYLLAQPIHSSWLKKTYKKLENS…
Q5F4I1_DROPS      MNFYKIFIFVALILAISVGQSEAGWLKKLGKRLERVGQHTRDATI…
A7S075_NEMVE      MFLKVVVVLLAVELSVAQSARQRVRPLDRKAGRKRFAPIFPRQCS…
F1DFM9_9CNID      MKVLVILFGAMLVLMEFQKASAATLLEDFDDDDDLLDDGGDFDLE…
Q5XV93_ARATH      MSKREYERQLANEEDEQLRNFQAAVAARSAILHEPKEAALPPPAP…
Q2XXN9_POGBA      MRFLYLLFAVAFLFSVQAEDAELEQEQQGDPWEGLDEFQDQPPDD…
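The following minimal base-R sketch of a FASTA reader shows how a FASTA file maps onto this two-column data.frame without needing ampir installed. The function name read_fasta_sketch() is hypothetical, and this is not ampir's read_faa() implementation. It assumes that every header line starts with ">". It keeps the full header text, minus the ">", as the sequence name and joins wrapped sequence lines into one string.

```r
# Hypothetical helper: a minimal base-R FASTA reader, NOT ampir's read_faa().
read_fasta_sketch <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]                 # drop blank lines
  is_header <- startsWith(lines, ">")
  # Each header starts a new record; cumsum() gives every line its record id
  record_id <- cumsum(is_header)
  headers <- sub("^>", "", lines[is_header])
  # Join the (possibly wrapped) sequence lines belonging to each record;
  # lines before the first header get id 0, which is not a level and is dropped
  seq_aa <- vapply(
    split(lines[!is_header],
          factor(record_id[!is_header], levels = seq_along(headers))),
    paste, character(1), collapse = ""
  )
  data.frame(seq_name = headers, seq_aa = unname(seq_aa),
             stringsAsFactors = FALSE)
}

# Usage with a toy file (made-up sequences, not ampir's test data)
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">seqA", "MKLV", "LLAG", ">seqB", "GIGKFLKKAKKFGKAFVKILKK"), tmp)
toy_df <- read_fasta_sketch(tmp)
toy_df
```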

Calculate the probability that each protein is an antimicrobial peptide with predict_amps(). Because these proteins are all full-length precursors rather than mature peptides, we use ampir’s built-in "precursor" model:

Note that sequences shorter than 10 amino acids, or containing anything other than the 20 standard amino acids, are not evaluated and receive NA as their prob_AMP value.
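The evaluation rule above can be sketched in base R as follows. is_evaluable() is a hypothetical helper and not part of ampir. It uses the 10-residue cutoff stated above and assumes upper-case one-letter codes, so lower-case letters count as non-standard in this sketch.

```r
# Hypothetical helper mirroring the stated rule; not ampir's internal code.
standard_aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
                 "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

is_evaluable <- function(seq_aa, min_len = 10) {
  # TRUE only if every character is one of the 20 standard (upper-case) codes
  only_standard <- grepl(paste0("^[", paste(standard_aa, collapse = ""), "]+$"),
                         seq_aa)
  only_standard & nchar(seq_aa) >= min_len
}

# Toy sequences, not real ampir input
toy_seqs <- c("GIGKFLKKAKKFGKAFVKILKK", "MKLVX", "MKPLLIVFVFLIFWDPALAG", "GIGKF")
evaluable <- is_evaluable(toy_seqs)
evaluable
```

For these four toy sequences the result is TRUE, FALSE, TRUE, FALSE. The second sequence contains X and the fourth is only five residues long.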

my_prediction <- predict_amps(my_protein_df, model = "precursor")
seq_name          seq_aa                                          prob_AMP
G1P6H5_MYOLU      MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT…  0.612
L5L3D0_PTEAL      MKPLLIVFVFLIFWDPALAGLNPISSEMYKKCYGNGICRLECYTS…  0.945
A0A183U1F1_TOXCA  LLRLYSPLVMFATRRVLLCLLVIYLLAQPIHSSWLKKTYKKLENS…  0.088
Q5F4I1_DROPS      MNFYKIFIFVALILAISVGQSEAGWLKKLGKRLERVGQHTRDATI…  0.998
A7S075_NEMVE      MFLKVVVVLLAVELSVAQSARQRVRPLDRKAGRKRFAPIFPRQCS…  0.032
F1DFM9_9CNID      MKVLVILFGAMLVLMEFQKASAATLLEDFDDDDDLLDDGGDFDLE…  0.223
Q5XV93_ARATH      MSKREYERQLANEEDEQLRNFQAAVAARSAILHEPKEAALPPPAP…  0.009
Q2XXN9_POGBA      MRFLYLLFAVAFLFSVQAEDAELEQEQQGDPWEGLDEFQDQPPDD…  0.733

Proteins whose predicted probability meets a chosen threshold (here 0.8) can then be extracted and written to a FASTA file:

my_predicted_amps <- my_protein_df[which(my_prediction$prob_AMP >= 0.8),]
  seq_name      seq_aa
2 L5L3D0_PTEAL  MKPLLIVFVFLIFWDPALAGLNPISSEMYKKCYGNGICRLECYTS…
4 Q5F4I1_DROPS  MNFYKIFIFVALILAISVGQSEAGWLKKLGKRLERVGQHTRDATI…
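The filtering step uses which() rather than plain logical indexing. A likely reason is that prob_AMP can be NA for sequences that were not evaluated. A comparison such as NA >= 0.8 yields NA, and logical indexing with NA returns a row filled with NA, whereas which() drops NA positions. The sketch below shows the difference on a toy data.frame, which is not real ampir output.

The ampir example also indexes my_protein_df with a vector computed from my_prediction. That relies on both data frames listing the same sequences in the same order, as they do in the output shown above.

```r
# Toy predictions; the NA mimics a sequence that was not evaluated
pred <- data.frame(
  seq_name = c("a", "b", "c"),
  prob_AMP = c(0.95, NA, 0.30),
  stringsAsFactors = FALSE
)

# Plain logical indexing: the NA comparison produces an extra all-NA row
n_logical <- nrow(pred[pred$prob_AMP >= 0.8, ])        # 2
# which() keeps only positions where the comparison is TRUE
n_which <- nrow(pred[which(pred$prob_AMP >= 0.8), ])   # 1
```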

Write a data.frame with sequence names in the first column and protein sequences in the second column to a FASTA-formatted file with df_to_faa():

df_to_faa(my_predicted_amps, "my_predicted_amps.fasta")
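The following minimal base-R sketch of a FASTA writer shows what writing such a data.frame to FASTA involves. write_fasta_sketch() is hypothetical and is not ampir's implementation. It writes each sequence on a single line after its ">" header, whereas df_to_faa() may wrap or format lines differently.

```r
# Hypothetical helper: a minimal base-R FASTA writer, NOT ampir's df_to_faa().
write_fasta_sketch <- function(df, path) {
  # Column 1 holds sequence names, column 2 holds sequences (ampir's convention)
  records <- paste0(">", df[[1]], "\n", df[[2]])
  writeLines(records, path)
  invisible(path)
}

# Usage with toy sequences
amps <- data.frame(
  seq_name = c("seqA", "seqB"),
  seq_aa = c("GIGKFLKKAKKFGKAFVKILKK", "MKPLLIVFVFLIFWDPALAG"),
  stringsAsFactors = FALSE
)
out_file <- tempfile(fileext = ".fasta")
write_fasta_sketch(amps, out_file)
readLines(out_file)
```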
Metadata

Version

1.1.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD 13
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd13
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd13
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows