Description

Retrieve, Transform and Analyze the Barcode of Life Data Systems Data.

Facilitates retrieval, transformation and analysis of the data from the Barcode of Life Data Systems (BOLD) database <https://boldsystems.org/>. This package allows both public and private user data to be easily downloaded into the R environment using a variety of inputs such as: IDs (processid, sampleid), BINs, dataset codes, project codes, taxonomy, geography etc. It provides frictionless data conversion into formats compatible with other R-packages and third-party tools, as well as functions for sequence alignment & clustering, biodiversity analysis and spatial mapping.

BOLDconnectR is a package designed for retrieval, transformation and analysis of the data available in the Barcode Of Life Data Systems (BOLD) database. It retrieves public and private user data from the database in the Barcode Core Data Model (BCDM) format. The data include information on the taxonomy, geography, collection, identification and DNA barcode sequence of every submission. The manual is currently hosted here: https://github.com/boldsystems-central/BOLDconnectR_examples/blob/main/BOLDconnectR_1.0.0.pdf

BOLDconnectR requires R version 4.0 or above. The versions of its dependencies have been set so that they work with R >= 4.0. In addition, a few suggested packages are not required to download and install BOLDconnectR but are essential for a couple of its functions to work. The names and exact versions of the dependencies and suggested packages are listed here (https://github.com/boldsystems-central/BOLDconnectR/blob/main/DESCRIPTION). More details on the suggested packages are provided below.
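As a minimal sketch, the version requirement stated above can be checked from within R before installing. The variable name r_ok is illustrative only and is not part of the package.

```r
# Compare the running R version against the stated minimum (4.0)
r_ok <- getRversion() >= "4.0.0"

if (!r_ok) {
  warning("BOLDconnectR requires R >= 4.0; this session runs ",
          as.character(getRversion()))
}
```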

Installation

The package can be installed with the install_github function from the devtools package, which must be installed before installing BOLDconnectR.


devtools::install_github("https://github.com/boldsystems-central/BOLDconnectR")
library(BOLDconnectR)

BOLDconnectR has 11 functions currently:

  1. bold.fields.info
  2. bold.apikey
  3. bold.fetch
  4. bold.public.search
  5. bold.full.search
  6. bold.data.summarize
  7. bold.analyze.align
  8. bold.analyze.tree
  9. bold.analyze.diversity
  10. bold.analyze.map
  11. bold.export

Note on Suggested packages

Function 6: bold.data.summarize requires the Biostrings package to be installed and loaded in the R session beforehand in order to generate the barcode_summary. Function 7: bold.analyze.align requires the msa and Biostrings packages to be installed and loaded in the R session beforehand. Function 8: bold.analyze.tree also uses the output generated by function 7. msa and Biostrings can be installed using the BiocManager package.


if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("msa")
BiocManager::install("Biostrings")

library(msa)
library(Biostrings)
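Before calling the functions that depend on them, it can help to confirm that the suggested packages are actually available. The following is a small sketch using base R only; the variable names are illustrative and the message text is not produced by BOLDconnectR itself.

```r
# Suggested packages named in the note above
suggested <- c("msa", "Biostrings")

# TRUE/FALSE per package, without attaching anything to the session
installed <- vapply(suggested, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- suggested[!installed]
if (length(missing_pkgs) > 0) {
  message("Install with BiocManager::install() before alignment: ",
          paste(missing_pkgs, collapse = ", "))
}
```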

Note on API key

The function bold.fetch requires an API key internally to access and download all public and private user data. The API key needed to retrieve BOLD records is found in the BOLD ‘Workbench’ (https://bench.boldsystems.org/index.php/Login/page?destination=MAS_Management_UserConsole). After logging in, navigate to ‘Your Name’ (located at the top left-hand side of the window) and click ‘Edit User Preferences’. The API key is in the ‘User Data’ section. Please note that an API key is available in the Workbench only to users who have uploaded at least 10,000 records to BOLD. The API key can be saved in the R session using the bold.apikey() function. API keys are regenerated periodically, and the new key appears in the user's Workbench account. Using an old key results in an HTTP 401 error.

# Substitute '00000000-0000-0000-0000-000000000000' with your key
# bold.apikey('00000000-0000-0000-0000-000000000000')
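To avoid hard-coding the key in a script, one option is to read it from an environment variable and check its shape first. This is only a sketch: the variable name BOLD_API_KEY and the helper looks_like_key are hypothetical, and the check assumes real keys follow the same 8-4-4-4-12 hexadecimal pattern as the placeholder above, which may not hold.

```r
# Hypothetical helper: TRUE if x has the 8-4-4-4-12 shape of the placeholder key
looks_like_key <- function(x) {
  grepl("^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$", x)
}

# BOLD_API_KEY is an assumed variable name, e.g. set in ~/.Renviron
key <- Sys.getenv("BOLD_API_KEY", unset = "")

# Register the key only if it looks valid and the package is installed
if (looks_like_key(key) && requireNamespace("BOLDconnectR", quietly = TRUE)) {
  BOLDconnectR::bold.apikey(key)
}
```

A shape check of this kind catches a truncated or mistyped key, but not an expired one; an expired key still produces the HTTP 401 error described above.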

Basic usage of BOLDconnectR functionality

The API key function (bold.apikey) must be run before using the fetch function (please see above).

Fetch data

BCDM_data<-bold.fetch(get_by = "processid",
                      identifiers = test.data$processid)
#> Initiating download
#>  Downloading data in a single batch  
#> Download complete & BCDM dataframe generated

knitr::kable(head(BCDM_data,4))
Output (first four records; 11 of the 72 columns shown):

|processid    |insdc_acs |sampleid       |marker_code |family     |species          | nuc_basecount|bin_uri      |coord           |country.ocean |province.state |
|:------------|:---------|:--------------|:-----------|:----------|:----------------|-------------:|:------------|:---------------|:-------------|:--------------|
|SSWLD6460-13 |KM825932  |BIOUG06662-C01 |COI-5P      |Salticidae |Eris militaris   |           589|BOLD:AAA5654 |49.065,-113.778 |Canada        |Alberta        |
|SPITO327-14  |KP654265  |BIOUG12602-G11 |COI-5P      |Salticidae |Phidippus audax  |           588|BOLD:AAC6891 |43.933,-79.928  |Canada        |Ontario        |
|ARONT071-09  |GU682836  |09ONTGAB-183   |COI-5P      |Salticidae |Naphrys pulex    |           658|BOLD:AAC2433 |43.691,-80.414  |Canada        |Ontario        |
|SPIRU1237-11 |KF368796  |BIOUG00629-G03 |COI-5P      |Salticidae |Sittisax ranieri |           658|BOLD:AAC2061 |58.772,-93.843  |Canada        |Manitoba       |

All 72 columns, in order: processid, record_id, insdc_acs, sampleid, specimenid, taxid, short_note, identification_method, museumid, fieldid, collection_code, processid_minted_date, inst, funding_src, sex, life_stage, reproduction, habitat, collectors, site_code, specimen_linkout, collection_event_id, sampling_protocol, tissue_type, collection_date_start, collection_time, associated_taxa, associated_specimens, voucher_type, notes, taxonomy_notes, collection_notes, geoid, marker_code, kingdom, phylum, class, order, family, subfamily, tribe, genus, species, subspecies, identification, identification_rank, species_reference, identified_by, sequence_run_site, nuc, nuc_basecount, sequence_upload_date, bin_uri, bin_created_date, elev, depth, coord, coord_source, coord_accuracy, elev_accuracy, depth_accuracy, realm, biome, ecoregion, region, sector, site, country_iso, country.ocean, province.state, bold_recordset_code_arr, collection_date_end.

Similarly, sampleids, dataset_codes or project_codes can be used to fetch data. The data can also be filtered on parameters such as geography, attributions and DNA sequence information using the _filt arguments of the function.
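Identifier vectors assembled by hand or read from a spreadsheet often contain duplicates, missing values or stray whitespace. A small sketch of tidying such a vector before it is passed to bold.fetch is given below; the example vector and the name clean_ids are illustrative, and the fetch call itself is left commented out because it needs a valid API key and a network connection.

```r
# Illustrative input: one duplicate, one NA and one padded value
ids <- c("SSWLD6460-13", "SPITO327-14", "SPITO327-14", NA, " ARONT071-09 ")

# Drop missing values, trim whitespace, keep each identifier once
clean_ids <- unique(trimws(ids[!is.na(ids)]))
print(clean_ids)

# BCDM_data <- bold.fetch(get_by = "processid",
#                         identifiers = clean_ids)
```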

Summarize downloaded data

Downloaded data can then be summarized in different ways. Options currently include a concise summary of all the data, detailed taxonomic counts, data completeness and a barcode-based summary.

BCDM_data_summary<-bold.data.summarize(bold_df = BCDM_data,
                               summary_type = "concise_summary")

BCDM_data_summary$concise_summary
#>                        Category   Value
#> 1                 Total_records    1336
#> 2     Total_records_w_sequences    1336
#> 3                Unique_species      80
#> 4                   Unique_BINs     117
#> 5              Unique_countries       1
#> 6             Unique_institutes       6
#> 7          Unique_identified_by       7
#> 8  Unique_specimen_depositories       3
#> 9                Unique_markers  COI-5P
#> 10        Amplicon_length_range 508-658

A concise summary providing a high-level overview of the data.
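To make the categories in the concise summary concrete, the sketch below recomputes a few of them with base R on a four-record toy data frame built from the records shown earlier. It is not the package's implementation, and the object names toy and toy_summary are illustrative.

```r
# Toy subset of BCDM columns for the four records shown above
toy <- data.frame(
  processid     = c("SSWLD6460-13", "SPITO327-14", "ARONT071-09", "SPIRU1237-11"),
  species       = c("Eris militaris", "Phidippus audax",
                    "Naphrys pulex", "Sittisax ranieri"),
  bin_uri       = c("BOLD:AAA5654", "BOLD:AAC6891", "BOLD:AAC2433", "BOLD:AAC2061"),
  country.ocean = rep("Canada", 4),
  nuc_basecount = c(589L, 588L, 658L, 658L),
  stringsAsFactors = FALSE
)

# Counts of distinct values plus the shortest-longest sequence length
toy_summary <- data.frame(
  Category = c("Total_records", "Unique_species", "Unique_BINs",
               "Unique_countries", "Amplicon_length_range"),
  Value = c(nrow(toy),
            length(unique(toy$species)),
            length(unique(toy$bin_uri)),
            length(unique(toy$country.ocean)),
            paste(range(toy$nuc_basecount), collapse = "-")),
  stringsAsFactors = FALSE
)
print(toy_summary)
```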

Export the downloaded data

Downloaded data can also be exported to the local machine either as a flat file or as a FASTA file for use with third-party sequence analysis tools. The flat file contents can be modified as per user requirements (entire data, specific presets or individual fields).

# Preset dataframe
# bold.export(bold_df = BCDM_data,
#             export_type = "preset_df",
#             presets = 'taxonomy',
#             cols_for_fas_names = NULL,
#             export = "file_path_with_intended_name.csv")

# Unaligned fasta file
# bold.export(bold_df = BCDM_data,
#             export_type = "fas",
#             cols_for_fas_names = c("bin_uri","genus","species"),
#             export = "file_path_with_intended_name.fas")
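The effect of cols_for_fas_names can be pictured as building each FASTA header from the chosen columns. The base R sketch below illustrates that idea on toy data; write_fasta is a hypothetical helper, the "|" separator is an assumption that may differ from what bold.export writes, and the sequences are short fragments used for illustration only.

```r
# Hypothetical helper: one header line (">name") and one sequence line per record
write_fasta <- function(df, name_cols, path, sep = "|") {
  # Join the chosen columns row-wise to form the sequence names
  headers <- do.call(paste, c(unname(as.list(df[name_cols])), sep = sep))
  # Interleave headers and sequences: header 1, seq 1, header 2, seq 2, ...
  lines <- as.vector(rbind(paste0(">", headers), df$nuc))
  writeLines(lines, path)
  invisible(path)
}

toy <- data.frame(
  bin_uri = c("BOLD:AAA5654", "BOLD:AAC6891"),
  genus   = c("Eris", "Phidippus"),
  species = c("Eris militaris", "Phidippus audax"),
  nuc     = c("AACGTTATAT", "ACATTATATT"),  # illustrative fragments only
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".fas")
write_fasta(toy, c("bin_uri", "genus", "species"), out)
fasta_lines <- readLines(out)
print(fasta_lines)
```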

Other functions

The package also has functions for sequence alignment, NJ clustering, biodiversity analysis and occurrence mapping using the downloaded BCDM data. Some of these functions also output objects that are commonly used by other R packages (e.g. an ‘sf’ data frame, or an occurrence matrix for ‘vegan’ and ‘betapart’). Please go through the help manual (link provided above) for detailed usage of all the functions of BOLDconnectR with examples.
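For readers unfamiliar with the term, an occurrence matrix has one row per site and one column per species. The sketch below builds one with base R from toy records; it does not show the output of bold.analyze.diversity, and the site and species values are invented for illustration.

```r
# Toy records: one row per specimen, using the BCDM column names site and species
toy <- data.frame(
  site    = c("A", "A", "B", "B", "B"),
  species = c("sp1", "sp2", "sp1", "sp1", "sp3"),
  stringsAsFactors = FALSE
)

# Abundance matrix: number of records of each species at each site
abund <- unclass(table(toy$site, toy$species))

# Presence/absence matrix: 1 if the species was recorded at the site, else 0
pa <- (abund > 0) * 1L

print(abund)
print(pa)
```

Abundance-based indices work from the first matrix, whereas presence/absence measures of turnover work from the second.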

BOLDconnectR can retrieve data very fast (~100k records in a minute on a fast wired connection).

Citation: Padhye SM, Ballesteros-Mejia CL, Agda TJA, Agda JRA, Ratnasingham S. BOLDconnectR: An R package for streamlined retrieval, transformation and analysis of BOLD DNA barcode data (MS in prep).

Metadata

Version

1.0.0

License

Unknown

Platforms (76)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows