Description
Toolkit for the 'Entrez' API.
Description
Interact with the 'Entrez' API hosted by the National Center for Biotechnology Information (NCBI), <https://www.ncbi.nlm.nih.gov/books/NBK25499/>. This package is focused on working with sequence metadata and links. It handles pagination and compensates for some API limitations to simplify these tasks. API calls are printed to the console to highlight how high-level queries are translated into individual HTTP requests.
README.md
jentre
jentre is a client for the NCBI’s Entrez API.
The Entrez API has many quirks. jentre attempts to deal with those while presenting a convenient interface. It is designed for bulk metadata fetching and link traversal, though also provides full access to other parts of Entrez, albeit with fewer helpers.
Features of jentre:
- Provides objects representing sets of Entrez identifiers to avoid mixing them up
- Batches requests behind the scenes when needed
- Based on {httr2} and {xml2}
The {rentrez} package is more mature and might suit a broader set of applications.
Installation
You can install jentre like so:
# CRAN release
install.packages('jentre')
# development version
install.packages('jentre', repos = c('https://cidm-ph.r-universe.dev', 'https://cloud.r-project.org'))
Example
library(jentre)
# Searches by default use the Entrez history server for efficiency:
results <- esearch("Corynebacterium diphtheriae[orgn]", "biosample")
#> ℹ eSearch query "\"Corynebacterium diphtheriae\"[Organism]" has 4124 results
results
#> <entrez@/biosample[1]>
#> [1] MCID_68491.#1[4124]
# The returned object keeps track of which database the UIDs belong to,
# and whether the UIDs are local or on the history server. This makes
# them easier and less error-prone to use:
links <- elink(results, "sra")
links
#> # A tibble: 1 × 3
#> from linkname to
#> <list> <chr> <list>
#> 1 <idlst [4,124]> biosample_sra <wbhst [1]>
# You can pull a list of UIDs from the history server:
ids <- as_id_list(links$to[[1]])
# This is a vector with some extra metadata attached, but can be
# subsetted normally:
head(ids, n = 10)
#> <entrez/sra[10]>
#> [1] 38889263 38768719 38768704 38641044 38501020 38428896 38428225 38427762
#> [9] 38401944 38401943
as.character(ids[4:8])
#> [1] "38641044" "38501020" "38428896" "38428225" "38427762"
# For endpoints with richer data, you can provide an function to
# extract data you care about from the XML document. The output is
# combined if multiple API requests are needed, so you can end up
# with a single combined data frame, list, or vector:
esummary(
ids[1:20],
.process = function(doc) {
xml2::xml_find_all(doc, "//DocumentSummary/CreateDate") |> xml2::xml_text()
}
)
#> [1] "2025/05/29" "2025/05/21" "2025/05/21" "2025/05/14" "2025/05/05"
#> [6] "2025/04/30" "2025/04/30" "2025/04/30" "2025/04/29" "2025/04/29"
#> [11] "2025/04/29" "2025/04/29" "2025/04/29" "2025/04/29" "2025/04/29"
#> [16] "2025/04/29" "2025/04/29" "2025/04/21" "2025/04/21" "2025/04/21"
# If needed, you can construct an arbitrary request:
req <- entrez_request("esearch.fcgi", db = "nucleotide", term = "biomol+trna[prop]")
# You'll need to execute and parse it yourself:
httr2::req_perform(req) |> httr2::resp_body_xml()
#> {xml_document}
#> <eSearchResult>
#> [1] <Count>1012</Count>
#> [2] <RetMax>20</RetMax>
#> [3] <RetStart>0</RetStart>
#> [4] <IdList>\n <Id>2737963026</Id>\n <Id>2586967820</Id>\n <Id>2274792564< ...
#> [5] <TranslationSet/>
#> [6] <TranslationStack>\n <TermSet>\n <Term>biomol+trna[prop]</Term>\n ...
#> [7] <QueryTranslation>biomol+trna[prop]</QueryTranslation>