Description

Download and Process Public Domain Works from Project Gutenberg.

Download and process public domain works in the Project Gutenberg collection <https://www.gutenberg.org/>. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.

gutenbergr

Search, download, and process public domain texts from the Project Gutenberg collection.

Installation

Install the released version from CRAN:

install.packages("gutenbergr")

Install the development version from GitHub:

# install.packages("pak")
pak::pak("ropensci/gutenbergr")

Quick Start

Load the package and any other required libraries:

library(gutenbergr)
library(dplyr)

First, get and set the Project Gutenberg mirror (gutenberg_get_mirror() does both):

gutenberg_get_mirror()
#> [1] "https://aleph.pglaf.org"

Search through the metadata to find Jane Austen’s Persuasion:

gutenberg_works(title == "Persuasion")
#> # A tibble: 1 × 8
#>   gutenberg_id title      author       gutenberg_author_id language
#>          <int> <chr>      <chr>                      <int> <fct>   
#> 1          105 Persuasion Austen, Jane                  68 en      
#>   gutenberg_bookshelf                           rights                    has_text
#>   <chr>                                         <fct>                     <lgl>   
#> 1 Category: Novels/Category: British Literature Public domain in the USA. TRUE
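The same filtering approach works on other metadata columns. As a minimal sketch, assuming the author column uses the "Surname, Given name" form shown in the output above, this lists the works attributed to Jane Austen and keeps only two columns. No output is shown because the result depends on the metadata snapshot bundled with the installed version.

```r
library(gutenbergr)
library(dplyr)

# Filter the bundled metadata by author instead of title.
# The author string matches the format shown in the Persuasion row.
austen <- gutenberg_works(author == "Austen, Jane") |>
  select(gutenberg_id, title)

austen
```

Because gutenberg_works() reads the metadata shipped with the package, this lookup does not download anything.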

Persuasion’s gutenberg_id is 105. We’ll use this ID to download the text. Setting the cache option to "persistent" first means we won’t have to re-download it later.

options(gutenbergr_cache_type = "persistent")
persuasion <- gutenberg_download(105)
persuasion
#> # A tibble: 8,357 × 2
#>    gutenberg_id text            
#>           <int> <chr>           
#>  1          105 "Persuasion"    
#>  2          105 ""              
#>  3          105 ""              
#>  4          105 "by Jane Austen"
#>  5          105 ""              
#>  6          105 "(1818)"        
#>  7          105 ""              
#>  8          105 ""              
#>  9          105 ""              
#> 10          105 ""              
#> # ℹ 8,347 more rows
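The download is returned one line per row, and many rows are blank. A minimal sketch of a first processing step follows. It uses a hand-built stand-in (the hypothetical persuasion_head) that copies the ten rows printed above, so it runs without a download; the same pipeline could be applied to the real persuasion object.

```r
library(dplyr)

# Stand-in for the first ten rows of the downloaded text, copied from the
# printed output above so the example needs no network access.
persuasion_head <- data.frame(
  gutenberg_id = 105L,
  text = c("Persuasion", "", "", "by Jane Austen", "", "(1818)", "", "", "", "")
)

# Record each line's original position, then drop the blank lines.
non_blank <- persuasion_head |>
  mutate(line = row_number()) |>
  filter(text != "")

non_blank$line
#> [1] 1 4 6
```

Keeping the original line number before filtering makes it possible to trace any remaining row back to its position in the source text.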

Multiple works can be downloaded at once. We’ll also download Edna St. Vincent Millay’s Renascence and Other Poems (gutenberg_id 161) and use meta_fields to add each work’s title from the metadata.

books <- gutenberg_download(c(105, 161), meta_fields = "title")
books |> count(title)
#> # A tibble: 2 × 2
#>   title                           n
#>   <chr>                       <int>
#> 1 Persuasion                   8357
#> 2 Renascence, and Other Poems  1222

Vignettes

See the package vignettes for more advanced usage of gutenbergr.

FAQ

How were the metadata files generated?

See the data-raw directory for scripts. Metadata was generated from the Project Gutenberg catalog on 13 March 2026.

Do you respect robot access rules?

Yes! The package follows Project Gutenberg’s rules:

  • Retrieves books directly from mirrors using the authorized link format
  • Prioritizes .zip files to minimize bandwidth
  • Supports session and persistent caching
  • Targets individual works or small collections, not the entire corpus. For bulk downloads, set up a mirror.

See their Terms of Use for details.
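For a small collection, one way to stay well within these limits is to fetch works one at a time with a pause in between. The sketch below is an illustration only: download_politely() is a hypothetical helper, not part of gutenbergr, and the two-second default pause is an arbitrary assumption rather than a documented requirement.

```r
# Hypothetical helper (not part of gutenbergr): fetch a small set of works
# one at a time, pausing between requests.
# `fetch` defaults to gutenbergr::gutenberg_download; `pause` is in seconds
# and its default value is an assumption, not an official figure.
download_politely <- function(ids,
                              fetch = gutenbergr::gutenberg_download,
                              pause = 2) {
  results <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    results[[i]] <- fetch(ids[[i]])
    # Wait before the next request; no wait is needed after the last one.
    if (i < length(ids)) Sys.sleep(pause)
  }
  # Stack the per-work results into a single data frame.
  do.call(rbind, results)
}

# Usage (requires gutenbergr and network access):
# books <- download_politely(c(105, 161))
```

With the persistent cache option set, repeated calls for the same IDs should not need to contact the mirror again.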

Contributing

See CONTRIBUTING.md.

Note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Metadata

Version

0.5.0

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows