Description

Identify Text Written by Large Language Models using 'GPTZero'.

An R interface to the 'GPTZero' API (<https://gptzero.me/docs>). Allows users to classify text as human-written or computer-generated, with associated probabilities. Formats the results as data frames in which each sentence is an observation. Paragraph-level and document-level predictions are repeated on each row so that they align with the sentences.

gptzeror

Lifecycle: experimental

gptzeror provides an R interface to the GPTZero API. GPTZero predicts whether text was generated by “AI” such as ChatGPT. It splits documents by paragraph and sentence, which allows detection when text is written partly by “AI” and partly by humans.

Installation

You can install the development version of gptzeror from GitHub with:

# install.packages('remotes')
remotes::install_github('christopherkenny/gptzeror')

Example

Below is an example using the abstract of Kenny, McCartan, Simko, Kuriwaki, and Imai (2023).

abstr <- 'Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To separate the partisan effects of redistricting from the effects of other factors including geography and redistricting rules, we compare possible party compositions of the U.S. House under the enacted plan to those under a set of alternative simulated plans that serve as a non-partisan baseline. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the electoral bias it creates cancels at the national level, giving Republicans two additional seats on average. Geography and redistricting rules separately contribute a moderate pro-Republican bias. Finally, we find that partisan gerrymandering reduces electoral competition and makes the partisan composition of the U.S. House less responsive to shifts in the national vote.'

We can pass text directly via gptzero_predict_text().

library(gptzeror)
gptzero_predict_text(abstr)
#> # A tibble: 5 × 10
#>   doc_average_generated_prob doc_completely_generated_p…¹ doc_overall_burstiness
#>                        <dbl>                        <dbl>                  <dbl>
#> 1                        0.2                      0.00228                   101.
#> 2                        0.2                      0.00228                   101.
#> 3                        0.2                      0.00228                   101.
#> 4                        0.2                      0.00228                   101.
#> 5                        0.2                      0.00228                   101.
#> # ℹ abbreviated name: ¹​doc_completely_generated_prob
#> # ℹ 7 more variables: par_completely_generated_prob <dbl>,
#> #   par_num_sentences <int>, par_start_sentence_index <int>,
#> #   sentence_index <int>, generated_prob <int>, perplexity <int>,
#> #   sentence <chr>
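
Because each row is one sentence, ordinary data-frame operations are enough to find the sentences behind a score. The following is a minimal sketch, not output from the package: it uses a hand-made data frame with a subset of the columns printed above, and the values, the index numbering, and the 0.5 cut-off are all invented for illustration.

```r
# Mock result with a subset of the columns printed above.
# All values are invented for illustration; they are not GPTZero output.
res <- data.frame(
  par_start_sentence_index = c(1L, 1L, 1L, 4L, 4L),
  sentence_index = 1:5,
  generated_prob = c(0.05, 0.10, 0.85, 0.92, 0.08),
  sentence = paste('Sentence', 1:5),
  stringsAsFactors = FALSE
)

# Hypothetical cut-off; pick one that suits your application.
threshold <- 0.5

# Sentences whose generated probability exceeds the cut-off.
flagged <- res[res$generated_prob > threshold, c('sentence_index', 'sentence')]
print(flagged)

# Mean sentence-level probability within each paragraph, using the
# paragraph's starting sentence index as the grouping key.
par_means <- tapply(res$generated_prob, res$par_start_sentence_index, mean)
print(par_means)
```

The same subsetting and grouping should apply to a real result from gptzero_predict_text(), since a tibble is also a data frame.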

The API also accepts common file types as uploads, including .txt, .docx, and .pdf. To access this endpoint, use gptzero_predict_file().

temp_file <- tempfile(fileext = '.txt')
cat(abstr, file = temp_file)

gptzero_predict_file(temp_file)
#> # A tibble: 5 × 10
#>   doc_average_generated_prob doc_completely_generated_p…¹ doc_overall_burstiness
#>                        <dbl>                        <dbl>                  <dbl>
#> 1                        0.2                      0.00228                   101.
#> 2                        0.2                      0.00228                   101.
#> 3                        0.2                      0.00228                   101.
#> 4                        0.2                      0.00228                   101.
#> 5                        0.2                      0.00228                   101.
#> # ℹ abbreviated name: ¹​doc_completely_generated_prob
#> # ℹ 7 more variables: par_completely_generated_prob <dbl>,
#> #   par_num_sentences <int>, par_start_sentence_index <int>,
#> #   sentence_index <int>, generated_prob <int>, perplexity <int>,
#> #   sentence <chr>
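
Before uploading many files, it can help to check their extensions locally. The sketch below assumes only the three file types named above; the API may accept others, and is_supported_upload() is a hypothetical helper, not a function exported by gptzeror.

```r
# Extensions named in the text above; the API may accept others.
supported_ext <- c('txt', 'docx', 'pdf')

# Hypothetical helper (not part of gptzeror): TRUE when the path's
# extension, compared case-insensitively, is one of the types listed above.
is_supported_upload <- function(path) {
  tolower(tools::file_ext(path)) %in% supported_ext
}

paths <- c('essay.txt', 'report.PDF', 'slides.pptx', 'notes')
ok <- is_supported_upload(paths)
print(data.frame(path = paths, supported = ok))
```

Only the paths for which the helper returns TRUE would then be passed to gptzero_predict_file().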

Additional Information

Documentation for the GPTZero API is available at <https://gptzero.me/docs>.

Metadata

Version

0.0.1

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows