Description
Identify Text Written by Large Language Models using 'GPTZero'.
Description
An R interface to the 'GPTZero' API (<https://gptzero.me/docs>). Allows users to classify text as human-written or computer-written, with associated probabilities. Formats the results into data frames where each sentence is an observation. Paragraph-level and document-level predictions are repeated on each row so that they align with the sentences.
README.md
gptzeror
gptzeror provides an R interface to the GPTZero API. GPTZero predicts whether text was generated by “AI” tools like ChatGPT. It splits documents by paragraph and sentence, which allows for detection when text is written partially by “AI” and partially by humans.
Installation
You can install the development version of gptzeror from GitHub with:
# install.packages('remotes')
remotes::install_github('christopherkenny/gptzeror')
Example
Below is an example using the abstract of Kenny, McCartan, Simko, Kuriwaki, and Imai (2023).
abstr <- 'Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To separate the partisan effects of redistricting from the effects of other factors including geography and redistricting rules, we compare possible party compositions of the U.S. House under the enacted plan to those under a set of alternative simulated plans that serve as a non-partisan baseline. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the electoral bias it creates cancels at the national level, giving Republicans two additional seats on average. Geography and redistricting rules separately contribute a moderate pro-Republican bias. Finally, we find that partisan gerrymandering reduces electoral competition and makes the partisan composition of the U.S. House less responsive to shifts in the national vote.'
We can pass text directly via gptzero_predict_text().
library(gptzeror)
gptzero_predict_text(abstr)
#> # A tibble: 5 × 10
#> doc_average_generated_prob doc_completely_generated_p…¹ doc_overall_burstiness
#> <dbl> <dbl> <dbl>
#> 1 0.2 0.00228 101.
#> 2 0.2 0.00228 101.
#> 3 0.2 0.00228 101.
#> 4 0.2 0.00228 101.
#> 5 0.2 0.00228 101.
#> # ℹ abbreviated name: ¹doc_completely_generated_prob
#> # ℹ 7 more variables: par_completely_generated_prob <dbl>,
#> # par_num_sentences <int>, par_start_sentence_index <int>,
#> # sentence_index <int>, generated_prob <int>, perplexity <int>,
#> # sentence <chr>
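Because each row is a sentence, the result can be filtered like any other data frame. The following is a minimal sketch in base R, not part of the package: it uses a mock data frame with a subset of the column names shown above, so it runs without an API key. The values and the 0.5 threshold are invented for illustration and are not real GPTZero output or a recommended cutoff.

```r
# Mock result using a subset of the columns printed above.
# All values are invented for illustration, NOT real GPTZero output.
res <- data.frame(
  doc_average_generated_prob = rep(0.2, 5),
  sentence_index = 1:5,
  generated_prob = c(0, 1, 0, 0, 0),
  sentence = paste('Sentence', 1:5),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff chosen only for this example.
threshold <- 0.5

# Keep the sentences whose sentence-level probability exceeds the cutoff.
flagged <- res[
  res$generated_prob > threshold,
  c('sentence_index', 'generated_prob', 'sentence')
]

# Document-level values are repeated on every row, so one copy is enough.
doc_level <- unique(res$doc_average_generated_prob)
```

With a real result, `res` would instead be the tibble returned by gptzero_predict_text().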
The API also accepts common file types as uploads, including .txt, .docx, and .pdf. To access this endpoint, use gptzero_predict_file().
temp_file <- tempfile(fileext = '.txt')
cat(abstr, file = temp_file)
gptzero_predict_file(temp_file)
#> # A tibble: 5 × 10
#> doc_average_generated_prob doc_completely_generated_p…¹ doc_overall_burstiness
#> <dbl> <dbl> <dbl>
#> 1 0.2 0.00228 101.
#> 2 0.2 0.00228 101.
#> 3 0.2 0.00228 101.
#> 4 0.2 0.00228 101.
#> 5 0.2 0.00228 101.
#> # ℹ abbreviated name: ¹doc_completely_generated_prob
#> # ℹ 7 more variables: par_completely_generated_prob <dbl>,
#> # par_num_sentences <int>, par_start_sentence_index <int>,
#> # sentence_index <int>, generated_prob <int>, perplexity <int>,
#> # sentence <chr>
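Uploaded files often contain several paragraphs, in which case the paragraph-level columns repeat across the sentences of each paragraph. The sketch below collapses such a result to one row per paragraph. It is an illustration under stated assumptions: the data frame is a mock with invented values, and it assumes that par_start_sentence_index takes one value per paragraph, which is inferred from the column name rather than confirmed by the API documentation.

```r
# Mock multi-paragraph result; values are invented, NOT real GPTZero output.
res <- data.frame(
  par_completely_generated_prob = c(0.1, 0.1, 0.1, 0.8, 0.8),
  par_num_sentences = c(3L, 3L, 3L, 2L, 2L),
  par_start_sentence_index = c(0L, 0L, 0L, 3L, 3L),
  sentence_index = 0:4
)

# Paragraph-level columns, as named in the output above.
par_cols <- c(
  'par_start_sentence_index',
  'par_num_sentences',
  'par_completely_generated_prob'
)

# Drop the repeated rows so that each paragraph appears once.
paragraphs <- unique(res[, par_cols])
rownames(paragraphs) <- NULL
```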
Additional Information
Documentation for the GPTZero API is available at <https://gptzero.me/docs>.