Description
Cora Data for Entity Resolution.
Duplicated publication data (pre-processed and formatted) for entity resolution. The data set contains 1879 records. The following variables are included: id, title, book title, authors, address, date, year, editor, journal, volume, pages, publisher, institution, type, tech, note. A corresponding gold data set indicates which records match, based on id.
README.md
cora
Package Description
This package provides cleaned and formatted data for entity resolution (record linkage or de-duplication) from the Cora data set. The Cora data set contains 1879 records of citation information on published papers, with features such as titles, authors, year published, and other information. A corresponding "gold" data set indicates which records match, based on id.
Package Installation
# Install the development version from GitHub
devtools::install_github("resteorts/cora")
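Example: Scoring Against Gold Pairs
To illustrate how a gold data set of matching ids might be used, here is a minimal sketch on toy data. It does not load the package: the data frames `gold` and `predicted`, their columns `id1` and `id2`, and the helper `pair_key` are all hypothetical names chosen for illustration, and the layout of the real gold data may differ.

```r
# Hypothetical gold table: each row names two record ids that
# refer to the same publication (toy values, not taken from Cora).
gold <- data.frame(id1 = c(1, 1, 2), id2 = c(2, 4, 4))

# Hypothetical output of some entity resolution method:
# pairs of record ids it believes are duplicates.
predicted <- data.frame(id1 = c(1, 3), id2 = c(2, 4))

# Build an order-independent key so that (1, 2) and (2, 1) compare equal.
pair_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "-")

gold_keys <- pair_key(gold$id1, gold$id2)
pred_keys <- pair_key(predicted$id1, predicted$id2)

# Count predicted pairs that also appear in the gold table.
true_pos <- sum(pred_keys %in% gold_keys)

# Pairwise precision and recall of the predicted matches.
precision <- true_pos / length(pred_keys)
recall <- true_pos / length(gold_keys)
```

With the real data, the same comparison would presumably be run on the record ids and the gold pairs shipped with the package, after checking their actual column names.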