Description
Wrappers Around Stanford CoreNLP Tools.
Description
Provides a minimal interface for applying annotators from the 'Stanford CoreNLP' java library. Methods are provided for tasks such as tokenisation, part of speech tagging, lemmatisation, named entity recognition, coreference detection and sentiment analysis.
README.md
coreNLP
To download the R library and corresponding CoreNLP java library, run the following in R:
devtools::install_github("statsmaths/coreNLP")
coreNLP::downloadCoreNLP()
Then, in order to run the package, the following code initializes rJava correctly (if run from the same directory as the above):
library(coreNLP)
initCoreNLP()
As a simple example of usage, this how to annotate the opening lines to "The Cat in the Hat":
catInHat = c("the sun did not shine.", "it was too wet to play.",
"so we sat in the house all that cold, cold, wet day.")
output = annotateString(catInHat)
Which gives the following information via getToken
, getDependency
and getSentiment
:
> getToken(output)[,c(1:3,6:7)]
sentence word lemma POS NER
1 1 the the DT O
2 1 sun sun NN O
3 1 did do VBD O
4 1 not not RB O
5 1 shine shine VB O
6 1 . . . O
7 2 it it PRP O
8 2 was be VBD O
9 2 too too RB O
10 2 wet wet JJ O
11 2 to to TO O
12 2 play play VB O
13 2 . . . O
14 3 so so IN O
15 3 we we PRP O
16 3 sat sit VBD O
17 3 in in IN O
18 3 the the DT O
19 3 house house NN O
20 3 all all DT O
21 3 that that DT O
22 3 cold cold JJ O
23 3 , , , O
24 3 cold cold JJ O
25 3 , , , O
26 3 wet wet JJ O
27 3 day day NN DURATION
28 3 . . . O
> getDependency(output)
sentence governor dependent type governorIdx dependentIdx
1 1 ROOT shine root 0 5
2 1 sun the det 2 1
3 1 shine sun nsubj 5 2
4 1 shine did aux 5 3
5 1 shine not neg 5 4
6 2 ROOT wet root 0 4
7 2 wet it nsubj 4 1
8 2 wet was cop 4 2
9 2 wet too advmod 4 3
10 2 play to aux 6 5
11 2 wet play xcomp 4 6
12 3 ROOT sat root 0 3
13 3 sat so dep 3 1
14 3 sat we nsubj 3 2
15 3 sat in prep 3 4
16 3 house the det 6 5
17 3 in house pobj 4 6
18 3 sat all tmod 3 7
19 3 day that det 14 8
20 3 day cold amod 14 9
21 3 day cold amod 14 11
22 3 day wet amod 14 13
23 3 all day dep 7 14
> getSentiment(output)
id sentimentValue sentiment
1 1 1 Negative
2 2 1 Negative
3 3 1 Negative