Description

An Interface to Google's Cloud Natural Language API.

Interact with Google's Cloud Natural Language API <https://cloud.google.com/natural-language/> (v1) via R. The API has four main features, all of which are available through this R package: syntax analysis and part-of-speech tagging, entity analysis, sentiment analysis, and language identification.

googlenlp

The googlenlp package provides an R interface to Google's Cloud Natural Language API.

"Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app." [source]

There are four main features of the API, all of which are available through this R package [source]:

  • Syntax Analysis: "Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence."
  • Entity Analysis: "Identify entities and label by types such as person, organization, location, events, products and media."
  • Sentiment Analysis: "Understand the overall sentiment expressed in a block of text."
  • Multi-Language: "Enables you to easily analyze text in multiple languages including English, Spanish and Japanese."

Installation

You can install the development version from GitHub:

devtools::install_github("BrianWeinstein/googlenlp")

Authentication

To use the API, you'll first need to create a Google Cloud project, enable billing, and get an API key.

Configuration

Load the package and set your API key. There are two ways to do this.

Method A (preferred)

Method A stores your API key as a variable in your .Renviron file, so you only need to complete this setup once.

library(googlenlp)

configure_googlenlp() # follow the instructions printed to the console
#> googlenlp setup instructions:
#>  1. Your ~/.Renviron file will now open in a new window/tab.
#>     *** If it doesn't open, run:  file.edit("~/.Renviron") ***
#>  2. To use the API, you'll first need to create a Google Cloud project and enable billing (https://cloud.google.com/natural-language/docs/getting-started).
#>  3. Next you'll need to get an API key (https://cloud.google.com/natural-language/docs/common/auth).
#>  4. In your  ~/.Renviron  file, replace the ENTER_YOUR_API_KEY_HERE with your Google Cloud API key.
#>  5. Save your ~/.Renviron file.
#>  6. *** Restart your R session for changes to take effect. ***
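
After restarting R, it can help to confirm that the key is visible to the session before making any API calls. The sketch below uses only base R. The helper name api_key_is_set and the variable name GOOGLE_NLP_API_KEY are assumptions made for illustration; check the line that configure_googlenlp() added to your ~/.Renviron file for the actual variable name.

```r
# Hypothetical helper: report whether an API key variable looks usable.
# `var` defaults to an ASSUMED variable name; change it to match your ~/.Renviron.
api_key_is_set <- function(var = "GOOGLE_NLP_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  # Reject both a missing value and the untouched placeholder text.
  nzchar(key) && key != "ENTER_YOUR_API_KEY_HERE"
}

api_key_is_set()
```

If this returns FALSE, the file was probably not saved, the placeholder was not replaced, or the R session was not restarted.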

Method B

Method B defines your API key as a session-level variable. Under this method, you'll need to set your API key at the beginning of each R session.

library(googlenlp)

set_api_key("MY_API_KEY") # replace this with your API key

Getting started

Define the text you'd like to analyze.

text <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.
         Sundar Pichai said in his keynote that users love their new Android phones."

The annotate_text function analyzes the text's syntax (sentences and tokens), entities, sentiment, and language, and returns the result as a five-element list.

analyzed <- annotate_text(text_body = text)
#> Warning: package 'bindrcpp' was built under R version 3.4.4

str(analyzed, max.level = 1)
#> List of 5
#>  $ sentences        :Classes 'tbl_df', 'tbl' and 'data.frame':   2 obs. of  4 variables:
#>  $ tokens           :Classes 'tbl_df', 'tbl' and 'data.frame':   32 obs. of  17 variables:
#>  $ entities         :Classes 'tbl_df', 'tbl' and 'data.frame':   10 obs. of  8 variables:
#>  $ documentSentiment:'data.frame':   1 obs. of  2 variables:
#>  $ language         : chr "en"

Sentences

"Sentence extraction breaks up the stream of text into a series of sentences." [API Documentation]

  • beginOffset indicates the (zero-based) character index of where the sentence begins (with UTF-8 encoding).
  • The magnitude and score fields quantify each sentence's sentiment; see the Document Sentiment section for more details.

analyzed$sentences

| content | beginOffset | magnitude | score |
| --- | --- | --- | --- |
| Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. | 0 | 0.0 | 0.0 |
| Sundar Pichai said in his keynote that users love their new Android phones. | 113 | 0.6 | 0.6 |
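
Because beginOffset is zero-based and R strings are one-based, mapping an offset back to the original text needs a shift of one. The sketch below rebuilds the example text and a hand-copied subset of the sentences table (only content and beginOffset) so that it runs without calling the API. It assumes plain ASCII text, where character and byte offsets coincide; text with multi-byte UTF-8 characters would need byte-aware handling.

```r
s1 <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show."
s2 <- "Sundar Pichai said in his keynote that users love their new Android phones."

# The example text: a newline and nine spaces of indentation separate the sentences.
text <- paste0(s1, "\n", strrep(" ", 9), s2)

# Values copied by hand from the sentences output shown above.
sentences <- data.frame(
  content = c(s1, s2),
  beginOffset = c(0, 113),
  stringsAsFactors = FALSE
)

# Shift the zero-based offsets to R's one-based positions.
first <- sentences$beginOffset + 1
last <- first + nchar(sentences$content) - 1
extracted <- substring(text, first, last)

# Check that each slice of the original text matches the sentence content.
identical(extracted, sentences$content)
```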

Tokens

"Tokenization breaks the stream of text up into a series of tokens, with each token usually corresponding to a single word. The Natural Language API then processes the tokens and, using their locations within sentences, adds syntactic information to the tokens." [API Documentation]

  • lemma indicates the token's "root" word, and can be useful in standardizing the word within the text.
  • tag indicates the token's part of speech.
  • Additional column definitions are outlined here and here.

analyzed$tokens

| content | beginOffset | lemma | tag | aspect | case | form | gender | mood | number | person | proper | reciprocity | tense | voice | dependencyEdge_headTokenIndex | dependencyEdge_label |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Google | 0 | Google | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | NSUBJ |
| , | 6 | , | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 0 | P |
| headquartered | 8 | headquarter | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 0 | VMOD |
| in | 22 | in | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 2 | PREP |
| Mountain | 25 | Mountain | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 5 | NN |
| View | 34 | View | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 3 | POBJ |
| , | 38 | , | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 0 | P |
| unveiled | 40 | unveil | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 7 | ROOT |
| the | 49 | the | DET | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | DET |
| new | 53 | new | ADJ | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | AMOD |
| Android | 57 | Android | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | NN |
| phone | 65 | phone | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | DOBJ |
| at | 71 | at | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | PREP |
| the | 74 | the | DET | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | DET |
| Consumer | 78 | Consumer | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | NN |
| Electronic | 87 | Electronic | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | NN |
| Show | 98 | Show | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 12 | POBJ |
| . | 102 | . | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | P |
| Sundar | 113 | Sundar | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 19 | NN |
| Pichai | 120 | Pichai | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | NSUBJ |
| said | 127 | say | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 20 | ROOT |
| in | 132 | in | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | PREP |
| his | 135 | his | PRON | ASPECT_UNKNOWN | GENITIVE | FORM_UNKNOWN | MASCULINE | MOOD_UNKNOWN | SINGULAR | THIRD | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 23 | POSS |
| keynote | 139 | keynote | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 21 | POBJ |
| that | 147 | that | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | MARK |
| users | 152 | user | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | NSUBJ |
| love | 158 | love | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PRESENT | VOICE_UNKNOWN | 20 | CCOMP |
| their | 163 | their | PRON | ASPECT_UNKNOWN | GENITIVE | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | THIRD | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | POSS |
| new | 169 | new | ADJ | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | AMOD |
| Android | 173 | Android | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | NN |
| phones | 181 | phone | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | DOBJ |
| . | 187 | . | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | P |
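
The dependencyEdge_headTokenIndex column can be used to look up the word that each token depends on. The sketch below assumes the index is a zero-based position within the tokens table, which is consistent with the sample output (for example, "Google" points to index 7, "unveiled"). It uses a hand-copied subset of the first eight tokens so that it runs without calling the API; the head column is a name introduced here for illustration.

```r
# First eight tokens, copied by hand from the output shown above.
tokens <- data.frame(
  content = c("Google", ",", "headquartered", "in", "Mountain", "View", ",", "unveiled"),
  tag = c("NOUN", "PUNCT", "VERB", "ADP", "NOUN", "NOUN", "PUNCT", "VERB"),
  dependencyEdge_headTokenIndex = c(7, 0, 0, 2, 5, 3, 0, 7),
  dependencyEdge_label = c("NSUBJ", "P", "VMOD", "PREP", "NN", "POBJ", "P", "ROOT"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the head index is zero-based, so add 1 for R's one-based rows.
tokens$head <- tokens$content[tokens$dependencyEdge_headTokenIndex + 1]

# Show each non-punctuation token with its dependency label and head word.
tokens[tokens$tag != "PUNCT", c("content", "dependencyEdge_label", "head")]
```

The ROOT token points to itself, so its head is its own content.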

Entities

"Entity Analysis provides information about entities in the text, which generally refer to named 'things' such as famous individuals, landmarks, common objects, etc... A good general practice to follow is that if something is a noun, it qualifies as an 'entity.'" [API Documentation]

  • entity_type indicates the type of entity (i.e., it classifies the entity as a person, location, consumer good, etc.).
  • mid provides a "machine-generated identifier" corresponding to the entity's Google Knowledge Graph entry.
  • wikipedia_url provides the entity's Wikipedia URL.
  • salience indicates the entity's importance to the entire text. Scores range from 0.0 (less important) to 1.0 (highly important).
  • Additional column definitions are outlined here.

analyzed$entities

| name | entity_type | mid | wikipedia_url | salience | content | beginOffset | mentions_type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Google | ORGANIZATION | /m/045c7b | https://en.wikipedia.org/wiki/Google | 0.2557206 | Google | 0 | PROPER |
| users | PERSON | NA | NA | 0.1527633 | users | 152 | COMMON |
| phone | CONSUMER_GOOD | NA | NA | 0.1311989 | phone | 65 | COMMON |
| Android | CONSUMER_GOOD | /m/02wxtgw | https://en.wikipedia.org/wiki/Android_(operating_system) | 0.1224526 | Android | 57 | PROPER |
| Android | CONSUMER_GOOD | /m/02wxtgw | https://en.wikipedia.org/wiki/Android_(operating_system) | 0.1224526 | Android | 173 | PROPER |
| Sundar Pichai | PERSON | /m/09gds74 | https://en.wikipedia.org/wiki/Sundar_Pichai | 0.1141411 | Sundar Pichai | 113 | PROPER |
| Mountain View | LOCATION | /m/0r6c4 | https://en.wikipedia.org/wiki/Mountain_View,_California | 0.1019596 | Mountain View | 25 | PROPER |
| Consumer Electronic Show | EVENT | /m/01p15w | https://en.wikipedia.org/wiki/Consumer_Electronics_Show | 0.0703438 | Consumer Electronic Show | 78 | PROPER |
| phones | CONSUMER_GOOD | NA | NA | 0.0338317 | phones | 181 | COMMON |
| keynote | OTHER | NA | NA | 0.0175884 | keynote | 139 | COMMON |
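
In the sample output, an entity that is mentioned more than once (here "Android") appears on one row per mention, with the same salience on each row. A ranking of entities by importance therefore needs the repeated rows collapsed first. The sketch below does this in base R on a hand-copied subset of the columns above, so it runs without calling the API; the name unique_entities is introduced here for illustration.

```r
# Columns copied by hand from the entities output shown above.
entities <- data.frame(
  name = c("Google", "users", "phone", "Android", "Android", "Sundar Pichai",
           "Mountain View", "Consumer Electronic Show", "phones", "keynote"),
  entity_type = c("ORGANIZATION", "PERSON", "CONSUMER_GOOD", "CONSUMER_GOOD",
                  "CONSUMER_GOOD", "PERSON", "LOCATION", "EVENT",
                  "CONSUMER_GOOD", "OTHER"),
  salience = c(0.2557206, 0.1527633, 0.1311989, 0.1224526, 0.1224526,
               0.1141411, 0.1019596, 0.0703438, 0.0338317, 0.0175884),
  beginOffset = c(0, 152, 65, 57, 173, 113, 25, 78, 181, 139),
  stringsAsFactors = FALSE
)

# Keep the first mention of each entity, then rank by salience (highest first).
unique_entities <- entities[!duplicated(entities$name),
                            c("name", "entity_type", "salience")]
unique_entities <- unique_entities[order(-unique_entities$salience), ]

head(unique_entities, 3)
```

Deduplicating on name alone is a simplification; two distinct entities that share a name would be merged, in which case mid or wikipedia_url may be a better key when available.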

Document sentiment

"Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score and magnitude values." [API Documentation]

  • score ranges from -1.0 (negative) to 1.0 (positive), and indicates the "overall emotional leaning of the text".
  • magnitude "indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude (so longer text blocks may have greater magnitudes)."

A note on how to interpret these sentiment values is posted here.

analyzed$documentSentiment

| magnitude | score |
| --- | --- |
| 0.6 | 0.3 |
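
One way to act on these two numbers is to turn them into a coarse label. The sketch below is an illustration only: the function name label_sentiment and both cutoffs are hypothetical, not values defined by the API or by this package, and suitable thresholds depend on your own data. It uses magnitude to separate low-emotion text from text whose positive and negative expressions roughly cancel out in the score.

```r
# Hypothetical helper with ILLUSTRATIVE cutoffs; tune them for your own data.
label_sentiment <- function(score, magnitude, score_cut = 0.25, magnitude_cut = 1) {
  if (score >= score_cut) {
    "positive"
  } else if (score <= -score_cut) {
    "negative"
  } else if (magnitude >= magnitude_cut) {
    "mixed"    # near-zero score, but a lot of emotion expressed overall
  } else {
    "neutral"  # near-zero score and little emotion expressed
  }
}

# Values from the documentSentiment output shown above.
label_sentiment(score = 0.3, magnitude = 0.6)
```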

Language

language indicates the detected language of the document. Only English ("en"), Spanish ("es") and Japanese ("ja") are currently supported by the API.

analyzed$language
#> [1] "en"
Metadata

Version

0.2.0

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows