Description
FlashText Algorithm for Finding and Replacing Words.
Description
Implementation of the FlashText algorithm, by Singh (2017) <arXiv:1711.00046>. It can be used to find and replace words in a given text with only one pass over the document.
README.md
rflashtext
rflashtextcan be used to find and replace words in a given text with only one pass over the document.
It’s a R implementation of the FlashText algorithm and it’s inspired on the python library flashtext.
Installation
You can install the released version of rflashtext from CRAN with:
install.packages("rflashtext")
And the development version from GitHub with:
install.packages("devtools")
devtools::install_github("AbrJA/rflashtext")
Example
This is a basic example which shows you how to use the API:
New processor
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$show_trie()
#> [1] "{\"L\":{\"A\":{\"_word_\":\"Los Angeles\"}},\"N\":{\"Y\":{\"_word_\":\"New York\"}}}"
Add keys-words to processor
processor$add_keys_words(keys = c("TX", "CA"), words = c("Texas", "California"))
processor$show_trie()
#> [1] "{\"C\":{\"A\":{\"_word_\":\"California\"}},\"L\":{\"A\":{\"_word_\":\"Los Angeles\"}},\"N\":{\"Y\":{\"_word_\":\"New York\"}},\"T\":{\"X\":{\"_word_\":\"Texas\"}}}"
Find keys in a sentence
words_found <- processor$find_keys(sentences = c("I live in LA and I like NY", "Have you been in TX?"))
words_found
#> [[1]]
#> [[1]]$word
#> [1] "Los Angeles" "New York"
#>
#> [[1]]$start
#> [1] 11 25
#>
#> [[1]]$end
#> [1] 12 26
#>
#>
#> [[2]]
#> [[2]]$word
#> [1] "Texas"
#>
#> [[2]]$start
#> [1] 18
#>
#> [[2]]$end
#> [1] 19
data.table::rbindlist(words_found)
#> word start end
#> 1: Los Angeles 11 12
#> 2: New York 25 26
#> 3: Texas 18 19
Replace keys in a sentence
processor$replace_keys(sentences = c("I live in LA and I like NY", "Have you been in TX?"))
#> [1] "I live in Los Angeles and I like New York"
#> [2] "Have you been in Texas?"
To see more details about the performance of the algorithm, click here.