Description

Extract Drug Dosages from Free-Text Prescriptions.

Utilities for converting unstructured electronic prescribing instructions into structured medication data. Extracts drug dose, units, daily dosing frequency and intervals from English-language prescriptions. Based on Karystianis et al. (2015) <doi:10.1186/s12911-016-0255-x>.

R package doseminer

David Selby and Belay Birlie

An R implementation of the text-mining algorithm of Karystianis et al. (2015) for extracting drug dosage information from electronic prescription data (especially from the Clinical Practice Research Datalink, CPRD). The aim of this project is to provide a complete replacement for the original implementation, written entirely in R with no external dependencies (the original depended on Python and Java). This should make the tool more portable, extensible and suitable for use across different platforms (Windows, Mac, Unix).

Installation

You can install doseminer from CRAN using

install.packages('doseminer')

or get the latest development version via GitHub:

# install.packages('remotes')
remotes::install_github('Selbosh/doseminer')

Usage

The workhorse function is called extract_from_prescription. Pass it a character vector of free-text prescriptions and it will try to extract the following variables:

  • Dose frequency (the number of times per day a dose is administered)
  • Dose interval (the number of days between doses)
  • Dose unit (how individual doses are measured, e.g. millilitres, tablets)
  • Dose number (how many of those units comprise a single dose, e.g. 2 tablets)
  • Optional (should the dose only be taken ‘if required’ / ‘as needed’?)

library(doseminer)
extract_from_prescription('take two and a half tablets every two to three days as needed')

| raw | output | freq | itvl | dose | unit | optional |
|---|---|---|---|---|---|---|
| take two and a half tablets every two to three days as needed | 2.5 tab | 1 | 2-3 | 2.5 | tab | 1 |

Anything not matched is returned as NA, though some inferences are also made. For instance: if a dosage is specified as multiple times per day, with no explicit interval between days, it’s inferred the interval is one day. Similarly, if an interval is specified (e.g. every 3 days) but not a daily frequency, it’s presumed the dose is taken only once during the day.
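
The two inference rules above can be sketched in a few lines of base R. This is only an illustration of the stated behaviour, not the package's own code: fill_defaults is a hypothetical helper, and storing frequency and interval as character values (so that ranges such as "2-3" fit) is an assumption.

```r
# Hypothetical helper (not part of doseminer): apply the defaults described
# above to frequency and interval values, which may hold ranges like "2-3".
fill_defaults <- function(freq, itvl) {
  freq <- as.character(freq)
  itvl <- as.character(itvl)
  only_freq <- !is.na(freq) & is.na(itvl)   # e.g. "four times a day"
  only_itvl <- is.na(freq) & !is.na(itvl)   # e.g. "every 3 days"
  itvl[only_freq] <- "1"                    # assume doses are taken every day
  freq[only_itvl] <- "1"                    # assume one dose per dosing day
  data.frame(freq = freq, itvl = itvl, stringsAsFactors = FALSE)
}

# Rows: frequency only, interval only, both given, neither given
fill_defaults(freq = c("4", NA, "3-4", NA),
              itvl = c(NA, "3", "2-3", NA))
```

Rows where neither value is present stay NA, in line with the statement that anything not matched is returned as NA.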

To see the package in action, a small vector of example prescriptions is included in the variable example_prescriptions.

extract_from_prescription(example_prescriptions)

| raw | output | freq | itvl | dose | unit | optional |
|---|---|---|---|---|---|---|
| 1 tablet to be taken daily | 1 tab to be taken | 1 | 1 | 1 | tab | 0 |
| 2.5ml four times a day when required | 2.5 ml | 4 | 1 | 2.5 | ml | 1 |
| 1.25mls three times a day | 1.25 ml | 3 | 1 | 1.25 | ml | 0 |
| take 10mls q.d.s. p.r.n. | 10 ml | 1 | 1 | 10 | ml | 1 |
| take 1 or 2 4 times/day | 1 - 2 | 4 | 1 | 1-2 | NA | 0 |
| 2x5ml spoon 4 times/day | 2 x 5 ml spoonful | 4 | 1 | 10 | ml spoonful | 0 |
| take 2 tablets every six hours max eight in twenty four hours | 2 tab 0 - 8 in 24 hours | 4 | 1 | 2 | tab | 0 |
| 1 tab nocte twenty eight tablets | 1 tab 28 tab | 1 | 1 | 1 | tab | 0 |
| 1-2 four times a day when required | 1 - 2 | 4 | 1 | 1-2 | NA | 1 |
| take one twice daily | 1 | 2 | 1 | 1 | NA | 0 |
| 1 q4h prn | 1 | 6 | 1 | 1 | NA | 1 |
| take two every three days | 2 | 1 | 3 | 2 | NA | 0 |
| five every week | 5 | 1 | 7 | 5 | NA | 0 |
| every 72 hours | | 1 | 3 | NA | NA | 0 |
| 1 x 5 ml spoon 4 / day for 10 days | 1 x 5 ml spoonful for 10 days | 4 | 1 | 5 | ml spoonful | 0 |
| two to three times a day | | 2-3 | 1 | NA | NA | 0 |
| three times a week | | 1 | 2-3 | NA | NA | 0 |
| three 5ml spoonsful to be taken four times a day after food | 3 x 5 ml spoonful to be taken after food | 4 | 1 | 15 | ml spoonful | 0 |
| take one or two every 4-6 hrs | 1 - 2 | 4-6 | 1 | 1-2 | NA | 0 |
| 5ml 3 hrly when required | 5 ml | 8 | 1 | 5 | ml | 1 |
| one every morning to reduce bp | 1 to reduce bp | 1 | 1 | 1 | NA | 0 |
| take 1 or 2 6hrly when required | 1 - 2 | 4 | 1 | 1-2 | NA | 1 |
| take 1 or 2 four times a day as required for pain | 1 - 2 for pain | 4 | 1 | 1-2 | NA | 1 |
| take 1 or 2 4 times/day if needed for pain | 1 - 2 for pain | 4 | 1 | 1-2 | NA | 1 |
| 1-2 tablets up to four times daily | 1 - 2 tab | 0-4 | 1 | 1-2 | tab | 1 |
| take one or two tablets 6-8 hrly every 2-3 days | 1 - 2 tab | 3-4 | 2-3 | 1-2 | tab | 0 |
| one and a half tablets every three hours | 1.5 tab | 8 | 1 | 1.5 | tab | 0 |
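
Several of the example prescriptions give timing in hours, while the freq and itvl columns are expressed per day and in days. The arithmetic that the example output appears to follow can be sketched as below. hours_to_schedule is a hypothetical helper written for this illustration, and the rule it encodes is inferred from the rows above rather than taken from the package source.

```r
# Hypothetical helper (not part of doseminer): convert "every N hours" into a
# daily frequency and an interval in days, mirroring the example rows above.
hours_to_schedule <- function(hours) {
  if (hours <= 24) {
    c(freq = 24 / hours, itvl = 1)   # several doses within each day
  } else {
    c(freq = 1, itvl = hours / 24)   # a single dose every few days
  }
}

hours_to_schedule(4)   # freq 6, itvl 1, as in "1 q4h prn"
hours_to_schedule(3)   # freq 8, itvl 1, as in "5ml 3 hrly when required"
hours_to_schedule(72)  # freq 1, itvl 3, as in "every 72 hours"
```

A range such as "every 4-6 hrs" would map each endpoint separately (24/6 = 4 and 24/4 = 6), which matches the 4-6 frequency shown for that prescription.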

The column output represents the ‘residual’ text after other features have been extracted. It can be ignored for most applications, but is useful for debugging prescriptions that have not been parsed as expected.
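
One way to use the residual for debugging is to list prescriptions where the dose or unit is missing and read the raw text alongside what was left over. The sketch below builds a small data frame by hand from three of the example rows so that it runs without the package installed. The column names follow the example output, but the column types are an assumption.

```r
# Three rows copied by hand from the example output; types are assumed.
res <- data.frame(
  raw      = c("1 tablet to be taken daily", "take one twice daily", "every 72 hours"),
  output   = c("1 tab to be taken", "1", ""),
  freq     = c("1", "2", "1"),
  itvl     = c("1", "1", "3"),
  dose     = c("1", "1", NA),
  unit     = c("tab", NA, NA),
  optional = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Flag prescriptions where the dose or the unit could not be extracted,
# then show the raw text next to the residual for manual review.
needs_review <- is.na(res$dose) | is.na(res$unit)
res[needs_review, c("raw", "output")]
```

With real data, the same filter could be applied directly to the data frame returned by extract_from_prescription.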

English words to numbers

Built into this package is a series of functions for extracting and parsing natural language English numbers into their digit-based numeric form. This could be spun out into its own package for more general use.

replace_numbers(c('Thirty seven bottles of beer on the wall',
                  'Take one down, pass it around',
                  'Thirty-six bottles of beer on the wall!',
                  'One MILLION dollars.',
                  'We do not take any half measures'))
## [1] "37 bottles of beer on the wall"  "Take 1 down, pass it around"    
## [3] "36 bottles of beer on the wall!" "1e+06 dollars."                 
## [5] "We do not take any 0.5 measures"

Inspired by Ben Marwick’s words2number (https://github.com/benmarwick/words2number).
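
To give a feel for the underlying idea, here is a much smaller base-R sketch that converts a number phrase between zero and ninety-nine by summing the values of its words. It is not the package's implementation: words_to_number and both lookup tables are hypothetical, and the function performs no grammar checking (it would happily add up "one two"), nor does it handle fractions, hundreds or larger numbers.

```r
# Hypothetical lookup tables for this illustration only.
small_numbers <- c(zero = 0, one = 1, two = 2, three = 3, four = 4, five = 5,
                   six = 6, seven = 7, eight = 8, nine = 9, ten = 10,
                   eleven = 11, twelve = 12, thirteen = 13, fourteen = 14,
                   fifteen = 15, sixteen = 16, seventeen = 17, eighteen = 18,
                   nineteen = 19)
tens <- c(twenty = 20, thirty = 30, forty = 40, fifty = 50, sixty = 60,
          seventy = 70, eighty = 80, ninety = 90)

# Convert a phrase such as "Thirty seven" or "thirty-six" to a number.
words_to_number <- function(phrase) {
  lookup <- c(small_numbers, tens)
  # Lower-case and trim, then split on spaces or hyphens
  words <- strsplit(trimws(tolower(phrase)), "[ -]+")[[1]]
  values <- lookup[words]
  if (anyNA(values)) return(NA_real_)  # unknown word: give up rather than guess
  sum(values)
}

words_to_number("Thirty seven")  # 37
words_to_number("thirty-six")    # 36
words_to_number("bottles")       # NA
```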

Contributors

Maintained by David Selby and Belay Birlie.

References

Karystianis, G., Sheppard, T., Dixon, W.G. et al. Modelling and extraction of variability in free-text medication prescriptions from an anonymised primary care electronic medical record research database. BMC Med Inform Decis Mak 16, 18 (2015). https://doi.org/10.1186/s12911-016-0255-x

Metadata

Version

0.1.2

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows