Description

Convert Addresses to Standard Inputs.

Efficient tools for parsing and standardizing Australian addresses from textual data. The package uses optimized algorithms to identify and extract address components, such as street names, street types, and postcodes. It is aimed especially at large batches of data, where sending addresses to internet services may be slow or inappropriate. The core functionality is built on fast string processing techniques that handle the variations in format and abbreviation commonly found in Australian address data. Designed for data scientists, urban planners, and logistics analysts, the package facilitates the cleaning and normalization of address information, supporting better data integration and analysis in urban studies, geography, and related fields.

healthyAddress

Intelligent and fast parsing of Australian addresses

A common problem when dealing with Australian addresses is that they are often recorded as single strings, in the form they would appear on an envelope. For example,

1408/170 The Esplanade St Kilda VIC 3182

To match such addresses against reference data, such as the PSMA, we often need to extract their components. In this example, we want the flat number (1408) and the postcode (3182). Doing this is hard in two ways: it must be fast, and it must parse the address intelligently. In the address above, we want to recognize that 'St' is part of 'St Kilda' (Saint Kilda), not an abbreviation of 'Street'. The healthyAddress package attempts to provide fast and intelligent parsing of Australian addresses.
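As a rough illustration of which fields are easy to extract, the base R sketch below pulls the flat number, street number, state, and postcode out of the example address with a single regular expression. It is a toy, not the method healthyAddress uses: the function name toy_split_address is hypothetical, it assumes the exact layout "flat/number street-and-locality STATE POSTCODE", and it deliberately leaves the street name and locality (THE ESPLANADE ST KILDA) unresolved, because deciding where the street ends and whether 'St' means 'Saint' or 'Street' is the hard part.

```r
# Hypothetical toy parser, NOT healthyAddress's algorithm. It assumes the
# layout "flat/number <street and locality> STATE POSTCODE" and leaves the
# street name and locality combined in REST.
toy_split_address <- function(address) {
  x <- toupper(trimws(address))
  pattern <- paste0(
    "^(\\d+)/(\\d+)\\s+",                  # flat number / street number
    "(.+?)\\s+",                           # street name + locality (unresolved)
    "(ACT|NSW|NT|QLD|SA|TAS|VIC|WA)\\s+",  # state abbreviation
    "(\\d{4})$"                            # four-digit postcode
  )
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
  if (length(m) == 0) stop("address does not match the toy pattern")
  list(
    FLAT_NUMBER  = as.integer(m[2]),
    NUMBER_FIRST = as.integer(m[3]),
    REST         = m[4],
    STATE        = m[5],
    POSTCODE     = as.integer(m[6])
  )
}

parts <- toy_split_address("1408/170 The Esplanade St Kilda VIC 3182")
str(parts)
```

Any address outside that exact layout (no flat number, a missing state, a "Unit 3" prefix) fails the pattern, which is one reason a dedicated parser is useful.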

The main function is standardize_address:

library(healthyAddress)
standardize_address("1408/170 The Esplanade St Kilda VIC 3182")
#>    FLAT_NUMBER NUMBER_FIRST NUMBER_LAST NUMBER_SUFFIX   STREET_NAME
#>          <int>        <int>       <int>         <raw>        <char>
#> 1:        1408          170           0            00 THE ESPLANADE
#>    STREET_TYPE_CODE POSTCODE STREET_TYPE
#>               <int>    <int>      <char>
#> 1:                0     3182        <NA>

Created on 2024-01-31 by the reprex package (v2.0.1)

There are three arguments to the function that affect performance:

  • hash_StreetName: instead of returning the street name as a string, return an integer. This can be useful when performing merges (which are faster on integer vectors), by applying HashStreetName to the foreign table's street name.
  • integer_StreetType: instead of returning the street type as a string, return an integer.
  • check: performs a check on the input. Setting it to zero can improve performance on input that has already been checked.
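The integer-key merge described for hash_StreetName can be sketched in base R. This is an illustration under stated assumptions: toy_hash is a hypothetical stand-in for HashStreetName (whose hashing scheme and exact signature are not described here), the two tables and their ID column are made-up examples, and base merge() is used for simplicity. The point is only that once both tables carry the same integer key, the join compares integers rather than strings.

```r
# Hypothetical stand-in for HashStreetName: a simple polynomial string hash
# (base 31, reduced modulo a large constant) so that equal street names,
# compared case-insensitively, map to equal integers.
toy_hash <- function(x) {
  vapply(toupper(x), function(s) {
    h <- 0
    for (code in utf8ToInt(s)) h <- (h * 31 + code) %% 2147483629
    as.integer(h)
  }, integer(1), USE.NAMES = FALSE)
}

# Made-up example tables: parsed addresses and a reference ("foreign") table.
parsed <- data.frame(
  STREET_NAME = c("THE ESPLANADE", "FITZROY", "COLLINS"),
  POSTCODE    = c(3182L, 3182L, 3000L)
)
reference <- data.frame(
  STREET_NAME = c("Collins", "The Esplanade"),
  POSTCODE    = c(3000L, 3182L),
  ID          = c("A", "B")
)

# Hash the street names in both tables, then join on integer columns only.
parsed$STREET_HASH    <- toy_hash(parsed$STREET_NAME)
reference$STREET_HASH <- toy_hash(reference$STREET_NAME)

merged <- merge(
  parsed[c("STREET_HASH", "POSTCODE")],
  reference[c("STREET_HASH", "POSTCODE", "ID")],
  by = c("STREET_HASH", "POSTCODE")
)
merged
```

A toy hash like this can in principle map two different names to the same integer, so it is meant only to show the shape of the merge, not to replace HashStreetName.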
Metadata

Version

0.4.3

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows