R Client for the Statistics Denmark Databank API.
Installation
From CRAN:
install.packages("danstat")
From GitHub:
# install.packages("devtools")
devtools::install_github("ValeriVoev/danstat")
Overview
The danstat package provides an R interface to the Danmarks Statistik Statistikbank API, giving researchers and the general community easier access to the wealth of data in the data bank. The documentation of the API can be found here: Databank API.
Usage
The API has four endpoints, which are mirrored by the four main functions of the package:
get_subjects() (SUBJECTS endpoint) retrieves information about the subjects around which the data tables in the data bank are organized. The subjects are arranged hierarchically, with top-level subjects such as “Labour and income”, “Transport”, etc. get_subjects() retrieves the highest level of the hierarchy. See the function documentation for more details.
get_tables() (TABLES endpoint) retrieves a list of tables associated with a given subject code. For example, get_tables(subjects = "2") retrieves all tables related to the subject “Labour and income”, with table id, description, variables in the table, etc.
get_table_metadata() (TABLEINFO endpoint) returns information about a particular table: its description, time of last update, whether or not it is actively updated and, most importantly for practical purposes, the variable names and ids that are needed whenever you request actual data from the table. Set variables_only = TRUE if you only want information on the table variables.
get_data() (DATA endpoint) returns data from a selected table. A variables argument is required and must be a list. Each element of the list should itself be a named list (with elements code and values), where code is the id of the variable for which data is requested and values is a vector of values for this variable. To request all values, specify values = NA. For example:
library(danstat)
user_input = list(list(code = "ieland", values = c(5100, 5128)),
                  list(code = "køn", values = c(1,2)),
                  list(code = "tid", values = NA))
get_data(table_id = "folk1c", variables = user_input)
#> # A tibble: 192 x 4
#> IELAND KØN TID INDHOLD
#> <chr> <chr> <chr> <dbl>
#> 1 Denmark Men 2008Q1 2465810
#> 2 Denmark Men 2008Q2 2466036
#> 3 Denmark Men 2008Q3 2467712
#> 4 Denmark Men 2008Q4 2469977
#> 5 Denmark Men 2009Q1 2470457
#> # … with 187 more rows
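The nested list-of-lists structure of the variables argument can be tedious to type by hand. As a minimal sketch, the helper below builds it from a named list; make_variables() is a hypothetical name introduced here for illustration and is not part of danstat.

```r
# Hypothetical helper (NOT part of danstat): turn a named list of values
# into the list-of-lists structure described above for get_data().
make_variables <- function(selection) {
  stopifnot(is.list(selection),
            !is.null(names(selection)),
            all(names(selection) != ""))
  # One list(code = ..., values = ...) element per selected variable
  unname(Map(function(code, values) list(code = code, values = values),
             names(selection), selection))
}

user_input <- make_variables(list(
  ieland = c(5100, 5128),
  "køn"  = c(1, 2),
  tid    = NA          # NA requests all values of the variable
))

str(user_input)

# The result can then be passed on (requires danstat and network access):
# get_data(table_id = "folk1c", variables = user_input)
```

The helper only reshapes its input; it does not check that the codes or values exist in the table, so get_table_metadata() is still the place to look those up.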
Note that while the default language is English and variable values are indeed returned in English (e.g. “Men”), column names are returned in Danish (e.g. “KØN”, “INDHOLD”). Unfortunately, the API doesn’t currently provide an option to return column names (variable names) in English. However, you can get the English translation using get_table_metadata(). For example, for the above table
library(dplyr)
get_table_metadata(table_id = "folk1c", variables_only = TRUE) %>%
select(id, text)
#> id text
#> 1 OMRÅDE region
#> 2 KØN sex
#> 3 ALDER age
#> 4 HERKOMST ancestry
#> 5 IELAND country of origin
#> 6 Tid time
we can see that “Område” translates to “region”, “Køn” to “sex”, “Alder” to “age”, etc. “Indhold” is always the “value” column whenever data is returned with the get_data() function.
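The translation described above can also be applied to the column names directly. The sketch below uses small mock data frames that copy the outputs shown earlier, so it runs without network access; translate_names() is a hypothetical helper, not a danstat function. It assumes that ids and column names may differ in case (“Tid” vs “TID”, as in the outputs above) and that INDHOLD should be renamed to “value”.

```r
# Mock of the id/text columns from get_table_metadata(variables_only = TRUE)
var_info <- data.frame(
  id   = c("OMRÅDE", "KØN", "ALDER", "HERKOMST", "IELAND", "Tid"),
  text = c("region", "sex", "age", "ancestry", "country of origin", "time"),
  stringsAsFactors = FALSE
)

# Mock of the first rows returned by get_data() in the example above
dat <- data.frame(
  IELAND  = c("Denmark", "Denmark"),
  "KØN"   = c("Men", "Men"),
  TID     = c("2008Q1", "2008Q2"),
  INDHOLD = c(2465810, 2466036),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: replace Danish column names by their English text
translate_names <- function(data, var_info) {
  # Case-insensitive lookup of each column name among the variable ids
  idx <- match(toupper(names(data)), toupper(var_info$id))
  # Keep the original name when no translation is found
  new_names <- ifelse(is.na(idx), names(data), var_info$text[idx])
  # INDHOLD is the value column and has no entry in the metadata
  new_names[names(data) == "INDHOLD"] <- "value"
  names(data) <- new_names
  data
}

dat_en <- translate_names(dat, var_info)
names(dat_en)
```

With real data, var_info would come from get_table_metadata() and dat from get_data(); columns without a matching id keep their Danish names.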
See also
There are (as far as I know) two other packages with similar functionality:
In the packages above, the API is called with a GET request, while POST is the preferred option of the API developers and is also what is used in this package. I also think that using POST requests makes the package code more readable compared to the long URL-encoded queries needed for GET requests. In addition, as of this writing, the rOpenGov package appears not to have been maintained for the past 3 years. In any case, users can consider the above two packages as alternatives to this one.