Access Brazilian Public Health Data.
healthbR
Overview
healthbR provides easy access to Brazilian public health data directly from R. The package downloads, caches, and processes data from official sources, returning clean, analysis-ready tibbles following tidyverse conventions.
Surveys (IBGE / Ministry of Health)
| Module | Description | Years |
|---|---|---|
| VIGITEL | Surveillance of Risk Factors for Chronic Diseases by Telephone Survey | 2006--2024 |
| PNS | National Health Survey (microdata + SIDRA API) | 2013, 2019 |
| PNAD Continua | Continuous National Household Sample Survey | 2012--2024 |
| POF | Household Budget Survey (food security, consumption, anthropometry) | 2002--2018 |
| Censo | Population denominators via SIDRA API | 1970--2022 |
DATASUS (Ministry of Health FTP)
| Module | Description | Granularity | Years |
|---|---|---|---|
| SIM | Mortality Information System (deaths) | Annual/UF | 1996--2024 |
| SINASC | Live Birth Information System | Annual/UF | 1996--2024 |
| SIH | Hospital Information System (admissions) | Monthly/UF | 2008--2024 |
| SIA | Outpatient Information System (13 file types) | Monthly/type/UF | 2008--2024 |
DATASUS modules download .dbc files (compressed DBF) and decompress them internally using vendored C code, so no external dependencies are required.
Installation
You can install the development version of healthbR from GitHub:
# install.packages("pak")
pak::pak("SidneyBissoli/healthbR")
Quick start
library(healthbR)
# see all available data sources
list_sources()
DATASUS modules
All DATASUS modules follow a consistent API: *_years(), *_info(), *_variables(), *_dictionary(), *_data(), *_cache_status(), *_clear_cache().
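The naming pattern above can be sketched in a few lines of base R. The helper `module_functions()` is hypothetical and not part of healthbR; it only assembles the function names listed above for a given module prefix, assuming the lowercase prefixes (`sim`, `sinasc`, `sih`, `sia`) used in this README's examples.

```r
# Hypothetical helper, not part of healthbR: build the function names
# that a module exposes under the shared naming pattern.
module_functions <- function(module) {
  suffixes <- c("years", "info", "variables", "dictionary",
                "data", "cache_status", "clear_cache")
  paste0(module, "_", suffixes)
}

module_functions("sih")

# With healthbR installed, check which of those names are exported.
if (requireNamespace("healthbR", quietly = TRUE)) {
  exported <- getNamespaceExports("healthbR")
  print(module_functions("sih") %in% exported)
}
```

Because every module shares the same suffixes, code written against one module can usually be adapted to another by swapping the prefix.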
# mortality data -- deaths in Acre, 2022
obitos <- sim_data(year = 2022, uf = "AC")
# filter by cause of death (CID-10 prefix)
obitos_cardio <- sim_data(year = 2022, uf = "AC", cause = "I")
# live births in Acre, 2022
nascimentos <- sinasc_data(year = 2022, uf = "AC")
# hospital admissions in Acre, January 2022
internacoes <- sih_data(year = 2022, month = 1, uf = "AC")
# filter by diagnosis (CID-10 prefix)
intern_resp <- sih_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")
# outpatient production in Acre, January 2022
ambulatorial <- sia_data(year = 2022, month = 1, uf = "AC")
# different file type (e.g., high-cost medications)
medicamentos <- sia_data(year = 2022, month = 1, uf = "AC", type = "AM")
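For the monthly modules (SIH and SIA), an analysis often needs several months at once. The package may well accept a vector of months directly; this README does not say, so the sketch below assumes one month per call, as in the examples above. `fetch_months()` and the `requested_month` column are hypothetical names introduced here for illustration.

```r
# Hypothetical helper, not part of healthbR: call a monthly *_data()
# function once per month and stack the results.
fetch_months <- function(fetch, year, months, uf, ...) {
  pieces <- lapply(months, function(m) {
    out <- fetch(year = year, month = m, uf = uf, ...)
    # Record which month was requested (column name is our own choice).
    out$requested_month <- rep(m, nrow(out))
    out
  })
  # rbind() requires identical columns in every piece.
  do.call(rbind, pieces)
}

# Usage sketch; this downloads data, so it only runs interactively.
if (interactive() && requireNamespace("healthbR", quietly = TRUE)) {
  q1 <- fetch_months(healthbR::sih_data, year = 2022, months = 1:3, uf = "AC")
  print(table(q1$requested_month))
}
```

If the monthly files differ in their columns, `rbind()` will stop with an error; in that case a more tolerant binder such as `dplyr::bind_rows()` may be a better fit.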
Survey modules
# VIGITEL telephone survey
vigitel <- vigitel_data(year = 2024)
# PNS national health survey
pns <- pns_data(year = 2019)
# PNAD Continua
pnadc <- pnadc_data(year = 2023, quarter = 1)
# POF household budget survey
pof <- pof_data(year = 2018, register = "morador")
# Census population
pop <- censo_populacao(year = 2022, territorial_level = "state")
Explore variables and dictionaries
# list variables for any module
sim_variables()
sia_variables(search = "sexo")
# data dictionary with category labels
sim_dictionary("SEXO")
sia_dictionary("PA_RACACOR")
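A common next step is attaching the category labels from a dictionary to the raw codes in the data. The sketch below uses toy data frames rather than real package output, and the dictionary column names `code` and `label` are assumptions; inspect the tibble returned by `sim_dictionary()` for the actual names before adapting it.

```r
# Toy stand-ins: the real objects come from sim_data() and sim_dictionary().
# The dictionary column names `code` and `label` are assumptions.
deaths <- data.frame(SEXO = c("1", "2", "2", "9"))
sexo_dict <- data.frame(code = c("1", "2"),
                        label = c("Masculino", "Feminino"))

# match() finds each value's row in the dictionary; unmatched codes become NA.
deaths$SEXO_label <- sexo_dict$label[match(deaths$SEXO, sexo_dict$code)]

table(deaths$SEXO_label, useNA = "ifany")
```

Keeping unmatched codes as `NA` makes it easy to spot values the dictionary does not cover.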
Caching
All modules cache downloaded data automatically. Installing the arrow package enables optimized Parquet caching:
install.packages("arrow")
Each module provides cache management functions:
# check what is cached
sim_cache_status()
sih_cache_status()
sia_cache_status()
# clear cache for a module
sim_clear_cache()
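For scripts that should run on machines with or without arrow, a small base R check can report whether Parquet caching is possible. This is a minimal sketch that does not call healthbR itself; how the package behaves when arrow is missing is not described in this README.

```r
# requireNamespace() returns TRUE/FALSE without attaching the package.
has_arrow <- requireNamespace("arrow", quietly = TRUE)

if (has_arrow) {
  message("arrow is available; Parquet caching can be used.")
} else {
  message("arrow is not installed; run install.packages(\"arrow\") ",
          "to enable Parquet caching.")
}
```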
Data sources
All data is downloaded from official Brazilian government repositories:
- VIGITEL: https://svs.aids.gov.br/daent/cgdnt/vigitel/
- PNS / PNAD Continua / POF: https://www.ibge.gov.br/
- Censo: SIDRA API (https://apisidra.ibge.gov.br/)
- SIM / SINASC / SIH / SIA: DATASUS FTP (ftp://ftp.datasus.gov.br/dissemin/publicos/)
Citation
If you use healthbR in your research, please cite it:
citation("healthbR")
Contributing
Contributions are welcome! Please open an issue to discuss proposed changes or submit a pull request.
Code of Conduct
Please note that the healthbR project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
MIT © Sidney da Silva Pereira Bissoli.