MyNixOS website logo
Description

Korean National Assembly Data for Political Science Education.

Provides ready-to-use datasets from the Korean National Assembly (assemblies 20 through 22, 2016-2026) for teaching quantitative methods in political science. Includes legislator metadata, bill proposals, roll call votes, asset declarations, and policy seminar records. Designed as a Korean politics counterpart to packages like 'palmerpenguins', enabling students to practice regression, panel data analysis, text analysis, and network analysis with real legislative data. Roll call vote data and spatial voting models are described in Poole and Rosenthal (1985) <doi:10.2307/2111172>. Legislative data is sourced from the Korean National Assembly Open API.

assemblykor

CRAN status CRAN downloads R-CMD-check

Korean National Assembly data for political science education.

Documentation: https://kyusik-yang.github.io/assemblykor/

Overview

The goal of assemblykor is to provide a curated collection of Korean National Assembly datasets for teaching quantitative methods in political science. Think of it as a Korean politics counterpart to palmerpenguins.

The package includes seven built-in datasets covering legislators, bills, asset declarations, policy seminars, committee speeches, plenary votes, and roll call records, all drawn from public data of the Korean National Assembly (2000-2026).

Why tidyverse?

This package and all its tutorials follow tidyverse conventions. We use dplyr, ggplot2, tidyr, and the pipe operator (%>%) throughout.

Why tidyverse-first for teaching?

  • Readability: tidyverse syntax reads like a sequence of data operations, reducing cognitive load for beginners (Cetinkaya-Rundel et al., 2022).
  • Consistency: Packages share a common grammar and data structure, so learning one makes the next easier (Wickham et al., 2019).
  • Doing things quickly: Students can perform meaningful data analysis from day one, rather than spending weeks on base R syntax (Robinson, 2017).
  • Community standard: Most modern R textbooks, including R for Data Science, adopt tidyverse as the default, making it easier for students to find help and resources.

Note: This package is designed for educational purposes only. The datasets have been processed and curated for classroom use and may not reflect the most up-to-date or complete records. For research or policy analysis, please verify against the original data sources listed in the Data sources section below.

Meet the data

legislators 947 records

20-22대 국회의원 메타데이터 (이름, 정당, 선거구, 성별, 선수, 발의 건수)

bills 60,925 records

법안 메타데이터 (제목, 위원회, 발의일, 처리 결과, 대표발의자)

wealth 2,928 records

의원 재산신고 패널 (순자산, 부동산, 예금, 주식, 13개 시점)

seminars 5,962 records

정책세미나 활동 패널 (개최 건수, 초당적 비율, 여당 여부)

speeches 15,843 records

22대 과학기술정보방송통신위원회 상임위 회의록 전수 (발언 전문, 화자 역할 포함)

votes 8,050 records

본회의 표결 결과 (20-22대, 찬성/반대/기권 수, 의결 결과)

roll_calls 383,739 records

22대 개별 의원 표결 기록 (의원별 찬성/반대/기권/불참, 1,286개 법안)

Installation

# Install from CRAN
install.packages("assemblykor")

# Or install the development version from GitHub
# install.packages("remotes")
# remotes::install_github("kyusik-yang/assemblykor")

Usage

library(assemblykor)

data(legislators)
data(bills)
data(wealth)
data(seminars)
data(speeches)
data(votes)
data(roll_calls)

Example: party composition by gender

library(dplyr)

legislators %>%
  filter(assembly == 22) %>%
  count(party, gender) %>%
  filter(n >= 3)

Example: wealth distribution

library(ggplot2)

ggplot(wealth, aes(x = net_worth / 1e6)) +
  geom_histogram(bins = 50, fill = "steelblue") +
  labs(x = "Net worth (billion KRW)", y = "Count",
       title = "Distribution of legislator wealth")

Example: bill outcomes

bills %>%
  count(result, sort = TRUE) %>%
  head(5)
#>               result     n
#> 1     임기만료폐기 30678
#> 2     대안반영폐기 13692
#> 3         수정가결  2222
#> 4         원안가결  1048
#> 5             철회   578

Korean font setup

ggplot2 plots with Korean text (axis labels, titles) may show broken characters. Run this once per session to fix it:

set_ko_font()
#> Korean font set: Apple SD Gothic Neo

This auto-detects a Korean font on your system (macOS, Windows, Linux). The interactive tutorials (run_tutorial()) handle this automatically.

Downloadable datasets

Larger datasets are available via download functions (requires the arrow package):

# Bill propose-reason texts (60,925 texts, ~40 MB download)
texts <- get_bill_texts()

# Co-sponsorship records (769,773 rows, ~25 MB download)
proposers <- get_proposers()

Tutorials (한국어 수업 자료)

The package includes nine Korean-language tutorials designed for classroom use in political science methods courses:

#TutorialTopicLevel
1R 기초와 tidyversedplyr 핵심 함수, 파이프, 데이터 결합Beginner
2ggplot2 시각화막대, 히스토그램, 산점도, 박스플롯, facetBeginner
3회귀분석OLS, 다중회귀, 로그 변환, 상호작용, 계수 시각화Intermediate
4패널 데이터와 고정효과합동 OLS vs FE, 양방향 FE, DiD, 군집 표준오차Intermediate
5텍스트 분석 입문키워드 빈도, TF-IDF, 발언 분석, 위원회별 비교Intermediate
6네트워크 분석공동발의 네트워크, 중심성, 커뮤니티 탐지, 초당적 분석Advanced
7기명투표 분석Rice Index, 이탈투표, 투표 히트맵, 정당 응집력Advanced
8법안 가결 요인이항 변수, 로지스틱 회귀, 승산비, 예측 확률Intermediate
9발언 패턴 분석발언 빈도/길이, 발언 순서, 의제 키워드, 다양성Advanced

Each tutorial is available in two formats:

Option A: Interactive browser (recommended for self-study)

# Launch an interactive tutorial with exercises in the browser
run_tutorial(1)  # or run_tutorial("01-tidyverse-basics")

Requires the learnr package (install.packages("learnr")). Students can type and run code directly in the browser with hints and solutions.

Option B: Plain R Markdown (for editing in RStudio)

# List available tutorials
list_tutorials()

# Copy an Rmd file to your working directory
open_tutorial(1)  # or open_tutorial("01-tidyverse-basics")

Students can edit and knit the Rmd file in RStudio at their own pace.

Joining datasets

All datasets share the member_id and/or assembly columns for easy joining:

# Merge legislators with wealth data
leg_wealth <- legislators %>%
  inner_join(wealth, by = "member_id", relationship = "many-to-many")

CSV access

CSV versions of the smaller datasets are available for teaching file I/O:

# Find the file path
path_to_file("legislators.csv")

# Read directly
legislators_csv <- read.csv(path_to_file("legislators.csv"), fileEncoding = "UTF-8")

Data sources

All data in this package are derived from publicly available sources.

DatasetSourceLicense
Legislators, bills, proposers, votes, roll callsOpen National Assembly Information API (open.assembly.go.kr)Public domain (Korean government open data)
Speeches (committee minutes)Open National Assembly Information API (open.assembly.go.kr)Public domain (Korean government open data)
Asset declarationsOpenWatchCC BY-SA 4.0
Policy seminarsNational Assembly Seminar DatabasePublic data

Inspired by: palmerpenguins (Horst, Hill & Gorman, 2020), which demonstrates how domain-specific datasets can make methods teaching more engaging.

Disclaimer

This package is intended for educational use only (teaching quantitative methods in political science courses). The datasets have been processed, filtered, and in some cases sampled for classroom convenience. They should not be treated as authoritative records.

For research or policy analysis, please consult the original data sources listed above and verify the data independently.

Citation

citation("assemblykor")
Yang, Kyusik (2026). assemblykor: Korean National Assembly Data for
Political Science Education. R package version 0.1.1.
https://CRAN.R-project.org/package=assemblykor

License

MIT (package code). Data licenses as noted above.

Metadata

Version

0.1.2

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows