Description

Two-Level Sample Selection with Optimal Site Replacement.

Carries out a two-level sample selection that anticipates the possibility that an initially selected site declines to participate, and optimally replaces that site. The procedure aims to reduce bias (and/or loss of external validity) with respect to the target population. In selecting units and sub-units, 'sitepickR' uses the cube method developed by Deville & Tillé (2004) <http://www.math.helsinki.fi/msm/banocoss/Deville_Tille_2004.pdf> and described in Tillé (2011) <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2011002/article/11609-eng.pdf?st=5-sx8Q8n>. The cube method is a probability sampling method designed to satisfy criteria for balance between the sample and the population. Recent research has shown that this method performs well in simulations for studies of educational programs (see Fay & Olsen, 2021, under review). To implement the cube method, 'sitepickR' uses the 'sampling' R package <https://cran.r-project.org/package=sampling>. To implement statistical matching, 'sitepickR' uses the 'MatchIt' R package <https://cran.r-project.org/package=MatchIt>.

sitepickR: A Generalizability R package

Elena Badillo-Goicoechea, Robert Olsen, and Elizabeth A. Stuart (2022)

Introduction

sitepickR is designed to select a representative sample of sites for a prospective impact evaluation, such as a randomized controlled trial (RCT). This package is designed for selecting schools but can be used to select any type of site defined by geography or administrative responsibility (e.g., county, job training center, health clinic).

In addition, sitepickR is designed to select sites in two stages: first “units” that contain multiple sites located nearby and/or under the same administration, and then “sub-units” containing individual sites. Two-stage sampling may be necessary when the cost of the study depends on the number of units (e.g., school districts), and a one-stage sample of sub-units (e.g., schools) would likely contain more units than the study can afford. Further, the main function in this package, selectMatch, lets the user carry out a two-level sample selection that anticipates the possibility that an initially selected site declines to participate. The procedure aims to reduce the bias (and/or loss of generalizability to the population) that such refusals could introduce. This enhances the functionality of related packages, such as generalizeR.
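
As a rough illustration of the two-stage structure only, the base R sketch below draws districts first and then schools within the drawn districts. The toy data frame, the column names `district` and `school`, and the sample sizes are all hypothetical, and both stages use simple random sampling as a stand-in; this is not how selectMatch itself draws units.

```r
# Hypothetical toy population: 20 districts (units), each with 3 to 8 schools (sub-units).
set.seed(1)
n_schools <- sample(3:8, 20, replace = TRUE)
pop <- data.frame(
  district = rep(seq_len(20), times = n_schools),
  school   = sequence(n_schools)   # school index 1..n within each district
)

# Stage 1: draw 5 districts (simple random sampling here, NOT the cube method).
sel_districts <- sample(unique(pop$district), size = 5)

# Stage 2: within each selected district, draw up to 2 schools.
stage2 <- lapply(sel_districts, function(d) {
  schools <- pop$school[pop$district == d]
  take <- sample.int(length(schools), min(2, length(schools)))
  data.frame(district = d, school = schools[take])
})
two_stage <- do.call(rbind, stage2)

print(two_stage)
```

Drawing districts first caps the number of districts the study has to recruit, which is the cost argument made above.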

In selecting units and sub-units, sitepickR uses the cube method (e.g., Deville & Tillé, 2004; Tillé, 2011). The cube method is a probability sampling method that is designed to satisfy criteria for balance between the sample and the population. Recent research has shown that this method performs well in simulations for studies of educational programs (Fay & Olsen, under review). To implement the cube method, sitepickR uses the sampling R package. Users have the option to select units with equal probabilities or with probabilities proportional to their “size”, measured as the number of sub-units nested within each unit.
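
The two probability options can be sketched as follows. The unit frame and the covariate `pct_frl` are made up for illustration. The final step calls `samplecube()` from the sampling package directly, and only if that package is installed; it is a sketch of the underlying method under these assumptions, not a reproduction of what sitepickR does internally.

```r
# Hypothetical unit-level frame: 12 districts with their number of schools.
set.seed(2)
units <- data.frame(
  district  = seq_len(12),
  n_schools = c(4, 6, 3, 10, 5, 8, 2, 7, 9, 4, 6, 5),
  pct_frl   = round(runif(12, 20, 80))   # made-up covariate to balance on
)
n <- 4  # number of units to select

# Option 1: equal inclusion probabilities.
pik_equal <- rep(n / nrow(units), nrow(units))

# Option 2: inclusion probabilities proportional to size (number of sub-units).
pik_size <- n * units$n_schools / sum(units$n_schools)
stopifnot(all(pik_size <= 1))  # a real frame with very large units would need capping

# Balanced draw with the cube method, if the 'sampling' package is available.
# Including pik_size as a balancing column keeps the sample size fixed at n.
if (requireNamespace("sampling", quietly = TRUE)) {
  X <- cbind(pik_size, units$pct_frl)
  s <- sampling::samplecube(X, pik_size, order = 1, comment = FALSE, method = 1)
  print(units$district[s > 0.5])
}
```

Under either option the inclusion probabilities sum to the number of units to be selected; the size-proportional option simply gives districts with more schools a higher chance of selection.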

In addition, sitepickR uses statistical matching to select possible replacement units. In education RCTs, the share of selected districts that agree to participate tends to be low. To address this challenge, sitepickR selects and ranks up to K replacement districts for each district selected using the cube method. Replacement districts are selected using statistical matching based on propensity score methods. To implement statistical matching, sitepickR uses the MatchIt R package.
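
The idea of ranking up to K replacements per selected district can be illustrated with a simplified base R sketch. It assumes a hypothetical frame with covariates `enroll_k` and `pct_frl`, estimates a propensity score with `glm()`, and ranks non-selected districts by distance on the logit of that score. It is a stand-in for, not a copy of, the MatchIt-based matching the package uses, and unlike a production matching routine it lets the same district serve as a replacement for more than one selected district.

```r
set.seed(3)
# Hypothetical frame of 30 districts with two covariates; 5 are initially selected.
d <- data.frame(
  id       = seq_len(30),
  enroll_k = rnorm(30, 5, 1.5),   # enrollment in thousands (made up)
  pct_frl  = runif(30, 20, 80)    # made-up percentage covariate
)
d$selected <- as.integer(d$id %in% sample(d$id, 5))

# Propensity score: estimated probability of being an initially selected unit.
ps_fit <- glm(selected ~ enroll_k + pct_frl, data = d, family = binomial())
d$logit_ps <- predict(ps_fit, type = "link")

K <- 3                               # replacements to rank per selected unit
pool <- d[d$selected == 0, ]         # candidate replacements
sel_ids <- d$id[d$selected == 1]

# For each selected unit, keep the K closest candidates, closest first.
replacements <- sapply(sel_ids, function(i) {
  gap <- abs(pool$logit_ps - d$logit_ps[d$id == i])
  pool$id[order(gap)[seq_len(K)]]
})
colnames(replacements) <- sel_ids    # one column per selected unit, rows = rank 1..K

print(replacements)
```

Each column lists the candidate replacements for one selected district in rank order, so a recruiter can move down the column when a district declines.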

Usage

Install from CRAN using install.packages():

install.packages("sitepickR", repos = "http://cran.us.r-project.org")

sitepickR's sampling & matching procedure, implemented in its selectMatch() function, consists of four main steps:

  • Study sample design. Identify: 1) a target population that has a nested structure (e.g., [school district:school]); 2) observable covariates of interest at the unit level; 3) observable covariates of interest at the sub-unit level.
  • Randomly select an initial sample of units from the target population.
  • Obtain a list of 'best' matches (i.e., replacement candidates) for each initially selected unit, in terms of the covariates of interest.
  • Assess balance and match quality in terms of the covariates of interest:
    • Balance between initially selected ('original') units and the target population.
    • Balance between original units and each group of matches (1 to K, from closest to furthest).
    • Balance between the sub-units associated with each unit replacement group and the original sub-units in the population, in terms of the available covariates of interest at both the unit and sub-unit level.
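
One common way to summarize the balance checks in the last step is the standardized mean difference (SMD) between a group and the target population. The sketch below is a minimal base R illustration under assumed data; the helper `smd()`, the covariate names, and the random subset standing in for the selected units are all hypothetical and are not part of the sitepickR API.

```r
# Standardized mean difference between a subset and the full population,
# scaled by the population standard deviation.
smd <- function(x_sample, x_pop) {
  (mean(x_sample) - mean(x_pop)) / sd(x_pop)
}

set.seed(4)
# Hypothetical population of 200 units with two covariates.
pop <- data.frame(
  enroll_k = rnorm(200, 5, 1.5),
  pct_frl  = runif(200, 20, 80)
)
idx <- sample(nrow(pop), 20)   # stand-in for the initially selected units

# One SMD per covariate; values near 0 indicate the subset resembles the population.
balance <- sapply(names(pop), function(v) smd(pop[[v]][idx], pop[[v]]))
print(round(balance, 3))
```

The same helper could be applied to each replacement group (rank 1 to K) in turn to compare how balance changes as more distant matches are used.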

References

Deville, J. C., & Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91, 893-912.

Tillé, Y. (2011). Ten years of balanced sampling with the cube method: An appraisal. Survey Methodology, 37, 215-226.

Questions/troubleshooting?

Reach out at: - [email protected] - [email protected] - [email protected]

Acknowledgements

We thank Noah Greifer, maintainer of the MatchIt package, for his valuable input.

Metadata

Version

0.0.1

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows