Description

Tools for Easily Combining and Cleaning Data Sets.

Tools for combining and cleaning data sets, particularly with grouped and time series data.

DataCombine

Christopher Gandrud

Version 0.2.21

Please report any bugs or suggestions at: https://github.com/christophergandrud/DataCombine/issues.

Motivation and Functions

DataCombine is a set of miscellaneous tools intended to make combining data sets--especially time-series cross-section data--easier. The package is continually being developed as I turn lines of code that I frequently use into single functions. It currently includes the following functions:

  • CasesTable: reports cases after listwise deletion of missing values for time-series cross-sectional data.

  • change: calculates the absolute, percentage, and proportion change from a specified lag, including within groups.

  • CountSpell: function that returns a variable counting the spell number for an observation. Works with grouped data.

  • dMerge: merges two data frames and reports, drops, or keeps only duplicates.

  • DropNA: drops rows from a data frame when they have missing (NA) values on a given variable(s).

  • FillDown: fills in missing (NA) values with the previous non-missing value.

  • FillIn: fills in missing values of a variable from one data frame with the values from another variable.

  • FindDups: finds duplicated values in a data frame and subsets it to either include or exclude them.

  • FindReplace: replaces multiple patterns found in a character string column of a data frame.

  • grepl.sub: subsets a data frame if a specified pattern is found in a character string.

  • InsertRow: inserts a row into a data frame. Largely implements Ari B. Friedman's function.

  • MoveFront: moves variables to the front of a data frame. This can be useful if you have a data frame with many variables and want to move a variable or variables to the front.

  • NaVar: creates new variable(s) indicating if there are missing values in other variable(s).

  • shift: creates lag and lead variables, including for time-series cross-sectional data. The shifted variable is returned to a new vector. This function is largely based on TszKin Julian's shift function.

  • slide: creates lag and lead variables, including for time-series cross-sectional data. The slid variables are added to the original data frame. This expands the capabilities of shift.

  • slideMA: creates a moving average for a period before or after each time point for a given variable.

  • SpreadDummy: spreads a dummy variable (1's and 0's) over a specified time period and for specified groups.

  • StartEnd: finds the starting and ending time points of a spell, including for time-series cross-sectional data.

  • rmExcept: removes all objects from a workspace except those specified by the user.

  • TimeExpand: expands a data set so that it includes an observation for each time point in a sequence. Works with grouped data.

  • TimeFill: creates a continuous Unit-Time-Dummy data frame from a data frame with Unit-Start-End times.

  • VarDrop: drops one or more variables from a data frame.
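To give a feel for how these functions fit together, here is a minimal sketch that lags a variable within groups using slide and then removes the incomplete rows with DropNA. The data frame `panel` and its columns (`country`, `year`, `gdp`) are made up for illustration, and the argument names follow my reading of the package documentation, so check `?slide` and `?DropNA` for your installed version.

```r
library(DataCombine)

# Hypothetical toy panel: two countries observed over three years
panel <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2001:2003, times = 2),
  gdp     = c(10, 11, 12, 20, 21, 23)
)

# Add a one-period lag of gdp, computed separately within each country.
# A negative slideBy creates a lag; a positive value would create a lead.
lagged <- slide(panel, Var = "gdp", TimeVar = "year",
                GroupVar = "country", NewVar = "gdp_lag1",
                slideBy = -1)

# The first year of each country has no lagged value, so drop those rows
complete <- DropNA(lagged, Var = "gdp_lag1")
```

Because slide returns the whole data frame with the new column attached, its result can be passed straight to the other data-frame functions in the list.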

Updates

I will continue to add to the package as I build data sets and run across other pesky tasks I do repeatedly that would be simpler if they were completed by a single function.

Installation

DataCombine is on CRAN.
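
Assuming a standard R setup with access to a CRAN mirror, the CRAN release can be installed and loaded with the usual base R commands:

```r
# Install the released version from CRAN, then attach it
install.packages("DataCombine")
library(DataCombine)
```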

You can also install the most recent stable version with install_github from the devtools package:

devtools::install_github('christophergandrud/DataCombine')
Metadata

Version

0.2.21

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows