Universal Messy Panel Data Cleaner.
TidyPanel
TidyPanel is an industrial-grade parser designed to extract clean, standardized data frames from heavily malformed, human-readable Excel reports. If you have ever struggled to parse financial statements, ERP exports, or complex tables with N-dimensional headers, decoy rows, and embedded subtotals, TidyPanel is built for you.
Installation
You can install the development version of TidyPanel from GitHub with:
# install.packages("devtools")
devtools::install_github("TonyL/TidyPanel")
Why TidyPanel?
Real-world commercial data is rarely tidy. It contains multi-line merged headers, embedded subtotal rows, empty "ghost" columns, and categorical hierarchies defined purely by visual indentation (e.g., in a Profit & Loss statement).
TidyPanel uses a multi-phase heuristic engine to:
- Bypass Decoy Rows: Skips irrelevant metadata at the top of the sheet.
- N-Dimensional Header Stitching: Intelligently identifies multiline headers and forward-fills merged cells to create flat, readable column names.
- Indentation Hierarchy Extraction: Uses leading spaces to identify parent-child categorical relationships and creates a
parent_categorycolumn. - Smart Data Amputation: Automatically removes decorative page breaks, mid-table subtotals, and ghost aggregate rows.
- Auto Pivot: Detects temporal columns (e.g., Q1, 2021, FY23) and pivots them into a tidy, long format.
- Semantic Cleaners: Automatically detects and normalizes accounting dashes, non-standard scientific notation, and financial multipliers (e.g., "1.5M" -> 1500000).
100% Transparency: The Audit Log
Data parsing should never be a "black box". TidyPanel features a unique Audit Log that explicitly records every transformation applied to your data. By setting return_audit = TRUE, you get both the cleaned data and a precise ledger of what was modified.
library(TidyPanel)
# Read data and get an audit trail
result <- read_messy_panel("data_raw/financial_report.xlsx", return_audit = TRUE)
# View the audit trail
print(result$audit)
=== Algorithm Modification Audit Log ===
Operation Count
1 Decoy Rows Bypassed 5
2 Indentation Hierarchy Extracted 3
3 Ghost Bottom Rows Dropped 4
License
MIT © Tony Lu.