Description

Constructing Joint Distributions with Control Over Statistical Properties.


Synthesizes joint distributions from marginal densities, with control over key statistical properties: correlation for continuous data, mutual information for categorical data, and the induction of Simpson's Paradox. The package generates datasets with specified correlation structures for continuous variables, adjusts the mutual information between categorical variables, and manipulates subgroup correlations to deliberately create Simpson's Paradox. References: Joe (1997) <doi:10.1201/b13150>; Sklar (1959) <https://en.wikipedia.org/wiki/Sklar%27s_theorem>.
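Sklar's theorem says that any joint distribution can be written as a copula applied to its marginals, which is what makes it possible to build a joint sample from given marginals. The base-R sketch below illustrates that idea under two assumptions: a Gaussian copula, and each sample's sorted values used as its empirical inverse CDF. It reorders two samples so that their Kendall's tau approaches a target, using the Gaussian-copula relation tau = (2/pi) * asin(rho). The helper impose_kendall_tau is hypothetical and is not part of covalchemy; the output names x1/x2 only mirror the README example.

```r
# Hypothetical helper (not a covalchemy function): impose a target Kendall's tau
# on two samples through a Gaussian copula, keeping each sample's values.
impose_kendall_tau <- function(x, y, tau) {
  stopifnot(length(x) == length(y), abs(tau) < 1)
  n <- length(x)
  # For a Gaussian copula, tau = (2 / pi) * asin(rho); invert to get rho.
  rho <- sin(pi * tau / 2)
  # Draw from a bivariate standard normal with correlation rho.
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
  # Empirical inverse CDF: the k-th smallest value of x goes where z1 has rank k.
  list(x1 = sort(x)[rank(z1, ties.method = "first")],
       x2 = sort(y)[rank(z2, ties.method = "first")])
}

set.seed(42)
x <- rexp(2000)   # skewed marginal
y <- runif(2000)  # uniform marginal
res <- impose_kendall_tau(x, y, tau = 0.5)
tau_hat <- cor(res$x1, res$x2, method = "kendall")
print(tau_hat)    # should be close to 0.5
```

Because the values are only reordered, each marginal sample is preserved exactly; with continuous data (no ties), the resulting tau equals the sample tau of the latent normal draws. A t copula would need a different tau-to-rho mapping and a t-distributed latent sample.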

README.md


covalchemy

This package provides functions for manipulating data, including techniques for:

Modifying correlations for continuous variables: The get_target_corr function adjusts Kendall's tau between two continuous variables using copula-based methods. You can specify the target correlation, the copula type (Gaussian or t), and the inverse CDF transformation method.
Controlling entropy in categorical variables: The get_target_entropy function helps you reach a desired level of entropy (mutual information) between two categorical variables by iteratively adjusting their contingency table with simulated annealing.
Simulating Simpson's Paradox with continuous variables: The get_simpsons_paradox_c function lets you explore Simpson's Paradox in continuous data. It transforms the data using Gaussian copulas and simulated annealing so that the overall trend contradicts the subgroup trends.
Simulating Simpson's Paradox-like effects in categorical data: The get_simpsons_paradox_d function modifies contingency tables for categorical data to create or highlight a Simpson's Paradox-like effect. It uses simulated annealing to adjust log-odds values while respecting specific constraints.
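The reversal that Simpson's Paradox describes can be seen in a small hand-built example. This base-R sketch is not the package's method (which uses Gaussian copulas and simulated annealing); the three-subgroup layout is an arbitrary assumption chosen so that every subgroup shows a positive correlation while the pooled data shows a negative one.

```r
# Hand-built illustration of Simpson's Paradox (base R only).
set.seed(1)
n_per_group <- 200
# Assumed layout: subgroup centres move down and to the right ...
centres_x <- c(0, 3, 6)
centres_y <- c(6, 3, 0)
group <- rep(1:3, each = n_per_group)
x <- centres_x[group] + rnorm(length(group))
# ... while within each subgroup y increases with x.
y <- centres_y[group] + 0.8 * (x - centres_x[group]) +
  rnorm(length(group), sd = 0.5)

# Correlation inside each subgroup.
within_cor <- sapply(split(data.frame(x, y), group),
                     function(d) cor(d$x, d$y))
# Correlation after pooling all subgroups.
pooled_cor <- cor(x, y)
print(round(within_cor, 2))  # all positive
print(round(pooled_cor, 2))  # negative
```

The between-group trend (centres falling as x grows) dominates the pooled correlation, which is exactly the kind of structure a tool that manipulates subgroup correlations needs to create or control.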

Installation

To install the package, you can use the devtools package:

# Install devtools package if not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("namanlab/covalchemy")

Once installed, you can load the package and use its functions in your R scripts:

library(covalchemy)

set.seed(123)  # make the random examples reproducible

# Example 1: Modifying Kendall's tau between two continuous variables
x <- rnorm(100)
y <- rnorm(100)
target_corr <- 0.5  # target Kendall's tau
res <- get_target_corr(x, y, target_corr)
modified_x <- res$x1
modified_y <- res$x2
cor(modified_x, modified_y, method = "kendall")  # compare with target_corr

# Example 2: Controlling entropy between two categorical variables
df <- data.frame(x = sample(c("A", "B", "C"), 1000, replace = TRUE),
                 y = sample(c("D", "E", "F"), 1000, replace = TRUE))
target_entropy <- 1.5
result <- get_target_entropy(df$x, df$y, target_entropy)
final_df <- result$final_df
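To check how dependent two categorical variables are before or after an adjustment like Example 2, a plug-in mutual information estimate can be computed from the contingency table. The helper mutual_information below is a hypothetical base-R sketch, not a covalchemy function, and it reports bits by default; the unit that get_target_entropy uses is not stated here, so check its documentation. Note that the mutual information between two three-level variables cannot exceed log2(3), about 1.58 bits (about 1.10 nats).

```r
# Hypothetical helper: plug-in (empirical) mutual information between
# two categorical vectors, in bits by default.
mutual_information <- function(a, b, base = 2) {
  joint <- table(a, b) / length(a)  # joint relative frequencies
  pa <- rowSums(joint)              # marginal distribution of a
  pb <- colSums(joint)              # marginal distribution of b
  expected <- outer(pa, pb)         # joint distribution under independence
  nz <- joint > 0                   # treat 0 * log(0) as 0
  sum(joint[nz] * log(joint[nz] / expected[nz], base = base))
}

set.seed(7)
a <- sample(c("A", "B", "C"), 1000, replace = TRUE)
b_indep <- sample(c("D", "E", "F"), 1000, replace = TRUE)
b_copy <- unname(c(A = "D", B = "E", C = "F")[a])  # b fully determined by a

mi_indep <- mutual_information(a, b_indep)  # near 0
mi_copy <- mutual_information(a, b_copy)    # equals the entropy of a
print(c(independent = mi_indep, determined = mi_copy))
```

The same helper could be applied to the two columns of the data frame returned by get_target_entropy to see how far the adjustment moved the dependence.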

Additional notes

This README gives a general overview of the package's functionality. Refer to the function documentation within the package (for example, ?get_target_corr) for detailed information on arguments, return values, and usage examples.

Acknowledgments

This package was developed as part of the DSA42288S Final Year Project. I would like to express my gratitude to my supervisor, Dr. Vikneswaran Gopal, for his invaluable guidance, support, and mentorship throughout this project. I am also grateful to the faculty and staff at NUS for their continuous support, and to my family for their encouragement along the way.

Contact

For any questions or inquiries, please contact me at [email protected].

