Description
Jump Q-Learning for Individualized Interval-Valued Dose Rule.
Description
We provide tools to estimate the individualized interval-valued dose rule (I2DR) that maximizes the expected beneficial clinical outcome for each individual and returns an optimal interval-valued dose, by using the jump Q-learning (JQL) method. The jump Q-learning method directly models the conditional mean of the response given the dose level and the baseline covariates via jump penalized least squares regression under the framework of Q learning. We develop a searching algorithm by dynamic programming in order to find the optimal I2DR with the time complexity O(n2) and spatial complexity O(n). To alleviate the effects of misspecification of the Q-function, a residual jump Q-learning is further proposed to estimate the optimal I2DR. The outcome of interest includes the best partition of the entire dosage of interest, the regression coefficients of each partition, and the value function under the estimated I2DR as well as the Wald-type confidence interval of value function constructed through the Bootstrap.