Title | Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes. |
Publication Type | Journal Article |
Year of Publication | 2013 |
Authors | Pang, Herbert, and Sin-Ho Jung |
Journal | Genet Epidemiol |
Volume | 37 |
Issue | 3 |
Pagination | 276-82 |
Date Published | 2013 Apr |
ISSN | 1098-2272 |
Keywords | Adenocarcinoma, Adenocarcinoma of Lung, Computer Simulation, Genome-Wide Association Study, Human Genome Project, Humans, Lung Neoplasms, Microarray Analysis, Mortality, Multiple Myeloma, Proportional Hazards Models, Research Design, Sample Size, Validation Studies as Topic |
Abstract | A variety of prediction methods are used to relate high-dimensional genome data to a clinical outcome through a prediction model. Once a prediction model is developed from a data set, it should be validated using a resampling method or an independent data set. Although the existing prediction methods have been intensively evaluated by many investigators, there has not been a comprehensive study investigating the performance of the validation methods, especially with a survival outcome. Understanding the properties of the various validation methods can allow researchers to perform more powerful validations while controlling the type I error. In addition, a sample size calculation strategy based on these validation methods is lacking. We conduct extensive simulations to examine the statistical properties of these validation strategies. In both simulations and a real data example, we have found that 10-fold cross-validation with permutation gave the best power while controlling type I error close to the nominal level. Based on this, we have also developed a sample size calculation method that will be used to design a validation study with a user-chosen combination of prediction. Microarray and genome-wide association study data are used as illustrations. The power calculation method in this presentation can be used for the design of any biomedical study involving high-dimensional data and survival outcomes. |
DOI | 10.1002/gepi.21721 |
Alternate Journal | Genet Epidemiol |
Original Publication | Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes. |
PubMed ID | 23471879 |
PubMed Central ID | PMC3763900 |
Grant List | P01 CA142538 / CA / NCI NIH HHS / United States; P01CA142538 / CA / NCI NIH HHS / United States |
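
The abstract's "10-fold cross-validation with permutation" can be illustrated with a minimal sketch. This is not the authors' implementation: the prediction model (a signed sum of the top-ranked features), the validation statistic (Harrell's concordance index), and all function and parameter names (`c_index`, `cv_risk_scores`, `permutation_pvalue`, `top`, `n_perm`) are assumptions chosen to keep the example dependency-free. The point it shows is that feature selection is repeated inside every fold, and that the whole cross-validation is rerun on each permuted data set to build the null distribution.

```python
import random

def c_index(risk, time, event):
    """Harrell's C: share of usable pairs in which the higher risk score
    belongs to the subject with the shorter observed survival time."""
    concordant, usable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable if usable else 0.5

def cv_risk_scores(X, time, event, k=10, top=5, seed=0):
    """Return one cross-validated risk score per subject.
    Assumed toy model: signed sum of the `top` features ranked by their
    training-set concordance with survival."""
    n, p = len(time), len(X[0])
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    risk = [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        t_tr = [time[i] for i in train]
        e_tr = [event[i] for i in train]
        # Feature selection uses training subjects only, so the held-out
        # fold never influences the model that scores it.
        ranked = []
        for g in range(p):
            c = c_index([X[i][g] for i in train], t_tr, e_tr)
            ranked.append((abs(c - 0.5), 1.0 if c >= 0.5 else -1.0, g))
        chosen = sorted(ranked, reverse=True)[:top]
        for i in test:
            risk[i] = sum(sign * X[i][g] for _, sign, g in chosen)
    return risk

def permutation_pvalue(X, time, event, n_perm=200, seed=0, **cv_args):
    """Permutation test of the cross-validated C-index.
    (time, event) pairs are shuffled jointly against the feature rows,
    and the full cross-validation is repeated for every permutation."""
    rng = random.Random(seed)
    observed = c_index(cv_risk_scores(X, time, event, **cv_args), time, event)
    order = list(range(len(time)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        t_perm = [time[i] for i in order]
        e_perm = [event[i] for i in order]
        stat = c_index(cv_risk_scores(X, t_perm, e_perm, **cv_args),
                       t_perm, e_perm)
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(X, time, event, n_perm=200, k=10, top=5)` with `X` as a list of per-subject feature rows returns the observed cross-validated C-index and its permutation p-value. Estimating power for a candidate sample size would then amount to repeating this test over many simulated data sets of that size and recording the rejection rate, which is again a simplification rather than the paper's published procedure.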