Innovative Methods Program for Advancing Clinical Trials (IMPACT)

Probability-enhanced sufficient dimension reduction for binary classification.

Title	Probability-enhanced sufficient dimension reduction for binary classification.
Publication Type	Journal Article
Year of Publication	2014
Authors	Shin, Seung Jun, Yichao Wu, Hao Helen Zhang, and Yufeng Liu
Journal	Biometrics
Volume	70
Issue	3
Pagination	546-55
Date Published	2014 Sep
ISSN	1541-0420
Keywords	Algorithms; Biometry; Computer Simulation; Data Interpretation, Statistical; Models, Statistical; Pattern Recognition, Automated; Regression Analysis
Abstract	In high-dimensional data analysis, it is of primary interest to reduce the data dimensionality without loss of information. Sufficient dimension reduction (SDR) arises in this context, and many successful SDR methods have been developed since the introduction of sliced inverse regression (SIR) [Li (1991) Journal of the American Statistical Association 86, 316-327]. Despite this rapid progress, most existing methods target regression problems with a continuous response. For binary classification problems, SIR is limited to estimating at most one direction, since only two slices are available. In this article, we develop a new and flexible probability-enhanced SDR method for binary classification problems by using the weighted support vector machine (WSVM). The key idea is to slice the data based on the conditional class probabilities of observations rather than their binary responses. We first show that the central subspace based on the conditional class probability is the same as that based on the binary response. This important result justifies the proposed slicing scheme from a theoretical perspective and ensures no information loss. In practice, the true conditional class probability is generally not available, and probability estimation can be challenging for data with high-dimensional inputs. We observe that implementing the new slicing scheme does not require exact probability values; the only required information is the relative order of the probability values. Motivated by this fact, our new SDR procedure bypasses the probability estimation step and employs the WSVM to directly estimate the order of the probability values, based on which the slicing is performed. The performance of the proposed probability-enhanced SDR scheme is evaluated on both simulated and real data examples.
DOI	10.1111/biom.12174
Alternate Journal	Biometrics
PubMed ID	24779683
PubMed Central ID	PMC4670268
Grant List	R01 CA-085848 / CA / NCI NIH HHS / United States; R01 CA085848 / CA / NCI NIH HHS / United States; R01 CA149569 / CA / NCI NIH HHS / United States; R01 CA-149569 / CA / NCI NIH HHS / United States; P01 CA142538 / CA / NCI NIH HHS / United States
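
The slicing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `slice_by_order` and `sir_directions` are hypothetical, and the ordering score is assumed to be given: the paper obtains the order from a WSVM, whereas the toy data below uses the true class probability as an oracle stand-in. The sketch only shows why slicing on the order of class probabilities allows more than one direction, while slicing on the binary label yields a rank-one SIR matrix.

```python
import numpy as np

def slice_by_order(score, n_slices):
    """Assign slice labels 0..n_slices-1 using only the rank order of `score`."""
    ranks = np.argsort(np.argsort(score))
    return ranks * n_slices // len(score)

def sir_directions(X, slice_labels, n_dirs):
    """Plain SIR step: eigen-decompose the weighted covariance of slice means."""
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt                      # standardized predictors
    M = np.zeros((X.shape[1], X.shape[1]))
    for h in np.unique(slice_labels):
        in_slice = slice_labels == h
        m = Z[in_slice].mean(axis=0)
        M += in_slice.mean() * np.outer(m, m)    # slice proportion * outer product
    w, v = np.linalg.eigh(M)
    order = np.argsort(w)[::-1]                  # eigenvalues in descending order
    B = inv_sqrt @ v[:, order[:n_dirs]]          # map back to the original scale
    return B / np.linalg.norm(B, axis=0), w[order]

# Toy two-index model (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 6))
logit = 3.0 * X[:, 0] / (0.5 + (X[:, 1] + 1.5) ** 2)
prob = 1.0 / (1.0 + np.exp(-logit))
y = (rng.random(2000) < prob).astype(int)

# Slicing on the binary label: only two slices are available.
B_binary, ev_binary = sir_directions(X, y, n_dirs=2)

# Slicing on the order of class probabilities (oracle score stands in for WSVM).
B_prob, ev_prob = sir_directions(X, slice_by_order(prob, 10), n_dirs=2)
```

With two slices the weighted slice means of the centered predictors sum to zero, so the second eigenvalue in `ev_binary` is zero up to rounding. With ten probability-ordered slices, `ev_prob` can carry more than one nonzero eigenvalue.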
