Title | A general framework for integrative analysis of incomplete multiomics data. |
Publication Type | Journal Article |
Year of Publication | 2020 |
Authors | Lin, Dan-Yu, Donglin Zeng, and David Couper |
Journal | Genet Epidemiol |
Volume | 44 |
Issue | 7 |
Pagination | 646-664 |
Date Published | 2020 Oct |
ISSN | 1098-2272 |
Keywords | Algorithms; Data Analysis; Genomics; Genotype; Humans; Linear Models; Models, Genetic; Phenotype; Proteomics; Sequence Analysis, DNA; Sequence Analysis, RNA |
Abstract | There is a tremendous current interest in measuring multiple types of omics features (e.g., DNA sequences, RNA expressions, methylation profiles, metabolic profiles, protein expressions) on a large number of subjects. Although genotypes are typically available for all study subjects, other data types may be measured only on a subset of subjects due to cost or other constraints. In addition, quantitative omics measurements, such as metabolite levels and protein expressions, are subject to detection limits in that the measurements below (or above) certain thresholds are not detectable. In this article, we propose a rigorous and powerful approach to handle missing values and detection limits in integrative analysis of multiomics data. We relate quantitative omics variables to genetic variants and other variables through linear regression models and relate phenotypes to quantitative omics variables and other variables through generalized linear models. We derive the joint-likelihood for the two sets of models by allowing arbitrary patterns of missing values and detection limits for quantitative omics variables. We carry out maximum-likelihood estimation through computationally fast and stable algorithms. The resulting estimators are approximately unbiased and statistically efficient. An application to a major study on chronic obstructive lung disease yielded new biological insights. |
DOI | 10.1002/gepi.22328 |
Alternate Journal | Genet Epidemiol |
Original Publication | A general framework for integrative analysis of incomplete multiomics data. |
PubMed ID | 32691502 |
PubMed Central ID | PMC7951090 |
Grant List | HHSN268200900019C / HL / NHLBI NIH HHS / United States; R01 HG009974 / HG / NHGRI NIH HHS / United States; HHSN268200900015C / HL / NHLBI NIH HHS / United States; HHSN268200900016C / HL / NHLBI NIH HHS / United States; U01 HL137880 / HL / NHLBI NIH HHS / United States; HHSN268200900018C / HL / NHLBI NIH HHS / United States; HHSN268200900013C / HL / NHLBI NIH HHS / United States; P01 CA142538 / CA / NCI NIH HHS / United States; HHSN268200900017C / HL / NHLBI NIH HHS / United States; HHSN268200900020C / HL / NHLBI NIH HHS / United States; HHSN268200900014C / HL / NHLBI NIH HHS / United States |