Improved detection of epigenomic marks with mixed-effects hidden Markov models.

Title: Improved detection of epigenomic marks with mixed-effects hidden Markov models.
Publication Type: Journal Article
Year of Publication: 2019
Authors: Baldoni, Pedro L., Naim U. Rashid, and Joseph G. Ibrahim
Journal: Biometrics
Volume: 75
Issue: 4
Pagination: 1401-1413
Date Published: 2019 Dec
ISSN: 1541-0420
Keywords: Binding Sites; Computer Simulation; DNA; DNA-Binding Proteins; Epigenomics; High-Throughput Nucleotide Sequencing; Humans; Markov Chains; Sequence Analysis, DNA
Abstract:

Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a technique to detect genomic regions containing protein-DNA interaction, such as transcription factor binding sites or regions containing histone modifications. One goal of the analysis of ChIP-seq experiments is to identify genomic loci enriched for sequencing reads pertaining to DNA bound to the factor of interest. The accurate identification of such regions aids in the understanding of epigenomic marks and gene regulatory mechanisms. Given the reduction of massively parallel sequencing costs, methods to detect consensus regions of enrichment across multiple samples are of interest. Here, we present a statistical model to detect broad consensus regions of enrichment from ChIP-seq technical or biological replicates through a class of zero-inflated mixed-effects hidden Markov models. We show that the proposed model outperforms existing methods for consensus peak calling in common epigenomic marks by accounting for the excess zeros and sample-specific biases. We apply our method to data from the Encyclopedia of DNA Elements and Roadmap Epigenomics projects and also from an extensive simulation study.

DOI: 10.1111/biom.13083
Alternate Journal: Biometrics
Original Publication: Improved detection of epigenomic marks with mixed effects hidden Markov models.
PubMed ID: 31081192
PubMed Central ID: PMC6851437
Grant List:
P30 ES010126 / ES / NIEHS NIH HHS / United States
P30 CA016086 / CA / NCI NIH HHS / United States
P50 CA058223 / CA / NCI NIH HHS / United States
R01 GM070335 / GM / NIGMS NIH HHS / United States
P01 CA142538 / CA / NCI NIH HHS / United States