Title: An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters
Publication Type: Journal Article
Year of Publication: 2015
Authors: Ramsey, SA
Journal: Bioinform Biol Insights
Issue: Suppl 4
Date Published: 2015

A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes as input a position frequency matrix for a set of representative binding sites for a transcription factor, together with two sets of noncoding, 5' regulatory sequences for the gene sets that are to be compared. An empirical prior on the frequency λ (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBSs (conditioned on the observed sequence) is sampled using Metropolis-Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on λ improves accuracy for estimating the number of TFBSs within a set of promoter sequences.

Alternate Journal: Bioinform Biol Insights
PubMed ID: 27812284
PubMed Central ID: PMC5081247
Grant List: K25 HL098807 / HL / NHLBI NIH HHS / United States