Title: A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.
Publication Type: Journal Article
Year of Publication: 2015
Authors: Yang, J, Ramsey, SA
Date Published: 2015 Nov 01
Keywords: Binding Sites, Computational Biology, DNA, Gene Expression Regulation, Humans, Models, Theoretical, Position-Specific Scoring Matrices, Protein Binding, Software, Transcription Factors

MOTIVATION: The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding-sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity and since DNA shape has a significant sequence-dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding-sites.
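The independence assumption described above can be illustrated with a minimal Python sketch of generic log-odds PWM scoring. This is not the authors' implementation or the regshape package API: the function names `build_pwm` and `score`, the pseudocount, the uniform background, and the toy site sequences are all assumptions made for illustration only.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Estimate a log-odds PWM from a few aligned, equal-length site sequences."""
    if background is None:
        # Assumption: uniform background nucleotide frequencies.
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every count at the pseudocount so unseen bases stay finite.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in BASES}
        )
    return pwm


def score(pwm, seq):
    """Score a candidate site as a sum of per-position log-odds.

    The plain sum is where the independence assumption enters: no term
    depends on more than one nucleotide position.
    """
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM width")
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical toy sites, not data from the paper.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)
consensus_score = score(pwm, "TGACTCA")
unrelated_score = score(pwm, "CCCGGGT")
```

Because the score is a sum over positions, two candidate sequences with the same per-position matches receive the same score regardless of how neighbouring nucleotides interact, which is the limitation that a shape-derived score is meant to address.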

RESULTS: We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, for accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM+shape model was more accurate than the PWM-only model for 45% of TFs tested, with no significant loss of accuracy for the remaining TFs.

AVAILABILITY AND IMPLEMENTATION: The shape-based model is available as an open-source R package that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/.

CONTACT: stephen.ramsey@oregonstate.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Alternate Journal: Bioinformatics
PubMed ID: 26130577
PubMed Central ID: PMC4838056
Grant List: HL098807 / HL / NHLBI NIH HHS / United States