University of California, San Diego

****************************

Bioinformatics Colloquium

Chiara Sabatti

University of California, Los Angeles

Genomewide Motif Recognition with a Dictionary Model

Abstract:

Bussemaker et al. (2000, PNAS) proposed the simple idea of modeling non-coding DNA sequence as a concatenation of words, and gave an algorithm to reconstruct deterministic words from an observed sequence. Starting from the same premises, we consider words that can be spelled in a variety of forms, thereby accounting for varying degrees of conservation of the same motif across genome locations. These "words" correspond to binding sites of regulatory proteins. The overall frequency of occurrence of each word in the sequence and the parameters describing the random spelling of words are estimated in a maximum-likelihood framework using an EM gradient algorithm. Once these parameters are estimated, it is possible to evaluate the probability with which each motif occurs at a given location in the sequence. These conditional probabilities can be used to predict which genes experience similar transcriptional regulation. Gene expression data can then be used to validate or refine such predictions.
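The computation outlined in the abstract can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that the abstract does not state. Each word's random spelling is modeled as a position weight matrix (independent letter probabilities at each position). The dictionary is fixed in advance and contains the four single letters plus one hypothetical "TATA"-like motif. Only the word frequencies are re-estimated, using a plain EM update rather than the EM gradient algorithm, and the spelling parameters are held fixed. A forward-backward recursion over all segmentations of the sequence gives the likelihood and the posterior probability that each word starts at each position. All function names, the motif and the toy sequence are hypothetical. Plain floating point is adequate for a sequence this short, but genome-scale sequences would need scaling or log-space arithmetic.

```python
import numpy as np

ALPHABET = {"A": 0, "C": 1, "G": 2, "T": 3}

def spelling_prob(pwm, seq, i):
    """P(seq[i:i+L] | word) for a word whose random spelling is the L x 4
    position weight matrix `pwm` (assumed model); 0 if it runs off the end."""
    L = pwm.shape[0]
    if i + L > len(seq):
        return 0.0
    p = 1.0
    for k in range(L):
        p *= pwm[k, ALPHABET[seq[i + k]]]
    return p

def forward_backward(seq, words, freqs):
    n = len(seq)
    # q[w][i]: probability that word w, starting at i, is spelled as observed
    q = [[spelling_prob(pwm, seq, i) for i in range(n)] for pwm in words]
    Z = np.zeros(n + 1)  # Z[j]: summed probability of all segmentations of seq[:j]
    Z[0] = 1.0
    for j in range(1, n + 1):
        for w, pwm in enumerate(words):
            L = pwm.shape[0]
            if j - L >= 0:
                Z[j] += Z[j - L] * freqs[w] * q[w][j - L]
    B = np.zeros(n + 1)  # B[i]: summed probability of all segmentations of seq[i:]
    B[n] = 1.0
    for i in range(n - 1, -1, -1):
        for w, pwm in enumerate(words):
            L = pwm.shape[0]
            if i + L <= n:
                B[i] += freqs[w] * q[w][i] * B[i + L]
    return q, Z, B

def posteriors(seq, words, freqs):
    """post[w, i] = P(word w occupies seq[i:i+L_w] | seq); also log-likelihood."""
    q, Z, B = forward_backward(seq, words, freqs)
    n = len(seq)
    post = np.zeros((len(words), n))
    for w, pwm in enumerate(words):
        L = pwm.shape[0]
        for i in range(n - L + 1):
            post[w, i] = Z[i] * freqs[w] * q[w][i] * B[i + L] / Z[n]
    return post, np.log(Z[n])

def em_frequencies(seq, words, freqs, n_iter=50):
    """Plain EM for word frequencies only (spelling matrices held fixed)."""
    freqs = np.asarray(freqs, dtype=float)
    for _ in range(n_iter):
        post, _ = posteriors(seq, words, freqs)
        counts = post.sum(axis=1)      # expected number of occurrences per word
        freqs = counts / counts.sum()  # M-step under sum(freqs) == 1
    return freqs

def one_hot(letter):
    """Deterministic single-letter word."""
    row = np.zeros((1, 4))
    row[0, ALPHABET[letter]] = 1.0
    return row

# Hypothetical degenerate "TATA"-like motif (rows: positions; columns: A, C, G, T).
motif = np.array([[0.05, 0.05, 0.05, 0.85],
                  [0.85, 0.05, 0.05, 0.05],
                  [0.05, 0.05, 0.05, 0.85],
                  [0.70, 0.10, 0.10, 0.10]])
words = [one_hot(c) for c in "ACGT"] + [motif]
seq = "GCTATAGGCATATTCGTATAAC"  # toy sequence, not real data

freqs = em_frequencies(seq, words, np.full(len(words), 1.0 / len(words)))
post, loglik = posteriors(seq, words, freqs)
print("estimated frequencies:", freqs.round(3))
print("most likely motif starts:", np.argsort(post[4])[::-1][:3])
```

The row `post[4]` plays the role of the per-location conditional probabilities mentioned in the abstract: because the motif may be spelled imperfectly, a weakly matching site such as `TATT` still receives some posterior weight instead of being accepted or rejected outright.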

Host: Ian Abramson

November 14, 2002

3:00 pm

AP&M 6438

****************************