university of california san diego
****************************
bioinformatics colloquium
chiara sabatti
university of california, los angeles
genomewide motif recognition with a dictionary model
abstract:
bussemaker et al. (2000, pnas) proposed the simple idea of modeling noncoding dna sequence as a concatenation of words and gave an algorithm to reconstruct deterministic words from an observed sequence. starting from the same premises, we consider words that can be spelled in a variety of forms (hence accounting for varying degrees of conservation of the same motif across genome locations). these ``words'' correspond to binding sites of regulatory proteins. the overall frequency of occurrence of each word in the sequence and the parameters describing the random spelling of words are estimated in a maximum-likelihood framework using an e-m gradient algorithm. once these parameters are estimated, it is possible to evaluate the probability with which each motif occurs at a given location in the sequence. these conditional probabilities can be used to predict which genes experience similar transcriptional regulation. gene expression data can be used to validate/refine such predictions.
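as a rough illustration of the last step in the abstract, the sketch below computes, for each word and each start position, the conditional probability that the word occupies that position. it is not the speaker's e-m gradient algorithm: it assumes the word frequencies and spelling parameters are already known and runs a plain forward-backward recursion over all ways of segmenting the sequence into words. the names `emission`, `word_posteriors`, `words`, and `freqs`, the representation of a word as a list of per-position letter distributions, and the toy numbers are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a word is a list of per-position letter
# distributions. A single background letter is a word of length 1.
Word = List[Dict[str, float]]


def emission(word: Word, segment: str) -> float:
    """Probability that `word` is spelled as `segment` (same length)."""
    p = 1.0
    for dist, letter in zip(word, segment):
        p *= dist.get(letter, 0.0)
    return p


def word_posteriors(
    seq: str, words: Dict[str, Word], freqs: Dict[str, float]
) -> Tuple[Dict[str, List[float]], float]:
    """Return P(word starts at i | seq) for every word and start i,
    together with the total likelihood of the sequence."""
    n = len(seq)

    # fwd[i]: probability of generating seq[:i] with a word boundary at i.
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        for name, word in words.items():
            k = len(word)
            if k <= i:
                fwd[i] += fwd[i - k] * freqs[name] * emission(word, seq[i - k:i])

    # bwd[i]: probability of generating seq[i:] given a word boundary at i.
    bwd = [0.0] * (n + 1)
    bwd[n] = 1.0
    for i in range(n - 1, -1, -1):
        for name, word in words.items():
            k = len(word)
            if i + k <= n:
                bwd[i] += freqs[name] * emission(word, seq[i:i + k]) * bwd[i + k]

    total = fwd[n]
    post: Dict[str, List[float]] = {}
    for name, word in words.items():
        k = len(word)
        post[name] = [
            fwd[i] * freqs[name] * emission(word, seq[i:i + k]) * bwd[i + k] / total
            for i in range(n - k + 1)
        ]
    return post, total


# Toy dictionary: four background letters plus one variably spelled motif.
words: Dict[str, Word] = {letter: [{letter: 1.0}] for letter in "acgt"}
words["motif"] = [{"t": 0.9, "c": 0.1}, {"g": 0.8, "a": 0.2}, {"a": 1.0}]
freqs = {name: 0.2 for name in words}

seq = "acgtgacgt"
post, total = word_posteriors(seq, words, freqs)
print(post["motif"])
```

the probabilities are multiplied directly here, which is fine for a short toy sequence; a genome-scale sequence would need rescaling or log-space arithmetic to avoid underflow.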
host: ian abramson
november 14, 2002
3:00 pm
ap&m 6438
****************************