Rényi entropic profiles of DNA sequences and statistical significance of motifs

Susana Vinga (a,b), Jonas S Almeida (a,c)

• a) Biomathematics Group ITQB/UNL Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa - Oeiras, Portugal
• b) INESC-ID Instituto de Engenharia de Sistemas e Computadores: Investigação Desenvolvimento - Lisboa, Portugal
• c) Dept. Biostatistics, Bioinformatics and Epidemiology - Medical Univ. South Carolina - Charleston SC 29425, USA

http://bioinformatics.musc.edu/renyi
svinga@itqb.unl.pt

1. Abstract

In a recent report [1] the authors presented a new measure of Rényi continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored therein was based on the Rényi entropy of the probability density function (pdf) estimated with Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). This work extends those concepts of continuous entropy by defining DNA sequence entropic profiles using the pdf estimates obtained.

These profiles are applied to the study of a sequence dataset consisting of artificial and real DNA, and a new fractal-kernel function, better suited to the estimation, is explored in place of the Gaussian functions previously used. This work shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of substrings. Furthermore, by spanning the parameters of the fractal-kernel function, it is possible to extract important information about the scale of each DNA region, which can have future applications in the recognition of biologically significant segments of the genome.

Keywords: Rényi entropy, DNA, Information Theory, kernel functions, CGR/USM.

2. Methods and Algorithms

1. CGR/USM representation of DNA

Chaos Game Representation/Universal Sequence Map (CGR/USM) maps discrete sequences onto continuous maps. Each iteration goes half the distance towards the corner representing the next symbol, so each point x_i corresponds to one symbol in its context. The CGR/USM mapping of an N-length DNA sequence is:

x_i = x_{i-1} + (1/2) (y_i - x_{i-1}), i = 1, ..., N

where y_i is the corner assigned to the i-th symbol.

Suffix property: strings ending in a specific suffix fall in the sub-square labeled with that suffix.

[Figure: 2D-CGR/USM representation of DNA; square with corners C and T (top), A and G (bottom)]
[Figure: fractal kernel and Gaussian kernel, x axis from 0 to 1; motif -ATC- detected]

2. Rényi continuous entropy of DNA sequences

Definition of DNA entropy based on CGR/USM and Parzen's method, with parameter σ², the variance of the Gaussian function used.

Simplification: the integral reduces to a sum, because the convolution of two Gaussians is Gaussian. The resulting CGR/USM estimation uses all pairwise squared Euclidean distances between the CGR/USM coordinates x_i.

3. Results

Example DNA test set

[Figure: Rényi continuous quadratic entropy for the sequence DNA dataset] Representation of entropies for the dataset described in the table above as a function of the logarithm of the Gaussian kernel variance used in Parzen's method. The lower the value of entropy H2, the less random (more structured) the sequence is. The graph has theoretically demonstrated asymptotes for […] given by line […] and for […], line […].

Rényi entropic profiles vs. […]

4. Conclusions and Future work

• Method provides new tools for the study of motifs and repeatability in biological sequences
• Explore theoretical properties of the entropic profiles
• Optimize algorithm to accommodate longer sequences
• Rényi entropic profiles provide local information about motifs and their statistical significance
• Continuous quadratic entropy H2 is a good measure of DNA sequence randomness

Acknowledgments

S. Vinga and J. S. Almeida gratefully acknowledge the financial support of grants SFRH/BPD/24254/2005 and POCTI/BIO/48333/2002 from Fundação para a Ciência e a Tecnologia (FCT) of the Portuguese Ministério da Ciência, Tecnologia e Ensino Superior.

References

[1] Vinga, S. and Almeida, J. S. (2004) Rényi continuous entropy of DNA sequences. J Theor Biol, 231(3):377-388.
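
Supplementary sketch. The two method steps described in Section 2 (the CGR/USM iteration, and the reduction of the entropy integral to a sum over pairwise squared distances) can be illustrated in Python as follows. This is a minimal sketch, not the authors' implementation. The corner assignment, the starting point (0.5, 0.5), the function names `cgr_coordinates` and `renyi_h2`, and the explicit formula H2 = -ln[(1/N²) Σ_i Σ_j G(x_i - x_j; 2σ²I)] with a 2D Gaussian G are assumptions based on the standard Parzen-window form of quadratic Rényi entropy. The fractal kernel mentioned in the abstract is not covered.

```python
import math

# Assumed corner assignment (A bottom-left, C top-left, G bottom-right,
# T top-right); the poster text does not state the coordinates.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 0.0), "T": (1.0, 1.0)}


def cgr_coordinates(seq, start=(0.5, 0.5)):
    """Return one 2D point per symbol of seq.

    Each step moves half the distance from the current point towards
    the corner of the next symbol. The start point is an assumption.
    """
    x, y = start
    points = []
    for symbol in seq.upper():
        cx, cy = CORNERS[symbol]  # KeyError for symbols other than A, C, G, T
        x += 0.5 * (cx - x)
        y += 0.5 * (cy - y)
        points.append((x, y))
    return points


def renyi_h2(points, sigma2):
    """Quadratic Renyi entropy of a Gaussian Parzen estimate (assumed form).

    The integral of the squared density becomes a double sum because the
    convolution of two Gaussians of variance sigma2 is a Gaussian of
    variance 2 * sigma2. In 2D that Gaussian is
    exp(-d2 / (4 * sigma2)) / (4 * pi * sigma2), with d2 the squared
    Euclidean distance between two points.
    """
    n = len(points)
    if n == 0 or sigma2 <= 0:
        raise ValueError("need at least one point and a positive variance")
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += math.exp(-d2 / (4.0 * sigma2))
    information_potential = total / (n * n * 4.0 * math.pi * sigma2)
    return -math.log(information_potential)
```

For example, `cgr_coordinates("AT")` returns `[(0.25, 0.25), (0.625, 0.625)]`, and `renyi_h2(cgr_coordinates(seq), sigma2)` can be evaluated over a range of `sigma2` values to trace entropy against the logarithm of the kernel variance. The double loop is quadratic in sequence length, so this form is only practical for short sequences.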