Gene & Transcripts group: July 16, 3PM progress update. ENCODE, BETHESDA
Analysis Subjects
• characterization of known functional domains given genome-wide transcriptional surveys (transfrags, CAGE tags and ditags)
  • pseudogenes
  • protein coding genes
  • noncoding RNAs
• transcription on unannotated regions of the genome
Protein Coding Genes
• The complexity of loci: transcript vs protein
  • how often do we find different first exons (CAGE, ditags, GENCODE)?
  • a list of confident first exons [france]
• Tissue/cell line specific transcript annotation (see Strategy slide)
  • List of exons, transcripts, loci with expression levels per cell
  • Variation of exon expression within a gene
• Association between repeats and transcription
  • correlation between density of repeats (Alus, LINEs) and expression level of transcripts
  • distance between transfrags and Alus. Can transfrags be explained by runoff transcription of Alus?
• Characteristics of genes depending on cell line / condition
  • alternative splicing
  • number of transcripts
  • number of exons per gene / quality of splice sites
Non coding RNAs
• a catalogue of the known non-coding RNAs in the ENCODE regions
Pseudogenes
• Classification of pseudogenes: duplicated vs processed
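For the repeat-association item above (distance between transfrags and Alus), a minimal sketch of the distance calculation is given here. It assumes transfrags and Alus are available as half-open (start, end) intervals on the same chromosome. The function names and the toy coordinates are hypothetical, and the brute-force search is chosen for clarity, not for genome-scale speed.

```python
def interval_gap(a, b):
    """Gap in bp between two half-open (start, end) intervals.

    Returns 0 when the intervals overlap or touch.
    """
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def nearest_alu_distances(transfrags, alus):
    """For each transfrag, distance in bp to the closest Alu.

    Assumes `alus` is non-empty and that all intervals lie on one chromosome.
    """
    return [min(interval_gap(t, a) for a in alus) for t in transfrags]


# Toy coordinates, invented for illustration only.
transfrags = [(100, 150), (400, 450), (900, 950)]
alus = [(160, 460), (1000, 1300)]

for tf, dist in zip(transfrags, nearest_alu_distances(transfrags, alus)):
    print(tf, "nearest Alu at", dist, "bp")
```

A distribution of these distances that is concentrated near zero would be one hint that some transfrags sit directly next to or inside Alus. Whether that reflects runoff transcription is a separate question that this sketch does not answer.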
Gene Locus (defined by the boundaries of the longest isoform)
Isoforms    Cell Type    Cluster Class
a) X        1            1
b) Y        2            2
c) Z        3            None
(each isoform is shown with its transfrags)
• Four characteristics of each isoform transcript
  • Mean of exons per transcript
  • Range (Max-Min exons per transcript)
  • Max (number of exons per transcript)
  • SD
SOM analysis
K-Means Cluster
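The four per-locus characteristics and the K-means step can be sketched as follows. This is an illustration under stated assumptions, not the group's actual pipeline: the input is taken to be a list of exon counts, one per isoform of a locus; SD is taken as the population standard deviation; the K-means is a plain Lloyd's iteration seeded with the first k points so that it is deterministic; features are not rescaled; and the SOM analysis is not covered. All names and the toy loci are hypothetical.

```python
from statistics import mean, pstdev


def locus_features(exon_counts):
    """Mean, range, max and SD of exons per transcript for one locus."""
    return (
        mean(exon_counts),
        max(exon_counts) - min(exon_counts),
        max(exon_counts),
        pstdev(exon_counts),  # assumption: population SD
    )


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids


# Toy loci (exon counts per isoform), invented for illustration only.
loci = {
    "geneA": [3, 3, 4],
    "geneB": [12, 20, 15, 9],
    "geneC": [4, 4],
    "geneD": [18, 10, 14],
}
features = [locus_features(counts) for counts in loci.values()]
labels, _ = kmeans(features, k=2)
for name, label in zip(loci, labels):
    print(name, "cluster", label)
```

In practice the four features would probably be standardized before clustering, since the mean and max of exon counts are on a larger scale than the SD.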
Transcription on Unannotated Regions of the Genome
1) Classification of unannotated transcribed regions into intronic/intergenic, proximal/distal, conserved mammalian/deeper
2) Relation to conserved secondary structure (overlap of transfrags with EvoFold, RNAz, …). Compensatory SNPs
3) Motifs for transfrags related to existing motifs, then see if the remaining TFRs cluster into new motifs
4) Cell specificity of unannotated TFRs
   - TFRs present in all 11 samples
   - TFRs present in only one sample
   - TFRs present in more than 1 sample
5) Number of 5’ ends based on CAGE and ditags
6) Number of sense/antisense unannotated TFRs
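The cell-specificity breakdown in item 4 can be sketched as a simple tally. The sketch assumes each unannotated TFR has already been mapped to the set of samples in which it is detected. It also reads "more than 1 sample" as "more than one but not all 11", which is an interpretation and not something the slide states. The function names, category labels and toy data are hypothetical.

```python
from collections import Counter

N_SAMPLES = 11  # number of samples given on the slide


def specificity_class(samples_present, n_samples=N_SAMPLES):
    """Assign one TFR to a cell-specificity category."""
    n = len(set(samples_present))
    if n == 0:
        return "absent"
    if n == n_samples:
        return "all samples"
    if n == 1:
        return "single sample"
    return "multiple samples"  # assumption: more than one, but not all


def summarize(tfr_to_samples, n_samples=N_SAMPLES):
    """Count TFRs per cell-specificity category."""
    return Counter(
        specificity_class(samples, n_samples)
        for samples in tfr_to_samples.values()
    )


# Toy data (sample indices 0..10), invented for illustration only.
tfrs = {
    "tfr1": set(range(11)),
    "tfr2": {3},
    "tfr3": {0, 5, 7},
    "tfr4": {9},
}
for category, count in summarize(tfrs).items():
    print(category, count)
```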