RAD is a versatile database for gene expression data, offering accurate experiment description, data preprocessing techniques, and gene network analysis. It supports multiple platforms, labs, and biological systems.
RAD (RNA Abundance Database)
Stoeckert, C.J. Jr., Pizarro, A., Manduchi, E., Gibson, M., Brunk, B., Crabtree, J., Schug, J., Shen-Orr, S., Overton, G.C. A relational schema for array and non-array based gene expression data. Bioinformatics. In press. 2001.
The Computational Biology and Informatics Laboratory
Issues
• Accurate experiment description
• Data preprocessing issues
  • clean-up
  • calibration
  • normalization
  • other transformations
[Diagram] Into RAD: multiple labs, multiple biological systems, multiple platforms, multiple image quantification software. From RAD: expressed genes, differentially-expressed genes, class discovery, class prediction, gene networks.
RAD versatility
• Platforms
  • 2-channel microarrays
  • Filter arrays
  • Affymetrix
  • SAGE
• Image quantification software
  • ScanAlyze
  • GEMTools
  • BioImage
  • …
Views
• A “view” renames attributes of a low-level generic table for specific implementations.
• Common fields are specified as the same attributes for all implementations, and implementation-specific fields rename generic attributes of the appropriate data type.
• These views are not the same as materialized views that provide precalculated values to improve database query performance.
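The renaming idea can be sketched with SQLite from Python. This is only an illustration under stated assumptions: the generic column names (`subclass_view`, `string1`, `int1`) and the sample rows are hypothetical and are not taken from the RAD schema; only the view names and the renamed attributes (`tag`, `cluster_id`, `accession`, `ext_db_id`, `spot_family_id`) appear elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic low-level table. The columns subclass_view, string1 and int1 are
-- hypothetical placeholders, not RAD's actual attribute names.
CREATE TABLE SpotFamilyImp (
    spot_family_id INTEGER PRIMARY KEY,
    subclass_view  TEXT NOT NULL,
    ext_db_id      INTEGER,
    string1        TEXT,
    int1           INTEGER
);

-- Common fields keep the same name in every view; implementation-specific
-- fields rename a generic column of the matching data type.
CREATE VIEW SAGESpotFamily AS
    SELECT spot_family_id, ext_db_id,
           string1 AS tag, int1 AS cluster_id
    FROM SpotFamilyImp
    WHERE subclass_view = 'SAGESpotFamily';

CREATE VIEW AffymetrixSpotFamily AS
    SELECT spot_family_id, ext_db_id,
           string1 AS accession
    FROM SpotFamilyImp
    WHERE subclass_view = 'AffymetrixSpotFamily';
""")

# Made-up rows, one per platform, stored in the same generic table.
conn.executemany(
    "INSERT INTO SpotFamilyImp VALUES (?, ?, ?, ?, ?)",
    [
        (1, "SAGESpotFamily", 10, "CATGAAACCC", 501),
        (2, "AffymetrixSpotFamily", 11, "X00001", None),
    ],
)

sage_rows = conn.execute(
    "SELECT spot_family_id, tag, cluster_id FROM SAGESpotFamily"
).fetchall()
affy_rows = conn.execute(
    "SELECT spot_family_id, accession FROM AffymetrixSpotFamily"
).fetchall()
```

Both views read the same stored rows at query time, so nothing is precalculated, which is the distinction from materialized views drawn above.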
Information to be captured
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14
Categories of tables: Experiment, Platform, Raw Data, Processed Data, Algorithm, Metadata
Experiment Tables (A, B)
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14
Experiment Tables (A)
[Diagram] Tables shown: Experiment, ExperimentSample, Sample, Taxon, Anatomy, Devel. Stage, Disease, Treatment, Label, Hybridization Conditions, ControlGenes, Exp.ControlGenes, Groups, ExpGroups, RelExperiments
Experiment Tables (B)
[Diagram] Experiment, with two generic tables and their views:
• ExpImageImp views: PhosphorImager, ScanAlyzeImage, GEMImage, StanfordScanner, AffymetrixScanner, SAGESequence, …
• ExpResultImp views: BioImage, ScanAlyzeAnalysis, GEMResult, StanfordAnalysis, AffymetrixAnalysis, SAGEAnalysis, …
Platform Tables
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14
Platform Tables
[Diagram] Tables shown: Array, SpotFamilyImp, SpotImp
SpotFamily views (comparisons)
SAGESpotFamily: spot_family_id, tag, ext_db_id, cluster_id, …
GEMSpotFamily: spot_family_id, ext_db_id, source_id, plate_id, plate_row, plate_column, …
AffymetrixSpotFamily: spot_family_id, ext_db_id, accession, …
• Each is a view of the SpotFamily table
• Link to data with spot_family_id
• Integrate through gene index (http://www.allgenes.org): GUS EST assemblies, mRNA
Raw Data Tables
[Diagram] Tables shown: SpotImp, SpotFamilyImp, SpotResultImp, SpotFamilyResult, ExpResultImp
Processed Data/Algorithm Tables
• SpotResultImp: raw spot value
• SpotFamilyResult: summary of raw values
• SpotResAnalysis: processed spot result
• SpotFamResAnalysis: processed spot family result
• AnalysisType: type of processing
• Algorithm: type of program used
• AlgImplementation: actual program used
• AlgoInvocation: usage of the algorithm
• AlgParamKey: parameter description
• AlgParamKeyType: parameter data type
• AlgParam: value used
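One way to read the table list above is that every processed value keeps a pointer to the algorithm invocation, and hence the program and parameter values, that produced it. The following is a minimal sketch of that idea with Python dataclasses. The class names mirror the table names, but all field names, the `scale` helper, and the example program name and values are hypothetical and not taken from RAD.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlgImplementation:
    algorithm: str   # Algorithm: type of program used
    program: str     # AlgImplementation: actual program used


@dataclass(frozen=True)
class AlgParamKey:
    name: str        # AlgParamKey: parameter description
    data_type: str   # AlgParamKeyType: parameter data type


@dataclass
class AlgoInvocation:
    implementation: AlgImplementation
    analysis_type: str                          # AnalysisType: type of processing
    params: dict = field(default_factory=dict)  # AlgParamKey -> AlgParam value used


@dataclass
class SpotResAnalysis:
    spot_result_id: int         # identifies the raw spot value that was processed
    value: float                # processed spot result
    invocation: AlgoInvocation  # provenance: which run produced this value


def scale(raw_values, factor, invocation):
    """Hypothetical processing step: multiply each raw value by a factor."""
    return [
        SpotResAnalysis(spot_id, raw * factor, invocation)
        for spot_id, raw in raw_values.items()
    ]


# Made-up example run.
impl = AlgImplementation(algorithm="global scaling", program="scale.py v0.1")
factor_key = AlgParamKey(name="factor", data_type="float")
run = AlgoInvocation(impl, "normalization", {factor_key: 2.0})
processed = scale({1: 100.0, 2: 250.0}, run.params[factor_key], run)
```

Because each `SpotResAnalysis` carries its invocation, the program and parameter values behind any processed number can be looked up later without guessing.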
What genes are expressed in the top 20% of normal B-lymphocytes and mapped to Chromosome 19?
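A question like this combines expression data with the genomic annotation of the array elements. The sketch below shows the logic in plain Python under two assumptions: "top 20%" is read as the highest-ranked 20% of genes by expression value within one sample, and chromosome assignments are available as a simple lookup. The function name, gene identifiers, and values are all made up for illustration and do not reflect RAD's actual query interface.

```python
def top_fraction_on_chromosome(expression, chromosome_of, chromosome, fraction=0.2):
    """Return genes in the top `fraction` by expression that map to `chromosome`.

    expression:    dict of gene id -> expression value for one sample
    chromosome_of: dict of gene id -> chromosome name (may be incomplete)
    """
    # Rank genes from highest to lowest expression.
    ranked = sorted(expression, key=expression.get, reverse=True)
    # Keep at least one gene so very small inputs still give a result.
    n_top = max(1, int(len(ranked) * fraction))
    top = ranked[:n_top]
    # Keep only genes with a known mapping to the requested chromosome.
    return [gene for gene in top if chromosome_of.get(gene) == chromosome]


# Made-up sample: gene01 .. gene10 with expression 100 .. 1000.
expression = {f"gene{i:02d}": float(i * 100) for i in range(1, 11)}
chromosome_of = {"gene10": "19", "gene09": "7", "gene03": "19"}

hits = top_fraction_on_chromosome(expression, chromosome_of, "19")
```

Here the top 20% of ten genes is `gene10` and `gene09`; only `gene10` maps to chromosome 19, and `gene03` is excluded because its expression falls outside the top fraction.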
The allgenes (GUS) index provides annotation of array elements in RAD
• EST clustering and assembly
• Different representations of the same RNA are identified.
• EST/mRNA annotations are combined.
• Consensus sequence is annotated (e.g., gene function).
GUS: Genomics Unified Schema
[Diagram] Components and their contents:
• Ontologies: GO, Species, Tissue, Dev. Stage (under development)
• Genomic Sequence: Genes, gene models; STSs, repeats, etc; Cross-species analysis
• Transcribed Sequence: Characterize transcripts; RH mapping; Library analysis; Cross-species analysis; DOTS
• Transcript Expression (RAD, RNA Abundance DB): Arrays; SAGE; Conditions
• Special Features: Ownership; Protection; Algorithm; Evidence; Similarity; Versioning
• Protein Sequence: Domains; Function; Structure; Cross-species analysis
• Pathways, Networks: Representation; Reconstruction
Different Views of RAD
Focused annotation of specific organisms and biological systems:
• Organisms: Human, Mouse, Plasmodium falciparum
• Biological systems: Endocrine pancreas, CNS, Hematopoiesis
*not drawn to scale*
Continuing Work and Future Issues
Analysis perspective:
• ontologies
• data preprocessing
• cross-platform comparisons
• utilize other types of high-throughput data (e.g. protein expression)
DB perspective:
• capture conclusions from analyses in a structured way
• integrate other types of high-throughput data
RAD: www.cbil.upenn.edu/RAD2
Elisabetta Manduchi; Angel Pizarro; Shannon McWeeney
Allgenes: www.allgenes.org
Brian Brunk; Ed Uberbacher, ORNL; Jonathan Crabtree; Doug Hyatt, ORNL; Sharon Diskin; Joan Mazzarelli; Jonathan Schug
EPConDB: www.cbil.upenn.edu/EPConDB
Greg Grant; Klaus Kaestner, Penn; Phillip Le; Marie Scearce, Penn; Debbie Pinney; Doug Melton, Harvard; Alan Permutt, Wash U
MGED: www.mged.org