Learn about MIAME document specifying minimum information for interpretable microarray experiments and its history. Acknowledgements, outline of talk, sample and gene annotations, gene expression database conceptual view, issues and solutions for database usability. Challenges regarding sample annotations, measurement units, and data interpretation, as well as recommendations for data recording and annotation standards. Comparison of raw, intermediate, and final data, importance of measurement units, principles of MIAME, and balancing data annotation burden.
Minimum Information About a Microarray Experiment - MIAME Alvis Brazma European Bioinformatics Institute European Molecular Biology Laboratory
What is MIAME? • A document, the goal of which is to specify the minimum information that must be reported about a microarray experiment in order to ensure its interpretability, as well as potential verification of the results • Underlying motivation – • to enable the establishment of public repositories for microarray data • to serve as a basis for designing a microarray data exchange format
Acknowledgements • MIAME working group • MAML working group • MGED steering committee • John Aach, Wilhelm Ansorge, Pascal Hingamp, Frank Holstege, Alex Lash, John Quackenbush, Alan Robinson, Paul Spellman, Criss Stoeckert, Martin Vingron
MIAME history • A need to establish a public repository or repositories for microarray gene expression data became apparent in 1998 • That requires data standards • MGED 1 meeting in Cambridge in November, 1999 establishes five working groups, including the microarray data annotation group (MIAME) • Several MIAME drafts produced by the group • MGED steering committee meeting in November 2000 in Bethesda endorses a MIAME draft • Last revision yesterday in MIAME working group meeting
Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what • Future developments and use of MIAME – how
How to think about MIAME What minimum information about a microarray gene expression experiment should be recorded in a database for the database entries to be usable on a stand-alone basis: • the users may not know any background information that is not recorded • the database should be usable for automated data analysis and mining, i.e. not only on a record-by-record basis • the data may come from different laboratories and different technology platforms
Gene expression database – a conceptual view [figure: a gene expression matrix of genes by samples holding the gene expression levels, flanked by gene annotations and sample annotations]
Three parts of a gene expression database • Gene annotation – might be given by links to gene sequence databases and GO – the state of the art is not perfect, but let's not worry about it • Sample annotation – we do not have any external databases for sample description (except species taxonomy) – problem 1 • Gene expression matrix – what are the measurement units for gene expression levels? – problem 2
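The three parts above can be pictured as one small data structure. This is only an illustrative sketch: the class name `ExpressionDatabase`, its field names, and the toy gene and sample values are assumptions made for the example and are not part of MIAME.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionDatabase:
    """Hypothetical container for the three conceptual parts."""
    genes: List[str]            # row labels of the matrix
    samples: List[str]          # column labels of the matrix
    matrix: List[List[float]]   # matrix[i][j] = level of genes[i] in samples[j]
    gene_annotation: Dict[str, Dict[str, str]] = field(default_factory=dict)
    sample_annotation: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def level(self, gene: str, sample: str) -> float:
        """Look up one expression level by gene and sample name."""
        return self.matrix[self.genes.index(gene)][self.samples.index(sample)]


# Toy values, invented for illustration only.
db = ExpressionDatabase(
    genes=["geneA", "geneB"],
    samples=["S1", "S2"],
    matrix=[[1.5, 0.7], [2.0, 2.1]],
    gene_annotation={"geneA": {"link": "hypothetical sequence database entry"}},
    sample_annotation={"S1": {"organism": "Homo sapiens"}},
)
print(db.level("geneA", "S2"))  # 0.7
```

Note that the matrix values carry no unit here, which is exactly problem 2.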
Problem/consideration 1 – sample annotation • Gene expression data have meaning only in the context of a detailed description of the sample • If the data are going to be interpreted by independent parties, the information about the sample has to be in the database • Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc) are needed for unambiguous sample description if it is to be queried
Sample annotation – what can be done • Some use of free text descriptions is unavoidable • Controlled vocabularies and ontologies should be used wherever available • Externally defined controlled vocabularies and ontologies should be used whenever they exist
Problem/consideration 2 – the lack of gene expression measurement units • What we would like to have • gene expression levels expressed in some standard units (e.g. molecules per cell) • reliability measure associated with each value (e.g. standard deviation) • What we do have • each experiment using different units • no reliability information
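Comparing expression data [figure: two measurements with known units, labelled "cm" and "inc"]
Comparing expression data [figure: two measurements with unknown units, each labelled "?"]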
From microarray images to gene expression data [figure: raw data (array scans, images) → intermediate data (spots, spot/image quantitations) → final data (samples, genes, gene expression levels)]
What to do in the absence of standard measurement units? • Record raw, intermediate and final analysis data together with a detailed annotation of how the analysis has been performed • This effectively passes the responsibility for interpreting the final analysis data on to the user
Measurement units • In perspective: • standard controls for experiments (on chips and in the samples) should be introduced • replicate measurements will become a norm • Temporary solution: • storing intermediate analysis results (including the images) and annotations of how they were obtained - i.e., the evidence
Problem/consideration - 3 • We need to find a compromise between the burden on the data producers to annotate and provide the data and the need for the data to be sufficiently annotated for the database users • Too much detail may turn away potential data providers and complicate data submission and storage • Too little detail may limit the usability of the data • The current draft is a compromise between these two
Some more general principles • MIAME is aimed at a cooperative data provider, not as a legal document designed to close all loop-holes • MIAME is an informal specification • The concept of ‘qualifier, value, source’ triplets, e.g., • qualifier – cell type • value – epithelial • source – Human Anatomy (author, edition) • The concept of ‘experimental protocol’
General principles - continued • MIAME is not designed as a ‘questionnaire’ that can be filled in, but only as an informal specification on which such a questionnaire (in fact, an annotation tool) can be based • Although MIAME is conceptually independent of databases, the aim of establishing a microarray database should be kept in mind when reading MIAME
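The ‘qualifier, value, source’ triplet described above maps naturally onto a small record type. The sketch below is an assumption about how an annotation tool might hold such triplets; the names `Annotation` and `values_for` are hypothetical, and the only data used is the cell type example from the slide.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """One 'qualifier, value, source' triplet."""
    qualifier: str
    value: str
    source: str


def values_for(annotations: List[Annotation], qualifier: str) -> List[str]:
    """Return every value recorded under the given qualifier."""
    return [a.value for a in annotations if a.qualifier == qualifier]


# The example triplet from the slide.
cell_type = Annotation(
    qualifier="cell type",
    value="epithelial",
    source="Human Anatomy (author, edition)",
)

print(values_for([cell_type], "cell type"))  # ['epithelial']
```

Keeping the source next to each value is what lets a reader tell which vocabulary a term such as "epithelial" was taken from.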
Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what
Annotation of an experiment - a major challenge [figure: a microarray experiment in ArrayExpress, made up of Experiment, Sample, Array, Hybridisation, Normalisation and Analysis, with external links to Source (e.g., Taxonomy), Gene (e.g., EMBL) and Publication (e.g., PubMedCentral)]
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications www.mged.org
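As a rough illustration of how a tool might use the six parts as a checklist, the sketch below reports which parts a submission leaves empty. The dictionary keys, the function name `missing_parts`, and the toy submission are all assumptions for this example; MIAME itself is an informal specification and does not define such a structure.

```python
from typing import Dict, List

# The six MIAME parts, in the order listed above (key names are hypothetical).
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_parts(submission: Dict[str, object]) -> List[str]:
    """Return the MIAME parts that are absent or empty in a submission."""
    return [part for part in MIAME_PARTS if not submission.get(part)]


# Toy submission with only two parts filled in.
submission = {
    "experimental_design": {"type": "time course"},
    "samples": {"organism": "Homo sapiens"},
}
print(missing_parts(submission))
# ['array_design', 'hybridizations', 'measurements', 'controls']
```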
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole
Part 1 - Experimental design: the set of the hybridisation experiments as a whole • Normally ‘an experiment’ should consist of one or more hybridisations that are in some way related and performed within a limited period of time, e.g. all related to the same publication • Author, contact information, citations • Type of experiment (e.g., time course, normal vs diseased comparison) • Experimental factors – i.e. tested parameters in the experiment (e.g. time, dose, genetic variation, response to a compound) • List of organisms used in the experiment • List of platforms used
Experimental design - continued • List of samples, arrays and hybridisations and their relationships, e.g.: • Samples: S1, S2, S3 • Arrays: A1, A2, A3 • Hybridisations: • H1 is S1 and S2 on A1 • H2 is S2 and S3 on A2 • H3 is S1 and S2 on A3 • Which hybridisations are replicates, • e.g. H1 and H3 are replicates
Experimental design – continued 2 • Quality related indicators • Optional user defined ‘qualifier, value, source’ triplet – e.g.: • qualifier – survival data • value – given • source – user defined • Description of the experiment or link to a publication
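The S1 to S3, A1 to A3, H1 to H3 example from the experimental design slides can be written down as a small table of relationships. In MIAME the submitter states which hybridisations are replicates; the grouping rule below (same set of samples) is only an assumption chosen because it reproduces the slide's example, and the function name `replicate_groups` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Relationships exactly as listed in the example above.
hybridisations = {
    "H1": {"samples": ("S1", "S2"), "array": "A1"},
    "H2": {"samples": ("S2", "S3"), "array": "A2"},
    "H3": {"samples": ("S1", "S2"), "array": "A3"},
}


def replicate_groups(hybs: Dict[str, dict]) -> List[List[str]]:
    """Group hybridisations that use the same set of samples (assumed rule)."""
    groups = defaultdict(list)
    for name, hyb in hybs.items():
        groups[frozenset(hyb["samples"])].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(replicate_groups(hybridisations))  # [['H1', 'H3']]
```

A real annotation tool would record the replicate relationship explicitly rather than infer it, since two hybridisations of the same samples need not be intended as replicates.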
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array
Part 2 - Array design: each array used and each element (spot) on the array • This part is separate for each type of array used in the experiment • For the database, the array description should be normally submitted only once • For each physical array used in the experiment a unique ID and the array type are given
Array design – continued • Array design related information (e.g. platform type – in situ synthesized or spotted, array provider, surface type – glass, membrane, other, etc) • Properties of each type of element on the array, grouped by the protocol that generated them (e.g. synthesized oligos, PCR products, plasmids, colonies, others) – may be simple or composite (Affymetrix) • Each element (spot) on the array
Array design – continued • Each element (spot) on the array • Elements may be simple or composite • Each element must be identified by either the sequence, clone ID, PCR primer pair, or in any other unambiguous way • Composite elements may be identified by a reference sequence • May be linked to genes (preferably) • Will normally be provided in a separate file (e.g. spreadsheet)
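The requirement that each element be identified unambiguously lends itself to a simple check on the per-element file. The sketch below assumes a hypothetical record layout (the field names `element_id`, `sequence`, `clone_id`, `pcr_primer_pair` and `reference_sequence` are invented for this example) and flags elements that carry none of the identifiers named above.

```python
from typing import Dict, List

# Hypothetical field names for the identifiers mentioned in the slide.
IDENTIFIER_FIELDS = ("sequence", "clone_id", "pcr_primer_pair", "reference_sequence")


def unidentified(elements: List[Dict[str, str]]) -> List[str]:
    """Return the IDs of elements that have no identifying field filled in."""
    return [
        e["element_id"]
        for e in elements
        if not any(e.get(f) for f in IDENTIFIER_FIELDS)
    ]


# Toy rows, as they might appear in a spreadsheet export.
elements = [
    {"element_id": "spot-1", "clone_id": "hypothetical-clone-001"},
    {"element_id": "spot-2", "sequence": "ACGT"},
    {"element_id": "spot-3"},  # no identifier at all
]
print(unidentified(elements))  # ['spot-3']
```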
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling
Part 3 - Samples: samples used, the extract preparation and labeling • Sample source and treatment • Organism (NCBI taxonomy) • Additional ‘qualifier, value, source’ list • cell source and type • developmental stage • organism part (tissue) • animal/plant strain or line • genetic variation • disease state or normal • … Typically only some of these qualifiers are relevant – an ontology tree is needed to implement the annotation tool for sample source and treatment
Sample - continued • Hybridisation extract preparation • Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method • Labelling • Laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, 33P, etc)
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications
Part 4 - Hybridizations: procedures and parameters • Laboratory protocol including • The solution (e.g. concentration of solutes) • Blocking agent • Wash procedure • Quantity of labelled target used • Time, concentration, volume, temperature • Description of the hybridisation instruments • Optional additional ‘qualifier, value, source’ list
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications
Raw, intermediate and final data [figure: raw data (array scans, images) → intermediate data (spots, spot/image quantitations) → final data (samples, genes, gene expression levels)]
Part 5 - Measurements: images, quantitation, specifications • Hybridisation scan raw data – image • Intermediate data – image analysis and quantitation • Final data – summarised information from possible replicates
From microarray images to gene expression data [figure: raw data (array scans)]
Measurements continued • Image data • The scanner image file (e.g. TIFF, DAT) • Scanning information • Scan parameters, including laser power, spatial resolution, pixel space, PMT voltage • Laboratory protocol for scanning, including scanning hardware and software used
From microarray images to gene expression data [figure: raw data (array scans, images) → intermediate data (spots, spot/image quantitations)]
Measurements continued • Image analysis and quantitation • Complete image analysis output (of the particular image analysis software) for each element – normally given as separate file (e.g. spreadsheet) • Image analysis information • Image analysis software specification • All parameters
From microarray images to gene expression data [figure: raw data (array scans, images) → intermediate data (spots, spot/image quantitations) → final data (samples, genes, gene expression levels)]
Measurements continued • Summarised information from possible replicates • Derived measurement values summarising related elements as used by the author • Reliability information for these values, as used by the author (may be ‘unknown’) (these will be typically given in a spreadsheet) • Specifications of these two (e.g., median value of the replicates, standard deviation)
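The slide's own example of a specification (median value of the replicates, standard deviation) can be sketched with Python's standard `statistics` module. The function name `summarise` and the input values are assumptions for illustration; returning `None` for a single measurement stands in for the ‘unknown’ reliability the slide allows.

```python
import statistics
from typing import List, Optional, Tuple


def summarise(replicates: List[float]) -> Tuple[float, Optional[float]]:
    """Summarise replicate measurements of one element.

    Returns (median, sample standard deviation). With fewer than two
    replicates the reliability is reported as None, i.e. 'unknown'.
    """
    value = statistics.median(replicates)
    reliability = statistics.stdev(replicates) if len(replicates) > 1 else None
    return value, reliability


print(summarise([2.0, 4.0, 6.0]))  # (4.0, 2.0)
print(summarise([3.0]))            # (3.0, None)
```

Whatever summary is chosen, MIAME asks that the choice itself be recorded alongside the values, so that users know how the final data were derived.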
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications
Part 6 - Controls: types, values, specifications • Normalisation strategy (spiking, housekeeping genes, total array, other) • Normalisation algorithm • Control array elements • Hybridisation extract preparation
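Of the normalisation strategies listed, "total array" is the easiest to illustrate. The sketch below shows one common reading of that term, in which each array is rescaled so that its summed intensity matches a shared target; this interpretation, the function name `total_array_normalise`, and the toy intensities are assumptions, not something the slides specify.

```python
from typing import Dict, List, Optional


def total_array_normalise(
    intensities: Dict[str, List[float]],
    target_total: Optional[float] = None,
) -> Dict[str, List[float]]:
    """Rescale every array so its total intensity equals target_total.

    If no target is given, the mean of the array totals is used (assumed default).
    """
    totals = {array: sum(values) for array, values in intensities.items()}
    if target_total is None:
        target_total = sum(totals.values()) / len(totals)
    return {
        array: [x * target_total / totals[array] for x in values]
        for array, values in intensities.items()
    }


# Toy data: A2 is uniformly twice as bright as A1.
raw = {"A1": [1.0, 3.0], "A2": [2.0, 6.0]}
print(total_array_normalise(raw))
# {'A1': [1.5, 4.5], 'A2': [1.5, 4.5]}
```

Recording the strategy and the algorithm, as Part 6 requires, is what allows a user to undo or redo a step like this from the intermediate data.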
Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what • Future developments and use of MIAME – how
How to use MIAME • A data exchange format (MAML) for communicating MIAME information • Establishing MIAME compliant databases (e.g. ArrayExpress) • Developing annotation tools for generating MIAME compliant information • Journals and public funding agencies may establish MIAME related policies