Suite of Python MadMapper scripts for quality control of genetic markers, group analysis and inference of linear order of markers on linkage groups Visualization and validation of genetic maps using two-dimensional CheckMatrix heat-plots
Alexander Kozik and Richard Michelmore, UC Davis Genome Center
http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
Mapping using Recombinant Inbred Lines

Genetic Cross
Genotyping
Raw Marker Scores
Mapping: Inference of Linear Order of Markers

Locus file:

;    1                 10                  20        25
;    |                 |                   |         |
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
GM12 B B B A A A A A A A B B B B B B B A A A A A A A A
MadMapper and CheckMatrix are Python scripts and can be used on any computer platform: UNIX, Windows, Mac OS X.
Grouping can be done on a set of ~2,000 markers; map construction works in a reasonable timeframe with up to ~500 markers.
http://cgpdb.ucdavis.edu/XLinkage/MadMapper/

MadMapper_RECBIT: quality control of genetic markers and group analysis
MadMapper_XDELTA: inference of linear order of markers on linkage groups
CheckMatrix (py_matrix_2D_V248_RECBIT.py): visualization and validation of genetic maps using two-dimensional heat-plots and graphical genotyping
MadMapper_RECBIT input and output files
Python_MadMapper_V248_RECBIT_012.py
one input file (locus file with raw marker scores), 82 output files

INPUT: Locus file
  the 12-marker example locus file shown earlier (GM01 to GM12, 25 scores per marker)

OUTPUT:

Recombination Distance Scores: [ *.pairs_all ]
  ...................
  GM01 GM07 0.36
  GM01 GM08 0.40
  GM01 GM09 0.48
  GM01 GM10 0.52
  GM01 GM11 0.60
  GM01 GM12 0.68
  GM02 GM01 0.04
  GM02 GM02 0.00
  GM02 GM03 0.08
  GM02 GM04 0.16
  GM02 GM05 0.20
  GM02 GM06 0.24
  ...................

Group Info: [ *.group_info ]
  one file per iteration; 16 iterations with different cutoff values

Adjacency List: [ *.adj_list ]
  one file per iteration; 16 iterations with different cutoff values

Trio Analysis: [ *.z_trio_good ] [ *.z_trio_best ] [ *.z_trios_bad ]
  analysis of all trios (triplets) for non-redundant set of markers

LOG file: ( *.x_log_file )
  information about run parameters

Marker Scores Info: [ *.x_scores_stat ]
  detailed information about scores and linkage

Marker summary: [ *.z_marker_sum ]
  a ‘quality class’ is assigned to each marker; useful for selection of ‘core’ markers

Group Info Summary: [ *.x_tree_clust ]
  summary of clustering results for all 16 iterations; distinct linkage groups can be inferred by analysis of this clustering / grouping information

Non-Redundant Marker Scores: [ *.z_nr_scores.loc ]
  locus file with non-redundant raw marker scores
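The sample *.pairs_all values are consistent with a simple rule: the REC score of two markers is the fraction of lines whose scores differ. The Python sketch below illustrates that rule on four markers from the example locus file. It assumes A/B scores only and no missing data; the names row, parse_locus and rec_distance are illustrative and are not taken from the MadMapper scripts.

```python
def row(name, pattern):
    """Build one locus-file line: marker name followed by space-separated scores."""
    return name + " " + " ".join(pattern)


# Four markers from the example locus file, written as run lengths (25 scores each).
LOCUS_TEXT = "\n".join([
    "; example locus file (subset)",
    row("GM01", "A" * 16 + "B" * 9),
    row("GM02", "A" * 15 + "B" * 10),
    row("GM06", "A" * 9 + "B" * 16),
    row("GM07", "A" * 9 + "B" * 14 + "A" * 2),
])


def parse_locus(text):
    """Parse locus text: lines starting with ';' are comments, others are 'marker scores...'."""
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        name, *calls = line.split()
        scores[name] = calls
    return scores


def rec_distance(a, b):
    """Fraction of positions at which two score lists differ (A/B scores only)."""
    if len(a) != len(b):
        raise ValueError("markers must be scored on the same set of lines")
    return sum(x != y for x, y in zip(a, b)) / len(a)


markers = parse_locus(LOCUS_TEXT)
for first, second in [("GM01", "GM07"), ("GM02", "GM01"), ("GM02", "GM06")]:
    print(first, second, f"{rec_distance(markers[first], markers[second]):.2f}")
```

The three printed pairs reproduce the corresponding sample entries (0.36, 0.04 and 0.24). Heterozygous and missing scores need the full scoring table and are not handled here.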
MadMapper BIT scoring system

GENOTYPES:
  A - 1st
  B - 2nd
  C - NOT A ( H or B )
  D - NOT B ( H or A )
  H - A and B

SCORING SYSTEM: each cell gives the BIT score (top) and the REC score (bottom)

          |   A   |   B   |   C   |   D   |   H   |   -   |
----------+-------+-------+-------+-------+-------+-------+
 A   BIT  |   6   |  -6   |  -4   |   4   |  -2   |   0   |
     REC  |   0   |   1   |   1   |   0   |  0.5  |   0   |
----------+-------+-------+-------+-------+-------+-------+
 B   BIT  |  -6   |   6   |   4   |  -4   |  -2   |   0   |
     REC  |   1   |   0   |   0   |   1   |  0.5  |   0   |
----------+-------+-------+-------+-------+-------+-------+
 C   BIT  |  -4   |   4   |   4   |  -4   |   0   |   0   |
     REC  |   1   |   0   |   0   |   1   |   0   |   0   |
----------+-------+-------+-------+-------+-------+-------+
 D   BIT  |   4   |  -4   |  -4   |   4   |   0   |   0   |
     REC  |   0   |   1   |   1   |   0   |   0   |   0   |
----------+-------+-------+-------+-------+-------+-------+
 H   BIT  |  -2   |  -2   |   0   |   0   |   2   |   0   |
     REC  |  0.5  |  0.5  |   0   |   0   |   0   |   0   |
----------+-------+-------+-------+-------+-------+-------+
 -   BIT  |   0   |   0   |   0   |   0   |   0   |   0   |
     REC  |   0   |   0   |   0   |   0   |   0   |   0   |
----------+-------+-------+-------+-------+-------+-------+

EXAMPLES OF SCORING:

POSITIVE LINKAGE:

  AAAAAAAAAAAAAAAAAAAA    BIT SCORE = 6*20 = 120
  AAAAAAAAAAAAAAAAAAAA    REC SCORE = 0 (0.0)

                    ..
  AAAAAAAAAAAAAAAAAAAA    BIT SCORE = 6*18 - 6*2 = 96
  AAAAAAAAAAAAAAAAAABB    REC SCORE = 2 (2/20 = 0.1)

  AAAAAAAAAABBBBBBBBBB    BIT SCORE = 6*10 + 6*10 = 120
  AAAAAAAAAABBBBBBBBBB    REC SCORE = 0 (0.0)

           ..
  AAAAAAAAABABBBBBBBBB    BIT SCORE = 6*18 - 6*2 = 96
  AAAAAAAAAABBBBBBBBBB    REC SCORE = 2 (2/20 = 0.1)

NO LINKAGE:

            ..........
  AAAAAAAAAAAAAAAAAAAA    BIT SCORE = 6*10 - 6*10 = 0
  AAAAAAAAAABBBBBBBBBB    REC SCORE = 10 (10/20 = 0.5)

   . . . . . . . . . .
  BBBAABBAAAAAAABAABBB    BIT SCORE = 6*10 - 6*10 = 0
  BABBAABBABABABBBAABA    REC SCORE = 10 (10/20 = 0.5)

NEGATIVE LINKAGE:

    ..................
  AAAAAAAAAAAAAAAAAAAA    BIT SCORE = 6*2 - 6*18 = -96
  AABBBBBBBBBBBBBBBBBB    REC SCORE = 18 (18/20 = 0.9)

    ..................
  ABABABABABABABABABAB    BIT SCORE = 6*2 - 6*18 = -96
  ABBABABABABABABABABA    REC SCORE = 18 (18/20 = 0.9)
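The scoring table can be applied position by position to two score strings. The Python sketch below transcribes the table into a lookup and checks it against the scoring examples. The names PAIR_SCORES, pair_score and bit_rec are illustrative, not MadMapper's own. Dividing the REC sum by the full string length follows the examples; how missing scores affect that denominator in MadMapper itself is not stated in the deck.

```python
# (BIT, REC) for each genotype pair, transcribed from the scoring table.
# The table is symmetric, so each unordered pair is stored once.
PAIR_SCORES = {
    ("A", "A"): (6, 0), ("A", "B"): (-6, 1), ("A", "C"): (-4, 1),
    ("A", "D"): (4, 0), ("A", "H"): (-2, 0.5),
    ("B", "B"): (6, 0), ("B", "C"): (4, 0), ("B", "D"): (-4, 1),
    ("B", "H"): (-2, 0.5),
    ("C", "C"): (4, 0), ("C", "D"): (-4, 1), ("C", "H"): (0, 0),
    ("D", "D"): (4, 0), ("D", "H"): (0, 0),
    ("H", "H"): (2, 0),
}


def pair_score(x, y):
    """Return (BIT, REC) for one pair of genotype codes; '-' is a missing score."""
    if x == "-" or y == "-":
        return (0, 0)
    if (x, y) in PAIR_SCORES:
        return PAIR_SCORES[(x, y)]
    return PAIR_SCORES[(y, x)]


def bit_rec(a, b):
    """Return (BIT score, REC score, REC fraction) for two equal-length score strings."""
    if len(a) != len(b):
        raise ValueError("score strings must have the same length")
    bit = 0
    rec = 0
    for x, y in zip(a, b):
        pair_bit, pair_rec = pair_score(x, y)
        bit += pair_bit
        rec += pair_rec
    return bit, rec, rec / len(a)


print(bit_rec("A" * 20, "A" * 18 + "BB"))      # positive linkage example
print(bit_rec("A" * 20, "A" * 10 + "B" * 10))  # no linkage example
print(bit_rec("A" * 20, "AA" + "B" * 18))      # negative linkage example
```

The BIT score is signed, so it separates positive from negative linkage; the REC fraction sits near 0.5 when two markers are unlinked.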
Arabidopsis Genetic Map: Comparison of Different Scoring Systems
[Figure panels: JoinMap LOD scores; JoinMap REC scores; MadMapper BIT scores; MadMapper REC scores]
MadMapper_RECBIT Clustering: Group Info Summary [ *.x_tree_clust file ]
MadMapper_RECBIT BIN Analysis

M_1  A A AB B BA A A AB B B BA A - A AB B B BAB BA ABA A AB B B B
M_2  A A AB B BA - A AB B B BA A A A AB B B BAB BA ABA A AB B B B
M_3  A A AB B BA A A AB B - BA A A A AB B B BAB BA ABA A AB B B B
M_4  A A AB B BA A A AB B A BA A A A AB B - BAB BA ABA A AB B B B

[Figure: graph with nodes M_1, M_2, M_3, M_4. Labels: Linked Group; Diluted Node; Saturated Node]
Example of Complete Graph: all nodes are ‘saturated’
MadMapper_RECBIT Trio (Triplet) Analysis

M_1  A A AB B BA A A AB B B BA A - A AB B B BAB BA ABA A AB B B B    flanking marker 1
     X X X X
M_M  A A AB A BA - A AB B B BA B A A AB B A BAB BA ABA A AB B A B    ‘middle’ marker
     X X X X
M_2  A A AB B BA A A AB B - BA A A A AB B B BAB BA ABA A AB B B B    flanking marker 2

Good Trios: number of double crossovers should be low for good trios
Bad Trios: number of double crossovers is high for bad trios
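One way to count double crossovers in a trio is to look for lines where the two flanking markers agree but the ‘middle’ marker differs from both. The Python sketch below does only that counting, on hypothetical score strings. The function name double_crossovers is illustrative, and the thresholds MadMapper uses to sort trios into good, best and bad are not given in the deck and are not modelled here.

```python
def double_crossovers(flank1, middle, flank2, missing="-"):
    """Count lines where both flanking markers agree and the middle marker differs.

    Lines with a missing score in any of the three markers are skipped.
    """
    count = 0
    for f1, m, f2 in zip(flank1, middle, flank2):
        if missing in (f1, m, f2):
            continue
        if f1 == f2 and m != f1:
            count += 1
    return count


# Hypothetical scores for ten recombinant lines (not taken from the slide).
flank_1 = "AAABBBAABB"
flank_2 = "AAABBBAABB"
good_middle = "AAABBBAABB"  # agrees with both flanking markers everywhere
bad_middle = "AABBBAAABB"   # differs from both flanking markers on two lines

print(double_crossovers(flank_1, good_middle, flank_2))
print(double_crossovers(flank_1, bad_middle, flank_2))
```

A single crossover between the flanking markers is not counted, because on those lines the flanking scores differ from each other.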
CheckMatrix Usage: three input files are required
CheckMatrix (py_matrix_2D_V248_RECBIT.py)

Locus file:
  the 12-marker example locus file shown earlier (GM01 to GM12, 25 scores per marker)

Map file:
  LG GM01 0
  LG GM02 1
  LG GM03 2
  LG GM04 3
  LG GM05 4
  LG GM06 5
  LG GM07 6
  LG GM08 7
  LG GM09 8
  LG GM10 9
  LG GM11 10
  LG GM12 11

Matrix file:
  ...................
  GM01 GM07 0.36
  GM01 GM08 0.40
  GM01 GM09 0.48
  GM01 GM10 0.52
  GM01 GM11 0.60
  GM01 GM12 0.68
  GM02 GM01 0.04
  GM02 GM02 0.00
  GM02 GM03 0.08
  GM02 GM04 0.16
  GM02 GM05 0.20
  GM02 GM06 0.24
  ...................

Upon program execution three output files will be generated:
HEAT PLOT: helps validate the quality of the constructed genetic map and identify markers in the wrong position
GRAPHICAL GENOTYPING: visualization of haplotypes per recombinant line (suspicious double crossovers are highlighted)
CIRCULAR GRAPH: helps validate the genetic map and identify markers with spurious linkage
Genetic Map Visualization using CheckMatrix [ good map ]
Genetic Map Visualization using CheckMatrix [ wrong map ]
Genetic Map Visualization using CheckMatrix [ disordered markers ]
CheckMatrix 2D plot: Minimum Entropy Approach to Infer Linear Order Using MadMapper_XDELTA program

MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the best map: the one with the minimal total sum of differences between adjacent cells (the map with the lowest ‘entropy’).

[Figure panels: random order (high ‘entropy’); partially wrong order; right order (low ‘entropy’)]
Visualization of numerical data using CheckMatrix
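The ‘sum of differences between adjacent cells’ can be illustrated on a three-marker matrix. In the Python sketch below, ‘adjacent’ is read as neighbouring cells within a row of the ordered matrix. That reading is an assumption, but it reproduces the values printed in the Best-Fit Extension debug output elsewhere in this deck. Of the three distances used, GM02-GM06 = 0.24 appears in the sample *.pairs_all listing; the other two were derived here as mismatch fractions from the example locus file. The names DIST, dist and matrix_delta are illustrative.

```python
# Pairwise REC distances for three markers.
# 0.24 is from the sample *.pairs_all listing; 0.32 and 0.48 are assumptions
# derived as mismatch fractions from the example locus file.
DIST = {
    frozenset(("GM02", "GM06")): 0.24,
    frozenset(("GM06", "GM10")): 0.32,
    frozenset(("GM02", "GM10")): 0.48,
}


def dist(a, b):
    """Distance between two markers; a marker is at distance 0 from itself."""
    return 0.0 if a == b else DIST[frozenset((a, b))]


def matrix_delta(order):
    """Sum of absolute differences between row-wise adjacent cells of the ordered matrix."""
    total = 0.0
    for row in order:
        for left, right in zip(order, order[1:]):
            total += abs(dist(row, left) - dist(row, right))
    return total


for order in (["GM02", "GM06", "GM10"],
              ["GM02", "GM10", "GM06"],
              ["GM06", "GM02", "GM10"]):
    delta = matrix_delta(order)
    # Second number: delta divided by the number of markers.
    print(" ".join(order), f"{delta:.2f}", f"{delta / len(order):.4f}")
```

The three orders give 1.52, 1.92 and 1.68, so GM02 GM06 GM10 has the lowest ‘entropy’. A reversed order gives the same value, which is why only three of the six permutations need to be listed.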
MINIMUM ENTROPY APPROACH TO INFER LINEAR ORDER OF MARKERS: CheckMatrix Color Scheme
[Figure: two-dimensional matrix of pairwise recombination scores. Labels: adjacent cells (values); Numerical data generated by MadMapper; Visualization of numerical data using CheckMatrix]
MadMapper_XDELTA Usage:
• MadMapper_XDELTA takes three files as input:
  • Matrix (pairwise distances between markers)
  • List of ‘frame’ markers
  • List of markers to map
• First step: find the best map for the ‘frame’ markers by checking all possible combinations (up to 10 markers); optionally, use an unlimited list of ‘frame’ markers with fixed order
• Best-Fit extension: take one marker from the list of markers to map and insert it into the 2-dimensional matrix of the current best map. Check all possible positions. Calculate ‘delta’ and find the map with the lowest ‘delta’ value (lowest ‘entropy’). Move to the next marker to map until all markers are mapped.
• Optional shuffling (ripple) after several steps
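The frame search and the Best-Fit extension can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MadMapper_XDELTA code. Distances are taken as mismatch fractions computed from six markers of the example locus file, ‘delta’ is taken as the sum of absolute differences between row-wise adjacent matrix cells, and the optional ripple step is omitted. All function names are hypothetical.

```python
from itertools import permutations

# Six markers from the example locus file, written as run lengths (25 scores each).
SCORES = {
    "GM02": "A" * 15 + "B" * 10,
    "GM03": "A" * 13 + "B" * 12,
    "GM06": "A" * 9 + "B" * 16,
    "GM08": "A" * 9 + "B" * 13 + "A" * 3,
    "GM09": "A" * 9 + "B" * 11 + "A" * 5,
    "GM10": "B" + "A" * 9 + "B" * 9 + "A" * 6,
}
N_LINES = 25


def mismatches(a, b):
    """Number of lines on which two markers have different scores."""
    return sum(p != q for p, q in zip(SCORES[a], SCORES[b]))


def delta(order):
    """Assumed 'delta': sum of |d(row, left) - d(row, right)| over adjacent columns.

    Summed as integer mismatch counts, then scaled to a fraction of lines.
    """
    total = sum(abs(mismatches(row, left) - mismatches(row, right))
                for row in order
                for left, right in zip(order, order[1:]))
    return total / N_LINES


def best_frame(frame):
    """First step: exhaustive search over all orders of the frame markers."""
    return list(min(permutations(frame), key=delta))


def best_fit_insert(order, marker):
    """Best-Fit extension: try every position and keep the lowest-delta map."""
    candidates = [order[:i] + [marker] + order[i:] for i in range(len(order) + 1)]
    return min(candidates, key=delta)


order = best_frame(["GM02", "GM06", "GM10"])
print("frame:", " ".join(order), f"delta={delta(order):.2f}")
for marker in ["GM03", "GM08", "GM09"]:
    order = best_fit_insert(order, marker)
    print(marker, "inserted:", " ".join(order), f"delta={delta(order):.2f}")
```

With these inputs the sketch ends at GM02 GM03 GM06 GM08 GM09 GM10 with delta 3.12, the same final map as in the Best-Fit Extension example. Exhaustive search grows factorially with the number of frame markers, which is presumably why the frame is kept short.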
Example of Best-Fit Extension:

=============================================
MATRIX (ALL PAIRS) : madmapper_test_small.out.pairs_all
MARKERS TO MAP : madmapper_test_small.list
FRAME MARKERS LIST : madmapper_test_small.frame
OUTPUT MAP FILE : madmapper_test_small.xdelta
MAX FRAME LENGTH : 12
FIXED FRAME ORDER : FALSE
LINKAGE GROUP ID : LG DUMMY DEBUG : TRUE
=============================================

map calculated by checking all possible combinations:
=======
GM02 GM06 GM10 *** 1.52 *** 0.5067 *** 1
GM02 GM10 GM06 *** 1.92 *** 0.64 *** 2
GM06 GM02 GM10 *** 1.68 *** 0.56 *** 3

marker GM03 was inserted:
=======
GM03 GM02 GM06 GM10 *** 2.16 *** 0.54 *** 1
GM02 GM03 GM06 GM10 *** 2.0 *** 0.5 *** 2
GM02 GM06 GM03 GM10 *** 2.64 *** 0.66 *** 3
GM02 GM06 GM10 GM03 *** 3.2 *** 0.8 *** 4

marker GM08 was inserted:
=======
GM08 GM02 GM03 GM06 GM10 *** 3.64 *** 0.728 *** 1
GM02 GM08 GM03 GM06 GM10 *** 4.32 *** 0.864 *** 2
GM02 GM03 GM08 GM06 GM10 *** 3.28 *** 0.656 *** 3
GM02 GM03 GM06 GM08 GM10 *** 2.56 *** 0.512 *** 4
GM02 GM03 GM06 GM10 GM08 *** 3.16 *** 0.632 *** 5

marker GM09 was inserted:
=======
GM09 GM02 GM03 GM06 GM08 GM10 *** 4.8 *** 0.8 *** 1
GM02 GM09 GM03 GM06 GM08 GM10 *** 5.92 *** 0.9867 *** 2
GM02 GM03 GM09 GM06 GM08 GM10 *** 4.72 *** 0.7867 *** 3
GM02 GM03 GM06 GM09 GM08 GM10 *** 3.76 *** 0.6267 *** 4
GM02 GM03 GM06 GM08 GM09 GM10 *** 3.12 *** 0.52 *** 5
GM02 GM03 GM06 GM08 GM10 GM09 *** 3.52 *** 0.5867 *** 6
MadMapper_XDELTA Map Output

A - marker above
B - middle marker
C - marker below

Distance [A-B]
Distance [B-C]
Distance [A-C]
([A-B] + [B-C]) - [A-C]
[A-B] + [B-C]
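The distance columns can be illustrated with a short Python sketch. It assumes the values are reported for each marker B together with its neighbours A (above) and C (below) on the map, which the slide suggests but does not state outright. The marker names M1 to M3, their distances and the function name trio_columns are hypothetical.

```python
def trio_columns(order, dist):
    """For each interior marker B with neighbours A and C, return
    (B, [A-B], [B-C], [A-C], ([A-B] + [B-C]) - [A-C], [A-B] + [B-C])."""
    rows = []
    for a, b, c in zip(order, order[1:], order[2:]):
        ab = dist[frozenset((a, b))]
        bc = dist[frozenset((b, c))]
        ac = dist[frozenset((a, c))]
        rows.append((b, ab, bc, ac, (ab + bc) - ac, ab + bc))
    return rows


# Hypothetical pairwise distances, for illustration only.
DISTANCES = {
    frozenset(("M1", "M2")): 0.10,
    frozenset(("M2", "M3")): 0.12,
    frozenset(("M1", "M3")): 0.18,
}

for row in trio_columns(["M1", "M2", "M3"], DISTANCES):
    print(row[0], *(f"{value:.2f}" for value in row[1:]))
```

When the middle marker sits cleanly between its neighbours, ([A-B] + [B-C]) - [A-C] stays close to zero; a large value suggests the middle marker fits poorly at that position.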
Side-by-side comparison of the linear order of markers on the Arabidopsis genome inferred by three different approaches (mapping programs), and comparison with the physical order of markers (Col-0 genomic sequence): MadMapper_XDELTA (minimum entropy approach), JoinMap (maximum likelihood) and RECORD (minimum number of recombination events)
[Figure: diagonal dot-plots. Axes: Physical order of markers (based on BLAST search); Inferred order of markers by mapping programs. Panels: MadMapper; JoinMap; RECORD]
Diagonal dot-plot was created using GenoPix_2D_Plotter http://www.atgc.org/GenoPix_2D_Plotter/
Arabidopsis Genetic Map constructed by MadMapper and visualized with CheckMatrix: 2D Heat Plot
[Figure: 2D heat plot with Linkage groups I, II, III, IV and V along both axes. Annotations: Main Diagonal with Linked Markers; Regions with Negative Linkage; Regions with Quasi Linkage; High Density of Markers; Low Density of Markers; Allele Composition per Marker]
Arabidopsis Genetic Map constructed by MadMapper and visualized with CheckMatrix: Graphical Genotyping
[Figure: graphical genotyping of Linkage groups I, II, III, IV and V]
REFERENCES AND DATA SOURCES:
1. Dean and Lister Arabidopsis Genetic Map and Raw Data: http://www.arabidopsis.info/new_ri_map.html
2. MadMapper: http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
3. JoinMap: http://www.kyazma.nl/index.php/mc.JoinMap
4. RECORD: http://www.dpw.wau.nl/pv/pub/recORD/index.htm
5. GenoPix_2D_Plotter: http://www.atgc.org/GenoPix_2D_Plotter/

CREDITS:
This work was funded by NSF grant # 0421630 to Compositae Genome Consortium http://compgenomics.ucdavis.edu/

PAG-14 POSTERS WITH EXAMPLES OF MADMAPPER USAGE:
#P751 High-Density Haplotyping With Microarray-Based Single Feature Polymorphism Markers In Arabidopsis
#P761 Gene Expression Markers: Using Transcript Levels Obtained From Microarrays To Genotype A Segregating Population
#P957 MadMapper And CheckMatrix - Python Scripts To Infer Orders Of Genetic Markers And For Visualization And Validation Of Genetic Maps And Haplotypes