Recombination based population genomics

Recombination based population genomics. Jaume Bertranpetit, Marta Melé, Francesc Calafell, Asif Javed, Laxmi Parida. Recall: IRiS (Identification of Recombinations in Sequences). IRiS is a computational method, developed with biological insight, that detects evidence of historical recombinations.

Presentation Transcript


  1. Recombination based population genomics. Jaume Bertranpetit, Marta Melé, Francesc Calafell, Asif Javed, Laxmi Parida

  2. Recall: IRiS (Identification of Recombinations in Sequences). IRiS • is a computational method developed with biological insight • detects evidence of historical recombinations • minimizes the number of recombinations in the Ancestral Recombination Graph (ARG)

  3. Recotypes Two chromosomes share a recombination if the junction is co-inherited. (Figure legend: mutation edge, recombination edge, extant sequence)

  4. Recotypes Two chromosomes share a recombination if the junction is co-inherited. (Figure labels: r1; a, b)

  5. Recotypes Two chromosomes share a recombination if the junction is co-inherited. (Figure labels: r1, r2; a, b, c)

  7. Validity of inferred recombinations • Comparison with sperm typing • Computer simulated recombinations

  8. in vitro • Chr 1 near the MS32 minisatellite (Jeffreys et al. 2005) • 80 UK semen donors of North European origin: sperm typing, LDhat and Phase (200 SNPs) • HapMap 2 CEU population, similar SNP density: IRiS (Figure labels: sperm typing, LDhat, Phase, IRiS)

  9. in silico Chromosomes • HapMap 3 X chromosome data • Select 2 chromosomes at random. • Pick a random breakpoint. • Create a new chromosome. • Check if it is unique, add to the dataset. • Run IRiS on the dataset to see if the breakpoint is detected.

  14. in silico Chromosomes • HapMap 3 X chromosome data • Select 2 chromosomes at random. • Pick a random breakpoint. • Create a new chromosome. • Check if it is unique, add to the dataset. • Run IRiS on the dataset to see if the breakpoint is detected. IRiS recombination detected?

  15. in silico Chromosomes • HapMap 3 X chromosome data • Select 2 chromosomes at random. • Pick a random breakpoint. • Create a new chromosome. • Check if it is unique, add to the dataset. • Run IRiS on the dataset to see if the breakpoint is detected. IRiS recombination detected? • 69% of recombinations detected • All detected recombinations detect the correct sequence • No false positives
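The simulation steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the haplotypes are toy 0/1 strings rather than HapMap 3 data, the helper names `make_recombinant` and `add_unique_recombinant` are hypothetical, and the final step (running IRiS) is only indicated in a comment because its interface is not described in the slides.

```python
import random

def make_recombinant(chromosomes, rng):
    """Splice two randomly chosen chromosomes at a random breakpoint.

    Returns (recombinant, breakpoint, (left_parent_index, right_parent_index)).
    """
    i, j = rng.sample(range(len(chromosomes)), 2)  # two distinct parents
    n = len(chromosomes[i])
    bp = rng.randrange(1, n)                       # breakpoint strictly inside the sequence
    child = chromosomes[i][:bp] + chromosomes[j][bp:]
    return child, bp, (i, j)

def add_unique_recombinant(chromosomes, rng, max_tries=1000):
    """Retry until the recombinant is not already in the dataset, then append it."""
    existing = set(chromosomes)
    for _ in range(max_tries):
        child, bp, parents = make_recombinant(chromosomes, rng)
        if child not in existing:
            return chromosomes + [child], bp, parents
    raise RuntimeError("no unique recombinant found")

# Toy haplotypes (hypothetical data, equal length, one character per SNP).
toy = ["0000000000", "1111111111", "0101010101", "0011001100"]
dataset, bp, parents = add_unique_recombinant(toy, random.Random(0))
# A recombination detector (IRiS in the slides) would now be run on `dataset`
# and its inferred breakpoints compared against the known `bp`.
```

Repeating this over many random draws and counting how often the planted breakpoint is recovered gives a detection rate of the kind reported on the slide.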

  16. Recombinomics • Strong population structure • Agreement with traditional methods • FST vs. recombinational distance • More informative than SNPs • STRUCTURE • PCA

  17. Regions 18 regions selected from HapMap 3 • X-chromosome in males (to avoid phasing errors) • 50 kb away from known CNV and SD (to avoid genotyping errors) • 50 kb away from genes (to avoid selection) • at least 80 SNPs Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42), CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)

  18. Analysis For each region IRiS inferred recotypes for each chromosome • 5166 recombinations were inferred • 3459 co-occurred in at least two chromosomes (Figure labels: recombination, chromosome)

  19. Analysis For each region IRiS inferred recotypes for each chromosome • 5166 recombinations were inferred • 3459 co-occurred in at least two chromosomes (Figure labels: recombination, chromosome, recotype)
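One way to picture the bookkeeping on this slide is to treat each chromosome's recotype as the set of inferred recombinations it carries, then count which recombinations occur in at least two chromosomes. The sketch below assumes that representation; the chromosome and recombination labels are hypothetical, and the function names are illustrative rather than part of IRiS.

```python
from collections import Counter

def shared_recombinations(recotypes, min_carriers=2):
    """Return the recombinations carried by at least `min_carriers` chromosomes."""
    counts = Counter()
    for events in recotypes.values():
        counts.update(set(events))
    return {event for event, c in counts.items() if c >= min_carriers}

def recotype_matrix(recotypes):
    """Encode recotypes as 0/1 rows over a sorted list of recombinations."""
    events = sorted({e for carried in recotypes.values() for e in carried})
    rows = {
        chrom: [1 if e in carried else 0 for e in events]
        for chrom, carried in recotypes.items()
    }
    return events, rows

# Hypothetical toy input: chromosome -> set of inferred recombinations.
recotypes = {
    "chrA": {"r1"},
    "chrB": {"r1", "r2"},
    "chrC": {"r2", "r3"},
}
shared = shared_recombinations(recotypes)
events, rows = recotype_matrix(recotypes)
```

The 0/1 rows are the kind of per-chromosome vectors that downstream analyses such as PCA or STRUCTURE could take as input.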

  20. Agreement with LDhat Each point represents a short haplotype segment in the HapMap CEU population. Spearman correlation = 0.711, p-value $< 10^{-30}$. (Axes: recombination rate inferred by LDhat; number of recombinations inferred by IRiS)

  21. Agreement with LDhat Each point represents a short haplotype segment in the HapMap CEU population. Spearman correlation = 0.711, p-value $< 10^{-30}$. Correlation in hotspots: $\chi^2 = 38.39$, p-value $< 6 \times 10^{-10}$. (Axes: recombination rate inferred by LDhat; number of recombinations inferred by IRiS)
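For readers unfamiliar with the statistic, Spearman correlation is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. The pure-Python sketch below shows the computation on made-up numbers; the variable names and values are hypothetical and are not the data behind the slide.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of values tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-segment values: an LDhat-style rate and an IRiS-style count.
ldhat_rate = [0.1, 0.4, 0.2, 0.9, 0.5]
iris_count = [1, 3, 2, 8, 3]
rho = spearman(ldhat_rate, iris_count)
```

Because only ranks matter, the statistic is insensitive to the different scales of a rate and a count, which is why it suits this kind of comparison.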

  22. Recombinational distance between populations Two populations that are genetically closer will share a higher number of recombinations. Recombinational distance: $D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$ • Correlation between FST distance and recombinational distance for the 18 regions: [0.35 – 0.75] with p-values < 0.025 • MDS, all regions combined: stress = 6.1%
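A small worked example may help with the distance on this slide. The sketch reads the slide's formula as D_AB = 1 - R_AB / (R_A + R_B - R_AB), and it assumes that R_A and R_B are the numbers of recombinations observed in populations A and B and that R_AB is the number observed in both; those interpretations, the population names, and the recombination labels are assumptions made for illustration.

```python
from itertools import combinations

def recombinational_distance(recs_a, recs_b):
    """1 minus shared recombinations over the union of both populations' recombinations."""
    r_a, r_b = len(recs_a), len(recs_b)
    r_ab = len(recs_a & recs_b)
    union = r_a + r_b - r_ab
    if union == 0:
        return 0.0  # convention chosen here for two empty sets; not from the slides
    return 1 - r_ab / union

def distance_matrix(populations):
    """Pairwise distances keyed by (name_a, name_b) for every unordered pair."""
    return {
        (a, b): recombinational_distance(populations[a], populations[b])
        for a, b in combinations(sorted(populations), 2)
    }

# Hypothetical populations with the sets of recombinations seen in each.
populations = {
    "A": {"r1", "r2", "r3"},
    "B": {"r2", "r3", "r4"},
    "C": {"r5"},
}
distances = distance_matrix(populations)
```

Under this reading the distance is 0 when two populations carry exactly the same recombinations and 1 when they share none, and a matrix of such values is the kind of input an MDS plot would take.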

  23. PCA of population data Recall recotypes

  25. PCA of population data The first two PCs capture 66.4% of the variance

  26. PCA of recotypes • more on this later

  27. Recotypes vs. SNPs Due to ascertainment bias, gene diversity does not reflect population structure; results similar to Conrad 07. Percentage of variance. Normalized comparison, linearly scaled to [0,1], using 21 samples per population; in agreement with Lewontin 72.

  28. from SNPs to haplotypes to recotypes (a STRUCTURE comparison) K=2 (Panels: SNPs, haplotypes, recotypes)

  29. from SNPs to haplotypes to recotypes (a STRUCTURE comparison) K=3 (Panels: SNPs, haplotypes, recotypes)

  30. from SNPs to haplotypes to recotypes (a STRUCTURE comparison) K=4 (Panels: SNPs, haplotypes, recotypes)

  31. from SNPs to haplotypes to recotypes (a STRUCTURE comparison) K=5 (Panels: SNPs, haplotypes, recotypes)

  32. Africa within global genetic variation Structure k=4: minority African-specific component. Avg. number of recombinations in 21 random chromosomes. Out of Africa hypothesis; founder effect

  33. Genetic variation within Africa Structure k=5: Maasai-specific minor component • Sub-Saharan Maasai are distinct among Africans. • African-Americans exhibit stronger recombinational affinity with African populations than with European populations (Parra 98).

  34. Genetic variation outside Africa Structure k=5. Avg. number of recombinations in 21 random chromosomes • Outside Africa, Gujarati and Japanese exhibit the highest and lowest number of recombinations, respectively. • Gujarati Indians occupy an intermediate position between Europeans and East Asians.

  35. Venturing outside the X-chromosome • Benefits • The bigger picture • More regions and hence more information • Challenges • Higher number of recombinations makes the picture murkier • Phasing errors

  36. Regions 81 regions selected from HapMap 3 • 50 kb away from known CNV and SD (to avoid genotyping errors) • 50 kb away from genes (to avoid selection) • at least 200 SNPs • 25 samples per population (each sample has two chromosomes)

  37. Analysis • For each region IRiS inferred recotypes for each chromosome • 34140 recombinations were inferred • For each sample the two recotypes were merged. (PCA plots: SNPs, recotypes)

  38. Quantifying population structure • PCA and k nearest neighbors are used to predict the population of every sample (Figure labels: perfectly classified; classified with errors; Africans; non-Africans; MKK, LWK, ASW, YRI, GIH, MEX; E. Asian: CHB+CHD, JPT; European: CEU, TSI. Misclassification by (recotypes, SNPs): (0,7), (3,13), (8,13), (4,3))
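The classification step can be illustrated with a leave-one-out k-nearest-neighbors count of misclassified samples per population. This is a sketch under assumptions: the points stand in for samples already projected onto principal components (the PCA step itself is not shown), the coordinates and labels are invented, and the value of k and the leave-one-out protocol are choices made here rather than details given on the slide.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points nearest to `query`."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_errors(points, labels, k=3):
    """Count, per true label, the samples misclassified when held out in turn."""
    errors = Counter()
    for i, (point, true_label) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, point, k) != true_label:
            errors[true_label] += 1
    return errors

# Hypothetical 2-D coordinates (e.g. first two PCs) and population labels.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.2, 0.2)]
labels = ["pop1", "pop1", "pop1", "pop2", "pop2", "pop2", "pop2"]
errors = leave_one_out_errors(points, labels, k=3)
```

Running the same procedure once on recotype-based coordinates and once on SNP-based coordinates would yield pairs of counts comparable to the (recotypes, SNPs) figures on the slide.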

  39. East Asian population Recotypes are more informative of underlying population structure. (PCA plots: SNPs, recotypes)

  40. in conclusion … Recotypes • show strong agreement with in silico and in vitro recombination rate estimates • are highly informative of the underlying population structure • provide a novel approach to study recombinational dynamics
