Hepatitis C: Analysis of Sequence Data from ARUP and NCBI Databases
By Ian Odell
What information can we get from ARUP sequencing data?
Data are from January 2002 – July 2004.
5’ untranslated region (UTR) of types 1 – 6:
• Number of unique sequences by type.
• Frequency of unique sequences for each type.
• Frequency of each base in each type, seen in a position weight matrix.
• Regions of high and low variation, seen in graphs of a position weight matrix.
Unique sequences by type:

HCV Type   Total Sequences   Unique Sequences   Unambiguous Unique Sequences
1          16151             1320               750
2          2862              585                373
3          2430              404                232
4          284               99                 68
5          7                 5                  4
6          44                20                 17
Total      21778             2434               1444

*** Ambiguous bases cause unique sequences to be overrepresented.
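The three counts in the table can be derived from a list of typed sequences roughly as follows. This is a minimal sketch, not the method actually used for the ARUP data: the function name `summarize_by_type`, the input format (pairs of type and sequence), and the rule that any character other than A, C, G or T makes a sequence ambiguous are all assumptions. The records are toy data.

```python
from collections import Counter, defaultdict

# Assumption: a sequence is "unambiguous" if it contains only A, C, G, T.
UNAMBIGUOUS = set("ACGT")


def summarize_by_type(records):
    """Count total, unique, and unambiguous unique sequences per HCV type.

    records: iterable of (hcv_type, sequence) pairs (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for hcv_type, seq in records:
        counts[hcv_type][seq.upper()] += 1

    summary = {}
    for hcv_type, seq_counts in counts.items():
        unique = list(seq_counts)
        unambiguous = [s for s in unique if set(s) <= UNAMBIGUOUS]
        summary[hcv_type] = {
            "total": sum(seq_counts.values()),
            "unique": len(unique),
            "unambiguous_unique": len(unambiguous),
        }
    return summary


# Toy data: "ACGR" contains the ambiguity code R, so it counts as a
# unique sequence but not as an unambiguous one.
records = [("1", "ACGT"), ("1", "ACGT"), ("1", "ACGR"), ("2", "ACGA")]
summary = summarize_by_type(records)
print(summary)
```

The per-type `Counter` also holds the frequency of each unique sequence, which is the second item the ARUP data is said to provide.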
Conclusions
1. Each type has a ‘profile’ sequence.
2. Do the log vs. log graphs give us insight into the distribution of mutations within the Hepatitis C population?
NEXT: Look for variation between and within types from the unique sequences that are highly represented in the population (i.e., those that have many duplicates).
Stuyver et al. 1996. “Second-generation line probe assay for hepatitis C virus genotyping.” J. Clin. Microbiol. 34:2259-2266.
In R5, the six selected probes were used for types 1 (line 4), 3 (line 15), 4 and 10 (line 18), and 5 (line 20), as well as for subtypes 2a/2c (line 11), 2b (line 12), and 3b (line 18).
Weight Matrices
• From Profiles, we can see areas of variation between types and their conservation within each type.
• Next, we want to see what these look like for all sequences in each type.
Example Weight Matrix
This allows us to see the variation within a type at each nucleotide.
First 10 base positions of Type 2 HCV
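A position weight matrix of this kind can be built from aligned sequences along these lines. This is a minimal sketch under stated assumptions: the sequences are already aligned to equal length, and ambiguous bases and gaps are left out of the denominator (the slides do not say how they were treated). The function name `position_weight_matrix` and the toy sequences are illustrative only.

```python
from collections import Counter

BASES = "ACGT"


def position_weight_matrix(aligned_seqs):
    """Return one {base: frequency} dict per alignment column.

    Assumption: characters other than A, C, G, T (ambiguity codes, gaps)
    are ignored when computing frequencies.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    matrix = []
    for i in range(length):
        column = Counter(s[i].upper() for s in aligned_seqs)
        n = sum(column[b] for b in BASES)
        matrix.append({b: (column[b] / n if n else 0.0) for b in BASES})
    return matrix


# Toy alignment, not real HCV data.
seqs = ["ACGT", "ACGA", "ACTT", "GCGT"]
pwm = position_weight_matrix(seqs)
for position, column in enumerate(pwm, start=1):
    print(position, column)
```

Each row of the result corresponds to one base position, so the first ten rows would be the analogue of the table on this slide.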
Graphical Type 1 Weight Matrix
The y-values at each x-value sum to 1; each y-value is the fraction of sequences that have a given base at that position. We are looking for a region that is conserved in all types; later we can look for variation between types.
[Figure: graphical weight matrix for Type 1, with region R5 marked.]
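Reading conserved regions off the graph can also be done numerically. The sketch below scans a weight matrix for runs of positions where the most common base reaches a cutoff. The function name `conserved_runs`, the cutoff of 0.95, and the minimum run length are hypothetical choices, not values from the slides, and the matrix is toy data.

```python
def conserved_runs(pwm, threshold=0.95, min_length=3):
    """Return (start, end) index pairs, end exclusive, of conserved runs.

    A position counts as conserved when its most frequent base has a
    frequency of at least `threshold` (an assumed cutoff).
    """
    runs, start = [], None
    for i, column in enumerate(pwm):
        conserved = max(column.values()) >= threshold
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the matrix.
    if start is not None and len(pwm) - start >= min_length:
        runs.append((start, len(pwm)))
    return runs


# Toy matrix: positions 0-2 are conserved, position 3 is variable,
# positions 4-5 are conserved but form a run shorter than min_length.
toy_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.03, "G": 0.97, "T": 0.0},
    {"A": 0.4, "C": 0.0, "G": 0.0, "T": 0.6},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
]
print(conserved_runs(toy_pwm))
print(conserved_runs(toy_pwm, min_length=2))
```

Running the same scan on the matrix of every type and intersecting the runs would give candidate regions conserved across all types.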
What information can we get from NCBI data?
• Look at complete HCV genome publications, because BLASTing with 5’ UTR primers biases results towards what those primers amplify (i.e., BLAST returns the most similar hits, and we want to look for variation).
• Are there mismatches under the ARUP primers? Do ARUP primers bias the sequence data by not amplifying a certain group?
• Regions of low and high variation in the complete genome. Compare to 5’ UTR. The alignment is not good enough for an accurate analysis.
Graphical Weight Matrix of ARUP (5’ UTR) Amplicon
Data are from 239 aligned complete HCV genomes downloaded from GenBank.
[Figure: graphical weight matrix of the amplicon, with the forward and reverse primer regions marked.]
SNPs and insertions under ARUP Forward Primer
Graphical weight matrix: ARUP forward primer region in the BLAST complete-genome alignment.
[Figure annotations: 2, Ins, 7, 1, 5, 3, 1 (SNPs / 239 sequences)]
SNPs and insertions under ARUP Reverse Primer
Graphical weight matrix: ARUP reverse primer region in the BLAST complete-genome alignment.
[Figure annotation: 3 SNPs / 239 sequences]
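Counts like these can be obtained by comparing each aligned genome with the primer over the columns it covers. The following is a minimal sketch with assumptions labeled: the primer sequence and alignment below are made up (the ARUP primer sequences are not given in these slides), the primer's start column in the alignment is assumed to be known, a reverse primer is assumed to have been reverse-complemented to the alignment strand beforehand, and insertions relative to the primer are not handled.

```python
def mismatches_under_primer(aligned_seqs, primer, start):
    """Count, per primer position, how many sequences differ from the primer.

    start: 0-based alignment column where the primer begins (assumed known).
    Gaps in a sequence are counted as mismatches in this sketch.
    """
    primer = primer.upper()
    counts = [0] * len(primer)
    for seq in aligned_seqs:
        window = seq[start:start + len(primer)].upper()
        for i, (base, expected) in enumerate(zip(window, primer)):
            if base != expected:
                counts[i] += 1
    return counts


# Made-up alignment and primer, for illustration only.
alignment = ["TTACGTAA", "TTACGTAA", "TTACCTAA", "TTGCGTAA"]
counts = mismatches_under_primer(alignment, "ACGT", start=2)
print(counts, "mismatches per primer position /", len(alignment), "sequences")
```

Grouping the mismatching sequences by HCV type would then show whether a primer is likely to under-amplify a particular group.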