
Sequence File Formats


dima


  1. Sequence File Formats

  2. Sequencing – the old way
     (gel image with lanes labeled G, A, T, C)
     • Di-deoxy chain terminators (G, A, T, C)
     • 4 different reactions, one per terminator
     • 35S-labeled dCTP (radioactive label)
     • Electrophoresis through an acrylamide gel
     • Transfer gel to blotting paper
     • Expose to X-ray film
     • Develop film and read sequence

  3. Chromatograms – Sanger sequencing
     • Fluorescent di-deoxy chain terminators
     • Four nucleotides, four “colors”
     • Electrophoresis through a polymer
     • Read the colors as they pass through a laser/detector

  4. Flowgrams, Ionograms
     • Flow nucleotides through a reaction cell, one at a time
     • Detect byproducts of incorporation
        • 454 sequencing: pyrophosphate (detected as light)
        • Ion Torrent: hydrogen ions (detected as a pH change)

  5. Colorspace – SOLiD sequencing
     • Sequence by ligation (detects 2 bases per cycle)
     • Flow 4 pools of 4 oligonucleotides over the reaction wells (each pool is labeled with a different fluorescent dye)
     • Detect the dye, cleave off the oligo-dye adaptor and repeat

  6. Process is repeated using nested primers

  7. SOLiD color codes (figure)

  8. Colorspace csfasta 2nd base 1st base

  9. fasta, multifasta
     • Fasta (.fasta, .fa, .fas, .fsa, .fna)
          >Sequence1
          CAATCATAGAGACAGCTGTTGTATCGTTACGTCATTCATGCAAGACCGCATTTAACGGCCAAGGCATTTCGCTACCTTAG
     • Multifasta (.mfa, .mpfa)
          >Sequence1
          CAATCATAGAGACAGCTGTTGTATCGTTACGTCATTCATGCAAGACCGCATTTAACGGCCAAGGCATTTCGCTACCTTAG
          >Sequence2
          ACCAGGAAGGTGGCCGACGCCAGCCGCTGATGCCACTCCACCCGCCGCGCACCGAGTCCAGGAGCGCGGACAAGGGGATT
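A FASTA or multifasta file can be read by splitting it on the `>` header lines. Below is a minimal Python reader that assumes plain-text input. It keeps records in file order as (header, sequence) pairs and joins sequence lines that have been wrapped. The function name `read_fasta` is illustrative and does not come from any particular tool.

```python
from typing import Iterable, List, Optional, Tuple

def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA or multi-FASTA text into (header, sequence) pairs.

    The header is everything after '>' on its line; the sequence lines
    that follow are concatenated until the next '>' line.
    """
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

# The multifasta example above, with each sequence wrapped over two lines.
multifasta = """>Sequence1
CAATCATAGAGACAGCTGTTGTATCGTTACGTCATTCATGCAAG
ACCGCATTTAACGGCCAAGGCATTTCGCTACCTTAG
>Sequence2
ACCAGGAAGGTGGCCGACGCCAGCCGCTGATGCCACTCCACCCG
CCGCGCACCGAGTCCAGGAGCGCGGACAAGGGGATT
"""

for name, seq in read_fasta(multifasta.splitlines()):
    print(name, len(seq))  # Sequence1 80, then Sequence2 80
```

The function returns a list rather than a dict so that duplicate headers are not silently merged.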

  10. Colorspace fasta
     • .csfasta
          >Sequence1
          T0123020301120301012020212330213230
          >Sequence2
          T2130322221303120001320310030123123

                        2nd base
                      A   C   G   T
                  A   0   1   2   3
       1st base   C   1   0   3   2
                  G   2   3   0   1
                  T   3   2   1   0
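Encode A=0, C=1, G=2 and T=3. The color table above is then the bitwise XOR of the two base codes; for example, A,T gives 0 XOR 3 = 3. That means a csfasta read can be decoded one base at a time from a known starting base. The sketch below assumes that the leading letter (the `T` in the examples) is the known last base of the primer and is not part of the read. It also assumes the read contains only the colors 0 to 3, with no missing calls. The function names are hypothetical.

```python
# With A=0, C=1, G=2, T=3, the SOLiD color of a dinucleotide is the bitwise
# XOR of its two base codes, which reproduces the 4x4 table above.
BASE_TO_NUM = {"A": 0, "C": 1, "G": 2, "T": 3}
NUM_TO_BASE = "ACGT"

def decode_colorspace(cs_read: str) -> str:
    """Translate a csfasta read such as 'T0123...' into bases.

    The leading base is assumed to be the known last primer base and is not
    returned; each color then determines the next base from the previous one.
    """
    prev = BASE_TO_NUM[cs_read[0]]
    bases = []
    for color in cs_read[1:]:
        if color not in "0123":
            raise ValueError(f"cannot decode color call {color!r}")
        prev ^= int(color)
        bases.append(NUM_TO_BASE[prev])
    return "".join(bases)

def encode_colorspace(primer_base: str, seq: str) -> str:
    """Inverse operation: bases -> primer base followed by color calls."""
    prev = BASE_TO_NUM[primer_base]
    colors = []
    for base in seq:
        cur = BASE_TO_NUM[base]
        colors.append(str(prev ^ cur))
        prev = cur
    return primer_base + "".join(colors)

print(decode_colorspace("T0123"))  # TGAT
```

Each base depends on the one before it, so a single miscalled color changes every downstream base in a naive decode like this one.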

  11. Sequence Quality • Some sequence calls have better quality than others

  12. Quality values
     • Q = -10 log10(P)
        Q = quality value
        P = probability of error
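The formula above can be inverted as P = 10^(-Q/10). Under it, Q20 corresponds to a 1 in 100 chance that the call is wrong, and Q30 to 1 in 1,000. Below is a small Python sketch of both directions; the function names are illustrative.

```python
import math

def phred_from_error(p: float) -> float:
    """Q = -10 * log10(P), P being the probability that the base call is wrong."""
    if not 0 < p <= 1:
        raise ValueError("P must be in (0, 1]")
    return -10 * math.log10(p)

def error_from_phred(q: float) -> float:
    """Inverse of the formula above: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

for q in (10, 20, 30, 40):
    # Q10 -> 1 in 10, Q20 -> 1 in 100, Q30 -> 1 in 1,000, Q40 -> 1 in 10,000
    print(f"Q{q}: P(error) = {error_from_phred(q):g}")
```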

  13. .sff files
      .sff^@^@^@^A^@^@^@^@ ... ^B0^@^D^B^H^ATACGTACGTCTGAGCATCGATCGATGTACAGC ... ^@^@^@^@PXDEO:13:42^@^@^@^@ ... ^@^@^@^@^@^@ ... TCAGGCAGATTGTGACGAGGCTGAGACTGCCCAAGGCACACAGGGGGTAGGG^M^L^Q^Z^S^Y ...
      (excerpt: .sff is a binary format, so a text viewer shows most bytes as control characters, e.g. ^@ for a zero byte, with only fragments such as read names and base calls readable)

  14. Roche 454 .sff Files – common header
Magic Number: 0x2E736666
Version: 0001
Index Offset: 110544
Index Length: 3173
# of Reads: 35
Header Length: 840
Key Length: 4
# of Flows: 800
Flowgram Code: 1
Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG
Key Sequence: TCAG

  15. .sff files - sequence specific information
>F7K88GK01BMPI0
Run Prefix: R_2009_12_18_15_27_42_
Region #: 1
XY Location: 0551_2346
Run Name: R_2009_12_18_15_27_42_FLX########_Administrator_yourrunname
Analysis Name: D_2009_12_19_01_11_43_XX_fullProcessing
Full Path: /data/R_2009_12_18_15_27_42_FLX########_Administrator_yourrunname/D_2009_12_19_01_11_43_XX_fullProcessing/
Read Header Len: 32
Name Length: 14
# of Bases: 500
Clip Qual Left: 15
Clip Qual Right: 490
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.03 0.00 1.01 0.02 0.00 0.96 0.00 1.00 0.00 1.04 0.00 0.00 0.97 0.00 0.96 0.02 0.00 1.04 0.01 1.04 0.00 0.97 0.96 0.02 0.00 1.00 0.95 1.04 0.00 0.00 2.04 0.02 0.03 1.05 0.99 0.01 2.84 0.03 0.05 0.97 0.12 0.00 1.01 0.05 0.97 0.01 2.89 0.04 0.09 1.05 0.15 0.00 2.84 0.06 1.00 0.01 0.13 1.01 0.09 0.98 0.01 0.05 1.01 0.06 0.00 1.04 3.72 0.03 0.00 0.96 1.97 0.04 0.01 1.97 0.12 0.98 0.02 0.08 0.95 0.12 ...
FlowIndexes: 1 3 6 8 10 13 15 18 20 22 23 26 27 28 31 31 34 35 37 37 37 40 43 45 47 47 47 50 53 53 53 55 58 60 63 66 67 67 67 67 70 71 71 74 74 76 79 82 83 86 86 88 88 91 93 96 97 99 102 105 ...
Bases: tcagatcagacacgCCACTTTGCTCCCATTTCAGCACCCCACCAAGCACAAGGCTGTCATCCCAATTGGACGGACAGATATGAGGTTAGCATTGGAAACCAATTCAGTCCCTAATTATTCACGACTGAACCCAGCGACAATTGGACATGGATTCATTTTTCAACTTGATTTGTTGTTGTAAAAGCACTGAAGAAGATGCCGCAACAAGAGCTTCCAAAGTTTCCCACCGGATCGACGGTACCCTTTCCCTATGAATCTCCTTATCCTCAGCAGACAGCTTTGATGGACACGCTGCTCGAGTGTTTGCAGCAAAAGGATCACGATGATTCAACATGGCGCCAAACCAATGACAGCCATAGCAAGAACAAGAAGAAACCCCGTGCGGCCGTGATGATGTTGGAGTCTCCTACCGGCACTGGCAAGTCTCTATCTTTGGCGTGTAGTGCCATGGCGTGGCTCAAGTACTGCGAACAACGAGATTTGACTGCAGaagaagaatc
QualityScores: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 38 38 38 40 40 40 39 39 39 40 34 34 34 40 40 40 40 39 26 26 26 26 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 ...
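The Flowgram, FlowIndexes and Bases lines above are linked. Each flow value is roughly the length of the homopolymer incorporated when that nucleotide was flowed, and the flow order follows the TACG cycle. The sketch below calls bases naively by rounding each value to the nearest whole number. It is only an illustration; production 454 base calling applies signal corrections this sketch does not attempt. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def flowgram_to_bases(flowgram: Sequence[float], flow_chars: str) -> Tuple[str, List[int]]:
    """Naively call bases from a flowgram.

    Each flow value is rounded to the nearest whole number, read as the length
    of the homopolymer incorporated in that flow. Also returns the 1-based
    flow index of every called base.
    """
    bases: List[str] = []
    flow_index: List[int] = []
    for i, (value, nucleotide) in enumerate(zip(flowgram, flow_chars), start=1):
        n = math.floor(value + 0.5)  # round half up; flow values are non-negative
        bases.append(nucleotide * n)
        flow_index.extend([i] * n)
    return "".join(bases), flow_index

# First 37 flow values from the read above; flow order is the TACG cycle.
flows = [1.03, 0.00, 1.01, 0.02, 0.00, 0.96, 0.00, 1.00, 0.00, 1.04,
         0.00, 0.00, 0.97, 0.00, 0.96, 0.02, 0.00, 1.04, 0.01, 1.04,
         0.00, 0.97, 0.96, 0.02, 0.00, 1.00, 0.95, 1.04, 0.00, 0.00,
         2.04, 0.02, 0.03, 1.05, 0.99, 0.01, 2.84]
seq, idx = flowgram_to_bases(flows, "TACG" * 200)
print(seq)  # TCAGATCAGACACGCCACTTT
print(idx)  # [1, 3, 6, 8, 10, 13, 15, 18, 20, 22, 23, 26, 27, 28, 31, 31, 34, 35, 37, 37, 37]
```

Ignoring case, the result matches the first 21 characters of the Bases line. The indices match the start of the FlowIndexes line, including 2.04 giving "CC" at flow 31 and 2.84 giving "TTT" at flow 37.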

  16. Quality Files .phd .qual
     • .qual
          >contig00016 length=237 numreads=9
          20 3 10 64 14 64 9 64 4 19 64 64 4 64 64 64 64 21 64 37 64 64 64 64 64 64 41 12 64 64 64 64 64 32 64 64 64 64 64 64 64 13 37 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 45 64 64 64 64 64 64 29 64 64 64 64 64 20 64 64 64 64 64 39 64 64 64 64 64 64 64 64 64 64 64 20 64 64 64 64 64 64 64 64 64 64 64 40 64 64 64 64 64 20 64 64 64 64 64 20 64 64 64 64 64 23 64 64 64 64 64 64 64 64 64 64 64 39 64 64 64 64 64 64 16 64 64 64 64 64 64 5 64 64 64 64 64 64 64 64 64 64 64 33 64 64 64 64 64 64 4 64 64 64 64 64 64 64 64 64 64 64 64 29 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 39 64 64 64 64 64 64 48 64 64 64 64 64 64 64 48 64 64 64 50 64 64 64 64 64 64 64 32 64 64 34 64 64 33 64 64 64 48 64 64 64 64
          >contig00017 length=161 numreads=9
          64 64 64 18 17 64 64 43 20 64 24 41 64 64 30 2 53 64 64 64 64 35 64 64 64 64 64 12 64 64 64 64 31 64 64 28 64 16 64 64 64 64 64 41 64 64 17 64 64 34 25 64 30 64 64 64 64 64 64 64 64 20 64 64 64 64 47 64 64 64 40 64 61 64 64 34 64 64 64 64 64 22 64 64 64 64 64 64 64 64 64 64 64 64 64 58 64 64 64 64 64 64 64 64 64 64 64 58 64 64 64 64 64 64 64 64 64 64 37 43 64 64 52 64 64 64 64 64 64 64 64 64 64 64 60 64 64 49 64 64 64 64 64 64 64 64 20 29 64 64 64 64 64 17 64 21 3 64 21 21 3
     • .phd
          BEGIN_SEQUENCE CLS_AGTC_1a73_1_x_C10_FLCN12R_x_A08
          BEGIN_COMMENT
          CHROMAT_FILE: CLS_AGTC_1a73_1_x_C10_FLCN12R_x_A08.ab1
          BASECALLER_VERSION: KB 1.2
          TRACE_PROCESSOR_VERSION: KB 1.2
          QUALITY_LEVELS: 99
          TIME: Wed Dec 07 19:41:26 2011
          TRACE_ARRAY_MIN_INDEX: 0
          TRACE_ARRAY_MAX_INDEX: 16022
          TRIM: -1 -1 -1.000000e+000
          TRACE_PEAK_AREA_RATIO: -1.000000e+000
          CHEM: term
          DYE: big
          END_COMMENT
          BEGIN_DNA
          T 3 7
          G 3 28
          G 4 44
          A 6 57
          A 5 70
          G 3 81
          C 4 101
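A .qual file is laid out like FASTA: a `>` header line, then whitespace-separated integer quality values that may wrap over several lines. The sketch below parses that layout. It also shows how to pull a `key=value` token such as `length=` out of the header, on the assumption that `length` counts the quality values. The function names and the `read1` record are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

def read_qual(lines: Iterable[str]) -> List[Tuple[str, List[int]]]:
    """Parse a .qual file into (header, [quality, ...]) pairs.

    Layout mirrors FASTA: a '>' header line followed by whitespace-separated
    integer quality values, possibly wrapped over several lines.
    """
    records: List[Tuple[str, List[int]]] = []
    header: Optional[str] = None
    scores: List[int] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, scores))
            header, scores = line[1:], []
        elif header is not None:
            scores.extend(int(tok) for tok in line.split())
    if header is not None:
        records.append((header, scores))
    return records

def header_field(header: str, key: str) -> Optional[str]:
    """Return the value of a 'key=value' token in a header, e.g. 'length'."""
    for token in header.split()[1:]:
        k, sep, v = token.partition("=")
        if sep and k == key:
            return v
    return None

example = [">read1 length=4 numreads=1",  # hypothetical record
           "20 3",
           "10 64"]
for name, quals in read_qual(example):
    declared = int(header_field(name, "length"))
    print(name.split()[0], quals, declared == len(quals))  # read1 [20, 3, 10, 64] True
```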

  17. .fastq, .fsq, .fq
     • Incorporates sequence calls and quality values into a single file:
          @PXDEO:18:45
          ATATATATAAAATATAAAAAGGGTTTTTTTTAAAAAAAATTAATCCAGCAATAATTCCAAATTATTTTGAGGCCGAATCGGATGGGTTATTTTTTTTTTTATAAAAAATTATTTGCAACGAGCCATTATATAACAAA
          +
          9=?>??AAAB@-:0+,000&0:.:;===;=(<<<<677(5151552766>;:>9@8=7:>2=7===>.>=?7=6:<7::<4:<99'0(0*---------%*-*4566)60133,366035665)+0/488443+...
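The sketch below reads FASTQ records by position, four lines at a time. It assumes the common unwrapped layout. Position matters because `@` is also a legal quality character, so a quality line can begin with `@` and be mistaken for a header. The function name is illustrative.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from unwrapped, 4-line FASTQ records."""
    it = (line.rstrip("\r\n") for line in lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        # Records are located by position, not by '@': '@' is also a valid
        # quality character, so a quality line may itself start with '@'.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual

record = ["@read1", "ACGTN", "+", "IIII#"]  # hypothetical record
print(list(read_fastq(record)))  # [('read1', 'ACGTN', 'IIII#')]
```

An open file handle can be passed directly, since iterating over it yields lines.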

  18. Quality scores in ASCII format
      SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
      ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
      ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
      .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
      LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
      !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      |                         |    |        |                              |                     |
     33                        59   64       73                            104                   126

      S - Sanger        Phred+33,  raw reads typically (0, 40)
      X - Solexa        Solexa+64, raw reads typically (-5, 40)
      I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
      J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
          with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
      L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)
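A quality character maps to a score as score = ASCII code - offset, using 33 or 64 depending on the scheme in the chart above. With offset 33, Q0 is `!`, the first printable, non-space ASCII character.

Files do not record their encoding, so the offset is sometimes guessed from the characters present. The `guess_offset` function below is only a heuristic built from the typical raw-read ranges in the chart. It returns `None` when both offsets fit, and it can be wrong for data outside those ranges. All names are illustrative.

```python
from typing import Iterable, List, Optional

def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """ASCII quality string -> integer scores (score = ord(char) - offset)."""
    return [ord(ch) - offset for ch in qual]

def encode_quality(scores: Iterable[int], offset: int = 33) -> str:
    """Integer scores -> ASCII quality string."""
    return "".join(chr(q + offset) for q in scores)

def guess_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Heuristic guess of the offset, based on the typical ranges in the chart.

    Returns 33 or 64, or None when the observed characters fit both.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return None
    if min(codes) < 59:  # ';' (59) is the lowest character of any +64 scheme
        return 33
    if max(codes) > 74:  # above 'J' (74), the usual Phred+33 maximum (Q41)
        return 64
    return None

print(decode_quality("ABBB#"))  # [32, 33, 33, 33, 2]
```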

  19. Why does the encoding start at 33?

  20. File converters
     • .sff → .fasta, .qual
        • sffinfo (Newbler tool, www.Roche.com)
        • sff_extract (bioinf.comav.upv.es/sff_extract/)
     • .sff → .fastq
        • SFF workbench (www.dnabaser.com/download/)
        • sff_extract (bioinf.comav.upv.es/sff_extract/)
        • Sff2fastq (github.com/indraniel/sff2fastq)
     • .fastq → .fasta (& .qual, if desired)
        • Prinseq-lite.pl, or
        • cat file_in.fastq | perl -e '$i=0;while(<>){if(/^\@/&&$i==0){s/^\@/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' > file_out.fasta
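The Perl one-liner keeps the header and sequence line of every four-line record. The Python sketch below does the same and also writes the .qual text. It assumes unwrapped four-line FASTQ records and Phred+33 quality characters by default. The function name and the `read1` record are hypothetical.

```python
from typing import Tuple

def fastq_to_fasta_qual(fastq_text: str, offset: int = 33) -> Tuple[str, str]:
    """Convert 4-line FASTQ text into (FASTA text, .qual text).

    Quality characters are decoded as ord(char) - offset (Phred+33 by default).
    """
    lines = fastq_text.splitlines()
    if len(lines) % 4:
        raise ValueError("expected exactly 4 lines per FASTQ record")
    fasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, seq, _plus, q = lines[i:i + 4]
        name = header[1:]  # drop the leading '@'
        fasta.append(f">{name}\n{seq}\n")
        scores = " ".join(str(ord(ch) - offset) for ch in q)
        qual.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qual)

fa, ql = fastq_to_fasta_qual("@read1\nACGT\n+\n!+5I\n")  # hypothetical record
print(fa + ql)
```

Like the one-liner, it relies on line position rather than on lines starting with `@`.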

  21. Quality Metrics
     • Was the sequencing run successful?
        • Number of Phred Q20 (Q30) bases
        • Average read length (@Q20)
     • How much usable data?
        • Genome assembly: total high-quality bases
        • RNA-seq/ChIP-seq: total number of mappable reads
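Several of these run-level numbers can be counted directly from per-base quality scores. The sketch below counts Q20 and Q30 bases and reports the plain mean read length. Taking Q20 and Q30 to mean "score ≥ 20" and "score ≥ 30" is an assumption. The sketch does not compute the "@Q20" read length, whose exact definition varies between tools. The input format and function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def run_metrics(reads: Iterable[Tuple[str, List[int]]]) -> Dict[str, float]:
    """Summarise reads given as (sequence, Phred scores) pairs."""
    n_reads = total = q20 = q30 = 0
    for seq, scores in reads:
        n_reads += 1
        total += len(seq)
        q20 += sum(1 for q in scores if q >= 20)
        q30 += sum(1 for q in scores if q >= 30)
    return {
        "reads": n_reads,
        "total_bases": total,
        "q20_bases": q20,
        "q30_bases": q30,
        "mean_read_length": total / n_reads if n_reads else 0.0,
    }

print(run_metrics([("ACGT", [30, 30, 10, 20]), ("AC", [40, 5])]))
# {'reads': 2, 'total_bases': 6, 'q20_bases': 4, 'q30_bases': 3, 'mean_read_length': 3.0}
```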

  22. Sequence Trimming/Masking/Filtering
     • Trimming
        • Barcode & adapter sequences
        • Poor quality sequence at the starts/ends of reads
     • Masking
        • Poor quality sequence in the middle of reads
     • Filtering
        • Sequence reads that are shorter than a pre-defined threshold
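Below is a bare-bones Python sketch of adapter trimming, quality trimming at the read ends, and length filtering. The thresholds (Q20, 50 bases) are illustrative assumptions. Matching the adapter exactly is a simplification; dedicated trimming tools also handle mismatches and partial adapters at the read end. All names are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (sequence, Phred scores)

def trim_adapter(seq: str, scores: List[int], adapter: str) -> Read:
    """Cut the read at the first exact occurrence of an adapter sequence."""
    pos = seq.find(adapter)
    return (seq, scores) if pos == -1 else (seq[:pos], scores[:pos])

def trim_ends(seq: str, scores: List[int], min_q: int = 20) -> Read:
    """Drop bases from both ends of the read while their quality is below min_q."""
    start, end = 0, len(seq)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], scores[start:end]

def passes_length_filter(seq: str, min_length: int = 50) -> bool:
    """Keep only reads at least min_length bases long."""
    return len(seq) >= min_length

read = ("NNACGTACGTN", [2, 2, 30, 30, 30, 30, 30, 30, 30, 30, 2])  # hypothetical
seq, quals = trim_ends(*read)
print(seq, passes_length_filter(seq, min_length=8))  # ACGTACGT True
```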

  23. Quality Trimming/Masking
          >Sequence1
          16 16 21 9 10 13 14 12 8 8 9 16 24 21 19 19 19 25 25 33 35 35 34 34 34 34 34 34 34 40 45 45 56 56 56 51 51 40 45 37 37 37 40 40 40 40 40 40 39 39 39 40 40 40 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 51 45 39 39 39 39 39 39 39 39 39 39 39 39 39 39 40 51 51 51 51 51 56 56 56 56 56 56 56 40 35 35 35 35 35 39 40 40 45 45 45 45 45 51 56 40 39 39 39 39 39 45 45 45 45 40 40 40 39 35 35 40 40 40 40 40 40 38 38 38 39 25 23 18 10 8 9 23 31 51 51 45 43 43 43 43 43 43 43 43 43 43 43 43 56 56 56 56 56 56 56 56 43 43 43 43 43 43 45 45 45 51 51 56 51 51 51 51 51 51 56 51 45 43 43 56 56 56 56 56 56 51 51 51 45 45 45 40 40 40 44 42 38 38 40 40 40 51 56 56 56 56 56 46 46 51 51 51 51 56 56 56 56 51 40 45 45 40 40 40 42 42 42 45 45 42 42 42 42 56 42 42 40 40 34 37 33 40 40 40 44 48 48 48 29 29 29 26 32 29 32 32 32 32 33 44 48 56 40 40 40 40 40 40 40 40 40 37 34 34 37 40 40 40 40 37 34 34 48 40 32 28 25 25 25 34 48 48 48 40 40 32 29 24 25 29 40 40 40 40 40 40 33 33 37 40 40 40 43 43 42 42 42 44 44 56 56 56 56 56 40 35 34 33 33 40 40 40 40 40 40 29 29 34 29 29 29 29 40 29 34 25 27 23 23 21 23 18 20 25 25 25 32 32 32 32 29 18 20 14 16 16 17 17 16 22 20 18 25 19 14 16 15 26 27

  24. Masked sequence
          >Sequence1
          CCAGAAACTACGCGGTGGCGGCCGCTCTAGAACTAGTGGATCCCCCGGGCTGCAGATCGTCCGCCAGACTAAAGAAGTCCAAGAGTTGGCTCGCCAAAACGCGCTAAAAACGCAAAAAGCGGCGACCAGTAGANNNNAGGCGAGGCAGGAAGAACAAGCCAACTTTTGGGGTTAACGACTATGTTTTCGTCAAGAAAAAAGGGTTTCCGACGACCGCACCGACGACCAGATTGGATTCACAGTGGACCGGACCATGGCAGATTCTAGAAGAACGAGGATATAGCTATGTTTTGGACGTACCTGAATCGTTTAAAGGAAAAAATTTGTTCCACGCAGACCGCCTCCGCAAAGCCGCAATGGACCCATTACCACAACAGAAAAGAGAGCCGCCTCCGCCAGAAGAGATCAACGCCAGAGTTTGTGGTCGATAAAGTTTTAGCGTCCCGATTATTTGGCCGGAGTAAGATATTGCAATACCAGGTCGCATGGCAAGGATGTGATCCAGACGACACGTGGTACCCGGCTGAAAACTTCAAGAATTCAGCGACAGCCCTTGACGACTTCCACAAGAAGTAC
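Masking keeps the read length but replaces unreliable bases with N, as in the masked sequence above. The sketch below uses an illustrative threshold of Q20. Its quality values are a stretch copied from the quality list in slide 23; the bases paired with them are made up. The function name is hypothetical.

```python
from typing import Sequence

def mask_low_quality(seq: str, scores: Sequence[int], min_q: int = 20,
                     mask_char: str = "N") -> str:
    """Replace every base whose quality is below min_q with mask_char."""
    if len(seq) != len(scores):
        raise ValueError("sequence and quality lengths differ")
    return "".join(b if q >= min_q else mask_char for b, q in zip(seq, scores))

print(mask_low_quality("AGACTCTAGG", [39, 25, 23, 18, 10, 8, 9, 23, 31, 51]))  # AGANNNNAGG
```

Unlike trimming, masking leaves base positions unchanged, which keeps the read aligned with its quality values.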

  25. Sequence header information (Illumina)
          @M01478:6:000000000-A40C5:1:1101:16859:1439 1:N:0:1
          ATCGTTTCGGAGCAAGGCAACTGTNTCAGGCACCATGAAGTTGAGCTATTCTACTGCGCCAACCTTTGCGAGATAAATCGTCNTGCCNTNNTTATCANCGTCAATTGGAANTCAGATGTGCCACCNNAAN
          +
          ABBBAABFBBBBGEGGFGGGGGHF#AAFF2AGFGHGHHHHHHFHFFDGFGHHHHGHEGGGGCGGGHFABEEGFFHHEGHEGE#BBFG#?##???FFH#??FEFGHHEHHG#??FFEDGGGFFHFH##??#
     • Machine name
     • Run number
     • Flowcell ID
     • Flowcell lane
     • Tile in flowcell
     • X-coordinate in tile
     • Y-coordinate in tile
     • Member of pair (1/2)
     • Read filtered? (Y/N)
     • Control bits on (0 or even number)
     • Index sequence (or sample number) used
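The header fields listed above are separated by colons, and a single space separates the read ID from the pair/filter/control/index part. That makes the header easy to split. The sketch below assumes the Illumina 1.8+ style layout shown in the example. The class and function names are hypothetical.

```python
from typing import NamedTuple

class IlluminaReadId(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read: int            # member of pair: 1 or 2
    is_filtered: bool    # True when the filter flag is 'Y'
    control_number: int
    index: str           # index sequence, or a sample number as in the example

def parse_illumina_header(header: str) -> IlluminaReadId:
    """Split an Illumina 1.8+ style FASTQ header into its named fields."""
    if header.startswith("@"):
        header = header[1:]
    read_id, comment = header.split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read, filtered, control, index = comment.split(":")
    return IlluminaReadId(instrument, int(run), flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), filtered == "Y",
                          int(control), index)

hdr = parse_illumina_header("@M01478:6:000000000-A40C5:1:1101:16859:1439 1:N:0:1")
print(hdr.flowcell_id, hdr.tile, hdr.x, hdr.y)  # 000000000-A40C5 1101 16859 1439
```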

  26. Today’s Exercises
     • Convert different file formats
     • Evaluate sequence data quality using FastQC
     • Trim sequence reads to improve data quality
     • Re-test trimmed data using FastQC

  27. Tips For a Productive Time
     • Practice using tab-completion
     • Make sure you execute all of the steps preceded by check boxes
     • Tick off/fill in the check boxes after you have (successfully) completed each command
     • Do not skip over the text between the check boxes
        • It provides information designed to aid your understanding of what you are doing
     • ASK QUESTIONS
