NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools

March 15th, 2012, BioSci room B9242. Facilitator: Richard Bruskiewich, Adjunct Professor, MBB. Learning objectives: Linux revisited; a quick dive into the Open-Bio pool (BioPython).

Presentation Transcript


  1. NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools. March 15th, 2012, BioSci room B9242. Facilitator: Richard Bruskiewich, Adjunct Professor, MBB

  2. Learning Objectives • Linux revisited • Quick dive into the Open-Bio pool (BioPython) • A first look at NGS data: • NCBI short read archive • Processing NGS: FASTX tool kit et al. • Visualization: IGV

  3. Files and Permissions • Linux permissions are assigned to three classes: owner, group, or others • The owner/user is (by default) the person who created the file and “OWNS” the file / directory • The group is a team of people associated with the file • GROUP project / team work • Others are all other users on the server • Each file / directory can have its permissions set to (r)ead, (w)rite, or e(x)ecute
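As a small illustration of the owner/group/others model: each class gets three permission bits, usually written as one octal digit where r = 4, w = 2 and x = 1. The Python sketch below is a hedged illustration only; the helper names rwx and permission_string are hypothetical, and the standard-library stat.filemode is used as a cross-check.

```python
import stat


def rwx(digit: int) -> str:
    """Render one octal digit (0-7) as an r/w/x triple: r = 4, w = 2, x = 1."""
    return ("r" if digit & 4 else "-") + ("w" if digit & 2 else "-") + ("x" if digit & 1 else "-")


def permission_string(mode: int) -> str:
    """Render the nine permission bits of mode as owner, group and others triples."""
    owner, group, others = (mode >> 6) & 7, (mode >> 3) & 7, mode & 7
    return rwx(owner) + rwx(group) + rwx(others)


if __name__ == "__main__":
    print(permission_string(0o754))              # rwxr-xr--
    print(stat.filemode(stat.S_IFREG | 0o754))   # -rwxr-xr--
```

So a mode of 754 means: the owner can read, write and execute; the group can read and execute; others can only read.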

  4. chmod: change file permissions • Do a long listing (ls -l), e.g. dr-x-wxrw- • This is separated into four sections: (d)(r-x)(-wx)(rw-) = directory or file (-), user (owner), group, others • Examples: chmod o+x foo.txt → grant ‘execute’ permission to ‘others’ on foo.txt; chmod g-rw foo.txt → remove ‘read’ and ‘write’ permission from ‘group’; chmod ugo+rwx foo.txt → grant all rights to everyone • To change the user/group (‘owner’) of a file, use chown rather than chmod (changing the owner usually requires sudo), e.g. sudo chown ubuntu:ubuntu foo.txt
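The symbolic chmod examples above can be read as bit operations: "+" sets bits, "-" clears them, and "=" replaces a class's bits. The following is a simplified Python sketch under stated assumptions, not chmod's actual implementation: apply_symbolic and BITS are hypothetical names, only the u/g/o/a classes and r/w/x permissions are handled, and an omitted class is treated as "a" (real chmod would also consult the umask and supports more letters such as X, s and t).

```python
import stat

# Permission bits from the standard-library stat module, keyed by (class, permission).
BITS = {
    ("u", "r"): stat.S_IRUSR, ("u", "w"): stat.S_IWUSR, ("u", "x"): stat.S_IXUSR,
    ("g", "r"): stat.S_IRGRP, ("g", "w"): stat.S_IWGRP, ("g", "x"): stat.S_IXGRP,
    ("o", "r"): stat.S_IROTH, ("o", "w"): stat.S_IWOTH, ("o", "x"): stat.S_IXOTH,
}


def apply_symbolic(mode: int, spec: str) -> int:
    """Apply a simple chmod-style spec such as 'o+x', 'g-rw' or 'ugo+rwx' to mode.

    Simplified: only u/g/o/a with r/w/x; an omitted class is treated as 'a'.
    Unknown letters raise KeyError. File-type bits in mode are left untouched.
    """
    ops = [c for c in spec if c in "+-="]
    if len(ops) != 1:
        raise ValueError(f"expected exactly one of + - = in {spec!r}")
    op = ops[0]
    who, perms = spec.split(op)
    who = (who or "a").replace("a", "ugo")

    # Bits requested by the spec, e.g. 'g-rw' -> group read + group write.
    mask = 0
    for w in who:
        for p in perms:
            mask |= BITS[(w, p)]

    if op == "+":
        return mode | mask
    if op == "-":
        return mode & ~mask
    # '=': clear every r/w/x bit of the listed classes, then set the requested ones.
    cleared = mode
    for w in who:
        for p in "rwx":
            cleared &= ~BITS[(w, p)]
    return cleared | mask


if __name__ == "__main__":
    print(stat.filemode(stat.S_IFDIR | 0o536))   # dr-x-wxrw- (the listing on the slide)
    print(oct(apply_symbolic(0o644, "o+x")))     # 0o645
    print(stat.filemode(stat.S_IFREG | apply_symbolic(0o644, "g-rw")))  # -rw----r--
```

The slide's example listing dr-x-wxrw- corresponds to octal mode 536 on a directory.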

  5. A few useful tips… • Hitting “tab” auto-completes file or program names (or suggests possible names) • The up arrow lets you return to previous commands • Editing text files: “nano” is an easier alternative to “emacs”, but less powerful. Alternatively, use an SSH client to transfer files to your Windows desktop, edit them in Windows, then transfer them back. BUT: make sure you use a text editor that knows the difference between a Windows and a Linux text file (e.g. Notepad++)
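The main difference between Windows and Linux text files is the line ending: Windows ends each line with a carriage return plus line feed (CR LF, "\r\n"), while Linux uses a line feed alone ("\n"). A stray carriage return can, for example, break the "#!" line of a shell script. A minimal Python sketch for checking and fixing a file after editing it on Windows (all function names are hypothetical):

```python
def has_windows_line_endings(data: bytes) -> bool:
    """True if the raw bytes contain any Windows-style CR LF line endings."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CR LF) line endings to Linux (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite a file in place with LF line endings; return True if it changed."""
    with open(path, "rb") as fh:   # binary mode, so Python does not translate newlines
        data = fh.read()
    fixed = to_unix_line_endings(data)
    if fixed != data:
        with open(path, "wb") as fh:
            fh.write(fixed)
        return True
    return False
```

Reading and writing in binary mode matters here: text mode would silently translate line endings and hide the problem.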

  6. Some more useful basic Linux commands • “cd” changes your directory, e.g. ‘cd /usr/local’ • “man” displays the manual for a command, e.g. ‘man ls’ • “pwd” tells you the directory you are currently in (= working directory) • “history” lists recent commands, numbered by line. By typing an exclamation point followed by the line number (e.g. !123), you can rerun that command

  7. Accessing remote servers • “ssh” – Secure Shell: ssh -i private_keypair user@host • “scp” – Secure CoPy: scp -i private_keypair [user@host:]sourcefile [user@host:]targetfile • Here user is the account (default: local user) and host is the internet name of the computer (default: local host)
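The [user@host:]path notation, with its defaults, can be read as a small parsing rule. The Python sketch below applies the defaults stated above; parse_scp_location and Location are hypothetical names, and the rule is simplified (real scp has extra cases this ignores, e.g. an argument with a "/" before its first ":" is treated as a local path).

```python
import getpass
from typing import NamedTuple, Optional


class Location(NamedTuple):
    user: str
    host: str
    path: str


def parse_scp_location(spec: str, default_user: Optional[str] = None) -> Location:
    """Split an scp-style '[user@host:]path' argument into its parts.

    Missing parts fall back to the local user and the local host.
    """
    user = default_user
    host = "localhost"
    path = spec
    if ":" in spec:
        remote, path = spec.split(":", 1)   # 'user@host' before the first colon
        if "@" in remote:
            user, host = remote.split("@", 1)
        else:
            host = remote
    if user is None:
        user = getpass.getuser()            # the local account name
    return Location(user, host, path)
```

For example, "ubuntu@server.example.org:/data/reads.fastq" (a hypothetical host) splits into user "ubuntu", host "server.example.org" and path "/data/reads.fastq", while a bare "reads.fastq" refers to the local user on the local host.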

  8. OpenBio Case Study: BioPython http://biopython.org/wiki/Biopython http://biopython.org/DIST/docs/tutorial/Tutorial.html

  9. NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools: First look at NGS data

  10. http://www.ncbi.nlm.nih.gov/sra/

  11. http://hannonlab.cshl.edu/fastx_toolkit/ Linux, MacOSX or Unix only

  12. Get the precompiled binary:
      wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
      bunzip2 fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
      tar -xvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar
      sudo mv bin/* /usr/local/bin
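The bunzip2 and tar steps can also be done in one pass (GNU tar's "tar -xjvf" does both). As a hedged sketch in Python, the standard-library tarfile module can open a bzip2-compressed tarball directly with mode "r:bz2". The helper name extract_tar_bz2 is hypothetical, and the "data" extraction filter is only passed on Python versions that provide it.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_tar_bz2(archive: str, dest: str = ".") -> List[str]:
    """Decompress and unpack a .tar.bz2 archive in one step; return the member names."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()
        # Newer Pythons offer extraction filters; "data" rejects absolute paths,
        # links pointing outside dest and similar surprises.
        options = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **options)
    return names


if __name__ == "__main__":
    # Hypothetical usage, mirroring the bunzip2 + tar steps above.
    for name in extract_tar_bz2("fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2"):
        print(name)
```

Moving the extracted bin/* files into /usr/local/bin still needs root rights, as with the sudo mv step.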

  13. FASTX tool kit I • FASTQ-to-FASTA converter • Converts FASTQ files to FASTA files • FASTQ Information • Charts quality statistics and nucleotide distribution • FASTQ/A Collapser • Collapses identical sequences in a FASTQ/A file into a single sequence (while keeping read counts) • FASTQ/A Trimmer • Shortens reads in FASTQ or FASTA files (removing barcodes or noise) • FASTQ/A Renamer • Renames the sequence identifiers in a FASTQ/A file • FASTQ/A Clipper • Removes sequencing adapters / linkers
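To make a few of these operations concrete, here is a minimal pure-Python sketch of FASTQ-to-FASTA conversion, collapsing and trimming. It is not the FASTX toolkit's own code: it assumes simple four-line FASTQ records, and the function names and the ">rank-count" header used for collapsed reads are illustrative choices, not a guaranteed match for the toolkit's output format.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str, str]  # (identifier, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from plain 4-line FASTQ text (no line-wrapped records)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        try:
            seq = next(it).rstrip("\r\n")
            next(it)  # '+' separator line (may repeat the identifier)
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record {header!r}") from None
        yield header[1:], seq, qual


def fastq_to_fasta(records: Iterable[Record]) -> str:
    """Drop the quality line and write each read as a '>identifier' FASTA entry."""
    return "".join(f">{ident}\n{seq}\n" for ident, seq, _ in records)


def collapse(records: Iterable[Record]) -> str:
    """Merge identical sequences, keeping a count of how many reads each had."""
    counts = Counter(seq for _, seq, _ in records)
    # Most abundant first; ties keep first-seen order. Header is '>rank-count'.
    return "".join(
        f">{rank}-{n}\n{seq}\n"
        for rank, (seq, n) in enumerate(counts.most_common(), start=1)
    )


def trim(records: Iterable[Record], first: int = 1,
         last: Optional[int] = None) -> Iterator[Record]:
    """Keep bases first..last (1-based, inclusive) of every read and its qualities."""
    for ident, seq, qual in records:
        yield ident, seq[first - 1:last], qual[first - 1:last]
```

Collapsing loses the per-read identifiers and qualities, which is why its output is FASTA rather than FASTQ.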

  14. FASTX tool kit II • FASTQ/A Reverse-Complement • Produces the reverse complement of each sequence in a FASTQ/FASTA file • FASTQ/A Barcode splitter • Splits a FASTQ/FASTA file containing multiple samples • FASTA Formatter • Changes the width of sequence lines in a FASTA file • FASTA Nucleotide Changer • Converts FASTA sequences from/to RNA/DNA • FASTQ Quality Filter • Filters sequences based on quality • FASTQ Quality Trimmer • Trims (cuts) sequences based on quality • FASTQ Masker • Masks nucleotides with 'N' (or another character) based on quality
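Several of these tools boil down to simple per-base rules. The sketch below shows reverse complement, DNA-to-RNA conversion, a percentage-based quality filter and quality masking in plain Python. Assumptions: quality characters use Phred offset 33 (Sanger / Illumina 1.8+; older Illumina data used offset 64), and the thresholds of quality 20 for 80% of bases are illustrative defaults, not the toolkit's; all function names are hypothetical.

```python
from typing import List

# Complement table for DNA (upper and lower case); N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dna_to_rna(seq: str) -> str:
    """Convert a DNA sequence to RNA (T -> U)."""
    return seq.replace("T", "U").replace("t", "u")


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores."""
    return [ord(c) - offset for c in quality]


def passes_quality_filter(quality: str, min_quality: int = 20,
                          min_percent: float = 80, offset: int = 33) -> bool:
    """Keep a read only if at least min_percent of its bases have quality >= min_quality."""
    scores = phred_scores(quality, offset)
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_quality)
    return 100 * good >= min_percent * len(scores)


def mask_low_quality(seq: str, quality: str, min_quality: int = 20,
                     offset: int = 33, mask: str = "N") -> str:
    """Replace every base whose quality is below min_quality with the mask character."""
    return "".join(base if q >= min_quality else mask
                   for base, q in zip(seq, phred_scores(quality, offset)))
```

With offset 33, the character "I" decodes to quality 40 and "#" to quality 2, so a read with quality string "IIII#" has 4 of 5 bases (80%) at quality 20 or above.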

  15. www.bioinformatics.bbsrc.ac.uk/projects/download.html http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

  16. Integrative Genomics Viewer http://www.broadinstitute.org/igv/
