1 / 40

Introduction to Perl

Introduction to Perl. Part II. Associative arrays or Hashes. Like arrays, but instead of numbers as indices can use strings @array = (‘john’, ‘steve’, ‘aaron’, ‘max’, ‘juan’, ‘sue’) %hash = ( ‘apple’ => 12, ‘pear’ => 3, ‘cherry’ =>30, ‘lemon’ => 2, ‘peach’ => 6, ‘kiwi’ => 3);. Array. Hash.

zia
Download Presentation

Introduction to Perl

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Perl Part II

  2. Associative arrays or Hashes • Like arrays, but instead of numbers as indices can use strings • @array = (‘john’, ‘steve’, ‘aaron’, ‘max’, ‘juan’, ‘sue’) • %hash = ( ‘apple’ => 12, ‘pear’ => 3, ‘cherry’ =>30, ‘lemon’ => 2, ‘peach’ => 6, ‘kiwi’ => 3); Array Hash apple 12 pear 3 0 1 2 3 4 5 cherry 30 lemon 2 ‘john’ ‘steve’ ‘aaron’ ‘max’ ‘juan’ ‘sue’ peach 6 kiwi 3

  3. Using hashes • { } operator • Set a value • $hash{‘cherry’} = 10; • Access a value • print $hash{‘cherry’}, “\n”; • Remove an entry • delete $hash{‘cherry’};

  4. Get the Keys • keys function will return a list of the hash keys • @keys = keys %fruit; • for my $key ( keys %fruit ) { print “$key => $hash{$key}\n”;} • Would be ‘apple’, ‘pear’, ... • Order of keys is NOT guaranteed!

  5. Get just the values • @values = values %hash; • for my $val ( @values ) { print “val is $val\n”;}

  6. Iterate through a set • Order is not guaranteed! while( my ($key,$value) = each %hash){ print “$key => $value\n”;}

  7. Subroutines • Set of code that can be reused • Can also be referred to as procedures and functions

  8. Defining a subroutine • sub routine_name { } • Calling the routine • routine_name; • &routine_name; (& is optional)

  9. Passing data to a subroutine • Pass in a list of data • &dosomething($var1,$var2); • sub dosomething { my ($v1,$v2) = @_;}sub do2 { my $v1 = shift @_; my $v2 = shift;}

  10. Returning data from a subroutine • The last line of the routine set the return valuesub dothis { my $c = 10 + 20;}print dothis(), “\n”; • Can also use return specify return value and/or leave routine early

  11. Write subroutine which returns true if codon is a stop codon (for standard genetic code).-1 on error, 1 on true, 0 on false sub is_stopcodon { my $val = shift @_; if( length($val) != 3 ) { return -1; } elsif( $val eq ‘TAA’ || $val eq ‘TAG’ || $val eq ‘TGA’ ) { return 1; } else { return 0; }

  12. Context • array versus scalar context • my $length = @lst;my ($first) = @lst; • Want array used to report context subroutines are called in • Can force scalar context with scalarmy $len = scalar @lst;

  13. subroutine context sub dostuff { if( wantarray ) { print “array/list context\n”; } else { print “scalar context\n”; } } dostuff(); # scalar my @a = dostuff(); # array my %h = dostuff(); # array my $s = dostuff(); # scalar

  14. Why do you care about context? sub dostuff { my @r = (10,20); return @r; } my @a = dostuff(); # array my %h = dostuff(); # array my $s = dostuff(); # scalar print “@a\n”; # 10 20 print join(“ “, keys %h),”\n”; # 10 print “$s\n”; # 2

  15. References • Are “pointers” the data object instead of object itsself • Allows us to have a shorthand to refer to something and pass it around • Must “dereference” something to get its actual value, the “reference” is just a location in memory

  16. Reference Operators • \ in front of variable to get its memory location • my $ptr = \@vals; • [ ] for arrays, { } for hashes • Can assign a pointer directly • my $ptr = [ (‘owlmonkey’, ‘lemur’)];my $hashptr = { ‘chrom’ => ‘III’, ‘start’ => 23};

  17. Dereferencing • Need to cast reference back to datatype • my @list = @$ptr; • my %hash = %$hashref; • Can also use ‘{ }’ to clarify • my @list = @{$ptr}; • my %hash = %{$hashref};

  18. Really they are not so hard... • my @list = (‘fugu’, ‘human’, ‘worm’, ‘fly’); • my $list_ref = \@list; • my $list_ref_copy = [@list]; • for my $item ( @$list_ref ) { print “$item\n”;}

  19. Why use references? • Simplify argument passing to subroutines • Allows updating of data without making multiple copies • What if we wanted to pass in 2 arrays to a subroutine? • sub func { my (@v1,@v2) = @_; } • How do we know when one stops and another starts?

  20. Why use references? • Passing in two arrays to intermix. • sub func { my ($v1,$v2) = @_; my @mixed; while( @$v1 || @$v2 ) { push @mixed, shift @$v1 if @$v1; push @mixed, shift @$v2 if @$v2; } return \@mixed;}

  21. References also allow Arrays of Arrays • my @lst;push @lst, [‘milk’, ‘butter’, ‘cheese’];push @lst, [‘wine’, ‘sherry’, ‘port’];push @lst, [‘bread’, ‘bagels’, ‘croissants’]; • my @matrix = [ [1, 0, 0], [0, 1, 0], [0, 0, 1] ];

  22. Hashes of arrays • $hash{‘dogs’} = [‘beagle’, ‘shepherd’, ‘lab’];$hash{‘cats’} = [‘calico’, ‘tabby’, ‘siamese’];$hash{‘fish’} = [‘gold’,’beta’,’tuna’]; • for my $key (keys %hash ) { print “$key => “, join(“\t”, @{$hash{$key}}), “\n”;}

  23. More matrix use my @matrix; open(IN, $file) || die $!; # read in the matrix while(<IN>) { push @matrix, [split]; } # data looks like # GENENAME EXPVALUE STATUS # sort by 2nd column for my $row ( sort { $a->[1] <=> $b->[1] } @matrix ) { print join(“\t”, @$row), “\n”;}

  24. Funny operators • my @bases = qw(C A G T) • my $msg = <<EOFThis is the message I wanted to tell you aboutEOF;

  25. Regular Expressions • Part of “amazing power” of Perl • Allow matching of patterns • Syntax can be tricky • Worth the effort to learn!

  26. A simple regexp • if( $fruit eq ‘apple’ || $fruit eq ‘Apple’ || $fruit eq ‘pear’) { print “got a fruit $fruit\n”;} • if( $fruit =~ /[Aa]pple|pear/ ){ print “matched fruit $fruit\n”;}

  27. Regular Expression syntax • use the =~ operator to match • if( $var =~ /pattern/ ) {} - scalar context • my ($a,$b) = ( $var =~ /(\S+)\s+(\S+)/ ); • if( $var !~ m// ) { } - true if pattern doesn’t • m/REGEXPHERE/ - match • s/REGEXP/REPLACE/ - substitute • tr/VALUES/NEWVALUES/ - translate

  28. m// operator (match) • Search a string for a pattern match • If no string is specified, will match $_ • Pattern can contain variables which will be interpolated (and pattern recompiled)while( <DATA> ) { if( /A$num/ ) { $num++ }}while( <DATA> ) { if( /A$num/o ) { $num++ }}

  29. Pattern extras • m// -if specify m, can replace / with anything e.g. m##, m[], m!! • /i - case insensitive • /g - global match (more than one) • /x - extended regexps (allows comments and whitespace) • /o - compile regexp once

  30. Shortcuts • \s - whitespace (tab,space,newline, etc) • \S - NOT whitespace • \d - numerics ([0-9]) • \D - NOT numerics • \t, \n - tab, newline • . - anything

  31. Regexp Operators • + - 1 -> many (match 1,2,3,4,... instances )/a+/ will match ‘a’, ‘aa’, ‘aaaaa’ • * - 0 -> many • ? - 0 or 1 • {N}, {M,N} - match exactly N, or M to N • [], [^] - anything in the brackets, anything but what is in the brackets

  32. Saving what you matched • Things in parentheses can be retrieved via variables $1, $2, $3, etc for 1st,2nd,3rd matches • if( /(\S+)\s+([\d\.\+\-]+)/) { print “$1 --> $2\n”;} • my ($name,$score) = ($var =~ /(\S+)\s+([\d\.\+\-]+)/);

  33. Simple Regexp my $line = “aardvark”;if( $line =~ /aa/ ) { print “has double a\n” }if( $line =~ /(a{2})/ ) { print “has double a\n” }if( $line =~ /(a+)/ ) { print “has 1 or more a\n” }

  34. Matching gene names # File contains lots of gene names # YFL001C YAR102W - yeast ORF names# let-1, unc-7 - worm names http://biosci.umn.edu/CGC/Nomenclature/nomenguid.htm # ENSG000000101 - human Ensembl gene names while(<IN>) { if( /^(Y([A-P])(R|L)(\d{3})(W|C)(\-\w)?)/ ) { printf “yeast gene %s, chrom %d,%s arm, %d %s strand\n”, $1, (ord($2)-ord(‘A’))+1, $3, $4; } elsif( /^(ENSG\d+)/ ) { print “human gene $1\n” } elsif( /^(\w{3,4}\-\d+)/ ) { print “worm gene $1\n”; } }

  35. Putting it all together • A parser for output from a gene prediction program

  36. GlimmerM (Version 3.0) Sequence name: BAC1Contig11 Sequence length: 31797 bp Predicted genes/exons Gene Exon Strand Exon Exon Range Exon # # Type Length 1 1 + Initial 13907 13985 79 1 2 + Internal 14117 14594 478 1 3 + Internal 14635 14665 31 1 4 + Internal 14746 15463 718 1 5 + Terminal 15497 15606 110 2 1 + Initial 20662 21143 482 2 2 + Internal 21190 21618 429 2 3 + Terminal 21624 21990 367 3 1 - Single 25351 25485 135 4 1 + Initial 27744 27804 61 4 2 + Internal 27858 27952 95 4 3 + Internal 28091 28576 486 4 4 + Internal 28636 28647 12 4 5 + Internal 28746 28792 47 4 6 + Terminal 28852 28954 103 5 3 - Terminal 29953 30037 85 5 2 - Internal 30152 30235 84 5 1 - Initial 30302 30318 17

  37. Putting it together • while(<>) { if(/^(Glimmer\S*)\s+\((.+)\)/ { $method = $1; $version = $2; } elsif( /^(Predicted genes)|(Gene)|(\s+\#)/ || /^\s+$/ ) { next } elsif( # glimmer 3.0 output /^\s+(\d+)\s+ # gene num (\d+)\s+ # exon num ([\+\-])\s+ # strand (\S+)\s+ # exon type (\d+)\s+(\d+) # exon start, end \s+(\d+) # exon length /ox ) {my ($genenum,$exonnum,$strand,$type,$start,$end, $len) = ( $1,$2,$3,$4,$5,$6,$7); }}

  38. s/// operator (substitute) • Same as m// but will allow you to substitute whatever is matched in first section with value in the second section • $sport =~ s/soccer/football/ • $addto =~ s/(Gene)/$1-$genenum/;

  39. The tr/// operator (translate) • Match and replace what is in the first section, in order, with what is in the second. • lowercase - tr/[A-Z]/[a-z]/ • shift cipher - tr/[A-Z]/[B-ZA]/ • revcom - $dna =~ tr/[ACGT]/[TGCA]/; $dna = reverse($dna);

  40. (aside) DNA ambiguity chars • aMino - {A,C}, Keto - {G,T} • puRines - {A,G}, prYmidines - {C,T} • Strong - {G,C}, Weak - {A,T} • H (Not G)- {ACT}, B (Not A), V (Not T), D(Not C) • $str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/;

More Related