
Roadmap


brad

Presentation Transcript


  1. Roadmap The topics: • basic concepts of molecular biology • Elements of Python • Python’s data types • Python functions • Python control of flow • Python OOP • Python regex • overview of the field • biological databases and database searching • sequence alignments • phylogenetics • structure prediction • microarray & next gen

  2. Where to get Python? If you want to run your Python programs on your own machine, download a Python interpreter from one of these sites: http://www.activestate.com/activepython/ or http://www.python.org/download/ You can find the links at the class website.

  3. Running Python Python is one of the best scripting languages. It is often used from a text-based command shell.

  4. Download IEP as your IDE There are many good Python IDEs. I have the links to some of them at the class website. I’ll use IEP in class: Download IEP at: http://www.iep-project.org/downloads.html I downloaded : “iep-3.2.win32.exe - Windows installer”

  5. Use IEP • You’ll see an icon like the following on your desktop. Start your IEP. Ctrl S: save Ctrl E: execution Drawback: Won’t be able to pass arguments to script

  6. Programming Python for Bioinformatics, Part I

  7. A Taste of Python: at prompt • Type statements or expressions at prompt: >>> print "Hello, world" Hello, world >>> x = 12**2 >>> x/2 72 >>> # this is a comment

  8. A Taste of Python: print a message • demo1.py: Greet the entire world.
      #!/usr/bin/python
      # greet the entire world
      x = 7e9
      print "Hello world!"
      print "All", x, "of you!"
    • Line 1: command interpretation header (where to find python) • Line 2: a comment • Line 3: variable assignment statement • Lines 4-5: output statements (print)

  9. A Taste of Python: scripting • demo2.py: parsing email addresses

  10. Overview • Assignment & Names • Data types • Sequences types: Lists, Tuples, and Strings • Mutability • Understanding Reference Semantics in Python

  11. Assignment • Assignment uses = and comparison uses == • The first assignment to a variable creates it • Dynamic typing: no declarations, names don't have types, objects do • For numbers, + - * / % are as expected • Use + for string concatenation • Use % for string formatting (like printf in C) • Logical operators are words (and, or, not), not symbols • The basic printing command is print • Indentation matters to the meaning of the code • Block structure is indicated by indentation

  12. Assignment • Binding a variable in Python means setting a name to hold a reference to some object • Assignment creates references, not copies • Names in Python don't have an intrinsic type; objects have types. Python determines the type of the reference automatically based on what data is assigned to it • You create a name the first time it appears on the left side of an assignment statement: x = 3 • An object is deleted via garbage collection once no names remain bound to it, for example after those names have passed out of scope • Python uses reference semantics
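To make "assignment creates references, not copies" concrete, here is a minimal sketch. The names a and b are made up for illustration, and the code is written so that it also runs under Python 3.

```python
a = [1, 2, 3]          # the name a is bound to a new list object
b = a                  # b is bound to the same object; nothing is copied
b.append(4)            # a change made through b is visible through a

same_object = a is b   # True: both names refer to one object

b = "ACGT"             # rebinding b does not touch the list or the name a
print(a)               # [1, 2, 3, 4]
```

The `is` operator compares identity (same object), while `==` compares values.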

  13. Naming Rules • Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores. bob Bob _bob _2_bob_ bob_2 BoB • There are some reserved words: and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while

  14. Naming conventions The Python community has these recommended naming conventions • joined_lower for functions, methods, and attributes • joined_lower or ALL_CAPS for constants • StudlyCaps for classes • camelCase only to conform to pre-existing conventions • Attributes: interface, _internal, __private

  15. Accessing Non-Existent Name Accessing a name before it's been properly created (by placing it on the left side of an assignment) raises an error >>> y Traceback (most recent call last): File "<pyshell#16>", line 1, in -toplevel- y NameError: name 'y' is not defined >>> y = 3 >>> y 3

  16. Whitespace Whitespace is meaningful in Python, especially indentation and placement of newlines • Use a newline to end a line of code. Use \ when a statement must continue on the next line • No braces { } to mark blocks of code, use consistent indentation instead • The first line with less indentation is outside of the block • The first line with more indentation starts a nested block • A colon starts a new block in many constructs, e.g. function definitions and if/else clauses
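A small sketch of these whitespace rules follows. The function name classify and the variable total are invented for this example.

```python
def classify(n):
    # The colon opens a block; the indented lines below belong to it.
    if n % 2 == 0:
        kind = "even"      # more indentation: a nested block
    else:
        kind = "odd"
    return kind            # less indentation: back in the function body


# A backslash lets one statement continue on the next line.
total = 1 + 2 + 3 + \
        4 + 5
```

Calling classify(4) should give "even", and total ends up as 15.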

  17. Comments • Start comments with #; the rest of the line is ignored • Can include a "documentation string" as the first line of a new function or class you define • Development environments, debuggers, and other tools use it: it's good style to include one
      def fact(n):
          """fact(n) assumes n is a non-negative integer and returns factorial of n."""
          assert(n>=0)
          return 1 if n==0 else n*fact(n-1)

  18. Python’s data types

  19. Everything is an object • Python data is represented by objects or by relations between objects • Every object has an identity, a type and a value • Identity never changes once the object is created; think of it as the object's location or address in memory • Type (e.g., integer, list) is unchangeable and determines the possible values the object could have and the operations that can be applied • Value of some objects is fixed (e.g., an integer) and can change for others (e.g., a list)
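The built-in functions id() and type() expose an object's identity and type. The sketch below is only an illustration with made-up variable names; it shows a list changing its value while its identity and type stay the same.

```python
seq = ["A", "C"]
identity_before = id(seq)   # identity: fixed for the object's lifetime
kind = type(seq)            # type: list, and it cannot change

seq.append("G")             # the value of a mutable object can change...
identity_after = id(seq)    # ...while its identity stays the same

n = 5
n = n + 1                   # integers are immutable: n is rebound to another object
```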

  20. Python’s built-in type hierarchy

  21. Basic Datatypes • Integers (default for numbers) z = 5 / 2 # Answer 2, integer division • Floats x = 3.456 • Strings • Can use "…" or '…' to specify, "foo" == 'foo' • Unmatched quotes can occur within the string: "John's" or 'John said "foo!".' • Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them: """a'b"c"""

  22. Numbers • Normal Integers – represent whole numbers Ex: 3, -7, 123, 76 • Long Integers – unlimited size Ex: 9999999999999999999999L • Floating-point – represent numbers with decimal places Ex: 1.2, 3.14159, 3.14e-10 • Octal and hexadecimal numbers Ex: 0177, 0x9ff, 0xff • Complex numbers Ex: 3+4j, 3.0+4.0j, 3J

  23. Python Basics – arithmetic operations • Examples assume y=5; z=3 • + add: x = y + z gives x = 8 • - subtract: x = y - z gives x = 2 • * multiply: x = y * z gives x = 15 • / divide: x = y / z gives x = 1 (integer division) • % modulus/remainder: x = y % z gives x = 2

  24. Python Basics – arithmetic operations • Examples assume y=5; z=3 • << shift left: x = y << 1 gives x = 10 (y is still 5) • >> shift right: x = y >> 2 gives x = 1 • | bitwise or: x = y | z gives x = 7 • ^ bitwise exclusive or: x = y ^ z gives x = 6 • & bitwise and: x = y & z gives x = 1 • ** raise to power: x = y ** z gives x = 125
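The operator examples from the last two slides can be checked with a short script. This is a sketch, not part of the original demos: it uses // for division, because under Python 3 the / operator returns a float (5 / 3 is about 1.67) rather than the integer result shown on the slide.

```python
y, z = 5, 3

# (expression text, value) pairs; the comments give the value from the slides
results = [
    ("y + z", y + z),     # 8
    ("y - z", y - z),     # 2
    ("y * z", y * z),     # 15
    ("y // z", y // z),   # 1 (integer division)
    ("y % z", y % z),     # 2
    ("y << 1", y << 1),   # 10
    ("y >> 2", y >> 2),   # 1
    ("y | z", y | z),     # 7
    ("y ^ z", y ^ z),     # 6
    ("y & z", y & z),     # 1
    ("y ** z", y ** z),   # 125
]

for expr, value in results:
    print("%-7s = %d" % (expr, value))
```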

  25. Python Basics – Relational and Logical Operators • Relational operators: == equal; !=, <> not equal; > greater than; >= greater than or equal; < less than; <= less than or equal • Logical operators: and (and), or (or), not (not)

  26. Python Basics – Relational Operators • Assume x = 1, y = 4, z = 14

  27. Python Basics – Logical Operators • Assume x = 1, y = 4, z = 14
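The comparison tables for these two slides did not survive in the transcript. As a rough stand-in, the sketch below evaluates a few relational and logical expressions with the stated values. The particular expressions are my own choice, not necessarily the ones on the slides, and the Python 2-only <> operator is left out so that the code also runs under Python 3.

```python
x, y, z = 1, 4, 14

checks = {
    "x < y": x < y,                       # True
    "y >= z": y >= z,                     # False
    "x != y": x != y,                     # True
    "x < y < z": x < y < z,               # True: comparisons can be chained
    "x < y and y > z": x < y and y > z,   # False: the second test fails
    "x > y or y < z": x > y or y < z,     # True: the second test succeeds
    "not x == y": not x == y,             # True
}

for expr in sorted(checks):
    print("%-16s -> %s" % (expr, checks[expr]))
```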

  28. Three sequence types: Tuples, Lists, and Strings

  29. Sequence Types • Sequences are containers that hold objects • Finite, ordered, indexed by integers • Tuple: (1, "a", [100], "foo") • An immutable ordered sequence of items • Items can be of mixed types, including collection types • String: "foo bar" • An immutable ordered sequence of chars • Conceptually very much like a tuple • List: ["one", "two", 3] • A mutable ordered sequence of items of mixed types

  30. Similar Syntax • All three sequence types (tuples, strings, and lists) share much of the same syntax and functionality. • Key difference: • Tuples and strings are immutable • Lists are mutable • The operations shown in this section can be applied to all sequence types • most examples will just show the operation performed on one

  31. Sequence Types - 1 • Define tuples using parentheses and commas >>> tu = (23, 'abc', 4.56, (2,3), 'def') • Define lists using square brackets and commas >>> li = ["abc", 34, 4.34, 23] • Define strings using quotes (", ', or """). >>> st = "Hello World" >>> st = 'Hello World' >>> st = """This is a multi-line string that uses triple quotes. """

  32. Sequence Types - 2 • Access individual members of a tuple, list, or string using square bracket "array" notation • Note that all are 0 based… >>> tu = (23, 'abc', 4.56, (2,3), 'def') >>> tu[1] # Second item in the tuple. 'abc' >>> li = ["abc", 34, 4.34, 23] >>> li[1] # Second item in the list. 34 >>> st = "Hello World" >>> st[1] # Second character in string. 'e'

  33. Positive and negative indices >>> t = (23, 'abc', 4.56, (2,3), 'def') Positive index: count from the left, starting with 0 >>> t[1] 'abc' Negative index: count from the right, starting with -1 >>> t[-3] 4.56

  34. Slicing: Return Copy of a Subset >>> t = (23, 'abc', 4.56, (2,3), 'def') Returns copy of container with subset of original members. Start copying at first index, and stop copying before the second index >>> t[1:4] ('abc', 4.56, (2,3)) You can also use negative indices (counting backward) >>> t[1:-1] ('abc', 4.56, (2,3))

  35. Slicing: Return Copy of a Subset >>> t = (23, 'abc', 4.56, (2,3), 'def') Omit the first index to make a copy starting from the beginning of the container >>> t[:2] (23, 'abc') Omit the second index to make a copy starting at the first index and going to the end of the container >>> t[2:] (4.56, (2,3), 'def')
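Indexing and slicing work the same way on strings, which is handy for sequence data. The sketch below repeats the tuple examples and then applies the same syntax to a short DNA string. The string and the variable names are made up for illustration.

```python
t = (23, 'abc', 4.56, (2, 3), 'def')

first = t[0]        # 23
last = t[-1]        # 'def'
middle = t[1:-1]    # ('abc', 4.56, (2, 3))
head = t[:2]        # (23, 'abc')
tail = t[2:]        # (4.56, (2, 3), 'def')

dna = "ATGACGGATCAG"
start_codon = dna[:3]    # 'ATG'
last_codon = dna[-3:]    # 'CAG'
```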

  36. Copying the Whole Sequence • [ : ] makes a copy of an entire sequence >>> t[:] (23, 'abc', 4.56, (2,3), 'def') • Note the difference between these two lines for mutable sequences >>> l2 = l1 # Both names refer to the same object, changing one affects both >>> l2 = l1[:] # Independent copies, two objects
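A quick way to see the difference between the two lines is to change the original list afterwards. In this sketch the names alias and clone are invented for the example.

```python
l1 = ['A', 'C', 'G']
alias = l1       # a second name for the same list object
clone = l1[:]    # a new list holding the same items

l1.append('T')   # modify the original list in place

print(alias)     # ['A', 'C', 'G', 'T']: the alias sees the change
print(clone)     # ['A', 'C', 'G']: the copy does not
```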

  37. The 'in' Operator • Boolean test whether a value is inside a container: >>> t = [1, 2, 4, 5] >>> 3 in t False >>> 4 in t True >>> 4 not in t False • For strings, tests for substrings >>> 'TATA' in 'TATATATATATATATATATATATA' True >>> 'ATG' in 'TATATATATATATATATATATATA' False >>> 'AA' not in 'TATATATATATATATATATATATA' True • Careful: the in keyword is also used in the syntax of for loops and list comprehensions

  38. + Operator is Concatenation • The + operator produces a new tuple, list, or string whose value is the concatenation of its arguments. >>> (1, 2, 3) + (4, 5, 6) (1, 2, 3, 4, 5, 6) >>> [1, 2, 3] + [4, 5, 6] [1, 2, 3, 4, 5, 6] >>> 'ACCTGAGAGCT' + 8*'A' 'ACCTGAGAGCTAAAAAAAA'

  39. Built-in functions vs. methods len() is a function on collections that returns the number of things they contain >>> len(['a', 'b', 'c']) 3 >>> len(('a','b','c')) 3 >>> len("abc") 3 index() is a method on collections that returns the index of the 1st occurrence of its arg >>> ['a','b','c'].index('a') 0 >>> ('a','b','c').index('b') 1 >>> "abc".index('c') 2 • Operations can be functions or methods • Remember that (almost) everything is an object • You just have to learn (and remember or look up) which operations are functions and which are methods

  40. Other String operations • Many useful built-in methods and functions >>> mystring = 'ACCTGAGAGCT' >>> mystring.upper() 'ACCTGAGAGCT' >>> mystring.replace('GC', 'CG') 'ACCTGAGACGT' >>> set(mystring) set(['A', 'C', 'T', 'G'])

  41. Strings • "%" operator: a sort of "fill in the blanks" operation: mystring = "%s has %d marbles" % ("John", 35) • The string on the left holds the "blanks"; the tuple on the right holds the values to put in the blanks • mystring -> "John has 35 marbles" • %s replace with string • %d, %i replace with integer • %f replace with float
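Here is the marbles example as runnable code, plus one extra line showing a float with a fixed number of decimals. The %.2f precision form and the number 0.4848 are additions of mine, not from the slide.

```python
name, count = "John", 35
mystring = "%s has %d marbles" % (name, count)   # 'John has 35 marbles'

# %.2f keeps two digits after the decimal point
report = "GC content: %.2f" % 0.4848             # 'GC content: 0.48'

print(mystring)
print(report)
```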

  42. Exercise • count.py: dna ="ATGaCGgaTCAGCCGcAAtACataCACTgttca" GC content? • dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca" • dna1 = dna.upper() • (dna1.count('G') + dna1.count('C'))*1.0 / len(dna1)
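One possible way to package this calculation is as a small function. This is a sketch rather than the official solution to the exercise: the name gc_content is made up, and the empty-sequence check is an extra safeguard I added.

```python
def gc_content(dna):
    """Return the fraction of G and C bases in dna, ignoring case."""
    seq = dna.upper()
    if not seq:
        return 0.0   # avoid dividing by zero for an empty sequence
    # float() keeps the division from truncating under Python 2
    return (seq.count('G') + seq.count('C')) / float(len(seq))


dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"
gc = gc_content(dna)   # 15 of the 33 bases are G or C
print("GC content: %.3f" % gc)
```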

  43. Exercise • transcribe.py: dna = "ATGaCGgaTCAGCCGcAAGcGGaattGGCGACataa" rna = ??? • dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca" • rna = dna.upper() • rna1 = rna.replace('A', 'a') • rna = rna1.replace('T', 'A') • rna1 = rna.replace('C', 'c') • rna = rna1.replace('G', 'C') • rna1 = rna.replace('a', 'U') • rna = rna1.replace('c', 'G') • rna[::-1] # reverse rna
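The chain of replace() calls swaps each base for its RNA complement (using lowercase letters as temporary placeholders) and then reverses the result. An alternative sketch of the same idea uses a dictionary lookup. The names COMPLEMENT_RNA and reverse_complement_rna are invented here, and the mapping is my reading of what the replace() chain does.

```python
# DNA base -> complementary RNA base (assumed from the replace() chain)
COMPLEMENT_RNA = {'A': 'U', 'T': 'A', 'C': 'G', 'G': 'C'}


def reverse_complement_rna(dna):
    """Complement every base into RNA letters and reverse the sequence."""
    # Note: a character outside A, C, G, T raises KeyError.
    return ''.join(COMPLEMENT_RNA[base] for base in reversed(dna.upper()))


dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"
rna = reverse_complement_rna(dna)
print(rna)
```

A dictionary avoids the placeholder trick, because each base is looked up exactly once.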

  44. Exercise A valid DNA sequence? dna ="ATGaCGgaTDCUAGCCPGcAAtACataCACTngttca"
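One way to approach this exercise is with sets. The sketch below assumes that "valid" means only the four unambiguous bases A, C, G and T; the names VALID_BASES and invalid_bases are made up for the example.

```python
VALID_BASES = set("ACGT")   # assumption: ambiguity codes such as N are not allowed


def invalid_bases(dna):
    """Return a sorted list of the characters in dna that are not A, C, G or T."""
    return sorted(set(dna.upper()) - VALID_BASES)


dna = "ATGaCGgaTDCUAGCCPGcAAtACataCACTngttca"
bad = invalid_bases(dna)   # ['D', 'N', 'P', 'U']
is_valid = not bad         # False: an empty list would mean a valid sequence
print(bad)
```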

  45. Mutability: Tuples vs. Lists

  46. Lists are mutable >>> li = ['abc', 23, 4.34, 23] >>> li[1] = 45 >>> li ['abc', 45, 4.34, 23] • We can change lists in place. • Name li still points to the same memory reference when we're done.

  47. Tuples are immutable >>> t = (23, 'abc', 4.56, (2,3), 'def') >>> t[2] = 3.14 Traceback (most recent call last): ... TypeError: 'tuple' object does not support item assignment • You can't change a tuple. • You can make a fresh tuple and assign its reference to a previously used name. >>> t = (23, 'abc', 3.14, (2,3), 'def') • The immutability of tuples means they are faster than lists
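The failed assignment can be caught like any other exception, and a fresh tuple can be built from slices of the old one. This is a small sketch with an invented variable name, message.

```python
t = (23, 'abc', 4.56, (2, 3), 'def')

try:
    t[2] = 3.14                  # tuples do not support item assignment
except TypeError as err:
    message = str(err)

# Build a new tuple from slices and rebind the name t to it.
t = t[:2] + (3.14,) + t[3:]
print(t)                         # (23, 'abc', 3.14, (2, 3), 'def')
```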

  48. Tuple details • The comma is the tuple creation operator, not parens >>> 1, (1,) • Python shows parens for clarity (best practice) >>> (1,) (1,) • Don't forget the comma for singletons! >>> (1) 1 • Empty tuples have a special syntactic form >>> () () >>> tuple() ()

  49. Tuples vs. Lists • Lists are slower but more powerful than tuples • Lists can be modified and they have many handy operations and methods • Tuples are immutable & have fewer features • Sometimes an immutable collection is required (e.g., as a hash key) • Tuples are used for multiple return values and parallel assignments x,y,z = 100,200,300 old,new = new,old • Convert tuples and lists using list() and tuple(): mylst = list(mytup); mytup = tuple(mylst)
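The sketch below illustrates three of these points: a tuple as a dictionary key, parallel assignment, and multiple return values. All names and data (features, min_max, the chromosome positions) are hypothetical.

```python
# A tuple can be a dictionary (hash) key; a list cannot.
features = {("chr1", 1500): "geneA", ("chr2", 42): "geneB"}
hit = features[("chr1", 1500)]        # 'geneA'

list_key_rejected = False
try:
    features[["chr1", 1500]] = "oops"
except TypeError:                     # lists are mutable, hence unhashable
    list_key_rejected = True

# Parallel assignment swaps two names without a temporary variable.
old, new = "ATG", "AUG"
old, new = new, old


def min_max(values):
    """Return two results at once, packed into a tuple."""
    return min(values), max(values)


lowest, highest = min_max([3, 1, 4])  # the tuple is unpacked into two names
```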

  50. List methods • Lists have many methods, including index, count, append, remove, reverse, sort, etc. • Many of these modify the list >>> l = [1,3,4] >>> l.append(0) # adds a new element to the end of the list >>> l [1, 3, 4, 0] >>> l.insert(1,200) # insert 200 just before index position 1 >>> l [1, 200, 3, 4, 0] >>> l.reverse() # reverse the list in place >>> l [0, 4, 3, 200, 1] >>> l.sort() # sort the elements; optional arguments can give the sorting function and direction >>> l [0, 1, 3, 4, 200] >>> l.remove(3) # remove first occurrence of element from list >>> l [0, 1, 4, 200]
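As a sketch of the optional sort arguments, the example below uses the key and reverse keywords, which work in both Python 2 and Python 3 (the comparison-function form is Python 2 only). The list codons is made up for illustration.

```python
l = [1, 3, 4]
l.append(0)
l.insert(1, 200)
l.reverse()
returned = l.sort()    # in-place methods change the list and return None
l.remove(3)
print(l)               # [0, 1, 4, 200]

codons = ["atg", "TAA", "Cag"]
# key: compare items by their lowercase form; reverse: descending order
codons.sort(key=str.lower, reverse=True)
print(codons)          # ['TAA', 'Cag', 'atg']
```

Because these methods return None, writing l = l.sort() would lose the list.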
