SEQUENCE RETRIEVAL SYSTEM (SRS)
Tuomas Hätinen
Motivation: sequencing information, genetics, structural biology, molecular biology, medicine, physiology, toxicology, gene expression
Motivation
• There are 3 main sequence retrieval systems:
  • SRS (highly recommended)
  • Entrez (easier to use but more limited)
  • DBGet (less recommended)
• This is a workshop on using SRS
• Start one of the servers below:
  • http://srs.ebi.ac.uk
  • http://csc-fserve.hh.med.ic.ac.uk/srs71
  • http://walnut.bioc.columbia.edu/srs7/
  • http://emb2.bcc.univie.ac.at:8080/srs/
  • http://oryx.ulb.ac.be:8080/srs
• Full list of SRS servers available from: http://downloads.lionbio.co.uk/publicsrs.html
What is SRS?: Introduction
• Central resource for molecular biology data
• Data retrieval system
  - more than 250 databanks have been indexed
  - more than 35 SRS servers available over the WWW
• Data analysis applications server
  - 11 protein applications
  - 6 nucleic acid applications
• Uniform query interface on the web
What is SRS?: History
• 1990
  • Main author Dr. Thure Etzold
  • Development started at EMBL, Heidelberg
• 1997
  • Moved to EBI in Cambridge. Development work was supported by various grants, among others from EMBnet.
• 1998
  • Etzold and his group joined LION Bioscience
Why SRS?
• Information retrieval
  • Easy way to retrieve information from sequence and sequence-related databases
  • Possibility to search for multiple words/other criteria
• Linkage between different databases
  • E.g. find all primary structures with a known three-dimensional structure
• ... and much more
Comments
• SRS is both a simple and a complicated tool with a number of features.
• It can take a few days to get accustomed to.
• We will run through some important features during the lecture.
• We will apply these features, as well as some new ones, in the practical session.
What can you do in SRS that you can’t do in UniProt?
• Sophisticated searches, e.g. wildcard searches, regexp searches
• SRS consolidates multiple databases
• Many tools are available in SRS
• Saving of projects
• Why bother with UniProt? Speed.
Temporary Projects
• Queries and views are stored temporarily by the project manager
• Temporary sessions last 24 hours
• Useful when you:
  • do not need to keep your results
  • want to look something up quickly
  • run an occasional application
• Click on the ‘Start’ paw on the SRS start page
Some examples
• /^glu/ will find terms beginning with ‘glu’
• /ase$/ will find terms ending with ‘ase’
• /c.t/ will find the words cat, cot, cut, ...
• /c.*t/ will find terms beginning with ‘c’, then any number of characters, and ending with ‘t’
• /sm[iy]th/ will find the words ‘smith’ or ‘smyth’
• /rho[1-9]/ will find the word ‘rho’ followed by a digit from 1 to 9
• /mue?ller/ will find ‘muller’ or ‘mueller’
NB. The ‘*’ symbol has two meanings:
  - within forward slashes ‘/’ it means the preceding group may be repeated zero or more times
  - outside forward slashes it is a wildcard that stands for one or more characters of any value
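The example patterns above can be tried out with Python's `re` module, which uses the same notation for these operators. This is only a sketch: the function name `regex_search` and the term list are hypothetical, and SRS's own matching engine may differ in details (Python's `re.search` matches anywhere inside a term unless `^` or `$` anchor it).

```python
import re

def regex_search(pattern, terms):
    """Return the terms that match the pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in terms if rx.search(t)]

# Hypothetical index terms, chosen to exercise each example pattern
terms = ["glucose", "kinase", "cat", "cot", "cut",
         "smith", "smyth", "rho", "rho1", "muller", "mueller"]

# The patterns are written without the surrounding SRS slashes
for pattern in ["^glu", "ase$", "c.t", "c.*t",
                "sm[iy]th", "rho[1-9]", "mue?ller"]:
    print(f"/{pattern}/ ->", regex_search(pattern, terms))
```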
SRS Query syntax
• SRS indexes database records using a ‘word by word’ approach.
  - Example description line: DE Human glutathione transferase
• The SRS description index will contain the terms ‘human’, ‘glutathione’ and ‘transferase’.
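The ‘word by word’ idea can be pictured as a simple inverted index: each lower-cased word points to the records whose description contains it. The sketch below is an illustration only, not how SRS is implemented; the record IDs, the second description and the function name `build_index` are hypothetical.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each lower-cased word to the set of record IDs containing it."""
    index = defaultdict(set)
    for rec_id, description in records.items():
        # Split the description into words, one index term per word
        for word in re.findall(r"[A-Za-z0-9]+", description.lower()):
            index[word].add(rec_id)
    return index

# Hypothetical records; REC1 carries the description from the slide
records = {
    "REC1": "Human glutathione transferase",
    "REC2": "Mouse glutathione peroxidase",
}

index = build_index(records)
print(sorted(index))                 # all index terms
print(sorted(index["glutathione"]))  # records containing 'glutathione'
```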
Boolean operators
• (&) AND: ‘human & glutathione & transferase’
• (|) OR: ‘human | glutathione | transferase’
• (!) BUTNOT: ‘human ! glutathione ! transferase’
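If each search word is thought of as the set of records it hits, the three operators behave like set intersection, union and difference. The sketch below uses made-up record IDs and assumes that a chain of `!` is read left to right; it illustrates the logic, not SRS itself.

```python
# Hypothetical hit sets: record IDs whose description contains each word
human = {"REC1", "REC3"}
glutathione = {"REC1", "REC2"}
transferase = {"REC1", "REC2", "REC4"}

# AND: records containing all three words
and_hits = human & glutathione & transferase

# OR: records containing at least one of the words
or_hits = human | glutathione | transferase

# BUTNOT (assumed left to right): 'human' records without the other two words
butnot_hits = (human - glutathione) - transferase

print(sorted(and_hits), sorted(or_hits), sorted(butnot_hits))
```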
Wildcards
• These are useful when:
  • searching for a group of words (e.g. words starting with ‘cell’ and ending with ‘ase’: cell*ase)
  • you are unsure how a word is spelt in a database
• Two types:
  • * one or more characters of any value
  • ? single character of any value
• Any number of wildcards can be placed anywhere in a search word
• Placing a wildcard at the start of a word or string may increase response time because all words in the index have to be checked against the string
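One way to picture the two wildcards is to translate them into a regular expression, using the definitions on the slide (`*` as one or more characters, `?` as exactly one). This is a sketch under those assumptions; the helper names and the term list are hypothetical and do not reflect SRS internals.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".+")           # one or more characters, per the slide
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_search(pattern, index_terms):
    """Return the index terms that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [t for t in index_terms if rx.fullmatch(t)]

# Hypothetical index terms
terms = ["cellulase", "cellobiase", "cell", "kinase", "smith", "smyth"]

print(wildcard_search("cell*ase", terms))
print(wildcard_search("sm?th", terms))
print(wildcard_search("*ase", terms))  # leading wildcard: every term is checked
```

The last call illustrates the response-time remark: with a leading wildcard there is no fixed prefix to narrow the search, so every term in the index has to be tested.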
SRS Regular expressions
• NB: Must appear within forward slashes (/)
• Some operators:
  ^ marks the start of a string: /^glu/ begins with ‘glu’
  $ marks the end of a string: /ase$/ ends with ‘ase’
  . dot is any single character
  […] characters in square brackets are regarded as a set, any of which can be matched
  [0-9] specifies a range of 0 to 9
  * the preceding group may be repeated zero or more times
  + the preceding group may be repeated one or more times
  ? the preceding character/group occurs zero or one times
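The difference between the three repetition operators is easiest to see side by side. The sketch below uses Python's `re` module, which shares this notation; the sample words and the helper `full_matches` are hypothetical, and whole-word matching is an assumption made here to keep the comparison clean.

```python
import re

def full_matches(pattern, words):
    """Return the words that match the pattern from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

# Hypothetical sample words
samples = ["rho", "rho0", "rho7", "rho42"]

star = full_matches(r"rho[0-9]*", samples)   # zero or more digits
plus = full_matches(r"rho[0-9]+", samples)   # one or more digits
opt = full_matches(r"rho[0-9]?", samples)    # zero or one digit
one_to_nine = full_matches(r"rho[1-9]", samples)  # exactly one digit, 1 to 9

print(star, plus, opt, one_to_nine, sep="\n")
```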