
Biological Language Modeling Toolkit “Graphing Utilities”



  1. Biological Language Modeling Toolkit “Graphing Utilities”, by Danny Lam

  2. Overview
     • BLMT (e.g.): computes association measures in protein sequences
     • Graphing Utilities: display how well the association measures, or other data, line up with known or surmised feature boundaries
       • Step 1: Automatically extract feature boundaries from the given source files
       • Step 2: Plot the data together with the feature positions along a sequence

  3. BLMT: Mutual Information
     • Mutual Information: computes "mutual information", a measure of association between adjacent amino acids.
     • Input: amino acid sequence file(s), e.g. Swiss-Prot (SW) datasets
     • Output: file.mi.out.av
       • The first column is the position in the sequence
       • The second column is the mutual information value associated with that position
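The two-column output described on this slide can be read with a few lines of code. The following is a minimal sketch in Python, not part of BLMT itself: the function name `parse_mi_output` is hypothetical, whitespace-separated columns are an assumption (consistent with the `'%d %f'` format used in the MATLAB code later in the deck), and the sample values are invented purely for illustration.

```python
from typing import List, Tuple


def parse_mi_output(text: str) -> List[Tuple[int, float]]:
    """Parse two-column text: sequence position, then mutual information value.

    Assumes whitespace-separated columns; blank or short lines are skipped.
    """
    rows: List[Tuple[int, float]] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), float(fields[1])))
    return rows


# Invented sample data in the assumed layout (not real BLMT output).
sample = "1 0.12\n2 -0.05\n3 0.40\n"
rows = parse_mi_output(sample)
positions = [p for p, _ in rows]
values = [v for _, v in rows]
```

Splitting the rows into `positions` and `values` gives the same pair of vectors that the plotting code needs for its x and y data.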

  4. Feature Positions
     • Extract feature position information (via Swiss-Prot):
       • Extracellular (EC)
       • Cytoplasmic (CP)
       • Helices (H)
     • Label where the EC, CP, and H regions are in the sequence.

  5.
     DR   PROSITE; PS00238; OPSIN; 1.
     KW   Photoreceptor; Retinal protein; Transmembrane; Glycoprotein; Vision;
     KW   Phosphorylation; Lipoprotein; Palmitate; G-protein coupled receptor;
     KW   Acetylation; Retinitis pigmentosa; Disease mutation.
     FT   DOMAIN        1     36       EXTRACELLULAR.
     FT   TRANSMEM     37     61       1 (POTENTIAL).
     FT   DOMAIN       62     73       CYTOPLASMIC.
     FT   TRANSMEM     74     98       2 (POTENTIAL).
     FT   DOMAIN       99    113       EXTRACELLULAR.
     FT   TRANSMEM    114    133       3 (POTENTIAL).
     FT   DOMAIN      134    152       CYTOPLASMIC.
     FT   TRANSMEM    153    176       4 (POTENTIAL).
     FT   DOMAIN      177    202       EXTRACELLULAR.
     FT   TRANSMEM    203    230       5 (POTENTIAL).
     FT   DOMAIN      231    252       CYTOPLASMIC.
     FT   TRANSMEM    253    276       6 (POTENTIAL).
     FT   DOMAIN      277    284       EXTRACELLULAR.
     FT   TRANSMEM    285    309       7 (POTENTIAL).
     FT   DOMAIN      310    348       CYTOPLASMIC.
     FT   MOD_RES       1      1       ACETYLATION (BY SIMILARITY).
     FT   CARBOHYD      2      2       N-LINKED (GLCNAC...) (BY SIMILARITY).
     FT   CARBOHYD     15     15       N-LINKED (GLCNAC...) (BY SIMILARITY).
     FT   DISULFID    110    187       BY SIMILARITY.
     FT   BINDING     296    296       RETINAL CHROMOPHORE.
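Step 1 of the overview (automatic extraction of feature boundaries) could look roughly like the sketch below for FT lines such as those on this slide. It is written in Python for illustration and is not the toolkit's own code. The function name `extract_segments` is hypothetical, and treating every `TRANSMEM` feature as a helix (H) is an assumption based on the EC/CP/H labels named on the previous slide.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]


def extract_segments(entry_text: str) -> Dict[str, List[Segment]]:
    """Collect (start, end) ranges for EC, CP, and H regions from FT lines."""
    segments: Dict[str, List[Segment]] = {"EC": [], "CP": [], "H": []}
    for line in entry_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "FT":
            continue  # not a feature-table line
        key, start, end = fields[1], fields[2], fields[3]
        if not (start.isdigit() and end.isdigit()):
            continue  # skip features without plain numeric positions
        description = " ".join(fields[4:]).upper()
        if key == "TRANSMEM":
            label = "H"  # assumption: transmembrane segments are the helices
        elif key == "DOMAIN" and description.startswith("EXTRACELLULAR"):
            label = "EC"
        elif key == "DOMAIN" and description.startswith("CYTOPLASMIC"):
            label = "CP"
        else:
            continue  # other feature keys (MOD_RES, DISULFID, ...) are ignored
        segments[label].append((int(start), int(end)))
    return segments


# A few FT lines copied from the slide.
excerpt = """\
FT   DOMAIN        1     36       EXTRACELLULAR.
FT   TRANSMEM     37     61       1 (POTENTIAL).
FT   DOMAIN       62     73       CYTOPLASMIC.
FT   DISULFID    110    187       BY SIMILARITY.
"""
found = extract_segments(excerpt)
```

The result is a dictionary keyed by region label, which is a convenient shape for Step 2, where each label gets its own color on the plot.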

  6. Problems/Solution
     • Problems:
       - Making a single subplot graph in MATLAB requires customizing the program.
       - Generating multiple subplots together requires even more tedious work, which wastes time and effort.
     • Solution:
       - A clear interface is needed that generates subplot graphs for the user without writing tedious MATLAB code.

  7.
     % Read the position (integer) and value (float) columns from the data file
     [a1,b1] = textread('test.out', '%d %f');
     subplot(1,1,1);
     hold on
     hh1 = plot(a1, b1, 'linewidth', 2.5);
     ylabel('yaxis', 'fontsize', 16, 'Color', 'k', 'fontweight', 'bold');
     set(hh1, 'MarkerSize', 5);
     set(gca, 'YLim', [-1, 3]);
     %set(gca,'ytick',[-.6,-.2,.2]);
     % Cytoplasmic (CP) segments; each NaN breaks the line between segments
     xdash = [NaN,62,73,NaN,134,152,NaN,231,252,NaN,310,348]; %cp
     ydash = (-.2)*(ones(size(xdash)));
     line(xdash,ydash,'color','y','linewidth',3);
     % Extracellular (EC) segments, drawn at the same height in red
     xdash = [1,36,NaN,99,113,NaN,177,202,NaN,277,284,NaN]; %ec
     ydash = (-.2)*(ones(size(xdash)));
     line(xdash,ydash,'color','r','linewidth',3);
     xlabel('x_axis', 'fontsize', 16, 'Color', 'k');
     % Write the figure to a color PostScript file named "sample"
     print -dpsc -r0 sample;
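The `xdash` vectors in this MATLAB code follow a simple pattern: each region contributes its start and end position, and a `NaN` separates one region from the next so that `line` draws disconnected bars. A generator such as Subplotter could produce them mechanically. The sketch below, in Python, shows one way this might be done; the helper names `matlab_dash_vector` and `matlab_segment_lines` are hypothetical and are not taken from the toolkit.

```python
from typing import List, Tuple

Segment = Tuple[int, int]


def matlab_dash_vector(segments: List[Segment]) -> str:
    """Format segments as a MATLAB row vector with NaN after each segment."""
    parts: List[str] = []
    for start, end in segments:
        parts.extend([str(start), str(end), "NaN"])
    return "[" + ",".join(parts) + "]"


def matlab_segment_lines(segments: List[Segment], y: float,
                         color: str, comment: str) -> List[str]:
    """Emit the three MATLAB statements that draw one set of region bars."""
    return [
        f"xdash = {matlab_dash_vector(segments)}; %{comment}",
        f"ydash = ({y})*(ones(size(xdash)));",
        f"line(xdash,ydash,'color','{color}','linewidth',3);",
    ]


# Extracellular ranges taken from the Swiss-Prot excerpt earlier in the deck.
ec = [(1, 36), (99, 113), (177, 202), (277, 284)]
ec_vector = matlab_dash_vector(ec)
ec_code = matlab_segment_lines(ec, -0.2, "r", "ec")
```

For the extracellular ranges, `ec_vector` reproduces the `%ec` vector from the slide. Placing the `NaN` after each segment rather than before it (as in the slide's `%cp` vector) is an arbitrary choice here; either placement breaks the line between regions.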

  8. Design Capabilities
     • Access multiple mutual information output datasets
     • Display any combination of EC/CP/H position information on MI datasets (color coded)
     • Specify the range (Y limits) and naming conventions (X axis)
     • Output to convenient picture files (e.g. a .tiff file)

  9. Subplotter
     • Version 1 (in-house use only)
       - Initially the program takes as input:
         --> .SW file: (EC/CP/H)
         --> .m file: (the MATLAB file in which the code will be generated)

  10. Subplotter (Version 1)
      ***********************************
      How many output files to textread: 1
      What is the file to be textread into matlab program [output file 1]: opsdh_1gpcr.out
      How many TOTAL subplots do you request?: 1
      ************************************

  11. Subplotter (Version 1)
      *********************
      Subplot(1,1,1)
      *********************
      Which file do you want results to be graphed on this subplot?:
      0: opsdh_1gpcr.out
      Make selection (0): 0
      ++++++++++++++++++++++++++++++++++++++++++++++
      How many items (EC,CP,H) do you want plotted (1,2, 3: GPCR, 4: Loops)?:
      +++++++++++++++++++++++++++++++++++++++++++++++
      --> 3

  12. Subplotter (Version 1)
      Specify Y-Axis Label? (y/n): n
      Y-Axis Label: GPCR
      Specify YLim? (y,n): n
      Give name to X-Axis: sample
      Give name to .tiff file for output (no extension!): sample
      Matlab Program completed! wait ...

  13. Subplotter (Version 1)

  14. Current/Future Work • Generate a graphing utility for every tool on the BLMT website.

  15. Questions?
