
Systems and Synthetic Biology: A programming languages point of view

Saurabh Srivastava, Assistant Research Engineer, Computer Science + Bioengineering, University of California, Berkeley.


Presentation Transcript


  1. Systems and Synthetic Biology: A programming languages point of view Saurabh Srivastava Assistant Research Engineer Computer Science + Bioengineering University of California, Berkeley

  2. In perspective One view: biological specifications One view: syntax of components One view: semantics of components • Systems biology • Modeling complete biological processes • Genomics • Reading DNA sequences got incredibly fast, and cheap • Algorithms for sequence data analytics within organisms • Synthetic biology • DNA synthesis got incredibly cheap • Functional characterization on the way • Algorithms to predict tweaks to organisms

  3. Different scales in biology Organisms Intercellular processes Intracellular processes Protein interactions Protein function Systems Biology operates here Synthetic Biology operates here

  4. Results and techniques used Intercellular processes Intracellular processes Protein interactions Protein function Systems Biology operates here Synthetic Biology operates here

  5. Results and techniques used Results: a model of cell communication that helps in understanding cancer; a constructed bacterium that produces paracetamol (Depon). Techniques: automatically generating concurrent programs using input-output examples; big data analysis and abstraction. Intercellular processes Intracellular processes Protein interactions Protein function Systems Biology operates here Synthetic Biology operates here

  6. PART I Synthesizing models for Systems Biology

  7. Synthesizing Systems Biology “Programs” Synthesizing concurrent programs from examples Programs ≡ biological models Examples ≡ biological experiments We assist natural sciences with formal methods • Given experiments, are there other models? • If so, compute a new, disambiguating experiment Part I: how stem cells coordinate their fates

  8. Understanding Diseases • “Cancer is fundamentally a disease of failure of regulation of tissue growth. In order for a normal cell to transform into a cancer cell, the genes which regulate cell growth and differentiation must be altered.” – from Wikipedia • Research on cell differentiation helps understanding diseases such as cancer.

  9. C. elegans: A Model Organism Roundworm (nematode) used in developmental biology. 959 cells; its organs are found in other animals. Differentiation is studied on vulval development.

  10. Identical precursor cells collaborate to decide their fate Initial division of embryo Differentiation and then development into organ parts

  11. Modeling Goal What is the mechanism (program) within each cell for deciding fates through communication?

  12. Building Blocks of these Programs Cells contain communicating proteins. Protein interaction: a protein senses the concentration of other proteins. Interaction is either activation or inhibition. [Figure: two diagrams of proteins A and B, one per interaction type.]

  13. How the Vulval Cells Differentiate If cells sense the same signal strength, data races occur. [Figure: the anchor cell sends a low, med, and high signal to three cells, each containing sem-5, let-23, lst, let-60, ..., mpk-1, and lin-12; the cells are labeled with fates 3º, 2º, and 1º.]

  14. How Biologists Discover Interactions • Measuring protein levels over time is infeasible • Since such “cell tracing” is infeasible, infer protein interactions from end-to-end experiments • That is, mutate cells and observe the resulting fates

  15. A Mutation Experiment [Figure: the anchor cell sends a low, med, and high signal to three cells, each containing let-23, ..., and lin-12; fates shown: 3º, 2º, 1º, 1º.]

  16. Putting Experiments Together Fate of six neighboring cells No protein is mutated. lin-12 is turned off. Multiple outcomes observed Experiments over 35 years by 11 groups

  17. How to Build Accurate Models? Inhibition discovered by predictive modeling [Fisher et al. 2007] [Figure: the anchor cell sends a low, med, and high signal to three cells, each containing sem-5, let-23, lst, let-60, ..., mpk-1, and lin-12; the cells are labeled with fates 3º, 2º, and 1º.]

  18. Semantics of the Modeling Language • Program has cells • Non-deterministic outcomes via schedule interleaving Cell 1 Cell 2 let-23 sem-5 • Cell has proteins • All proteins advance synchronously lst let-60 lin-12 mpk-1 • Proteins have discrete state and update functions.
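
The semantics on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-cell setup, the protein names `sensed`, `lat`, and `fate`, and the update rules are hypothetical and are not the VPC model from the talk. It only shows the three stated ingredients: proteins with discrete state and update functions, synchronous updates within a cell, and non-deterministic outcomes from interleaving the cells.

```python
from itertools import permutations

# Hypothetical toy model (not the model from the talk).
# Each cell has three discrete-state "proteins":
#   sensed: 1 once the cell has taken its first step
#   lat:    1 if the cell has seen its neighbor already commit to fate 1
#   fate:   0 = undecided, 1 = primary, 2 = secondary

def step(cells, i):
    """Advance cell i by one step; all its proteins read the OLD state (synchronous update)."""
    old, neighbor = cells[i], cells[1 - i]
    if old["fate"] != 0:
        new_fate = old["fate"]          # a fate, once chosen, is kept
    elif old["sensed"] == 0:
        new_fate = 0                    # cannot decide before sensing
    else:
        new_fate = 2 if old["lat"] == 1 else 1
    new = {
        "sensed": 1,
        "lat": 1 if neighbor["fate"] == 1 else old["lat"],
        "fate": new_fate,
    }
    return [new if j == i else c for j, c in enumerate(cells)]

def run(schedule):
    """Run two cells under a schedule (a sequence of cell indices) and return their fates."""
    cells = [{"sensed": 0, "lat": 0, "fate": 0} for _ in range(2)]
    for i in schedule:
        cells = step(cells, i)
    return tuple(c["fate"] for c in cells)

def all_outcomes():
    """Enumerate every interleaving in which each cell takes exactly two steps."""
    return {run(s) for s in set(permutations([0, 0, 1, 1]))}
```

In this toy, `all_outcomes()` contains three fate pairs: one cell wins when it runs both of its steps first, and both cells become primary when their steps interleave, which is the kind of race the slides describe for cells that sense the same signal.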

  19. Synthesizing Cellular Programs

  20. Synthesis of Programs [Diagram: biological insight, given as a partial program, and a specification are the inputs to the synthesizer, which outputs a completed program.]

  21. Partial Programs Partial programs express biological insight: • Which proteins are in the cell • Which proteins may interact Update functions can be unknown. sem-5 ?? lst let-60 let-23 mpk-1 lin-12

  22. Synthesis Algorithm

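  23. Classical CEGIS [Diagram: starting from an initial input/output set, the synthesizer either proposes a candidate solution (SAT) or fails (UNSAT); the verifier either accepts the candidate (UNSAT) or finds an input-output counterexample (SAT), which is added to the set.]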

  24. Correctness Condition Safety: all schedules must lead the program to produce experiment outcomes observed in the wet lab. ∀mutation m. ∀schedule s. P(m, s)∈E(m) Completeness: each observed experiment outcome must be reproducible by the program for some schedule. ∀mutation m. ∀fate f∈E(m). ∃schedule s. P(m, s) = f
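
The two conditions can be checked by brute force when the sets of mutations and schedules are small. The sketch below is only an illustration under stated assumptions: `P` is any function from (mutation, schedule) to a fate, `E` maps each mutation to the set of fates observed in the lab, and the toy model, schedule names, and observations are hypothetical. The talk describes symbolic search rather than enumeration.

```python
def safety_counterexample(P, E, schedules):
    """Return (m, s) such that P(m, s) is not in E[m], or None if P is safe."""
    for m in E:
        for s in schedules:
            if P(m, s) not in E[m]:
                return (m, s)       # a "demonic" schedule
    return None

def completeness_counterexample(P, E, schedules):
    """Return (m, f) such that no schedule reproduces fate f, or None if P is complete."""
    for m, fates in E.items():
        for f in sorted(fates):
            if not any(P(m, s) == f for s in schedules):
                return (m, f)       # no "angelic" schedule exists for f
    return None

# Hypothetical toy data, invented for this example.
SCHEDULES = ["cell1_first", "cell2_first"]

def toy_model(m, s):
    if m == "no_mutation":
        return "primary"
    return "primary" if s == "cell1_first" else "secondary"

OBSERVED = {
    "no_mutation": {"primary"},
    "knockout": {"primary", "secondary"},
}
```

With `OBSERVED`, both functions return `None` for `toy_model`. Removing `"secondary"` from the knockout observations makes the model unsafe, and adding an unreproducible fate to any observation set makes it incomplete.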

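  25. Counterexample-Guided Inductive Synthesis [Diagram: from an initial set of input/output examples, the inductive synthesizer either reports no candidate completion or passes a candidate completion to the safety verifier and the completeness verifier. The safety verifier may return a counterexample: an execution P(m, s) with a bad outcome. The completeness verifier may return a counterexample: an observation (m, f) that is not reproducible. If neither finds a counterexample, the candidate is ok.]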
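
The loop can be sketched on a much smaller problem than the one in the talk. The sketch below assumes a single unknown update function over two Boolean inputs, enumerates truth tables instead of calling a SAT solver, and uses one verifier instead of separate safety and completeness verifiers. The names `hidden_behavior`, `synthesize`, `verify`, and `cegis` are invented for this example.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))   # all valuations of two Boolean inputs

def candidates():
    """Enumerate all 16 truth tables for the unknown update function (the hole)."""
    for bits in product([0, 1], repeat=len(INPUTS)):
        yield dict(zip(INPUTS, bits))

def synthesize(examples):
    """Inductive step: return any table consistent with the examples seen so far."""
    for table in candidates():
        if all(table[x] == y for x, y in examples):
            return table
    return None

def verify(table, spec):
    """Return an input-output counterexample, or None if the candidate matches spec."""
    for x in INPUTS:
        if table[x] != spec(x):
            return (x, spec(x))
    return None

def cegis(spec):
    """Alternate synthesizer and verifier until a candidate passes or none is left."""
    examples = []
    while True:
        candidate = synthesize(examples)
        if candidate is None:
            return None, examples
        counterexample = verify(candidate, spec)
        if counterexample is None:
            return candidate, examples
        examples.append(counterexample)

def hidden_behavior(x):
    """Hypothetical target: on when the first input is on and the second is off."""
    return int(x[0] == 1 and x[1] == 0)
```

Each counterexample is an input the current candidate gets wrong, so it cannot already be in the example set; with four possible inputs the loop therefore stops after at most four counterexamples.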

  26. Verifying for Safety • Safety: ∀mutation m. ∀schedule s. P(m, s)∈E(m) • Attempt to disprove by searching for a demonic schedule: ∃mutation m. ∃schedule s. P(m, s) ∉ E(m) Unroll over the set of performed experiments Search symbolically for a demonic schedule

  27. Verifying for Completeness • Completeness: ∀mutation m. ∀fate f∈E(m). ∃schedule s. P(m, s) = f • Attempt to disprove by showing lack of an angelic schedule for some outcome: ∃mutation m. ∃fate f∈E(m). ∀schedule s. P(m, s) ≠ f Unroll over pairs of mutation and fate Query symbolically for an angelic schedule

  28. Synthesized Models • We synthesized two models of VPCs. • Input: Partial model that specifies known, simple protein behaviors. • Output: Synthesized update functions for two key proteins. model 2 model 1

  29. Additional Algorithms for Going Beyond Synthesis to Assist Scientists Does there exist an alternative model that differs on a new experiment? What is the minimal number of experiments that constrains the space to the current model? For two or more models within the space, do there exist disambiguating experiments?
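
The disambiguation question can be illustrated with a small sketch. It assumes each model is a function from a mutation to the set of fates it can produce, and that a mutation is a set of knocked-out proteins. The protein names `p` and `q`, the fates, and both models are hypothetical.

```python
from itertools import combinations

def disambiguating_experiments(model_a, model_b, candidate_mutations, performed):
    """Return the not-yet-performed mutations on which the two models predict different outcomes."""
    return [
        m for m in candidate_mutations
        if m not in performed and model_a(m) != model_b(m)
    ]

def all_mutations(proteins):
    """Every subset of proteins, each read as 'these proteins are knocked out'."""
    return [
        frozenset(c)
        for r in range(len(proteins) + 1)
        for c in combinations(proteins, r)
    ]

# Two hypothetical models that agree on the experiments performed so far.
def model_a(m):
    return {"primary"} if "p" not in m else {"tertiary"}

def model_b(m):
    return {"primary"} if ("p" not in m or "q" in m) else {"tertiary"}

PERFORMED = [frozenset(), frozenset({"p"})]
```

Here the two models agree on every performed experiment and differ only on the double knockout of `p` and `q`, so that experiment is the one worth running in the wet lab.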

  30. PART II Predicting DNA insertions for Synthetic Biology

  31. Synthetic Biology inserts DNA Applications enabled • Microbial chemical factories • Sustainable bacterial production of chemicals. E.g., drugs, polymers • Bacteria as a sensor • Agricultural apps: e.g., sense nitrogen depletion in soil, change color • Tumor killing bacteria • Sense multiple environmental factors (lower oxygen, high lactic acid) • Invade cancerous cell • Release drug inside cell

  32. Does it actually work? Sugar Tylenol • Jay D. Keasling • Artemisinin: from Artemisia annua to yeast • Amyris company: 219M incidence, 300M cure target • 8 years of work; manual insights • Computationally predicted • Our Tylenol E. coli strain

  33. How do cells manipulate chemicals Incoming chemicals Some proteins are enzymes Output chemicals

  34. Changing the cellular chemical machinery What happens when we add unnatural/external enzymes?

  35. + Sugar Sugar DNA synthesis + Plasmid Transformation Bio repositories Data dedup/correlation + Search Tylenol Current status Acetaminophen Opportunity Abstractions or rules from data • Rule application to predict Polymers Nylon etc. Biofuels

  36. Prediction for Tylenol 4ABH gene from Mushroom Chorismate pathway 4-aminobenzoate 4-aminophenol Tylenol

  37. 4ABH gene inserted Tylenol 4AP

  38. Sugar Opportunity Abstractions or rules from data • Rule application to predict Polymers Nylon etc. Biofuels

  39. Biochemical rules/abstractions?

  40. Biochemical rules/abstractions? Graph transformation function f f’ f’’ fclass * * * * Operator taking (chemical) graph and transforming it

  41. Open problem 1: Language for chemicals and transforms • 3D (XYZ, CML, PDB): good for crystallographers • 2D: good for biochemists • 1D (SMILES, SYBYL; e.g. C1=CC=CC=C1): good for computational storage, retrieval

  42. Alternative new representation Need midway encoding * * * * • Transformation representation • encmolecule • Be able to efficiently compute fclass(encmolecule) • Conflicting objectives: • Trace-based encoding • Difficulty at cycle cuts • True graph encoding • Subgraph isomorphism
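
As a rough illustration of an operator that takes a chemical graph and transforms it, the sketch below uses a deliberately coarse encoding: nodes are whole functional groups, not atoms, and the operator simply relabels a group. This is an assumption made to keep the example short. It is not the encoding proposed in the talk, and it sidesteps the subgraph isomorphism problem the slide mentions. The example molecules mirror the 4-aminobenzoate to 4-aminophenol step listed on the Tylenol prediction slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Molecule:
    """Toy labeled graph: nodes are (id, label) pairs, edges are unordered id pairs."""
    nodes: tuple
    edges: frozenset

def make_relabel_operator(old_label, new_label):
    """Build an operator that rewrites every node labeled old_label to new_label.

    The operator applies to any molecule containing the label, which is the sense
    in which it stands for a class of reactions rather than a single one.
    """
    def operator(mol):
        if not any(label == old_label for _, label in mol.nodes):
            return None                      # operator does not apply
        nodes = tuple(
            (i, new_label if label == old_label else label)
            for i, label in mol.nodes
        )
        return Molecule(nodes, mol.edges)
    return operator

# Hypothetical group-level encodings, invented for this example.
RING_BONDS = frozenset({frozenset({0, 1}), frozenset({0, 2})})
AMINOBENZOATE = Molecule(((0, "ring"), (1, "NH2"), (2, "COOH")), RING_BONDS)
AMINOPHENOL = Molecule(((0, "ring"), (1, "NH2"), (2, "OH")), RING_BONDS)

carboxyl_to_hydroxyl = make_relabel_operator("COOH", "OH")
```

Applying `carboxyl_to_hydroxyl` to `AMINOBENZOATE` yields `AMINOPHENOL`, and applying it to a molecule without a `COOH` node returns `None`. An atom-level encoding would need real pattern matching in place of the label test.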

  43. Applying reaction operators We can now predict new edges: I.e., external enzymes or even new constructed enzymes

  44. Open problem 2: Deriving biochemical programs • fclass(encmolecule) is a single step • Function composition to get “pathway programs” • fclass0(fclass1 (fclass2 (fclass3 (encmolecule)))) • Complications • Path can be “acyclic” not just straight-line • Mixed concrete and rule instantiation
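
Function composition into a straight-line pathway program can be sketched as follows. The sketch assumes chemicals are plain strings and each step is a lookup from one substrate to one product; the helper names `make_step` and `compose` are invented, and the two steps reuse the chemical names from the Tylenol prediction slide purely as labels.

```python
def make_step(substrate, product):
    """Build a single-step operator: substrate -> product, None if it does not apply."""
    def step(chemical):
        return product if chemical == substrate else None
    return step

def compose(*steps):
    """compose(f0, f1, f2)(x) computes f0(f1(f2(x))), stopping at the first step that fails."""
    def pathway(chemical):
        for step in reversed(steps):       # the innermost function is applied first
            if chemical is None:
                return None
            chemical = step(chemical)
        return chemical
    return pathway

# Hypothetical single-step operators, named only for this example.
to_aminophenol = make_step("4-aminobenzoate", "4-aminophenol")
to_tylenol = make_step("4-aminophenol", "Tylenol")

tylenol_pathway = compose(to_tylenol, to_aminophenol)
```

The argument order follows the nested notation on the slide, so the last argument runs first. This covers only straight-line pathways; the acyclic case mentioned under complications would need steps that take several inputs.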

  45. Crux of the research problem Intelligent rule instantiation: ------ edges Given target chemical Need better search methods Phrase it as a model checking reachability problem: initial experiments point to a need for more efficient representation of search space.
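
The reachability view can be sketched with a plain breadth-first search over chemicals. This is explicit-state search, swapped in for illustration; it is not the model checking formulation the slide refers to, and it would not scale to the search spaces discussed here. The chemicals `A` to `D`, the rule names, and the `find_pathway` helper are all hypothetical.

```python
from collections import deque

def find_pathway(start, target, rules):
    """Breadth-first search for a shortest rule sequence from start to target.

    rules maps a rule name to a function chemical -> chemical or None.
    Returns a list of (substrate, rule_name, product) steps, or None if unreachable.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        chemical = queue.popleft()
        if chemical == target:
            path = []
            while parent[chemical] is not None:
                previous, rule_name = parent[chemical]
                path.append((previous, rule_name, chemical))
                chemical = previous
            return path[::-1]
        for rule_name, rule in rules.items():
            product = rule(chemical)
            if product is not None and product not in parent:
                parent[product] = (chemical, rule_name)
                queue.append(product)
    return None

def edge(substrate, product):
    """Hypothetical single-step rule defined by one substrate/product pair."""
    return lambda chemical: product if chemical == substrate else None

# Hypothetical rule set: a long route A -> B -> C -> D and a shortcut B -> D.
RULES = {
    "r1": edge("A", "B"),
    "r2": edge("B", "C"),
    "r3": edge("C", "D"),
    "r4": edge("B", "D"),
}
```

With `RULES`, `find_pathway("A", "D", RULES)` returns the two-step route through the shortcut rule. The `parent` map grows with every reachable chemical, which is one concrete way to see why a more compact representation of the search space is needed.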

  46. Eventual targets Arbekacin (semi-synthetic) MRSA Drug $20,900/gram  Amikacin (natural) Nothing interesting $118/gram

  47. Modular decomposition through function summaries Arbekacin (semi-synthetic) MRSA Drug $20,900/gram  Amikacin (natural) Nothing interesting $118/gram

  48. Acknowledgements

  49. http://pathway.berkeley.edu Sugar Sugar + DNA synthesis + Plasmid Transformation Bio repositories Data dedup/correlation + Search Tylenol Current status Acetaminophen Opportunity Chemical transformation functions from data Language for biochemistry Synthesis of biochemical program (i.e., DNA modifications) Synthesis of internals of transformation function Polymers Nylon etc. Biofuels

  50. Backup slides
