CS 3016 Compilers

CS 3016 Compilers. PhD. Hieu Chi Nguyen Fall 2008. Levels of language in computing. High-Level Programming Languages. Compiler. Machine Code. Error Messages. Then What?. Machine Code. Program Inputs. Program Outputs. What is a Compiler?. Source Code. Interpreter. Program Outputs.

Presentation Transcript


  1. CS 3016 Compilers. PhD. Hieu Chi Nguyen. Fall 2008

  2. Levels of language in computing

  3. What is a Compiler? [Diagram: High-Level Programming Languages → Compiler → Machine Code, with Error Messages as a second compiler output. Then what? Program Inputs → Machine Code → Program Outputs]

  4. What is an Interpreter? [Diagram: Source Code and Program Inputs → Interpreter → Program Outputs]

  5. What is a Just-in-Time Compiler? [Diagram: Source Code → IR Generator → Intermediate Program; Intermediate Program and Program Inputs → Virtual Machine → Program Outputs]

  6. Why Study Compilers? • Fundamental tool in computer science since 1952 • Remains a vibrant research topic • Machines keep changing • Languages keep changing • Applications keep changing • When to compile keeps changing • Challenging! • Must correctly handle an infinite set of legal programs • Is itself a large program → “Where theory meets practice”

  7. Goals of a Compiler • A compiler’s job is to • Lower the abstraction level • Eliminate overhead from language abstractions • Map source program onto hardware efficiently • Hide hardware weaknesses, utilize hardware strengths • Equal the efficiency of a good assembly programmer • Optimizing compilers should improve the code • Performance* • Code size • Security • Reliability • Power consumption

  8. An Interface to High-Level Languages [Diagram: High-Level Programming Languages → Compiler → Machine Code] • Programmers write in high-level languages • Increases productivity • Easier to maintain/debug • More portable • HLLs also protect the programmer from low-level details • Registers and caches (the register keyword) • Instruction selection • Instruction-level parallelism • The catch: HLLs are less efficient

  9. High-Level Languages and Features • C (80’s) … C++ (Early 90’s) … Java (Late 90’s) • Each language had features that spawned new research • C/Fortran/COBOL • User-defined aggregate data types (arrays, structures) • Control-flow and procedures • Prompted data-flow optimizations • C++/Simula/Modula II/Smalltalk • Object orientation (more, smaller procedures) • Prompted inlining • Java • Type safety, bounds checking, garbage collection • Prompted bounds removal, dynamic optimization

  10. An Interface to Computer Architectures [Diagram: High-Level Programming Languages → Compiler → Machine Code] • Parallelism • Instruction level • multiple operations at once • want to minimize dependences • Processor level • multiple threads at once • want to minimize synchronization • Memory Hierarchies • Register allocation (only portion explicitly managed in SW) • Code and data layout (helps the hardware cache manager) • Designs driven by how well compilers can leverage new features!

  11. How Can We Translate Effectively? [Diagram: High-Level Source Code → ? → Low-Level Machine Code]

  12. Idea: Translate in Steps • Series of program representations • Intermediate representations optimized for various manipulations (checking, optimization) • More machine specific, less language specific as translation proceeds

  13. Simplified Compiler Structure • Source code (character stream): if (b==0) a = b; → Lexical Analysis → Token stream → Parsing → Abstract syntax tree → Intermediate Code Generation → Intermediate code → Optimization → LIR → Register Allocation → Assembly code (character stream): CMP CX, 0; CMOVZ CX, DX • Front End: machine independent • Back End: machine dependent

  14. Why Separate the Front and Back Ends? • Recall: An interface between HLLs and architectures • Option: X*Y compilers or X front ends + Y back ends [Diagram: with monolithic compilers, each of hello.c, hello.cc, hello.f and hello.ada needs its own compiler for each of x86, alpha and sparc; with X front ends and Y back ends, every source language is translated to a shared IR, from which the back ends produce Hello for x86, alpha and sparc]

  15. Structure of a Compiler • Front End • Lexical Analysis • Syntax Analysis • Semantic Analysis • Intermediate Code Generation • Back End • Code Optimization • Code Generation

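  16. Lexical Analysis • Input character stream: final := initial + rate * 60 • Output token stream: id1 := id2 + id3 * 60 • Analogy: the letters "S o m e o n e b r e a k s t h e i c e" are grouped into the words "Someone breaks the ice"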
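The lexical analysis step on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the token class names (`NUM`, `ID`, `ASSIGN`, `OP`, `WS`) and the helper `tokenize` are invented here, and identifiers are simply numbered in order of first appearance to reproduce the slide's `id1 := id2 + id3 * 60`.

```python
import re

# Hypothetical token classes for this sketch; the slide only shows
# identifiers, ':=', '+', '*' and an integer literal.
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("WS",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbol_table); identifiers become id1, id2, ..."""
    symbols = {}   # lexeme -> idN, filled in order of first appearance
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "WS":
            continue                      # whitespace separates tokens only
        if kind == "ID":
            if lexeme not in symbols:
                symbols[lexeme] = f"id{len(symbols) + 1}"
            tokens.append(symbols[lexeme])
        else:
            tokens.append(lexeme)
    return tokens, symbols

tokens, symbols = tokenize("final := initial + rate * 60")
print(" ".join(tokens))   # id1 := id2 + id3 * 60
```

The symbol table returned alongside the tokens maps `final`, `initial` and `rate` to `id1`, `id2` and `id3`, which is the information later phases need to recover the original names.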

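  17. Syntax Analysis • Token stream: id1 := id2 + id3 * 60 • Syntax tree: := with children id1 and +; + with children id2 and *; * with children id3 and 60 • Analogy: the sentence "Someone breaks the ice" is parsed as subject (Someone), verb (breaks), object (the ice)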

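  18. Semantic Analysis • Input tree: := (id1, + (id2, * (id3, 60))) • Output tree: := (id1, + (id2, * (id3, i2r (60)))), where i2r converts the integer 60 to a real • Analogy: "Someone plays the piano" (meaningful) versus "The piano plays someone" (meaningless)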

  19. Intermediate Code Generation • Input tree: := (id1, + (id2, * (id3, i2r (60)))) • Output: temp1 := i2r ( 60 ); temp2 := id3 * temp1; temp3 := id2 + temp2; id1 := temp3 • Most common intermediate representations: • Syntax trees • Directed acyclic graphs (DAG) • Postfix notation • Three-address code
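The step from the annotated syntax tree to three-address code can be sketched as a post-order walk that allocates one temporary per interior node. This is a minimal illustration, not the course's implementation: the tuple encoding of the tree and the function name `gen_tac` are assumptions made here.

```python
from itertools import count

def gen_tac(tree):
    """Emit three-address code for an assignment tree.

    The tree is encoded as nested tuples (operator, operand, ...) with
    strings as leaves; this encoding is an assumption of the sketch.
    """
    code = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op, *args = node
        vals = [walk(a) for a in args]     # children first (post-order)
        temp = f"temp{next(temps)}"
        if op == "i2r":                    # unary int-to-real conversion
            code.append(f"{temp} := i2r ( {vals[0]} )")
        else:                              # binary operator
            code.append(f"{temp} := {vals[0]} {op} {vals[1]}")
        return temp

    _, target, expr = tree                 # (":=", target, expression)
    result = walk(expr)
    code.append(f"{target} := {result}")
    return code

tree = (":=", "id1", ("+", "id2", ("*", "id3", ("i2r", "60"))))
tac = gen_tac(tree)
print("\n".join(tac))
```

For the slide's tree this prints the same four instructions shown above, ending with `id1 := temp3`.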

  20. Code Optimization • Before: temp1 := i2r ( 60 ); temp2 := id3 * temp1; temp3 := id2 + temp2; id1 := temp3 • After: temp1 := id3 * 60.0; id1 := id2 + temp1

  21. Code Generation • Input: temp1 := id3 * 60.0; id1 := id2 + temp1 • Output: mov id3, r2; mul #60.0, r2; mov id2, r1; add r2, r1; mov r1, id1

  22. Internal Compiler Structure: Front End • Series of filter passes • Source program: written in a HLL • Lexical analysis: converts the character stream (keywords, identifiers, operators) into "tokens" • Parser: forms a syntax "tree" (statements, expressions, etc.) • Semantic analysis: type checking, etc. • Intermediate code generator: three-address code, interface for the back end [Diagram: Source Program → Lexical Analyzer → Token Stream → Parser → Syntax Tree → Semantic Analyzer → Syntax Tree → Intermediate Code Gen → IR]

  23. Internal Compiler Structure: Back End [Diagram: IR → Code Optimizer → IR → Code Generator → Target program] • Code optimization: "improves" the intermediate code (most time is spent here) • Consists of machine independent & dependent optimizations • Code generation • Register allocation, instruction selection

  24. Traditional Compiler Infrastructures • GNU GCC • Targets: everything (pretty much) • Strength: Targets everything (pretty much) • Weakness: Not as extensible as research infrastructures, poor optimization • Stanford SUIF Compiler with Harvard MachSUIF • Targets: Alpha, x86, IPF, C • Strength: high level analysis, parallelization on scientific codes • Intel Open Research Compiler (ORC) • Targets: IPF • Strength: robust with OK code quality • Weakness: Many IR levels

  25. Modern Compiler Infrastructures • IBM Jikes RVM • Targets Linux, x86 or AIX • Strengths: Open-source, wide user base • Weaknesses: In maintenance mode • Microsoft Phoenix • Targets Windows • Strengths: Actively developed • Weaknesses: Closed source, extensive API

  26. What will we learn in this course? • Structure of Compilers • The Front End • The Back End

  27. Required Work • Theory: • Students attend 45 periods of lectures • Practice: • Students attend lab sessions and complete a course project • (1 course project per student) • Assessment: • Course project review → practice grade (TH) • Written final theory exam → theory grade (LT) • Coursework marks (10% + 10%) • Grade calculation: • Final course grade = LT * 40% + TH * 40% + 10% + 10%

  28. Course Materials 1) Alfred V. Aho, Jeffrey D. Ullman (2003|1986). Compilers: Principles, Techniques, and Tools. Addison-Wesley Publishing Company. 2) Phan Thị Tươi (2003). Trình Biên Dịch (Compilers). Đại học Bách Khoa TP. Hồ Chí Minh. • Other Helpful Books

  29. Next Time… • Read Compilers Chapter 3 • We will begin discussing lexical analysis • Look out for Assignment

  30. 2. Predictive parsing. Predictive parsing is a special form of top-down parsing. It looks ahead one input symbol to decide which procedure to choose for the corresponding nonterminal. Example 2.8. Given the grammar G with productions P: S → xA; A → z | yA. Use G to parse the input sentence xyyz. Table 2.1. Parsing steps for the sentence xyyz
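The table of parsing steps is not part of this transcript, but the idea of Example 2.8 can be sketched as a recursive-descent recognizer with one procedure per nonterminal, each choosing its production from a single lookahead symbol. The names `parse`, `lookahead` and `match` are illustrative assumptions, and `$` is used here as an end-of-input marker.

```python
def parse(sentence):
    """Return True if sentence is derivable in G: S -> xA, A -> z | yA."""
    pos = 0

    def lookahead():
        # One symbol of lookahead; "$" marks the end of the input.
        return sentence[pos] if pos < len(sentence) else "$"

    def match(expected):
        nonlocal pos
        if lookahead() != expected:
            raise SyntaxError(f"expected {expected!r}, found {lookahead()!r}")
        pos += 1

    def S():                      # S -> x A
        match("x")
        A()

    def A():                      # A -> z | y A, chosen by the lookahead
        if lookahead() == "z":
            match("z")
        elif lookahead() == "y":
            match("y")
            A()
        else:
            raise SyntaxError(f"unexpected {lookahead()!r} in A")

    try:
        S()
    except SyntaxError:
        return False
    return pos == len(sentence)   # all input must be consumed

print(parse("xyyz"))
```

Because the two alternatives of A start with different terminals (z and y), the lookahead symbol always selects exactly one production and no backtracking is needed.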

  31. Example 2.9. Given a grammar with the following productions: S → A | B; A → xA | y; B → xB | z. Table 2.2. Parsing the sentence xxxz fails

  32. - Condition 1: A → α1 | α2 | ... | αn - Definition: first(αi) = {s | s is a terminal and αi ⇒* s…}. Condition 1 is stated as follows: for A → α1 | α2 | ... | αn, first(αi) ∩ first(αj) = ∅ for i ≠ j. Notes: 1. first(aα) = {a} 2. If A → α1 | α2 | … | αn, then first(Aα) = first(α1) ∪ first(α2) ∪ ... ∪ first(αn). Example 2.11. Given the grammar G with productions: S → Ax; A → x | ε, where ε is the empty string. Table 2.3. Parsing the input sentence: x
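Condition 1 can be checked mechanically. The sketch below computes first sets for the grammar of Example 2.9 and reports every pair of alternatives whose first sets overlap. It assumes a grammar without ε-productions, so the first set of an alternative is the first set of its leading symbol; the dictionary encoding and the function names are illustrative.

```python
# Grammar of Example 2.9: S -> A | B, A -> xA | y, B -> xB | z.
GRAMMAR = {
    "S": [["A"], ["B"]],
    "A": [["x", "A"], ["y"]],
    "B": [["x", "B"], ["z"]],
}

def first_of_symbol(sym, grammar, seen=()):
    """first set of one grammar symbol (assumes no ε-productions)."""
    if sym not in grammar:            # terminal: first(a) = {a}
        return {sym}
    if sym in seen:                   # guard against left recursion
        return set()
    result = set()
    for alt in grammar[sym]:
        result |= first_of_symbol(alt[0], grammar, seen + (sym,))
    return result

def condition1_conflicts(grammar):
    """Map (head, i, j) -> shared terminals for alternatives i and j."""
    conflicts = {}
    for head, alts in grammar.items():
        firsts = [first_of_symbol(alt[0], grammar) for alt in alts]
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                common = firsts[i] & firsts[j]
                if common:
                    conflicts[(head, i, j)] = common
    return conflicts

conflicts = condition1_conflicts(GRAMMAR)
print(conflicts)
```

For this grammar the only conflict is between the two alternatives of S, which share the terminal x. That is why a parser with one symbol of lookahead cannot decide between S → A and S → B on the input xxxz.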

  33. Parsing fails - Condition 2: first(A) ∩ follow(A) = ∅, for A → α1 | α2 | … | αn | ε. follow(A) is computed as follows: for each production Pi of the form X → αAβ, follow(A) contains first(β). In Example 2.11, first(A) ∩ follow(A) = {x}. Note that a left-recursive grammar violates condition 1. Example: A → B | AB (2.1). Then first(A) = first(B) and first(AB) = first(A) = first(B), so first(B) ∩ first(AB) ≠ ∅, which violates condition 1. If rule (2.1) is changed to A → ε | AB, condition 2 is violated. Example 2.12. Given the grammar of Example 2.6, we use predictive parsing to parse the sentence array[num dotdot num] of integer (see page 41). The procedures called when generating the parse tree for sentences of the grammar in Example 2.12.

  34. CHAPTER 3 LEXICAL ANALYSIS 3.1. The role of the lexical analyzer 1. Tokens, patterns, lexemes. Table 3.1. Symbol table of tokens

  35. Figure 3.1. Interaction between the lexical analyzer and the parser [Diagram: Source program → Lexical analyzer → token → Parser; the parser requests the next token; both use the symbol table] 3.2. PROPERTIES OF TOKENS 3.3. BUFFERING THE SOURCE PROGRAM 1. Buffer pair. Structure

  36. Figure 3.2. Buffer pair (two halves of N characters, with pointers p1 and p2). Operation. Algorithm: if p2 is at the end of the first half of the buffer then begin fill the right half with N new input symbols; p2 := p2 + 1 end else if p2 is at the right end of the buffer then begin fill the left half of the buffer with N input symbols; move p2 to the leftmost character of the buffer end else p2 := p2 + 1
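The forward-pointer logic of the buffer pair can be sketched in Python as follows. This is a simplified illustration under assumptions: the class name `BufferPair`, the tiny default half size, and the use of an empty string to mark end of input are all choices made here, and the lexeme-start pointer p1 is left out.

```python
import io

class BufferPair:
    """Two-halves input buffer; only the forward pointer p2 is modelled."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n                        # N characters per half
        self.buf = [""] * (2 * n)
        self.p2 = 0
        self._fill(0)                     # load the left half first

    def _fill(self, start):
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # "" marks positions past the end of the input
            self.buf[start + i] = chunk[i] if i < len(chunk) else ""

    def next_char(self):
        ch = self.buf[self.p2]
        if ch == "":
            return ""                     # end of input
        if self.p2 == self.n - 1:         # end of the left half
            self._fill(self.n)            # refill the right half
            self.p2 += 1
        elif self.p2 == 2 * self.n - 1:   # right end of the buffer
            self._fill(0)                 # refill the left half
            self.p2 = 0                   # wrap around to the left end
        else:
            self.p2 += 1
        return ch

def read_all(text, n=4):
    """Drain a BufferPair over text and return what was read."""
    bp = BufferPair(io.StringIO(text), n)
    out = []
    while (c := bp.next_char()) != "":
        out.append(c)
    return "".join(out)

print(read_all("final := initial + rate * 60"))
```

Each call performs two position tests before advancing, which is the per-character overhead that the sentinel method on the next slide is designed to reduce.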

  37. 2. The sentinel method [Diagram: two halves of N characters each, with pointers p1 and p2] Figure 3.3. Buffer pair with sentinels. Algorithm: p2 := p2 + 1; if p2^ = eof then if p2 is at the end of the first half of the buffer then begin fill the right half of the buffer with N input symbols; p2 := p2 + 1 end

  38. else if p2 is at the right end of the buffer then begin fill the left half of the buffer with N symbols; move p2 to the start of the buffer end else /* stop lexical analysis */ 3.4. Specifying tokens. Rules defining regular expressions: 1. ε is a regular expression denoting the set {ε}. 2. If a is a symbol in Σ, then a is a regular expression denoting the set {a}. 3. If r and s are regular expressions denoting L(r) and L(s), then: a) (r) | (s) is a regular expression denoting L(r) ∪ L(s). b) (r)(s) is a regular expression denoting L(r)L(s). c) (r)* is a regular expression denoting (L(r))*. d) (r) is a regular expression denoting L(r).

  39. Example 3.1. Let Σ = {a, b}. 1. a | b 2. (a b) | (b a) 3. a*. Two equivalent regular expressions r and s are written r = s. 2. Regular definitions. If Σ is the set of basic symbols, then a regular definition is a sequence of definitions of the form: d1 → r1; …; dn → rn. Example 3.2. letter → A | B | … | Z | a | b | … | z; digit → 0 | 1 | … | 9; id → letter (letter | digit)*. Example 3.3. digit → 0 | 1 | … | 9; digits → digit digit*; optional_fraction → . digits | ε; optional_exponent → (E (+ | - | ε) digits) | ε

  40. 3.5. Recognizing tokens. Example 3.4. Given the grammar G: stmt → if exp then stmt | if exp then stmt else stmt | ε; exp → term relop term | term; term → id | num. Regular definitions: if → if; then → then; else → else; relop → < | <= | > | >= | <> | =; id → letter (letter | digit)*; num → digit+ (. digit+ | ε) (E (+ | - | ε) digit+ | ε); delim → blank | tab | newline; ws → delim+. From the regular definitions we build the pattern table for tokens, as in Table 3.3 on page 74.
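The regular definitions of Example 3.4 translate almost one-to-one into Python `re` patterns. The sketch below is an illustration only: the constant names and the helper `classify` are assumptions, and keywords are checked before `id` so that `if`, `then` and `else` are not classified as identifiers.

```python
import re

LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"
ID = rf"{LETTER}(?:{LETTER}|{DIGIT})*"          # letter (letter | digit)*
# digit+ (. digit+ | ε) (E (+ | - | ε) digit+ | ε)
NUM = rf"{DIGIT}+(?:\.{DIGIT}+)?(?:E[+-]?{DIGIT}+)?"
RELOP = r"<=|<>|>=|<|>|="
WS = r"[ \t\n]+"                                 # delim+

# Order matters: keywords come before id.
PATTERNS = [
    ("if", r"if"), ("then", r"then"), ("else", r"else"),
    ("relop", RELOP), ("id", ID), ("num", NUM), ("ws", WS),
]

def classify(lexeme):
    """Return the token name whose pattern matches the whole lexeme."""
    for name, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return name
    return None

for lexeme in ["if", "rate", "60", "6.02E+23", "<>", "1."]:
    print(repr(lexeme), "->", classify(lexeme))
```

The last lexeme, `1.`, matches no pattern because the definition of num requires at least one digit after the decimal point.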

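  41. 3.6. Transition diagrams 1. Description. Figure 3.4. Transition diagram for >= and = (diagram not reproduced). Figure 3.5. Transition diagram recognizing the token relop: from Start, < followed by = returns (relop, LE); < followed by > returns (relop, NE); < followed by any other character returns (relop, LT); = returns (relop, EQ); > followed by = returns (relop, GE); > followed by any other character returns (relop, GT). States marked * retract the input pointer by one character.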
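A transition diagram for relop can be hand-coded as a small function that inspects at most two characters. The sketch below is an assumption-laden illustration: the function name `scan_relop` and its return shape are invented here, and the attribute names GE and GT for `>=` and `>` follow the usual textbook diagram rather than the garbled figure text. The retraction in the starred states is modelled by advancing the position by one character instead of two.

```python
def scan_relop(text, pos=0):
    """Recognize a relational operator starting at text[pos].

    Returns ((token, attribute), new_pos), or None if no relop starts here.
    """
    ch = text[pos] if pos < len(text) else ""
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if ch == "<":
        if nxt == "=":
            return ("relop", "LE"), pos + 2
        if nxt == ">":
            return ("relop", "NE"), pos + 2
        return ("relop", "LT"), pos + 1   # "other": do not consume nxt
    if ch == "=":
        return ("relop", "EQ"), pos + 1
    if ch == ">":
        if nxt == "=":
            return ("relop", "GE"), pos + 2
        return ("relop", "GT"), pos + 1   # "other": do not consume nxt
    return None

for op in ["<", "<=", "<>", "=", ">", ">="]:
    print(op, scan_relop(op))
```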

  42. 3.7. A specification language for lexical analyzers [Diagram: Lex source program lex.l → Lex compiler → lex.yy.c; lex.yy.c → C compiler → a.out; input stream → a.out → sequence of tokens] Figure 3.9. Building a lexical analyzer with Lex. Lex specifications. A Lex program consists of three parts: - Declarations - %% translation rules - %% auxiliary procedures

  43. Notes: - The declarations part contains declarations of constants, variables and regular definitions. - The translation rules part consists of statements of the form: p1 {semantic action 1}; p2 {semantic action 2}; …; pn {semantic action n}. 3.8. Finite automata 1. Nondeterministic finite automata (NFA). Example: Given the NFA with state set S = {0, 1, 2, 3}; Σ = {a, b}; start state s0 = 0; set of accepting states F = {3}.

  44. Table 3.4. Transition table for the NFA of Figure 3.10. An NFA accepts an input string x if and only if there is some path in the diagram from the start state to an accepting state such that the edge labels along the path spell out x. The NFA accepts the string aabb. 2. Deterministic finite automata (DFA). A DFA is a special case of an NFA in which: i) there are no ε-transitions; ii) for each state s and input symbol a, there is at most one edge labeled a leaving s.
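The acceptance rule for an NFA can be illustrated by tracking the set of states reachable after each input symbol. Table 3.4 is not part of this transcript, so the transition table below is an assumption: it is the usual four-state NFA for (a | b)*abb, chosen to be consistent with the slide's S = {0, 1, 2, 3}, F = {3} and the claim that aabb is accepted.

```python
# Assumed transition table (not taken from the transcript):
# state 0 loops on a and b, and the path 0 -a-> 1 -b-> 2 -b-> 3 spells abb.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}

def nfa_accepts(x):
    """True if some path labelled x leads from START to an accepting state."""
    current = {START}
    for symbol in x:
        # Follow every edge labelled `symbol` out of every current state.
        current = set().union(*(NFA.get((s, symbol), set()) for s in current))
    return bool(current & ACCEPTING)

print(nfa_accepts("aabb"))
```

The entry for (0, "a") has two targets, so this automaton is not a DFA: it violates restriction ii) above.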

  45. Algorithm 3.1. Simulating a DFA on an input string x. Example 3.5. Figure 3.12. DFA recognizing the language (a | b)*abb (diagram not reproduced). 3. Converting an NFA to a DFA. Algorithm 3.2. Subset construction (building a DFA from an NFA). Input: an NFA N. Output: a DFA D recognizing the same language as N. Method: build the transition table for D. Each state of D is a set of states of N. D simulates in parallel all the moves N can make on a given input string, using the operations: ε-closure(s); ε-closure(T); move(T, a)

  46. Listing 3.2. Subset construction. Algorithm: computing ε-closure. Push all the states in T onto the stack; initialize ε-closure(T) to T. Listing 3.3. Computing ε-closure. Example 3.6. Figure 3.13 shows an NFA recognizing the language (a | b)*abb. We use Algorithm 3.2 to build the equivalent DFA. Figure 3.13. NFA recognizing (a | b)*abb, with states 0 to 10 (diagram not reproduced)
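The subset construction and the stack-based ε-closure can be sketched together. The diagram of Figure 3.13 did not survive in this transcript, so the edge list below is an assumption: it is the standard Thompson-style NFA for (a | b)*abb with states 0 to 10. The empty string stands for ε, and all function names are illustrative.

```python
EPS = ""   # label used in this sketch for ε-edges

# Assumed edges of the NFA for (a|b)*abb with states 0..10.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
NFA_START, NFA_ACCEPT, ALPHABET = 0, 10, "ab"

def eps_closure(states):
    """States reachable from `states` using ε-edges only."""
    stack = list(states)          # push all states of T onto the stack
    closure = set(states)         # initialize ε-closure(T) to T
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol):
    """States reachable from `states` on one edge labelled `symbol`."""
    return {t for s in states for t in NFA.get((s, symbol), ())}

def subset_construction():
    start = eps_closure({NFA_START})
    dtran, seen, unmarked = {}, {start}, [start]
    while unmarked:
        T = unmarked.pop()
        for a in ALPHABET:
            U = eps_closure(move(T, a))
            if not U:
                continue          # no transition on a from T
            dtran[(T, a)] = U
            if U not in seen:
                seen.add(U)
                unmarked.append(U)
    accepting = {T for T in seen if NFA_ACCEPT in T}
    return start, dtran, seen, accepting

def dfa_accepts(x, start, dtran, accepting):
    state = start
    for ch in x:
        state = dtran.get((state, ch))
        if state is None:
            return False
    return state in accepting

start, dtran, dstates, accepting = subset_construction()
print(len(dstates), "DFA states; start =", sorted(start))
```

Under the assumed edges the construction produces five DFA states, with start state {0, 1, 2, 4, 7} and a single accepting state, the one containing NFA state 10.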

  47. 3.9. From regular expressions to NFA. Building an NFA from a regular expression. Algorithm 3.3. Building an NFA from a regular expression (Thompson's construction). Input: a regular expression r over Σ. Output: an NFA recognizing the language L(r). Method: Rules: 1. For ε, build the NFA: start → i, with an ε-edge from i to f. 2. For a in Σ, build the NFA: start → i, with an edge labeled a from i to f.

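  48. 3. Suppose N(s) and N(t) are the NFAs for the regular expressions s and t. - For s | t, build the composite NFA N(s | t): a new start state i with ε-edges to the start states of N(s) and N(t), and ε-edges from their accepting states to a new accepting state f. - For the expression st, build the composite NFA N(st): the start state of N(s) becomes the start state i, the accepting state of N(s) is merged with the start state of N(t), and the accepting state of N(t) becomes f.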

  49. - For the expression s*, build the NFA N(s*): a new start state i and a new accepting state f, with ε-edges from i to the start of N(s), from i to f, from the accepting state of N(s) back to its start, and from the accepting state of N(s) to f. - For the parenthesized expression (s), N(s) itself is the NFA recognizing L(s). Properties of NFAs built by Thompson's construction. Example 3.7. Algorithm 3.4. Simulating an NFA. Input: an NFA N built by Algorithm 3.3 and an input string x. x is terminated by eof; N has start state s0 and set of accepting states F. Output: the algorithm answers true if N accepts x, and false otherwise. Method: Algorithm: Listing 3.4.
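The rules of Thompson's construction compose naturally as small functions, one per rule, each returning an NFA fragment with a single start and a single accepting state. The sketch below is an illustration under assumptions: the `Frag` class, the function names and the use of `None` for ε are invented here, and concatenation links the two fragments with an ε-edge rather than merging states as the textbook rule does (the recognized language is the same). The `accepts` function simulates the resulting NFA on a string in the spirit of Algorithm 3.4.

```python
from itertools import count

EPS = None            # label for ε-edges in this sketch
_ids = count()        # fresh state numbers

class Frag:
    """NFA fragment: one start state, one accepting state, edge list."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(a):                       # rule 2: a in Σ
    i, f = next(_ids), next(_ids)
    return Frag(i, f, [(i, a, f)])

def union(s, t):                     # rule 3: s | t
    i, f = next(_ids), next(_ids)
    return Frag(i, f, s.edges + t.edges + [
        (i, EPS, s.start), (i, EPS, t.start),
        (s.accept, EPS, f), (t.accept, EPS, f)])

def concat(s, t):                    # rule 3: st (ε-link instead of merge)
    return Frag(s.start, t.accept,
                s.edges + t.edges + [(s.accept, EPS, t.start)])

def star(s):                         # rule 3: s*
    i, f = next(_ids), next(_ids)
    return Frag(i, f, s.edges + [
        (i, EPS, s.start), (i, EPS, f),
        (s.accept, EPS, s.start), (s.accept, EPS, f)])

def accepts(nfa, x):
    """Simulate the NFA on x by tracking the current set of states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            u = stack.pop()
            for src, label, dst in nfa.edges:
                if src == u and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({nfa.start})
    for ch in x:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(moved)
    return nfa.accept in current

# (a | b)* a b b
a_or_b = union(symbol("a"), symbol("b"))
nfa = concat(concat(concat(star(a_or_b), symbol("a")), symbol("b")), symbol("b"))
print(accepts(nfa, "aabb"), accepts(nfa, "a"))
```

Each rule adds at most two states, so the number of states grows linearly with the length of the regular expression.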

  50. Example 3.8. Suppose we have the NFA of Figure 3.13 and x is the input string consisting of a. Use Algorithm 3.4 to check whether the NFA accepts x. Result: the algorithm answers false (that is, a is not in the language recognized by the NFA). Time and space needed to recognize an input string: - For a DFA: space O(2^|r|) and time O(|x|). - For an NFA: space O(|r|) and time O(|r| × |x|). 3.10. Building a DFA directly from a regular expression and optimizing pattern matching 1. Important states of an NFA. A state is important if it has a non-ε transition leaving it. Two sets of states that contain the same important states can therefore be identified with each other. In an NFA built by Thompson's construction, the accepting state has no outgoing transitions, so it is not an important state (although it really is very important). To avoid this problem, we append the symbol # to the regular expression, so that the accepting state has a transition on #. When the subset construction is complete, any state with a transition on # is an accepting state.
