
Lex

Lex. COP 3401, Fall 2009. Lex is a tool for building lexical analyzers (lexers). A lexer (scanner) performs lexical analysis, that is, it breaks an input stream into meaningful units called tokens. For example, consider breaking a text file up into individual words.

quiana

Presentation Transcript


  1. Lex COP 3401, Fall 2009

  2. What is Lex?
     • A tool for building lexical analyzers (lexers).
     • A lexer (scanner) performs lexical analysis: breaking up an input stream into meaningful units, or tokens.
     • E.g., consider breaking a text file up into individual words.

  3. Usage
     • Lex source program → Lex → lex.yy.c
     • lex.yy.c → C compiler → a.out
     • input → a.out → tokens

  4. Lex & Yacc Together

  5. Skeleton of a lex specification (.l file)
         %{
         < C global variables, prototypes, comments >
         %}
         [DEFINITION SECTION]
         %%
         [RULES SECTION]
         %%
         < C auxiliary subroutines >
     • *.c is generated after running Lex on x.l.
     • %{ ... %}: this part will be embedded into *.c.
     • Definition section: substitutions, code and start states; will be copied into *.c.
     • Rules section: defines how to scan and what action to take for each token.
     • Auxiliary subroutines: any user code, for example a main function to call the scanning function yylex().
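
The skeleton above could be filled in along these lines. This is a minimal sketch, assuming flex or a POSIX-compatible lex; the file name count.l, the counters, and the LETTER definition are hypothetical names chosen for illustration and do not come from the slides.

```lex
%{
/* C declarations: embedded verbatim into the generated lex.yy.c.
   The counters are hypothetical, chosen only for this sketch.
   LETTER below is a definition-section substitution. */
#include <stdio.h>

int word_count = 0;
int line_count = 0;
%}

LETTER  [A-Za-z]

%%
{LETTER}+   { word_count++; /* one or more letters form a word */ }
\n          { line_count++; }
.           { /* ignore any other character */ }
%%

/* Auxiliary subroutines: user code copied to the end of lex.yy.c. */
int yywrap(void) { return 1; }   /* 1 means there is no further input */

int main(void)
{
    yylex();   /* run the scanner until end of input */
    printf("%d words, %d lines\n", word_count, line_count);
    return 0;
}
```

Following the usage slide, this would be built with `lex count.l` and then `cc lex.yy.c -o count`. Because the sketch defines its own yywrap() and main(), linking against the lex library should not be needed. Given the single input line `hello world`, it prints `2 words, 1 lines`.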

  6. The rules section
         %%
         [RULES SECTION]
         <pattern> { <action to take when matched> }
         <pattern> { <action to take when matched> }
         …
         %%
     Patterns are specified by regular expressions. For example:
         %%
         [A-Za-z]* { printf("this is a word"); }
         %%
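
A sketch of a complete specification built around the word rule above, assuming flex or a POSIX-compatible lex. It uses `+` rather than `*` so that the pattern cannot match the empty string; that change, and the yywrap() and main() definitions, are additions for illustration.

```lex
%{
#include <stdio.h>
%}
%%
[A-Za-z]+   { printf("this is a word"); /* replace each word with a message */ }
%%
int yywrap(void) { return 1; }   /* stop at end of input */

int main(void)
{
    yylex();   /* scan standard input */
    return 0;
}
```

Input that no rule matches is copied to the output unchanged by the default rule, so the input line `hello, world` produces `this is a word, this is a word`.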

  7. Input: Output:

  8. Regular Expression Basics
     • .    : matches any single character except \n
     • *    : matches 0 or more instances of the preceding regular expression
     • +    : matches 1 or more instances of the preceding regular expression
     • ?    : matches 0 or 1 of the preceding regular expression
     • |    : matches the preceding or following regular expression
     • [ ]  : defines a character class
     • ( )  : groups the enclosed regular expression into a new regular expression
     • "…"  : matches everything within the " " literally

  9. Lex Reg Exp (cont)
     • x|y     : x or y
     • {i}     : definition of i
     • x/y     : x, only if followed by y (y not removed from input)
     • x{m,n}  : m to n occurrences of x
     • ^x      : x, but only at beginning of line
     • x$      : x, but only at end of line
     • "s"     : exactly what is in the quotes (except for "\" and the following character)
     A regular expression finishes with a space, tab or newline.

  10. Meta-characters
      • Meta-characters do not match themselves, because they are used in the preceding regular expressions:
        ( ) [ ] { } < > + / , ^ * | . \ " $ ? - %
      • To match a meta-character, prefix it with "\".
      • To match a backslash, tab or newline, use \\, \t, or \n.
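
Several of the operators from the last three slides can be seen together in one specification. The following is a sketch only, assuming flex or a POSIX-compatible lex; the DIGIT definition, the choice of tokens, and the printed labels are hypothetical.

```lex
%{
#include <stdio.h>
%}

DIGIT   [0-9]

%%
^"#".*          { printf("comment line\n");          /* ^ anchors at line start, "#" is quoted */ }
{DIGIT}+/"%"    { printf("percentage: %s\n", yytext); /* x/y: the % stays in the input */ }
{DIGIT}+        { printf("number: %s\n", yytext);     /* {DIGIT} expands the definition */ }
"+"|\*          { printf("operator: %s\n", yytext);   /* quoting and \ both escape meta-characters */ }
[ \t\n]+        { /* skip white space */ }
.               { printf("other: %s\n", yytext); }
%%
int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    return 0;
}
```

For the input line `50% + 7`, the trailing-context rule reports `percentage: 50` and leaves the `%` to be matched separately as `other: %`, followed by `operator: +` and `number: 7`.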

  11. Regular Expression Examples
      • an integer: 12345
        [1-9][0-9]*
      • a word: cat
        [a-zA-Z]+
      • a (possibly) signed integer: 12345 or -12345
        [-+]?[1-9][0-9]*
      • a floating point number: 1.2345
        [0-9]*"."[0-9]+
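
The example patterns above can be tried out in a small classifier. This is a sketch assuming flex or a POSIX-compatible lex; the printed labels and the white-space and catch-all rules are additions for illustration.

```lex
%{
#include <stdio.h>
%}
%%
[-+]?[1-9][0-9]*    { printf("integer: %s\n", yytext); }
[0-9]*"."[0-9]+     { printf("float: %s\n", yytext); }
[a-zA-Z]+           { printf("word: %s\n", yytext); }
[ \t\n]+            { /* skip white space */ }
.                   { printf("unrecognized: %s\n", yytext); }
%%
int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    return 0;
}
```

For the input `cat 12345 -12345 1.2345` this reports a word, two integers, and a float. Note that the integer pattern as given on the slide requires a nonzero leading digit, so a lone `0` falls through to the catch-all rule.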

  12. Lex Regular Expressions
      Lex uses an extended form of regular expression (c: character; x, y: regular expressions; s: string; m, n: integers; i: identifier).
      • c       : any character except meta-characters (see the Meta-characters slide)
      • [...]   : the list of enclosed chars (may be a range)
      • [^...]  : the list of chars not enclosed
      • .       : any ASCII char except newline
      • xy      : concatenation of x and y
      • x*      : zero or more occurrences of x
      • x+      : one or more occurrences of x (i.e. x* but not the empty string)
      • x?      : an optional x (same as x or the empty string)

  13. Rules
      • Lex matches a given input character or string only once; matched text is consumed and not rescanned.
      • Lex executes the action for the longest possible match for the current input.
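
The longest-match rule can be illustrated with a keyword and an identifier pattern. This is a sketch assuming flex or a POSIX-compatible lex; the token labels are hypothetical. It also relies on a tie-breaking rule that the slide does not state: when two patterns match text of the same length, the rule listed first is chosen.

```lex
%{
#include <stdio.h>
%}
%%
if          { printf("keyword: %s\n", yytext); }
[a-z]+      { printf("identifier: %s\n", yytext); }
[ \t\n]+    { /* skip white space */ }
%%
int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    return 0;
}
```

For the input `if iffy`, the first token matches both patterns with length 2, so the earlier `if` rule wins and prints `keyword: if`. The second token is matched by `[a-z]+` with length 4, which is longer than the 2 characters `if` could match, so it prints `identifier: iffy`.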

  14. Regular Expression Examples
      • a delimiter for an English sentence
        "."|"?"|! OR [.?!]
      • C++ comment: // call foo() here!!
        "//".*
      • white space
        [ \t]+
      • English sentence: Look at this!
        ([ \t]+|[a-zA-Z]+)+("."|"?"|!)
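
Two of the patterns above, the C++ comment and the English sentence, could be combined into a small scanner. This is a sketch assuming flex or a POSIX-compatible lex; the sentences counter and the catch-all rule are hypothetical additions.

```lex
%{
#include <stdio.h>

int sentences = 0;   /* hypothetical counter, not from the slides */
%}
%%
"//".*                              { printf("comment: %s\n", yytext); }
([ \t]+|[a-zA-Z]+)+("."|"?"|!)      { sentences++; /* words and blanks ending in . ? or ! */ }
.|\n                                { /* skip anything else */ }
%%
int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    printf("%d sentence(s)\n", sentences);
    return 0;
}
```

Note that the patterns contain no unquoted spaces around `|`, since a space would end the regular expression. For the single input line `Look at this!` the scanner prints `1 sentence(s)`.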

  15. Special Functions
      • yytext: where the text matched most recently is stored
      • yyleng: number of characters in the text most recently matched
      • yylval: associated value of the current token
      • yymore(): append the next string matched to the current contents of yytext
      • yyless(n): remove from yytext all but the first n characters
      • unput(c): return character c to the input stream
      • yywrap(): may be replaced by the user. It is called by the lexical analyser whenever it inputs an EOF as the first character when trying to match a regular expression.
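
A few of these can be seen in action in a short specification. This is a sketch assuming flex or a POSIX-compatible lex; the rules and messages are hypothetical and cover only yytext, yyleng, yyless() and a user-supplied yywrap().

```lex
%{
#include <stdio.h>
%}
%%
[a-z]+      { printf("%s has %d characters\n", yytext, (int)yyleng); }
[0-9]+      {
                /* Keep only the first digit in yytext; the remaining
                   digits are returned to the input and scanned again. */
                if (yyleng > 1)
                    yyless(1);
                printf("digit: %s\n", yytext);
            }
.|\n        { /* skip anything else */ }
%%
int yywrap(void) { return 1; }   /* user replacement: report that input is finished */

int main(void)
{
    yylex();
    return 0;
}
```

For the input `cat 123`, the first rule prints `cat has 3 characters`. The second rule then matches `123`, trims it to `1` with yyless(1), and is applied again to the returned `23`, so the digits are reported one at a time.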
