Introduction to Lex

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Introduction to Lex: 

Introduction to Lex Ying-Hung Jiang compiler@csie.ntu.edu.tw

Outlines: 

Outlines Review of scanner Introduction to flex Regular Expression Cooperate with Yacc Resources

Review: 

Review Parser -> scanner -> source code See teacher’s slide

flex: 

flex

flex - fast lexical analyzer generator: 

flex - fast lexical analyzer generator Flex is a tool for generating scanners. Flex source is a table of regular expressions and corresponding program fragments. Generates lex.yy.c which defines a routine yylex()

Format of the Input File: 

Format of the Input File The flex input file consists of three sections, separated by a line with just %% in it: definitions %% rules %% user code

Definitions Section: 

Definitions Section The definitions section contains declarations of simple name definitions to simplify the scanner specification. Name definitions have the form: name definition Example: DIGIT [0-9] ID [a-z][a-z0-9]*

Rules Section: 

Rules Section The rules section of the flex input contains a series of rules of the form: pattern action Example: {ID} printf( "An identifier: %s\n", yytext ); The yytext and yylength variable. If action is empty, the matched token is discarded.

Action: 

Action If the action contains a ‘{‘, the action spans till the balancing ‘}‘ is found, as in C. An action consisting only of a vertical bar ('|') means "same as the action for the next rule.“ The return statement, as in C. In case no rule matches: simply copy the input to the standard output (A default rule).

Precedence Problem: 

Precedence Problem For example: a “<“ can be matched by “<“ and “<=“. The one matching most text has higher precedence. If two or more have the same length, the rule listed first in the flex input has higher precedence.

User Code Section: 

User Code Section The user code section is simply copied to lex.yy.c verbatim. The presence of this section is optional; if it is missing, the second %% in the input file may be skipped. In the definitions and rules sections, any indented text or text enclosed in %{ and %} is copied verbatim to the output (with the %{}'s removed).

A Simple Example: 

A Simple Example %{ int num_lines = 0, num_chars = 0; %} %% \n ++num_lines; ++num_chars; . ++num_chars; %% main() { yylex(); printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars ); }

Any Question So Far?: 

Any Question So Far?

Regular Expression: 

Regular Expression

Regular Expression (1/3): 

Regular Expression (1/3) x match the character 'x' . any character (byte) except newline [xyz] a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z' [abj-oZ] a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z' [^A-Z] a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter. [^A-Z\n] any character EXCEPT an uppercase letter or a newline

Regular Expression (2/3): 

Regular Expression (2/3) r* zero or more r's, where r is any regular expression r+ one or more r's r? zero or one r's (that is, "an optional r") r{2,5} anywhere from two to five r's r{2,} two or more r's r{4} exactly 4 r's {name} the expansion of the "name" definition (see above) "[xyz]\"foo“ the literal string: [xyz]"foo \X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C interpretation of \x. Otherwise, a literal 'X' (used to escape operators such as '*')

Regular Expression (3/3): 

Regular Expression (3/3) \0 a NUL character (ASCII code 0) \123 the character with octal value 123 \x2a the character with hexadecimal value 2a (r) match an r; parentheses are used to override precedence (see below) rs the regular expression r followed by the regular expression s; called "concatenation" r|s either an r or an s ^r an r, but only at the beginning of a line (i.e., which just starting to scan, or right after a newline has been scanned). r$ an r, but only at the end of a line (i.e., just before a newline). Equivalent to "r/\n".

Cooperate with Yacc: 

Cooperate with Yacc The y.tab.h file (generated by –d option). See pascal.l and pascal.y for example.

Resources: 

Resources Google directory of lexer and parser generators. Flex homepage: http://www.gnu.org/software/flex Lex/yacc Win32 port: http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html Above links are available on course webpage. Flex(1)

Any Question?: 

Any Question?