T3 lexer

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Modern Compiler Design: 

Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University gretay@post.tau.ac.il http://www.cs.tau.ac.il/~gretay Modern Compiler Design T3 - Lexer

Today: 

Today Goals: how to use FLEX ? learn how to use flex output Today: understand flex input structure of PA2 Lexical Analysis FLEX Syntax Analysis Parsing AST Symbol Table etc. Inter. Rep. (IR) Code Gen.

Designing Lexer: 

Designing Lexer choose finite set of tokens programming language, parser a token corresponds to sets of strings rules to describe a lexeme IF token “if” or “i” followed by “f”

Lexical Analysis with FLEX: 

Lexical Analysis with FLEX FLEX is Fast Lexical Analyzer Generator recognized lexical patterns in text lexer, scanner Input: spec file Output: a lexical analyzer C++ program FLEX g++ spec Lexical analyzer cool source code tokens c++ source

Structure of PA2: 

Structure of PA2 FLEX g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc lextest.cc cool-parse.h

FLEX Spec File: 

FLEX Spec File definitions %% rules %% user code

FLEX Spec File: 

Possible source of g++ errors down the road FLEX Spec File %{ declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% DIGIT [0-9] LETTER [a-zA-Z] ID {LETTER}({LETTER}|{DIGIT})* INITIAL {LETTER} ({LETTER}|{DIGIT})* user code - functions main() { … yylex(); … } error_handler() {…} %{ declarations – user code local to yylex() %}

Remarks: 

Remarks %{ %} definitions section declare global variable rule section before the first rule declare variables local to yylex() executed whenever the scanning routine is entered Any text not matched by a flex rule is copied to the output

User Code - definitions: 

User Code - definitions %{ #include <cool-parse.h> #include <stringtab.h> #include <utilities.h> #define MAX_STR_CONST 1025 char string_buf[MAX_STR_CONST]; extern int curr_lineno; %}

Structure of PA2: 

Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.cc lextest.cc stringtab.h

User Code - definitions: 

User Code - definitions %{ extern FILE *fin; #undef YY_INPUT #define YY_INPUT(buf,result,max_size) \ if ( (result = fread( (char*)buf,\ sizeof(char), \ max_size, fin)) \ < 0) \ YY_FATAL_ERROR( "read() in flex failed"); #define yylval cool_yylval #define yylex cool_yylex extern YYSTYPE cool_yylval; %}

FLEX Spec File: 

FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}

FLEX Definitions: 

FLEX Definitions Definitions of start condition %x list-of-start-conditions %s list-of-start-conditions Macro definitions Macro-name regular-expr

Start Conditions: 

Start Conditions %s inclusive start conditions rules with no start conditions are also activated %x exclusive start conditions only rules with the start condition are activated INITIAL BEGIN start-condition in action definition activates rules with start condition until next BEGIN EOF retains within the same start condition

%s vs. %x by example: 

%s vs. %x by example %s example %% <example>foo do_something(); bar something_else(); is equivalent to %x example %% <example>foo do_something(); <INITIAL,example>bar something_else(); is equivalent to %x example %% <example>foo do_something(); <*>bar something_else();

Why using start conditions ?: 

Why using start conditions ? %x independent mini-scanner inside scanner scan syntactically different portion of the input tokenize according to context comments (* *) quote string “ ”

Example Macros: 

Example Macros ALPHA [A-Za-z_] DIGIT [0-9] ALPHA_NUMERIC {ALPHA}|{DIGIT} ID {ALPHA}({ALPHA_NUMERIC})* NUMBER ({DIGIT})+ WHITE_SPACE ([\ \n\r\t\f])+

Regular Expressions: 

Regular Expressions

FLEX Spec File: 

FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}

Lexical Analysis Rules: 

Lexical Analysis Rules Rule structure <states> regexp { C++ code action } Priority for rule matching longest string More than one match for same length – priority for rule appearing first ! Important: rules given in a flex specification should match all possible input !

Action Body: 

Action Body C++ code Can use special methods and vars ECHO - copy yytext to the output REJECT - proceed on to the "second best" rule which matched the input or a prefix of the input #define yywrap() 1 yytext, yyleng, yyin, yyterminate(), yymore(), etc. Lexer state transition BEGIN, YY_START INITIAL

FLEX Spec File: 

FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}

Running the Lexer: 

Running the Lexer %% main( int argc, char *argv ) { ++argv, --argc; /* skip over program name */ if ( argc > 0 ) yyin = fopen( argv[0], "r" ); else yyin = stdin; yylex(); } (Just for testing Lexer as stand-alone program)

Putting it all together…: 

Putting it all together… %{ #include “tokens.h” int line_count = 0; %} Letter [a-zA-Z_] Digit [0-9] %% [ \t] {;} [\n] {line_count++;} “;” { return SemiColumn;} “++” { return PlusPlus ;} “+=” { return PlusEqual ;} “+” { return Plus} “while” { return While ; } {Letter}({Letter}|{Digit})* { yylval.sym = strdup( yytext ); return Id ;} “<=” { return LessOrEqual;} “<” { return LessThen ;} %% main (){ yylex(); }

Structure of PA2: 

Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.h YYSTYPE cool_yylval; int curr_lineno; FILE *fin; int cool_yylex(); lextest.cc stringtab.h

cool-parse.h: 

cool-parse.h Defines token constant ids Tells parser what is the token returned by lexer Actual value doesn’t matter Actually, generated by bison ?