logging in or signing up T3 lexer Mikhail Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 852 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 28, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Modern Compiler Design: Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University gretay@post.tau.ac.il http://www.cs.tau.ac.il/~gretay Modern Compiler Design T3 - LexerToday: Today Goals: how to use FLEX ? learn how to use flex output Today: understand flex input structure of PA2 Lexical Analysis FLEX Syntax Analysis Parsing AST Symbol Table etc. Inter. Rep. (IR) Code Gen.Designing Lexer: Designing Lexer choose finite set of tokens programming language, parser a token corresponds to sets of strings rules to describe a lexeme IF token “if” or “i” followed by “f” Lexical Analysis with FLEX: Lexical Analysis with FLEX FLEX is Fast Lexical Analyzer Generator recognized lexical patterns in text lexer, scanner Input: spec file Output: a lexical analyzer C++ program FLEX g++ spec Lexical analyzer cool source code tokens c++ sourceStructure of PA2: Structure of PA2 FLEX g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc lextest.cc cool-parse.hFLEX Spec File: FLEX Spec File definitions %% rules %% user code FLEX Spec File: Possible source of g++ errors down the road FLEX Spec File %{ declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% DIGIT [0-9] LETTER [a-zA-Z] ID {LETTER}({LETTER}|{DIGIT})* INITIAL {LETTER} ({LETTER}|{DIGIT})* user code - functions main() { … yylex(); … } error_handler() {…} %{ declarations – user code local to yylex() %}Remarks: Remarks %{ %} definitions section declare global variable rule section before the first rule declare variables local to yylex() executed whenever the scanning routine is entered Any text not matched by a flex rule is copied to the output User Code - definitions: User Code - definitions %{ #include <cool-parse.h> #include <stringtab.h> #include <utilities.h> #define MAX_STR_CONST 1025 char string_buf[MAX_STR_CONST]; extern int curr_lineno; %}Structure of PA2: Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.cc lextest.cc stringtab.hUser Code - definitions: User Code - definitions %{ extern FILE *fin; #undef YY_INPUT #define YY_INPUT(buf,result,max_size) \ if ( (result = fread( (char*)buf,\ sizeof(char), \ max_size, fin)) \ < 0) \ YY_FATAL_ERROR( "read() in flex failed"); #define yylval cool_yylval #define yylex cool_yylex extern YYSTYPE cool_yylval; %}FLEX Spec File: FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}FLEX Definitions: FLEX Definitions Definitions of start condition %x list-of-start-conditions %s list-of-start-conditions Macro definitions Macro-name regular-exprStart Conditions: Start Conditions %s inclusive start conditions rules with no start conditions are also activated %x exclusive start conditions only rules with the start condition are activated INITIAL BEGIN start-condition in action definition activates rules with start condition until next BEGIN EOF retains within the same start condition%s vs. %x by example: %s vs. %x by example %s example %% <example>foo do_something(); bar something_else(); is equivalent to %x example %% <example>foo do_something(); <INITIAL,example>bar something_else(); is equivalent to %x example %% <example>foo do_something(); <*>bar something_else(); Why using start conditions ?: Why using start conditions ? %x independent mini-scanner inside scanner scan syntactically different portion of the input tokenize according to context comments (* *) quote string “ ” Example Macros: Example Macros ALPHA [A-Za-z_] DIGIT [0-9] ALPHA_NUMERIC {ALPHA}|{DIGIT} ID {ALPHA}({ALPHA_NUMERIC})* NUMBER ({DIGIT})+ WHITE_SPACE ([\ \n\r\t\f])+ Regular Expressions: Regular ExpressionsFLEX Spec File: FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}Lexical Analysis Rules: Lexical Analysis Rules Rule structure <states> regexp { C++ code action } Priority for rule matching longest string More than one match for same length – priority for rule appearing first ! Important: rules given in a flex specification should match all possible input ! Action Body: Action Body C++ code Can use special methods and vars ECHO - copy yytext to the output REJECT - proceed on to the "second best" rule which matched the input or a prefix of the input #define yywrap() 1 yytext, yyleng, yyin, yyterminate(), yymore(), etc. Lexer state transition BEGIN, YY_START INITIALFLEX Spec File: FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}Running the Lexer: Running the Lexer %% main( int argc, char *argv ) { ++argv, --argc; /* skip over program name */ if ( argc > 0 ) yyin = fopen( argv[0], "r" ); else yyin = stdin; yylex(); } (Just for testing Lexer as stand-alone program)Putting it all together…: Putting it all together… %{ #include “tokens.h” int line_count = 0; %} Letter [a-zA-Z_] Digit [0-9] %% [ \t] {;} [\n] {line_count++;} “;” { return SemiColumn;} “++” { return PlusPlus ;} “+=” { return PlusEqual ;} “+” { return Plus} “while” { return While ; } {Letter}({Letter}|{Digit})* { yylval.sym = strdup( yytext ); return Id ;} “<=” { return LessOrEqual;} “<” { return LessThen ;} %% main (){ yylex(); }Structure of PA2: Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.h YYSTYPE cool_yylval; int curr_lineno; FILE *fin; int cool_yylex(); lextest.cc stringtab.hcool-parse.h: cool-parse.h Defines token constant ids Tells parser what is the token returned by lexer Actual value doesn’t matter Actually, generated by bison ? You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
T3 lexer Mikhail Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 852 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 28, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Modern Compiler Design: Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University gretay@post.tau.ac.il http://www.cs.tau.ac.il/~gretay Modern Compiler Design T3 - LexerToday: Today Goals: how to use FLEX ? learn how to use flex output Today: understand flex input structure of PA2 Lexical Analysis FLEX Syntax Analysis Parsing AST Symbol Table etc. Inter. Rep. (IR) Code Gen.Designing Lexer: Designing Lexer choose finite set of tokens programming language, parser a token corresponds to sets of strings rules to describe a lexeme IF token “if” or “i” followed by “f” Lexical Analysis with FLEX: Lexical Analysis with FLEX FLEX is Fast Lexical Analyzer Generator recognized lexical patterns in text lexer, scanner Input: spec file Output: a lexical analyzer C++ program FLEX g++ spec Lexical analyzer cool source code tokens c++ sourceStructure of PA2: Structure of PA2 FLEX g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc lextest.cc cool-parse.hFLEX Spec File: FLEX Spec File definitions %% rules %% user code FLEX Spec File: Possible source of g++ errors down the road FLEX Spec File %{ declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% DIGIT [0-9] LETTER [a-zA-Z] ID {LETTER}({LETTER}|{DIGIT})* INITIAL {LETTER} ({LETTER}|{DIGIT})* user code - functions main() { … yylex(); … } error_handler() {…} %{ declarations – user code local to yylex() %}Remarks: Remarks %{ %} definitions section declare global variable rule section before the first rule declare variables local to yylex() executed whenever the scanning routine is entered Any text not matched by a flex rule is copied to the output User Code - definitions: User Code - definitions %{ #include <cool-parse.h> #include <stringtab.h> #include <utilities.h> #define MAX_STR_CONST 1025 char string_buf[MAX_STR_CONST]; extern int curr_lineno; %}Structure of PA2: Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.cc lextest.cc stringtab.hUser Code - definitions: User Code - definitions %{ extern FILE *fin; #undef YY_INPUT #define YY_INPUT(buf,result,max_size) \ if ( (result = fread( (char*)buf,\ sizeof(char), \ max_size, fin)) \ < 0) \ YY_FATAL_ERROR( "read() in flex failed"); #define yylval cool_yylval #define yylex cool_yylex extern YYSTYPE cool_yylval; %}FLEX Spec File: FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}FLEX Definitions: FLEX Definitions Definitions of start condition %x list-of-start-conditions %s list-of-start-conditions Macro definitions Macro-name regular-exprStart Conditions: Start Conditions %s inclusive start conditions rules with no start conditions are also activated %x exclusive start conditions only rules with the start condition are activated INITIAL BEGIN start-condition in action definition activates rules with start condition until next BEGIN EOF retains within the same start condition%s vs. %x by example: %s vs. %x by example %s example %% <example>foo do_something(); bar something_else(); is equivalent to %x example %% <example>foo do_something(); <INITIAL,example>bar something_else(); is equivalent to %x example %% <example>foo do_something(); <*>bar something_else(); Why using start conditions ?: Why using start conditions ? %x independent mini-scanner inside scanner scan syntactically different portion of the input tokenize according to context comments (* *) quote string “ ” Example Macros: Example Macros ALPHA [A-Za-z_] DIGIT [0-9] ALPHA_NUMERIC {ALPHA}|{DIGIT} ID {ALPHA}({ALPHA_NUMERIC})* NUMBER ({DIGIT})+ WHITE_SPACE ([\ \n\r\t\f])+ Regular Expressions: Regular ExpressionsFLEX Spec File: FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}Lexical Analysis Rules: Lexical Analysis Rules Rule structure <states> regexp { C++ code action } Priority for rule matching longest string More than one match for same length – priority for rule appearing first ! Important: rules given in a flex specification should match all possible input ! Action Body: Action Body C++ code Can use special methods and vars ECHO - copy yytext to the output REJECT - proceed on to the "second best" rule which matched the input or a prefix of the input #define yywrap() 1 yytext, yyleng, yyin, yyterminate(), yymore(), etc. Lexer state transition BEGIN, YY_START INITIALFLEX Spec File: FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %}Running the Lexer: Running the Lexer %% main( int argc, char *argv ) { ++argv, --argc; /* skip over program name */ if ( argc > 0 ) yyin = fopen( argv[0], "r" ); else yyin = stdin; yylex(); } (Just for testing Lexer as stand-alone program)Putting it all together…: Putting it all together… %{ #include “tokens.h” int line_count = 0; %} Letter [a-zA-Z_] Digit [0-9] %% [ \t] {;} [\n] {line_count++;} “;” { return SemiColumn;} “++” { return PlusPlus ;} “+=” { return PlusEqual ;} “+” { return Plus} “while” { return While ; } {Letter}({Letter}|{Digit})* { yylval.sym = strdup( yytext ); return Id ;} “<=” { return LessOrEqual;} “<” { return LessThen ;} %% main (){ yylex(); }Structure of PA2: Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.h YYSTYPE cool_yylval; int curr_lineno; FILE *fin; int cool_yylex(); lextest.cc stringtab.hcool-parse.h: cool-parse.h Defines token constant ids Tells parser what is the token returned by lexer Actual value doesn’t matter Actually, generated by bison ?