jflex tutorial
Uploaded by Charlo. Added: November 27, 2007. License: All Rights Reserved.

JFlex
A lexer is basically a finite-state "transducer" plus bells and whistles.
- Arbitrary Java code can be attached to state transitions as actions.
- You specify the transducer in a .flex file, and JFlex compiles it into a .java file.
- By default, JFlex gives you convenience methods, such as yytext(), for accessing the results of state transitions.

A simple task
Charniak's statistical parser takes input sentences delimited by <s>...</s>. Suppose we want to take a Reader over such input and get back a Tokenizer over its tokens that returns Word objects, plus a special end-of-sentence token. For example, given

    garbage garbage garbage <s>Stocks skyrocketed on news that investigation of Cheney ’s energy taskforce was dropped . </s>more garbage

we want the tokens Stocks, skyrocketed, ..., dropped, and ".", followed by the end-of-sentence token. Everything outside <s>...</s> is discarded.

edu.stanford.nlp.process.AbstractTokenizer

    ...
    /**
     * Internally fetches the next token.
     *
     * @return the next token in the token
     * stream, or null if none exists.
     */
    protected abstract Object getNext();
    ...

Lexical Rules
Basically, you're specifying a finite-state automaton* with actions associated with state transitions.
*Though you are not strictly limited by FSA expressivity.

Schematic .flex file

    {user code}
    %%
    {options and declarations}
    %%
    {lexical rules}

Lexical Rules (schematic)

    <YYINITIAL> {
      {BeginSentence}  { yybegin(SENTENCE); }
      {WhiteSpace}     { /* ignore */ }
      .                { /* ignore */ }
    }
    <SENTENCE> {
      {EndSentence}    { yybegin(YYINITIAL); return SENTENCE_BOUNDARY; }
      {Token}          { return new Word(yytext()); }
      {WhiteSpace}     { /* ignore */ }
    }

An action that does not return simply lets the lexer keep scanning, so there is no need to call yylex() recursively; a recursive call adds a stack frame per ignored match and can overflow the stack on a long stretch of garbage.

Lexical Rules (detail)

    <YYINITIAL> {
      {BeginSentence} / .*  { yybegin(SENTENCE); }
      ...
    }
    <SENTENCE> {
      {EndSentence} / .*    { yybegin(YYINITIAL); return SENTENCE_BOUNDARY; }
      {Token}               { return new Word(yytext()); }
      ...
    }

The trailing context "/ .*" decides conflicts with {Token}. JFlex chooses the rule with the longest match, counting trailing context, although yytext() excludes that context. On input such as "</s>more garbage", {Token} alone would match "</s>more", which is longer than "</s>"; with trailing context, {EndSentence} matches through the end of the line and wins.

Options and declarations: States and Macros
Macros can be used to define other macros, and the order of macro definitions is irrelevant. Since < and > are JFlex metacharacters, they are escaped below.

    %state SENTENCE

    SentenceLetter = s
    BeginSentence  = \<{SentenceLetter}\>
    EndSentence    = \<\/{SentenceLetter}\>
    WhiteSpace     = [ \t\r\n\f]
    Token          = [^ \t\r\n\f]+

Other options and declarations
%type sets the return type of yylex(), and the %eofval block supplies the value returned at end of input, so getNext() returns null once the stream is exhausted.

    %class CharniakTokenizer
    %implements Tokenizer
    %extends AbstractTokenizer
    %unicode
    %type Object
    %eofval{
      return null;
    %eofval}

Options & declarations: class-internal code (1)

    %{
      static final Word SENTENCE_BOUNDARY = new Word("SENTENCE_BOUNDARY");

      public Object getNext() {
        try {
          return yylex();
        } catch (IOException e) {
          return null;
        }
      }
      ...
    %}

Options & declarations: class-internal code (2)

    %{
      ...
      public static void main(String[] args) throws IOException {
        Reader r = new FileReader(args[0]);
        Tokenizer t = new CharniakTokenizer(r);
        while (t.hasNext()) {
          System.out.println(t.next());
        }
      }
    %}

User code inserted directly into the file

    package rog;

    import java.util.*;
    import java.io.*;
    import edu.stanford.nlp.ling.Word;
    import edu.stanford.nlp.process.*;

    /** A lexer for Charniak input sentences
     * @author Roger Levy
     */

Beyond FSA expressivity
Because actions are arbitrary Java code with access to class fields, a lexer can keep state that no finite automaton has, such as an unbounded counter:

    %class ParenCounter
    %{
      private int numParens = 0;
    %}
    ...
    %%
    ...
    <YYINITIAL> {
      \(  { numParens++; return yytext(); }
      \)  { if (numParens == 0)
              throw new RuntimeException("error – too many close parens!");
            else {
              numParens--;
              return yytext();
            }
          }
    }
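To make the ParenCounter point concrete without running JFlex, here is a plain-Java sketch of the same counting logic. It is not JFlex-generated code: the class name ParenCounterSketch is hypothetical, and skipping every character other than a parenthesis is an assumption standing in for the rules elided by "...". The int field is what takes the scanner past a finite automaton, since tracking arbitrarily deep nesting would otherwise need unboundedly many states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java analogue of the ParenCounter rules (not generated code). */
public class ParenCounterSketch {
    // Current nesting depth, playing the role of the %{ ... %} field numParens.
    private int numParens = 0;

    /** Returns the parentheses found in the input, updating the depth counter. */
    public List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == '(') {
                numParens++;
                tokens.add("(");
            } else if (c == ')') {
                if (numParens == 0) {
                    throw new RuntimeException("error - too many close parens!");
                }
                numParens--;
                tokens.add(")");
            }
            // Assumption: all other characters are skipped.
        }
        return tokens;
    }

    /** Depth still open after scanning; nonzero means unclosed parens. */
    public int depth() {
        return numParens;
    }

    public static void main(String[] args) {
        ParenCounterSketch p = new ParenCounterSketch();
        System.out.println(p.scan("(a (b) c")); // prints [(, (, )]
        System.out.println(p.depth());          // prints 1
    }
}
```

Scanning "())" instead throws on the second close paren, the same failure the slide's action reports.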
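The CharniakTokenizer specification can also be mirrored by a hand-written two-state transducer, which shows conceptually what the generated lexer does. This is a sketch under several assumptions: the class CharniakSketch is hypothetical, tokens are plain Strings rather than edu.stanford.nlp.ling.Word objects, and the boundary marker is the string "SENTENCE_BOUNDARY". JFlex's generated code is organized quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written analogue of the CharniakTokenizer rules. */
public class CharniakSketch {
    // Stand-in for the Word SENTENCE_BOUNDARY object in the .flex file.
    public static final String SENTENCE_BOUNDARY = "SENTENCE_BOUNDARY";

    // The two lexical states declared with %state.
    private enum State { YYINITIAL, SENTENCE }

    // Same character class as the WhiteSpace macro: [ \t\r\n\f]
    private static boolean isWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
    }

    public static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        State state = State.YYINITIAL;
        int pos = 0;
        while (pos < input.length()) {
            if (state == State.YYINITIAL) {
                if (input.startsWith("<s>", pos)) {
                    state = State.SENTENCE;  // {BeginSentence}: switch state
                    pos += 3;
                } else {
                    pos++;                   // whitespace or garbage: ignore
                }
            } else if (input.startsWith("</s>", pos)) {
                // Checked before Token, like the trailing-context rule.
                out.add(SENTENCE_BOUNDARY);
                state = State.YYINITIAL;
                pos += 4;
            } else if (isWhiteSpace(input.charAt(pos))) {
                pos++;                       // {WhiteSpace}: ignore
            } else {
                int start = pos;             // {Token}: maximal non-whitespace run
                while (pos < input.length() && !isWhiteSpace(input.charAt(pos))) {
                    pos++;
                }
                out.add(input.substring(start, pos));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("garbage <s>Stocks skyrocketed . </s>more garbage"));
        // prints [Stocks, skyrocketed, ., SENTENCE_BOUNDARY]
    }
}
```

As in the .flex rules, "</s>" is recognized only at the start of a token, so input like "dropped.</s>" comes out as a single token; the Charniak format's space before "</s>" avoids this.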