Parsing: Parsing
Outline: 2301373 Chapter 4 Parsing 2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First and follow sets Constructing LL(1) parsing table Error recovery Bottom-up parsing Shift-reduce parsers LR(0) parsing LR(0) items Finite automata of items LR(0) parsing algorithm LR(0) grammar S LR(1) parsing S LR(1) parsing algorithm S LR(1) grammar Parsing conflict
Introduction: 2301373 Chapter 4 Parsing 3 Introduction Parsing is a process that constructs a syntactic structure (i.e. parse tree) from the stream of tokens. We already learn how to describe the syntactic structure of a language using (context-free) grammar. So, a parser only need to do this? Stream of tokens Context-free grammar Parser Parse tree
Top–Down Parsing Bottom–Up Parsing: 2301373 Chapter 4 Parsing 4 Top–Down Parsing Bottom–Up Parsing A parse tree is created from root to leaves The traversal of parse trees is a preorder traversal Tracing leftmost derivation Two types: Backtracking parser Predictive parser A parse tree is created from leaves to root The traversal of parse trees is a reversal of postorder traversal Tracing rightmost derivation More powerful than top-down parsing Try different structures and backtrack if it does not matched the input Guess the structure of the parse tree from the next input
Parse Trees and Derivations: 2301373 Chapter 4 Parsing 5 Parse Trees and Derivations E E + E id + E id + E * E id + id * E id + id * id E E + E E + E * E E + E * id E + id * id id + id * id Top-down parsing Bottom-up parsing id E * E id id + E E E + * id E E id E E id E
Top-down Parsing: 2301373 Chapter 4 Parsing 6 Top-down Parsing What does a parser need to decide? Which production rule is to be used at each point of time ? How to guess? What is the guess based on? What is the next token? Reserved word if, open parentheses, etc. What is the structure to be built? If statement, expression, etc.
Top-down Parsing: 2301373 Chapter 4 Parsing 7 Top-down Parsing Why is it difficult? Cannot decide until later Next token: if Structure to be built: St St MatchedSt | UnmatchedSt UnmatchedSt if ( E ) St| if ( E ) MatchedSt else U nmatchedSt MatchedSt if ( E ) MatchedSt else MatchedSt |... Production with empty string Next token: id Structure to be built: par par parList | parList exp , parList | exp
Recursive-Descent: 2301373 Chapter 4 Parsing 8 Recursive-Descent Write one procedure for each set of productions with the same nonterminal in the LHS Each procedure recognizes a structure described by a nonterminal. A procedure calls other procedures if it need to recognize other structures. A procedure calls match procedure if it need to recognize a terminal.
Recursive-Descent: Example: 2301373 Chapter 4 Parsing 9 Recursive-Descent: Example E E O F | F O + | - F ( E ) | id procedure F { switch token { case (: match(‘(‘); E; match(‘)’); case id: match(id); default: error; } } For this grammar: We cannot decide which rule to use for E, and If we choose E E O F, it leads to infinitely recursive loops. Rewrite the grammar into EBNF procedure E { F; while (token=+ or token=-) { O; F; } } procedure E { E; O; F; } E ::= F {O F} O ::= + | - F ::= ( E ) | id
Match procedure: 2301373 Chapter 4 Parsing 10 Match procedure procedure match(expTok) { if (token==expTok) then getToken else error } The token is not consumed until getToken is executed.
Problems in Recursive-Descent: 2301373 Chapter 4 Parsing 11 Problems in Recursive-Descent Difficult to convert grammars into EBNF Cannot decide which production to use at each point Cannot decide when to use -production A
LL(1) Parsing: 2301373 Chapter 4 Parsing 12 LL(1) Parsing LL(1) Read input from ( L ) left to right Simulate ( L ) leftmost derivation 1 lookahead symbol Use stack to simulate leftmost derivation Part of sentential form produced in the leftmost derivation is stored in the stack. Top of stack is the leftmost nonterminal symbol in the fragment of sentential form.
Concept of LL(1) Parsing: 2301373 Chapter 4 Parsing 13 Concept of LL(1) Parsing Simulate leftmost derivation of the input. Keep part of sentential form in the stack. If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it out of stack. If the symbol on the top of stack is a nonterminal X, replace it with Y if we have a production rule X Y. Which production will be chosen, if there are both X Y and X Z ?
Example of LL(1) Parsing: 2301373 Chapter 4 Parsing 14 Example of LL(1) Parsing ( n + ( n ) ) * n $ $ E E T X X A T X | A + | - T F N N M F N | M * F ( E ) | n T X F N ) E ( T X F N n A T X + F N ( E ) T X F N n M F N * n Finished E TX FNX ( E)NX ( TX)NX ( FNX)NX (n NX)NX (n X)NX (n ATX)NX (n+ TX)NX (n+ FNX)NX (n+( E)NX)NX (n+( TX)NX)NX (n+( FNX)NX)NX (n+(n NX)NX)NX (n+(n X)NX)NX (n+(n) NX)NX (n+(n) X)NX (n+(n)) NX (n+(n)) MFNX (n+(n))* FNX (n+(n))*n NX (n+(n))*n X (n+(n))*n
LL(1) Parsing Algorithm: 2301373 Chapter 4 Parsing 15 LL(1) Parsing Algorithm Push the start symbol into the stack WHILE stack is not empty ($ is not on top of stack) and the stream of tokens is not empty (the next input token is not $) SWITCH (Top of stack, next token) CASE (terminal a, a): Pop stack; Get next token CASE (nonterminal A, terminal a): IF the parsing table entry M[A, a] is not empty THEN Get A X 1 X 2 ... X n from the parsing table entry M[A, a] Pop stack; Push X n ... X 2 X 1 into stack in that order ELSE Error CASE ($,$): Accept OTHER: Error
LL(1) Parsing Table: 2301373 Chapter 4 Parsing 16 LL(1) Parsing Table If the nonterminal N is on the top of stack and the next token is t , which production rule to use? Choose a rule N X such that X * tY or X * and S * WNt Y N Q t … … … X Y t Y t N X
First Set: 2301373 Chapter 4 Parsing 17 First Set Let X be or be in V or T. First( X ) is the set of the first terminal in any sentential form derived from X . If X is a terminal or , then First( X ) ={ X }. If X is a nonterminal and X X 1 X 2 ... X n is a rule, then First( X 1 ) -{ } is a subset of First(X) First( X i )-{ } is a subset of First(X) if for all j<i First( X j ) contains { } is in First(X) if for all j ≤ n First( X j )contains
Examples of First Set: 2301373 Chapter 4 Parsing 18 Examples of First Set exp exp addop term | term addop + | - term term mulop factor | factor mulop * factor (exp) | num First(addop) = {+, -} First(mulop) = {*} First(factor) = {(, num} First(term) = {(, num} First(exp) = {(, num} st ifst | other ifst if ( exp ) st elsepart elsepart else st | exp 0 | 1 First(exp) = {0,1} First(elsepart) = {else, } First(ifst) = {if} First(st) = {if, other}
Algorithm for finding First(A): 2301373 Chapter 4 Parsing 19 Algorithm for finding First(A) F or all terminals a, First(a) = {a} F or all nonterminals A, First(A) := {} W hile there are changes to any First(A) F or each rule A X 1 X 2 ... X n F or each X i in {X 1 , X 2 , …, X n } I f for all j<i First(X j ) contains , T hen add First(X i )-{ } to First(A) I f is in First(X 1 ), First(X 2 ), ..., and First(X n ) T hen a dd to First(A) If A is a terminal or , then First(A) = {A}. If A is a nonterminal, then for each rule A X 1 X 2 ... X n , First(A) contains First(X 1 ) - { }. If also for some i<n, First(X 1 ), First(X 2 ), ..., and First(X i ) contain , then First(A) contains First(X i+1 )-{ }. If First(X 1 ), First(X 2 ), ..., and First(X n ) contain , then First(A) also contains .
Finding First Set: An Example : 2301373 Chapter 4 Parsing 20 Finding First Set: An Example exp term exp’ exp’ addop term exp’ | addop + | - term factor term’ term’ mulop factor term’ | mulop * factor ( exp ) | num First exp exp’ addop term term’ mulop factor + - * ( num + - ( num * ( num
Follow Set: 2301373 Chapter 4 Parsing 21 Follow Set Let $ denote the end of input tokens If A is the start symbol, then $ is in Follow(A). If there is a rule B X A Y, then First(Y) - { } is in Follow(A). If there is production B X A Y and is in First(Y), then Follow(A) contains Follow(B).
Algorithm for Finding Follow(A): 2301373 Chapter 4 Parsing 22 Algorithm for Finding Follow(A) Follow(S) = {$} FOR each A in V-{S} Follow(A)={} WHILE change is made to some Follow sets FOR each production A X 1 X 2 ... X n , FOR each nonterminal X i Add First(X i+1 X i+2 ...X n )-{ } into Follow(X i ). (NOTE: If i=n, X i+1 X i+2 ...X n = ) IF is in First(X i+1 X i+2 ...X n ) THEN Add Follow(A) to Follow(X i ) If A is the start symbol, then $ is in Follow(A). If there is a rule A Y X Z, then First(Z) - { } is in Follow(X). If there is production B X A Y and is in First(Y), then Follow(A) contains Follow(B).
Finding Follow Set: An Example: 2301373 Chapter 4 Parsing 23 Finding Follow Set: An Example exp term exp’ exp’ addop term exp’ | addop + | - term factor term’ term’ mulop factor term’ | mulop * factor ( exp ) | num First exp exp’ addop term term’ mulop factor + - * ( num + - ( num * ( num Follow ) + - $ ( num ( num + - * $ ( num $ * + - $ $ + - $ ) ) ) ) ) )
Constructing LL(1) Parsing Tables: 2301373 Chapter 4 Parsing 24 Constructing LL(1) Parsing Tables FOR each nonterminal A and a production A X FOR each token a in First(X) A X is in M(A, a) IF is in First(X) THEN FOR each element a in Follow(A) Add A X to M(A, a)
Example: Constructing LL(1) Parsing Table: 2301373 Chapter 4 Parsing 25 Example: Constructing LL(1) Parsing Table First Follow exp {(, num} {$,)} exp’ {+,-, } {$,)} addop {+,-} {(,num} term {(,num} {+,-,),$} term’ {*, } {+,-,),$} mulop {*} {(,num} factor {(, num} {*,+,-,),$} 1 exp term exp’ 2 exp’ addop term exp’ 3 exp’ 4 addop + 5 addop - 6 term factor term’ 7 term’ mulop factor term’ 8 term’ 9 mulop * 10 factor ( exp ) 11 factor num ( ) + - * n $ exp exp’ addop term term’ mulop factor 1 1 2 2 3 3 4 5 6 6 7 8 8 8 8 9 10 11
LL(1) Grammar: 2301373 Chapter 4 Parsing 26 LL(1) Grammar A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.
LL(1) Parsing Table for non-LL(1) Grammar: 2301373 Chapter 4 Parsing 27 LL(1) Parsing Table for non-LL(1) Grammar 1 exp exp addop term 2 exp term 3 term term mulop factor 4 term factor 5 factor ( exp ) 6 factor num 7 addop + 8 addop - 9 mulop * First(exp) = { (, num } First(term) = { (, num } First(factor) = { (, num } First(addop) = { +, - } First(mulop) = { * }
Causes of Non-LL(1) Grammar: 2301373 Chapter 4 Parsing 28 Causes of Non-LL(1) Grammar What causes grammar being non-LL(1)? Left-recursion Left factor
Left Recursion: 2301373 Chapter 4 Parsing 29 Left Recursion Immediate left recursion A A X | Y A A X 1 | A X 2 |…| A X n | Y 1 | Y 2 |... | Y m General left recursion A => X =>* A Y Can be removed very easily A Y A’, A’ X A’| A Y 1 A’ | Y 2 A’ |...| Y m A’, A’ X 1 A’| X 2 A’|…| X n A’| Can be removed when there is no empty-string production and no cycle in the grammar A=Y X* A={Y 1 , Y 2 ,…, Y m } {X 1 , X 2 , …, X n }*
Removal of Immediate Left Recursion: 2301373 Chapter 4 Parsing 30 Removal of Immediate Left Recursion exp exp + term | exp - term | term term term * factor | factor factor ( exp ) | num Remove left recursion exp term exp’ exp’ + term exp’ | - term exp’ | term factor term’ term’ * factor term’ | factor ( exp ) | num exp = term ( term)* term = factor ( * factor)*
General Left Recursion: 2301373 Chapter 4 Parsing 31 General Left Recursion Bad News! Can only be removed when there is no empty-string production and no cycle in the grammar. Good News!!!! Never seen in grammars of any programming languages
Left Factoring: 2301373 Chapter 4 Parsing 32 Left Factoring Left factor causes non-LL(1) Given A X Y | X Z. Both A X Y and A X Z can be chosen when A is on top of stack and a token in First(X) is the next token. A X Y | X Z can be left-factored as A X A’ and A’ Y | Z
Example of Left Factor: 2301373 Chapter 4 Parsing 33 Example of Left Factor ifSt if ( exp ) st else st | if ( exp ) st can be left-factored as ifSt if ( exp ) st elsePart e lsePart else st | seq st ; seq | st can be left-factored as seq st seq’ seq’ ; seq |
Bottom-up Parsing: 2301373 Chapter 4 Parsing 34 Bottom-up Parsing Use explicit stack to perform a parse Simulate rightmost derivation (R) from left (L) to right, thus called LR parsing More powerful than top-down parsing Left recursion does not cause problem Two actions Shift: take next input token into the stack Reduce: replace a string B on top of stack by a nonterminal A, given a production A B
Example of Shift-reduce Parsing: 2301373 Chapter 4 Parsing 35 Example of Shift-reduce Parsing Reverse of rightmost derivation from left to right 1 ( ( ) ) 2 ( ( ) ) 3 ( ( ) ) 4 ( ( S ) ) 5 ( ( S ) ) 6 ( ( S ) S ) 7 ( S ) 8 ( S ) 9 ( S ) S 10 S’ S Grammar S’ S S (S)S | Parsing actions Stack Input Action $ ( ( ) ) $ shift $ ( ( ) ) $ shift $ ( ( ) ) $ reduce S $ ( ( S ) ) $ shift $ ( ( S ) ) $ reduce S $ ( ( S ) S ) $ reduce S ( S ) S $ ( S ) $ shift $ ( S ) $ reduce S $ ( S ) S $ reduce S ( S ) S $ S $ accept
Example of Shift-reduce Parsing: 2301373 Chapter 4 Parsing 36 Example of Shift-reduce Parsing 1 ( ( ) ) 2 ( ( ) ) 3 ( ( ) ) 4 ( ( S ) ) 5 ( ( S ) ) 6 ( ( S ) S ) 7 ( S ) 8 ( S ) 9 ( S ) S 10 S’ S Grammar S’ S S (S)S | Parsing actions Stack Input Action $ ( ( ) ) $ shift $ ( ( ) ) $ shift $ ( ( ) ) $ reduce S $ ( ( S ) ) $ shift $ ( ( S ) ) $ reduce S $ ( ( S ) S ) $ reduce S ( S ) S $ ( S ) $ shift $ ( S ) $ reduce S $ ( S ) S $ reduce S ( S ) S $ S $ accept Viable prefix handle
Terminologies: 2301373 Chapter 4 Parsing 37 Terminologies Right sentential form sentential form in a rightmost derivation Viable prefix sequence of symbols on the parsing stack Handle right sentential form + position where reduction can be performed + production used for reduction LR(0) item production with distinguished position in its RHS Right sentential form ( S ) S ( ( S ) S ) Viable prefix ( S ) S, ( S ), ( S, ( ( ( S ) S, ( ( S ), ( ( S , ( (, ( Handle ( S ) S. with S ( S ) S . with S ( ( S ) S . ) with S ( S ) S LR(0) item S ( S ) S. S ( S ) . S S ( S . ) S S ( . S ) S S . ( S ) S
Shift-reduce parsers: 2301373 Chapter 4 Parsing 38 Shift-reduce parsers There are two possible actions: shift and reduce Parsing is completed when the input stream is empty and the stack contains only the start symbol The grammar must be augmented a new start symbol S’ is added a production S’ S is added To make sure that parsing is finished when S’ is on top of stack because S’ never appears on the RHS of any production.
LR(0) parsing: 2301373 Chapter 4 Parsing 39 LR(0) parsing Keep track of what is left to be done in the parsing process by using finite automata of items An item A w . B y means: A w B y might be used for the reduction in the future, at the time, we know we already construct w in the parsing process, if B is constructed next, we get the new item A w B . Y
LR(0) items: 2301373 Chapter 4 Parsing 40 LR(0) items LR(0) item production with a distinguished position in the RHS Initial Item Item with the distinguished position on the leftmost of the production Complete Item Item with the distinguished position on the rightmost of the production Closure Item of x Item x together with items which can be reached from x via -transition Kernel Item Original item, not including closure items
Finite automata of items: 2301373 Chapter 4 Parsing 41 Finite automata of items Grammar: S’ S S (S)S S Items: S’ .S S’ S. S .(S)S S (.S)S S (S.)S S (S).S S (S)S. S . S’ .S S’ S. S .(S)S S . S (S.)S S (.S)S S (S).S S (S)S. S S ( ) S
DFA of LR(0) Items: 2301373 Chapter 4 Parsing 42 DFA of LR(0) Items S’ .S S’ S. S .(S)S S . S (S.)S S (.S)S S (S).S S (S)S. S S ( ) S S’ .S S .(S)S S . S (.S)S S .(S)S S . S’ S. S (S).S S .(S)S S . S (S.)S S (S)S. S ( S ) ( ( S
LR(0) parsing algorithm: 2301373 Chapter 4 Parsing 43 LR(0) parsing algorithm
LR(0) Parsing Table: 2301373 Chapter 4 Parsing 44 LR(0) Parsing Table A’ .A A .(A) A .a A’ A. A a. A (A). A (.A) A .(A) A .a A (A.) A A a a ( ( ) 0 4 3 2 1 5
Example of LR(0) Parsing: 2301373 Chapter 4 Parsing 45 Example of LR(0) Parsing Stack Input Action $ 0 ( ( a ) ) $ shift $ 0 ( 3 ( a ) ) $ shift $ 0 ( 3 ( 3 a ) ) $ shift $ 0 ( 3 ( 3 a 2 ) ) $ reduce $ 0 ( 3 ( 3 A 4 ) ) $ shift $ 0 ( 3 ( 3 A 4 ) 5 ) $ reduce $ 0 ( 3 A 4 ) $ shift $ 0 ( 3 A 4 ) 5 $ reduce $ 0 A 1 $ accept
Non-LR(0)Grammar: 2301373 Chapter 4 Parsing 46 Non-LR(0)Grammar 0 1 4 2 5 3 S’ .S S .(S)S S . S (.S)S S .(S)S S . S’ S. S (S).S S .(S)S S . S (S.)S S (S)S. S ( S ) ( ( S Conflict Shift-reduce conflict A state contains a complete item A x. and a shift item A x.By Reduce-reduce conflict A state contains more than one complete items. A grammar is a LR(0) grammar if there is no conflict in the grammar .
SLR(1) parsing: 2301373 Chapter 4 Parsing 47 SLR(1) parsing Simple LR with 1 lookahead symbol Examine the next token before deciding to shift or reduce If the next token is the token expected in an item, then it can be shifted into the stack. If a complete item A x. is constructed and the next token is in Follow(A), then reduction can be done using A x. Otherwise, error occurs. Can avoid conflict
SLR(1) parsing algorithm: 2301373 Chapter 4 Parsing 48 SLR(1) parsing algorithm
SLR(1) grammar: 2301373 Chapter 4 Parsing 49 SLR(1) grammar Conflict Shift-reduce conflict A state contains a shift item A x. W y such that W is a terminal and a complete item B z . such that W is in Follow(B). Reduce-reduce conflict A state contains more than one complete item with some common Follow set. A grammar is a n SLR ( 1 ) grammar if there is no conflict in the grammar.
SLR(1) Parsing Table: 2301373 Chapter 4 Parsing 50 SLR(1) Parsing Table A’ .A A .(A) A .a A’ A. A a. A (A). A (.A) A .(A) A .a A (A.) A A a a ( ( ) 0 4 3 2 1 5 A (A) | a
SLR(1) Grammar not LR(0): 2301373 Chapter 4 Parsing 51 SLR(1) Grammar not LR(0) 0 1 4 2 5 3 S’ .S S .(S)S S . S (.S)S S .(S)S S . S’ S. S (S).S S .(S)S S . S (S.)S S (S)S. S ( S ) ( ( S S (S)S |
Disambiguating Rules for Parsing Conflict: 2301373 Chapter 4 Parsing 52 Disambiguating Rules for Parsing Conflict Shift-reduce conflict Prefer shift over reduce In case of nested if statements, preferring shift over reduce implies most closely nested rule for dangling else Reduce-reduce conflict Error in design
Dangling Else: 2301373 Chapter 4 Parsing 53 Dangling Else state if else other $ S I 0 S4 S3 1 2 1 ACC 2 R1 R1 3 R2 R2 4 S4 S3 5 2 5 S6 R3 6 S4 S3 7 2 7 R4 R4 S’ .S 0 S .I S . other I . if S I . if S else S S I. 2 S . other 3 I if S else . S 6 S .I S . other I . if S I . if S else S I if .S 4 I if .S else S S .I S . other I . if S I . if S else S S’ S. 1 I if S. 5 I if S. else S I . if S else S 7 S S if I other S if I I if other else other other