Presentation Transcript
Why Generative Models Underperform Surface Heuristics: UC Berkeley
Natural Language Processing
John DeNero, Dan Gillick, James Zhang, and Dan Klein
Overview: Learning Phrases: Sentence-aligned corpus
Phrase-level generative model
Outline: I) Generative phrase-based alignment
Motivation
Model structure and training
Performance results
II) Error analysis
Properties of the learned phrase table
Contributions to increased error rate
III) Proposed Improvements
Motivation for Learning Phrases: J ’ ai un chat . I have a spade .
Motivation for Learning Phrases: … appelle un chat un chat …
A Phrase Alignment Model Compatible with Pharaoh: les chats aiment le poisson frais .
Training Regimen That Respects Word Alignment: French: les chats aiment le poisson frais .
English: cats like fresh fish .
Performance Results
Outline: I) Generative phrase-based alignment
Model structure and training
Performance results
II) Error analysis
Properties of the learned phrase table
Contributions to increased error rate
III) Proposed Improvements
Example: Maximizing Likelihood with Competing Segmentations: Training Corpus
French: carte sur la table
English: map on the table
French: carte sur la table
English: notice on the chart
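The slide gives only the two-sentence corpus, so the following is a minimal sketch of how competing segmentations can raise likelihood, not the authors' model or training code. It assumes monotone phrase alignment, ignores the segmentation prior (treated as a constant), and scores a sentence pair by summing the products of phi(e | f) over all paired segmentations. Both phrase tables (`WORD_LEVEL` and `DETERMINIZED`) and the helper names are hypothetical and chosen for illustration.

```python
# Toy corpus from the slide: the same French sentence with two English translations.
CORPUS = [
    ("carte sur la table", "map on the table"),
    ("carte sur la table", "notice on the chart"),
]

# Hypothetical table 1: word-level entries, ambiguity kept inside the distributions.
WORD_LEVEL = {
    ("carte", "map"): 0.5, ("carte", "notice"): 0.5,
    ("sur", "on"): 1.0, ("la", "the"): 1.0,
    ("table", "table"): 0.5, ("table", "chart"): 0.5,
}

# Hypothetical table 2: every French phrase is deterministic; the ambiguity is
# absorbed by using a different segmentation for each sentence pair.
DETERMINIZED = {
    ("carte", "map"): 1.0, ("sur", "on"): 1.0, ("la table", "the table"): 1.0,
    ("carte sur", "notice on"): 1.0, ("la", "the"): 1.0, ("table", "chart"): 1.0,
}


def pair_score(f, e, phi):
    """Sum over monotone paired segmentations of the product of phi(e_phrase | f_phrase)."""
    if not f and not e:
        return 1.0
    if not f or not e:
        return 0.0
    total = 0.0
    for i in range(1, len(f) + 1):
        for j in range(1, len(e) + 1):
            p = phi.get((" ".join(f[:i]), " ".join(e[:j])), 0.0)
            if p:
                total += p * pair_score(f[i:], e[j:], phi)
    return total


def corpus_likelihood(corpus, phi):
    """Product of sentence-pair scores (segmentation prior omitted as a constant)."""
    result = 1.0
    for french, english in corpus:
        result *= pair_score(french.split(), english.split(), phi)
    return result


print(corpus_likelihood(CORPUS, WORD_LEVEL))
print(corpus_likelihood(CORPUS, DETERMINIZED))
```

Under these assumptions the determinized table scores higher on the toy corpus even though it encodes no useful uncertainty about "carte" or "table", which is the kind of behavior the slide title points at.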
EM Training Significantly Decreases Entropy of the Phrase Table: French phrase entropy
10% of French phrases have deterministic distributions
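As a rough illustration of the statistic on this slide, the sketch below computes the Shannon entropy of each French phrase's translation distribution and the fraction of phrases that are deterministic (zero entropy). The table layout (a dict from French phrase to a dict of translation probabilities), the function names, and the toy values are assumptions made for the example; they are not taken from the talk.

```python
import math


def phrase_entropy(dist):
    """Shannon entropy in bits of one conditional distribution phi(. | f)."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


def deterministic_fraction(table, tol=1e-12):
    """Fraction of source phrases whose translation distribution has zero entropy."""
    count = sum(1 for dist in table.values() if phrase_entropy(dist) <= tol)
    return count / len(table)


# Hypothetical toy phrase table: French phrase -> {English phrase: probability}.
toy_table = {
    "carte": {"map": 0.5, "notice": 0.5},
    "sur": {"on": 1.0},
    "la table": {"the table": 0.75, "the chart": 0.25},
    "chat": {"cat": 1.0},
}

print(phrase_entropy(toy_table["carte"]))
print(deterministic_fraction(toy_table))
```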
Effect 1: Useful Phrase Pairs Are Lost Due to Critically Small Probabilities: In 10k translated sentences, no phrases with weight less than 10^-5 were used by the decoder.
Effect 2: Determinized Phrases Override Better Candidates During Decoding: Heuristic
the situation varies to an enormous degree
the situation varie d ' une immense degré
Learned
the situation varies to an enormous degree
the situation varie d ' une immense caractérise
Effect 3: Ambiguous Foreign Phrases Become Active During Decoding: Translations for the French apostrophe
Outline: I) Generative phrase-based alignment
Model structure and training
Performance results
II) Error analysis
Properties of the learned phrase table
Contributions to increased error rate
III) Proposed Improvements
Motivation for Reintroducing Entropy to the Phrase Table: Useful phrase pairs are lost due to critically small probabilities.
Determinized phrases override better candidates.
Ambiguous foreign phrases become active during decoding.
Reintroducing Lost Phrases: Interpolation yields up to 1.0 BLEU improvement
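The slide does not give the interpolation formula, so the sketch below assumes the simplest form: a per-phrase linear mix of the learned and heuristic tables with a single weight `lam`. The function name, the weight, and the toy tables are hypothetical. The point is only that mixing in the heuristic estimates gives nonzero mass back to translations the learned table had driven to zero.

```python
def interpolate(learned, heuristic, lam=0.5):
    """Mix two tables per source phrase: lam * learned + (1 - lam) * heuristic."""
    mixed = {}
    for f in set(learned) | set(heuristic):
        l_dist = learned.get(f, {})
        h_dist = heuristic.get(f, {})
        if not l_dist or not h_dist:
            # Only one table contains this phrase: keep its distribution unchanged.
            mixed[f] = dict(l_dist or h_dist)
            continue
        mixed[f] = {
            e: lam * l_dist.get(e, 0.0) + (1 - lam) * h_dist.get(e, 0.0)
            for e in set(l_dist) | set(h_dist)
        }
    return mixed


# Hypothetical tables: the learned one is determinized, the heuristic one is not.
learned = {"carte": {"map": 1.0}}
heuristic = {"carte": {"map": 0.5, "notice": 0.5}, "sur": {"on": 1.0}}

mixed = interpolate(learned, heuristic, lam=0.5)
print(mixed["carte"])
```

Because both inputs are normalized per French phrase, each mixed distribution still sums to one under this scheme.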
Smoothing Phrase Probabilities
Conclusion: Generative phrase models determinize the phrase table via the latent segmentation variable.
A determinized phrase table introduces errors at decoding time.
Modest improvement can be realized by reintroducing phrase table entropy.
Questions?