Compiler Construction Lec 1 CC

Insert YouTube videos in PowerPont slides with aS Desktop
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Compiler Construction : 

Compiler Construction Lecture 1 Dr. Tariq Mahmood NUCES (FAST) – KHI tariq.mahmood@nu.edu.pk 1

Instructor Profile : 

Instructor Profile Name: Tariq MAHMOOD Other email: tariq.mahmood@gmail.com 1996: O-Levels – Karachi Public School 1998: A-Levels – Karachi Grammar School 2002: BCS – GIK Institute of Engineering Sciences and Technology 2004: MCS in Machine Learning (ML)– Universite Pierre et Marie Curie (Paris 6), Paris, France 2009: PhD (Application of ML techniques to E-Commerce Systems) – University of Trento, Trento, Italy. 2

Course Textbooks : 

Course Textbooks Compilers – Principles, Techniques and Tools (Second Edition) Authors: Alfred V. Aho Monica S. Lam Ravi Sethi Jeffrey D. Ullman 3

Course Objectives : 

Course Objectives Introduce and Describe the Fundamentals of Compilation Techniques Assist you in Writing your Own Compiler (or a part of it) Overview & Basic Concepts Lexical Analysis Syntax Analysis / Syntax-Directed Translation Semantic Analysis Intermediate Code Generation. 4

Grade Distribution : 

Grade Distribution Assignments: 5% Quizzes: 10% Mid-Term I: 10% Mid-Term II: 10% Project: 15% Final: 50% 5

What is a Compiler? : 

What is a Compiler? Programming languages: Notations for describing computations to people and to machines E.g. Java, C#, C++, Perl, Prolog etc. A program written in any language must be translated into a form that is understood by the computer This form is typically known as the Machine Language (ML), Machine Code, or Object Code Consists of streams of 0’s and 1’s The program that carries out the translation activity is called the compiler. 6

Basic Concept : 

Basic Concept The language to be translated: Source language Input code is called the Source code The language produced: Target language Output code is called Target code A compiler is a program that translates source code written in one language into the target code of another language. 7 Java Machine Language

Why Compile? : 

Why Compile? Executables: Files that actually do something by carrying out a set of instructions (as compared to files that only contain data) E.g., .exe files in Windows If you look at their data, it won’t make sense 8 Hex Dump of a boot loader executable The primary reason for compiling source code is to create an executable ML program

Why Compile? : 

Why Compile? Once the executable is there, it can be called by the user to process the given inputs to a program and produce the desired outputs Initially, there is a compile phase which is followed by the run/execute phase This approach is used in many languages, e.g., Java, C++, C# etc. Interpretation is concept parallel to compilation There is only one run/execute phase. 9

Compiler vs. Interpreter : 

Compiler vs. Interpreter Interpreter: A program that doesn’t produce the target executable. It can do three things: In a line-by-line fashion, it directly executes the source code by using the given inputs, and producing the desired outputs May translate source code into some intermediate language and then execute that immediately, e.g., Perl, Python, and Matlab May also execute previously stored pre-compiled code, made by a compiler that is part of the interpreter system, e.g., Java and Pascal Some systems, e.g., SmallTalk, may combine 2 and 3. 10

Interpreter : 

Interpreter 11 Compiler Source Code Program Inputs Bytecodes: Intermediate Program Virtual Machine (Interpreter) Desired Outputs Java combines compilation and interpretation Portability: Bytecodes compiled on one machine can be interpreted on another one Faster Execution through Just-in-Time (JIC) compilers: translate parts of bytecode into ML immediately before execution

Language Processing System : 

Language Processing System 12 Preprocessor Source Code Library Files Re-locatable ML Files Compiler Assembler Linker/Loader Preprocessed Source Code Target Assembly Program Re-locatable ML Target ML Combines source code stored in different files Converts source code into Assembly Language (which is easier to generate as an intermediate representation and also, easier to debug) Assembler produces ML Linker: resolves external references Loader: Puts all object files into memory for execution Combines different ML parts (of the same program) that have been compiled separately

Structure of a Compiler : 

Structure of a Compiler 13 Lexical Analyzer Syntax Analyzer Semantic Analyzer Intermediate Code Generator Character Stream Machine-Independent Code Optimizer Machine Code Generator Machine-Dependent Code Optimizer Token Stream Syntax Tree Semantically-correct Syntax Tree Intermediate Representation Intermediate Representation Target Machine Code Optimized Target Machine Code SYMBOL TABLE

Compilation – Analysis Part : 

Compilation – Analysis Part Phases involved: Lexical Analysis Syntax Analysis Semantic Analysis Determines the operations implied by the source program which are recorded in a tree structure called the Syntax Tree Breaks up the source code into constituent pieces, while storing information in the symbol table Imposes a grammatical structure on these pieces. 14

Compilation – Synthesis Part : 

Compilation – Synthesis Part Phases involved: Intermediate Code Generation Intermediate Code Optimization Machine Code Generator Machine Dependent Code Optimization Constructs the target code from the syntax tree, and from the information in the symbol table Optimization is another important activity Both the intermediate and machine codes are optimized in order to achieve an efficient translation, with the least possible use of computing resources. 15

Front End & Back End : 

Front End & Back End Front End phases: Lexical Analysis Syntax Analysis Semantic Analysis Intermediate Code Generation Intermediate Code Optimization Front end is machine independent Back End phases: Machine Code generation Machine Code optimization Back end is machine dependent. 16

Lexical Analysis : 

Lexical Analysis Read the character stream of the source code and break it up into meaningful sequences called lexemes (a.k.a Scanning) Each lexeme is represented as a token <token_name, attribute value> Single, atomic unit of the language E.g., force = mass * 60 Lexeme force: token <id,1> Lexeme =: token <=> Lexeme mass: token <id,2> Lexeme *: token <*> Lexeme 60:token <60>. Output: <id,1>=<id,2>*60 17 SYMBOL TABLE

Syntax Analysis : 

Syntax Analysis Syntax analysis: Parsing the token stream to identify the grammatical structure of the stream (a.k.a parsing) Typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure The tree is built according to the rules of a formal grammar which define the language's syntax It is analyzed, augmented, and transformed by later phases of compilation. 18 <id,1> = * <id,2> 60 Syntax tree for <id,1>=<id,2>*60

More Examples : 

More Examples 19

Semantic Analysis : 

Semantic Analysis Semantic analysis: Adding semantic information to the parse tree  Performs semantic checks, e.g., type checking (checking for type errors),  object binding (associating variable and function references with their definitions), rejecting incorrect programs or issuing warnings Suppose force and mass are floats and acceleration is integer: type casting required 20 <id,1> = * <id,2> inttofloat Semantically correct syntax tree for <id,1>=<id,2>*60 60

Example: English Language : 

Example: English Language Ahmad told us that Faisal was going to his place Who does his refer to? In programming language, such ambiguities are avoided { int Ahmad = 10; { int Ahmad = 15; System.out.println(Ahmad); } } 21 15 is printed

Intermediate Code Generator : 

Intermediate Code Generator Intermediate Code Generation: After verifying the semantics of the source code, we convert it into a machine-like intermediate representation i.e., like a program for a machine Typically, an assembly language-like form is used, because it is easy to generate and debug Three address code: 3 operands/instruction Suppose t1 and t2 are registers t1=inttofloat(60) t2=id2*t1 id1=t2. 22 <id,1> = * <id,2> inttofloat 60

Machine-Independent Optimizer : 

Machine-Independent Optimizer Machine-Independent Optimizer: Optimize the intermediate code, so that a better target program can be generated Faster, smaller code that consumes less computation power E.g., X = Y * 0 is the same as X = 0 Example: t1=inttofloat(60) t2=id2*t1 t1=id2*60.0 id1=t2 id1=t1 23 Eliminating the one time use of register t2, and converting 60 to a float

Machine Code Generation : 

Machine Code Generation Machine Code Generator: Converts the optimized intermediate code into the target code (typically the Machine Code) This is done through the Assembly Language In case of Machine Code, registers (memory locations) are selected for each variable Then, intermediate instructions are translated into sequences of machine code instructions that perform the same task Example: id1=id2*60.0 LDF R2,id2 MULF R1, R2, #60 24 Transfer id2 into R2, multiply and assign to R1

Machine-Dependent Optimizer : 

Machine-Dependent Optimizer Machine-Dependent Optimizer: In this phase, after the Machine Code has been generated, it is optimized further (if needed) This completes the compilation process Important: Optimization is an optional activity in compilation One or both of the optimization phases might be missing. 25

Questions : 

Questions 26