A compiler is a fundamental programming tool that translates source code so that programs can be executed, and this process is known as compilation. Compilation involves six well-defined stages, known as the phases of a compiler. Each phase has its own working and features that help produce optimized code. Whether you are a student, a developer, or simply someone interested in programming, understanding the phases of a compiler is important for knowing how programs work internally. In this article, we will discuss what a compiler is, the phases of a compiler with their working and examples, the symbol table and its management, error handling in the phases of a compiler, and the applications of a compiler.
What is Compiler?
A compiler is a tool that converts code written in a high-level programming language into machine code that a computer can understand and execute. It processes the entire source code at once and generates an executable program or file. Compilation is done in several steps, such as lexical analysis, syntax checking, optimization, and code generation. A compiler helps detect errors during the translation process, before the program runs. Examples of compilers are GCC for C/C++ and javac for Java.
Phases of Compiler
The phases of a compiler are the steps the compiler goes through to convert source code into machine code. These phases are divided into two major parts: analysis and synthesis.
1. Analysis Phase: The analysis phase breaks the source code into its basic elements and creates an intermediate representation. This phase checks for lexical, syntactic, and semantic errors. It is also called the front end of the compiler.
2. Synthesis Phase: The synthesis phase takes the intermediate representation and transforms it into the final target program or machine code. This phase performs code optimization and code generation. It is also called the back end of the compiler.
Now, let’s discuss each of the six phases of compiler in detail.
1. Lexical Analysis
Lexical analysis is the first phase of a compiler that converts the source code into a sequence of tokens. It simplifies the input for the next stage of the compiler. It is also known as the process of tokenization.
The lexical analyzer reads the code character by character and groups characters into keywords, identifiers, operators, and symbols. This phase also removes white spaces and comments. It also creates a symbol table to store information about variables and constants. Finally, the output is passed to the syntax analysis phase for further processing.
Example:
int x = 20;
The lexer or lexical analyzer would break this and generate the following tokens:
- int -> keyword
- x -> identifier
- = -> assignment operator
- 20 -> constant
- ; -> delimiter
In short, lexical analysis is performed by the lexical analyzer, or scanner, which is a special program, and its output, a stream of tokens, is used by the next phase for syntax analysis of the code.
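To make tokenization concrete, here is a minimal scanner sketch in Python. The token categories and the TOKEN_RULES table are illustrative assumptions for this article, not the rules of any real compiler:

```python
import re

# Illustrative token rules, tried in order; a real scanner has many more.
TOKEN_RULES = [
    ("KEYWORD",    r"\b(int|float|if|else|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CONSTANT",   r"\d+"),
    ("ASSIGN",     r"="),
    ("DELIMITER",  r";"),
    ("SKIP",       r"\s+"),  # whitespace is discarded, as in a real lexer
]

def tokenize(code):
    tokens, pos = [], 0
    while pos < len(code):
        for name, pattern in TOKEN_RULES:
            match = re.match(pattern, code[pos:])
            if match:
                if name != "SKIP":
                    tokens.append((name, match.group()))
                pos += match.end()
                break
        else:
            raise ValueError(f"Illegal character at position {pos}")
    return tokens

print(tokenize("int x = 20;"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('ASSIGN', '='),
#  ('CONSTANT', '20'), ('DELIMITER', ';')]
```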
2. Syntax Analysis
Syntax analysis is the second phase of a compiler, which takes the stream of tokens that are generated by the lexical analyzer and checks whether they follow the grammatical structure or syntax rules of the programming language. It is also known as parsing.
The syntax analyzer makes a parse or syntax tree, which shows the hierarchical structure of the source code. If the code does not follow the grammar of the language, the parser gives syntax errors. This phase makes sure that the program is structurally correct before moving to the next phase of the compilation.
Example:
The statement “int x = 20;” is now analyzed for its grammatical structure. Its parse tree might look like the diagram below:

DeclarationStatement
├── Type: int
├── Identifier: x
└── Assignment
    ├── '='
    └── Constant: 20
This tree shows how the parser groups tokens according to language rules and gives structure to the source code. If any token is out of order or missing, like the semicolon, then the parser flags a syntax error.
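For illustration, a tiny recursive-descent parser for just this declaration form might look like the sketch below; the token format matches the lexer example above, and the tuple-based tree layout is an assumption made for this article:

```python
def parse_declaration(tokens):
    """Parse: Type Identifier '=' Constant ';' into a nested tuple (a tiny parse tree)."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    type_name = expect("KEYWORD")
    name = expect("IDENTIFIER")
    expect("ASSIGN")
    value = expect("CONSTANT")
    expect("DELIMITER")
    return ("DeclarationStatement",
            ("Type", type_name),
            ("Identifier", name),
            ("Assignment", "=", ("Constant", value)))

tokens = [("KEYWORD", "int"), ("IDENTIFIER", "x"), ("ASSIGN", "="),
          ("CONSTANT", "20"), ("DELIMITER", ";")]
print(parse_declaration(tokens))
# ('DeclarationStatement', ('Type', 'int'), ('Identifier', 'x'),
#  ('Assignment', '=', ('Constant', '20')))
```

Dropping the semicolon from the token list would make expect("DELIMITER") raise a SyntaxError, just as the text describes.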
3. Semantic Analysis
The third phase of a compiler is called semantic analysis, which checks whether the meaning of the program is correct according to the rules of the programming language. It ensures that the code is semantically sensible. During this stage, the semantic analyzer checks items like variable declaration and use, type matching in expressions, the passing of the right number and type of arguments to functions, scoping, and binding of identifiers. It also records and updates information about variables, functions, and their types in the symbol table.
Example:
int a = 7;
float b = 3.5;
a = a + b;
Type Checking:
- a is of type int
- b is of type float
- Expression a + b results in a float
- Assigning a float to an int without explicit casting is not allowed in many statically typed languages
Error: “Type mismatch: cannot assign float to int variable a.”
This is an example of semantic analysis, which catches a type mismatch, even though the syntax is perfectly valid.
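A toy version of this check can be sketched in a few lines of Python; the promotion and assignment rules here are hard-coded purely for illustration:

```python
def check_assignment(symbol_table, target, expr_type):
    """Reject assigning a float expression to an int variable (illustrative rule)."""
    if symbol_table[target] == "int" and expr_type == "float":
        raise TypeError(f"Type mismatch: cannot assign float to int variable '{target}'.")

symbols = {"a": "int", "b": "float"}
# a + b mixes int and float, so the usual promotion rule yields float
expr_type = "float" if "float" in (symbols["a"], symbols["b"]) else "int"
try:
    check_assignment(symbols, "a", expr_type)
except TypeError as err:
    print(err)  # Type mismatch: cannot assign float to int variable 'a'.
```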
4. Intermediate Code Generation
Intermediate code generation is the fourth phase of a compiler, which takes the output of the semantic analysis phase and converts it into an intermediate representation (IR) that is easier to optimize and translate into machine code. The intermediate code is independent of both the source language and the target machine architecture, which makes the compiler more portable and modular. This phase bridges the front end and back end of the compiler.
The common forms of intermediate code are:
- Three-address code (TAC), which uses temporary variables and simple instructions.
- Postfix notation (Reverse Polish Notation).
- Syntax trees or DAGs (Directed Acyclic Graphs).
- LLVM IR (used in modern compilers like Clang).
Example:
Source Code: a = b + c * d;
Intermediate Code (Three-Address Format):
t1 = c * d
t2 = b + t1
a = t2
Here, t1 and t2 are temporary variables that are used to break the expression into simpler steps.
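The sketch below shows one way such three-address code could be generated from a small expression tree; the tuple representation and the gen_tac helper are assumptions made for this example:

```python
from itertools import count

def gen_tac(node, code, temps):
    """Emit three-address code for a nested tuple like ('+', 'b', ('*', 'c', 'd'))."""
    if isinstance(node, str):        # a bare variable needs no instruction
        return node
    op, left, right = node
    l = gen_tac(left, code, temps)   # operands are generated first,
    r = gen_tac(right, code, temps)  # so inner expressions get lower temporaries
    temp = f"t{next(temps)}"
    code.append(f"{temp} = {l} {op} {r}")
    return temp

code = []
result = gen_tac(("+", "b", ("*", "c", "d")), code, count(1))
code.append(f"a = {result}")
print("\n".join(code))
# t1 = c * d
# t2 = b + t1
# a = t2
```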
5. Code Optimization
Code optimization is the fifth phase of a compiler, which improves the intermediate code so that the final machine code is more efficient, by running faster, using less memory, or both, without changing the program’s output. This phase is optional but highly valuable for performance, and optimization can happen at different levels: local (within a block), global (across blocks/functions), or even machine-specific.
Types of Optimizations:
- Constant Folding: Evaluate constant expressions at compile time.
- Dead Code Elimination: Remove code that has no effect on the result.
- Strength Reduction: Replace expensive operations with cheaper ones (e.g., x * 2 -> x + x).
- Loop Optimization: Improve performance of loops (e.g., loop unrolling, invariant code motion).
Example:
Before optimization:
int x = 10;
int y = 20;
int z = x * 2;
int a = x * 2 + y;
Optimized code:
int x = 10;
int y = 20;
int z = x * 2;
int a = z + y;
Here, x * 2 is computed once and reused, reducing redundant computation (common subexpression elimination).
6. Code Generation
Code generation is the final phase of the compiler, in which the optimized intermediate code is translated into target machine code. This machine code is specific to the hardware architecture it will run on, e.g., x86 or ARM. The main goal is to generate correct and efficient machine code that the computer can execute directly. Code generation involves the following tasks:
- Choosing appropriate machine instructions for the operations.
- Allocating CPU registers to store variables efficiently.
- Scheduling instructions to avoid pipeline delays.
- Assigning addresses to variables and managing the stack or heap.
Example:
Intermediate Code (Three-address):
t1 = a + b
t2 = t1 * c
Generated Assembly Code (simplified register-style):
MOV R1, a
ADD R1, b
MOV R2, R1
MUL R2, c
Here, R1 and R2 are CPU registers, and high-level operations are converted into low-level instructions.
Symbol Table
A symbol table is a data structure used by the compiler to store and maintain information about the identifiers, variables, functions, classes, etc., that appear in the source code. It is created and maintained while lexical, syntax, and semantic analysis are carried out. The symbol table plays a significant role in type checking, scope management, and code generation. Items stored in the symbol table include:
- Name (identifier)
- Type (int, float, etc.)
- Scope (local, global)
- Memory location or address
- Size (for arrays, structures)
- Compiler-generated temporaries
- Additional attributes (e.g., function parameters)
The main purpose of the symbol table is to prevent the redeclaration of identifiers, support type checking, assist in code generation, and handle the scope resolution.
Example:
For the code given below,
int a = 10;
float b;
The symbol table might look like:
| Name | Type | Scope | Memory Location |
|------|------|--------|-----------------|
| a | int | global | 1000 |
| b | float | global | 1004 |
Error Handling in Phases of Compiler
The compiler is designed to find and manage errors during the different phases of program translation and execution. These errors are classified into two types: compile-time and run-time. Let’s discuss both in detail.
1. Compile-Time Errors
Compile-time errors are detected during the compilation process, before the program runs. They are handled in the various phases of the compiler.
| Compiler Phase | Type of Error | Example |
|----------------|---------------|---------|
| Lexical Analysis | Invalid tokens, illegal characters | int 9x = 5; → invalid identifier |
| Syntax Analysis | Violations of language grammar | if (x > ) → missing expression |
| Semantic Analysis | Type mismatches, undeclared variables | x = “hello”; where x is int |
| Intermediate Code Gen / Optimization | Logic flaws or undefined behavior | Rare, but possible during translation |
Handling Compile-Time Errors
Error messages typically include the line number, a description, and the position of the error. Error-recovery strategies that the compiler may use to recover and resume parsing include the following (a panic-mode sketch follows the list):
- Panic Mode Recovery: In panic mode, the compiler skips tokens until a synchronizing symbol is found, such as a semicolon (;) or a closing brace (}). This helps the compiler resume parsing the remaining portions of the code.
- Phrase-Level Recovery: Phrase-level recovery adds, swaps, or removes tokens in a localized way to fix minor errors. For example, the parser may insert a missing ) and resume parsing, or replace a mistyped keyword.
- Error Productions: The grammar is augmented to detect common errors (e.g., missing semicolons, redundant parentheses), and special production rules give the parser specific error information and graceful recovery.
- Global Correction: The compiler tries to find the changes needed to fix the code and make it syntactically valid. This method is powerful but computationally expensive and rarely used in practice.
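As a rough illustration, panic-mode recovery can be sketched as a token-skipping loop; the SYNC_TOKENS set and the token list below are assumptions for the example:

```python
SYNC_TOKENS = {";", "}"}  # synchronizing symbols (an illustrative choice)

def panic_mode_recover(tokens, pos):
    """Skip tokens until a synchronizing symbol, then resume just past it."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos + 1  # position right after the synchronizing token

tokens = ["int", "9x", "=", "5", ";", "int", "y", "=", "1", ";"]
# Suppose the parser fails at index 1 ('9x' is not a valid identifier):
resume_at = panic_mode_recover(tokens, 1)
print(tokens[resume_at:])  # ['int', 'y', '=', '1', ';'] -- parsing continues here
```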
2. Run-Time Errors
Run-time errors occur after successful compilation, while the program is running. These errors can be handled by the runtime environment or by language-level error-handling mechanisms.
| Type of Error | Example |
|---------------|---------|
| Division by zero | int x = 10 / 0; |
| Null pointer reference | Accessing memory through a null pointer |
| Array index out of bounds | arr[10] when size is 5 |
| Stack overflow | Infinite recursion |
| Memory leaks/segmentation faults | Improper memory access or freeing |
Run-Time Error Handling
- Exception Handling: Exception mechanisms in Java, C++, or Python can be used to handle errors during execution. This prevents the program from crashing (see the sketch after this list).
- Input Validation: You always need to validate the user input. This helps in preventing type errors or invalid operations (e.g., dividing by zero).
- Null Checks: Verify that pointers or object references are not null before dereferencing them. This prevents null-pointer dereference errors.
- Array or Pointer Bounds Check: Confirm that array indices or pointers are within valid bounds to prevent out-of-range memory access.
- Logging and Debugging Tools: You can use logging or debugging statements when there is a need to keep track of the behavior of the program and find the cause of run-time errors more easily.
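As a small illustration of exception handling combined with input validation, the safe_divide helper below is a hypothetical example:

```python
def safe_divide(a, b):
    """Handle a run-time failure instead of letting the program crash."""
    try:
        return a / b
    except ZeroDivisionError:
        print("Error: division by zero; returning None instead of crashing.")
        return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # prints the error message, then None
```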
Symbol Table Management
We have already discussed how the symbol table stores information about identifiers during compilation. Efficient management of the symbol table is important for fast lookup, insertion, and update. Depending on the underlying data structure, a symbol table can be implemented in the following ways:
1. List-Based Implementation
A list-based symbol table uses a linear array or a linked list to hold the identifiers’ information. Each entry has attributes such as name, type, and scope. It is very easy to implement, but lookup is slow (O(n)) since the search is sequential. It is suitable for small programs or for learning.
Let’s understand it with a minimal sketch in Python; the class and method names below are illustrative, not a fixed API.
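```python
class ListSymbolTable:
    def __init__(self):
        self.symbols = []  # each entry is a dict with name, type, and scope

    def insert(self, name, sym_type, scope):
        self.symbols.append({"name": name, "type": sym_type, "scope": scope})

    def lookup(self, name):
        # Linear scan: O(n) in the number of stored symbols
        for symbol in self.symbols:
            if symbol["name"] == name:
                return symbol
        return None

table = ListSymbolTable()
table.insert("a", "int", "global")
table.insert("b", "float", "global")
print(table.lookup("b"))
```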
Output:
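{'name': 'b', 'type': 'float', 'scope': 'global'}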
The code above illustrates a list-based symbol table, which stores each symbol as a dictionary with properties such as name, type, and scope. The lookup() method scans the list linearly and returns the details of a symbol by its name once it finds a match.
2. Binary Search Tree (BST)
Binary Search Tree (BST) implementation stores symbols as nodes in a tree, which are ordered by the identifier name. It allows faster lookup, insertion, and deletion than a list, typically in O(log n) time for balanced trees. BST is useful when ordered traversal of symbols is needed, but may degrade to O(n) if unbalanced.
Let’s understand it with a minimal sketch in Python; the Node layout and the standalone insert and lookup helpers are illustrative.
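```python
class Node:
    def __init__(self, name, sym_type):
        self.name, self.sym_type = name, sym_type
        self.left = self.right = None

def insert(root, name, sym_type):
    # Keep the tree ordered by identifier name
    if root is None:
        return Node(name, sym_type)
    if name < root.name:
        root.left = insert(root.left, name, sym_type)
    else:
        root.right = insert(root.right, name, sym_type)
    return root

def lookup(root, name):
    # Walk left or right by name: O(log n) when the tree is balanced
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root

root = None
for name, sym_type in [("m", "int"), ("c", "float"), ("x", "char")]:
    root = insert(root, name, sym_type)
found = lookup(root, "c")
print(found.name, found.sym_type)
```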
Output:
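c float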
The above code shows how to implement a Binary Search Tree (BST) for a symbol table, where each node stores an identifier’s name and type. The insert function adds new symbols in sorted order, and the lookup function searches efficiently by traversing left or right based on the name.
3. Hash Table
The hash table implementation uses a hash function that maps identifier names to particular indices in an array, so average-case lookups and inserts run in constant time (O(1)). Collisions are resolved through techniques such as chaining or open addressing. Hash tables are efficient and widely used by modern compilers to handle large symbol tables.
Let’s understand it with a minimal sketch in Python; here, a built-in dict stands in for the compiler’s hash table.
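```python
class HashSymbolTable:
    def __init__(self):
        self.table = {}  # a Python dict is itself a hash table: average O(1) operations

    def insert(self, name, sym_type, scope):
        self.table[name] = {"type": sym_type, "scope": scope}

    def lookup(self, name):
        return self.table.get(name)  # None when the identifier is absent

table = HashSymbolTable()
table.insert("a", "int", "global")
table.insert("b", "float", "global")
print(table.lookup("a"))
print(table.lookup("z"))
```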
Output:
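{'type': 'int', 'scope': 'global'}
None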
The above code uses a Python dictionary as a hash table to store and retrieve symbol information using the identifier name as a key, enabling fast lookups through hashing. The lookup() function returns the symbol data if found, or None otherwise.
Applications of a Compiler
- Language Translation: A compiler is used to translate a high-level source code into some machine code or an intermediate code so that it can be easily executed by a computer.
- Error Detection: It finds syntax and semantic errors during compilation, helping the developer correct errors at an early stage.
- Code Optimization: The compiler improves the performance of code, such as its speed, memory use, and overall efficiency, during optimization.
- Portability: The same source code can be compiled on different platforms, allowing the software to run across them.
- IDEs Support: Integrated Development Environments (IDEs) incorporate compiler functionality in the form of syntax highlighting, auto-complete, and debugging.
- Static Security Analysis: A compiler helps inspect code to detect potential security vulnerabilities or unsafe coding practices.
- Language Design and Implementation: Compilers are used to create and test new programming languages, serving both researchers and developers.
Conclusion
The compiler is an important tool for software development. At its most basic, it converts human-readable code in your source files into machine-readable code in the form of executables and libraries. Six major phases transform high-level source code into optimized machine code, and the supporting components of the compiler (e.g., the symbol table and error-handling routines) help ensure that your code is accurate, optimized, and stable. Understanding the phases of a compiler also helps you see how programs are processed internally and write code that compiles and runs more efficiently.
Phases of Compiler – FAQs
Q1. What is the main role of a compiler?
A compiler translates high-level programming code into machine code that a computer can execute directly.
Q2. What are the two major parts of a compiler?
The compiler consists of the analysis phase (front-end) and synthesis phase (back-end).
Q3. What is the output of lexical analysis?
Lexical analysis produces a stream of tokens and generates a symbol table for identifiers.
Q4. Why is code optimization important?
Code optimization is important because it enhances performance by reducing code size and execution time without altering the program’s output.
Q5. What is intermediate code?
Intermediate code is a platform-independent representation of the source code used between the front-end and back-end phases.