2ND YEAR  (B.TECH)  

COMPUTER SCIENCE AND ENGINEERING

RTMNU, NAGPUR

SUB - SYSTEM PROGRAMMING

Unit :- 5


Compiler:-

·         A compiler is a translator program that takes a program written in a high-level language (HLL), the source program, and translates it into an equivalent program in machine-level language (MLL), the target program.

·         An important part of a compiler's job is reporting errors in the source program to the programmer.

 

·         Executing a program written in an HLL programming language basically involves two steps: the source program must first be compiled (translated) into an object program.

·         Then the resulting object program is loaded into memory and executed.

Interpreter:-

·         An interpreter is a program that appears to execute a source program as if it were machine language.

 

·    Languages such as BASIC, SNOBOL and LISP can be translated using interpreters. Java also uses an interpreter. The process of interpretation is carried out in the following phases (a small sketch in Python follows the list):

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Direct Execution
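
A minimal sketch of these phases in Python, for a toy arithmetic language. It borrows Python's own ast module for the lexical and syntax analysis, so everything here is illustrative rather than how any particular interpreter is built:

    import ast, operator

    # map syntax-tree operator nodes to their semantics
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def interpret(source):
        tree = ast.parse(source, mode='eval')   # lexical + syntax analysis
        def execute(node):                      # direct execution of the tree
            if isinstance(node, ast.Expression):
                return execute(node.body)
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](execute(node.left), execute(node.right))
            if isinstance(node, ast.Constant):
                return node.value
            raise SyntaxError('construct not supported')  # semantic check
        return execute(tree)

    print(interpret('2 + 3 * 4'))   # 14 -- no object program is ever produced

The key point of direct execution is the last comment: the tree is evaluated on the spot, and no target program is generated.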

TRANSLATOR

·    Translator is a program that takes as input a program written in one language and produces as output a program in another language.

·    The translator performs another very important role: error detection. Any violation of the HLL specification is detected and reported to the programmer.

·      The important roles of a translator are:

               1) Translating the HLL program input into an equivalent machine language (ML) program.

             2) Providing diagnostic messages wherever the programmer violates the specification of the HLL.

PHASES OF A COMPILER:

 A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation.

Lexical Analysis:-

·         The first phase, the scanner (lexical analyzer), works as a text scanner: it reads the source code as a stream of characters and converts it into meaningful lexemes.

·          The lexical analyzer represents these lexemes in the form of tokens:

·         Example

<token-name, attribute-value>
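
For instance, the statement position = initial + rate * 60 could be represented by the token stream below; the symbol-table indices 1, 2 and 3 are illustrative:

    <id, 1> <=> <id, 2> <+> <id, 3> <*> <num, 60>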

 

Syntax Analysis:-

·         The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree).

·         In this phase, token arrangements are checked against the source code grammar, i.e. the parser checks if the expression made by the tokens is syntactically correct.

Semantic Analysis:-

·         Semantic analysis checks whether the parse tree constructed follows the rules of the language.

·         For example, it checks that values are assigned between compatible data types, and flags errors such as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types and expressions: whether identifiers are declared before use, and so on.

·         The semantic analyzer produces an annotated syntax tree as an output.
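
For instance, if position, initial and rate are declared float (an assumption for this example), the analyzer annotates the tree so that the integer literal 60 is converted before the multiplication:

    position = initial + rate * inttofloat(60)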

Intermediate Code Generation:-

·         After semantic analysis the compiler generates an intermediate code of the source code for the target machine.

·         It represents a program for some abstract machine. It is in between the high-level language and the machine language.

·          This intermediate code should be generated in such a way that it is easy to translate into the target machine code.
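
A widely used intermediate form is three-address code, where each instruction has at most one operator on the right-hand side. Assuming the variables are floats, the example statement might compile to:

    t1 = inttofloat(60)
    t2 = rate * t1
    t3 = initial + t2
    position = t3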

Code Optimization:-

·         The next phase performs code optimization on the intermediate code. Optimization can be seen as something that removes unnecessary code lines and rearranges the sequence of statements to speed up program execution without wasting resources (CPU, memory).
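
For the three-address code above, the optimizer can convert 60 to 60.0 at compile time and eliminate the copy through t3, leaving:

    t1 = rate * 60.0
    position = initial + t1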

Code Generation:-

·         In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language.

·         The code generator translates the intermediate code into a sequence of (generally) relocatable machine code. This sequence of machine instructions performs the same task as the intermediate code.
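
For the optimized code above, the emitted code might look like the following; the instruction set is a hypothetical register machine (the F suffix marks floating-point instructions), not any particular real ISA:

    LDF  R2, rate         ; load rate into register R2
    MULF R2, R2, #60.0    ; multiply by the constant 60.0
    LDF  R1, initial      ; load initial into register R1
    ADDF R1, R1, R2       ; add the product
    STF  position, R1     ; store the result into position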

Symbol Table:-

·         It is a data structure maintained throughout all the phases of a compiler. All the identifiers' names along with their types are stored here.

·         The symbol table makes it easier for the compiler to quickly search the identifier record and retrieve it. The symbol table is also used for scope management.
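
A minimal sketch of such a symbol table in Python, implemented as a stack of dictionaries so that entering and leaving a scope is cheap; all names are illustrative:

    class SymbolTable:
        def __init__(self):
            self.scopes = [{}]               # the global scope

        def enter_scope(self):
            self.scopes.append({})           # a new innermost scope

        def exit_scope(self):
            self.scopes.pop()

        def declare(self, name, type_):
            self.scopes[-1][name] = type_    # record identifier and its type

        def lookup(self, name):
            # search the innermost scope first, then the enclosing ones
            for scope in reversed(self.scopes):
                if name in scope:
                    return scope[name]
            return None                      # undeclared identifier

    table = SymbolTable()
    table.declare('rate', 'float')
    table.enter_scope()
    table.declare('i', 'int')
    print(table.lookup('rate'))   # 'float', found in the enclosing scope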

Example:- The statement position = initial + rate * 60, traced in the snippets above, passes through each of these phases in turn.


LEXICAL ANALYSIS- ROLE OF FINITE STATE AUTOMATA IN LEXICAL ANALYSIS

Lexical Analysis:-

·         Lexical Analysis is the first phase of compiler design where input is scanned to identify tokens.

·         A lexeme is an instance of a token. A token is a sequence of characters representing a unit of information in the source program.

Generating a lexical analyzer:-

·         Lex (or Flex) is a program used to generate a lexical analyzer. It translates regular definitions into C source code for efficient lexical analysis.
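
The sketch below hand-writes, in Python, the kind of scanner such a generator produces (lex/flex would emit C instead); the token set and names are an illustrative subset:

    import re

    TOKEN_SPEC = [                  # token name -> regular definition
        ('NUM',    r'\d+'),
        ('ID',     r'[A-Za-z_]\w*'),
        ('ASSIGN', r'='),
        ('PLUS',   r'\+'),
        ('TIMES',  r'\*'),
        ('SKIP',   r'[ \t]+'),      # white space is discarded
    ]
    MASTER = re.compile('|'.join('(?P<%s>%s)' % (n, p) for n, p in TOKEN_SPEC))

    def tokenize(source):
        for m in MASTER.finditer(source):
            if m.lastgroup != 'SKIP':
                yield (m.lastgroup, m.group())    # <token-name, lexeme>

    print(list(tokenize('position = initial + rate * 60')))
    # [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'), ('PLUS', '+'),
    #  ('ID', 'rate'), ('TIMES', '*'), ('NUM', '60')]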

Role of Finite Automata:-

·         Specification of tokens by means of regular expressions.

·         Construction of a finite automaton equivalent to a regular expression.

·         Recognition of tokens by the constructed finite automaton.

·         Correlation of error messages with the source code.

·         Removal of white space and comments from the source program.

·         Expansion of macros found in the source program.

·         Identification of tokens.

·         Reading of input characters from the source program.

FINITE AUTOMATON

·   A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that language, and “no” otherwise.

·   We model the recognizer of tokens as a finite automaton.

·   A finite automaton can be: deterministic (DFA) or non-deterministic (NFA)

·   This means that we may use a deterministic or non-deterministic automaton as a lexical analyzer.

·   Both deterministic and non-deterministic finite automata recognize regular sets.

·   Which one?

        deterministic – faster recognizer, but it may take more space

        non-deterministic – slower, but it may take less space

        Deterministic automata are widely used in lexical analyzers.

·   First, we define regular expressions for tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.

            Non-Deterministic Finite Automaton (NFA)

·   A non-deterministic finite automaton (NFA) is a mathematical model that consists of:

o S - a set of states

o Σ - a set of input symbols (alphabet)

o move - a transition function that maps state-symbol pairs to sets of states

o s0 - a start (initial) state

o F - a set of accepting states (final states)

·   ε-transitions are allowed in NFAs; in other words, we can move from one state to another without consuming any symbol.

·   An NFA accepts a string x if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out x.



Example: An NFA for the regular expression (a|b)*a has states {0, 1}, start state 0, accepting state 1, and transitions move(0, a) = {0, 1}, move(0, b) = {0}. The string ba is accepted along the path 0 --b--> 0 --a--> 1.

  Deterministic Finite Automaton (DFA)

·   A Deterministic Finite Automaton (DFA) is a special form of an NFA.

·   No state has an ε-transition.

·   For each symbol a and state s, there is at most one edge labeled a leaving s; i.e., the transition function maps a state-symbol pair to a single state (not to a set of states).

Example: A DFA for (a|b)*a has states {A, B}, start state A, accepting state B, and transitions A --a--> B, A --b--> A, B --a--> B, B --b--> A.
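
Once the transition table exists, the recognizer itself is tiny. A minimal table-driven sketch in Python, with state names mirroring the example above:

    DTRAN = {('A', 'a'): 'B', ('A', 'b'): 'A',    # the transition table
             ('B', 'a'): 'B', ('B', 'b'): 'A'}
    START, ACCEPTING = 'A', {'B'}

    def accepts(x):
        state = START
        for c in x:
            state = DTRAN.get((state, c))
            if state is None:                     # symbol not in the alphabet
                return False
        return state in ACCEPTING

    print(accepts('ba'))   # True  -- the string ends in a
    print(accepts('ab'))   # False -- the string ends in b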


Design of lexical analyzer:-

  Converting RE to NFA

·   This is one way to convert a regular expression into an NFA.

·   There can be other (possibly more efficient) ways to do the conversion.

·   Thompson's Construction is a simple and systematic method.

·   It guarantees that the resulting NFA will have exactly one final state and one start state.

·   Construction starts from the simplest parts (alphabet symbols).

·   To create an NFA for a complex regular expression, the NFAs of its sub-expressions are combined.

·   To recognize the empty string ε: a start state i connected to a final state f by a single ε-edge.

·   To recognize a symbol a in the alphabet Σ: a start state i connected to a final state f by a single edge labeled a.


For regular expression r1 | r2:

N(r1) and N(r2) are the NFAs for r1 and r2. A new start state is added with ε-edges to the start states of N(r1) and N(r2), and the final states of N(r1) and N(r2) get ε-edges to a new final state.

·   For regular expression r1 r2:

N(r1) and N(r2) are placed in sequence: the final state of N(r1) is connected to the start state of N(r2), the start state of N(r1) becomes the start state of N(r1r2), and the final state of N(r2) becomes the final state of N(r1r2).



For regular expression r*:

A new start state and a new final state are added. The new start state has ε-edges to the start state of N(r) and to the new final state; the final state of N(r) has ε-edges back to the start state of N(r) and to the new final state. This allows zero or more repetitions of r.

·   Example:

·   For the RE (a|b)*a, the construction proceeds bottom-up: build NFAs for a and b, combine them with the alternation construction to get (a|b), wrap the result with the star construction, and concatenate it with the NFA for the trailing a. A sketch of the algorithm appears below.
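
The following is a compact Python sketch of Thompson's construction. It assumes the regular expression has already been parsed into a tree of ('sym', a), ('eps',), ('cat', r1, r2), ('alt', r1, r2) and ('star', r) nodes; this tree representation is an illustrative assumption, not part of the algorithm itself:

    import itertools

    fresh = itertools.count()                     # generator of new state numbers

    def thompson(node):
        """Return (start, final, edges); edges are (state, symbol, state)
        triples, where symbol None marks an epsilon-edge."""
        kind = node[0]
        if kind == 'sym':                         # a single alphabet symbol
            s, f = next(fresh), next(fresh)
            return s, f, [(s, node[1], f)]
        if kind == 'eps':                         # the empty string
            s, f = next(fresh), next(fresh)
            return s, f, [(s, None, f)]
        if kind == 'cat':                         # r1 r2: link N(r1) into N(r2)
            s1, f1, e1 = thompson(node[1])
            s2, f2, e2 = thompson(node[2])
            return s1, f2, e1 + e2 + [(f1, None, s2)]
        if kind == 'alt':                         # r1 | r2: new start and final
            s1, f1, e1 = thompson(node[1])
            s2, f2, e2 = thompson(node[2])
            s, f = next(fresh), next(fresh)
            return s, f, e1 + e2 + [(s, None, s1), (s, None, s2),
                                    (f1, None, f), (f2, None, f)]
        if kind == 'star':                        # r*: loop plus a bypass edge
            s1, f1, e1 = thompson(node[1])
            s, f = next(fresh), next(fresh)
            return s, f, e1 + [(s, None, s1), (f1, None, f),
                               (f1, None, s1), (s, None, f)]

    # NFA for (a|b)*a
    start, final, edges = thompson(
        ('cat', ('star', ('alt', ('sym', 'a'), ('sym', 'b'))), ('sym', 'a')))

Note that every case introduces at most two new states, so the NFA for a regular expression of length n has O(n) states.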




  Converting NFA to DFA (Subset Construction)

The idea is to merge NFA states by looking at them from the point of view of the input characters:

·   From the point of view of the input, any two states that are connected by an ε-transition may as well be the same, since we can move from one to the other without consuming any character. Thus, states which are connected by an ε-transition will be represented by the same state in the DFA.

·   If it is possible to have multiple transitions on the same symbol, then we can regard a transition on a symbol as moving from a state to a set of states (i.e., the union of all states reachable by a transition on the current symbol). Thus, these states will be combined into a single DFA state.

To perform this operation, let us define two functions:

·   The ε-closure function takes a state and returns the set of states reachable from it based on (one or more) ε-transitions. Note that this will always include the state itself. We should be able to get from a state to any state in its ε-closure without consuming any input.

·   The function move takes a state and a character, and returns the set of states reachable by one transition on this character.
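
A minimal Python sketch of both functions, and of the subset construction itself, operating on the edge list produced by the Thompson sketch earlier; the names and representation are illustrative:

    def eps_closure(states, edges):
        """All states reachable from `states` via epsilon-edges alone;
        always contains `states` itself."""
        stack, closure = list(states), set(states)
        while stack:
            s = stack.pop()
            for (p, sym, q) in edges:
                if p == s and sym is None and q not in closure:
                    closure.add(q)
                    stack.append(q)
        return frozenset(closure)

    def move(states, a, edges):
        """States reachable from `states` by one transition on symbol a."""
        return {q for (p, sym, q) in edges if p in states and sym == a}

    def subset_construction(start, final, edges, alphabet):
        d0 = eps_closure({start}, edges)          # the DFA start state
        dstates, worklist, dtran = {d0}, [d0], {}
        while worklist:
            T = worklist.pop()
            for a in alphabet:
                U = eps_closure(move(T, a, edges), edges)
                if not U:
                    continue                      # no transition on a
                if U not in dstates:
                    dstates.add(U)
                    worklist.append(U)
                dtran[(T, a)] = U
        accepting = {D for D in dstates if final in D}
        return d0, accepting, dtran

Running subset_construction on the (a|b)*a NFA built earlier yields a DFA equivalent to the two-state DFA in the example above (state minimization can then merge any equivalent subset states).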


Syntax Analysis – Context Free Grammar

Parser