2ND YEAR  (B.TECH)  
COMPUTER SCIENCE AND ENGINEERING
RTMNU , NAGPUR 
SUB-     SYATEM PROGRAMMING
Unit :- 2 Assembler 

Elements of assembly language programming

·         An assembly language is a machine dependent, low level programming language which is specific to certain computer system. It provide 3 basic features which simplify programming

                   Mnemonic Operation Codes

                   Symbolic operands

                   Data declarations

1)      Mnemonic operation codes

·         Eliminates the need to memorize numeric operation codes.

2)      Symbolic operands

·         Symbolic names can be associated with data or instructions. Symbolic names can be used as operands in assembly statements (need not know details of memory bindings).

3)      Data declarations

·         Data can be declared in a variety of notations, including the decimal notation (avoids Conversion of constants into their internal representation).

Statement format

·         An Assembly language statement has following format:

              [Label] <opcode> <operand spec>[,<operand spec>..]

·         If a label is specified in a statement, it is associated as a symbolic name with the memory word generated for the statement.

             <operand spec> has the following syntax:

             <symbolic name> [+<displacement>] [(<index register>)]

                                                      Eg. AREA , or AREA+5, or AREA(4), or AREA+5(4)

Mnemonic Operation Codes

  ·   Each statement has two operands, first operand is always a register and second operand refers to a memory word using a symbolic name and optional displacement.


Assembly Language Statements

An assembly program contains three kinds of statements:

                                  Imperative Statements

                                  Declaration Statements

                                  Assembler Directives

A)    Imperative Statements:

·         They indicate an action to be performed during the execution of an assembled program. Each imperative statement is translated into one machine instruction.

B)    Declaration Statements:

           Syntax is as follows:

                [Label] DS <constant>

                                                                   [Label] DC '<value>

·         The DS (declare storage) statement reserves memory and associates names with them. Ex: A DS 1 ; reserves a memory area of 1 word, associating the name A to it G DS 200 ; reserves a block of 200 words and the name G is associated with the first word of the block .

·         The DC (declare constant) statement constructs memory words containing constants. Ex: ONE DC '1’ ; associates name one with a memory word containing value 1

C)    Assembler Directive

·         Assembler directives instruct the assembler to perform certain actions during the assembly of a program. Some assembler directives are described in the following:

1)  START <constant>

·         This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word having address <constant>.

2)  END [<operand spec>]

·         This directive indicates the end of the of the source program. The optional <operand spec> indicates the address of the instruction where the execution of the program should begin.

D)    Design Specification of an assembler

·         There are four steps involved to design the specification of an assembler:

1.       Identify information necessary to perform a task.

2.       Design a suitable data structure to record information.

3.       Determine processing necessary to obtain and maintain the information.

4.       Determine processing necessary to perform the task

GENERAL DESIGN PROCEDURE

·  There is a general way in which all software is designed and this should be known before designing the assembler. Listed below are six steps that should be followed by the designer:

1.       Specify the problem.

2.       Specify data structures.

3.       Define format for data structures.

4.       Specify algorithm

5.       Look for modularity (i.e. dividing a program into modules)

6.       Repeat 1 through 5 on each module in step 5.


DESIGN OF ASSEMBLER

STATEMENT OF PROBLEM

                 Source of program                              Relative address         mnemonic       Relative address      mnemonic           

JOH N

START

USING

0

*,15

 

 

L

1,FIVE

0

L

1,_(0,15)

0

L

1,16(0,15)

 

A

1,FOUR

4

A

1,_(0,15)

4

A

1,12(0,15)

 

ST

1,TEMP

8

ST

1,_(0,15)

8

ST

1,20(0,15)

FOUR

DC

F’4’

12

4

 

12

4

 

FIVE

DC

F’5’

16

5

 

16

5

 

TEMP

DS

1F

20

-

 

20

-

 

 

END

 

 

 

 

 

 


 

·   The building blocks of the assembly programs are pseudo codes, machine codes and symbols or literals.

· The pseudo codes as discussed give information to the assembler. The machine codes have got unique binary code.

·    Next the symbols or literals that represent values stored at particular locations are substituted with these memory addresses.

·  A pseudo code START that informs the assembler that the name of the program is JOHN. Next is the USING

·  Pseudo code that informs the assembler that R15 is used as the base register and during execution will contain the address of the first instruction of the program.

·   The load instruction next is substituted for its binary values and then R1 and since location of FIVE is not known so we leave space for offset, and substitute 0 for index register since it is not used and 15 in place of base register.

·       L 1,_(0, 15). The add (A) instruction and store (ST) instruction is assembled in the same way as the address of FOUR and TEMP is not known.

·     Next we have pseudo codes DC then define symbols FOUR, FIVE in the relative location 12 and 16.

·   The DS pseudo codes then define TEMP as 1F. And the END informs the assembler the program terminates over here. 

Pass 1: Purpose define symbols and literals.

1.       Determine length of machine instructions (MOTGET1).

2.       Keep track of Location Counter (LC).

3.       Remember values of symbols until pass 2 (STSTO).

4.       Process some pseudo ops, e.g. EQU, DS(POTGET1).

5.       Remember literals (LITSTO).

Pass 2: Purpose generate object programs.

1.       Look up values of symbols (STGET).

2.       Generate instructions (MOTGET2).

3.       Generate data (for DS, DC and literals).

4.       Process pseudo ops (POTGET2).



DATA STRUCTURE

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1: databases:

1.       Input source program.

2.       A Location Counter (LC), used to keep track of each instruction’s location.

3.       A table, the Machine-Operation Table (MOT), that indicates the symbolic mnemonic for each instruction and its length (two, four, or six bytes).

4.       A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.

5.       A table, the Symbol Table (ST), that is used to store each literal and its corresponding value.

6.       A table, the Literal Table (LT), that is used to store each literal encountered and its corresponding assigned location.

7.       A copy of the input to be used later by pass 2. This may be stored in a secondary storage device, such as magnetic tape, disk, or drum, or the original source deck may be read by the assembler a second time for pass 2.

 

Pass 2: databases:

1.       Copy of source program input to pass 1.

2.       Location Counter (LC).

3.       Text Box: Page3A table, the Machine Operation Table (MOT), that indicates for each instruction: (a)Symbolic mnemonic , (b)Length, (c)Binary machine op code, (d)Format (e.g. RS, RX, SI)

4.       A table, the Pseudo-Operation Table (POT), that indicates for each pseudo-op the symbolic mnemonic and the action to be taken in pass 2.

5.       The Symbol Table (ST), prepared by pass1, containing each label and its corresponding value.

6.       A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what are the specified contents of these registers.

7.       A work-space, INST, that is used to hold each instruction as its various parts (e.g. binary op-code, register field, length field, displacement field) are being assembled together.

8.       A workspace, PRINT LINE, used to produce a printable listing.

9.       A workspace, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.

10.    An output deck of assembled instructions in the format needed by the loader.


FORMAT OF DATABASES

·         Pass 2 requires a Machine Operation Table (MOT) containing name, length, binary code and formats, where as pass 1 requires only name and the length.

·         The MOT and POTs are example of fixed tables i.e. the content of these tables are not filled or altered during the assembly process.


·         The op code is the key and its value is the binary op code equivalent, which is stored for use in generating machine codes.

·         The instruction length is store for use in updating the location counter; the instruction format for used in forming the machine language equivalent.

·         Each pseudo op is listed with an associated pointer to the assembler routine for processing the pseudo op.

·         The Symbol Table and Literal Table (figure 3.8) include for each entry not only the name and assembly time value field bit also a length field and a relative location indicator.

·         The length filed indicates the length (in bytes) of the instruction or datum to which the symbol is attached.

·         The symbol COMMA being a character has length 1, symbol F being a floating point has length 4, AD being an add instruction has length 4 and symbols WORD   being    literal   has   length   4. 

·          If    symbol equivalent to another its length is made the same as that of the other. * if of length 1 and has the address as that of current content of the LC

·      A possible Base table that is used by the assembler to generate the proper base register reference in machine instructions and to compute the correct offsets.

·      The assembler must generate an address (offset, base register number, index register number) for most symbolic references.

·         The ST contains the address of the symbol relative to the beginning of the program.

·       The address is then formulated as Base register number = the base register containing a values closest to the symbolic reference and Offset= (value of the symbol in ST) - (content of the base register) it the address is “A”.

Format of database in Pass1 and Pass2 assembler

ALGORITHM

·         The flowchart in the figure 3.10 and 3.11 describe in some detail an algorithm for an assembler for an IBM 360 computer.

·         These diagrams represent a simplification of the operations performed in a complex assembler but they illustrate most of the logical processes involved.


The algorithm for pass 1 may be summarized as follows:

1.       Initialize the location counter (LC) to 0, i.e. LC=0.

2.       Read the card (READ 1)

3.       if the operation code is pseudo then (POTGET 1)

a.        Search the POT table.

b.       If found check its type.

i.      If DS or DC then

1.       Adjust to proper alignment

2.       Assign length of the data field to L (DLENGTH)

3.       goto step 5

ii.      if EQU then

1.       evaluate operand field (EVAL)

2.       assign value to symbol in label field (STSTO)

3.       goto step 7

iii.      if UISNG or DROP then

1.       goto step 7

iv.      if END then

1.       assign storage location to literals (LITASS)

2.       rewind and reset copy files

3.       goto pass 2.

4.       if the operation code is machine then

a.        search the MOT table

b.       assign length of the instruction to L

c.        process any literals and enter into literal table (LTSTO)

5.       if label field has a symbol then

a.        assign current value of LC to symbols

6.       update LC as LC=LC+L

7.       write copy of the card on file for use by pass 2. (WRITE 1)

8.       goto 1.

 

The algorithm for pass 2 may be summarized as follows:

1.       Initialize the location counter (LC) to 0, i.e. LC=0.

2.       Read the card from file copy (READ 2)

3.       if the operation code is pseudo then

a.       search the POT table (POTGET 2)

b.       If found check its type.

i.      If DS or DC then

1.       Adjust to proper alignment

2.       Check its type

a.        if DC then

i.      Insert the constant in the assembled program (DCGEN)

ii.      goto step 3.b.i.2.b.i

c.         if DS then

i.      Assign length of data field to L.

           (DLENGTH)

ii.      goto 4.c.iv.3

ii.      if EQU then

1.       print listing (PRINT)

2.       goto 1

iii.      if UISNG then

1.       evaluate operand (EVAL)

2.       enter base reg. no. and value into base table (BTSTO)

3.       print listing (PRINT)

4.       goto 1

iv.      if DROP then

1.       indicate base reg. no. unavailable (BTDROP)

2.       print listing (PRINT)

3.       goto 1

v.      if END then

1.       generate literals for entries in Literal table (LTGEN)

2.       STOP.

4.       if the operation code is machine then

a.       search the MOT table (MOTGET)

b.       assign length of the instruction to L and get op-code byte and format code.

c.        Check for the type for instruction

i.      if SS then

1.   …

ii.      if RS then

1.   …

iii.      if SI then

1.   …

iv.      if RR then

1.       evaluate both register expressions and inset into 2nd byte. (EVAL)

2.       punch assembled instructions (PUNCH)

3.       print assembly listing line (PRINT)

4.       update location counter i.e. LC=LC+L

5.       goto step 1

v.      if RX then

1.       evaluate register and index expressions and inset into 2nd byte (EVAL)

2.       calculate effective address (EA) of operand (EVAL)

3.       determine appropriate displacement and base register  D+C(B)=EA(BTGET)


Pass 1: DEFINE SYMBOLS

·         The purpose of the first pass is to assign a location to each instruction and data defining pseudo-instruction, and this to define vales for symbols appearing in the label fields of the source program.

·         Initially, the LC is set to the first location in the program (relative address 0).

·         Then a source statement is read. The operation code field is examined to determine if it is a pseudo op; if it is not, the table of MOT is searched to find a match for the source statement’s op code field.

·         If new literal is found, it is entered into the LT for later processing.

·         The symbol is saved in ST along with the current value of the LC.

·         The current value of the LC is incremented by the length of the instruction and a copy of the source code is saved for use by pass 2. The above sequence is then repeated for the next instruction.

·          The assembler need only save the USING and DROP cards for pass 2.

·         When the END pseudo op is encountered, pass 1 is terminated.

Pass 2: GENERATE CODE

·      The LC is initialized as in pass 1, and the processing continues. A card is read form the source file left by pass

·      The MOT is searched to find a match for the card’s op code field. The matching MOT entry specifies the length, binary op code, and the format type of the instruction.

·      The RR- format instructions, each of the two register specification field is evaluated.

·      The LC is incremented and processing is continued with the next card.

·      The EQU pseudo op requires very little processing in pass 2, because symbol definition was completed in pass 1.

·      The USING and DROP pseudo ops, which were largely ignored in pass 1, require additional processing in pass 2.

·      The BT is used extensively in pass 2 to compute the base and displacement fields for machine instructions with storage operands.

·      The DS and DC pseudo ops are processed essentially as in pass 1. In pass 2, however, actual code must be generated for the DC pseudo op.

·      END pseudo ops indicate the end of the source program and terminates

 



          Pass1

1.

READ1

-

Read the next assembly source card.

2.

POTGET1

--

Search the pass 1 POT for a match with the operation field of the current source card.

3.

MOTGET1

--

Search the MOT for a match with the operation of the current source card.

4.

STSTO

--

Store a label and its associated value into the ST. if the symbol is already in the table, return error indication (multiply defined symbols)

5.

LTSTO

--

Store the literal into the LT; do not store the same literal twice.

6.

WRITE1

--

Write a copy of the assembly source card on a storage device for use by pass 2.

7.

DLENGTH

--

Scan operand fields of DC or DC pseudo op to determine the amount of storage required.

8.

EVAL

--

Evaluate an arithmetic expression consisting of constants and symbols (e.g. ALPHA, 6 , GAMMA etc)

9

STGET

--

Search the ST for the entry corresponding to a specific symbol (used by STSTO and EVAL).

10

LITASS

--

Assign storage location to each literal in the literal table (may use DLENGTH).

PASS 2

 

 

 

1.

READ2

--

Reads the next assembly source card form the file copy.

2.

POTGET2

--

Similar to POTGET1 (search POT)

3.

MOTGET2

--

Same as in pass 1 (Search MOT)

4.

EVAL

--

Same as in pass 1 (evaluate expressions)

5.

PUNCH

--

Convert generated instruction to card format; punch card when it is filled with data

6.

PRINT

--

Converts relative location and generate code to character format: print the line along with copy of the same source card.

7.

DCGEN

--

Process the field of the DC pseudo to generate object code (uses EVAL and PUNCH)

8.

DLENGTH

--

Same as pass 1

9

BTSTO

--

Insert data into appropriate entry of BT

10

BTDROP

--

Insert “unavailable” indicator into appropriate entry of BT

11.

BTGET

--

Converts effective address into base and displacement by searching BT for available registers.

12.

LTGEN

--

Generate code for the literal (use DCGEN)


TABLE PROCESSING: SEARCHING AND SORTING

LINEAR SEARCH

·         Linear search is basically used when the table is unsorted. Here the key is compared with entries of the table from the first to the nth item. Linear search is also called as brute search.

·         Let TL be the time required to find the item in the last location in the table, then the average time to search an item it TL/2. Hence its complexity is of O(n).

BINARY SEARCH

·         Binary search is basically employed to a table with sorted key values. Here in each comparison the list is reduced to half.

·          A item to be searched is compared with the middle item of the table. This lead to the following outcome:

·         Item equal to the middle element: Here the item is found and the corresponding record is returned.

·         Item more than the middle element:   Here the item is searched in the lower half of the table.

·         Item less than the middle element:      Here the item is searched in the upper half of the table.

·         This process continues till the item search succeeds or fails. The searched list gets reduced each and every time so the average time taken in this case to search an item in the table is O(log(n)).

·         Hence this search is much faster than the linear search but subject to constraint that the input list or table is a sorted one.

SORTING A TABLE

INTERCHANGE SORT

·         Interchange sort is otherwise called as bubble sort or sinking sort or shifting sort. The sorting process involves an interchange of adjacent pair of elements and put them in order. Let us take an example and explain the sorting techniques.

·          Let the list of number (keys) be 19, 13, 05, 27, 01, 26, 31, 16, 02, 09, 11 and 21. Then these are sorted as follows:


Interchange sort solved example

SHELL SORT

·         This is kind of comparative sort that compare two numbers d distance apart, where d is the length of the list.

·         In pass the distance d is updated as (d+1)/2 and the comparison and exchange process continues till a sorted list is obtained then value of d turns to be 1.

·         This sorting take log2d passes and since in each step it reduces the d to half, so it has a complexity equal to N*(log2N)2.  The process can be clearly understood from the example given below


Shell sort solved examples

BUCKET SORT

·         One of the simplest type of distributed sort is radix sort or bucket sort. It involves examining of the least significant digit of the keys first, and the same is assigned to as bucket uniquely dependent on the value of the digit.

·         After the items have been distributed into buckets, they are merged in order and then the process is repeated until no more digits are left. A number with a base requires p buckets. Although this sorting is faster but suffers from serious disadvantages that are as follows:

1. It involves two processes, a separation and a merge.

 2. It requires a lot of extra storage for the buckets.

·         This can be overcome chaining records within a logical record rather than predefining maximum size to bucket. The process can be clear from the example given below.

Solved example of Bucket and Radix sort 

ADDRESSB CALCULATION SORT