2ND YEAR (B.TECH)
COMPUTER SCIENCE AND ENGINEERING
RTMNU , NAGPUR
SUB- SYATEM PROGRAMMING
Unit :- 2 Assembler
Elements of assembly language
programming
·
An assembly language
is a machine dependent, low level programming language which is specific to certain computer system. It provide 3 basic features which simplify programming
•
Mnemonic Operation Codes
•
Symbolic operands
•
Data
declarations
1)
Mnemonic operation codes
·
Eliminates the need to memorize numeric
operation codes.
2)
Symbolic operands
·
Symbolic names can be associated with data or instructions.
Symbolic names can be used
as operands in assembly statements (need not know details of memory bindings).
3)
Data declarations
· Data can be declared in a variety of notations, including the decimal notation (avoids Conversion of constants into their internal representation).
Statement format
·
An Assembly language
statement has following format:
[Label] <opcode> <operand spec>[,<operand spec>..]
·
If a label is specified
in a statement, it is associated as a symbolic
name with the memory word generated for the statement.
<operand spec>
has the following syntax:
<symbolic name>
[+<displacement>] [(<index register>)]
Eg. AREA , or AREA+5, or AREA(4), or AREA+5(4)
Mnemonic Operation Codes
· Each statement has two operands, first operand is always a register and second operand refers to a memory word using a symbolic name and optional displacement.
Assembly Language Statements
An
assembly program contains
three kinds of statements:
•
Imperative Statements
•
Declaration Statements
•
Assembler Directives
A)
Imperative Statements:
· They indicate an action to be performed during the execution of an assembled program. Each imperative statement is translated into one machine instruction.
B)
Declaration Statements:
Syntax is as follows:
[Label] DS <constant>
[Label] DC '<value>
·
The DS (declare
storage) statement reserves
memory and associates names with them.
Ex: A DS 1 ; reserves a memory
area of 1 word, associating the name A to it G DS 200 ; reserves a block of 200 words and
the name G is associated with the first word of the block .
· The DC (declare constant) statement constructs memory words containing constants. Ex: ONE DC '1’ ; associates name one with a memory word containing value 1
C)
Assembler Directive
· Assembler directives instruct the assembler to perform certain actions during the assembly of a program. Some assembler directives are described in the following:
1) START <constant>
· This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word having address <constant>.
2) END [<operand spec>]
· This directive indicates the end of the of the source program. The optional <operand spec> indicates the address of the instruction where the execution of the program should begin.
D)
Design Specification of an assembler
·
There
are four steps involved to design the specification of an assembler:
1.
Identify information necessary to perform a task.
2.
Design
a suitable data structure to record information.
3.
Determine processing necessary to obtain and maintain the information.
4. Determine processing necessary to perform the task
GENERAL DESIGN
PROCEDURE
· There is a general way in which all
software is designed and this should be known
before designing the assembler. Listed below are six steps that should
be followed by the designer:
1.
Specify
the problem.
2.
Specify
data structures.
3.
Define
format for data structures.
4.
Specify
algorithm
5.
Look for modularity (i.e. dividing a program into modules)
6.
Repeat
1 through 5 on each module in step 5.
DESIGN OF ASSEMBLER
STATEMENT OF PROBLEM
Source of
program
Relative address
mnemonic Relative
address mnemonic
JOH N |
START USING |
0 *,15 |
|
|||||
|
L |
1,FIVE |
0 |
L |
1,_(0,15) |
0 |
L |
1,16(0,15) |
|
A |
1,FOUR |
4 |
A |
1,_(0,15) |
4 |
A |
1,12(0,15) |
|
ST |
1,TEMP |
8 |
ST |
1,_(0,15) |
8 |
ST |
1,20(0,15) |
FOUR |
DC |
F’4’ |
12 |
4 |
|
12 |
4 |
|
FIVE |
DC |
F’5’ |
16 |
5 |
|
16 |
5 |
|
TEMP |
DS |
1F |
20 |
- |
|
20 |
- |
|
|
END |
|
|
|
|
|
|
|
· The
building blocks of the assembly programs are pseudo codes, machine codes and symbols or literals.
· The
pseudo codes as discussed give information
to the assembler. The machine codes have got unique binary code.
· Next
the symbols or literals that
represent values stored at particular locations are substituted with these memory addresses.
· A pseudo code START that
informs the assembler that the name of the program
is JOHN. Next is the USING
· Pseudo code that informs the assembler
that R15 is used as the base register and during
execution will contain the address of the first instruction of the program.
· The load instruction next is substituted
for its binary values and then R1 and since location of FIVE is not known so we leave space for offset,
and substitute 0 for
index register since it is not used and 15 in place
of base register.
· L 1,_(0, 15). The add (A) instruction and store (ST) instruction is assembled
in the same way as the address of FOUR and TEMP is not known.
· Next we have pseudo codes DC then define symbols FOUR, FIVE in the
relative location 12 and 16.
· The DS pseudo codes then define
TEMP as 1F. And the END informs the assembler
the program terminates over here.
Pass 1: Purpose – define symbols
and literals.
1. Determine length
of machine instructions (MOTGET1).
2. Keep track of Location
Counter (LC).
3. Remember values
of symbols until
pass 2 (STSTO).
4. Process some pseudo ops, e.g. EQU, DS(POTGET1).
5. Remember literals
(LITSTO).
Pass 2: Purpose – generate object
programs.
1. Look up values of symbols (STGET).
2. Generate instructions (MOTGET2).
3. Generate data (for DS, DC and literals).
4. Process
pseudo ops (POTGET2).
The second step in our design procedure
is to establish the databases
that we have to work with.
Pass 1: databases:
1. Input source program.
2. A Location
Counter (LC), used to keep track of each instruction’s location.
3.
A table, the Machine-Operation Table (MOT), that indicates the symbolic mnemonic
for each instruction and its length (two, four, or six bytes).
4.
A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic
and action to be taken for
each pseudo-op in pass
1.
5.
A table, the Symbol Table (ST), that is used to store each literal
and its corresponding value.
6.
A table, the Literal Table (LT), that is
used to store each literal encountered and its corresponding assigned location.
7.
A copy of the input to be used later by pass 2. This may be stored in a secondary storage device, such as magnetic
tape, disk, or drum, or the original source deck may
be read by the assembler a second
time for pass 2.
Pass 2: databases:
1.
Copy of source
program input to pass 1.
2.
Location Counter (LC).
3.
A table, the Machine
Operation Table (MOT), that indicates
for each instruction: (a)Symbolic mnemonic ,
(b)Length, (c)Binary machine op code, (d)Format (e.g. RS, RX, SI)
4.
A table, the Pseudo-Operation Table
(POT), that indicates for each pseudo-op the symbolic
mnemonic and the action to be taken in
pass 2.
5.
The Symbol Table
(ST), prepared by pass1, containing each label and its
corresponding value.
6.
A table, the Base Table (BT), that indicates which registers are currently
specified as base registers by USING pseudo-ops and what are the specified contents
of these registers.
7.
A work-space, INST, that is used to hold each instruction as its various
parts (e.g. binary
op-code, register field,
length field, displacement field)
are being assembled together.
8.
A workspace, PRINT LINE, used to produce
a printable listing.
9.
A workspace, PUNCH CARD, used prior to actual outputting for converting the
assembled instructions into the format
needed by the loader.
10.
An output deck of assembled instructions in the
format needed by the loader.
FORMAT OF DATABASES
·
Pass 2 requires a Machine Operation Table (MOT) containing name, length,
binary code and formats, where as
pass 1 requires only name and the length.
·
The MOT and POTs are example
of fixed tables i.e. the content of these tables are not filled or altered during the assembly process.
·
The op code is the key and its value is the binary op code equivalent, which is stored for use in generating machine codes.
·
The instruction length is store for use in updating
the location counter; the
instruction format for used in forming the machine language
equivalent.
·
Each pseudo op is listed with an associated pointer to the assembler routine for processing the pseudo op.
·
·
The length
filed indicates the length (in bytes) of the instruction or datum to
which the symbol is attached.
·
The
symbol COMMA being a character has length
1, symbol F being a floating point has length 4, AD being an add instruction has length 4 and symbols WORD
being a literal
has
length
4.
·
If a
symbol equivalent to another its length
is made the same as that of the other. * if of length 1 and has the address
as that of current
content of the LC
· A possible Base table that
is used by the assembler to generate
the proper base register reference
in machine instructions and to compute
the correct offsets.
· The assembler must
generate an address (offset, base register number,
index register number)
for most symbolic
references.
·
The ST contains the address of the
symbol relative to the beginning of
the program.
· The address is then formulated as Base register
number = the base register containing a values closest
to the symbolic reference and Offset= (value of the symbol in ST) -
(content of the base register)
it the address is “A”.
Format of database in Pass1 and Pass2 assembler
ALGORITHM
·
The
flowchart in the figure 3.10 and 3.11 describe in some detail an algorithm for an assembler for an IBM 360 computer.
·
These
diagrams represent a simplification of
the operations performed in a complex assembler but they illustrate most of the logical
processes involved.
The algorithm for pass 1 may
be summarized as follows:
1.
Initialize the location counter
(LC) to 0, i.e. LC=0.
2.
Read the card (READ 1)
3.
if the operation
code is pseudo then
(POTGET 1)
a.
Search the POT table.
b. If found check its type.
i. If DS or
DC then
1. Adjust to proper alignment
2.
Assign length of the data field
to L (DLENGTH)
3. goto step 5
ii. if EQU then
1.
evaluate operand field (EVAL)
2.
assign value to symbol in label field (STSTO)
3. goto step 7
iii. if UISNG or DROP then
1. goto step 7
iv. if END then
1.
assign storage location
to literals (LITASS)
2. rewind and reset copy files
3.
goto pass 2.
4.
if the operation code is machine
then
a.
search the MOT table
b. assign length of the instruction to L
c.
process any literals
and enter into literal table (LTSTO)
5.
if label field has a symbol then
a.
assign current
value of LC to symbols
6.
update LC as LC=LC+L
7.
write copy of the card on file for use by pass 2. (WRITE 1)
8.
goto 1.
The algorithm
for pass 2 may
be summarized as follows:
1.
Initialize the location counter
(LC) to 0, i.e. LC=0.
2.
Read the card from file copy (READ 2)
3.
if the operation code is pseudo then
a.
search the POT table (POTGET 2)
b. If found check its type.
i. If DS or
DC then
1. Adjust to proper alignment
2. Check its type
a.
if DC then
i.
Insert the constant in the
assembled program (DCGEN)
ii.
goto step 3.b.i.2.b.i
c.
if DS then
i.
Assign length
of data field to L.
(DLENGTH)
ii.
goto 4.c.iv.3
ii. if EQU then
1.
print listing (PRINT)
2. goto 1
iii. if UISNG
then
1.
evaluate operand (EVAL)
2.
enter base reg. no. and value into base table (BTSTO)
3.
print listing (PRINT)
4. goto 1
iv. if DROP then
1.
indicate base reg. no. unavailable (BTDROP)
2.
print listing (PRINT)
3. goto 1
v. if END then
1.
generate literals for entries in Literal table (LTGEN)
2.
STOP.
4. if the operation code is machine
then
a.
search the MOT table (MOTGET)
b.
assign length of the instruction to L
and get op-code byte and format code.
c.
Check for the type for instruction
i.
if SS then
1. …
ii.
if RS then
1. …
iii.
if SI then
1. …
iv.
if RR then
1.
evaluate both register expressions
and inset into 2nd byte. (EVAL)
2.
punch assembled instructions (PUNCH)
3.
print assembly listing
line (PRINT)
4. update location
counter i.e. LC=LC+L
5. goto step 1
v. if RX then
1. evaluate
register and index expressions and inset into 2nd byte (EVAL)
2.
calculate effective address
(EA) of operand
(EVAL)
3.
determine
appropriate displacement and base register
D+C(B)=EA(BTGET)
Pass 1: DEFINE SYMBOLS
·
The
purpose of the first pass is to assign a location to each instruction and data defining pseudo-instruction, and this to
define vales for symbols appearing in the label
fields of the source program.
·
Initially,
the LC is set to the first location in the program
(relative address 0).
·
Then
a source statement is read. The operation code
field is examined to determine if it is a pseudo op; if it is not, the
table of MOT is searched to find a match for the source
statement’s op code field.
·
If
new literal is found, it is entered into the LT for later processing.
·
The
symbol is saved in ST along with the current
value of the LC.
·
The
current value of the LC is incremented by the
length of the instruction and a copy of the source code is saved for use
by pass 2. The above sequence is then repeated
for the next instruction.
·
The assembler need only save the USING and DROP cards for pass 2.
·
When
the END pseudo op is encountered, pass 1 is terminated.
Pass 2: GENERATE CODE
·
The LC is initialized as in pass 1, and the processing continues. A card is read form the source file left by pass
·
The MOT is searched to find a match for the card’s op code field. The
matching MOT entry specifies the length, binary
op code, and the format type of the instruction.
·
The RR- format instructions, each of the two register
specification field is evaluated.
· The LC is incremented and processing is continued with the next card.
·
The EQU pseudo op requires very little processing in pass 2, because symbol
definition was completed in pass 1.
·
The USING and DROP pseudo ops, which
were largely ignored in pass 1, require additional
processing in pass 2.
· The
BT is used extensively in pass 2 to compute the base and displacement fields for machine instructions with storage operands.
·
The DS and DC pseudo ops are processed essentially as in pass 1. In pass 2, however,
actual code must be generated for the DC pseudo op.
·
END pseudo ops indicate the end of the
source program and terminates
1. |
READ1 |
- |
Read
the next assembly source card. |
2. |
POTGET1 |
-- |
Search the
pass 1 POT for a match with the operation field of the current source card. |
3. |
MOTGET1 |
-- |
Search the MOT for a match
with the operation of the current source card. |
4. |
STSTO |
-- |
Store a label
and its associated value into the ST. if the symbol is already in the
table, return error
indication (multiply defined
symbols) |
5. |
LTSTO |
-- |
Store
the literal into
the LT; do not
store the same
literal twice. |
6. |
WRITE1 |
-- |
Write
a copy of the assembly source card on a storage
device for use by pass
2. |
7. |
DLENGTH |
-- |
Scan
operand fields of DC or DC pseudo
op to determine the amount of storage required. |
8. |
EVAL |
-- |
Evaluate an
arithmetic expression consisting of constants and symbols (e.g.
ALPHA, 6 , GAMMA etc) |
9 |
STGET |
-- |
Search the ST
for the entry corresponding to a specific symbol (used by STSTO
and EVAL). |
10 |
LITASS |
-- |
Assign storage
location to each literal in the literal table (may use DLENGTH). |
PASS 2 |
|
|
|
1. |
READ2 |
-- |
Reads
the next assembly source card form
the file copy. |
2. |
POTGET2 |
-- |
Similar to POTGET1 (search
POT) |
3. |
MOTGET2 |
-- |
Same
as in pass
1 (Search MOT) |
4. |
EVAL |
-- |
Same
as in pass 1 (evaluate expressions) |
5. |
PUNCH |
-- |
Convert
generated instruction to card format; punch card when it is filled
with data |
6. |
PRINT |
-- |
Converts
relative location and generate code to character format: print the
line along with
copy of the same source
card. |
7. |
DCGEN |
-- |
Process the field of the DC pseudo to generate object
code (uses EVAL
and PUNCH) |
8. |
DLENGTH |
-- |
Same
as pass 1 |
9 |
BTSTO |
-- |
Insert data into appropriate entry of
BT |
10 |
BTDROP |
-- |
Insert “unavailable” indicator into appropriate
entry of BT |
11. |
BTGET |
-- |
Converts effective address into base
and displacement by searching BT for available registers. |
12. |
LTGEN |
-- |
Generate code for the literal (use
DCGEN) |
TABLE PROCESSING: SEARCHING AND SORTING
LINEAR SEARCH
·
Linear search is basically
used when the table is unsorted. Here the key is compared with entries of the table from
the first to the nth item. Linear search
is also called as brute search.
·
Let
TL be the time required to find the item in the last location in the table, then the average
time to search an item it TL/2. Hence its complexity is of O(n).
BINARY SEARCH
·
Binary
search is basically employed to a table with sorted key values. Here in each comparison the list is reduced to half.
·
A item to be searched is compared with the middle item of the table.
This lead to the following outcome:
·
Item
equal to the middle element: Here the item is found and the corresponding record is returned.
·
Item
more than the middle element: Here the
item is searched in the lower half of the table.
·
Item
less than the middle element: Here
the item is searched in the upper half of the table.
·
This
process continues till the item search succeeds or fails. The searched list
gets reduced each and every time so
the average time taken in this case to search an item in the table
is O(log(n)).
·
Hence
this search is much faster than the linear search
but subject to constraint that the input list or table is a sorted one.
SORTING A TABLE
INTERCHANGE SORT
·
Interchange
sort is otherwise called as bubble sort or
sinking sort or shifting sort. The
sorting process involves an interchange of adjacent pair of elements and put them in order. Let us take an example and
explain the sorting techniques.
·
Let the list
of number (keys) be 19, 13, 05, 27, 01, 26, 31, 16, 02, 09, 11 and 21. Then
these are sorted as follows:
SHELL SORT
·
This is kind of comparative sort that compare
two numbers d distance apart,
where d is the length of the
list.
·
In pass the distance d is updated as (d+1)/2 and the comparison
and exchange process continues till a sorted list is obtained then value of d turns to be 1.
·
This sorting take log2d passes and since in each step it reduces the d
to half, so it has a complexity equal to N*(log2N)2. The
process can be clearly understood from the example
given below
BUCKET
SORT
·
One of the
simplest type of distributed sort is radix sort or bucket sort. It involves
examining of the least significant digit of the keys first, and the same is
assigned to as bucket uniquely dependent on the value of the digit.
·
After the items
have been distributed into buckets, they are merged in order and then the
process is repeated until no more digits are left. A number with a base
requires p buckets. Although this sorting is faster but suffers from serious
disadvantages that are as follows:
1.
It involves two processes, a separation and a merge.
2. It requires a lot of extra storage for the
buckets.
·
This can be
overcome chaining records within a logical record rather than predefining
maximum size to bucket. The process can be clear from the example given below.
ADDRESSB
CALCULATION SORT
0 Comments