Introduction to IBM PC
Assembly Language
By
Rafi Ibn Sultan
Lecturer
Department of CSE
Varendra University
Assembly Language
• Assembly language programs are translated into machine language
instructions by an assembler
• So, they must be written to conform to the assembler's specifications
• We use the Microsoft Macro Assembler (MASM)
• Assembly language code is generally not case sensitive, but we use
upper case
Statements
• Programs consist of statements, one per line
• Each statement is either
• An instruction (which the assembler translates into machine code)
• An assembler directive, which instructs the assembler to perform some specific
task
• Both instructions and directives have up to four fields:
name operation operand(s) comment
• An example of an instruction:
START: MOV CX, 5 ;initialize counter
• An example of an assembler directive:
MAIN PROC
Name Field
• The name field is used for:
• Instruction labels
• Procedure names
• Variable names etc.
• The assembler translates names into memory addresses
• Limitations of name field:
• Names can be from 1 to 31 characters long
• Names may consist of letters, digits, and the special characters: ? . @ _ $ %
• Embedded blanks are not allowed
• If a period is used, it must be the first character
• Names may not begin with a digit
• The assembler does not differentiate between upper and lower case in a name
Examples of Illegal Names
Name Reason
TWO WORDS contains a blank
2abc begins with a digit
A45.28 . is not the first character
YOU&ME contains an illegal character
Operation Field
• For an instruction, the operation field contains a symbolic operation
code (opcode)
• The assembler translates a symbolic opcode into a machine language
opcode
• Opcode symbols often describe the operation's function; for example,
MOV, ADD, SUB
• For an assembler directive, the operation field contains a pseudo-operation
code (pseudo-op). Pseudo-ops are not translated into machine code; they
simply tell the assembler to do something
Operand Field
• For an instruction, the operand field specifies the data that are to
be acted on by the operation
• An instruction may have zero, one, or two operands:
• NOP ; no operands: does nothing
• INC AX ; one operand: adds 1 to the contents of AX
• ADD WORD1, 2 ; two operands: adds 2 to the contents of memory word WORD1
• The first operand is the destination operand. It is the register or memory
location where the result is stored
• The second operand is the source operand. The source is usually not
modified by the instruction
Comment Field
• The comment field of a statement is used by the programmer to say
something about what the statement does
• A semicolon marks the beginning of this field, and the assembler
ignores anything typed after the semicolon
Program Data
• The processor operates only on binary data
• The assembler must translate all data representation into binary
numbers
• A binary number is a string of bits ending with the letter "B" or "b"
• A decimal number is a string of decimal digits, optionally ending with
"D" or "d" (a number with no suffix is decimal by default)
• A hex number must begin with a decimal digit and end with the
letter "H" or "h"
Legal and Illegal Numbers for MASM
• 11011 ; decimal
• 11011B ; binary
• 64223 ; decimal
• -214569D ; decimal
• 1,224 ; illegal: contains a non-digit character
• 1B4Dh ; hex
• 1B4D ; illegal hex number: doesn't end in "H"
• FFFFh ; illegal hex number: doesn't begin with a decimal digit
• 0FFFFh ; hex
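As a small sketch of how the legal forms above might be used, the data-only module below defines one variable per number format. The variable names (DEC_VAL, BIN_VAL, and so on) are hypothetical, and the DB/DW/DD pseudo-ops and the .MODEL and .DATA directives are the ones described later in these slides.

```asm
.MODEL SMALL
.DATA
DEC_VAL   DW  11011       ; no suffix: decimal by default
BIN_VAL   DB  11011B      ; binary, equal to 27 decimal
BIG_VAL   DW  64223       ; decimal, fits in an unsigned word
NEG_VAL   DD  -214569D    ; decimal with D suffix; too large for a word, so DD
HEX_VAL   DW  1B4DH       ; hex: begins with the decimal digit 1
HEX_FFFF  DW  0FFFFH      ; leading 0 needed because F is not a decimal digit
END
```

Writing FFFFH without the leading 0 would be read by the assembler as a name rather than a number, which is why the rule about the first digit exists.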
Characters
• Characters and character strings must be enclosed in single or double
quotes: for example, "A" or 'hello'
• Characters are translated into their ASCII codes by the assembler, so
there is no difference between using "A" and 41h (the ASCII code for
"A") in a program
Variables
• Variables play the same role in assembly language that they do in
high-level languages
• Each variable has a data type and is assigned a memory address by
the program
• The data-defining pseudo-ops and their meanings:
Pseudo-op Stands for
DB define byte
DW define word
DD define doubleword (two consecutive words)
DQ define quadword (four consecutive words)
DT define tenbytes (ten consecutive bytes)
Byte Variables
• The assembler directive that defines a byte variable takes the following form:
name DB initial_value
(Ex:) ALPHA DB 4
• A question mark ("?") used in place of an initial value sets aside an
uninitialized byte
(Ex:) BYT DB ?
• The decimal range of initial values that can be specified is -128 to 127 if a
signed interpretation is being given, or 0 to 255 for an unsigned
interpretation
Word Variables
• The assembler directive for defining a word variable has the following
form:
name DW initial_value
(Ex:) WRD DW -2
• A question mark in place of an initial value means an uninitialized
word
• The decimal range of initial values that can be specified is -32768 to
32767 for a signed interpretation, or 0 to 65535 for an unsigned
interpretation
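A minimal sketch of these definitions in use, assuming the program skeleton (.MODEL, .STACK, the DS setup and INT 21h function 4Ch) that is explained later in these slides. ALPHA, BYT and WRD are the variables from the examples above.

```asm
.MODEL SMALL
.STACK 100H
.DATA
ALPHA   DB  4               ; byte variable initialized to 4
BYT     DB  ?               ; uninitialized byte
WRD     DW  -2              ; word variable, stored as 0FFFEh
.CODE
MAIN PROC
    MOV AX, @DATA           ; make DS point to the data segment
    MOV DS, AX
    MOV AL, ALPHA           ; byte variable into a byte register: AL = 4
    MOV BYT, AL             ; BYT now holds 4
    MOV BX, WRD             ; word variable into a word register: BX = 0FFFEh
    MOV AH, 4CH             ; return control to DOS
    INT 21H
MAIN ENDP
END MAIN
```

Whether 0FFFEh means -2 or 65534 depends only on how the program interprets it; the stored bits are the same.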
Arrays
• To define a three-byte array called B_ARRAY, whose initial values are
10h, 20h, and 30h, we can write,
B_ARRAY DB 10H,20H,30H
• If the assembler assigns the offset address 0200h to B_ARRAY, then
memory would look like this:
Symbol Address Contents
B_ARRAY 200h 10h
B_ARRAY+1 201h 20h
B_ARRAY+2 202h 30h
• An array of words may also be defined. For example, if the array below starts at offset 0300h, memory would look like this:
W_ARRAY DW 1000,40,29887,329
Symbol Address Contents
W_ARRAY 0300h 1000
W_ARRAY+2 0302h 40
W_ARRAY+4 0304h 29887
W_ARRAY+6 0306h 329
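The sketch below reads individual elements of the two arrays by adding a byte offset to the array name. It assumes the program skeleton covered later in these slides; the actual offsets assigned by the assembler will differ from the 0200h and 0300h used in the tables.

```asm
.MODEL SMALL
.STACK 100H
.DATA
B_ARRAY DB  10H, 20H, 30H
W_ARRAY DW  1000, 40, 29887, 329
.CODE
MAIN PROC
    MOV AX, @DATA           ; make DS point to the data segment
    MOV DS, AX
    MOV AL, B_ARRAY+1       ; second byte of B_ARRAY: AL = 20h
    MOV BX, W_ARRAY+2       ; second word of W_ARRAY: BX = 40
    MOV CX, W_ARRAY+6       ; fourth word of W_ARRAY: CX = 329
    MOV AH, 4CH             ; return control to DOS
    INT 21H
MAIN ENDP
END MAIN
```

Word elements are two bytes apart, so the offsets for W_ARRAY go up by 2 while those for B_ARRAY go up by 1.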
Character Strings
• An array of ASCII codes can be initialized with a string of characters
LETTERS DB 'ABC'
(is equivalent to)
LETTERS DB 41h,42h,43h
• It is possible to combine characters and numbers in one definition:
MSG DB 'HELLO', 0AH, 0DH, '$'
(is equivalent to)
MSG DB 48H,45H,4CH,4CH,4FH,0AH,0DH,24H
Named Constants
• To assign a name to a constant, we can use the EQU (equates)
pseudo-op:
LF EQU 0AH
• The name LF may now be used in place of 0Ah anywhere in the
program
• The symbol on the right of an EQU can also be a string
PROMPT EQU 'TYPE YOUR NAME'
Now we could say:
MSG DB PROMPT
• No memory is allocated for EQU names
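A short sketch of named constants in a complete program. LF, PROMPT and MSG come from the examples above; CR is a hypothetical extra constant added for illustration, and the program skeleton is the one covered later in these slides.

```asm
.MODEL SMALL
.STACK 100H
.DATA
CR      EQU 0DH                 ; carriage return, no memory allocated
LF      EQU 0AH                 ; line feed, no memory allocated
PROMPT  EQU 'TYPE YOUR NAME'    ; string constant
MSG     DB  PROMPT, CR, LF      ; same as DB 'TYPE YOUR NAME', 0DH, 0AH
.CODE
MAIN PROC
    MOV AX, @DATA               ; make DS point to the data segment
    MOV DS, AX
    MOV DL, LF                  ; same as MOV DL, 0AH
    MOV AL, MSG                 ; first byte of MSG: AL = 'T' (54h)
    MOV AH, 4CH                 ; return control to DOS
    INT 21H
MAIN ENDP
END MAIN
```

Only MSG occupies memory here; the three EQU names are replaced by their values when the program is assembled.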
MOV and XCHG Instruction
• MOV (move) is used to transfer data between registers, between a register
and a memory location, or to move a number directly into a register or
memory location
• Example:
• MOV AX, WORD1
• MOV AH, 'A'
• MOV AX, BX
• The XCHG (exchange) operation is used to exchange the contents of two
registers, or a register and a memory location
• Example:
• XCHG AX, WORD1
• XCHG AX, BX
Restrictions on MOV and XCHG
• For technical reasons, there are a few restrictions on the use of
MOV and XCHG
• Note that a MOV or XCHG between memory locations is not allowed
• We can get around this restriction by using a register:
MOV AX, WORD2 ; AX = WORD2
MOV WORD1, AX ; WORD1 = WORD2
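The same register workaround applies to XCHG. The sketch below swaps two memory words through AX; the initial values 2 and 5 are chosen only for illustration, and the program skeleton is the one covered later in these slides.

```asm
.MODEL SMALL
.STACK 100H
.DATA
WORD1   DW  2
WORD2   DW  5
.CODE
MAIN PROC
    MOV AX, @DATA           ; make DS point to the data segment
    MOV DS, AX
    ; XCHG WORD1, WORD2     ; illegal: both operands are memory locations
    MOV AX, WORD1           ; AX = 2
    XCHG AX, WORD2          ; AX = 5, WORD2 = 2
    MOV WORD1, AX           ; WORD1 = 5
    MOV AH, 4CH             ; return control to DOS
    INT 21H
MAIN ENDP
END MAIN
```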
ADD, SUB, INC, DEC and NEG Instructions
• The ADD and SUB instructions are used to add or subtract the contents of
two registers, a register and a memory location, or to add (subtract) a
number to (from) a register or memory location
• ADD WORD1, AX
• SUB AX, DX
• These instructions are not allowed between memory locations
• INC (increment) is used to add 1 to the contents of a register or memory
location and DEC (decrement) subtracts 1 from a register or memory
location
• INC WORD1
• DEC BYTE1
• NEG is used to negate the contents of the destination
• NEG BX
Type Agreement of Operands
• The operands of the preceding two-operand instructions must be of
the same type, both bytes or both words. Thus, if BYTE1 is a byte
variable, then:
• MOV AX, BYTE1 ; illegal: AX is a word, BYTE1 is a byte
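One way to get a byte variable into a word register while keeping the operand types matched is sketched below. The initial value 25h and the use of WORD1 as a destination are assumptions for illustration, and clearing AH treats BYTE1 as an unsigned value.

```asm
.MODEL SMALL
.STACK 100H
.DATA
BYTE1   DB  25H
WORD1   DW  ?
.CODE
MAIN PROC
    MOV AX, @DATA           ; make DS point to the data segment
    MOV DS, AX
    ; MOV AX, BYTE1         ; illegal: word register, byte variable
    MOV AL, BYTE1           ; legal: both operands are bytes, AL = 25h
    MOV AH, 0               ; clear the high byte, so AX = 0025h
    MOV WORD1, AX           ; legal: both operands are words
    MOV AH, 4CH             ; return control to DOS
    INT 21H
MAIN ENDP
END MAIN
```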
Translation of High Level Language to Assembly Language
High Level Language:
A = 5 - A
Assembly Language:
MOV AX,5 ;put 5 in AX
SUB AX,A ;AX contains 5 - A
MOV A,AX ;put it in A
Assembly Language (Another Way):
NEG A ;A = -A
ADD A,5 ;A = 5 - A
• Exercise: Convert the statement A = B - 2 × A into assembly language
Memory Models
• The size of code and data a program can have is determined by specifying a
memory model using the .MODEL directive
• Syntax: .MODEL memory_model
• Unless there is a lot of code or data, the appropriate model is SMALL
• Memory models include SMALL, MEDIUM, COMPACT, LARGE, and HUGE
Data Segment
• A program's data segment contains all the variable definitions
• Constant definitions are often made here as well, but they may be
placed elsewhere in the program since no memory allocation is involved
• To declare a data segment, we use the directive .DATA, followed by
variable and constant declarations
.DATA
WORD1 DW 2
WORD2 DW 5
MSG DB 'THIS IS A MESSAGE'
MASK EQU 10010010B
Stack Segment
• The purpose of the stack segment declaration is to set aside a block of
memory (the stack area) to store the stack
• The stack area should be big enough to contain the stack at its
maximum size
• Syntax: .STACK size
where size is an optional number that specifies the stack area size in bytes
• .STACK 100H
sets aside 100h bytes for the stack area (a reasonable size for most
applications)
• If size is omitted, 1 KB is set aside for the stack area
Code Segment
• The code segment contains a program's instructions
• Syntax: .CODE name
• where name is the optional name of the segment
• Inside a code segment, instructions are organized as procedures
• Here is an example of a code segment definition:
.CODE
MAIN PROC
;main procedure instructions
MAIN ENDP
;other procedures go here
END MAIN
• PROC and ENDP are pseudo-ops that delineate the procedure
• The last line in the program should be the END directive, followed by the name of the
main procedure
The INT Instruction
• To invoke a DOS or BIOS routine, the INT (interrupt) instruction is used:
INT interrupt_number
where interrupt_number is a number that specifies a routine
• INT 21H may be used to invoke a large number of DOS functions
• A particular function is requested by placing a function number in the AH register
and invoking INT 21H
Function number Routine
1 single-key input
2 single-character output
9 character string output
Functions of INT 21H
• Function 1: (Single-key input)
• Input: AH = 1
• Output: AL = ASCII code if a character key is pressed
= 0 if a non-character key is pressed
• Function 2: (Display a character or execute a control function)
• Input: AH = 2
DL = ASCII code of the display character or control character
• Output: the character is displayed on the screen
• Function 9: (Display a string)
• Input: AH = 9
DX = offset address of the string. The string must end with a '$' character
• The "$" marks the end of the string and is not displayed
A Basic Assembly Program:
When a program terminates, it should return control to DOS. This can be
accomplished by executing INT 21h, function 4Ch
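A basic program along these lines might look like the sketch below, which is an illustrative example and not necessarily the one shown on the original slide. It prompts with '?', reads one key with function 1, moves to a new line, echoes the character with function 2, and returns to DOS with function 4Ch.

```asm
.MODEL SMALL
.STACK 100H
.CODE
MAIN PROC
; display a prompt
    MOV AH, 2               ; function 2: display the character in DL
    MOV DL, '?'
    INT 21H
; read one key
    MOV AH, 1               ; function 1: single-key input
    INT 21H                 ; AL = ASCII code of the key pressed
    MOV BL, AL              ; save it, because AL is reused below
; go to a new line
    MOV AH, 2
    MOV DL, 0DH             ; carriage return
    INT 21H
    MOV DL, 0AH             ; line feed
    INT 21H
; display the saved character
    MOV DL, BL
    INT 21H
; return control to DOS
    MOV AH, 4CH
    INT 21H
MAIN ENDP
END MAIN
```

This program has no .DATA segment, so it does not need to set up DS.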
Creating and Running a Program
1. Use a text editor or word processor to create a source program file
2. Use an assembler to create a machine language object file
3. Use the LINK program to link one or more object files to create a
run file
4. Execute the run file
Editor (create source program) -> .ASM file -> Assembler (assemble source program) -> .OBJ file -> Linker (link object program) -> .EXE file
The LEA instruction
• To print a string, INT 21h function 9 expects the offset address of
the character string to be in DX
• We use a new instruction:
• LEA destination, source
LEA DX, MSG
• LEA stands for "Load Effective Address"
• It puts a copy of the source offset address into the destination
Why We Need to Initialize DS
• When a program is loaded, DS does not contain the segment number of the
data segment, so we need to initialize it
• To correct this, a program containing a data segment begins with these two
instructions:
MOV AX,@DATA
MOV DS,AX
• @DATA is the name of the data segment defined by .DATA
• The assembler translates the name @DATA into a segment number
• Two instructions are needed because a number (the data segment number)
may not be moved directly into a segment register
Assembly Program for Printing a String
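A minimal sketch of such a program, combining the DS initialization, LEA, and INT 21h function 9. The message text 'HELLO!' is an assumption chosen for illustration.

```asm
.MODEL SMALL
.STACK 100H
.DATA
MSG     DB  'HELLO!', 0DH, 0AH, '$'    ; string ends with '$' for function 9
.CODE
MAIN PROC
; initialize DS
    MOV AX, @DATA           ; segment number of the data segment
    MOV DS, AX              ; cannot be moved into DS directly
; display the message
    LEA DX, MSG             ; DX = offset address of MSG
    MOV AH, 9               ; function 9: display a string
    INT 21H
; return control to DOS
    MOV AH, 4CH
    INT 21H
MAIN ENDP
END MAIN
```

The carriage return and line feed before the '$' move the cursor to the start of the next line after the message is shown.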
EX: 4.12 is left for Lab Work