Data Types and Memory Allocation
http://www.c-jump.com/CIS77/ASM/DataTypes/lecture.html
CIS-77 Home http://www.c-jump.com/CIS77/CIS77syllabus.htm
1. Integer Data Types
2. Allocating Memory for Integer Variables
3. Data Organization: DB, DW, and EQU
4. Endianness: Byte Ordering in Computer Memory
5. Little Endian Example
6. Big and Little Endian
7. Data Allocation Directives
8. Abbreviated Data Allocation Directives
9. Multi-byte Definitions
10. Symbol Table
11. Correspondence to C Data Types
12. Data Allocation Directives, Cont.
13. Size of an Integer
14. Integer Formats
15. Data Allocation Directives for Uninitialized Data
16. Working with Simple Variables, PTR operator
17. Copying Data Values
18. The MOV Instruction
19. More MOV Instruction Types
20. XCHG Instruction, Exchanging Integers
21. The XCHG Examples
22. Memory-to-memory exchange
23. BSWAP Instruction Swap Bytes
24. Extending Signed and Unsigned Integers
25. Sign Extending Signed Value
26. Sign Extending Unsigned Value
27. Sign Extending with MOVSX and MOVZX
28. The XLATB Instruction
1. Integer Data Types
In this section:
data allocation
data types and sizes
pointers to objects in memory
MOV instruction, copying data
sign-extending integers
2. Allocating Memory for Integer Variables
Intel x86 CPU performs operations on different sizes of data.
An integer is a whole number with no fractional part.
In assembly language, variables are created by data allocation directives.
An assembler declaration of an integer variable allocates memory for the integer and assigns a label to it.
The variable name becomes the label for that memory. For example,
MyVar   db   77h   ; byte-sized variable called MyVar initialised to 77h
where
MyVar is the variable name
db is the directive for byte-sized memory allocation
77h is the initializer specifying the initial value.
3. Data Organization: DB, DW, and EQU
Representing data types in assembly source files
requires appropriate assembler directives.
The directives allocate data and format it as x86 little-endian values.
Bytes are allocated with DB (define byte).
Words are allocated with DW (define word).
Both allow more than one byte or word to be allocated.
Question marks specify uninitialized data.
Strings allocate multiple bytes.
A label in front of a directive records the offset of the allocated data
from the beginning of the segment that contains the directive.
DUP allocates multiple copies of an initializer. The following
two lines produce identical results:
DB ?, ?, ?, ?, ?
DB 5 DUP(?)
Note that the EQU directive does not allocate any
memory: it creates a constant value for the
assembler to use:
CR EQU 13
DB CR
.
mov al, CR
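The effect of these directives on memory can be modelled in a few lines of Python. The sketch below is only an illustration, not how an assembler works internally: the helper names db, dw and dup are hypothetical, and the ? initializer is modelled as a zero byte, which is an assumption (the real contents of uninitialized data are unspecified).

```python
import struct

UNINIT = 0  # assumption: model the '?' initializer as a zero byte


def dup(count, value):
    """Model 'count DUP(value)': repeat one initializer count times."""
    return [value] * count


def db(*values):
    """Model DB: one byte per initializer; a string gives one byte per character."""
    out = bytearray()
    for v in values:
        if isinstance(v, str):
            out += v.encode("ascii")
        elif isinstance(v, list):      # result of dup()
            out += db(*v)
        else:
            out.append(v & 0xFF)
    return bytes(out)


def dw(*values):
    """Model DW: two bytes per initializer, least significant byte first."""
    out = bytearray()
    for v in values:
        if isinstance(v, list):        # result of dup()
            out += dw(*v)
        else:
            out += struct.pack("<H", v & 0xFFFF)
    return bytes(out)


CR = 13  # like 'CR EQU 13': a named constant, no memory is allocated

print(db(UNINIT, UNINIT, UNINIT, UNINIT, UNINIT) == db(dup(5, UNINIT)))  # True
print(db(CR).hex())      # 0d
print(dw(0x1234).hex())  # 3412
```

Note that the constant CR occupies memory only where it is used as an initializer, as in db(CR).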
4. Endianness: Byte Ordering in Computer Memory
Consider a small program, little_endian.asm .
Assembler fragment of little_endian.lst listing file shows generated data and code:
00000000                          .DATA
00000000  EE FF           byte0   BYTE    0EEh, 0FFh
00000002  1234            word2   WORD    1234h
00000004  56789ABC        var4    DWORD   56789ABCh
00000008  00000000        var8    DWORD   0

00000000                          .CODE
00000000                  _start:
00000000  B8 00000002 R           mov     eax, OFFSET word2
00000005  A3 00000008 R           mov     [var8], eax
0000000A  C3                      ret                 ; Exit program
DUMPBIN output for this program yields:
C:\>DUMPBIN /DISASM little_endian.exe
Dump of file E:\little_endian.exe
File Type: EXECUTABLE IMAGE
__start:
  00301000: B8 02 40 30 00     mov     eax,304002h
  00301005: A3 08 40 30 00     mov     dword ptr ds:[00304008h],eax
  0030100A: C3                 ret
Did you notice something strange about the bytes that follow the B8 and A3 opcodes?
The byte sequence that belongs to the 32-bit displacement seems out of order: instead of the expected
00 30 40 08
we see the reversed sequence
08 40 30 00.
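The reversal can be reproduced outside the assembler. The following Python sketch is an illustration only (the helper name imm32 is made up for this example): it takes the instruction bytes from the DUMPBIN listing and decodes the four bytes after the one-byte opcode as a little-endian 32-bit integer.

```python
import struct

# Bytes of the first instruction as shown by DUMPBIN: opcode B8, then a 32-bit operand.
mov_eax = bytes([0xB8, 0x02, 0x40, 0x30, 0x00])
# Bytes of the second instruction: opcode A3, then a 32-bit address.
mov_var8 = bytes([0xA3, 0x08, 0x40, 0x30, 0x00])


def imm32(instruction):
    """Decode the 4 bytes following a 1-byte opcode as a little-endian integer."""
    (value,) = struct.unpack_from("<I", instruction, 1)
    return value


print(hex(imm32(mov_eax)))    # 0x304002
print(hex(imm32(mov_var8)))   # 0x304008

# Encoding goes the other way: least significant byte first.
print(struct.pack("<I", 0x00304008).hex())   # 08403000
```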
5. Little Endian Example
Step-by-step execution of the little_endian.asm program in the OllyDbg debugger looks like this:
[Screenshot: debugger view at the beginning]
[Screenshot: debugger view at the end]
The byte sequence of 304002h was reversed when the value was stored in memory.
Note that the linker switch /base:0x300000 was used to change the base address of the executable image:
LINK /base:0x300000 /debug /subsystem:console /entry:_start /out:little_endian.exe little_endian.obj
6. Big and Little Endian
Different processors store multibyte integers in different orders in memory.
There are two popular methods of storing integers: big endian and little endian.
The big endian method is the most natural: the biggest (i.e. most significant) byte is stored first, then the next biggest, etc.
IBM mainframes, most RISC processors and Motorola processors all use the big endian method.
However, Intel-based processors use the little endian method, in which the least significant byte is stored first.

Data type   Value(*)    Byte sequence order
                        Big endian     Little endian
WORD        1234        12 34          34 12
DWORD       47D5A8      00 47 d5 a8    a8 d5 47 00
DWORD       56789ABC    56 78 9a bc    bc 9a 78 56
(*) All values shown in base 16.
Normally, the programmer does not need to worry about which format is used, unless
1. Binary data is transferred between different computers, e.g. over a network.
All TCP/IP headers store integers in big endian format (called network byte order).
2. Binary data is written out to memory as a multibyte integer and then read back as individual bytes, or vice versa.
Endianness does not apply to the order of array elements.
See also: wikipedia article about endianness .
[Figures: Big Endian and Little Endian byte layouts]
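The byte orders in the table above can be checked with a short Python sketch. It is an illustration under the assumption that int.to_bytes and the struct module are acceptable stand-ins for what the processor does when it stores a value; the helper name layout is hypothetical.

```python
import struct

# (data type, value, size in bytes), the three rows of the table
rows = [("WORD", 0x1234, 2), ("DWORD", 0x47D5A8, 4), ("DWORD", 0x56789ABC, 4)]


def layout(value, size, order):
    """Return the bytes of value in memory order, as a hex string."""
    return " ".join(f"{b:02x}" for b in value.to_bytes(size, order))


for name, value, size in rows:
    print(name, f"{value:X}", "|", layout(value, size, "big"),
          "|", layout(value, size, "little"))

# Network byte order ('!' in struct format strings) is big endian.
print(struct.pack("!I", 0x56789ABC).hex())   # 56789abc

# Endianness reorders the bytes inside each element, not the elements of an array.
words = struct.pack("<2H", 0x1234, 0x5678)
print(" ".join(f"{b:02x}" for b in words))   # 34 12 78 56
```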
7. Data Allocation Directives
Five define directives allocate memory space for initialized data:
DB Define Byte, allocates 1 byte
DW Define Word, allocates 2 bytes
DD Define Doubleword, allocates 4 bytes
DQ Define Quadword, allocates 8 bytes
DT Define Ten bytes, allocates 10 bytes
Examples:
sorted   DB   'y'
value    DW   25159
Total    DD   542803535
float1   DD   1.234
8. Abbreviated Data Allocation Directives
Multiple definitions can be abbreviated.
For example,
message  DB   'B'
         DB   'y'
         DB   'e'
         DB   0DH
         DB   0AH
can be written as
message  DB   'B', 'y', 'e', 0DH, 0AH
and even more compactly as
message  DB   'Bye', 0DH, 0AH
9. Multi-byte Definitions
Listing every initializer is cumbersome for data structures such as arrays.
For example, to declare and initialize an integer array of 8 elements:
values   DW   0, 0, 0, 0, 0, 0, 0, 0
What if we want to declare an array of many more elements, all initialized to zero?
The assembler provides a better way of doing this with the DUP operator:
values   DW   8 DUP (0)
10. Symbol Table
For multiple data directives, the assembler builds a symbol table.
Both the offset (in bytes) and the label refer to the allocated storage space in memory:
.DATA
value    DW   0
sum      DD   0
marks    DW   10 DUP (?)
message  DB   'The grade is:',0
char1    DB   ?

; label name   memory offset
; ----------   -------------
; value         0
; sum           2
; marks         6
; message      26
; char1        40
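The offsets in the symbol table follow from the sizes of the preceding declarations. A minimal Python sketch of that bookkeeping is shown here; it assumes the segment starts at offset 0 with no alignment padding, and the function name build_symbol_table is hypothetical.

```python
# (label, size of one element in bytes, number of elements)
declarations = [
    ("value",   2, 1),                          # DW 0
    ("sum",     4, 1),                          # DD 0
    ("marks",   2, 10),                         # DW 10 DUP (?)
    ("message", 1, len("The grade is:") + 1),   # DB 'The grade is:',0
    ("char1",   1, 1),                          # DB ?
]


def build_symbol_table(decls):
    """Assign each label the running offset from the start of the segment."""
    table = {}
    offset = 0
    for label, element_size, count in decls:
        table[label] = offset
        offset += element_size * count
    return table


symbols = build_symbol_table(declarations)
for label, offset in symbols.items():
    print(f"{label:<8} {offset}")
```

Each label receives the sum of the sizes of everything declared before it, which is why message lands at 26 (2 + 4 + 20) and char1 at 40 (26 + 14).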
11. Correspondence to C Data Types
Directive   C data type
---------   --------------------
DB          char
DW          int, unsigned int (16-bit compilers; short, unsigned short with 32-bit compilers)
DD          float, long
DQ          double
DT          internal intermediate float value
12. Data Allocation Directives, Cont.
Keyword                            Description
BYTE, DB (byte)                    Allocates unsigned numbers from 0 to 255.
SBYTE (signed byte)                Allocates signed numbers from -128 to +127.
WORD, DW (word = 2 bytes)          Allocates unsigned numbers from 0 to 65,535 (64K values).
SWORD (signed word)                Allocates signed numbers from -32,768 to +32,767.
DWORD, DD (doubleword = 4 bytes)   Allocates unsigned numbers from 0 to 4,294,967,295 (4G values).
SDWORD (signed doubleword)         Allocates signed numbers from -2,147,483,648 to +2,147,483,647.
FWORD, DF (farword = 6 bytes)      Allocates 6-byte (48-bit) integers. These values are normally used only as pointer variables on the 80386/486 processors.
QWORD, DQ (quadword = 8 bytes)     Allocates 8-byte integers used with 8087-family coprocessor instructions.
TBYTE, DT (10 bytes)               Allocates 10-byte (80-bit) integers if the initializer has a radix specifying the base of the number.
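The ranges in this table follow directly from the number of bits. As a quick check, the Python sketch below computes them; the function names are hypothetical, and two's complement representation is assumed for the signed types.

```python
def unsigned_range(bits):
    """Smallest and largest value of an unsigned integer with the given width."""
    return 0, 2**bits - 1


def signed_range(bits):
    """Smallest and largest value of a two's complement integer with the given width."""
    return -(2**(bits - 1)), 2**(bits - 1) - 1


for name, bits in [("BYTE/SBYTE", 8), ("WORD/SWORD", 16), ("DWORD/SDWORD", 32)]:
    print(name, unsigned_range(bits), signed_range(bits))
```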
13. Size of an Integer
Storing different data types in register:
Data Type        Bytes
BYTE, SBYTE      1
WORD, SWORD      2
DWORD, SDWORD    4
FWORD            6
QWORD            8
TBYTE            10
14. Integer Formats
The data types SBYTE, SWORD, and SDWORD tell the assembler to treat the initializers as signed data.
It is important to use these signed types with high-level constructs such as .IF, .WHILE, and .REPEAT, and with PROTO and INVOKE directives.
For descriptions of these directives, refer to the sections
Loop-Generating Directives
Declaring Procedure Prototypes
Calling Procedures with INVOKE
in the MASM Programmer's Guide.
15. Data Allocation Directives for Uninitialized Data
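There are five reserve directives. (These are NASM directives; in MASM, uninitialized data is declared with the ? initializer.) Each takes the number of units to reserve:
RESB Reserve a Byte, allocates 1 byte
RESW Reserve a Word, allocates 2 bytes
RESD Reserve a Doubleword, allocates 4 bytes
RESQ Reserve a Quadword, allocates 8 bytes
REST Reserve Ten bytes, allocates 10 bytes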
Examples:
response   resb   1
buffer     resw   100
Total      resd   1
16. Working with Simple Variables, PTR operator
CPU has instructions to copy, move, and sign-extend integer values.
These instructions require operands to be the same size.
However, we may need to operate on data with size other than that originally declared.
The PTR operator forces an expression to be treated as the specified type:
.DATA
num     DWORD   ?
.CODE
mov     ax, WORD PTR num[0]   ; Load a word-size value from
mov     dx, WORD PTR num[2]   ; a doubleword variable
The PTR operator re-casts the DWORD-sized memory location addressed by the num[ index ] expression as a WORD-sized value.
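What the two WORD PTR loads read can be modelled in Python. This is a sketch only: the value 12345678h stored in num is a made-up example, and the helper word_ptr is hypothetical.

```python
import struct

# Hypothetical contents of the doubleword num, stored little endian: 78 56 34 12
memory = bytearray(struct.pack("<I", 0x12345678))


def word_ptr(mem, offset):
    """Model 'WORD PTR mem[offset]': read 2 bytes at offset as a little-endian word."""
    (value,) = struct.unpack_from("<H", mem, offset)
    return value


ax = word_ptr(memory, 0)   # low word of num
dx = word_ptr(memory, 2)   # high word of num
print(f"{ax:04X} {dx:04X}")   # 5678 1234
```

Because the doubleword is stored least significant byte first, offset 0 yields the low word and offset 2 the high word.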
17. Copying Data Values
The primary instructions for moving data from operand to operand and loading them into registers are
MOV (Move)
XCHG (Exchange)
CWD (Convert Word to Double)
CBW (Convert Byte to Word).
18. The MOV Instruction
MOV instruction is a copy instruction.
MOV copies the source operand to the destination operand without affecting the source.
; Immediate value moves
mov     ax, 7           ; Immediate to register
mov     mem, 7          ; Immediate to memory direct
mov     mem[bx], 7      ; Immediate to memory indirect

; Register moves
mov     mem, ax         ; Register to memory direct
mov     mem[bx], ax     ; Register to memory indirect
mov     ax, bx          ; Register to register
mov     ds, ax          ; General register to segment register

; Direct memory moves
mov     ax, mem         ; Memory direct to register
mov     ds, mem         ; Memory to segment register

; Indirect memory moves
mov     ax, mem[bx]     ; Memory indirect to register
mov     ds, mem[bx]     ; Memory indirect to segment register

; Segment register moves
mov     mem, ds         ; Segment register to memory
mov     mem[bx], ds     ; Segment register to memory indirect
mov     ax, ds          ; Segment register to general register
19. More MOV Instruction Types
The following example shows several common types of moves that require not one, but two instructions.
; Move immediate to segment register
mov     ax, DGROUP      ; Load AX with immediate value
mov     ds, ax          ; Copy AX to segment register

; Move memory to memory
mov     ax, mem1        ; Load AX with memory value
mov     mem2, ax        ; Copy AX to other memory

; Move segment register to segment register
mov     ax, ds          ; Load AX with segment register
mov     es, ax          ; Copy AX to segment register
20. XCHG Instruction, Exchanging Integers
The XCHG (exchange data) instruction exchanges the contents of two operands.
There are three variants:
XCHG reg, reg
XCHG reg, mem
XCHG mem, reg
You can exchange data between registers or between registers and memory, but not from memory to memory:
xchg    ax, bx          ; Put AX in BX and BX in AX
xchg    memory, ax      ; Put "memory" in AX and AX in "memory"
xchg    mem1, mem2      ; Illegal, can't exchange memory locations!
The rules for operands in the XCHG instruction are the same as those for the MOV instruction,
except that XCHG does not accept immediate operands or segment registers.
21. The XCHG Examples
In array sorting applications, XCHG provides a simple way to exchange two array elements.
A few more examples using XCHG:
xchg    ax, bx          ; exchange 16-bit regs
xchg    ah, al          ; exchange 8-bit regs
xchg    eax, ebx        ; exchange 32-bit regs
xchg    [response], cl  ; exchange 8-bit mem op with CL
xchg    [total], edx    ; exchange 32-bit mem op with EDX
Without the XCHG instruction, we need a temporary register to exchange values if using only the MOV instruction.
22. Memory-to-memory exchange
To exchange two memory operands, use a register as a temporary container and combine MOV with XCHG. For example,
.DATA
val1    WORD    1000h
val2    WORD    2000h
.CODE
mov     ax, [val1]      ; AX = 1000h
xchg    ax, [val2]      ; AX = 2000h, val2 = 1000h
mov     [val1], ax      ; val1 = 2000h
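The three steps can be traced with a small Python model. It is an illustration only: memory and the AX register are represented by ordinary dictionaries, an assumption made for readability.

```python
memory = {"val1": 0x1000, "val2": 0x2000}   # the two WORD variables
regs = {"ax": 0}

regs["ax"] = memory["val1"]                               # mov  ax, [val1]
regs["ax"], memory["val2"] = memory["val2"], regs["ax"]   # xchg ax, [val2]
memory["val1"] = regs["ax"]                               # mov  [val1], ax

print({name: hex(value) for name, value in memory.items()})
# {'val1': '0x2000', 'val2': '0x1000'}
```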
23. BSWAP Instruction Swap Bytes
The XCHG instruction is useful for conversion of 16-bit data between little endian and big endian forms.
For example, the following XCHG converts the data in AX into the other endian form:
xchg    al, ah
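The 80486 and later processors provide the BSWAP instruction to do a similar conversion on 32-bit data:
BSWAP 32-bit register
Note: BSWAP works only on data located in a 32-bit register.
BSWAP swaps the bytes of its operand. For example,
bswap   eax
Result of BSWAP EAX: the four bytes of EAX are reversed, so the lowest byte becomes the highest and the two middle bytes trade places.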
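Both byte swaps are easy to model in Python. The sketch below is illustrative; the function names xchg_al_ah and bswap32 are hypothetical and simply mirror the instructions they imitate.

```python
def xchg_al_ah(ax):
    """Model 'xchg al, ah': swap the two bytes of a 16-bit value."""
    return ((ax & 0xFF) << 8) | ((ax >> 8) & 0xFF)


def bswap32(eax):
    """Model 'bswap eax': reverse the four bytes of a 32-bit value."""
    return int.from_bytes((eax & 0xFFFFFFFF).to_bytes(4, "little"), "big")


print(f"{xchg_al_ah(0x1234):04X}")     # 3412
print(f"{bswap32(0x12345678):08X}")    # 78563412
```

Applying either swap twice returns the original value, which is why the same instruction converts in both directions.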
24. Extending Signed and Unsigned Integers
Since moving data between registers of different sizes is illegal, you must sign-extend integers to convert signed data to a larger
size.
Sign-extending means copying the sign bit of the unextended operand to all bits of the operand's next larger size.
This widens the operand while maintaining its sign and value.
The four instructions presented below act only on the accumulator register (AL, AX, or EAX), as shown:
Instruction                                   Sign-extend
CBW (convert byte to word)                    AL to AX
CWD (convert word to doubleword)              AX to DX:AX
CWDE (convert word to doubleword extended)    AX to EAX
CDQ (convert doubleword to quadword)          EAX to EDX:EAX
25. Sign Extending Signed Value
Consider:
.DATA
mem8    SBYTE   -5
mem16   SWORD   +5
mem32   SDWORD  -5
.CODE
.
.
.
mov     al, mem8        ; Load 8-bit -5 (FBh)
cbw                     ; Convert to 16-bit -5 (FFFBh) in AX

mov     ax, mem16       ; Load 16-bit +5
cwd                     ; Convert to 32-bit +5 (0000:0005h) in DX:AX

mov     ax, mem16       ; Load 16-bit +5
cwde                    ; Convert to 32-bit +5 (00000005h) in EAX

mov     eax, mem32      ; Load 32-bit -5 (FFFFFFFBh)
cdq                     ; Convert to 64-bit -5
                        ; (FFFFFFFF:FFFFFFFBh) in EDX:EAX
Sign extending instructions efficiently convert unsigned values as well, provided the sign bit is zero.
This example, for instance, correctly widens mem16 whether you treat the variable as signed or unsigned.
The processor does not differentiate between signed and unsigned values.
For instance, the value of mem8 in the previous example is literally 251 (0FBh) to the processor.
It ignores the human convention of treating the highest bit as an indicator of sign.
The processor can ignore the distinction between signed and unsigned numbers because binary arithmetic works the same in either
case.
The programmer, not the processor, must keep track of which values are signed or unsigned, and treat them accordingly.
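The arithmetic behind these instructions can be sketched in Python. The helper sign_extend and the wrappers cbw, cwd and cdq are hypothetical names chosen to mirror the instructions; the register pairs DX:AX and EDX:EAX are modelled as tuples.

```python
def sign_extend(value, from_bits, to_bits):
    """Copy the sign bit of a from_bits-wide value into the added upper bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # sign bit is set
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def cbw(al):
    """AL to AX."""
    return sign_extend(al, 8, 16)


def cwd(ax):
    """AX to (DX, AX)."""
    wide = sign_extend(ax, 16, 32)
    return wide >> 16, wide & 0xFFFF


def cdq(eax):
    """EAX to (EDX, EAX)."""
    wide = sign_extend(eax, 32, 64)
    return wide >> 32, wide & 0xFFFFFFFF


print(f"{cbw(0xFB):04X}")                           # FFFB
print(tuple(f"{x:04X}" for x in cwd(0x0005)))       # ('0000', '0005')
print(tuple(f"{x:08X}" for x in cdq(0xFFFFFFFB)))   # ('FFFFFFFF', 'FFFFFFFB')

# The same byte FBh reads as 251 unsigned and as -5 signed.
print(0xFB, int.from_bytes(b"\xfb", "little", signed=True))   # 251 -5
```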
26. Sign Extending Unsigned Value
If sign extension was not what you had in mind, that is, if you need to extend the unsigned value, explicitly set the higher register to
zero:
.DATA
mem8    BYTE    251
mem16   WORD    251
.CODE
.
.
.
mov     al, mem8        ; Load 251 (FBh) from 8-bit memory
sub     ah, ah          ; Zero upper half (AH)

mov     ax, mem16       ; Load 251 (FBh) from 16-bit memory
sub     dx, dx          ; Zero upper half (DX)

sub     eax, eax        ; Zero entire extended register (EAX)
mov     ax, mem16       ; Load 251 (FBh) from 16-bit memory
27. Sign Extending with MOVSX and MOVZX
The 80386/486/Pentium processors provide instructions that move and extend a value to a larger data size in a single step.
MOVSX moves a signed value into a register and sign-extends it: the upper bits are filled with copies of the sign bit.
MOVZX moves an unsigned value into a register and zero-extends it: the upper bits are filled with zeros.
mov     bx, 0C3EEh      ; Sign bit of BX is now 1: BH == 1100 0011, BL == 1110 1110
movsx   ebx, bx         ; Load signed 16-bit value into 32-bit register and sign-extend
                        ; EBX is now equal FFFFC3EEh
movzx   dx, bl          ; Load unsigned 8-bit value into 16-bit register and zero-extend
                        ; DX is now equal 00EEh
MOVSX and MOVZX instructions usually execute much faster than the equivalent CBW, CWD, CWDE, and CDQ.
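The two extensions can be modelled in Python to check the values in the example above. This is a sketch under the assumption that registers are plain integers; the functions movsx and movzx are hypothetical stand-ins named after the instructions.

```python
def movzx(value, src_bits):
    """Zero-extend: keep the low src_bits, all upper bits become 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits, dst_bits):
    """Sign-extend: the upper bits become copies of the source sign bit."""
    value &= (1 << src_bits) - 1
    sign = 1 << (src_bits - 1)
    return ((value ^ sign) - sign) & ((1 << dst_bits) - 1)


bx = 0xC3EE
ebx = movsx(bx, 16, 32)      # movsx ebx, bx
dx = movzx(bx & 0xFF, 8)     # movzx dx, bl (BL is the low byte of BX)
print(f"{ebx:08X} {dx:04X}")   # FFFFC3EE 00EE
```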
28. The XLATB Instruction
XLATB belongs to the family of x86 data transfer instructions.
XLATB translates bytes. The format is
XLATB
To use the XLATB instruction,
EBX should be loaded with the starting address of the translation table.
AL must contain an index into the table.
The index value starts at zero.
The instruction
reads the byte at this index in the translation table, and
stores this value in AL.
The original index value in AL is lost.
The translation table can have at most 256 entries (because the index is the 8-bit AL register).
See also XLAT.ASM sample.
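The table lookup performed by XLATB can be sketched in Python as follows. The function xlatb and the hex-digit table are hypothetical examples, not taken from the XLAT.ASM sample; the table plays the role of the memory that EBX points to.

```python
def xlatb(table, al):
    """Model XLATB: return the new AL, the table byte at index AL."""
    if len(table) > 256:
        raise ValueError("AL can index at most 256 entries")
    return table[al & 0xFF]


# Hypothetical translation table: 4-bit value to ASCII hex digit.
hex_digits = b"0123456789ABCDEF"

al = 0x0B
al = xlatb(hex_digits, al)   # the original index 0Bh is overwritten
print(chr(al))               # B
```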