Lab 5 - Introduction to Assembly Language
Task: Conditional jumps
You will solve the exercises starting from the hello_world.asm file located in the drills/tasks/conditional-jumps directory.
- Modify the program so that the message is displayed only if the content of the eax register is greater than that of ebx. Also, modify the values of the registers so that the message "Hello, World!" continues to be displayed.
- Modify the program to also display "Goodbye, World!" at the end.
- Using jump instructions, modify the program to display "Hello, World!" N times, where N is given through the ecx register. Avoid infinite looping.
TIP: After successful completion, the program should display:
Hello, World!
Hello, World!
Hello, World!
Hello, World!
Hello, World!
Hello, World!
Goodbye, World!
If you're having difficulties solving this exercise, go through this reading material.
Task: Grumpy Jumps
You will solve the exercises starting from the grumpy_jumps.asm file located in the drills/tasks/grumpy-jumps directory.
- Modify the values of the eax and ebx registers so that when the program is run, the message "Well done!" is displayed. Follow the TODO comments.
- Why does the wrong message still appear? Modify the source so that the wrong message is not displayed anymore.
TIP: To determine the necessary values for the eax and ebx registers, we recommend using GDB.
Task: Sets
You will solve the exercises starting from the sets.asm file located in the drills/tasks/sets directory.
You need to implement operations on sets that can contain elements between 0 and 31. An efficient way to do this (both in terms of space and speed) would be to represent sets so that a register represents a set. Each bit in the register represents an element in the set (if bit i is set, then the set contains element i).
TIP: For example, if eax contains the representation of the set {0,2,4}, the register value would be 2^0 + 2^2 + 2^4 = 1 + 4 + 16 = 21. Educate yourself about the available instructions on the x86 architecture.
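The encoding described in the TIP can be checked with a short Python sketch. The helper names set_to_mask and mask_to_set are hypothetical and only model the representation; they are not part of the lab sources.

```python
def set_to_mask(elements):
    """Encode a set of integers in 0..31 as a 32-bit mask (bit i set <=> i is in the set)."""
    mask = 0
    for element in elements:
        if not 0 <= element <= 31:
            raise ValueError("elements must be between 0 and 31")
        mask |= 1 << element
    return mask


def mask_to_set(mask):
    """Decode a 32-bit mask back into the set of elements it represents."""
    return {i for i in range(32) if mask & (1 << i)}


print(set_to_mask({0, 2, 4}))   # 21
print(sorted(mask_to_set(21)))  # [0, 2, 4]
```

The same decoding can be used to answer "what values do the sets contain?" once you read the register values in GDB.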
- You have two defined sets. What values do they contain? Perform the union of the two sets.
- Use the or instruction to add two new elements to the set.
  TIP: Take advantage of the fact that the current sets, although they have "space" for 32 bits, only use 8 bits. If you or with a number greater than 255 (0xff, 2^8-1) that has exactly two active bits, both above bit 7, you will effectively add two new elements to the set.
- Perform the intersection of the two sets.
- Determine the elements missing from the eax set for it to be complete.
  TIP: You need to take the complement of the number using the not instruction.
- Remove an element from the first set.
- Find the difference between the sets.
NOTE: In order to display the answer, you can use the PRINTF32 macro. For example:
PRINTF32 `The union is: \x0`
PRINTF32 `%u\n\x0`, eax
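To sanity-check the values your assembly program prints, the set operations can be modelled on plain integers. This is a minimal Python sketch, not the lab solution: the two sets a and b are hypothetical examples (not the ones defined in sets.asm), and the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every result within the width of a 32-bit register


def elements(mask):
    """List the elements present in a set encoded as a bit mask."""
    return [i for i in range(32) if (mask >> i) & 1]


def union(a, b):
    return a | b


def intersection(a, b):
    return a & b


def complement(a):
    return ~a & MASK32


def remove(a, element):
    return a & ~(1 << element) & MASK32


def difference(a, b):
    return a & ~b & MASK32


a = 0b10101  # hypothetical set {0, 2, 4}
b = 0b00110  # hypothetical set {1, 2}

print(elements(union(a, b)))         # [0, 1, 2, 4]
print(elements(intersection(a, b)))  # [2]
print(elements(difference(a, b)))    # [0, 4]
print(elements(remove(a, 4)))        # [0, 2]
print(elements(a | 0x300))           # [0, 2, 4, 8, 9]
print(len(elements(complement(a))))  # 29
```

The masking with MASK32 is needed only because Python integers are unbounded; a 32-bit register truncates the result on its own.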
Task: Min
You will solve this exercise starting from the min.asm file located in the drills/tasks/min directory.
Calculate the minimum of the numbers in 2 registers (eax and ebx) using a comparison instruction, a jump instruction, and the xchg instruction.
Task: Fibonacci
You will solve this exercise starting from the fibonacci.asm file located in the drills/tasks/fibonacci directory.
Calculate the Nth Fibonacci number, where N is given through the eax register.
NOTE: The Fibonacci series goes as follows:
0, 1, 1, 2, 3, ...
where each element f[i] = f[i-2] + f[i-1], except for the first two elements.
TIP: For example, if the value stored in eax is equal to 5, a correct solution will display 3, and for 7, it will display 8.
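A small reference implementation can help you check the output of your assembly program. The Python sketch below assumes the 1-based indexing implied by the examples in the TIP (the 1st number is 0, the 2nd is 1); the function name fib is hypothetical.

```python
def fib(n):
    """Return the n-th Fibonacci number, 1-indexed: fib(1) = 0, fib(2) = 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    prev, curr = 0, 1  # f[1] and f[2]
    for _ in range(n - 1):
        # Slide the window one position: (f[i], f[i+1]) -> (f[i+1], f[i+2])
        prev, curr = curr, prev + curr
    return prev


print(fib(5))  # 3
print(fib(7))  # 8
```

The loop keeps only the last two values, which maps naturally onto two registers plus a counter.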
Task: Carry Flag - Overflow Flag
You will solve these exercises starting from the of.asm, cf.asm and cf_of.asm files located in the drills/tasks/cf-of directory.
Using the add instruction on the al register:
- Set the OF flag
- Set the CF flag
- Set both flags simultaneously
Introduction
Before we actually start learning to read code written in assembly language, and then write our first programs, we need to answer a few questions.
What is Assembly Language?
As you probably know, the basic role of a computer - specifically, of the processor - is to read, interpret, and execute instructions. These instructions are encoded in machine code.
An example would be:
10110000000011000110011000110001110100101111111111100100
This sequence of bits doesn't tell us much in particular. We can convert it to hexadecimal to compress it and group it better.
\xB0\x0C\x66\x31\xD2\xFF\xE4
Even in this form, for many of us the sequence still doesn't mean anything. Hence the need for a more understandable and usable language.
Assembly language allows us to write programs as text, which are then translated into machine code by a utility called an assembler, specific to each architecture. Most assembly languages provide a direct correspondence between their instructions and machine code. For example:
mov al, 12 <-> '\xB0\x0C'
xor dx, dx <-> '\x66\x31\xD2'
jmp esp <-> '\xFF\xE4'
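The relation between the bit string, the hexadecimal form, and the individual instructions can be checked with a few lines of Python. This is only an illustrative sketch: the byte values are taken from the hexadecimal sequence above, and the helper names to_bits and to_escaped are hypothetical.

```python
def to_bits(data):
    """Render each byte as 8 binary digits and concatenate them."""
    return "".join(f"{byte:08b}" for byte in data)


def to_escaped(data):
    """Render each byte in the \\xNN notation used in this section."""
    return "".join(f"\\x{byte:02X}" for byte in data)


machine_code = bytes.fromhex("B00C6631D2FFE4")

print(len(to_bits(machine_code)))  # 56 bits, i.e. 7 bytes
print(to_escaped(machine_code))    # \xB0\x0C\x66\x31\xD2\xFF\xE4

# The instructions have different lengths (2, 3 and 2 bytes here).
for start, end in [(0, 2), (2, 5), (5, 7)]:
    print(to_escaped(machine_code[start:end]))
```

The variable instruction length visible in the last loop is typical of x86 and is one reason why raw machine code is hard to read by eye.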
NOTE: Because assembly language depends on architecture, it is generally not portable. Therefore, processor manufacturers have tried to keep the instructions unchanged from one generation to another, so that even when adding new processors to the line-up, they would maintain compatibility within the same processor family (for example, Intel processors 80286, 80386, 80486 etc. are all part of the generic Intel x86).
Why Learn Assembly Language?
Besides its high didactic value (you come to understand what a "stack overflow" is, how data is represented, and what is specific to the processor you are working with), there are a few applications where knowledge of assembly language and, implicitly, of the architecture is necessary or even critical.
Debugging
It's quite likely that at least one of the programs you've written in the past generated the following result:
Segmentation fault
Sometimes, you will encounter a series of data similar to the following:
Page Fault cr2=10000000 at eip e75; flags=6
eax=00000030 ebx=00000000 ecx=0000000c edx=00000000
esi=0001a44a edi=00000000 ebp=00000000 esp=00002672
cs=18 ds=38 es=af fs=0 gs=0 ss=20 error=0002
For someone who knows assembly language, it's relatively easy to begin troubleshooting using a debugger like GDB or OllyDbg, because the message provides almost all the information they need.
Code Optimization
Think about how you would write a C program to perform AES encryption and decryption. Then, inform the compiler that you want to optimize your code. Evaluate the performance of that code (size, execution time, number of jump instructions, etc.). Although compilers are often labeled as "black magic", there are situations where you simply know something about the processor you're working with better than they do.
Furthermore, simply being able to read assembly code is enough to evaluate a piece of code and optimize its critical sections. Even if you don't write code in assembly language, you'll be aware of the code generated from the C instructions you use.
Reverse Engineering
A large portion of common applications are closed-source. All you have when it comes to these applications is a pre-compiled binary file. Some of these may contain malicious code, in which case they need to be analyzed in a controlled environment (malware analysis/research).
Embedded and Others
There are cases where constraints on code and/or data size are imposed, such as specialized devices for a single task, with little memory. This category includes drivers for devices.
Fun
For more details, ask your laboratory assistant to share their personal experience with assembly language and practical use cases.
x86 Family
Almost all major processors from Intel share a common ISA (Instruction Set Architecture). These processors are highly backward compatible: most instructions have remained unchanged over generations, and new ones have only been added or existing ones extended.
NOTE: An ISA defines the instructions supported by a processor, register size, addressing modes, data types, instruction format, interrupts, and memory organization. Processors in this family fall into the broad category of CISC (Complex Instruction Set Computers). The philosophy behind them is to have a large number of instructions, with variable length, capable of performing complex operations, over multiple clock cycles.
Registers
The basic working units for x86 processors are registers. These are a suite of locations within the processor through which it interacts with memory, I/O, etc.
x86 processors have 8 such 32-bit general-purpose registers. Although any of these can be used in operations, for historical reasons, each register has a specific role.
Name | Role |
---|---|
eax | accumulator; system calls, I/O, arithmetic |
ebx | base register; used for memory-based addressing |
ecx | counter in loop instructions |
edx | data register, used for I/O, arithmetic, interrupt values; can extend eax to 64 bits |
esi | source in string operations |
edi | destination in string operations |
ebp | base or frame pointer; points to the current stack frame |
esp | stack pointer; points to the top of the stack |
In addition to these, there are some special registers that cannot be directly accessed by the programmer, such as eflags and eip (Instruction Pointer).
eip is a register that holds the address of the next instruction to be executed.
It cannot be modified directly by the program, only indirectly through jump, call, and ret instructions.
The eflags register contains 32 bits used as status indicators or condition variables.
We say that a flag is set if its value is 1. The ones commonly used by programmers are:
Name | Expanded Name | Description |
---|---|---|
CF | Carry Flag | Set if the result exceeds the maximum (or minimum) unsigned integer value |
PF | Parity Flag | Set if the low byte of the result contains an even number of 1 bits |
AF | Auxiliary Carry Flag | Used in BCD arithmetic; set if bit 3 generates a carry or borrow |
ZF | Zero Flag | Set if the result of the previous instruction is 0 |
SF | Sign Flag | Has the same value as the sign bit of the result (1 negative, 0 positive) |
OF | Overflow Flag | Set if the result exceeds the maximum (or minimum) signed integer value |
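To make the flag descriptions concrete, here is a minimal Python model of how an 8-bit addition (such as add al, bl) would update the six flags in the table. It is a sketch for experimenting with values, not an exact emulator; the function name add8 is hypothetical.

```python
def add8(a, b):
    """Add two 8-bit values; return (result, flags) as an 8-bit add would leave them."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF  # only the low 8 bits fit in the register
    flags = {
        "CF": total > 0xFF,                              # unsigned result does not fit
        "PF": bin(result).count("1") % 2 == 0,           # even number of 1 bits
        "AF": (a & 0xF) + (b & 0xF) > 0xF,               # carry out of bit 3
        "ZF": result == 0,
        "SF": bool(result & 0x80),                       # copy of the sign bit
        "OF": bool((a ^ result) & (b ^ result) & 0x80),  # signed result does not fit
    }
    return result, flags


result, flags = add8(0x01, 0x02)
print(result)                                                # 3
print(sorted(name for name, is_set in flags.items() if is_set))  # ['PF']
```

The OF expression encodes the usual rule: overflow occurs when both operands have the same sign and the result has the opposite one. Try other operand pairs to see when CF and OF differ.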
NOTE: If you follow the evolution of registers from the 8086, you'll see that initially they were named ax, bx, cx etc., and were 16 bits in size. Starting with the 80386, Intel extended these registers to 32 bits (i.e., "extended" ax → eax).
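The older names still refer to the lower parts of the extended registers: ax is the low 16 bits of eax, and al and ah are the low and high bytes of ax (al is the part used later in this lab). A short Python sketch, with hypothetical helper names, shows how the parts overlap:

```python
def sub_registers(eax):
    """Split a 32-bit eax value into the overlapping ax, ah and al views."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF
    return {"eax": eax, "ax": ax, "ah": ax >> 8, "al": ax & 0xFF}


def set_al(eax, value):
    """Model a write to al: only the lowest byte of eax changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


parts = sub_registers(0x12345678)
print(hex(parts["ax"]))  # 0x5678
print(hex(parts["ah"]))  # 0x56
print(hex(parts["al"]))  # 0x78
print(hex(set_al(0x12345678, 0xFF)))  # 0x123456ff
```

Because the views overlap, an operation on al can change what you later read from eax, which is worth keeping in mind when printing registers.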
Instruction Classes
Although the current instruction set of Intel processors contains hundreds of instructions, we will only look at a small portion of them, more precisely at some of the 80386 instructions.
All x86 processor instructions fit into 3 categories:
- data movement instructions
- arithmetical/logical instructions
- program control instructions
We will only present some of the more important instructions in each category, since many of them are alike.
Data Movement Instructions
These instructions are used to transfer data between registers, between memory and registers, and to initialize data:
Name | Operands | Description |
---|---|---|
mov | dst, src | Moves the value from the source to the destination (overwriting what was in the destination before) |
push | src | Moves the value from the source onto the "top" of the stack |
pop | dst | Moves the value from the "top" of the stack into the destination |
lea | dst, src | Loads the effective address of the source to the destination |
xchg | dst, src | Swaps (Exchanges) the values between the source and the destination |
Arithmetic and Logic Instructions
These instructions perform arithmetic and logic operations:
Name | Operands | Description |
---|---|---|
add | dst, src | Adds the source and the destination, storing the result in the destination |
sub | dst, src | Subtracts the source from the destination, storing the result in the destination |
and | dst, src | Calculates the bitwise AND between the source and the destination, storing the result in the destination |
or | dst, src | Calculates the bitwise OR between the source and the destination, storing the result in the destination |
xor | dst, src | Calculates the bitwise XOR between the source and the destination, storing the result in the destination |
test | dst, src | Calculates the bitwise AND between the source and the destination without storing the result (only the flags are updated) |
shl | dst, <const> | Shifts the destination left (logical shift) by a constant number of positions, storing the result in the destination |
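Since registers have a fixed width, bits shifted past the top are lost. The following Python sketch models shl on a 32-bit register under two assumptions stated in the comments: the shift count is reduced to its low 5 bits, and the carry flag receives the last bit shifted out. The function name shl32 is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shl32(value, count):
    """Model `shl dst, count` on a 32-bit register; return (result, carry)."""
    value &= MASK32
    count &= 31  # assumption: the processor only uses the low 5 bits of the count
    if count == 0:
        return value, None  # nothing shifted, flags left unchanged
    carry = (value >> (32 - count)) & 1  # last bit pushed out of the register
    return (value << count) & MASK32, carry


print(shl32(1, 4))           # (16, 0)
print(shl32(0x80000000, 1))  # (0, 1)
```

Shifting left by n positions multiplies by 2^n as long as no set bit falls off the top, which is what the first call shows.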
Program Control Instructions
These instructions are used to control the flow of programs:
Name | Operands | Description |
---|---|---|
jmp | <address> | Jumps unconditionally to the specified address (directly, by register, by labels) |
cmp | dst, src | Compares the source with the destination (more details below) |
jcond | <address> | Jumps conditionally to the specified address depending on the state of the flag (set/not set)/condition variable |
call | <address> | Calls the subroutine located at the specified address |
NOTE: The cmp dest, src instruction effectively calculates dest - src behind the scenes (that is, it subtracts the source from the destination). It is the same subtraction that sub performs: the flags are updated according to the result, but the result itself is not stored.
Therefore, when talking about the code:
cmp eax, 0
jl negative
The jump to the negative label will be made only if the value in eax is less than 0.
The eax - 0 subtraction is evaluated and if the result is negative (and so, eax is negative), the jump will be made.
When comparing with 0, the same thing can be done more efficiently using the test instruction:
test eax, eax
jl negative
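Why the two sequences behave the same can be seen by modelling the flags that jl looks at. The Python sketch below assumes the standard condition for jl (jump if SF differs from OF) and that test always clears OF; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000


def flags_after_cmp(dst, src):
    """Return (SF, OF) after `cmp dst, src`, i.e. dst - src with the result discarded."""
    dst &= MASK32
    src &= MASK32
    result = (dst - src) & MASK32
    sf = bool(result & SIGN)
    # Signed overflow: operands have different signs and the result's sign differs from dst.
    of = bool((dst ^ src) & (dst ^ result) & SIGN)
    return sf, of


def flags_after_test(dst, src):
    """Return (SF, OF) after `test dst, src`, i.e. dst AND src with the result discarded."""
    result = dst & src & MASK32
    return bool(result & SIGN), False  # test clears OF


def jl_taken(sf, of):
    return sf != of


for eax in (-5, 0, 7):
    via_cmp = jl_taken(*flags_after_cmp(eax, 0))
    via_test = jl_taken(*flags_after_test(eax, eax))
    print(eax, via_cmp, via_test)
```

For every value the two columns agree: subtracting 0 can never overflow, so in both cases the jump depends only on the sign bit of eax.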
Guide: First look at Assembly instructions
To follow this guide, you will need to use the instructions.asm file located in the guides/instructions/support directory.
Diving right into the demo, we can see one of the most important instructions that helps us, programmers, work with the stack: push.
We discussed what the push instruction does in the reading section.
From the way it is used here, we can see that it takes the value 0 (as a DWORD, a number stored on 4 bytes) and moves it onto the "top" of the stack.
That push is followed by a new instruction:
popf
IMPORTANT: The popf instruction pops a value off the stack into the EFLAGS register. How much of the register is set depends on how many bytes are popped: the entire register when popping 4 bytes (our case), and only its lower 2 bytes when popping 2 bytes. You can read more about the popf instruction here and here.
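To see what a value popped into EFLAGS means for the individual flags, the register can be decoded bit by bit. The Python sketch below uses the standard bit positions of the six status flags discussed in this lab; it ignores all other bits (including reserved ones), and the name decode_eflags is hypothetical.

```python
# Bit positions of the status flags inside EFLAGS.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}


def decode_eflags(value):
    """Map each status flag name to True (set) or False (clear)."""
    return {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}


# Popping the value 0, as in the demo, leaves every status flag cleared.
print(any(decode_eflags(0).values()))  # False

# A value with bits 0 and 6 set would mean CF and ZF are set.
print(sorted(name for name, is_set in decode_eflags(0x41).items() if is_set))  # ['CF', 'ZF']
```

This is why the push of 0 followed by popf is used in the demo as a way to reset the flags before each experiment.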
Keeping in mind what the popf instruction does, try to guess what the program would do if you added the following line of code at line 15 and the mystery_label label at line 53 (line numbers refer to the current file, before adding the instruction).
jnc mystery_label
Moving on, we can see that the eax register is set to 0 using the mov instruction.
Can you give two other ways of setting the value in eax to 0 without using mov?
HINT: Think about the logical operators.
Next, by using the test instruction we can set the flags based on the result of the logical and between eax and itself.
After resetting the flags, we store 0xffffffff in the ebx register (the largest unsigned value it can hold; adding anything more to it would set the carry flag) and then use the test instruction yet again.
Similarly, what do you think adding the following line of code after the test instruction would produce?
jnz mystery_label
We reset the flags once again and now we take a look at working with the smaller portions of the eax register.
Can you guess the output of the following command, put right under the add al, bl instruction?
What about the flags?
Which flag has been set?
PRINTF32 `%d\n\x0`, eax
Similarly, try to answer the same questions from above, but considering the next portions of the code.
After thoroughly inspecting this example, you should have a basic idea of how setting the flags works.
Guide: Discovering Assembly
To follow this guide, you will need to navigate to the guides/discovering-assembly/support directory.
- Open the ex1.asm file and read the comments. Assemble it by using the make utility and run it. Using gdb, go through the program line by line (the start command followed by next) and observe the changes in register values after executing the mov and add instructions. Ignore the sequence of PRINTF32 instructions.
- Open the ex2.asm file and read the comments. Assemble it by using the make utility and run it. Using gdb, observe the change in the eip register when executing the jmp instruction. To skip the PRINTF32 instructions, add a breakpoint at the jump_incoming label (the break command followed by run).
- Open the ex3.asm file and read the comments. Assemble it by using the make utility and run it. Using gdb, navigate through the program using breakpoints. Follow the program flow. Why is 15 displayed first and then 3? Because of the jump at line 9. Where does the jump at line 25 point to? To the zone1 label.
- Open the ex4.asm file and read the comments. Assemble it by using the make utility and run it. Using gdb, go through the program. Why isn't the jump at line 12 taken? Because the je instruction jumps if the ZF bit in the FLAGS register is set. This bit is set by the cmp instruction, which calculates the difference between the values of the eax and ebx registers without storing the result. However, the add instruction at line 11 clears this flag because the result of the operation is different from 0.