Assembly and Machine Programming
#lecture note based on 15-213 Introduction to Computer Systems
H2 Abstraction Level Overview
H3 A bit of history
… (whatever)
Intel x86! - Complex Instruction Set Computer (CISC) - lots of instructions
AMD's x86-64 - the 64-bit extension of x86.
H3 Assembly programmer abstraction level
- Architecture aka ISA (instruction set architecture) - what one needs to understand to write assembly or machine code
- e.g. x86, Itanium, x86-64, ARM, RISC-V
- Microarchitecture - how the architecture is implemented
- PC aka Program Counter - address of the next instruction. In x86-64 it's RIP
- Registers - hold frequently used data on the CPU
- Condition code - status from most recent arithmetic / logical operation
- Memory - array of bytes (see Memory Layout)
- Code
- Data
- Stack
- …
H2 Registers
H3 Register datatypes
- Integers - 1 / 2 (aka word) / 4 (aka double word) / 8 (aka quad word) bytes.
- Floating point - 4 / 8 / 10 bytes
- … maybe some other specialised types
H3 Common Registers
- rip - always points to the next instruction
In x86-64, these are 8-byte registers:
- rax - always the return register
- rcx
- rbx
- rdx
- rsi
- rdi
- rsp - stack pointer
- rbp - base pointer
For the above, replace r with e to get the lowest 4 bytes of the register as a 4-byte int.
- r8
- r9
- r10
- r11
- r12
- r13
- r14
- r15
For the above, append d to get the lowest 4 bytes.
Some of these trace back to IA32 registers, and some even get their names from 16-bit registers.
full 4-byte | lower 16 bits | second lowest byte | lowest byte | origin (mostly obsolete) |
---|---|---|---|---|
eax | ax | ah | al | accumulate |
ecx | cx | ch | cl | counter |
edx | dx | dh | dl | data |
ebx | bx | bh | bl | base |
esi | si | | | source index |
edi | di | | | destination index |
esp | sp | | | stack pointer |
ebp | bp | | | base pointer |
bold still relevant!
For %r8 to %r15, the naming for the lower positions follows this pattern:
full (64-bit) | lower 32 bits | lower 16 bits | lowest byte |
---|---|---|---|
r8 | r8d | r8w | r8b |
Basically, each register contains smaller sub-registers that are views of its low-order bytes.
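To make the sub-register idea concrete, here is a small C sketch that treats a 64-bit value as the contents of %rax and extracts what %eax, %ax, %ah and %al would show. The function names are made up for illustration; real sub-registers are views provided by the hardware, not function calls.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the bits that the named
   sub-register would expose for a given 64-bit %rax value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); } /* lowest 4 bytes */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }     /* lowest 2 bytes */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); } /* second lowest byte */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }        /* lowest byte */
```

With %rax holding 0x1122334455667788, these give 0x55667788, 0x7788, 0x77 and 0x88 respectively.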
H3 Register reference, constants, and addresses in assembly
- immediate viz. constant integer
  - $0x42
  - $-342
- register
  - %rax
  - %r15
- memory
  - (%rax) - dereference %rax as a pointer to somewhere in memory
H3 Memory addressing format (Address Mode Expression)
Most general form: D(Rb, Ri, S)
which corresponds to Mem[Reg[Rb] + S * Reg[Ri] + D]
- D is a constant displacement
- Rb is the base register
- Ri is the index register (any register except %rsp)
- S is the scale (1, 2, 4, or 8)
Parts can be omitted! A missing part takes the identity value: 0 for D, Rb or Ri, and 1 for S.
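The address computation can be checked by hand with a few lines of C. This is only a sketch of the arithmetic: the function name is made up, and register contents are passed in as plain numbers.

```c
#include <stdint.h>

/* Effective address for D(Rb, Ri, S): Reg[Rb] + S * Reg[Ri] + D.
   Pass 0 for a missing base or index value and 1 for a missing scale. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s)
{
    return rb + s * ri + (uint64_t)d;
}
```

Assuming %rdx holds 0xf000 and %rcx holds 0x0100: 0x8(%rdx) is 0xf008, (%rdx,%rcx) is 0xf100, (%rdx,%rcx,4) is 0xf400, and 0x80(,%rdx,2) is 0x1e080.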
H2 Basic Assembly
H3 Instructions to know
Aside
- Each instruction in x86-64 can have 1 to 15 bytes
- Actual assembly programmes may have funny .something stuff called directives
; Src := Source
; Dst := Destination
movq Src, Dst ; copy 8 bytes from Src to Dst
leaq Src, Dst ; compute Src as address mode expression and put in Dst
addq Src, Dst ; Dst = Dst + Src
subq Src, Dst ; Dst = Dst - Src
imulq Src, Dst ; Dst = Dst * Src
salq Src, Dst ; Dst = Dst << Src
sarq Src, Dst ; Dst = Dst >> Src, arithmetic
shrq Src, Dst ; Dst = Dst >> Src, logical
xorq Src, Dst ; Dst = Dst ^ Src
andq Src, Dst ; Dst = Dst & Src
orq Src, Dst ; Dst = Dst | Src
incq Dst ; Dst ++
decq Dst ; Dst --
negq Dst ; Dst = -Dst
notq Dst ; Dst = ~Dst
ret ; return
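The difference between the two right shifts (sarq vs shrq) is easiest to see on a value whose top bit is set. Below is a minimal C sketch of both; the function names are made up, and the arithmetic shift is written with unsigned operations so it does not rely on how a particular C compiler handles >> on negative values.

```c
#include <stdint.h>

/* shrq: logical right shift, vacated high bits become 0. Requires n <= 63. */
uint64_t shr64(uint64_t x, unsigned n)
{
    return x >> n;
}

/* sarq: arithmetic right shift, vacated high bits copy the sign bit.
   Requires n <= 63. */
uint64_t sar64(uint64_t x, unsigned n)
{
    uint64_t shifted = x >> n;
    uint64_t sign_fill = ~(~UINT64_C(0) >> n); /* top n bits set */
    return (x >> 63) ? (shifted | sign_fill) : shifted;
}
```

Shifting 0xFFFFFFFFFFFFFFF0 (-16 as a signed value) right by 4 gives 0x0FFFFFFFFFFFFFFF logically and 0xFFFFFFFFFFFFFFFF (-1) arithmetically.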
H3 Idea of word & Data length
The term originated in 16-bit architectures, where 16 bits were called a word.
- word - 16-bit
- double words - 32-bit
- quad words - 64-bit
Notice that instructions in assembly often carry a suffix that indicates the operand size
Intel data type | asm suffix | size in bytes |
---|---|---|
byte | b | 1 |
word | w | 2 |
double word | l | 4 |
quad word | q | 8 |
single precision | s | 4 |
double precision | l | 8 |
H3 Instruction for different data size
movz S, R ; move with zero extension viz. R = ZeroExtended(S)
movzbw ; byte to word
movzbl ; byte to double word
movzwl ; word to double word
movzbq ; byte to quad word
movzwq ; word to quad word
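In C, zero extension is what happens when an unsigned value is converted to a wider unsigned type. A tiny sketch (function names made up to mirror the instruction names above):

```c
#include <stdint.h>

/* Widening an unsigned value fills the new upper bits with 0,
   which is the same effect as the movz family. */
uint16_t zero_extend_bw(uint8_t b)  { return (uint16_t)b; } /* like movzbw */
uint32_t zero_extend_bl(uint8_t b)  { return (uint32_t)b; } /* like movzbl */
uint32_t zero_extend_wl(uint16_t w) { return (uint32_t)w; } /* like movzwl */
uint64_t zero_extend_bq(uint8_t b)  { return (uint64_t)b; } /* like movzbq */
uint64_t zero_extend_wq(uint16_t w) { return (uint64_t)w; } /* like movzwq */
```

For example, the byte 0xFF becomes 0x000000FF as a double word, not 0xFFFFFFFF.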
H3 Control Flow
Control flow is done with jumps (the machine-level GOTO), which can be conditioned on some flags.
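Aside
A conditional expression Test ? Then : Else isn't necessarily compiled to run only Then or only Else based on Test. The compiler may instead compute both Then and Else and pick one with a conditional move, which it should only do when computing both is safe. Code compiled with jumps (if / else, or GOTO) computes just one of them.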
H4 Flags
Remember the “condition codes” in the CPU diagram? Those get updated by operations and can be used to condition jumps.
Below are the condition codes and how they get updated by an operation instruction Src, Dst that corresponds to t = f(a, b).
Note that the processor sets the flags without knowing whether the operands are signed or unsigned; the programmer needs to choose which flag to check depending on context.
- CF - Carry flag (for unsigned)
  - Set if there is a carry out of the most significant bit (think unsigned add)
- ZF - Zero flag
  - Set if t == 0
- SF - Sign flag (for signed)
  - Set if t < 0, viz. the left-most bit is 1
- OF - Overflow flag (for signed)
  - Set if signed overflow, viz. (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0) for an addition
Exception
lea doesn't set any flags!!
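The four rules can be written out in C for the case of a 64-bit addition. This is a sketch under the assumption that t = a + b; the struct and function names are made up, and the overflow test is phrased as "both inputs have the same sign and the result has the other sign".

```c
#include <stdint.h>
#include <stdbool.h>

struct flags { bool cf, zf, sf, of; };

/* Flags after a 64-bit add, t = a + b. */
struct flags add_flags(uint64_t a, uint64_t b)
{
    uint64_t t = a + b;                  /* wraps modulo 2^64 */
    bool a_neg = a >> 63, b_neg = b >> 63, t_neg = t >> 63;
    struct flags f;
    f.cf = t < a;                        /* carry out of the most significant bit */
    f.zf = (t == 0);
    f.sf = t_neg;                        /* left-most bit of the result */
    f.of = (a_neg == b_neg) && (t_neg != a_neg); /* signed overflow */
    return f;
}
```

Adding 1 to 0xFFFFFFFFFFFFFFFF sets CF and ZF but not OF (as signed values it is just -1 + 1 = 0). Adding 1 to 0x7FFFFFFFFFFFFFFF sets SF and OF but not CF.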
Some instructions to set flags:
; calculates b - a and sets flags (the result of b - a is discarded)
; similar to `sub a, b`, but doesn't change b
cmp a, b
; computes b & a only to set flags (SF and ZF from the result; CF and OF are cleared), result also discarded
; similar to `and a, b`, but doesn't change b
test a, b
H4 Jumps
There are many jump instructions, each one depending on a condition.
They tend to be in the form of jX, where X could be one of the following.
Name | A.k.a. | Jump if... | After CMP... |
---|---|---|---|
JMP | | Always | |
JS | | Negative (SF=1) | |
JNS | | Not negative (SF=0) | |
JO | | Signed overflow (OF=1) | |
JNO | | No signed overflow (OF=0) | |
JE | JZ | Zero (ZF=1) | Equal |
JNE | JNZ | Not zero (ZF=0) | Not equal |
JB | JC, JNAE | Unsigned overflow (CF=1) | Unsigned below |
JAE | JNC, JNB | No unsigned overflow (CF=0) | Unsigned above or equal |
JA | JNBE | CF=0 and ZF=0 | Unsigned above |
JBE | JNA | CF=1 or ZF=1 | Unsigned below or equal |
JL | JNGE | SF != OF | Signed less |
JGE | JNL | SF = OF | Signed greater or equal |
JG | JNLE | ZF=0 and SF=OF | Signed greater |
JLE | JNG | ZF=1 or SF != OF | Signed less or equal |
Example:
cmp b, a
jle 0xsomewhere ; if a ≤ b, jump to 0xsomewhere
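To see why the same cmp needs different jumps for signed and unsigned data, here is a C sketch that computes the flags for cmp a, b (that is, b - a) and then evaluates the JL and JB conditions from the table. The names are made up for this illustration.

```c
#include <stdint.h>
#include <stdbool.h>

struct cmp_flags { bool cf, zf, sf, of; };

/* Flags produced by `cmp a, b`, which computes t = b - a and discards t. */
struct cmp_flags cmp(uint64_t a, uint64_t b)
{
    uint64_t t = b - a;
    bool a_neg = a >> 63, b_neg = b >> 63, t_neg = t >> 63;
    struct cmp_flags f;
    f.cf = b < a;                                /* unsigned borrow */
    f.zf = (t == 0);
    f.sf = t_neg;
    f.of = (a_neg != b_neg) && (t_neg != b_neg); /* signed overflow of b - a */
    return f;
}

bool jl_taken(struct cmp_flags f) { return f.sf != f.of; } /* signed: b < a */
bool jb_taken(struct cmp_flags f) { return f.cf; }         /* unsigned: b < a */
```

With a = 1 and b = 0xFFFFFFFFFFFFFFFF, JL is taken (as signed values, -1 < 1) while JB is not (as unsigned values, b is the largest possible number).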
Conditional expression in C
Something like
val = Test(x) ? A(x) : B(x);
may be compiled to a conditional move, which computes both A and B and then picks one. That is a bad idea when computing both could lead to:
- unsafe behaviour
- bad performance
- side effects
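A classic instance of the unsafe case is a null check. In the sketch below (function name made up), evaluating both arms up front would dereference a null pointer, so this expression has to be compiled with a jump rather than by computing both sides.

```c
#include <stddef.h>

/* Reading *p is only valid when p is non-NULL, so the two arms
   cannot both be evaluated ahead of the test. */
long read_or_zero(const long *p)
{
    return p ? *p : 0;
}
```

Calling read_or_zero(NULL) returns 0 without touching memory.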
Aside: do while loop in C
We can rewrite
do { Body } while (Condition);
into
loop:
  Body
  if (Condition) goto loop;
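As a worked instance of this rewrite, here is a bit-counting function written both ways. The example function is chosen for illustration; any do-while loop transforms the same way.

```c
/* Count the 1 bits in x with a do-while loop. */
long pcount_do(unsigned long x)
{
    long result = 0;
    do {
        result += x & 0x1;
        x >>= 1;
    } while (x);
    return result;
}

/* The same function with the loop rewritten in goto form. */
long pcount_goto(unsigned long x)
{
    long result = 0;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x) goto loop;
    return result;
}
```

Both versions run the body at least once and return the same value for every input, e.g. 4 for 0xF0.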
H4 Conditional Set
There's also a setX instruction with the same options for X as the jumps. A set instruction sets a single-byte destination (e.g. the lowest byte of a register) to 0 or 1 based on the condition.
See the table of jumps above for the available suffixes.
Example:
cmp b, a
setle %al ; set %al (lowest byte of %rax) to 1 if a ≤ b, else 0
movzbl %al, %eax ; make rest of %rax 0
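For reference, this is the kind of C function that a compiler could plausibly turn into the three-instruction sequence in the example above. That mapping is an assumption: the actual output depends on the compiler and its flags.

```c
/* Returns 1 if a <= b and 0 otherwise (signed comparison). */
int le(long a, long b)
{
    return a <= b;
}
```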
H4 Switch Statements
The compiler can do different things depending on what the cases are. Close-by cases (like case 1 ... case 2 ... case 3) may get a jump table.
A jump table is like an array of targets (8 byte pointers) that point to different code blocks for the different cases.
Assembly examples:
; direct jump
jmp .Target
; indirect jump by table starting at .Table
jmp *.Table(, %rdi, 8)
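The same idea can be imitated in C with an array of function pointers. This is only an analogy (it makes indirect calls rather than indirect jumps), and all the names and case bodies are made up for the sketch.

```c
/* One handler per case. */
static long case0(long x) { return x + 1; }
static long case1(long x) { return x * 2; }
static long case2(long x) { return x - 3; }

typedef long (*handler)(long);

/* The table: entry i is the address of the code for case i. */
static const handler table[] = { case0, case1, case2 };

long dispatch(unsigned long i, long x)
{
    if (i >= sizeof table / sizeof table[0])
        return 0;           /* default case: index falls outside the table */
    return table[i](x);     /* indirect transfer through table[i] */
}
```

The bounds check plays the role of the default case: the table lookup is only safe once the index is known to be in range.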
H2 Machine Procedure
This is when we have some sequence of instructions that can get called multiple times (essentially some function)
There are a few things we need to do to make this work
- Passing control - Be able to go to the beginning of the procedure and go back where we were after return
- Passing data - pass arguments and get back return value
- Memory management - allocate when running procedure, deallocate when returning
These mechanisms are implemented using machine instructions. The exact conventions are a design choice, and they are specified by the Application Binary Interface (ABI).
H3 The x86-64 Stack
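The stack is a region of memory that grows toward lower addresses; %rsp holds the address of the current top of the stack.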
To push onto the stack, we can run some push operation, which is equivalent to decrementing the stack pointer and writing something there.
pushq %rax
; does the same thing as
subq $8, %rsp
movq %rax, (%rsp)
For pop, we do the opposite: read the value on top of the stack into some register, then increment the stack pointer.
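Push and pop can be modelled with a toy machine in C. Everything here is a simplification made up for the sketch: memory is a small array of 8-byte slots, rsp is an index into it rather than a byte address, and there are no overflow checks.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy machine state. The stack starts at the high end of mem
   and grows toward index 0. */
struct machine {
    uint64_t mem[STACK_WORDS];
    int rsp;
};

void machine_init(struct machine *m) { m->rsp = STACK_WORDS; }

/* pushq: move rsp down one slot, then store the value there. */
void pushq(struct machine *m, uint64_t v)
{
    m->rsp -= 1;
    m->mem[m->rsp] = v;
}

/* popq: load from the top slot, then move rsp back up. */
uint64_t popq(struct machine *m)
{
    uint64_t v = m->mem[m->rsp];
    m->rsp += 1;
    return v;
}
```

Pushing 1 and then 2 and popping twice returns 2 and then 1, and rsp ends up where it started.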
H3 Calling Procedure
Pushing and popping from the stack enables procedure call control flow. A call instruction pushes the address of the next instruction, aka the return address, onto the stack and jumps to the procedure. A ret instruction pops that address off the stack and jumps back to it.
H3 Passing Data
It's been decided that the first 6 (integer or pointer) arguments go in registers rdi, rsi, rdx, rcx, r8, r9 in that order, and the rest go on the stack. The return register is always rax.
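As a small illustration, here is a C function with eight integer arguments, annotated with where each one would travel. The annotations assume the System V AMD64 calling convention used on Linux and macOS; other ABIs (e.g. on Windows) assign registers differently.

```c
/* Assuming the System V AMD64 convention:
   a..f arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9;
   g and h are passed on the stack;
   the result goes back in %rax. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h)
{
    return a + b + c + d + e + f + g + h;
}
```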
H3 Call Frames
Each call gets its frame. The frame is used for:
- local variables
- temporary space
- arguments for the next frame
H4 Caller Saved vs Callee Saved Registers
- caller-saved: caller needs to save before calling
  - rax, for return value
  - rdi, rsi, rdx, rcx, r8, r9 for arguments (in that order)
  - r10, r11
- callee-saved: callee needs to restore before returning
  - rbx, r12, r13, r14, r15
  - rbp (may be used as frame pointer)
  - rsp: special form, must be restored to point back at the stack top