CSCE 230: Computer Organization
Lecture 1: Binary Numbers
-
a number with only two possible symbols, 0 or 1, a bit
-
we study binary numbers because computers use binary numbers
-
computers use binary numbers because digital circutits only support two states, on and off
-
we frequently speak of n-bit binary numbers, a word
-
a byte is a part of a word, frequently 8-bits, but historically has been 6
-
two types of binary numbers,
-
unsigned
-
positive numbers or 0
-
a 4 bit unsigned binary number has up to 16 possibilities
-
$b_3b_2b_1b_0$
-
$b_0$ is the lsb, $b_3$ is the msb
-
in decimal, it is $b_0 2^0 + b_1 2^1 + b_2 2^2 + b_3 2^3$
-
-
an $n$ bit unsigned binary number has up to $2^n - 1$ possibilities, $[0, 2^n - 1]$
-
Decimal conversion, use divide by two and remainders, first remainder is lsb
-
Addition of two binary numbers:
-
When sum is within the range $[0, 2^n - 1]$
-
start with the LSB, adding each bit, carrying as necessary
-
$1 + 1 = 10$ where $0$ is the one-bit sum and $1$ is the carry-out.
-
-
When sum is out of the range $[0, 2^n - 1]$ (overflow)
-
When possible, just add a bit.
-
You loose information.
-
-
To detect overflow, if the msb has a carry-out, there is overflow.
-
-
Subtraction of two numbers:
-
$a - b = a + (- b)$, becomes the addition of two signed binary numbers
-
-
-
signed
-
msb is the sign, 1 is negative, 0 is positive
-
In all, the unsigned representation of the positive numbers is the same
-
In signed magnitude and 1's complement, there is negative zero
-
signed magnitude
-
change the sign, that's it.
-
$[-(2^{n - 1} - 1), 2^{n - 1} - 1]$
-
-
1's complement
-
Complement each bit of the unsigned postive number (flip each bit)
-
$[-(2^{n - 1} - 1), 2^{n - 1} - 1]$
-
-
2's complement
-
Take the ones complement and add 1
-
this has slightly greater range
-
$[-(2^{n - 1}), 2^{n - 1} - 1]$
-
Addition:
-
Within range:
-
Start at the LSB, summing as before
-
-
Overflows:
-
overflow if MSB carry-over and MSB-1 carry-over are not the same
-
overflow if both have same sign, and the sign is different in the final result
-
-
-
-
-
Lecture 2: More Binary Numbers
-
Subtraction of signed binary numbers (2's complement)
-
$a - b = a + (- b)$
-
first negate b by flipping all bits, and adding 1
-
$a - b = a + b^{\prime} + 1$
-
add as normal, be careful of overflow
-
-
subtraction of unsigned binary numbers:
-
see above.
-
if the final overflow bit is different from the sign of both numbers, then there is overflow.
-
-
sign extension: Add the correct number of "1"s before the MSB or the correct number of "0"s to the same position, depending on the sign bit.
-
to extend an unsigned number, just and the correct number of "0"s before the MSB
-
Hex!
-
0-15, as 0-9, A-F
-
1 hex digit is 1 nybble (4 bits)
-
$1111110 = FE$
-
Recitation 1: Welcome
-
format
-
announcements
-
new material review
-
common mistakes
-
intro to lab
-
-
read and start lab prior to coming to recitation!
Binary Numbers
-
0b for binary numbers, nothing special for decimal, and 0x for Hex.
-
Place-value review.
-
binary is on $2^n$
-
remember that index is alwasy 0-based
Lab
-
THE HORROR THAT IS ASSEMBLY
Lecture 3: Assembly Language
-
assembly is very low level
-
made for a specific processor
-
-
high level languags include java, c, c++, python and lisps
-
can be executed on a variety of processors
-
-
In 230, we do assembly for NIOS2, and for the expiremental, we also deal with x86
-
there is an almost 1-to-1 relationship between an assembly language and an isa, conversion is supported by an assembler
-
processor is connected to memory
-
processor works on n-bits
-
memory is a sequnce of bits, organized as words.
-
we use byte addressing, assigning successive addresses to successive bytes.
-
n-bits = 32 bits
-
4 bytes per word.
-
bytes: 0, 1, 2, 3, 4, …
-
words: 0, 4, 8, 12, 16, …
-
-
little endian versus big endian
-
little-endian
-
first word is 3, 2, 1, 0
-
addresses start at the least significant bit
-
-
big-endian
-
first word is 0, 1, 2, 3
-
addresses start from the most significant bit
-
-
most archs use little endian, including NIOS and x86
-
TCP/IP and many older computers use big-endian
-
hton
-
ntoh
-
-
-
Word alignment:
-
a word address has an aligned address if the address is a multiple of the number of bytes in a word
-
for n=32: 0, 4, 8, 12, 16,…
-
unaligned for n=32: 1, 2, 3, 5, …
-
some archs don't support unaligned word addresses
-
-
Assembly Basics:
-
java/c:
-
int declarations
int r2 = 1; int r3 = r2; int r4 = r2 + r3;
-
-
Assembly:
.text .global _start _start: movi r2, 1 ; Sets Register 2 to 1 movi r3, r2 ; Set Register 3 to the contents of register 2 add r4, r2, r3 ; Set Register 4 to r2 + r3 .data .end ;; Registers 2-23 are general purpose ;; There are 32 registers, those that are not general purpose are special purpose ;; There are 6 control registers, used for I/O ;; PC is also a register, the program counter. This points to the address of the next instruction to be executed
-
A register stores one word of data
-
register namse are case sensitive in NIOS2 assembly, all others are case-insensitive
-
assembly follows the mnemonic operand* pattern
-
two types of instructions, regular instructions (directly implemented in hardware) and psuedo instructions (implemented by regular insturctions (used to be called macro instructions))
-
two types of instruction sets:
-
Reduced – RISC
-
small number of simple instructions
-
instructions can fit in a single word
-
ARM
-
OpenRISC
-
NIOS2
-
POWER
-
-
Complex – CISC
-
large number of both simple and complex instructions
-
instructions fit in one or more words
-
x86(_64)
-
VAX
-
-
-
NIOS2 Instructions
-
arithmetic
-
add
-
addi
-
sub
-
subi
-
mul
-
muli
-
div
-
divu
-
divi
-
diviu
-
-
logical
-
data copy
-
control transfer
-
unconditional branching
-
jmp
-
br
-
label must be ±32 kb from current instruction
-
-
-
conditional branching
-
blt, bltu
-
bgt, bgtu
-
ble, bleu
-
bge, bgeu
-
beq
-
bne
-
-
-
misc
-
shift
-
rotate
-
-
-
Remember to INF loop at the end of a program if you are unable to send a software signal to shut off
-
ex:
_start: movi r2, 1 movi r3, r2 movi r4, 0 bgtu r2, r3, _true _false: movi r4, r3n br _cont _true: movi r4, r2 _cont: nop br _cont
-
ex:
_start: movi r2, 1 movi r3, 10 movi r4, 0 _loop: add r4, r4, r2 add r2, r2, 1 bleu r2, r3, _loop _cont: nop br _cont
-
-
Lecture 4: Assembly Continued
-
types
-
arithmetic
-
branching
-
logical
-
and
-
and/r,r,r
-
andi/r,r,v
-
andhi/r,r,v
-
-
or
-
or
-
ori
-
orhi
-
-
xor
-
xor
-
xori
-
xorhi
-
-
-
data
-
move
-
mov
-
movi
-
movui
-
movio – move label address to register
-
-
load/store
-
assembler directives
-
.egu
-
.data
-
.byte
-
.hword
-
.word
-
.ascii
-
.stop
-
-
memory addresses
-
displacement addresses: V(rj), number(register) = v + rj
-
number is 16 bit signed
-
absolute mode v(r0)
-
register indirect 0(rj)
-
-
instructions
-
ex:
X: .byte 1
-
can be addressed using absolute mode X(r0)
-
address to register, and then register indirect
-
-
ex:
ARRAY: .byte 1,2,3,4
-
movi r2, 0; ARRAY(r2)
-
movi r2, 1; ARRAY(r2)
-
movia r2, ARRAY; 0(r2)
-
movia r2, ARRAY; 1(r2)
-
-
ldw
-
ldh
-
ldb
-
ldhu
-
ldbu
-
stw
-
sth
-
stb
-
-
-
-
subroutines
-
Lecture 5: More Assembly
-
Exam on February 16, clases split afterward
-
More addressing
-
addressing modes
-
immediate mode – addi
-
register mode – add
-
displacement mode – ldw r5, num(r6)
-
register indirect mode – ldw r5, (r6)
-
absolute mode – ldw r5, num(r0)
-
-
ex:
#define N 10 int num[N] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; int i = 0;
.egu N 10 start: movi r2, 0 loop: ldb r3, NUM(r2) ldw r4, SUM(r0) add r4, r4, r3 addi r2, r2, 1 stw r4, SUM(r0) blt r2, N, loop end: br end .data NUM: .byte 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 .skip 2 SUM: .word 0
.egu N 40 start: movi r2, 0 loop: ldw r3, NUM(r2) ldw r4, SUM(r0) add r4, r4, r3 addi r2, r2, 4 stw r4, SUM(r0) blt r2, N, loop end: br end .data NUM: .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 SUM: .word 0
-
Actual Subroutines
-
instructions
-
callr
-
call
-
ret
-
-
On call, next address is put in the ra (return address) register, and then the subroutine is jumped to.
-
At return, the return-address is jumped back to.
-
ex (factorial):
start: movi r2, 6 call factorial end: br end factorial: movi r3, 1 movi r4, 1 loop: mul r3, r3, r4 addi r4, r4, 1 ble r4, r4, loop ret factorial2: movi r3, 1 mov r4, r2 loop2: mul r3, r3, r4 subi r4, r4, 1 bne r4, r0, loop2 ret
-
.stack, a lifo
-
r27 is the stack pointer, sp
-
add element
-
subi, sp, sp, 4
-
stw ri,(sp)
-
-
remove top element
-
ldw ri,(sp)
-
addi sp, sp, 4
-
-
to save registers in a subroutine
-
pass parameters to a subroutine
-
ex:
movi r5, subi sp, sp, 4 stw r5, (sp) call factorial ldw r5, (sp) addi sp, sp, 4 end: br end factorial: ldw r3, 4(sp) movi r4, 1 loop: mul r4, r4, r3 subi r3, r3, 1 bne r3, 0, loop subi sp, sp, 4 stw r4, (sp) ret
-
-
-
Lecture 6: Hardware, and the background of logic circuits
-
a logic circuit implements a function of logic signals (logical variables) that take two possible values that take two possible values 0 and 1 corresponding to different voltage levels.
-
or looks like an arrowhead with a curved back, $f = x_1 + x_2$
-
and looks like a box with a leading half-circle, $f = x_1 \cdot x_2$
-
not takes one, wedge with leading circle, $f = \overline{x}$
-
synthesis of logic functions:
-
truth table
-
sum-of-products form
-
minimize
-
-
Ex: Bit multiplexer
-
takes S, A, B, single output
-
S is control Signal
-
data are inputs a and b, each one bit
-
if S = 0, a, else, B
-
truth table
s a b out (f(s, a, b)) term 0 0 0 0 0 0 1 0 0 1 0 1 $\overline{s} \cdot a \cdot \overline{b}$ 0 1 1 1 $\overline{s} \cdot a \cdot b$ 1 0 0 0 1 0 1 1 $s \cdot \overline{a} \cdot b$ 1 1 0 0 1 1 1 1 $s \cdot a \cdot b$ -
Product Form terms
-
$f(s, a, b) = \overline{s} \cdot a \cdot \overline{b} + \overline{s} \cdot a \cdot b + s \cdot \overline{a} \cdot b + s \cdot a \cdot b$
-
not and or
-
-
$f = (\overline{S}a) + (Sb)$
-
minimize, using logic rules:
-
Commutative: $w + y = y + w$, $wy = yw$
-
Distributive: $wy + wz = w(y + z)$, $w + yz = (w + y)(w + z)$
-
Associative: $(w + y) + z = w + (y + z)$, $(wy)z = w(yz)$
-
Idempotent: $w + w = w$, $ww = w$
-
Involution: $\overline{\overline{w}} = w$
-
Complement: $w + \overline{w} = 1$, $w\overline{w} = 0$
-
DeMorgan's: $\overline{w + y} = \overline{w}\cdot\overline{y}$, $\overline{wy} = \overline{w} + \overline{y}$
-
-
-
Ex:
-
Truth table:
x_1 x_2 x_3 f term 0 0 0 1 $\overline{x_1}\overline{x_2}\overline{x_3}$ 0 0 1 1 $\overline{x_1}\overline{x_2}x_3$ 0 1 0 1 $\overline{x_1}x_2\overline{x_3}$ 0 1 1 0 1 0 0 1 $x_1\overline{x_2}\overline{x_3}$ 1 0 1 1 $x_1\overline{x_2}x_3$ 1 1 0 0 1 1 1 0 -
Product of Terms:
-
$f = \overline{x_1}\overline{x_2}\overline{x_3} + \overline{x_1}\overline{x_2}x_3 + \overline{x_1}x_2\overline{x_3} + x_1\overline{x_2}\overline{x_3} + x_1\overline{x_2}x_3$
-
-
Minimized:
-
$f = \overline{x_1}\overline{x_2} + \overline{x_1}x_2\overline{x_3} + x_1\overline{x_2}$
-
$f = \overline{x_2} + \overline{x_1}x_2\overline{x_3}$
-
$f = \overline{x_2} + \overline{x_1}\cdot\overline{x_3}$
-
-
-
Other Logical Gates:
-
xor, or with a line behind it
-
nand, and with a not on the output
-
nor, or with a not on the output
-
xnor, xor with a not on the output
-
-
Implementation:
-
propagation delay: a very small amount of time between input change and output change, $T_p$, non-zero.
-
PLA(Programable Logic Array): Appendix A.11.1
-
FPGA(Field Programmable Gate Array): A form of programmable logic device, the most powerful and flexible of the main programmable logic devices.
-
VHDL is the hardware description language used
-
Quartus2 is the software used to write and develop VHDL, can minimize functions automagically
-
Modelsim is modeling software to simulate logic circuits
-
-
Recitation 3: Subroutines and Logic Design
-
NIOS2 is little-endian
-
.text always starts assembly
-
.end is always at the end
-
.global _start is the main entry point
-
.data starts the data section
-
Make sure to mention TA's poor comunication skills
Lecture 7: VHDL
-
Damn bastardized child of Ada.
-
example: $f(s, a, b) = \overbar{s}a + sb$
library ieee; use ieee.std_logic_1164.all; entity mux2inputbit is part ( s, a, b : in std_logic; result: out std_logic ); end mux2inputbit; architecture implementation1 of mux2inputbit is begin result <= (not s and a) or (s and b); end implementation1;
-
use ieee and ieee.std_logic_1164.all for the std_logic type
-
entity defines an interface for a part, including the signals
-
architecture defines the implementation
-
case insensitive
-
double-dach comments
-
-
use quartus to:
-
compile
-
minimize
-
rtl view before minimize
-
technology map view after minimize
-
-
Datatypes
-
std_logic: 9 possible values, only use 2: 0 and 1; others are Z, X, W, L, H, -, U
-
std_logic_vector: an array of std_logic; (n downto m); V : out std_logic_vector(3 downto 0); V <= "1000"; V(2 downto 1) <= "10"; V(0) <= '1'
-
-
Operators
-
logic: and, or, not, xor, nand, nor, xnor
-
concatenation operator: &
-
Lecture 8: More VHDL
-
Combinations circuits, output is a function of the current inputs
-
multiplexor: see mux2inputb.vhd
-
2 data input, 1 bit wide.
-
control input
-
-
multiplexor: m data inputs ($m = 2^k$), $\lg m$ control inputs, n-bit-wide max
-
for 4 data inputs, 1 bit wide, we need 2 control inputs, see mux4input1bit.vhd
-
for 4 data inputs, 2 bit wide, we need 2 control inputs, see mux4input2bit.vhd
-
-
-
Sequential circuits: output depends on both input and the current state
-
SR Latch
-
see srlatch.vhd
-
-
Gated SR latch
-
see gatedsrlatch.vhd
-
-
D latch
-
see dlatch.vhd
-
-
D flip-flop
-
D latch triggered only on rising edge of the clock.
-
d latch who's output p is connected to another d latch whos clock is notted (triggered on falling edge)
-
to make above triggered on rising edge, not the clock before the first, and renot on second
-
-
Lecture 9: VHDL Sequential Statements
-
Normally al statements within an architecture are concurrent.
-
process is concurrent, however the statements within it are sequential.
-
select/vhdl conditionals are not allowed, use if or a case statement.
-
Lecture 10 (First Day Split): Machine-Level Programming 1: X86 Basics
-
History of processors and archs
-
C, assembly, machine code
-
assembly basics: registers, operands, move
-
intro to x86-64
History
-
totally dominates the computer market
-
evolutionary design
-
compatible back to the 8086 from 1978
-
Added more features as time goes on
-
-
Is CISC
-
many different instructions with many different formats
-
only a small subset encountered within GNU/Linux programs
-
-
Hard to match the performance of RISC
-
But Intel has done it in terms of speed, but less so for power
-
-
Milestones
-
8086, 1978
-
386, 1985
-
Pentium 4F, 2004
-
Core i7, 2008
-
-
64 bit means you can have more memory
-
Multi-core lets you run more at the same time (parallelism), thanks to Moore's law, we can have more cores on the same space
-
Introduced multimedia operations
-
AMD recruted DEC's designers and built Opteron and developed x86-64
-
Intel tried IA64, but ended up building EM64T, nearly identical to x86-64
C, Assembly, Machine Code
-
Architecture: the ISA. The parts of a processor design that one needs to understand to write assembly code, specification, registers
-
Microarchitecture: implementation of the architecture, chache sizes, core frequencies
-
ISAs include x86, Nios, MIPS, SPARC, Power
-
Generally have Program Counter, Registers, Condition Codes, in memory, you have object code, program data, os data, the stack and possibly a heap
Assembly Basics
-
C (gcc -S) to assembly (as) to object (ld) to executable
-
compile, assemble, link
-
remember that the stack is important, thus there are push and pop instructions
-
Integers are 1, 2 or 4 bytes, they are data values or untyped pointers (addresses)
-
floating point data are 4, 8 or 10 bytes
-
no aggregate like arrays or structures
-
arith functions
-
data transfer
-
control flow
-
variable length instructions are a PITA
-
Stacks have
%esp
for the stacp pointer -
follows a similar pattern as NIOS
-
control flow, call, return, various branching ops
-
program counter is
%eip
-
stacks are important for languages that support recursion.
-
code must be reentrant
-
you must have a place to store all of the relavent data
Lecture 11: More X86
-
only 6 registers in x86
-
caller saves – the caller saves the registers before calling
-
eax – ax, ah, al – accumulator
-
ecx – cx, ch, cl – counter
-
edx – dx, dh, dl – data
-
-
callee saves – the callee saves the registers before using
-
ebx – bx, bh, bl – base
-
esi – si – source index
-
edi – di – destination index
-
-
esp – sp – stack pointer
-
ebp – bp – base pointer
Moving data
-
movl source, dest
-
imm, reg/mem
-
reg, reg/mem
-
mem, reg
-
-
operand types
-
immediate – prefix with '$', integer data of 1, 2 or 4 bytes
-
register, one of the 8 registers
-
memory (4 consecutive bytes a address given by register),
(%eax)
or other address modes
-
-
Memory Adressing
-
Normal,
(reg)
, the register specifies the address -
Displacement
n(reg)
, reg + d is the address -
General
n(reg,in,scale)
reg + s*in + d is the address
-
Lecture 12: Still More X86
-
64 bit has registers r8 to r15, and their half-word variants
-
-l instructions hav -q instructions also for 8 byte words (quad words)
-
with the registers, we have enough to pass most arguments in registers rather than on the stack
-
leal, load effective address, source is mode address expression, set destination to address denoted by expression
-
arith operations take src, dest, where dest = dest op source
-
be careful with arith, can be done with atypical functions
-
condition codes are used to determine equality and the like, these are in cf, zf, sf, of registers
-
cd is unsigned overflow
-
zf if zero
-
sf if negative
-
of if signed overflow
-
can be set with cmpl/cmpq
-
cf if unsigned carry out
-
zf if a = b
-
sf if a - b < 0
-
of if signed carry out
-
-
Or with testl/testq
-
based on a & b
-
-
can be set with setX instructions
-
-
Jumping go based on tthe status of various flags, named in the X of jX
Lecture 13: C Programming, Part 1
-
using gcc
-
pointers are used for indirection
-
ampersand and a variable name give it's address
-
-
is used to deref a pointer
-
-
malloc is used to allocate memory from the heap, returns the address of the allocated chunk
-
takes an argument to specify the size to allocate
-
done at run time
-
Lecture 14: C Programming, Part 2
#include <stdio.h>
#include <stdlib.h>
#define SIZE 10
struct node {
int value;
struct node *next;
};
typedef struct node NODE;
void display_list(int, NODE *, int);
int main(void) {
NODE *head, *current, *prior;
int i;
for (i = 0 ; i < SIZE; i++) {
current = (NODE*) malloc(sizeof(NODE));
current->value = i*i;
current->next = NULL;
if (i == 0)
head = current;
else
prior->next = current;
prior = current;
}
current->next = head;
display_list(SIZE*2, head, SIZE);
}
void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}
-
argc is an argument count
-
arg is an argument vector
#include <stdio.h>
#include <stdlib.h>
struct node {
int value;
struct node *next;
};
typedef struct node NODE;
void display_list(int, NODE *, int);
int main(int argc, char** argv) {
int i, size;
NODE *head, *current, *prior;
if(argc != 2)
{
fprintf(stderr, "usage: %s number\n", argv[0]);
exit(1);
}
size = atoi(argv[1]);
for (i = 0 ; i < size; i++) {
current = (NODE*) malloc(sizeof(NODE));
current->value = i*i;
current->next = NULL;
if (i == 0)
head = current;
else
prior->next = current;
prior = current;
}
current->next = head;
display_list(size*2, head, size);
}
void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}
#include <stdio.h>
#include <stdlib.h>
struct node {
int value;
struct node *next;
};
typedef struct node NODE;
void display_list(int, NODE *, int);
int main(int argc, char** argv) {
int size = 0, temp;
FILE * fptr;
NODE *head, *current, *prior;
if(argc != 2)
{
fprintf(stderr, "usage: %s file\n", argv[0]);
exit(1);
}
if (!(fptr = fopen(argv[1], "r")))
{
fprintf(stderr, "error: %s, file not found\n", argv[1]);
exit(1);
}
while(fscanf(fptr,"%d", &temp) == 1)
{
current = (NODE*) malloc(sizeof(NODE));
current->value = temp;
current->next = NULL;
if (size == 0)
head = current;
else
prior->next = current;
prior = current;
size++;
}
current->next = head;
fclose(fptr);
display_list(size*2, head, size);
}
void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}
#include <stdio.h>
#include <stdlib.h>
struct node {
int value;
struct node *next;
};
typedef struct node NODE;
void display_list(int, NODE *, int);
void free_list(int, NODE *);
int main(int argc, char** argv) {
int size = 0, temp;
FILE * fptr;
NODE *head, *current, *prior;
if(argc != 2)
{
fprintf(stderr, "usage: %s file\n", argv[0]);
exit(1);
}
if (!(fptr = fopen(argv[1], "r")))
{
fprintf(stderr, "error: %s, file not found\n", argv[1]);
exit(1);
}
while(fscanf(fptr,"%d", &temp) == 1)
{
current = (NODE*) malloc(sizeof(NODE));
current->value = temp;
current->next = NULL;
if (size == 0)
head = current;
else
prior->next = current;
prior = current;
size++;
}
current->next = head;
fclose(fptr);
display_list(size*2, head, size);
free_list(size, head);
}
void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}
void free_list(int size, NODE* mynode)
{
NODE *current = mynode, *next;
next = current->next;
current->next = NULL;
current = next;
while (current->next != NULL) {
next = current->next;
printf("Free %d,%p\n", current->value, current->next);
free(current);
current = next;
}
printf("Free %d,%p\n", current-value, current->next);
free(current);
}
Lecture 15: C Programming, Part 3
-
be careful of endian-ness and dangling references
-
worms can run by itself, and can propagate a fully woring versionto other computers
-
virus adds themselves tto other programs. Cannot run independently
-
return oriented programming involves finding gadgets in the code
-
load stack with addresses to these gadgets
-
at the end of each gadget a return is called
Lecture 16: Accessing I/O devices in C
-
memory mapped I/O
-
nios timers: 6 16 bit timer registers
-
provides information about status, and for control mechanism
-
to is a timeout signal set to by the timer when the count is 0, can be reset by writing a 0 to it
-
run is to 1 by the timer if counting
-
start/stop are used to start/suspend the timer by setting the relevant bit to 1
-
managed by simply setting the various registers/addresses to specific values
-
-
Be careful of pointer types and thus type sizes
-
Be careful of sign extension
Lecture 17: Interrupt Handling
-
Single CPU, Multiple Tasks
-
interrupt handling requires all registers and context to be saved
-
exceptions are a transfer of control away from the normal program. These are software exceptions or hardware interrupts
-
threading allows for a simulation of multiple processes
-
threads are used for specific tasks
-
multithreading happens with scheduling techniques
Lecture 18: Hardware IO with the DE1
-
Final Project due the 20th
-
Midterm on the 6th over intel assembly and C programming
-
Use eclipse todeal with the mechanics
-
must open, install interrupt handler, and can then read/write to the device
-
use the HAL for the interface, can use keys 1-3 and ledg2-8
-
must set interrupt mask
Lecture 19: ABIs and Memory Layout
-
an ABI specifies
-
how data is arranged in memory
-
supported types and sizes
-
behavior and structure of the stack
-
function calling conventions
-
Will specify whether numbers are straight binary or two's complement
-
needed for
-
developing embedded systems
-
developing runtime systems
-
explaining runtime behaviors
-
-
-
Memory Layout
-
On linux, everything gets 4 gb, layed out in various ways
-
understanding it will help you find errors
-
Recitation Notes
-
Buliding a vending machine
Lecture 20: The NIOS2 Datapath
-
document avaliable on line
-
3 types of instructions in NIOS
-
R: rsrc1 (31:27), rsrc2 (26:22), rdst(21:17), opx (16:6), op code (5:0)
-
I: rsrc (31:27), rdst (26:22), immediate operand (21:6), op code (5:0)
-
J: address (31:6), 0(5:0)
-
involves dark voodo magic
-
-
-
MIPS also has R, I, J and they are similar, but layed out slightly differently
-
data paths are a tad confusing, but make sense if you examine them carefully
-
this is von Neumann
-
classical model
-
Steps
-
fetch
-
decode
-
read registers (if not call or j-type)
-
execute (most use the ALU, but in different ways)
-
write back to either registers or memory
-
-
Must be completed in one cycle
-
implications:
-
constraint: need separate memories for instructions and data (Harvard architecture)
-
choice: No caches are needed (to simplify design)
-
choice: register file has 2 output ports (to speed up reading 2 operand registers) and one input port
-
choice: programmable alu instead of separate functional units
-
-
-
ALUs must be controlled, in many cases, this is done with a nybble to choose the function
Lecture 21: Memory Hierarchy
-
Overview
-
Storage technologies and trends – pyramidal
-
Locality of reference
-
caching in the memory hierarchy
-
-
Tech and Trends
-
Registers
-
within processors
-
-
RAM
-
traditionally packaged as a chip
-
basic unit is a one-bit cell
-
multiple chips form the memory
-
static ram
-
4 or 6 transistor circuit, keeps value as long as it's powered
-
insensitive to noise and radiation
-
faster and moe expensive than dram
-
-
dynamic ram
-
stored with capacitor and transistor for access
-
value must be refreshed every 10-100ms
-
more sensitive to disturbances
-
slower and cheaper than sram
-
dxw organization
-
access is by row and then column
-
double data-rate synchronous dram (ddr sdram)
-
-
-
Non-volatile
-
types
- rom
-
programmed during production
- prom
-
can be programmed once
- eprom
-
can be bulk erased using uv or xray
- eeprom
-
electronic erase capability
- flash memory
-
eeproms with partial erase capability, wears out after about 100000 erasings
-
used for:
-
firmware programes
-
solid state disks
-
disk caches
-
-
-
Disk Drives
-
very mechanical
-
each platter is double sided, and has a head for each side
-
uses one of several interfaces
-
organized in tracks (concentric circles) and sectors (chords of each track)
-
aligned tracks form a cylinder
-
recording density is bits/inch
-
track density is tracks/inch
-
areal density bits/sq.inch
-
bytes/sector * sectors/track * tracks/surface * surfaces/platter * platters/disk
-
access = seektime + rotationtime + transfertime
-
logical blocks are maped to physical sectors by hardware/firmware
-
-
SSDs
-
are flash, take a long time to write
-
but are very fast for reads
-
generally have wear-leveling logic
-
-
-
busses
-
system bus
-
memory bus
-
io bridge interfaces memory bus and system bus
-
-
speeds have become dramatically faster, while storage hasn't caught up
-
locality of reference – programs tend to use data and instructions with addresses near or equal to those they have used recently
-
temporal – think loops, like a loop counter
-
spatial locality – nearby addresses tend to be referenced close together in time
-
Lecture 22: Pipelining
-
steps
-
fetch (f)
-
decode/read (r)
-
ALU/pc adder (c)
-
memory access (m)
-
write (w)
-
-
single-cycle
-
takes single cycle for frcmw
-
-
multi-cycle
-
takes a cycle for each step of frcmw
-
-
pipelining lets the process for a new instruction start at each pulse, effectively being both single cycle and multicycle simltaneously
-
has hazards: situations that delay or discard an instruction in the pipeline, we say that the pipeline is 'stalled', or that there is a 'bubble' in the pipeline
- structure hazards
-
multiple instructions using the same component at the same time
-
deleteing memory
-
when two instructions must read memory at the same time
-
pipeline stalling, delay instructions by a clock cycle, staggering, simple, but inefficient
-
component duplication for to code and normal memory
-
single memory, duplicated i/o
-
-
due to pc adder
-
when changing pc address due to branching
-
multiple pc adders
-
-
- data hazards
-
data dependency between two instructions
-
when registers are accessed at the same time
-
pipeline stalling, is very, very inefficient
-
operand forwarding, don't delay second instruction, but instead send the values of the relevant registers to the next instruction in the pipeline, much faster
-
software handling, puts nops detween relevant operations
-
-
- control hazard
-
control transfer instruction
-
Lecture 23: Caching in the Memory Hierarchy
-
levels:
-
registers
-
l1 cache
-
l2 cache
-
main memory
-
secondary storage (disks, solid state)
-
remote storage (tapes, distributed file systems, the web)
-
-
caches provide a smaller, faster storage device to act as a staging area for subset of data in a larger, slower device
-
these are used because of locality
-
three types of misses
-
cold (start up, always happens)
-
conflict miss (maps two potential blocks to the same place)
-
capacity miss (when active cache blocks is larger than the cache)
-
-
small and sram based, generally 2 levels
-
entries in caches have valid and tag
-
cache addresses have 3 parts
- block offset
-
4 bits
- set index
-
3 bits
- tag
-
remainder
-
addresses have s, b, tag, and if valid bit is set, you can use that cache line
-
where there is only one line per set, this is a direct mapped cache
-
when no match, then the old line must be replaced
-
on miss, grap from memory, cpu only accesses data from the cache
-
-
set associative, two lines per set
-
allows for two different group of info in the cache
-
Lecture 24: More Caching
-
review of last lecture
-
E-way set assosiactive cashe lets you have E lines per set
-
writes can have problems when multiple copies exist
-
L1, L2, Ln
-
Main memory
-
disk
-
other
-
-
On write hit
- write-through
-
write straight to memory
- write-back
-
defer writing until line relpacement, needs a dirty bit
-
On write-miss
- write-allocate
-
load into cache, update line in cache
- no-write-allocate
-
writes immediately to memory
-
typicall
-
write-through, write-allocate
-
write-back, no-write-allocate (most common)
-
-
metrics
-
miss rate
-
hit time
-
miss penalty
-
-
huge numbers game
-
to write friendly code, make the common case go fast
-
minimize the misses in inner loops
-
repeated references to variables are good
-
stride-1 reference patterns are good
-
Virtual Memory
-
Address Range may be up to a specific number, but only a certain portion is available
-
virtual memory lets you swap pages of memory in and out of slower devices
-
pages are the primary unit, and don't always have to be in memory
-
chunks of a process called pages, and chunks of memory are frames
-
the OS maintains a page table for each process, contains frame location for each page, address consists of a page number and offset within the page
Lecture 25: More Memory Management and Paging
-
taking process and splitting memory up into bits that can be swapped in and out of memory, stored on disk temporarily if necessary
-
this is done with virtual memory, locality and the like and managed by the operating system, sometimes with the help of an mmu
-
page is a virtual address, frame is a physical address
-
each page can go in any free frame
-
the page table is used to manage where each page is located, either in a frame or in the swap space
-
then addresses have a logical address, the first n bits is a page number, the remaining is an offset
-
the page number is replaced with the address start from the page table
-
-
replacement policy
-
deals with the selection of pages to replace and when new pages must be brought in
-
the more elaborate the policy, the greater the hardware and software overhead to implement it
-
policies
- optimal
-
selects the page for which the next reference is the longest (may not be known ahead of time), produces three page faults after the frame allocation has been filled
- least-recently-used
-
the one that hasn't been used most recently is replaced, many faults can be caused. This is very difficult to ipmlement, as most implementations require a great deal of overhead
- first-in-first-out
-
exactly as it sounds
-
-
translation lookaside buffer – a type of cache, caching the page table to make access to memory easier
-
page number and frame number pairs are cached
-
virtual access is checked, if tlb has it, get physical address, if not, check page table and then get physical address, unless non-address value is returned
-
-
page faults:
-
disk to memory
-
update page table
-
update tlb
-
get physical address
-
-
converting physicals to cache blocks:
-
convert VA to PA, check l1 cache, if available, get block, otherwise, update cache
-
in the event of tlb hit, no page fault is possible
-