Notes

Notes, etc. of Samuel Flint

Operating Systems Principles

Lecture 1: Course Overview

  • Course on OS topics, including:

    • organization and structure

    • control, communication, synchronization of concurrent processes

    • processor/job scheduling

    • memory organization and management

    • resource management

    • deadlock avoidance, detection, recovery

    • file system concepts/structure (assuming time)

    • protection/security (assuming time)

    • substantial programming

  • Required textbook, Silberschatz et al.

  • C & C++ Used

  • Homework must run on CSE

  • Homework will have some conceptual problems, but mostly programming

Lecture 2: Introduction

What is an OS?

  • Acts as intermediary between user and hardware

  • Executes user programs and makes solving problems easier

  • Makes the system more convenient to use

  • Enables efficient use

Computer System Structure

  • Hardware: basic resources, CPU, Memory, I/O

  • OS: Controls/coords hardware among applications and users

  • Applications: How resources are used to solve problems

  • Users: People, machines, other computers, etc.

What they Do

  • Depends on point of view

  • Most want ease of use and good performance, don't care about resource utilization

  • Shared machines must keep all users happy

  • Other machines have dedicated resources but rely on shared also

Operating System Definition

  • Resource allocator, manages all resources

  • Control program – prevents errors and improper use

  • No universal definition

  • Kernel runs at all times

Startup

  • CPU loads UEFI/BIOS firmware, which initializes hardware (bootstrap)

  • Finds boot device/bootloader

  • Loads kernel, starts execution

  • Starts init process

Operation

  • I/O & CPU can execute concurrently

  • Done through memory mapping

Interrupts

  • Used to get attention, either from hardware or from software

Structure

  • Multiprogramming/Batch

  • Timeshare/Multitask

Lecture 3: Operations, Continued

  • Keeping user processes from writing where they shouldn't

    • Dual mode allows OS to protect things

    • User mode is for user process

    • kernel mode for privileged

    • Mode bit potentially provided by hardware, some instructions are only available in kernel mode

  • Switching happens before scheduling

    • A watchdog timer is used by the kernel.

    • Only the kernel resets the timer

    • Prevents (potentially) infinite loops and other sorts of take-over

Responsibilities of OS

  • Process management

    • Program is passive, process is active

    • Needs resources to accomplish tasks

    • Termination requires reclaiming resources

    • Requires handling of program counters

    • Most systems have many processes concurrently

    • Activities

      • Creating/deleting user/system processes

      • Suspending/resuming processes

      • Providing process synchronization

      • Providing process communication

      • Handling deadlocks

Memory Management

  • Execution requires instructions are in memory

  • Data needed by the program must be in memory

  • Memory management determines what's in memory & what's not

  • Activities

    • Keeping track of what's used and what's using it

    • Deciding what to move in and out of memory

    • Allocating/deallocating memory

Storage management

  • Uniform view of information storage

  • Abstracts to logical storage unit

  • Device used to control each medium

  • Activities

    • Creating/deleting files/directories

    • Primitives manipulate files/directories

    • mapping files to secondary storage

    • Backup to non-volatile storage media

    • Access authorization

  • Different levels of storage, ordered by speed

    • Registers

    • Cache

    • Main Memory

    • Solid-State

    • Magnetic media

I/O System

  • Hides hardware details from users

  • Activities

    • Memory management of I/O

    • Device/driver interface

    • Hardware Drivers

Protection & Security

  • Protection controls access of processes/users to OS resources

  • Security defends system against attacks

  • Distinguish among users to determine who can do what

    • Potentially many methods

    • Unix model significant

Distributed Environments

  • Collection of many machines networked together

  • Often using TCP/IP

  • Different levels of network

  • A Network Operating System provides features between systems and presents the illusion of a single system

Client-Server model

  • Many systems are servers, responding to requests from clients

  • Many different uses

Peer-to-peer

  • No distinction between client/server

  • Nodes often act as both

  • Must join, registering with some lookup service, or using discovery protocol

Lecture 4: Environments, Continued

Distributed Systems

  • Hide many network features from user

Virtualization

  • Lets an OS run an application inside another OS

  • Many virtual machines share hardware resources

Cloud Computing

  • Composed of traditional OSs, VMs & management tools

  • Requires special security

  • Load balancers spread traffic

History

  • Started for Real-time, moved on from there

Chapter 3: Processes

  • OS interleaves multiple process, allocates resources & supports interprocess communication

  • Processes are programs in execution, instance presently running that can be assigned to and executed on a processor

  • Formally: a unit of activity characterized by the execution of a sequence of instructions, a current state, and an associated set of system resources

  • OS has program code (text section), current context, stack, data section & heap (malloc'd memory)

  • Program is passive, stored on disk. Process is active, also known as a job

  • Stack grows down from top, text from bottom, then data, then heap grows up from that, stack & heap may meet.

  • Stack includes parameters, return address, local variables, etc.

Lecture 5: Processes, Continued

  • Process execution follows a number of states

    • New – being created

    • Running – being executed

    • Waiting – waiting for an event to occur

    • Ready – waiting to be assigned to a processor

    • Terminated – finished

    • Normal 5-state diagram

  • Process elements

    • Identifier

    • State

    • Priority

    • Program Counter

    • Memory pointers

    • context data

    • I/O information

    • Accounting information

  • Struct called Process Control Block

    • Created and managed by OS

    • Keeps state, counter, register contents, scheduling information, memory information, account information, I/O status

  • Process tables used to manage processes

    • Process image, user data, program, stack, process control block included (kinda)

    • Process table size limited, max number of processes that can run.

  • Process images

    • Resides somewhere in main memory; at least a small portion of it must be resident

    • Each page of process image is known

  • Process Creation

    • Assigns PID

    • Allocates space

    • Initializes process control block

    • sets up appropriate links

    • Creates/expands other structs

  • Switching Processes

    • Clock interrupts

    • I/O interrupts

    • Memory/cache/page faults

    • Traps (errors/exceptions)

    • System calls for IPC, I/O, transfer to OS

Lecture 6: Process, Continued

  • Limited capacity for processes and limited size

  • Process image has different attributes

    • user data

    • user programs

    • stack, process control block

  • Process image located somewhere in memory, known by page table

  • CPUs switch from process to process for a number of reasons

    • System Calls

    • Timeslice expiration

    • I/O,

    • Etc

  • Process switched based on a handful of queues

    • Context switching

    • Saves context of process running

    • Overhead, OS doesn't do anything "useful"

    • Time dependent on hardware support

  • In detail

    • Save context

    • Update PCB of process

    • Move PCB to appropriate queue

    • Select next process for execution

    • Update PCB of selected process

    • Restore context

    • Update memory management structures

  • Threads

    • Allow a program to have multiple threads of control, therefore multiple storage for thread details in PCB

  • Process scheduling decides what gets CPU time

    • Objective depends on perspective/goals

    • Maximize CPU use

    • The process scheduler selects from available processes

    • Queues

      • Job – all

      • ready – ready/waiting

      • device – waiting for I/O or event

      • Move between them

  • Scheduling described by queuing diagram

    • Based on what is being waited for

  • Kinds of schedulers

    • Short-term (CPU)

    • Long-term (Job) – what should be brought into ready queue

Lecture 8: Processes, Continued

Suspended processes

  • Two states

  • Schedulers discussed

  • Processes suspended because I/O is slow

  • Blocked/suspended

  • ready/suspended

  • Now have seven states in process state diagram

  • Moving from suspended to main memory is activation

Creating Child Processes

  • The fork system call is used

    • Parent and children share all resources

    • Children share a subset

    • Share no resources

    • fine-grained

  • Can execute in different ways

  • Fork uses copy-on-write (COW) to handle shared memory in most situations

  • Fork returns the child's pid to the parent and 0 to the child (see the sketch below)

  • Process synchronization is possible, but ugly
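
  • A minimal fork/wait sketch in C (output text and variable names are mine, not from lecture):

        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/wait.h>
        #include <unistd.h>

        int main(void) {
            pid_t pid = fork();            /* duplicate the calling process */
            if (pid < 0) {
                perror("fork");            /* fork failed */
                exit(1);
            } else if (pid == 0) {         /* child: fork returned 0 */
                printf("child %d, parent %d\n", (int)getpid(), (int)getppid());
                exit(0);
            } else {                       /* parent: fork returned the child's pid */
                int status;
                waitpid(pid, &status, 0);  /* wait for (and reap) the child */
                printf("child %d exited\n", (int)pid);
            }
            return 0;
        }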

Lecture 9: Process Creation & Fork

  • Children have unique PIDs

  • A parent's pid can be found with getppid()

  • Multiple possible outputs

  • Wait allows for waiting for child process completion

  • Process termination:

    • Children die when parents do (more or less)

    • Parents should wait for children to finish to prevent this, though this doesn't hold in Unix systems

    • abort can be used to kill them

    • If the parent is alive but not waiting, a terminated child is a zombie

    • Termination without waiting gives orphans

  • Ultimate parent is init, kinda

Multiprocess Applications

  • Why?

    • Independent/cooperating

    • Independent don't affect/or are affected by others

    • Cooperating affect/can be affected

      • Useful for sharing information

      • Speeding things up

      • Concurrency

      • convenience

      • etc.

  • Chrome

    • Browser

    • Renderer

      • Sandboxed

    • Plugin

  • Concurrent servers

    • Dispatch/worker

Lecture 10: Multiprocess Applications, continued

  • Can be concurrent or independent

  • IPC needed to communicate between processes

    • shared memory

    • message passing

  • OS needs to create shared memory between processes to send messages w/o OS interference

  • Message Passing uses messages between two processes, messages passed via handful of different methods

    • Can be slower than shared memory

  • Why not Shared Memory

    • No OS involvement, up to applications for synchronization

  • Producer Consumer problem

    • Producer produces info, consumed by consumer

    • Can be unbounded or bounded

    • Bounded-buffer with shared memory – the solution requires keeping track of both in and out

    • Synchronization Problems significant

  • Message Passing

    • Communication link must be established

    • How? How many links

    • Link capacity?

    • Message size? Fixed? Variable?

    • Directionality

    • Direct/Indirect

    • Indirect uses ports, or unique ids

    • Link requires commonality, and may be associated with many processes

    • Mailboxes can be shared

Lecture 11: Synchronization

  • Blocking is synchronous

    • blocking send — sender blocked until received

    • blocking receive — receiver blocks until message available

  • Non-blocking is asynchronous

    • Non-blocking send — sends and continues

    • Non-blocking receive — Receiver receives either valid message or null

  • Many combinations possible, if both are blocking, we have rendezvous

  • Direct communication, send/receive

  • Links are established automatically

    • Associated with one pair of processes

    • Only one link, may be bidirectional

  • With synchronization, producer-consumer is trivial

  • POSIX Shared Memory

    • Create with shm_open

    • Give name, permissions

    • Can also open existing SHM

    • Must truncate to set size

    • Can write to SHM once it's mapped (a sketch follows below)
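
    • A minimal sketch of these calls (the object name "/demo_shm" and size are assumptions; some systems need -lrt when linking):

        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void) {
            const char *name = "/demo_shm";
            const size_t size = 4096;

            /* create (or open) the shared-memory object with rw permissions */
            int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
            if (fd < 0) { perror("shm_open"); return 1; }

            ftruncate(fd, size);                  /* set the object's size */

            /* map it into this process's address space */
            char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }

            strcpy(p, "hello from shared memory");  /* write through the mapping */

            munmap(p, size);
            close(fd);
            shm_unlink(name);                     /* remove the object */
            return 0;
        }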

  • See the Mach kernel for message-passing-only system calls

    • RPC could call other function

  • Pipes also allow communication

    • Ordinary vs named

    • Focus on ordinary

Lecture 12: Interprocess Communication: Pipes

  • Named pipes accessible outside of creating process

  • Ordinary pipes only accessed from process creating it

  • Parent creates pipe to communicate with child process

  • Ordinary pipes are normally uni-directional

    • Producer writes to one end, consumer reads from other

    • Requires parent-child relationship

    • Array of two FDs

        • Created with the pipe system call

      • Fork child, duplicates, etc.

    • Pipe is one-way

    • Close unused ends

    • Messages read only once

    • Must understand how processes work (a sketch follows below)
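
    • A minimal ordinary-pipe sketch following these rules (the message text is mine): the parent writes, the child reads, and each closes its unused end:

        #include <stdio.h>
        #include <string.h>
        #include <sys/wait.h>
        #include <unistd.h>

        int main(void) {
            int fds[2];                    /* fds[0]: read end, fds[1]: write end */
            if (pipe(fds) < 0) { perror("pipe"); return 1; }

            if (fork() == 0) {             /* child: reads from the pipe */
                close(fds[1]);             /* close the unused write end */
                char buf[64];
                ssize_t n = read(fds[0], buf, sizeof buf - 1);
                if (n > 0) { buf[n] = '\0'; printf("child read: %s\n", buf); }
                close(fds[0]);
            } else {                       /* parent: writes to the pipe */
                close(fds[0]);             /* close the unused read end */
                const char *msg = "hello through the pipe";
                write(fds[1], msg, strlen(msg));
                close(fds[1]);             /* reader sees EOF */
                wait(NULL);                /* reap the child */
            }
            return 0;
        }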

  • Named Pipes

    • More powerful – bidirectional

    • no parent-child relationship

    • Several processes can use them concurrently

  • Sockets, RPC, remote message methods

    • Sockets are endpoints over the network

    • Port numbers used to identify application

    • Communication over TCP/UDP

      • Can be unicast/multicast

Lecture 13: Sockets

  • Socket is an endpoint for communication, it's an address & port

    • Ports are numbered, those below 1024 are "well-known"

    • A number of "special addresses" possible, including loopback (127.0.0.1, ::1)

  • Socket types

    • Connection oriented – TCP, constant, three-way handshake

    • Connectionless – UDP, packets can be missed with no issue

  • RPC can be used to abstract procedure calls

    • Uses stubs, marshalling, and an IDL to manage this

  • Must manage endianness – big-endian puts the most significant byte first, little-endian puts the least significant byte first

  • Servers must manage processes, etc

  • htons converts a 16-bit value (such as a port number) to network byte order

  • inet_addr converts a dotted-decimal string to a network-order address

  • dup2 duplicates one file descriptor onto another (e.g., redirecting a socket to stdin/stdout); a minimal client sketch follows below
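
  • A minimal connection-oriented client sketch using htons and inet_addr (the loopback address and port 7 are assumptions for illustration):

        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main(void) {
            /* create a TCP (connection-oriented) socket */
            int sock = socket(AF_INET, SOCK_STREAM, 0);
            if (sock < 0) { perror("socket"); return 1; }

            struct sockaddr_in addr;
            memset(&addr, 0, sizeof addr);
            addr.sin_family = AF_INET;
            addr.sin_port = htons(7);                      /* port in network byte order */
            addr.sin_addr.s_addr = inet_addr("127.0.0.1"); /* loopback address */

            /* the three-way handshake happens inside connect() */
            if (connect(sock, (struct sockaddr *)&addr, sizeof addr) < 0) {
                perror("connect");
                return 1;
            }
            const char *msg = "ping\n";
            write(sock, msg, strlen(msg));
            close(sock);
            return 0;
        }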

Lecture 14: Threads!

  • Many applications multithreaded

  • Threads within application

  • Multiple tasks implemented by separate threads

  • Process creation is heavy, thread creation is light

  • Multithreaded servers, creates new thread for each client

  • Lighter-weight than processes

  • Can simplify code/increase efficiency

  • Can increase responsiveness, make resource sharing more possible, etc.

  • Multicore/multiprocessor programming requires dividing activities, balance, splitting data, etc

    • Parallelism: the system can perform more than one task simultaneously

    • Concurrency: supports more than one task making progress

  • Data/Task parallelism

  • Process control block changes a bit, each thread has own registers, stack

Lecture 15: Threads!

  • Suspending process suspends all threads

  • Termination of process terminates all threads

  • Which thread handles signals?

  • Difference between user threads & kernel threads

  • User threads managed by library

    • Executed entirely in user mode

    • No kernel involvement, which makes thread management cheap

    • Cheaper to context switch

    • Blocking calls block all threads

  • Kernel threads get support in the kernel

    • Kernel schedules threads

    • Blocking calls don't block all threads

  • POSIX, Windows, Java threads

  • Many-to-one

    • Many user threads to one kernel thread

    • One thread blocking blocks all

    • Multiple may not be run in parallel

    • Few things still use this

    • Green threads, portable threads

  • One-to-one

    • User thread maps to kernel thread

    • Creating user threads creates kernel threads

    • Blocking one thread doesn't block the others

    • More concurrency

    • User thread mapped to kernel threads/processes

  • Many-to-many

    • Many threads multiplexed onto smaller/equal number of kernel threads

    • OS can create sufficient number of kernel threads

    • Thread blocks, other threads can be executed, fewer issues

  • Two-level model

    • Similar to many-to-many, user threads bound to particular kernel threads

  • Thread libraries

    • Provides API for creating/managing threads

    • Can be two-level, user only, kernel level

    • Can be asynchronous, or synchronous

    • Synchronous is fork-join

      • Parent waits on child to terminate and join with parent

      • Combines result of calculation

      • Most common

  • Pthreads

    • Either user or kernel level

    • POSIX standard

    • Specification

    • API specifies behaviour

    • Common in Unices

    • pthreads are 1:1 on most Linux systems

    • Sharing on these

      • process instructions

      • data (mostly)

      • open file descriptors

      • signals/signal handlers

      • current working directory

      • user/group

    • Unique

      • thread id

      • registers/stack pointer

      • stack

      • signal mask

      • priority

      • return value

Lecture 16: Threads, continued

  • Pthreads, uses too many pointers

  • Make sure that threads are joined/waited for, so that their work completes (a sketch follows below)
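
  • A minimal fork-join sketch with Pthreads (the worker logic is mine; compile with -pthread): create threads, then join each one so their work completes:

        #include <pthread.h>
        #include <stdio.h>

        /* start routine: takes and returns a void pointer */
        static void *worker(void *arg) {
            int id = *(int *)arg;
            printf("thread %d running\n", id);
            return NULL;
        }

        int main(void) {
            pthread_t tids[4];
            int ids[4];

            for (int i = 0; i < 4; i++) {
                ids[i] = i;
                pthread_create(&tids[i], NULL, worker, &ids[i]);
            }
            /* fork-join: wait for every thread before exiting */
            for (int i = 0; i < 4; i++)
                pthread_join(tids[i], NULL);
            return 0;
        }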

  • Java Threads

    • Managed by the JVM

    • Implemented using OS thread

    • Must implement the Runnable interface or extend the Thread class

  • Implicit threading becoming more common, thread pools, OpenMP

    • Creates number of threads in pool which await work

    • Usually faster to service request with existing thread

    • Allows threads in applications to be bound to size of pool

    • Can be helpful for various mechanics

    • OpenMP: compiler directives for C/C++/Fortran

      • Identifies parallel regions

      • Shared-memory environment

  • Processes vs Threads

    • Threads are easier/faster to create

    • fork duplicates only the calling thread

    • exec replaces all running threads

    • Signal processing happens at thread level

    • Multithreaded Unices allow each thread to determine its handling

    • A signal needs to be handled once, delivered only to the first thread not blocking it

  • Thread Cancelling

    • thread to be cancelled is "target"

    • termination of a thread before finished

    • Asynchronous terminates immediately

    • Deferred allows the thread to periodically check whether it should be cancelled

    • pthread_cancel requests cancellation, but does not perform it immediately

      • The thread must reach a cancellation point (by default); a cleanup handler is invoked to clean up

Lecture 17: Process Synchronization

  • Processes can execute concurrently

  • Can be interrupted at any time

  • Concurrent access to shared data may result in inconsistency

  • Goal is to avoid inconsistency

  • Race conditions can happen – when stuff happens in-memory while something is attempting to use it…

    • A situation where several processes access and manipulate the same data concurrently and the outcome of the execution depends on the particular order in which the accesses take place.

  • Problem: Critical sections

    • Controlling access to critical sections

    • May change from program to program, run to run

    • Enter via an entry section, exit via an exit section, then the remainder section

    • Only one can be in critical section at a time

  • Riddle:

    • Parallelism allows better performance

    • But concurrent use is difficult, protect with locking

    • Critical sections serialize tasks

    • Which eliminates parallelism

  • Mutual Exclusion

  • Progress

  • Bounded waiting – there must be a bound on the number of times other processes can enter their critical sections ahead of a waiting process

Lecture 18: Process Synchronization

  • Question 4, homework: graph with 5 nodes, etc.

  • Solving the Critical Section Problem

    • Requires Mutual Exclusion

    • Progress – If no process is executing in its critical section and some process wishes to enter, the selection of the next process cannot be postponed indefinitely

    • Bounded waiting – a bound must exist on the number of times other processes are allowed to enter critical section after process has made request to enter its own

  • OS Solutions

    • Preemptive – process can be preempted when running in kernel mode – harder to design

    • Non-preemptive – runs until exiting kernel mode, blocking or yielding CPU (free of race conditions)

  • Preemptive is the most common – prevents something from sticking around in kernel mode or refusing to yield

    • non-preemptive kernels have lower performance

  • Special hardware can be used

    • Goal is to protect via locks

    • Uniprocessor can disable interrupts

    • Atomic/non-interruptible hardware-level instructions, test word and set, swap contents of words

      • test-and-set/compare-and-swap can be abstracted

      • Test-and-set reads a boolean, sets it to TRUE, and returns the old value, all executed atomically

        • Used for implementing locks

        • Can be used to implement bounded waiting also

    • Mutex locks are used to manage mutual exclusion

      • Protect a critical section by acquiring, then releasing (a spin-lock sketch follows below)
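
  • A sketch of a spin lock built on test-and-set, using C11 atomic_exchange as a stand-in for the hardware instruction (names are mine):

        #include <stdatomic.h>
        #include <stdbool.h>

        atomic_bool lock_flag = false;  /* false: lock free, true: lock held */

        /* acquire: atomically set the flag to true and get the old value
           (test-and-set semantics); spin while the old value was true */
        void acquire(void) {
            while (atomic_exchange(&lock_flag, true))
                ;                       /* busy wait until the lock was free */
        }

        /* release: clear the flag so another thread can acquire the lock */
        void release(void) {
            atomic_store(&lock_flag, false);
        }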

Lecture 18: Process Synchronization

  • Mutex Lock – spin lock

    • Busy Wait

  • Semaphore is similar

    • Integer value, can be incremented/decremented

    • wait/signal

    • Value must stay non-negative

    • wait spins until S is positive, then decrements it (this version is a busy wait)

    • Signal increases availability

    • Due to Dijkstra

  • Mutual Exclusion – only single thread/process can access shared resources

    • avoids race conditions

    • monitors, semaphores, locks provide this

  • Synchronization – order access to shared resources

    • provided by semaphores, monitors

  • Semaphores

    • counting – can range over unrestricted domain

    • binary semaphore – integer value ranges between 0 and 1

    • Must consider correct initial value

  • Block/wakeup can be used for semaphores without busy waiting

    • block places the invoking process on the semaphore's waiting queue

    • wakeup removes process and places it onto ready queue

    • Surprisingly simple

  • Semaphores are more complex than mutex locks, but more flexible

    • But deadlock & starvation can happen

    • Deadlock is when processes wait indefinitely for an event that can be caused only by one of the waiting processes

    • Starvation: processes may never be removed from semaphore queue it's suspended in

Lecture 19: Semaphores, Continued

  • Mutex Lock uses some assembly

  • Deadlock: P1 waits S, waits Q; P2 waits Q, waits S

  • Starvation causes indefinite blocking, where processes may never be removed from semaphore queues they're in

  • Priority inversion occurs when a low-priority process holds a lock needed by higher-priority processes

    • Solved by priority inheritance

  • Many well known & classical problems in synchronization

    • All of which may be modeled using semaphores

    • Bounded buffer problem

    • Must keep track of buffer size

    • Needs a mutex at 1, semaphore full at 0

    • semaphore empty at N

Lecture 20: Classical Synchronization Problems

  • Midterm on Friday

    • Available from 9:30 to 21:30

  • More complicated than simple producer-consumer problem

  • Bounded Buffer (a semaphore sketch follows this list)

    • $n$ buffers

    • mutex semaphore, initialized to 1

    • full semaphore, initialized to 0

    • empty semaphore, initialized to $n$

    • Special cases: the consumer wants to consume with no items; the producer wants to produce with no space

    • Producer

      • Wait on empty

      • Wait for mutex (make sure no other process is modifying buffer)

      • Add item

      • Signal mutex

      • Signal full

    • Consumer

      • Wait on full

      • Wait on mutex

      • Consume item

      • Signal mutex

      • Signal empty
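
    • A sketch of this solution with POSIX semaphores (the circular-buffer layout and names are mine):

        #include <semaphore.h>

        #define N 10                    /* buffer capacity */

        int buffer[N];
        int in = 0, out = 0;            /* next write / next read slots */

        sem_t empty_slots;              /* sem_init(&empty_slots, 0, N) */
        sem_t full_slots;               /* sem_init(&full_slots, 0, 0)  */
        sem_t mutex;                    /* sem_init(&mutex, 0, 1)       */

        void producer_put(int item) {
            sem_wait(&empty_slots);     /* wait on empty */
            sem_wait(&mutex);           /* exclusive access to the buffer */
            buffer[in] = item;
            in = (in + 1) % N;
            sem_post(&mutex);           /* signal mutex */
            sem_post(&full_slots);      /* signal full */
        }

        int consumer_get(void) {
            sem_wait(&full_slots);      /* wait on full */
            sem_wait(&mutex);
            int item = buffer[out];
            out = (out + 1) % N;
            sem_post(&mutex);
            sem_post(&empty_slots);     /* signal empty */
            return item;
        }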

  • Reader/Writer problem (a semaphore sketch follows this list)

    • Data shared among many processes

    • Readers only read the data set; writers can read and write

    • Allow multiple readers to read concurrently

    • But only one writer can write

    • Shared data

      • Semaphore rw_mutex, 1

      • Semaphore mutex, 1

      • Int read_count, 0

    • Special Cases:

    • Writer:

      • Wait on rw_mutex

      • Write

      • Signal on rw_mutex

    • Reader:

      • Wait on mutex

      • read count incremented

        • If 1, wait on rw_mutex

      • signal mutex

      • read

      • wait on mutex

      • read_count decremented

        • if 0, signal rw_mutex

      • Signal mutex

    • Starvation can happen

    • Variations

      • No reader kept waiting unless writer has permission

      • Once writer is ready, performs write ASAP

      • Some kernels provide reader-writer locks
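
    • A sketch of the first readers-writers solution above with POSIX semaphores (function names are mine):

        #include <semaphore.h>

        sem_t rw_mutex;          /* sem_init(&rw_mutex, 0, 1): writer exclusion */
        sem_t mutex;             /* sem_init(&mutex, 0, 1): protects read_count */
        int read_count = 0;      /* number of readers currently reading */

        void writer(void) {
            sem_wait(&rw_mutex); /* exclusive: no readers or other writers */
            /* ... write the shared data ... */
            sem_post(&rw_mutex);
        }

        void reader(void) {
            sem_wait(&mutex);
            if (++read_count == 1)      /* first reader locks out writers */
                sem_wait(&rw_mutex);
            sem_post(&mutex);

            /* ... read the shared data ... */

            sem_wait(&mutex);
            if (--read_count == 0)      /* last reader lets writers back in */
                sem_post(&rw_mutex);
            sem_post(&mutex);
        }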

  • Dining Philosophers Problem

    • Semaphore array, "chopstick"

    • Philosophers alternate between eating and thinking

    • Philosophers don't interact; each tries to pick up the two adjacent chopsticks and needs both to eat (a naive sketch follows below)
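
    • A sketch of the naive semaphore solution (names are mine); note that if every philosopher grabs the left chopstick first, this deadlocks:

        #include <semaphore.h>

        #define N 5
        sem_t chopstick[N];      /* each initialized to 1 via sem_init */

        void philosopher(int i) {
            for (;;) {
                /* think */
                sem_wait(&chopstick[i]);            /* pick up left  */
                sem_wait(&chopstick[(i + 1) % N]);  /* pick up right */
                /* eat */
                sem_post(&chopstick[(i + 1) % N]);
                sem_post(&chopstick[i]);
            }
        }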

Lecture 21: Semaphores & Monitors!

  • Exam up until semaphores, many forms of question

    • No expectation that code will run on CSE

  • Semaphores present problems

    • Incorrect usage

    • Omission of wait or signal or both

    • Deadlock and starvation are possible

    • Incorrect usage is biggest problem with semaphores

    • Solving usage problem is the issue

  • Monitors

    • High-level, helps with process synchronization

    • abstract

    • only one process may be in monitor at a time

    • Can't model some synchronization schemes

    • Variables can't be accessed outside of the monitor

  • Access to monitor happens through entry queue

  • Uses conditions $x$, $y$

    • Two operations allowed, wait, and signal

      • A wait queue exists; a waiting process is suspended until a relevant signal

    • Condition variables only have a wait queue, no counter

    • Condition variables aren't sticky; if no one is waiting when signal is called, the signal isn't saved

    • Must check for condition on your own

    • Monitors then use condition variables

    • Different kinds of condition variables

Lecture 23: Monitors, Continued

  • Recall that not all sync problems can be handled with monitors

  • Remember that only one thread may run in the monitor at a time

  • Signal and Wait discipline

    • Signal, then wait on entry queue

  • Signal and Continue discipline

    • Signal, then keep going

    • Requires a mutex, a next, and a next count

    • Procedures acquire the mutex on entry; on exit, signal a waiting thread on next if any, otherwise give up the lock

    • Signal requires processes waiting

  • Consumer Producer implementation with monitor is much simpler

Lecture 24: Monitors & Synchronization, Continued

  • Consider monitors with signal & continue

  • Make sure to regularly check condition waiting on

  • Need a pthread mutex and 2 pthread condition variables

    • Locking done manually, condition variables wait/signal

        #include <pthread.h>

        class ProducerConsumer {
          static const int N = 10;  // buffer capacity
          int nfull = 0;            // number of filled slots
          pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
          pthread_cond_t has_empty = PTHREAD_COND_INITIALIZER;
          pthread_cond_t has_full = PTHREAD_COND_INITIALIZER;
         public:
          void producer() {
            pthread_mutex_lock(&m);
            while (nfull == N)      // re-check: signal-and-continue
              pthread_cond_wait(&has_empty, &m);
            // fill a slot
            nfull++;
            pthread_cond_signal(&has_full);
            pthread_mutex_unlock(&m);
          }
          void consumer() {
            pthread_mutex_lock(&m);
            while (nfull == 0)
              pthread_cond_wait(&has_full, &m);
            // empty a slot
            nfull--;
            pthread_cond_signal(&has_empty);
            pthread_mutex_unlock(&m);
          }
        };
  • The Linux kernel is preemptive as of version 2.6

    • Semaphores, atomic integers, spin locks, reader-writer versions provided

    • On a single CPU, spin locks are replaced by disabling preemption

  • Pthreads – OS independent

    • Provides mutex locks, condition variables

    • Can model monitor easily

  • Semaphores defined as globals

Lecture 25: CPU Scheduling

  • Threads are what is scheduled, processes/kernel threads are about the same

  • Core is unit of CPU

  • Non-preemptive, once in running, continue until terminates, blocks for I/O, yields

  • Preemptive, Running process may be preempted and moved to ready, allows for better service by preventing monopolization of processor

  • Goal is to maximize CPU utilization

  • CPU–I/O burst cycle: a cycle of CPU execution and I/O wait

  • CPU burst followed by I/O burst

  • CPU burst distribution is concern

  • Some processes are CPU bound, others are I/O bound

  • Concurrency and context switching is a significant part of the issue

  • Short-term selects from processes in ready queue, allocates to one of them; Queue may be ordered in various ways

    1. Switch from running to waiting

    2. Running to ready

    3. Waiting to ready

    4. Termination

  • 1 & 4 are non-preemptive

  • Dispatcher is what handles switching contexts, modes, etc

  • Scheduling Criteria

    • Response Time; Turnaround time – User-oriented

    • CPU utilization; Throughput; Waiting time – System-oriented

Lecture 26: CPU Scheduling, Continued

  • Optimization on various criteria

  • First-Come, First-Served (FCFS) scheduling is possible; non-preemptive, etc.

Lecture 27: CPU Scheduling, Continued

  • Priority scheduling – priority queue used

    • Aging of priorities used to combat starvation

  • Round Robin

Lecture 29: Deadlocks

  • Due to concurrent execution of processes

  • Permanent blocking of processes that compete for system resources or to communicate with each other

  • No efficient solution

  • Conflicting needs for resources

  • Requires protection

  • Understand what causes deadlock – blocking

  • System model

    • System has resources of various types

    • Each resource has a number of instances

    • Each process utilizes a resource by requesting, using, releasing

  • Deadlock then arises when

    • Mutual exclusion exists

    • Hold and wait is the model

    • Preemption does not happen

    • Circular wait happens

  • Allocation Graph

    • $V$ is processes $P$ and resources $R$

    • Request edges directed from $P$ to $R$

    • Assignment edges, directed from $R$ to $P$

Lecture 30: Deadlocks, Continued

  • Resource allocation graph can be used to detect/recover from deadlock

    • By transforming to different kind of graph

  • Graphs can have cycle but no deadlock, but depends on execution order

  • No cycles means no deadlock

  • If cycle, and only one instance, deadlock

  • If cycle, many instances, possible deadlock

  • Ensure system will never enter deadlock with prevention/avoidance

  • Or, just ignore deadlock, and pretend it doesn't occur

  • Mutual exclusion must hold for non-sharables

  • Hold-and-wait prevention guarantees that whenever a process requests a resource, it holds no others

Lecture 31: Deadlock Avoidance

  • Simplest method is processes declare max number needed

    • Dynamic examination of resource allocation state to ensure no circular wait

    • Resource allocation state defined by number of available, allocated, max demands

    • A state is safe if there exists a sequence of all processes such that each process's requests can be satisfied by the currently available resources plus the resources held by all earlier processes in the sequence

    • Non-safe state does not guarantee deadlock

    • Single instance of resource – uses resource allocation graph

      • A cycle means the state is unsafe

    • Multiple instance – use banker's algorithm

      • Must claim max use a priori

      • Request may require waiting

      • All resources must be returned w/in finite time

      • n processes; m resource types

      • An $n \times m$ Max matrix, an Allocation matrix, an Available vector, and a Need matrix

      • Work and Finish are vectors used by the safety algorithm

        • Work starts as a copy of Available

        • Finish is false for each process initially

Lecture 32: Deadlock Avoidance, Continued

  • Banker's Algorithm (a safety-check sketch follows this list):

    • Check if request is less than or equal to need, otherwise, error!

    • Check if request less than or equal to available, else wait

    • Allocate resources by modifying state

      • available -= request

      • allocation += request

      • need -= request

      • If this is safe, allocate, if unsafe, wait, and resource allocation state is restored

    • Need is max - allocation
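
  • A sketch of the safety check at the core of the algorithm (array sizes are illustrative; the matrices follow the definitions above):

        #include <stdbool.h>

        #define NPROC 5   /* n processes      */
        #define NRES  3   /* m resource types */

        /* returns true if the state (Available, Allocation, Need) is safe */
        bool is_safe(int available[NRES],
                     int allocation[NPROC][NRES],
                     int need[NPROC][NRES]) {
            int work[NRES];
            bool finish[NPROC] = { false };

            for (int j = 0; j < NRES; j++)
                work[j] = available[j];             /* Work = Available */

            for (;;) {
                bool advanced = false;
                for (int i = 0; i < NPROC; i++) {
                    if (finish[i]) continue;
                    bool fits = true;               /* Need_i <= Work? */
                    for (int j = 0; j < NRES; j++)
                        if (need[i][j] > work[j]) { fits = false; break; }
                    if (fits) {                     /* P_i can finish ... */
                        for (int j = 0; j < NRES; j++)
                            work[j] += allocation[i][j];  /* ... and release */
                        finish[i] = true;
                        advanced = true;
                    }
                }
                if (!advanced) break;               /* no process can proceed */
            }
            for (int i = 0; i < NPROC; i++)
                if (!finish[i]) return false;       /* someone never finishes */
            return true;                            /* a safe sequence exists */
        }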

  • Recovery scheme must be used to go from deadlock to safe-state.

Lecture 34: Main Memory

  • CPU -> main memory -> disk

  • CPU interacts with processes residing in main memory

  • Main Memory

    • Program brought from disk into memory, placed in a process

    • Main memory and registers are what CPU can access directly

    • Memory is addresses/read requests/write requests

    • Register access takes a single CPU cycle

    • Main memory access can take longer

    • Cache is between main memory and CPU

    • Protection of memory is required

  • Load address, Add, Store, etc.

  • Base and limit registers used to define physical address space

    • The CPU checks every access generated in user mode to make sure it's within the allowed range (a sketch follows below)
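
    • A sketch of that hardware check (names are mine; overflow is ignored for simplicity):

        #include <stdbool.h>
        #include <stdint.h>

        /* a user-mode access is legal iff base <= address < base + limit;
           anything else traps to the OS as an addressing error */
        bool access_ok(uint32_t address, uint32_t base, uint32_t limit) {
            return address >= base && address < base + limit;
        }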

  • Addresses are bound, represented in different ways through the execution cycle

    • Compiled code addresses bind to relocatable

    • Linker/loader binds to absolutes

    • bindings map address spaces to other address spaces

    • Compile time, if known a priori it can be generated

    • Load time, code must be relocatable

    • Execution time – requires hardware support, process can be moved during execution

  • Difference between logical & physical addresses

    • Logical addresses are generally used in user code

Lecture 35: Main Memory

  • Logical vs Physical Space

    • Logical is in the CPU, virtual address

    • Same in compile time/load-time schemes

    • Virtual/physical is different in execution time binding

    • logical address space is all logical addresses generated by a program

    • physical address is real address that MMU works with

  • MMU

    • Hardware that maps virtual to physical addresses

    • Many methods

    • Simple scheme, uses relocation register, added to every address generated before sent to memory (MS DOS)

      • Base & Limit registers

    • User program deals with logical addresses, execution time binding when reference is made to location in memory

    • Logical address bound to physical

    • Dynamic Relocation with register

      • Not loaded until called

      • Better memory space utilization (unused aren't loaded)

      • All routines in relocatable load format

      • Large amounts of code handling many infrequent special cases

      • No special OS support needed; implemented through program design

      • OS can help by providing dynamic loading

  • Methods

    • Fixed partitioning

    • Dynamic partitioning

    • Simple Segmentation

    • Simple Paging

    • VM Paging

  • Fixed Partitioning

    • Equal sized partitions used

    • Each process assigned unique partitions

    • Unused space within a partition is internal fragmentation, poor utilization of memory

    • Swapping

      • When memory is full, processes can be swapped out of memory to a backing store, swapped back into memory for continued execution

      • Total memory used can thus exceed physical memory capacity

      • Backing store, large, fast disk

      • Major part of swap time is transfer, transfer time proportional to amount of memory swapped

      • Context switch time including swapping

        • Next process isn't in memory, swap out, and swap in, time can be high

        • Depends on disk speed, bus width, etc.

        • Reduce if we reduce size of memory swapped

          • Requires knowing how much memory is actually being used to improve

    • Internal fragmentation, long context switch times, but simple

  • Dynamic Partitioning

    • Variable number/length of partitions

    • Allocated as memory is required, no internal fragmentation happens

    • However, external fragmentation is possible, as there can be holes between partitions

Lecture 36: Main Memory

  • Internal vs external fragmentation

  • With external fragmentation, for every $n$ blocks allocated roughly $0.5 n$ blocks are lost to holes (the 50-percent rule), so $0.5n / 1.5n = 1/3$ of memory is unused/fragmented

  • External frag can be reduced by compaction

    • Shuffle memory to place free memory together

    • Possible only if relocation is dynamic

    • Backing store has same potential issue

    • Free Block allocation is part of this

      • Best-fit chooses the block closest in size; a fragment of the smallest size is left over; compaction is needed more frequently

      • Worst-fit: chooses largest block

      • First-fit: scans from beginning, chooses first large enough, very fast

      • First-fit & Best-fit have similar performance, first-fit most frequently adopted

  • Simple Segmentation

    • Segmentation supports the way programmers think about programs

      • Main program, objects, arrays, stacks, variables

      • The stack, libraries, etc

      • Elements in segment are offsets within it

    • Segments do not have to be the same length, but there is a maximum segment length

    • Addressing is segment number and offset

    • Segments aren't equal, similar to dynamic partitioning

    • Logical address consists of segment number and offset

    • The segment table maps logical addresses to physical memory: each entry holds a base and a limit

      • Segment table base register points to location in memory

      • segment table length register indicates number of segments used by program

    • Some hardware required for this

    • This is the origin of the "segmentation fault"

    • Segmentation permits non-contiguous address space

Lecture 37: Memory Management, Simple Paging

  • Physical address space divided into frames, which are fixed-size blocks, power of two size

  • Divides logical memory into blocks of same size, pages

  • Free frames must be tracked

  • Program of size $N$ pages, needs $N$ free frames

  • Page table translates logical to physical addresses

  • Backing store split into pages

  • Internal fragmentation still exists

  • Addresses include page numbers (index into page table, based on number of pages)

    • Page offset also used, includes offset into page

  • Memory protection includes valid/invalid bits; can cause traps to the kernel (a translation sketch follows below)
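
  • A sketch of the page-number/offset translation above, assuming 4 KB pages and a hypothetical in-memory page table:

        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_SIZE 4096u        /* 4 KB pages, so 12 offset bits */
        #define OFFSET_BITS 12u

        int main(void) {
            uint32_t logical = 0x1234u;                   /* example address */
            uint32_t page    = logical >> OFFSET_BITS;    /* page number */
            uint32_t offset  = logical & (PAGE_SIZE - 1); /* offset in page */

            /* hypothetical page table: entry i holds page i's frame number */
            uint32_t page_table[16] = { 5, 9, 2, 7 };
            uint32_t frame    = page_table[page];
            uint32_t physical = (frame << OFFSET_BITS) | offset;

            printf("page=%u offset=%u -> physical=0x%x\n",
                   (unsigned)page, (unsigned)offset, (unsigned)physical);
            return 0;
        }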

Lecture 38: Memory Management, Paging, Continued

  • Bits as before

  • Implementation

    • Some OS put in PCB

    • Could also use dedicated registers

    • Mostly, PT is in main memory

    • PTBR points to table, PTLR indicates size

    • Every access requires two accesses, one for table, one for data

    • Solved if Associative memory/TLB used, part of instruction pipeline

      • A kind of parallel search: if page number $p$ is in the TLB, get the frame number directly; otherwise fetch it from the page table in memory

    • TLB access is very quick; main memory takes longer

Lecture 39: Virtual Memory

  • Code must be in memory, but the entire program isn't always used

  • Entire program is not needed at the same time

  • The ability to execute partially loaded programs is useful, requires less I/O

  • Increases multiprogramming

  • Implemented using demand paging

  • See notes from 351

  • Stack starts at max address, grows down, heap grows up

  • Sparse address space with holes for growth

  • Libraries shared by mapping into address space

  • Shared memory maps pages into the virtual address space

  • Pages shared by fork

  • Demand paging brings in only the pages needed

  • paging w/ swapping

  • Supports more users, faster response

  • Swapping and paging are different

  • Lazy swapper never swaps unless it's needed

  • A swapper that deals with pages is a pager

  • MMU required for demand paging

  • The MMU detects an access to a non-resident page so it can be loaded from storage, without changing program behavior or code

  • Validity bit used to indicate memory residence

  • Page fault trapped by OS, OS finds page and makes resident

  • Which page is swapped out is determined by a replacement policy

Lecture 40: Virtual Memory, Continued

  • Fetch policies determine when a page is brought into memory

  • Demand paging brings in a page when a reference is made – more page faults when started

  • Pre-paging brings in more than needed

  • More efficient to bring in contiguous on disk

  • Demand paging has 12 stages in the worst case:

    1. Trap to the OS

    2. Save registers and process state

    3. Determine the interrupt was a page fault

    4. Check that the reference was legal and locate the page on disk

    5. Issue a disk read into a free frame

    6. While waiting, allocate the CPU to another process

    7. Receive the interrupt from the disk subsystem (I/O complete)

    8. Save the registers and state of the other process

    9. Determine the interrupt came from the disk

    10. Correct the page table (the page is now resident)

    11. Wait for the CPU to be allocated to this process again

    12. Restore registers and state, and resume the interrupted instruction

  • Memory Access time

  • Page Fault service time

  • Effective access time: $(1 - p) \times MAT + p \times$ (page-fault service time), where $p$ is the page-fault rate and MAT is the memory access time
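
  • A worked example with assumed numbers (not from lecture): if MAT $= 200$ ns, page-fault service time $= 8$ ms, and $p = 0.001$, the effective access time is $0.999 \times 200 + 0.001 \times 8{,}000{,}000 \approx 8{,}200$ ns, so one fault per thousand accesses slows memory by a factor of about 40.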

  • Understand Copy on Write

  • No Free frame

    • Use page replacement, terminate/swap/replace? Performance is major issue

  • Over-allocation of memory is prevented by modifying the page-fault service routine to include replacement

  • A dirty bit is used to keep track of modifications, so only modified pages are written back to disk

  • Replacement

    • frame-allocation algorithm – how many to give, which to replace

    • Page replacement

      • Wants the lowest page-fault rate on both first access and re-access

      • Run on particular string of references, compute number of faults

      • String is page numbers, not addresses

      • Results depend on number of frames available

      • First In First Out also an option

      • Belady's anomaly — more frames can cause more faults

      • Optimal algorithm replace page not used for longest period of time

      • LRU uses least-recently used – closest to optimal

        • Exact LRU requires hardware support; instead, approximate with a reference bit: replace unreferenced pages and reset all reference bits

      • Review the enhanced second-chance algorithm