Today's plan
- distributed computing
- amoeba
Distributed Computing: Concepts
- Do we really care where our computer lives?
- no -- as long as we can access all our files and run all our programs
- yes -- for security
- sometimes -- for performance
- benefits of distributed computing: the right number of computers
at the right time
- challenges of distributed computing:
- providing fast access to computing power
- providing fast access to data (files)
- providing access to I/O devices
- providing secure access and computing
- the computing environment may be as important as the
computing itself -- fortunately, Unix-like systems have supported
distributed environments well, e.g. remote window clients via the
X Window System (1980s), secure logins via SSH (mid-1990s)
- Windows environments have supported a different subset
of distributed access, mostly access to data
Distributed Computing: Goals and Strategies
- use a collection of computers as a single computer
- sometimes, designed to enhance reliability (if one computer
is down, you can still use the other computers)
- process migration for faster performance or for access
to specialized devices (running forked processes in parallel
can make programs faster, e.g. in make)
- remote procedure call to execute code that must be executed
on a given machine
- paging across a network (to another system's memory),
may be faster than paging to disk
- a distributed file system is usually included with any
kind of distributed computer, so all processors "see" the
same file system
Architectures
- multiprocessor (MP): closely coupled processors with a single shared
memory (but separate caches) and devices shared among all CPUs,
now often handled by traditional operating systems (Linux, Windows)
- Network of Workstations (NoW), general-purpose workstations that
agree to cooperate to get work done
- Beowulf cluster, similar to a NoW but processors not intended
for general-purpose use (maybe no graphics card, maybe no local disk),
interconnected for high performance, and often in larger numbers than
a typical NoW
- grid computing: workstations, scattered across an internet, that agree
to perform requested work while idle
Process Migration
- goals:
- CPU load balancing
- making available more RAM
- access to different I/O devices
- implementation:
- must transfer all segments (this is relatively easy with virtual memory)
- must transfer the process table entry for the process
- must transfer all open file descriptors (this is easier with
a distributed file system)
- as an alternative to transferring all the memory all at once,
can use demand paging to transfer each page of memory as needed
- management:
- the load on a CPU can vary quickly, as can the memory occupancy
- so results that are "good enough" might be just as useful as results
that are "optimal"
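The demand-paging alternative listed under implementation can be sketched as follows. This is a minimal illustration, not taken from any real system: the class names SourceMemory and MigratedMemory are hypothetical, and a method call stands in for the network transfer of one page.

```python
class SourceMemory:
    """Memory image left behind on the source machine (hypothetical)."""

    def __init__(self, pages):
        self.pages = dict(pages)   # page number -> page contents
        self.fetches = 0           # number of simulated network transfers

    def fetch(self, page_no):
        self.fetches += 1
        return self.pages[page_no]


class MigratedMemory:
    """Memory of the process on the destination; pages arrive on first use."""

    def __init__(self, source):
        self.source = source
        self.resident = {}         # pages already copied to the destination

    def read(self, page_no):
        if page_no not in self.resident:          # simulated page fault
            self.resident[page_no] = self.source.fetch(page_no)
        return self.resident[page_no]


source = SourceMemory({0: b"text", 1: b"data", 2: b"stack"})
memory = MigratedMemory(source)
memory.read(1)
memory.read(1)   # served locally, no second transfer
memory.read(2)
```

Page 0 is never touched here, so it is never transferred. The cost of this design is that the source machine must stay up and keep the pages until the process has fetched them or exited.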
Inter-Process Communication
- suppose a migrated process has to send data on a pipe to a process
that has not migrated
- the data will have to cross the network, which it would not have had
to do before the migration
- the process that did not migrate may be a driver for a device
(equivalent to a task in Minix)
Remote Procedure Call
- suppose a migrated process has to perform a system-specific call
- the process calls a stub instead:
- this client stub marshals (copies and puts into a linearized form)
the arguments, then
- sends them to a server stub
- which unmarshals them, and
- makes the call, then
- marshals any results,
- sends them back to the client stub, which
- unmarshals the results, and
- returns to the caller, which is unaware that the call required networking
- Sun RPC was among the first widely used remote procedure call systems
- a later, object-oriented system is CORBA, the Common Object Request Broker
Architecture (in CORBA, the client stub is simply called a stub, and the
server stub is called a skeleton)
- most or all of the stub generation can be automated and can be
made largely programming-language independent
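The stub sequence above can be sketched in a few lines. This is an illustration only: JSON stands in for the marshaling format, a direct function call stands in for the network, and device_status is a hypothetical system-specific procedure. Real systems such as Sun RPC generate the stubs from an interface description.

```python
import json


def device_status(unit):
    """Hypothetical procedure that must run on the server machine."""
    return {"unit": unit, "ready": True}


PROCEDURES = {"device_status": device_status}


def server_stub(request_bytes):
    """Unmarshal the request, make the call, and marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def client_stub(proc, *args, transport=server_stub):
    """Marshal the arguments, send them, and unmarshal the reply."""
    request = {"proc": proc, "args": list(args)}
    request_bytes = json.dumps(request).encode("utf-8")
    reply_bytes = transport(request_bytes)   # stands in for a network round trip
    return json.loads(reply_bytes.decode("utf-8"))["result"]


# The caller sees an ordinary function call.
status = client_stub("device_status", 3)
```

Only bytes cross the transport boundary, so replacing the transport argument with a function that uses a socket would not change the caller.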
Distributed System Management
- having to individually manage each machine might be hard
- so it might be better to have a system view, e.g. a global ps
- also, distributed tools to monitor memory and I/O devices
- must be able to execute a command on all CPUs
- multiple views: per-system view, and aggregate view
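A "global ps" with both views might look like the sketch below. The helper names (run_over_ssh, run_everywhere, aggregate) are hypothetical, and the ssh runner assumes that non-interactive (key-based) logins are already set up on every host.

```python
import subprocess


def run_over_ssh(host, command):
    """Run a command on one host via ssh and return its standard output."""
    completed = subprocess.run(
        ["ssh", host, *command], capture_output=True, text=True, check=True
    )
    return completed.stdout


def run_everywhere(hosts, command, runner=run_over_ssh):
    """Per-system view: map each host name to the command's output."""
    return {host: runner(host, command) for host in hosts}


def aggregate(per_host):
    """Aggregate view: one list of (host, line) pairs across all hosts."""
    return [
        (host, line)
        for host, output in per_host.items()
        for line in output.splitlines()
    ]
```

For example, aggregate(run_everywhere(hosts, ["ps", "-e", "-o", "comm="])) would list the command name of every process on every host in hosts. The runner parameter makes it possible to substitute another transport or a test double.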
A Simple Distributed System
- use ssh to start remote processes
- use NFS (or AFS, Coda, etc.) to share data
- use the X Window System to provide remote display
- not location-transparent, not automatic, does not work as
a single system
Amoeba
- see
http://www.cs.vu.nl/pub/amoeba/amoeba.html (see especially the description in PDF
format).
- main goal was to have a collection of computers behave as
a single system
- Amoeba supports automatic process migration (for processes
created using fork) as well as threads within processes
- the entire Amoeba system is used as a single computer by
any number of users
- each user has a display system, which might be a workstation
or an X terminal
Amoeba Implementation
- microkernel implements processes, communications, objects, device
I/O, and memory management
- servers implement file service (e.g. the Bullet file server, which
serves file contents but knows nothing about names)
- the Fast Local Internet Protocol (FLIP) provides efficient
local communication and supports location-independence
- RPC is supported via the Amoeba Interface Language (AIL)
- group communication supports common parallel processing constructs,
e.g. barrier synchronization, where processes in a computation
block until all processes have reached the barrier
- no swapping or paging
- sound familiar?
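Barrier synchronization, mentioned above, can be illustrated with Python's standard threading.Barrier. This is only an analogy: threads in one process stand in for the processes of a parallel computation, and nothing here uses Amoeba's group communication.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
events = []                      # (phase, worker_id) in the order recorded
events_lock = threading.Lock()


def worker(worker_id):
    with events_lock:
        events.append(("before", worker_id))
    barrier.wait()               # block until all workers have arrived
    with events_lock:
        events.append(("after", worker_id))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

No worker records "after" until every worker has recorded "before", because no call to barrier.wait() returns until all NUM_WORKERS threads have made it.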
Amoeba Concepts
- objects are abstract data types
- for example, a directory object supports operations such as
create a name/value pair, and look up a value given a name
- each object is managed by a server process (a server may manage
multiple objects) to which messages are sent via RPC
- each object has a capability, which is assigned when
the object is created -- a kind of ticket that names the object
and is protected by a cryptographic check field, so it is hard to forge
- the capability is included with every request to a server
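The idea of a capability that is checked on every request can be sketched as below. This is an assumption-laden illustration, not Amoeba's actual capability layout or check algorithm: the field names, the READ and WRITE rights bits, and the use of HMAC-SHA256 with a per-server secret are all choices made for this example.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

READ, WRITE = 0b01, 0b10         # hypothetical rights bits


@dataclass(frozen=True)
class Capability:
    server_port: str             # which server manages the object
    object_id: int               # which object on that server
    rights: int                  # permitted operations
    check: bytes                 # protects the other fields from tampering


class ObjectServer:
    def __init__(self, port):
        self.port = port
        self._secret = secrets.token_bytes(16)   # known only to the server
        self._objects = {}

    def _check(self, object_id, rights):
        message = f"{object_id}:{rights}".encode()
        return hmac.new(self._secret, message, hashlib.sha256).digest()

    def create(self, value, rights=READ | WRITE):
        object_id = len(self._objects)
        self._objects[object_id] = value
        return Capability(self.port, object_id, rights, self._check(object_id, rights))

    def read(self, cap):
        expected = self._check(cap.object_id, cap.rights)
        if not hmac.compare_digest(expected, cap.check):
            raise PermissionError("forged or altered capability")
        if not cap.rights & READ:
            raise PermissionError("capability lacks the read right")
        return self._objects[cap.object_id]
```

A client that changes the rights or object number of a capability cannot recompute a matching check field without the server's secret, so the server rejects the request.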
Amoeba Application Services
- the Bullet file server only stores contents, not names
(but can be used, e.g. to store directories, whose contents are names)
- no blocks -- all files stored contiguously on disk, and loaded
or written (or sent across the network) as a unit
- files are identified by a capability
- the directory server stores pairs of (name, capability), and can
therefore be used to index things other than data on disk
- each directory entry may actually have multiple capabilities,
which allows for replicated services and for selecting the "nearest",
"best", or "available" object
- Orca programming language makes parallel programming more explicit
- POSIX compatibility library (Ajax)
- TCP/IP server provides standard networking
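The split between a file server that stores only contents and a directory server that stores (name, capability) pairs can be sketched as follows. The classes BulletStore and Directory are hypothetical stand-ins, a (server name, number) tuple stands in for a real capability, and "available" is the only selection policy shown.

```python
class BulletStore:
    """Stores whole file contents, keyed by capability; knows no names."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self._files = {}

    def create(self, contents):
        cap = (self.name, len(self._files))   # stand-in for a real capability
        self._files[cap] = bytes(contents)    # stored as one unit
        return cap

    def read(self, cap):
        return self._files[cap]               # returned as one unit


class Directory:
    """Maps a name to one or more capabilities (e.g. replicas)."""

    def __init__(self):
        self._entries = {}

    def enter(self, name, cap):
        self._entries.setdefault(name, []).append(cap)

    def lookup(self, name, servers):
        """Return (server, capability) for the first available replica."""
        for cap in self._entries[name]:
            server = servers[cap[0]]
            if server.available:
                return server, cap
        raise LookupError(f"no available replica for {name!r}")
```

Because the directory stores capabilities rather than disk addresses, the same lookup would work for entries that refer to objects other than files.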
Process Migration in Amoeba
- e.g. see
http://citeseer.ist.psu.edu/steketee94implementation.html
- for a short summary of process migration issues, see
http://www.cs.panam.edu/~meng/Course/CS6334/Note/master/node24.html
- Amoeba's process migration was added after the basic operating
system development
- processes may only migrate among machines with the same
architecture (homogeneous process migration)
- the process migration server will only migrate processes
with a valid process migration capability
- migration servers on the source and destination work with process
servers on the source and destination to get the process set up and
running again
- messages to a process that is migrating are rejected in such
a way that the sending FLIP will try again after a short delay
- messages to a process that has migrated receive a "moved"
message, which causes the sending FLIP to search for the process's
new location
- an execution token is passed by the source to the
destination once the transfer is complete, so that only one copy
of the process may be running at any given time
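The message handling during migration can be sketched as a small simulation. Everything here is hypothetical (the Host class, the reply strings "ok", "retry", and "moved", and the helper functions); it shows the retry and relocation logic described above, not FLIP's actual packet handling.

```python
class Host:
    """One machine; tracks the state of each process it knows about."""

    def __init__(self, name):
        self.name = name
        self.state = {}   # pid -> "running", "migrating", or "moved"
        self.inbox = {}   # pid -> list of delivered messages

    def deliver(self, pid, message):
        state = self.state.get(pid, "moved")
        if state == "running":
            self.inbox.setdefault(pid, []).append(message)
            return "ok"
        return "retry" if state == "migrating" else "moved"


def begin_migration(pid, source):
    source.state[pid] = "migrating"           # process is frozen on the source


def finish_migration(pid, source, destination):
    # Handing over the execution token: the source gives up the right to run
    # the process in the same step in which the destination gains it.
    source.state[pid] = "moved"
    destination.state[pid] = "running"


def locate(pid, hosts, fallback):
    """Search all hosts for the one currently running the process."""
    for name, host in hosts.items():
        if host.state.get(pid) == "running":
            return name
    return fallback


def send(pid, message, hosts, last_known, on_retry=lambda: None, max_attempts=5):
    """Deliver a message, retrying or relocating as the replies require."""
    target = last_known
    for _ in range(max_attempts):
        reply = hosts[target].deliver(pid, message)
        if reply == "ok":
            return target
        if reply == "retry":
            on_retry()                        # a real sender would wait briefly
        else:                                 # "moved"
            target = locate(pid, hosts, target)
    raise TimeoutError(f"could not deliver to process {pid}")
```

The on_retry hook exists only so that a test or demonstration can complete the migration between two attempts; in a real system the delay itself gives the migration time to finish.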