Today's plan
- grid computing
- Beowulf clusters
Grid Computing
- a few problems, many from grid.org:
- Search for Extra-Terrestrial Intelligence (SETI@Home), analyzing
radio signals for patterns
- research into cancer, anthrax, smallpox: screening good candidate
drugs that might "fit" (and therefore fight) disease proteins
- human proteome folding: using genome information to compute
protein folding to figure out protein function
- e.g. the cancer research project
- "allows your computer to screen
molecules that may be developed into drugs to fight cancer. Each
individual computer analyzes a few molecules and then sends the
results back over the Internet for further research."
- "download a very small, no cost, non-invasive software program
that works like a screensaver: it runs when your computer isn't being
used"
- "There is no cost to participate and no impact on your computer
use. The project software cannot detect or transfer anything on your
machine but project-specific information."
- also the World Community Grid
Grid Computing properties
- massively parallel problems, only works when there is much
computation per unit of communication (problem can be partitioned,
e.g. search for molecules or encryption keys)
- good PR is needed to enroll volunteers -- need prizes ("points"),
perhaps publicity (lottery-like system)
- individual processors will become unavailable -- need strategies
to deal with that, e.g. replicate processing
- cannot trust that malicious individuals (e.g. your competitors)
won't try to contaminate your results:
- could replicate processing on unrelated computers
- could double-check (on a trusted computer) results that are "too good
to be true"
- not a conventional operating system, but does need OS-like support
for replicated computation, process migration, etc.
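The partitioning and replication ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how grid.org or any real project works: the task (counting multiples of 7), the volunteer functions, and the names make_work_units, accept_result and run_grid are all hypothetical. Each work unit is sent to several volunteers, a result is accepted only when enough replicas agree, and units with no agreement are set aside to be re-checked on a trusted computer.

```python
from collections import Counter

def make_work_units(start, stop, unit_size):
    """Split the half-open range [start, stop) into independent work units."""
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]

def honest_volunteer(unit):
    """Hypothetical task: count the multiples of 7 inside the unit."""
    lo, hi = unit
    return sum(1 for n in range(lo, hi) if n % 7 == 0)

def faulty_volunteer(unit):
    """A volunteer that reports a wrong (possibly malicious) answer."""
    return honest_volunteer(unit) + 1

def accept_result(results, quorum):
    """Return the answer reported by at least `quorum` replicas, else None."""
    answer, votes = Counter(results).most_common(1)[0]
    return answer if votes >= quorum else None

def run_grid(units, volunteers, replicas=3, quorum=2):
    """Replicate every unit on several volunteers and vote on the results."""
    accepted = {}   # unit -> agreed answer
    recheck = []    # units to recompute on a trusted computer
    for i, unit in enumerate(units):
        # Round-robin assignment; replicas land on distinct volunteers
        # as long as replicas <= len(volunteers).
        chosen = [volunteers[(i + r) % len(volunteers)] for r in range(replicas)]
        results = [volunteer(unit) for volunteer in chosen]
        answer = accept_result(results, quorum)
        if answer is None:
            recheck.append(unit)
        else:
            accepted[unit] = answer
    return accepted, recheck
```

With three honest volunteers and one faulty one, every unit still gets two matching answers and is accepted. With one honest and one faulty volunteer and a quorum of 2, no unit reaches agreement and all of them end up in the re-check list.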
Cluster Computing
- a cluster is a group of computers used together to solve a single problem
- clustering can be used for high availability, e.g. replicated servers
- clustering can also be used for high performance
http://www.beowulf.org/overview/index.html:
Beowulf Clusters are scalable performance clusters based on commodity
hardware, on a private system network, with open source software
(Linux) infrastructure. The designer can improve performance
proportionally with added machines.
Common uses are traditional technical applications such as
simulations, biotechnology, and petro-clusters [petroleum visualization];
financial market modeling, data mining and stream processing;
and Internet servers for audio and games.
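The statement that performance improves proportionally with added machines is easiest to see with a small model. The sketch below is an assumption made for illustration, not something taken from the Beowulf overview: the computation is taken to divide evenly across machines, and each run pays one fixed communication cost as soon as more than one machine is involved. The function name estimated_speedup and its parameters are hypothetical.

```python
def estimated_speedup(compute_time, comm_time, machines):
    """Speedup over a single machine under a deliberately simple model.

    Assumed model: parallel time = compute_time / machines + comm_time,
    where comm_time is a fixed communication overhead per run.
    """
    if machines < 1:
        raise ValueError("need at least one machine")
    if machines == 1:
        return 1.0  # a single machine pays no communication cost
    parallel_time = compute_time / machines + comm_time
    return compute_time / parallel_time
```

With compute_time=1000 and comm_time=0, ten machines give a speedup of 10, which is the proportional case. With comm_time=100 the same ten machines give a speedup of only 5, which is why clusters (and grids even more so) favor problems with much computation per unit of communication.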
Beowulf Clusters
- commodity hardware, often x86 based, usually homogeneous (all machines
have similar capabilities)
- programming model is independent of the specifics of the underlying
hardware (commodity processors, commodity networks), even though specific
network design (device driver) may be customized for higher performance
- programming model based on standard libraries such as MPI (Message
Passing Interface) and PVM (Parallel Virtual Machine), as well as
free OS and programming tools
- each processor runs Linux, and standard protocols are used for
communication
- PVM supports the distribution of work, where a single
multiprocessing program executes transparently on any available
processors
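The message-passing style that MPI and PVM provide can be imitated on a single machine to show the shape of such a program. The sketch below is not MPI or PVM: it uses only the Python standard library, with threads standing in for cluster nodes and queues standing in for send and receive calls. The names worker and sum_of_squares are hypothetical. Because of Python's global interpreter lock, this version illustrates the pattern only and will not run faster than a plain loop.

```python
import queue
import threading

def worker(rank, inbox, outbox):
    """Receive (lo, hi) messages, reply with a partial sum, stop on None."""
    while True:
        msg = inbox.get()              # plays the role of a blocking receive
        if msg is None:
            break
        lo, hi = msg
        partial = sum(n * n for n in range(lo, hi))
        outbox.put((rank, partial))    # plays the role of a send to the master

def sum_of_squares(n, num_workers=4):
    """Master: split [0, n) into one chunk per worker and combine the replies."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(r, inboxes[r], outbox))
               for r in range(num_workers)]
    for t in threads:
        t.start()

    chunk = -(-n // num_workers)       # ceiling division
    sent = 0
    for r in range(num_workers):
        lo, hi = r * chunk, min((r + 1) * chunk, n)
        if lo < hi:                    # skip empty chunks
            inboxes[r].put((lo, hi))
            sent += 1

    # Collect exactly one reply per chunk that was sent out.
    total = sum(outbox.get()[1] for _ in range(sent))

    for box in inboxes:
        box.put(None)                  # tell every worker to shut down
    for t in threads:
        t.join()
    return total
```

The master never looks at how a worker computes its partial sum; it only sends a range and waits for a reply. That separation is what lets the same program structure run on whatever processors happen to be available.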
Comparison with other clusters
- clusters are not new, but have often been one-of-a-kind or proprietary
- Beowulf allows the creation of inexpensive, custom supercomputers
- based on the realization that commodity off-the-shelf (COTS) hardware and
software could be combined into a supercomputer