ClusterNFS: Simplifying Linux Clusters
Gregory R. Warnes, Ph.C.
Fred Hutchinson Cancer Research Center
May 11, 1999
The cNFS Mascot, Henry the Hydra
Outline
-
Why Linux Clusters?
-
High Performance + Low Cost
-
What Makes Clusters Hard?
-
Admins: Maintenance!
-
Users: Distributing Tasks
-
The Tools
-
MOSIX: Easier for Users
-
ClusterNFS: Easier for Admins
-
MOSIX + ClusterNFS in Action: The BioHive Cluster
-
Cluster Recipe
-
Ideas and Future Plans
Why Use Linux Clusters?
-
High performance
-
Close to 1:1 speedup (modulo CPU speed differences) for our parallel application.
-
Perfect 1:1 speedup for batches of independent simulations
-
Low Cost
-
Diskless Dual Celeron-500: $800/ea
-
Diskless Athlon-850: $1000/ea
-
Relatively Easy to Build and Maintain
Why Linux Clusters? High performance
[Figure: Elapsed Time]
Why Linux Clusters? High performance
[Figure: Speedup]
Why Linux Clusters? Low Cost
What Makes Clusters Hard?
-
Setup - Administrator
-
Setting up a 4 node cluster by hand is pretty easy.
-
Setting up a 16 node cluster by hand is mind-numbing and prone to errors.
-
Maintenance - Administrator
-
Ever tried to update a package on every node in the cluster?
-
How about 3 configuration files?
-
How do you know if you missed one machine?
-
Running Jobs - Users
-
Running a parallel program or a set of sequential programs requires that
the users figure out what hosts are available and manually assign tasks
to nodes.
-
Users usually don't want to see this much detail.
The Tools: MOSIX
-
Description:
-
MOSIX is an enhancement to the Linux kernel that provides adaptive (on-line)
load-balancing and memory ushering between x86 Linux machines. It uses
preemptive process migration to assign and reassign the processes among
the nodes to take best advantage of the available resources.
-
Translation:
-
MOSIX moves processes around the cluster to balance the load, using faster
machines first.
-
Source:
-
Amnon Barak, CS Department of the Hebrew University of Jerusalem
-
URL:
-
http://www.mosix.cs.huji.ac.il
The Tools: ClusterNFS (cNFS)
-
Description:
cNFS is a patch to the standard Universal-NFS server (uNFS) code that
``parses'' file requests to determine an appropriate match on the server.
Whenever a client requests the file filename,
the server checks for the existence of each of the following files in order, returning
the first match:
filename$$UID=xxxx$$ | user's id
filename$$GID=xxxx$$ | user's group id
filename$$HOST=ssss$$ | client hostname
filename$$IP=xxx.xxx.xxx.xxx$$ | client IP number
filename$$CLIENT$$ | matches any client
filename | default
-
Example:
-
When client machine bee3 asks for file /etc/hostname
it gets the contents of /etc/hostname$$HOST=bee3$$.
-
Source:
-
Gregory Warnes, Fred Hutchinson Cancer Research Center
-
URL:
-
http://queenbee.fhcrc.org/ClusterNFS
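The lookup order described on this slide can be sketched in a few lines of Python. This is only an illustration of the matching rule, not the server's actual code (cNFS is a patch to the uNFS server). The function names and the `exists` parameter are hypothetical, and the HOST= spelling follows the bee3 example.

```python
import os


def candidate_names(filename, uid, gid, hostname, ip):
    """List the names to try, most specific first, ending with the plain file."""
    return [
        f"{filename}$$UID={uid}$$",
        f"{filename}$$GID={gid}$$",
        f"{filename}$$HOST={hostname}$$",
        f"{filename}$$IP={ip}$$",
        f"{filename}$$CLIENT$$",
        filename,
    ]


def resolve(filename, uid, gid, hostname, ip, exists=os.path.exists):
    """Return the first candidate that exists, or None if none does.

    `exists` is injectable (an assumption made for this sketch) so the rule
    can be tried without touching a real filesystem.
    """
    for name in candidate_names(filename, uid, gid, hostname, ip):
        if exists(name):
            return name
    return None
```

With only /etc/hostname$$HOST=bee3$$ and /etc/hostname present, a request from bee3 resolves to the tagged file and a request from any other client falls through to the plain /etc/hostname.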
MOSIX + ClusterNFS in Action: the BioHive Cluster
http://queenbee.fhcrc.org
Making Clusters Easy : MOSIX + ClusterNFS
-
Setup - Administrator
-
Setup server
-
Compile rootNFS kernel. Make floppies.
-
Plug in switch
-
Plug in nodes. Insert Floppy.
-
Boot.
-
Maintenance - Administrator
-
Changes made to server immediately take effect on all clients.
-
Adding a node requires changing or copying 8 files and making a bootdisk.
-
Running Jobs - Users
-
Users log into a ``master'' node; MOSIX distributes tasks.
Making Clusters Easy for Users: MOSIX
MOSIX (http://www.mosix.cs.huji.ac.il)
is a dynamic load-balancing system that transparently migrates tasks between
machines.
-
Users log into ``master'' node
-
Jobs started on the master node automagically migrate to the fastest
/ least loaded machine.
-
Parallel jobs need not specify nodes
-
Sequential jobs started as if on SMP
-
Job Control (ps, top, kill) occurs as if whole cluster is one system
Users never need to know details of cluster configuration.
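As a rough illustration of "sequential jobs started as if on SMP": a batch of independent runs can be launched from the master node as ordinary local processes. The sketch below contains nothing MOSIX-specific. It assumes the kernel migrates the processes on its own, and the "simulation" command is a hypothetical stand-in.

```python
import subprocess
import sys


def run_batch(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical stand-in for a simulation: each job just prints its seed.
    jobs = [
        [sys.executable, "-c", f"print('simulation seed {seed}')"]
        for seed in range(4)
    ]
    print(run_batch(jobs))
```

The script names no hosts; on a single machine without MOSIX the same code simply runs the jobs locally.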
Diskless Servers: Traditional Method
Server:
-
BOOTP server
-
NFS server
-
Separate root directory for each client
Client:
-
BOOTP to obtain IP
-
TFTP or boot floppy to load kernel
-
rootNFS to load root file system
Diskless Servers: Traditional Method
This method requires a separate root directory structure for each node.
-
Hard to Set Up
-
Lots of directories with slightly different contents.
-
Even with symlinks this gets messy fast.
-
Difficult to Maintain
-
Changes must be propagated to each directory.
-
No easy way to see what differs between directories.
Diskless Servers: ClusterNFS Method
Server:
-
BOOTP server
-
ClusterNFS server
-
Single root directory for server and clients
Client:
-
BOOTP to obtain IP
-
TFTP or boot floppy to load kernel
-
rootNFS to load root file system
Diskless Servers: ClusterNFS Method
ClusterNFS allows all machines (including the server) to share one root filesystem
-
All files are shared by default.
-
Files for all clients are named filename$$CLIENT$$
-
Files for specific clients are named filename$$IP=xxx.xxx.xxx.xxx$$ or
filename$$HOST=host.subdomain.domain$$.
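Because client-specific files carry their tag in the file name, the places where clients differ can be listed by walking the shared root. The following is a minimal sketch under that assumption; `split_tag` and `tagged_files` are hypothetical helpers, not part of ClusterNFS.

```python
import os
import re

# A tagged name is "<base>$$<tag>$$", e.g. "hostname$$HOST=bee3$$".
TAG = re.compile(r"^(?P<base>.+)\$\$(?P<tag>[^$]+)\$\$$")


def split_tag(name):
    """Return (base, tag) for a tagged file name, or None for a plain name."""
    m = TAG.match(name)
    return (m.group("base"), m.group("tag")) if m else None


def tagged_files(root):
    """Yield (path_without_tag, tag) for every tagged file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            parts = split_tag(name)
            if parts:
                base, tag = parts
                yield os.path.join(dirpath, base), tag
```

Running `tagged_files("/etc")` on the server would list each file that has a client-wide or client-specific variant, together with the tag that selects it.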
Diskless Servers: ClusterNFS Method
ClusterNFS Recipe (Sketch)
On the Server
-
Install and configure Debian Linux
-
Install ClusterNFS
-
Download and Compile MOSIX and Kernel, enabling BOOTP and
RootNFS.
-
Copy the Kernel to Floppies
-
Add entries for each client to
-
/etc/hosts,
-
/etc/mosix.map,
-
/etc/bootptab,
-
/etc/exports, and
-
/etc/hosts.allow.
-
Create files that are the same for all clients, filename$$CLIENT$$.
-
Create files that are specific to individual clients, filename$$IP=xxx.xxx.xxx.xxx$$.
-
Reboot the server to restart all services
The complete recipe is available at Recipe.html.
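The per-client entries are repetitive enough to generate. The sketch below builds lines for three of the files named in the recipe. Everything in it is an assumption for illustration: the `Node` type, the field choices, and the export options are not taken from the complete recipe. /etc/mosix.map and /etc/hosts.allow are left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str  # client hostname, e.g. "bee3"
    ip: str    # client IP number
    mac: str   # Ethernet MAC address, e.g. "00:A0:C9:11:22:33"


def hosts_line(node):
    """One /etc/hosts entry: address, then name."""
    return f"{node.ip}\t{node.name}"


def bootptab_line(node):
    """One /etc/bootptab entry as colon-separated tag=value pairs.

    ht = hardware type, ha = hardware address (hex digits, no separators),
    ip = address handed to the client. A real entry would likely carry
    more tags; only the minimum is shown here.
    """
    ha = node.mac.replace(":", "").upper()
    return f"{node.name}:ht=ethernet:ha={ha}:ip={node.ip}"


def exports_line(nodes, path="/"):
    """One /etc/exports line exporting `path` to every node.

    The rw,no_root_squash options are an assumption for root-over-NFS.
    """
    clients = " ".join(f"{n.name}(rw,no_root_squash)" for n in nodes)
    return f"{path} {clients}"
```

The output is meant to be reviewed and pasted into the server's files, not written to them automatically.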
ClusterNFS Recipe
On the Client
-
Insert boot floppy
-
Boot
-
Record Ethernet MAC address displayed by kernel
-
Add it to /etc/bootptab on the server
-
Reboot
The complete recipe is available at Recipe.html.
Plans and Ideas
-
Need to write this up as a paper.
Wanted: Volunteer to do the writing in exchange for
co-authorship.
-
Instant Cluster:
Pile of Windows Boxes
+ Pile of Boot Floppies
+ 1 Linux Server running ClusterNFS
= Instant Linux Cluster
No configuration on the client!
-
Use of DHCP instead of BOOTP?
-
Auto-configuration: Unrecognized MAC address causes server to
-
Assign a new IP number and hostname
-
Add appropriate entry to /etc/bootptab
-
Create new machine-specific files using template scripts, say filename$$TEMPLATE-SCRIPT$$,
-
??
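One way the first auto-configuration step might look, as a sketch only: an unrecognized MAC address is given the next hostname and IP number in sequence, and a known one gets its existing assignment back. The function name, the "bee" prefix, and the starting address are hypothetical. Nodes are never removed in this sketch.

```python
import ipaddress


def register(mac, known, first_ip="192.168.1.11", prefix="bee"):
    """Return (hostname, ip) for `mac`, assigning the next free pair if new.

    `known` maps MAC address -> (hostname, ip) and is updated in place.
    """
    mac = mac.lower()
    if mac in known:
        return known[mac]
    index = len(known) + 1
    ip = str(ipaddress.ip_address(first_ip) + len(known))
    entry = (f"{prefix}{index}", ip)
    known[mac] = entry
    return entry
```

The returned pair is what a follow-up step would turn into a /etc/bootptab entry and into the machine-specific files.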
(c) 2000 by Gregory R. Warnes