vnfe5.phas.ubc.ca
opteron/infinipath cluster
Send bug reports, suggestions, etc. to Matt.
- See HERE for a recent snapshot of usage on the
cluster.
- See HERE for recent node load factors.
- See HERE for usage summary by user.
- See HERE for /home and /home2 usage
summary by user.
The vnfe5.phas.ubc.ca cluster
is a machine with 44 compute nodes and an
Infinipath (Infiniband) interconnect.
Each compute node contains:
- 2 x Dual-Core Opteron Processor 2216 @ 2.4 GHz (4 cpu cores per
node)
- 4 GB RAM
- Approx. 66 GB local storage mounted as /var/scratch
- A Pathscale Infinipath (Infiniband) network interface card (NIC)
for high bandwidth IPC (MPI)
- 1000 Mb/s (Gigabit) Ethernet for TCP/IP
The compute nodes are named node001,
node002,
... node043, node045, and are
connected with each other and with the additional nodes listed below
through both Infinipath and 1000 Mb/s (Gigabit) Ethernet switches.
Additionally, there are two special-purpose nodes:
- A dedicated head node, vnfe5.phas.ubc.ca,
which has the same
processor, RAM and NIC configuration as the compute nodes. This is
the only node that has a network connection to the external world.
- A dedicated I/O node, storage01,
that hosts 4.5 TB of RAID
storage which is NFS-mounted by the head node and all of the compute
nodes.
The OS on all nodes is CentOS 4.5
Linux, currently running with a 2.6.9
kernel.
IMPORTANT!! As is the
case with all of our current and past clusters, it is crucial that all
users be cognizant and considerate of the needs and usage patterns of
other users. In addition, it is every user's responsibility to
practice "responsible computing", which includes, but is not limited to,
keeping their disk usage under control and ensuring that their jobs
are not significantly impacting overall system performance,
particularly command-line responsiveness on the head node. Bear in
mind that this cluster is first and foremost for use by Joerg Rottler
and his group members: we numerical relativists are guests on the
machines, and need to act as such!
The software environment on the head node includes four separate
compiler suites (commands for compilation of F77, F90, C and C++ code,
respectively, are given in parentheses):
- PGI 7.0 (pgf77, pgf90, pgcc, pgCC)
- Intel 9.1 (ifort, ifort, icc, icc)
- Pathscale 3.0 (pathf90, pathf90, pathcc, pathCC)
- GNU 3.4.6 (f77, <NONE>, cc, c++)
There are man pages for all but the GNU compilers.
Here are sample invocations for simple (single source file) builds of
optimized F77 and C executables that link against one of the group's
standard libraries using each compiler suite:
PGI
% pgf77 -L/usr/local/pgi/lib -fast foo.f -lbbhutil -o foo
% pgcc -L/usr/local/pgi/lib -fast foo.c -lbbhutil -o foo
Intel
% ifort -L/usr/local/intel/lib -O3 foo.f -lbbhutil -o foo
% icc -L/usr/local/intel/lib -O3 foo.c -lbbhutil -o foo
Pathscale
% pathf90 -L/usr/local/pathscale/lib -O3 -fno-second-underscore foo.f -lbbhutil -o foo
% pathcc -L/usr/local/pathscale/lib -O3 foo.c -lbbhutil -o foo
GNU
% f77 -L/usr/local/lib -O3 -fno-second-underscore foo.f -lbbhutil -o foo
% gcc -L/usr/local/lib -O3 foo.c -lbbhutil -o foo
If you are using tcsh, you
can use the following aliases to set environment
variables such as F77, F90, CC, CXX,
LDFLAGS, etc.
to appropriate values for the various compilers:
- popt, pdbg, p-mpich: For
optimized, debug, and optimized-parallel builds, respectively, using
the PGI compilers.
- iopt, idbg: For
optimized and debug builds, respectively, using the Intel compilers.
- path-opt, path-dbg, path-mpich: For
optimized, debug and optimized-parallel builds, respectively,
using the Pathscale compilers.
- gopt, gdbg: For
optimized and debug builds, respectively, using the GNU compilers.
Note that executing any of these aliases echoes which
variables are set, and to what values. Should you wish to execute one
of these aliases in your ~/.cshrc
file, in order to define a
default compilation environment at login time, be sure to
redirect standard output and standard error to suppress the echoing. E.g.:
path-opt >& /dev/null
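For users of a Bourne-type shell (sh, bash), which the tcsh aliases do not cover, a rough equivalent can be sketched as a shell function. This is only a sketch: the compiler commands, optimization flags and library directories are copied from the sample invocations above, but the function name set_suite and the variables FFLAGS, CFLAGS and LIBDIR are assumptions, and the real aliases may set other variables (such as LDFLAGS) to different values.

```shell
#!/bin/sh
# Sketch only: choose compiler commands for one suite in a Bourne-type shell.
# set_suite, FFLAGS, CFLAGS and LIBDIR are hypothetical names; the values are
# taken from the sample invocations, not from the actual tcsh aliases.
set_suite() {
    case "$1" in
        pgi)
            F77=pgf77 F90=pgf90 CC=pgcc CXX=pgCC
            FFLAGS="-fast" CFLAGS="-fast"
            LIBDIR=/usr/local/pgi/lib ;;
        intel)
            F77=ifort F90=ifort CC=icc CXX=icc
            FFLAGS="-O3" CFLAGS="-O3"
            LIBDIR=/usr/local/intel/lib ;;
        pathscale)
            F77=pathf90 F90=pathf90 CC=pathcc CXX=pathCC
            FFLAGS="-O3 -fno-second-underscore" CFLAGS="-O3"
            LIBDIR=/usr/local/pathscale/lib ;;
        gnu)
            # No F90 compiler is listed for the GNU suite.
            F77=f77 F90="" CC=gcc CXX=c++
            FFLAGS="-O3 -fno-second-underscore" CFLAGS="-O3"
            LIBDIR=/usr/local/lib ;;
        *)
            echo "unknown suite: $1" >&2
            return 1 ;;
    esac
    export F77 F90 CC CXX FFLAGS CFLAGS LIBDIR
}

# Select the PGI suite and show the resulting C build command.
set_suite pgi
echo "$CC -L$LIBDIR $CFLAGS foo.c -lbbhutil -o foo"
```

Calling set_suite with a different suite name (intel, pathscale, gnu) switches all of the variables at once, so a Makefile that reads them from the environment should not need editing.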
Also observe that the Pathscale folk take their licensing seriously. When a user
invokes one of the compilers (pathcc,
pathf90, etc.), a lease is issued and, independent of the time it
takes to compile, the lease will not expire for something like 5
minutes. Since we currently have only a single-concurrent-lease
license, this means that no other user will be able to use the compiler
for at least 5 minutes. Thus, don't be surprised to see the error
message:
** Subscription: Unable to find a server. The PathScale products cannot run without a subscription.
Please see http://www.pathscale.com/subscription/1.1/msgs.html for details.
For more information, please rerun with -subverbose
when trying to use a Pathscale compiler. Unfortunately, at the
current time, there's nothing an ordinary user can do about this but
wait it out.
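Waiting it out can at least be automated. The following is a minimal sketch, assuming a Bourne-type shell; the function name retry and its two leading arguments (maximum number of attempts, delay in seconds) are hypothetical. Note that it retries on any failure, including genuine compile errors, so it is best used on code that is already known to build.

```shell
#!/bin/sh
# Sketch only: rerun a command until it succeeds or the attempts run out.
# Usage: retry MAX_TRIES DELAY_SECONDS command [args...]
retry() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while ! "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts: $*" >&2
            return 1
        fi
        echo "attempt $try failed; sleeping ${delay}s" >&2
        sleep "$delay"
        try=$((try + 1))
    done
}

# Example (on the cluster): up to 4 attempts, 300 s apart, matching the
# roughly 5 minute lease time quoted above.
# retry 4 300 pathcc -L/usr/local/pathscale/lib -O3 foo.c -lbbhutil -o foo
```

Bear in mind that each successful compile takes out a new lease of its own, so batching builds rather than retrying in a tight loop is the considerate choice.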
Serial (single processor) job submission using PBS
Follow these steps to submit and run single-processor jobs on the
cluster under PBS:
- Build your executable using your favorite compiler suite, as
sketched above.
- Create a PBS script file. You can see the contents of a
basic template file, in which occurrences of 'XXX' need to be replaced
appropriately, HERE.
- Submit the job to the queue using qsub. E.g.
% qsub serial.pbs
Currently, there is only a
single queue on the system, which handles both serial and parallel jobs.
- Monitor the status of your job using qstat, delete it from the queue
using qdel, etc.
See the man pages for qsub, qstat, qdel, etc. for full details
concerning the syntax and semantics of the PBS commands.
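As an illustration of the steps above, here is a minimal sketch that writes a serial PBS script and submits it. It is not the group's template: the job name foo_serial, the one-hour walltime and the executable ./foo are placeholders, and the directives assume a Torque/OpenPBS-style installation.

```shell
#!/bin/sh
# Sketch only: write a minimal serial PBS script, then submit it.
# Job name, walltime and executable are placeholders to be edited.
cat > serial.pbs <<'EOF'
#!/bin/sh
#PBS -N foo_serial
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
# PBS starts jobs in the home directory; move to the submission directory.
cd "$PBS_O_WORKDIR"
./foo
EOF

# Submit only where qsub is available (i.e. on the head node).
if command -v qsub >/dev/null 2>&1; then
    qsub serial.pbs
else
    echo "qsub not found; wrote serial.pbs only"
fi
```

The "-l nodes=1:ppn=1" line requests a single core on a single node, and "-j oe" merges standard error into the standard output file that PBS returns when the job ends.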
Parallel (MPI based) job submission using PBS
TO COME!!