/README.QsNet
https://code.google.com/ · Unknown · 49 lines · 38 code · 11 blank · 0 comment · 0 complexity · c875bb9030a266495a389ccae54496cc MD5 · raw file
- Running Elan MPI jobs on QsNet Clusters
- ---------------------------------------
- If built with --with-qshell or --with-mqshell, pdsh may be used to run
- MPI jobs on a QsNet interconnect. This option requires that you have
- the Elan user space libraries installed (qsnetlibs or qswelan-r RPM for
- Linux) and that your kernel be patched to run the 'elan3' or 'elan4' and
- 'rms' device drivers. Pdsh can run independently of the RMS product (the
- 'rms' kernel module, which is used by pdsh, is a distinct facility from
- the RMS product).
- Quadrics has provided a PDSH FAQ which may answer some common questions
- about getting the qshell module to run MPI jobs. Please see
- http://web1.quadrics.com/twiki/bin/view/FAQs/SetupPDSH
- rms pdsh module
- ---------------------------------------
- Pdsh can also be run via the Quadrics RMS 'allocate' command such that
- allocate takes care of the node reservations and passes a batch ID through
- to pdsh via the RMS_RESOURCEID environment variable. Pdsh retrieves
- the list of allocated nodes out of the RMS database using the rmsquery
- command. This functionality is provided by the "rms" pdsh module
- (--with-rms).
- slurm pdsh module
- ---------------------------------------
- Similar to the rms pdsh module, the slurm module allows pdsh to target
- nodes based on SLURM allocations, either targetting an already running job
- or by running under ``srun --allocate'' The SLURM jobid can be passed to
- pdsh using the `-j' option provided by the module, or via the SLURM_JOBID
- environment variable, which is set by --allocate.
- The `/etc/elanhosts' config file
- ---------------------------------------
- Pdsh uses a simple config file, /etc/elanhosts, to describe hosts
- containing Elan adapters (and on which Elan MPI jobs may be run). The
- config file is also used by the daemons qshd and mqshd to initialize
- the Elan network error resolver thread. Parsing of the /etc/elanhosts
- file is accomplished by using the libelanhosts(3) library, upon which
- pdsh depends (when building for QsNet). See the elanhosts(5)
- man page for a description of the /etc/elanhosts file format.
- The libelanhosts package may be obtained from
- ftp://ftp.llnl.gov/pub/linux/libelanhosts/