Running Elan MPI jobs on QsNet Clusters
---------------------------------------

If built with --with-qshell or --with-mqshell, pdsh may be used to run
MPI jobs on a QsNet interconnect. This option requires that you have
the Elan user space libraries installed (qsnetlibs or qswelan-r RPM for
Linux) and that your kernel be patched to run the 'rms' device driver
along with either the 'elan3' or the 'elan4' device driver. Pdsh can run
independently of the RMS product (the 'rms' kernel module, which is used
by pdsh, is a distinct facility from the RMS product).

Quadrics has provided a PDSH FAQ which may answer some common questions
about getting the qshell module to run MPI jobs. Please see

 http://web1.quadrics.com/twiki/bin/view/FAQs/SetupPDSH

rms pdsh module
---------------------------------------

Pdsh can also be run via the Quadrics RMS 'allocate' command such that
allocate takes care of the node reservations and passes a batch ID through
to pdsh via the RMS_RESOURCEID environment variable. Pdsh retrieves
the list of allocated nodes out of the RMS database using the rmsquery
command. This functionality is provided by the "rms" pdsh module
(--with-rms).
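
As a rough illustration of the environment hand-off described above, a
wrapper script could check for RMS_RESOURCEID before invoking pdsh. This
is only a sketch: the function name rms_resource_id is hypothetical, and
it does not call rmsquery or reproduce what the rms module does
internally.

```python
import os


def rms_resource_id(environ=None):
    """Return the RMS_RESOURCEID value, or None if it is unset or empty.

    Hypothetical helper for a wrapper script; not part of pdsh.
    """
    env = os.environ if environ is None else environ
    value = env.get("RMS_RESOURCEID", "").strip()
    return value or None


if __name__ == "__main__":
    rid = rms_resource_id()
    if rid is None:
        print("RMS_RESOURCEID is not set; was pdsh started under allocate?")
    else:
        print("RMS_RESOURCEID=" + rid)
```

Passing the environment as a parameter keeps the check easy to exercise
outside of an RMS allocation.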

slurm pdsh module
---------------------------------------

Similar to the rms pdsh module, the slurm module allows pdsh to target
nodes based on SLURM allocations, either by targeting an already running
job or by running under ``srun --allocate''. The SLURM jobid can be passed
to pdsh using the `-j' option provided by the module, or via the
SLURM_JOBID environment variable, which is set by --allocate.
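
The two ways of supplying the jobid can be sketched as a small helper
that builds a pdsh command line. The function name slurm_pdsh_argv and
the error handling are illustrative assumptions, not part of pdsh. The
text above does not say which source wins when both are present, so this
sketch passes `-j' whenever a jobid is given and otherwise only checks
that SLURM_JOBID is set.

```python
import os


def slurm_pdsh_argv(command, jobid=None, environ=None):
    """Build an argv list for running `command` via pdsh's slurm module.

    Hypothetical helper: uses `-j <jobid>` when a jobid is given,
    otherwise relies on SLURM_JOBID being present in the environment.
    """
    env = os.environ if environ is None else environ
    argv = ["pdsh"]
    if jobid is not None:
        # Explicit job ID: hand it to pdsh with the module's -j option.
        argv += ["-j", str(jobid)]
    elif not env.get("SLURM_JOBID"):
        # Neither source is available, so pdsh would have no job to target.
        raise ValueError("no SLURM job ID: pass jobid or set SLURM_JOBID")
    argv += list(command)
    return argv
```

For instance, slurm_pdsh_argv(["hostname"], jobid=1234) corresponds to
the command line `pdsh -j 1234 hostname'. The resulting list can be
handed to subprocess.run() on a host where pdsh is installed.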

The `/etc/elanhosts' config file
---------------------------------------

Pdsh uses a simple config file, /etc/elanhosts, to describe hosts
containing Elan adapters (and on which Elan MPI jobs may be run). The
config file is also used by the daemons qshd and mqshd to initialize
the Elan network error resolver thread. Parsing of the /etc/elanhosts
file is accomplished by using the libelanhosts(3) library, upon which
pdsh depends (when building for QsNet). See the elanhosts(5)
man page for a description of the /etc/elanhosts file format.

The libelanhosts package may be obtained from

 ftp://ftp.llnl.gov/pub/linux/libelanhosts/