Copyright 1998-1999, University of Notre Dame.
Authors: Jeffrey M. Squyres, Kinis L. Meyer with M. D. McNally 
         and Andrew Lumsdaine

This file is part of the Notre Dame LAM implementation of MPI.

You should have received a copy of the License Agreement for the Notre
Dame LAM implementation of MPI along with the software; see the file
LICENSE.  If not, contact Office of Research, University of Notre
Dame, Notre Dame, IN 46556.

Permission to modify the code and to distribute modified code is
granted, provided the text of this NOTICE is retained, a notice that
the code was modified is included with the above COPYRIGHT NOTICE and
with the COPYRIGHT NOTICE in the LICENSE file, and that the LICENSE
file is distributed with the modified code.

LICENSOR MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED.
By way of example, but not limitation, Licensor MAKES NO
REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE COMPONENTS
OR DOCUMENTATION WILL NOT INFRINGE ANY PATENTS, COPYRIGHTS, TRADEMARKS
OR OTHER RIGHTS.

Additional copyrights may follow.


Installation instructions for LAM 6.3
=========================================

This file contains the installation instructions for LAM/MPI version
6.3.  There are also some tips for writing/developing/running
parallel programs, especially in parallel environments that are
clusters of workstations.  Here's a brief table of contents:

     * For the impatient
     * Unpacking the distribution
     * Applying patches
     * Configuration
       - 64 bit LAM
     * Building LAM
     * Boot schema
     * Using LAM
       - Typical usage
       - Starting LAM
       - Common filesystems
       - Using LAM with AFS
       - Using LAM with ssh
     * Troubleshooting
       - Problems with building LAM
       - Sending mail to the LAM mailing list
       - Problems with running LAM and/or user programs
     * Clearing disk space
     * Tuning LAM


For the impatient
-----------------

If you don't want to read the rest of the instructions, the following
should do the trick for most situations:

     % gunzip -c lam-6.3.tar.gz | tar xf -
     % cd lam-6.3
     % ./configure --prefix=/path/to/install/in
     [...lots of output...]
     % make
     [...lots of output...]

Note that "make install" is not necessary; LAM copies over executables
to the $prefix tree as it builds itself.  If you do not specify a
prefix, LAM will use /usr/local/lam-6.3.

Now go read the RELEASE_NOTES file; it contains all the information
about the new features of this release of LAM/MPI.

Common causes of failure:

     - No C++ compiler installed; use --without-mpi2cpp configure option
     - C++ compiler does not support required C++ features; use
       --without-mpi2cpp configure option
     - No Fortran compiler installed; use --without-fc configure option
	

Unpacking the distribution
--------------------------

The LAM distribution is packaged as a compressed tape archive,
lam-6.3.tar.gz or lam-6.3.tar.Z.  It is available from the main
LAM web site: http://www.mpi.nd.edu/lam/.

Uncompress the archive and extract the sources. 

     % gunzip -c lam-6.3.tar.gz | tar xf -

or 

     % uncompress -c lam-6.3.tar.Z | tar xf -


Configuration
-------------

LAM uses a GNU configure script to perform site and architecture
specific configuration.

Change directory to the top level LAM directory (lam-6.3) and run the
configure script.

     % ./configure {options}

or 

     % sh ./configure {options}

By default the configure script sets the LAM install directory LAMHOME
to /usr/local/lam-6.3.  This can be overridden with the --prefix
option (see below).

The configure script will create several configuration files:
config.mk, share/h/lam_config.h, share/h/rpi.tcp.h, and
share/h/rpi.shm.h. You may wish to inspect these files for a sanity
check, but ./configure usually guesses correctly.

The configure script recognizes the following options (shown here in
alphabetical order):

--enable-echo

     Will echo all of the commands that configure executes.  This is
usually for debugging purposes only, and is not recommended for end
users.

--prefix=PREFIX

     Sets the installation location LAMHOME for the LAM binaries,
libs, etc., to directory PREFIX.  PREFIX must be specified as an
absolute directory name.

--with-cc=CC

     Use the C compiler CC.  The C compiler can also be selected by
setting the "CC" environment variable before running configure.  This
compiler will be used both to compile LAM, and as the default compiler
for the hcc(1) and mpicc(1) wrapper compilers.

--with-cflags=CFLAGS

     Use the C compiler flags CFLAGS.  The flags passed to the C
compiler can also be selected by setting the "CFLAGS" environment
variable before running configure.  These flags are used to compile
LAM, ROMIO, and some example programs that come with LAM.  If CFLAGS
are not specified, ./configure will pick optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-ldflags=LDFLAGS

     Use the LD linker flags LDFLAGS.  If this flag is not set on the
./configure command line, the value for CFLAGS is used.  These flags
are used to link LAM executables and all example programs that come
with LAM.  If LDFLAGS (and CFLAGS) are not specified, ./configure will
pick optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-cpp=CXX

     Use the C++ compiler CXX.  The C++ compiler can also be selected
by setting the "CXX" environment variable before running configure.
This compiler will be used to compile the MPI 2 C++ bindings and as
the default compiler for the hcp(1) and mpiCC(1) wrapper compilers.

--with-cppflags=CXXFLAGS

     Use the C++ compiler flags CXXFLAGS.  The flags passed to the C++
compiler can also be selected by setting the "CXXFLAGS" environment
variable before running configure.  These flags will be used when
compiling the MPI 2 C++ bindings, as well as some example programs
that come with LAM.  If CXXFLAGS are not specified, ./configure will
pick optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-fc=FC

     Use the Fortran compiler FC.  Specify FC=no (or --without-fc) to
disable Fortran support if you do not have a Fortran compiler or do
not require such support.  This compiler will be used both to compile
LAM, and as the default compiler for the hf77(1) and mpif77(1) wrapper
compilers.

--with-fflags=FFLAGS

     Use the Fortran compiler flags FFLAGS.  The
flags passed to the Fortran compiler can also be selected by setting
the "FFLAGS" environment variable before running configure.  These
flags will be used only when compiling some example programs that come
with LAM.  If FFLAGS are not specified, ./configure will pick
optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-rpi=RPI

     Build with request progression interface (RPI) transport layer
RPI [RPI=tcp].  RPI must be one of: tcp, sysv, or usysv.  If this option
is not specified, the RPI transport layer defaults to tcp.  Please
refer to the RELEASE_NOTES file for descriptions of the RPI transport
layers.

--with-pthread-lock

     Use a process shared pthread mutex to lock access to the shared
memory pool rather than the default SYSV semaphore.  This option is
only valid with the "usysv" RPI, and on systems which support process
shared pthread mutexes.

--with-purify

     Causes LAM to zero out all data structures before using them.
This option is not necessary to make LAM function correctly (LAM
already zeros out relevant structure members when necessary), but it
is very helpful when running MPI programs through memory checking
debuggers, such as purify and the Solaris Workshop bcheck.  See the
"Zeroing out LAM buffers before use" section of the RELEASE_NOTES file
for more information.  The default is to not enable this option.

--with-rsh=RSH

     Use RSH as the remote shell command. For example if you want to
use the secure shell ssh then specify --with-rsh="ssh -x" (note that
the "-x" is necessary to prevent the ssh 1.x series of clients from
sending its standard banner information to standard error, which will
cause recon/lamboot/etc. to fail).  This shell command will be used to
launch commands on remote nodes from binaries such as lamboot, wipe,
etc.  The command can be one or more shell words, such as a command
and multiple command line switches.

This value can be overridden at recon/lamboot/etc. run time with the
LAMRSH environment variable.  See the RELEASE_NOTES file for more
details.
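For example, to use ssh for a single session without reconfiguring
(the variable name comes from this document; the "ssh -x" value is
just an illustration):

```shell
# Override the compiled-in remote shell for this session only;
# recon/lamboot/wipe read LAMRSH at run time.
LAMRSH="ssh -x"
export LAMRSH
```

csh users would instead put `setenv LAMRSH "ssh -x"` in their shell.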

--with-select-yield

     Force the use of select() to yield the processor. 

--with-shared

     Build shared libraries. This is currently only supported on Linux
with the gcc compiler and on Solaris with the Sun Workshop compiler.
If more people want this option, please let us know.  Note that this
option is incompatible with --with-romio because ROMIO expects to find
libmpi.a, not libmpi.so.

--with-shm-maxalloc=BYTES

     Use BYTES as the size of the maximum allocation from the shared
memory pool.  If no value is specified, configure will set the size
according to the value of shm-poolsize (below).  See "Usysv and Sysv
transports", below.

--with-shm-poolsize=BYTES

     Use BYTES as the size of the shared memory pool.  If no size is
specified, configure will determine a suitably large size to use.  See
"Usysv and Sysv transports", below.

--with-shm-short=BYTES

     Use BYTES as the maximum size of a short message when
communicating via shared memory.  Default is 8 KB.

--with-signal=SIGNAL

     Use SIGNAL as the signal used internally by LAM. The default
value is "SIGUSR2". To set the signal to "SIGUSR1" for example,
specify --with-signal=SIGUSR1.

--with-tcp-short=BYTES

     Use BYTES as the maximum size of a short message when
communicating over TCP.  Default is 64 KB.

--without-mpi2cpp

     Build LAM without the MPI-2 C++ bindings (the default is to build
them).  The C++ bindings are known to only work on certain systems.
Consult the mpi2c++/README file for more information.

--with-romio

     Build LAM with ROMIO support.  ROMIO is known to only work on
certain systems.  Consult the romio/README file for more information.
Note that this option is incompatible with --with-shared, because
ROMIO expects to find libmpi.a, not libmpi.so.


--without-shortcircuit

     Disable the send/receive short circuiting optimization. This
optimization has not been tested as thoroughly as we would like, hence
this option to disable it.


Example: 

 % ./configure --with-rpi=usysv --with-cc=/bin/cc --with-cflags=-O4 --with-fc=no

Compile for the usysv RPI using the C compiler /bin/cc with options
-O4 and disable Fortran support.


64 bit LAM
----------

LAM has been verified as being 64 bit clean under Solaris 7.  To
compile LAM with the 64 bit architecture, you will likely need to add
compiler and linker flags with configure.  For example, if you are
using the Solaris Workshop 5.0 compilers on Solaris 7, you can use the
following:

     % ./configure --with-cflags='-xarch=v9' --with-ldflags='-xarch=v9'

Other compilers/architectures will have their own flags to enable 64
bit compilation; consult the documentation for your compiler.  Of
course, you can also add in any debugging/optimization flags in the
cflags and ldflags strings as well.


Building LAM
------------

Once the configuration step has completed, build LAM by doing:

     % make

in the top level LAM directory. This will build the LAM binaries and
libraries, and install them together with header files, system files
and man pages in LAMHOME/bin, LAMHOME/include, LAMHOME/lib,
LAMHOME/boot and LAMHOME/man.  Preexisting files in these directories
may be overwritten.

-- Name shifted MPI library

By default, the name shifted PMPI_* entry points are not built into
the LAM MPI library.  To build a libpmpi containing these entry points
run make with the target 'profile' after building the executables
and libraries as described above.

     % make profile 

-- Building LAM, ROMIO, and MPI 2 C++ examples

LAM and the ROMIO and MPI-2 C++ packages all include example code that
can be built with a single top-level make:

     % make examples

This will do the following (where TOPDIR is the top-level directory of
the LAM source tree):

  1. Build the LAM examples.  They are located in:

        TOPDIR/examples

  2. If you configured LAM with --with-romio, the ROMIO examples will
     be built.  See the notes about ROMIO in the RELEASE_NOTES file.
     They are located in:

        TOPDIR/romio/test

  3. If LAM was configured to build the MPI 2 C++ bindings (i.e., if
     you did not configure with --without-mpi2cpp), the MPI 2 C++ examples
     will be built.  They are located in:

        TOPDIR/mpi2c++/contrib

Additionally, the following three commands can be used to build each
of the packages' examples separately (provided that support for them
was compiled in to LAM) from TOPDIR:

     % make lam-examples
     % make romio-examples
     % make mpi2c++-examples


Boot schema
-----------

A boot schema is a description of a multicomputer on which LAM will be
run.  You can create boot schema files (see bhost(5) for syntax) for
typical configurations of the local multicomputer(s).  Place these
files under boot/ in the installation directory.  They will be found
by LAM tools such as lamboot(1), recon(1) and wipe(1) if you do not
specify a filename on the command line to use instead of the default.

The default boot/bhost.def file comes with a single line:

	localhost

Thus, if you simply run "lamboot", you will boot a LAM with one node
(the localhost).

You can re-write the boot/bhost.def file if you are frequently going
to boot LAM to the same configuration.  For example, if you frequently
use 4 workstations: inky, blinky, pinky, and clyde, you can have a
boot/bhost.def file as follows:

	inky
	blinky
	pinky
	clyde lamrocks

You can also specify different remote usernames on the remote nodes;
the username "lamrocks" is used on the machine "clyde" in the above
example.


Using LAM
---------

If the LAM installation directory is moved after it is built, users
must set the LAMHOME environment variable to the new location.  On
each UNIX machine, users must add the LAM executable directory to
their shell's search path.  LAM executables are found under
LAMHOME/bin.  These steps must be taken on each and every machine that
might be part of a multicomputer running LAM.  Set the variables in
the shell's start-up file, *not* the .login file.
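As a sketch (assuming LAM was installed in /usr/local/lam-6.3;
substitute your own installation directory), a Bourne-style start-up
file might contain:

```shell
# Tell LAM where it lives (only strictly needed if the tree was moved
# after the build) and put the LAM executables on the search path.
LAMHOME=/usr/local/lam-6.3
PATH=$LAMHOME/bin:$PATH
export LAMHOME PATH
```

csh users would use "setenv LAMHOME ..." and "set path = (...)" in
their .cshrc instead.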


--- Typical usage

LAM is a daemon-based implementation of MPI.  This means that a daemon
process is launched on each machine that will be in the parallel
environment.  Once the daemons have been launched, LAM is ready to be
used.  A typical usage scenario is as follows:

     - Boot LAM on all the nodes
     - Run MPI programs
     - Shut down LAM

LAM does not need to be booted in order to compile MPI programs.

LAM is a user-based MPI environment; each user who wishes to use LAM
must boot their own LAM environment.  LAM is not a client-server
environment where a single LAM daemon can service all LAM users on a
given machine.  There are no future plans to make LAM client-server
oriented (unless someone volunteers to write it).

As a side-effect of this design, each user must have an account on
each machine that they wish to use LAM on. 


--- Starting LAM

The recon(1) tool checks if LAM can be started on the given boot
schema.  There are several prerequisites that enable LAM to be started
on a remote machine:

     * The machine must be reachable and operational. 
     * The user must have an account on the machine. 
     * The user must be able to rsh(1) to the machine (permissions
        must be set in either the /etc/hosts.equiv file or the user's
        .rhosts file on the machine).
     * The LAM executables must be locatable on that machine, using
       the shell's search path and possibly the LAMHOME environment
       variable, as described above.
     * The shell's start-up script must not print anything on standard
       error. The user can take advantage of the fact that rsh(1) will
       start the shell non-interactively. The start-up script can exit
       early in this case, before executing many commands relevant
       only to interactive sessions and likely to generate output.
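For csh users, one way to satisfy the last prerequisite is to guard
all output-producing commands in ~/.cshrc so that they only run
interactively (a sketch; adapt to your own start-up file):

```csh
# ~/.cshrc: $prompt is only set for interactive shells, so anything
# that might print (fortune, stty messages, etc.) goes inside this test.
if ($?prompt) then
    alias ll 'ls -l'    # example interactive-only alias
endif
```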

*All* of these prerequisites must be met before LAM will function
properly.  If recon does not complete successfully, the "-d" option
will give verbose descriptions of what it tried to do, and suggestions
to fix the problem.

Also keep in mind that just because recon works, lamboot itself may
still fail.  This usually happens when the "hboot" program (that
lamboot invokes on remote nodes) fails for some reason.  Again, the
"-d" option to lamboot will enable extremely verbose output, and
suggest solutions to common problems.

Users should read the lam(7) manual page to get started using LAM
tools and libraries.

Additionally, the University of Notre Dame offers a "Getting Started
with LAM" tutorial that, although somewhat biased towards the Notre
Dame computing environment, is a good starting point to getting
familiar with LAM.  

     http://www.mpi.nd.edu/mpi_tutorials/lam/


--- Common filesystems

A common environment to run LAM is in a Beowulf-class or other
workstation cluster.  Simply stated, LAM can run on a group of
workstations connected by a network.  As mentioned above, there are
several prerequisites, however (the user must have an account on all
the machines, the user can rsh [or ssh, or whatever other remote shell
transport capability is desired -- see above for how to change the
underlying remote shell transport] to all the machines, etc.).

This raises the question for LAM system administrators: where to
install the LAM binaries, header files, etc.?  There are two main
choices:

1. Have a common filesystem, such as NFS, between all the machines to
be used.  Install the LAM files such that the LAMHOME environment
variable can be set to the *same value* on each node.  This will
*greatly* simplify users' .cshrc/.profile scripts -- the value of
LAMHOME can be set without checking which machine the user is on.  It
also simplifies the system administrator's job; when the time comes to
patch or otherwise upgrade LAM, only one copy needs to be modified.  

For example, consider a cluster of four machines: inky, blinky, pinky,
and clyde.  If the LAM binaries et al. are installed on inky's local
hard drive in the directory /home/lam, the system administrator has
two main choices:

  - mount inky:/home/lam on the remaining three machines, such that
/home/lam on all machines is effectively "the same".  That is, the
following directories all contain the LAM binaries:

	inky:/home/lam
	blinky:/home/lam
	pinky:/home/lam
	clyde:/home/lam

  - mount inky:/usr/local/src/lam-6.3 on *all four* machines in some
other common location, such as /home/lam (a symbolic link can be
installed on inky instead of a mount point for efficiency).  This
strategy is typically used for environments where one tree is NFS
exported, but another tree is typically used for the location of
binaries.  For example, the following directories all contain the LAM
binaries:

	inky:/home/lam
	blinky:/home/lam
	pinky:/home/lam
	clyde:/home/lam

Notice that there are the same four directories as the previous
example, but on inky, the directory is *actually* located in
/usr/local/src/lam-6.3.  There is a bit of a disadvantage in this
approach; each of the remote nodes have to incur NFS (or whatever
filesystem is used) delays to access the LAM directory tree.  However,
both the ease of administration and the (relatively speaking) low
cost of using a networked filesystem usually greatly outweigh this cost.


2. If you are concerned with networked filesystem costs of accessing
the LAM binaries, you can install LAM on the local hard drive of each
node in your system.  Again, it is *highly* advisable to install LAM
in the *same* directory on each node so that LAMHOME can be set to the
same value, regardless of the node that a user has logged on to.

This approach will save some network latency of accessing the LAM
binaries, but is only used where users are very concerned about
squeezing every spare cycle out of their machines.


--- Using LAM with AFS

AFS has some peculiarities, especially with file permissions when
using rsh.  However, most sites tend to install the Transarc rsh
replacement (i.e., the one that passes tokens to the remote machine)
as the default rsh, so when you "rsh" to a remote machine (with recon
or lamboot), your AFS token will be passed to the remote LAM daemon
automatically.  If your site does not install the Transarc replacement
rsh as the default, consult the documentation on "--with-rsh" (above)
to see how to set the path to the rsh that LAM will use.

Once you use the replacement rsh, you should get a token on the other
side.  This means that your LAM daemons are running with your AFS
token, and you should be able to run any program that you wish,
including those that are not system:anyuser accessible.  You will even
be able to write into your filespace (as you would expect).

Keep in mind, however, that AFS tokens have limited lives, and will
eventually expire.  This means that your LAM daemons (and user MPI
programs) will lose their AFS permissions after some specified time
unless you renew your token (with the "klog" command, for example) on
the originating machine before the token runs out.  This can play
havoc with long-running MPI programs that periodically write out file
results; if you lose your AFS token in the middle of a run, and your
program tries to write out to a file, it won't have permission to,
which may cause Bad Things to happen.

If you need to run long MPI jobs with LAM on AFS, it is usually
advisable to ask your AFS administrator to increase your default token
life time to a large value, such as 2 weeks.


--- Using LAM with ssh

Note that you can change the remote transport agent that LAM uses to
spawn the LAM daemons.  While rsh is the default, it can be changed to
other agents, such as ssh.  

ssh is a popular choice because of the added security that it provides
over the .rhosts security provided by rsh.  And since ssh can pass AFS
tokens, it presents an attractive, highly secure, yet
fully-AFS-authenticated method, for invoking LAM.

If you choose to use ssh, the 1.x series of ssh will require the use
of the "-x" command line flag to prevent ssh from printing its
standard banner information to stderr.  lamboot, recon, etc., interpret
information on stderr to mean that a remote invocation has failed;
ssh's "-x" will prevent this.  (We do not have access to SSH 2.x
clients -- they may require a similar command line flag).

Note that using ssh (or any other agent) only changes the way that LAM
is *invoked*.  Once LAM is invoked, it sets up its own sockets for
communication that are outside of ssh (and are therefore not
encrypted).  ssh provides stronger security only during lamboot and
wipe.  Once the LAM daemons are launched, all MPI meta information is
passed through separate channels (such as startup of user programs)
which are independent of ssh.


Troubleshooting
---------------

--- Problems with building LAM

It is highly recommended that you execute the following steps *in
order*.  Many people have similar problems with configuration and
initial setup of LAM, and most common problems have already been
answered in one way or another.

1. Check the LAM FAQ:

        http://www.mpi.nd.edu/lam/faq/

2. Check the mailing list archives.  Use the "search" features to
check old posts and see if others have asked the same question and
had it answered:

	http://www.mpi.nd.edu/MailArchives/lam/

3. If you do not find a solution to your problem in the above
resources, and your problem specifically has to do with *building*
LAM, send the following information to the LAM mailing list (see the
next section below about sending mail to the LAM mailing list):

a. The output of "uname -a"
b. The config.mk file (in the top level LAM build directory)
c. The config.log file (also in the top level LAM build directory)
d. The share/h/lam_config.h file
e. The output from when you ran "./configure"
f. The output from when you ran "make"

To capture the output of the configure and make steps you can use the
script command or the following technique if using a csh style shell:

     % ./configure {options} |& tee config.LOG
     % make install          |& tee make.LOG

or if using a Bourne style shell:

     % ./configure {options} 2>&1 | tee config.LOG
     % make install 2>&1          | tee make.LOG


--- Sending mail to the LAM mailing list

Due to problems with spam, only subscribers are allowed to post to the
list.  To subscribe to the list, send mail to the following address:

	majordomo@mpi.nd.edu

with the line "subscribe lam" in the body of the e-mail (the contents
of the Subject line are irrelevant).  You may unsubscribe at any time
by sending an e-mail with "unsubscribe lam" in the body of the message
to the same majordomo address.

After you have subscribed (and received a confirmation e-mail), you
can send mail to the list at the following address:

	lam@mpi.nd.edu

NOTE: People tend to only reply to the list; if you subscribe, post,
and then unsubscribe from the list, you will likely miss replies.

Also please be aware that lam@mpi.nd.edu is a list that goes to a few
hundred people around the world -- it is not uncommon to move a
high-volume exchange off the list, and only post the final resolution
of the problem/bug fix to the list.


--- Problems with running LAM and/or user programs

Check the LAM FAQ and mailing list archive resources mentioned in the
previous section (Problems with building LAM).  If you do not find the
solution to your problem there, send mail to the LAM mailing list:
lam@mpi.nd.edu.

Some typical problems with rsh include the following:

     * Incorrect permissions on a user's home directory
     * Incorrect permissions on $HOME/.rhosts
     * No entry (or incorrect entry) in $HOME/.rhosts

Some typical problems with a user's environment include the following:

     * User's .cshrc (or .profile) does not set the LAMHOME
       environment variable
     * User's .cshrc/.profile does not put $LAMHOME/bin in the path
     * Insufficient permissions on the program that you are trying to
       run


Clearing disk space
-------------------

After LAM has been built, all of the objects can be removed by running
the make(1) utility with the "clean" target in the source directory.

     % make clean 

If you're *really* desperate for more space, a bit more space can be
reclaimed by running:

     % make distclean

If further space is required, the entire source directory can be taken
off-line (indeed, "make distclean" returns the LAM source tree to the
same state as it was when it was unpacked from the original
distribution tarball).  Only the installation directory need be
maintained on-line.


Tuning LAM
----------

There are various constants defined in the LAM header files which
relate to message transfer protocols, shared memory allocation, and so
on.  Some of these are configurable via the configure script; it is
hoped that in time, more and more options will be configurable.

This section is intended to describe some of these constants so that
LAM users can experiment with tuning the MPI library.  It also
provides some description of the transport layer internals which may
help LAM users better understand the behavior and performance they see
from the LAM MPI library.


--- Short/long protocol

LAM MPI uses a short/long message protocol. If a message is "short",
it is sent together with a header in one transfer to the destination
process.  If the message is "long", then a header (possibly with some
data) is sent to the destination.  The sending process then waits for
an acknowledgment from the receiver before sending the rest of the
message data.  The receiving process sends the acknowledgment when a
matching receive is posted.

The crossover point from "short" to "long" message is configurable in
each transport.  See the transport specific section tcp, sysv, or
usysv for further information.


-- Shortcircuit send/receive

Typically, when a message is sent or received, LAM creates a request
structure, fills it with information about the message, links the
request into a list of messages, and calls a progression "engine" to
effect the data transfer.

When there are no active requests and a blocking (standard mode) send
or receive is done, the overhead of creating the request and linking it
into the list can be bypassed (shortcircuited) and the progression
"engine" called directly to effect the transfer.

This optimization has not been tested as thoroughly as we would like,
so a configure option is provided to disable it.  If you suspect that
the optimization may be causing problems, you can disable it with the
--without-shortcircuit option to the configure script.  To date,
however, we have found it to be very reliable.


--- TCP transport

The crossover point from "short" to "long" message is configurable via
the constant TCPSHORTMSGLEN in share/h/rpi.tcp.h (relative to the top
of the LAM build tree).  It can also be set from the configure script
via the --with-tcp-short option.  The default is 64KB.


--- Usysv and sysv transports

Descriptions of the usysv and sysv transports can be found in the "RPI
transport layers" section of the RELEASE_NOTES file.

Configuration constants for the usysv and sysv transports are found in
share/h/rpi.shm.h (from the top of the LAM build directory).

In these transports, processes on different nodes communicate via TCP
sockets.  The crossover point from "short" to "long" messages for
these communications is configurable via the constant TCPSHORTMSGLEN.
It can also be set from the configure script via the --with-tcp-short
option.  The default is 64KB.

Processes located on the same node communicate via shared memory.  The
transport allocates one SYSV shared segment shared by all processes in
the tasks which are on the node.  This segment is logically divided
into two areas.

The "postbox" area contains postboxes for "short" message
communication.  A postbox is used for communication one-way between
two processes.  The space allocated per postbox is SHMSHORTMSGLEN +
CACHELINESIZE.  SHMSHORTMSGLEN is configurable (via the configure
option --with-shm-short).  It is the crossover point from "short"
to "long" messages in shared memory communication; the default value
is 8 KB.

CACHELINESIZE must be the size of a cache line or a multiple thereof.
The default setting is 64 bytes.  You shouldn't need to change it.
CACHELINESIZE bytes in the postbox are used for a cache-line sized
synchronization location.

The size of the postbox area is np * (np-1) * (SHMSHORTMSGLEN +
CACHELINESIZE) bytes.

The rest of the shared memory area is used as a global pool from which
space for long message transfers is allocated.  Allocation from this
pool is locked.  The default lock mechanism is a SYSV semaphore but
the configure option --with-pthread-lock can be used to change this to
a process shared pthread mutex lock.  The size of this pool is
configurable via the constant LAM_MPI_SHMPOOLSIZE, and by the configure
option --with-shm-poolsize.

The configure script will try to determine a size for the pool if none
is explicitly specified.  You should always check this to see if it is
reasonable.  Larger values should improve performance especially when
an application passes large messages, but will also increase the
system resources used by each task.

The total size of the shared segment allocated is 2 * CACHELINESIZE +
LAM_MPI_SHMPOOLSIZE + np * (np-1) * (SHMSHORTMSGLEN + CACHELINESIZE).
The 2 * CACHELINESIZE bytes are for the global pool lock.
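As a sanity check, the segment size can be computed by hand.  A sketch
in shell arithmetic, using the default constants quoted above and an
assumed (purely illustrative) 4 MB pool for a 4-process node:

```shell
np=4                                       # processes on this node
SHMSHORTMSGLEN=$((8 * 1024))               # default short-message crossover
CACHELINESIZE=64                           # default cache line size
LAM_MPI_SHMPOOLSIZE=$((4 * 1024 * 1024))   # assumed pool size, for illustration

postbox=$(( np * (np - 1) * (SHMSHORTMSGLEN + CACHELINESIZE) ))
total=$(( 2 * CACHELINESIZE + LAM_MPI_SHMPOOLSIZE + postbox ))
echo "postbox area: $postbox bytes; total segment: $total bytes"
```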


--- Use of the global pool

When a message larger than 2 * SHMSHORTMSGLEN is sent, the transport
sends SHMSHORTMSGLEN bytes with the first packet.  When the
acknowledgment is received, it allocates (message length -
SHMSHORTMSGLEN) bytes from the global pool to transfer the rest of the
message.

To prevent a single large message transfer from monopolizing the
global pool, allocations from the pool are actually restricted to a
maximum of LAM_MPI_SHMMAXALLOC bytes.  Even with this restriction, it
is possible for the global pool to temporarily become exhausted.  In
this case, the transport will fall back to using the postbox area to
transfer the message.  Performance will be degraded, but the
application will progress.
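The allocation for a single transfer can be sketched the same way: the
remainder after the first packet, capped at LAM_MPI_SHMMAXALLOC (the
1 MB cap below is an assumed value for illustration, not the LAM
default):

```shell
SHMSHORTMSGLEN=$((8 * 1024))           # default short-message crossover
LAM_MPI_SHMMAXALLOC=$((1024 * 1024))   # assumed cap, for illustration
msglen=$((3 * 1024 * 1024))            # a 3 MB message

alloc=$(( msglen - SHMSHORTMSGLEN ))   # bytes left after the first packet
if [ "$alloc" -gt "$LAM_MPI_SHMMAXALLOC" ]; then
    alloc=$LAM_MPI_SHMMAXALLOC         # one grab never exceeds the cap
fi
echo "pool allocation: $alloc bytes"
```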

LAM_MPI_SHMMAXALLOC is configurable via the configure option
--with-shm-maxalloc or editing rpi.shm.h.


--- Synchronization

The usysv and sysv transports differ only in the mechanism used to
synchronize the transfer of messages via shared memory.  The usysv
transport uses spin locks with back-off, while the sysv transport uses
SYSV semaphores.

Both transports use a few SYSV semaphores for synchronizing the
deallocation of shared structures or for synchronizing access to the
shared pool.

The usysv transport should be superior to the sysv transport on
multiprocessors.  On uniprocessors, which is better depends on the OS
and the means used for processor yielding.  On a Linux uniprocessor,
for example, using semaphores (sysv transport) appears to be vastly
superior to spin-locking.


--- Usysv transport spin-locks

The usysv transport uses spin locks with back-off.  When a process
backs off, it attempts to yield the processor.  If the configure
script found a system provided yield function such as yield() or
sched_yield(), this is used. If no such function is found, then
select() on NULL file descriptor sets with a timeout of 10us is used.

The use of select() to yield can be forced by the --with-select-yield
option to the configure script.


--- Sysv transport semaphores

The sysv transport allocates a semaphore set (of size 6) for each
process pair communicating via shared memory.  On some systems, you
may need to reconfigure the system to allow for more semaphore sets if
running tasks with many processes communicating via shared memory.
