-
Introduction
This software (MPJ Express - MPJE) is a reference implementation of the
MPI bindings defined for the Java language. The current version follows the
mpiJava 1.2 API specification.
We plan to add support for the MPJ API in a subsequent release. The two
APIs differ mainly in the naming schemes for classes and methods; the
functionality provided to users is essentially the same for both.
This release contains
the source code and binaries of the MPJ Express library, as well as the
runtime infrastructure. We have developed a test suite that imports
various test cases from mpiJava and adds a number of new ones. This test
suite checks the functionality of almost every MPI function. See
the section "MPJ Express test suite" for further details.
This software
has been tested on various UNIX and Windows operating systems. See the
section "Tested Platforms" for a list of tested platforms.
There are two ways of running MPJE applications.
The first, and recommended, way is to use the MPJ Express
runtime infrastructure; the second involves starting MPJE
processes 'manually'.
The MPJ Express runtime infrastructure consists of daemons and
the mpjrun module. Users first start daemons on a number of
compute-nodes, which in this document means the machines that execute
MPJE processes (typically the compute-nodes of a cluster).
Once the daemons are running, users run the mpjrun module
(using the mpjrun.sh or mpjrun.bat scripts) on the cluster's head-node.
mpjrun contacts the daemons, which start the MPJE application, and
transports output back to the head-node so that users can follow the
progress of their programs during execution. The runtime infrastructure
can run code packaged as JAR files or as class files, and it provides
local loaders and remote loaders. Local loaders suit the case where
the compute-nodes and head-node share a file system, so that the MPJ Express
JAR files and the user application are available locally on the
compute-nodes. Remote loaders suit the case where there is no shared
file system, and the MPJ Express JAR files and the user application
have to be fetched from the head-node.
The second way, referred to in this document as 'manual', is to
run the shell script 'runmpj.sh', which uses SSH to start the
processes. This script can run JAR or class files, but it only
works on UNIX-based operating systems.
On Windows, running test cases and applications manually means
starting each MPJE process with the java command.
The MPJ Express infrastructure does not deal with security in the current
release. The MPJ Express daemons could be a security concern, as they are
Java applications listening on a port to execute user code. It is therefore
recommended that the daemons run behind a suitably configured firewall that
only accepts connections from trusted machines.
In a typical scenario, these daemons run on the compute-nodes of a cluster,
which are not accessible from the outside world. Alternatively, MPJE
processes can be started 'manually', which avoids running daemons at all.
Note, however, that each MPJE process opens at least one server socket, and
is therefore also assumed to run on a machine protected by a firewall. Most
MPI implementations rely on firewalls for protection from the outside
world.
-
Getting started
-
The prerequisite for using MPJ Express is Java 1.5 (stable) or higher.
Make sure that you use a stable release, because a bug in the
Java 1.5 beta affects MPJ Express. If you are interested in compiling the
source code of MPJ Express, see the section "Compiling
MPJ Express source code and test suite".
-
Download MPJ Express and unpack it. This should create a folder named
"mpj-v<version_number>".
- Set the MPJ_HOME environment variable to this folder, and add its bin directory ($MPJ_HOME/bin) to the PATH environment variable.
-
Create a new working directory for MPJE programs. This document
assumes that this directory is named mpj-user; its name and location
do not matter for executing the code. This directory will hold the
user's MPJE programs, the machines file, and the configuration file
(for manual execution).
-
Start the daemons
-
cd mpj-user
- Write a machines file that lists one machine name or IP address
per line. Save this file as 'machines' in the mpj-user directory. The
format of the machines file is described in more detail in the section
"Running MPJE programs with MPJ Express runtime".
- Installing and starting daemons
- Linux:
mpjboot machines
- Windows: on each machine listed in the machines file:
-
Run $MPJ_HOME/bin/installmpjd-windows.bat
-
Go to
Control-Panel->Administrative Tools->Services->MPJ Daemon
and start the service. It is important to
start the daemon as a user process instead of a SYSTEM process;
the section "Installing, starting, stopping, and uninstalling MPJ Express
daemons" explains how to do this.
- To check whether the daemons have started on the compute-nodes:
- For Linux only: each daemon produces a
MPJ-Daemon<machine_name>.pid
file in the $MPJ_HOME/bin directory.
- Each daemon produces a log file named
daemon-<machine_name>.log
in the $MPJ_HOME/logs directory.
-
Running test cases
-
cd mpj-user
- Linux:
mpjrun.sh -np 2 -jar $MPJ_HOME/lib/test.jar
- Windows:
mpjrun.bat -np 2 -jar %MPJ_HOME%/lib/test.jar
- Sample output of the test cases is available at the URL given in the section "MPJ Express test suite".
- Running your first MPJE application
- Write an MPJE program and save
it as World.java. This document assumes that you
have a 'machines' file in the mpj-user directory.
-
cd mpj-user
- Compile
-
Linux:
javac -cp .:$MPJ_HOME/lib/mpj.jar World.java
-
Windows:
javac -cp .;%MPJ_HOME%/lib/mpj.jar World.java
- Execute
-
Linux:
mpjrun.sh -np 2 World
-
Windows:
mpjrun.bat -np 2 World
-
You may also make a JAR file 'hello.jar' that contains
World.class (see the section "Writing and
compiling MPJE programs" for details) and execute it:
- Linux:
mpjrun.sh -np 2 -jar hello.jar
- Windows:
mpjrun.bat -np 2 -jar hello.jar
-
Writing and compiling MPJE programs
-
Running MPJE programs with MPJ Express runtime
One of the challenging aspects of a Java messaging system is creating a
portable mechanism for bootstrapping MPJE
processes across various platforms.
If the compute-nodes are running a UNIX-based OS, commands can be executed
remotely using RSH/SSH, but if the compute-nodes are running
Windows, these utilities are normally not available.
The MPJ Express runtime provides a
unified way of starting MPJE processes on
compute-nodes irrespective of their operating system.
The runtime
system consists of two modules. The 'daemon' module runs on compute-nodes
and listens for requests to start MPJE processes. The daemon is simply
a Java application listening on an IP port, which starts a new JVM every time
it receives a request to run an MPJE process. The 'mpjrun' module
acts as a client to the daemon module. This module is
started on, for example, the cluster head-node;
it contacts the daemons and returns
standard output for the user to view.
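To make the daemon/mpjrun split concrete, here is a toy daemon in plain Java. It is a hypothetical sketch, not the MPJ Express daemon: the class name ToyDaemon, the one-line request format ("<mainClass> [args...]"), and the fixed classpath are invented for illustration, and, like the real daemons (see the security note in the Introduction), it has no security of its own and must never be exposed to untrusted networks. It only shows the two ideas described above: listening on a port, and starting a new JVM for each request while relaying that JVM's output back to the client.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy daemon: NOT the MPJ Express daemon, and with no security at all. */
final class ToyDaemon {
    private ToyDaemon() {}

    /** Builds the command line for a new JVM that runs mainClass on the given classpath. */
    static List<String> buildCommand(String classpath, String mainClass, List<String> args) {
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<String>();
        cmd.add(java);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.addAll(args);
        return cmd;
    }

    /** Serves requests forever: one line per connection, "<mainClass> [args...]". */
    static void serve(int port, String classpath) throws IOException {
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                try (Socket client = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                    String request = in.readLine();
                    if (request == null || request.trim().isEmpty()) {
                        continue;                               // empty request: drop the connection
                    }
                    List<String> words = Arrays.asList(request.trim().split("\\s+"));
                    ProcessBuilder pb = new ProcessBuilder(
                            buildCommand(classpath, words.get(0), words.subList(1, words.size())));
                    pb.redirectErrorStream(true);               // merge stderr into stdout
                    Process child = pb.start();                 // a new JVM for every request
                    copy(child.getInputStream(), client.getOutputStream()); // relay its output
                }
            }
        }
    }

    private static void copy(InputStream from, OutputStream to) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = from.read(buf)) != -1) {
            to.write(buf, 0, n);
        }
        to.flush();
    }
}
```

A client in the role of mpjrun would connect, send a line such as "World 0 mpj.conf niodev", and read the output until the connection closes.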
With Java, it is possible to run applications from class files, or from
class files bundled as a JAR file. The MPJ Express runtime allows
MPJE applications to be executed both as JAR files and as class files.
Users may also want to load MPJE JARs and classes either remotely
or locally on the compute-nodes. With the remote loader,
all classes (application and MPJ Express code) are loaded from the head-node.
This is useful when there is no shared file system and the code is
constantly being modified on the head-node. With the local loader,
all classes (application and MPJ Express code) are loaded from the
compute-node itself. This is useful if there is a shared file system;
as all classes are loaded locally, it may also perform better than the
remote loader. The default loader in the MPJ Express runtime
infrastructure is the remote loader.
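The difference between the two loaders comes down to where class bytes are read from. The sketch below uses the standard java.net.URLClassLoader to illustrate the idea; it is an assumption for illustration, not MPJ Express's loader code. The class name LoaderChoice is hypothetical, and so are the head-node host name and port used with the remote variant.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical illustration of local vs remote class loading; not MPJ Express's loader classes. */
final class LoaderChoice {
    private LoaderChoice() {}

    /** Local loader: classes come from a JAR or directory on the compute-node's own file system. */
    static URLClassLoader local(File jarOrDirectory) {
        return new URLClassLoader(new URL[] { toUrl(jarOrDirectory.toURI()) });
    }

    /** Remote loader: classes are fetched over HTTP from a server on the head-node (hypothetical URL). */
    static URLClassLoader remote(String headNode, int port) {
        // The trailing "/" makes URLClassLoader treat the URL as a directory of .class files.
        return new URLClassLoader(new URL[] { toUrl(URI.create("http://" + headNode + ":" + port + "/")) });
    }

    private static URL toUrl(URI uri) {
        try {
            return uri.toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(uri.toString(), e);
        }
    }
}
```

The main class would then be loaded with Class.forName(mainClass, true, loader). Neither factory method contacts the network; a URLClassLoader fetches classes lazily the first time they are needed, which is also why a remote loader can pick up code that was just modified on the head-node.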
The mpjrun module provides the -jar switch to execute JAR files; no switch
is required to execute class files. Users can select local loading
with the -localloader switch. The -wdir switch can be used to run the code
in the appropriate directory on the remote node.
When running JAR files with -localloader,
users should put the JAR on the CLASSPATH using the -cp switch.
MPJ Express uses the
Java Service Wrapper Project software to install daemons as a native
OS service. This means that some platform-specific native code is
needed. Currently, MPJ Express distributes only Linux and Windows
specific native code, but if you are
interested in running MPJ Express daemons on other platforms such as
AIX, FreeBSD, HP-UX, HP-UX64, IRIX, MacOS, etc., then
you can download the platform-specific code from the
Java Service Wrapper Project. Some PATH variables in the
scripts for these platforms will have to be changed.
Feel free to contact us (see "Contact and support") if you need
any help with this. The rest of this section explains how to install,
start, stop, and uninstall MPJ Express daemons on Linux and Windows. It
also shows how to run your MPJE programs using the mpjrun module
on these platforms.
-
cd mpj-user
- This document assumes mpj-user as the working directory
for a user; the name mpj-user itself has no significance.
We assume that the user will create a machines file in this
directory, and that the user's MPJE
program (World.class or hello.jar) is present in this
directory when the mpjrun script is invoked.
-
Write a machines file. This file is used by scripts like
mpjboot, mpjhalt, mpjrun.bat and mpjrun.sh to find out which machines to
contact. The 'machines' file format is explained in this subsection.
-
The 'machines' file simply lists the machine names, IP addresses,
or aliases of the nodes where you wish to execute MPJE processes,
one per line. This file is also used by mpjboot and mpjhalt to start
and stop daemons on Linux machines. Suppose you want to run one process
on each of 'machine1' and 'machine2'; your machines file would then be
as follows:
machine1
machine2
Note that in the real-world, 'machine1' and 'machine2' would be
fully-qualified names, IP addresses, or aliases of your machines.
-
If you are executing mpjrun in a directory called mpj-user,
a command like
mpjrun.sh -np 2 World
assumes that a 'machines' file is present in this directory. If your
list of machines is in a file with another name (say, 'mymachines.txt')
or in another directory, use the -machinesfile switch to point mpjrun
to it. To point mpjrun to mymachines.txt, the exact command would be
mpjrun.sh -machinesfile mymachines.txt -np 2 World
This also applies to mpjrun.bat.
-
Multiple processes may be run on a machine. mpjrun first checks
how many processes the user has requested. If, say, the user has
requested two processes, it tries to read the first two
entries in the machines file. If the machines file has only one
entry, mpjrun starts both processes on that machine. Thus, it is not
necessary to list a machine name twice in the 'machines' file, although
repeating machine names also works.
If you want to run two processes on localhost, the
'machines' file would look like this:
localhost
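The machines file format and the placement rule described above can be sketched in a few lines of Java. This is a hypothetical helper, not mpjrun's code: skipping blank lines is an assumption, and so is wrapping around the list in round-robin order when more processes are requested than there are entries. The document only states the two cases the sketch reproduces exactly: with enough entries the first np are used, and with a single entry every process runs on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: read a 'machines' file and assign a host to each rank. */
final class Machines {
    private Machines() {}

    /** One machine name, IP address, or alias per line; blank lines are skipped (assumption). */
    static List<String> parse(List<String> lines) {
        List<String> hosts = new ArrayList<String>();
        for (String line : lines) {
            String host = line.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    static List<String> read(Path machinesFile) throws IOException {
        return parse(Files.readAllLines(machinesFile, StandardCharsets.UTF_8));
    }

    /** Host for each rank 0..np-1: the first np entries, wrapping around if there are fewer (assumption). */
    static List<String> place(List<String> hosts, int np) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("machines file is empty");
        if (np < 1) throw new IllegalArgumentException("np must be at least 1");
        List<String> placement = new ArrayList<String>(np);
        for (int rank = 0; rank < np; rank++) {
            placement.add(hosts.get(rank % hosts.size()));
        }
        return placement;
    }
}
```

For example, Machines.place(Machines.read(Paths.get("machines")), 2) on the single-line localhost file above yields [localhost, localhost].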
-
Installing, starting, stopping, and uninstalling MPJ Express daemons
- On Linux:
-
Starting daemons on a set of compute-nodes
-
mpjboot machines
This should work if $MPJ_HOME/bin has been added
to the $PATH variable. This script will SSH into each of the machines
listed in the machines file, change directory to $MPJ_HOME/bin,
and execute the mpjdaemon start
command to start
the daemon.
-
You will be asked for a password on remote machines if ssh-agent
has not been configured to allow login without asking for a
password/passphrase. Password-less SSH access to the compute-nodes
is convenient, but the script also works without it.
-
The MPJ_HOME variable must also be set on remote nodes. This
may be achieved by putting export statements in the ~/.bashrc
file on each remote node.
-
Making sure that the daemon is running
-
Linux Only: Each daemon produces a
MPJ-Daemon<machine_name>.pid file in
$MPJ_HOME/bin directory.
-
Each daemon produces a log file named
daemon-<machine_name>.log
in $MPJ_HOME/logs directory.
-
You may optionally run the daemon as a service
- (Only as root)
Copy the mpjdaemon script to the /etc/init.d directory and add it to the
default runlevel. On Gentoo GNU/Linux, this is done with:
-
rc-update add mpjdaemon default
-
It is also possible to run the daemon as a non-root user.
-
Shutting down MPJ Express daemons on a set of compute-nodes
mpjhalt machines
-
On Windows:
-
Installing MPJ Express daemons
-
Click/run %MPJ_HOME%/bin/installmpjd-windows.bat
-
Starting daemons
-
Go to
Control Panel->Administrative Tools->Services->MPJ Daemon
and start the service. It is important to start the daemon as
a user process (preferably the currently logged-in user)
instead of a SYSTEM process. To do this, go to Control Panel->Administrative
Tools->Services, right-click the MPJ Daemon service, click
Properties, and open the "Log On" tab. For the option "Log on as:",
select "This account", enter the user name and password of
this account, and start the service.
-
Making sure that the daemon is running
-
Linux Only:
Each daemon produces a
MPJ-Daemon<machine_name>.pid
file
in $MPJ_HOME/bin directory.
-
Each daemon produces a log file named
daemon-<machine_name>.log
in $MPJ_HOME/logs directory.
-
Stopping daemons
-
Go to Control Panel->Administrative Tools->Services->MPJ Daemon
and stop the service.
-
Uninstalling daemons
-
Click/run %MPJ_HOME%/bin/uninstallmpjd-windows.bat to
uninstall the daemon. This will have to be repeated
manually for each machine running the daemon.
- Configuring MPJ Express daemons using the configuration file
(Optional)
- The configuration file $MPJ_HOME/conf/wrapper.conf
can be used to configure MPJ Express daemons. Note that
options specified in this file only affect the MPJ Express
daemons, not user applications: the daemons and user applications
run in different JVMs. To provide options to user processes,
specify JVM arguments or application arguments to the
mpjrun.sh or mpjrun.bat script.
wrapper.conf is a Java Service Wrapper configuration file, so the
complete list of options it accepts is given in the Java Service
Wrapper documentation.
-
Running your MPJE program.
-
Running class files
-
Linux:
mpjrun.sh -np 2 World
-
Windows:
mpjrun.bat -np 2 World
-
Running JAR files
-
Linux:
mpjrun.sh -np 2 -jar hello.jar
-
Windows:
mpjrun.bat -np 2 -jar hello.jar
-
Passing arguments to the JVM running MPJE program
-
The mpjrun.bat and mpjrun.sh scripts accept all JVM arguments and
forward them to the JVMs running MPJE processes on
compute-nodes. For instance, to pass -Xms512M and
-Djava.library.path=/tmp as JVM arguments when running the World
program, the exact command would be:
- For Linux:
mpjrun.sh -np 2 -Xms512M -Djava.library.path=/tmp World
- For Windows:
mpjrun.bat -np 2 -Xms512M -Djava.library.path=c:/tmp World
-
Passing arguments to MPJE application.
-
Any arguments after "-jar <jarname>" or "classname" are treated as
application arguments by the mpjrun.sh and mpjrun.bat scripts.
MPI.Init(String[] args)
returns a String array that contains the user-specified
arguments. If the user has specified two arguments,
apparg1 and apparg2, then MPI.Init(..) returns an
array of length 2, with apparg1 at index 0 and apparg2 at index 1.
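The way an mpjrun command line divides into mpjrun options, JVM arguments, and application arguments can be sketched as follows. This is a hypothetical class, not the mpjrun.sh or mpjrun.bat implementation: the switches it recognises (-np, -machinesfile, -psl, -wdir, -cp, -jar, -localloader) are the ones named in this document, while treating every other word that starts with '-' before the class name as a JVM argument is our assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical splitter for an mpjrun-style command line. */
final class MpjrunArgs {
    // Switches named in this README that consume the following word.
    private static final Set<String> VALUE_SWITCHES = new HashSet<String>(
            Arrays.asList("-np", "-machinesfile", "-psl", "-wdir", "-cp"));
    // Switches named in this README that take no value.
    private static final Set<String> FLAG_SWITCHES = Collections.singleton("-localloader");

    final List<String> mpjrunOptions = new ArrayList<String>();
    final List<String> jvmArgs = new ArrayList<String>();
    final List<String> appArgs = new ArrayList<String>();
    String target;   // class name, or JAR file name when -jar is used
    boolean isJar;

    private MpjrunArgs() {}

    static MpjrunArgs parse(String... argv) {
        MpjrunArgs r = new MpjrunArgs();
        int i = 0;
        while (i < argv.length && r.target == null) {
            String a = argv[i];
            if (a.equals("-jar")) {
                if (i + 1 >= argv.length) throw new IllegalArgumentException("-jar needs a file name");
                r.isJar = true;
                r.target = argv[i + 1];
                i += 2;
            } else if (VALUE_SWITCHES.contains(a)) {
                if (i + 1 >= argv.length) throw new IllegalArgumentException(a + " needs a value");
                r.mpjrunOptions.add(a);
                r.mpjrunOptions.add(argv[i + 1]);
                i += 2;
            } else if (FLAG_SWITCHES.contains(a)) {
                r.mpjrunOptions.add(a);
                i++;
            } else if (a.startsWith("-")) {
                r.jvmArgs.add(a);          // e.g. -Xms512M or -Djava.library.path=/tmp (assumption)
                i++;
            } else {
                r.target = a;              // the class name
                i++;
            }
        }
        if (r.target == null) throw new IllegalArgumentException("no class name or -jar given");
        // Everything after the class name or JAR is handed to the application (MPI.Init's result).
        r.appArgs.addAll(Arrays.asList(argv).subList(i, argv.length));
        return r;
    }
}
```

With the command from the previous subsection plus two application arguments, parse("-np", "2", "-Xms512M", "-Djava.library.path=/tmp", "World", "apparg1", "apparg2") puts the two -X/-D words in jvmArgs and [apparg1, apparg2] in appArgs.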
-
Running MPJE programs without MPJ Express runtime (manually)
We do not recommend starting programs manually as a normal procedure.
This section documents the manual start-up procedure mainly to
give developers the flexibility to create their own start-up
mechanisms for MPJE programs. The runmpj.sh script is one example
of such a mechanism.
-
cd mpj-user
- This document assumes mpj-user as the working directory
for users; the name mpj-user itself has no significance.
We assume that users will create the configuration file in this
directory, and that the user's MPJE
program (World.class or hello.jar) is present in this
directory when the MPJE processes are started.
-
Write a configuration file called 'mpj.conf' as follows.
- A typical configuration file used to start two
MPJE processes is shown below. Replace the names 'machine1' and
'machine2' with the aliases, fully-qualified names, or
IP addresses of the machines where you want to start
MPJE processes.
# Number of processes
2
# Protocol switch limit
131072
# Entry in the form of machinename@port@rank
machine1@20000@0
machine2@20000@1
-
Lines starting with '#' are comments. The first entry, a
number ('2' above), is the total number of processes. The second
entry, again a number ('131072' above), is the protocol
switch limit in bytes: at this message size, MPJ Express switches its
communication protocol from eager-send to rendezvous. These are followed
by one entry per MPJE process, each in the form
machinename(OR)IP@PORT_NUMBER@RANK. With these entries, users of
MPJ Express control where each MPJE process runs, which server port it
uses, and what rank it has. The rank specified here
must exactly match the rank argument given when manually
starting MPJE processes (with the java command). When code is run
with mpjrun, this file is generated automatically.
- Sample configuration files can be found in $MPJ_HOME/conf directory.
If you wish to start MPJ processes on localhost, see
$MPJ_HOME/conf/local2.conf file.
-
Each MPJE process uses two ports, so do not assign consecutive port
numbers to MPJE processes that run on the same node. A sample
file for running two MPJE processes on the same machine is:
# Number of processes
2
# Protocol switch limit
131072
# Entry in the form of machinename@port@rank
localhost@20000@0
localhost@20002@1
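A minimal reader for this file format can serve as a sanity check before starting processes by hand. The sketch below is hypothetical (class and field names are ours, and MPJ Express's own parser may be stricter or more lenient): it skips '#' comment lines and, as an assumption, blank lines; reads the process count and the protocol switch limit; parses one machinename@port@rank entry per process; checks that the ranks are exactly 0..n-1; and rejects ports closer than 2 apart on the same host. That last check assumes a process's second port is the one right after its configured port, which matches the advice above and the 20000/20002 example, but is not stated explicitly. Hosts are compared as plain strings, so "localhost" and "127.0.0.1" are not recognised as the same machine.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the mpj.conf format described above. */
final class MpjConf {
    /** One "machinename@port@rank" entry. */
    static final class Entry {
        final String host;
        final int port;
        final int rank;

        Entry(String host, int port, int rank) {
            this.host = host;
            this.port = port;
            this.rank = rank;
        }
    }

    final int nprocs;
    final int switchLimit;          // protocol switch limit in bytes (131072 in the samples)
    final List<Entry> entries;

    private MpjConf(int nprocs, int switchLimit, List<Entry> entries) {
        this.nprocs = nprocs;
        this.switchLimit = switchLimit;
        this.entries = entries;
    }

    static MpjConf parse(List<String> lines) {
        List<String> data = new ArrayList<String>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) {
                data.add(t);                        // keep only non-comment, non-blank lines
            }
        }
        if (data.size() < 2) {
            throw new IllegalArgumentException("missing process count or protocol switch limit");
        }
        int n = Integer.parseInt(data.get(0));
        int switchLimit = Integer.parseInt(data.get(1));
        if (data.size() - 2 != n) {
            throw new IllegalArgumentException("expected " + n + " entries, found " + (data.size() - 2));
        }
        List<Entry> entries = new ArrayList<Entry>();
        boolean[] rankSeen = new boolean[n];
        for (String line : data.subList(2, data.size())) {
            String[] parts = line.split("@");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected machinename@port@rank: " + line);
            }
            int rank = Integer.parseInt(parts[2]);
            if (rank < 0 || rank >= n || rankSeen[rank]) {
                throw new IllegalArgumentException("rank out of range or repeated: " + line);
            }
            rankSeen[rank] = true;
            entries.add(new Entry(parts[0], Integer.parseInt(parts[1]), rank));
        }
        // Assumed: each process uses port and port + 1, so ports on one host must differ by at least 2.
        for (int i = 0; i < entries.size(); i++) {
            for (int j = i + 1; j < entries.size(); j++) {
                Entry a = entries.get(i);
                Entry b = entries.get(j);
                if (a.host.equals(b.host) && Math.abs(a.port - b.port) < 2) {
                    throw new IllegalArgumentException(
                            "ports too close on " + a.host + ": " + a.port + ", " + b.port);
                }
            }
        }
        return new MpjConf(n, switchLimit, entries);
    }
}
```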
-
Running your MPJE program.
-
The runmpj.sh script requires password-less SSH access to the machines
listed in the configuration file, and will not work unless
SSH is set up so that no password or passphrase is required at login.
It is the only script in this software that
requires password-less access. The alternative to runmpj.sh
is manual start-up (using the java command directly; see the directions
below).
-
Running class files
-
Linux:
runmpj.sh mpj.conf World
-
Alternatively,
the Windows directions below also work on Linux.
- Windows and Linux:
-
For each machine listed in mpj.conf, log in to that Windows
or Linux
machine, change directory to the mpj-user directory (which holds
World.class and mpj.conf), and type:
-
Linux:
java -cp .:$MPJ_HOME/lib/mpj.jar World <rank> mpj.conf niodev
-
Windows:
java -cp .;%MPJ_HOME%/lib/mpj.jar World <rank> mpj.conf niodev
- The <rank> argument should be 0 for process 0 and
1 for process 1. It must match the rank given for
that machine in the configuration file (mpj.conf); check the
entry format of the configuration file to be sure
of the rank.
-
Running JAR files
-
Linux:
runmpj.sh mpj.conf hello.jar
-
Alternatively, the Windows directions below also work on Linux.
-
Windows and Linux:
-
For each machine listed in mpj.conf, log in to that Windows
or Linux
machine and type:
-
Linux:
java -jar hello.jar <rank> mpj.conf niodev
-
Windows:
java -jar hello.jar <rank> mpj.conf niodev
- The <rank> argument should be 0 for process 0 and
1 for process 1. It must match the rank given for
that machine in the configuration file (mpj.conf); check the
entry format of the configuration file to be sure
of the rank.
-
Passing arguments to the JVM running MPJE program
-
Edit the $MPJ_HOME/bin/runmpj.sh shell script to pass arguments
to the JVM.
-
Passing arguments to MPJE application.
-
Edit the $MPJ_HOME/bin/runmpj.sh shell script to pass arguments
to the application. MPI.Init(String[] args)
returns a String array that contains the user-specified
arguments. If the user has specified two arguments,
apparg1 and apparg2, then MPI.Init(..) returns an
array of length 2, with apparg1 at index 0 and apparg2 at index 1.
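The manual procedure in this section is what a start-up mechanism such as runmpj.sh automates: for every entry in mpj.conf, run "java ... <class> <rank> mpj.conf niodev" on the named machine. The sketch below only builds those command lines. Running them through ssh, changing into a working directory first, and the class name ManualLauncher are hypothetical choices, not a description of how runmpj.sh is written. The remote command is not shell-quoted, so paths containing spaces would need extra care.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a start-up mechanism: one ssh command per mpj.conf entry. */
final class ManualLauncher {
    private ManualLauncher() {}

    /**
     * @param entries   "machinename@port@rank" lines from mpj.conf (comments already removed)
     * @param workDir   remote directory holding the program and mpj.conf, e.g. "mpj-user"
     * @param classpath e.g. ".:/path/to/mpj/lib/mpj.jar"
     * @param mainClass e.g. "World"
     * @return argument lists suitable for new ProcessBuilder(list)
     */
    static List<List<String>> commands(List<String> entries, String workDir,
                                       String classpath, String mainClass) {
        List<List<String>> result = new ArrayList<List<String>>();
        for (String entry : entries) {
            String[] parts = entry.trim().split("@");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected machinename@port@rank: " + entry);
            }
            String host = parts[0];
            String rank = parts[2];
            // Same arguments as the documented manual command: <class> <rank> mpj.conf niodev
            String remote = "cd " + workDir + " && java -cp " + classpath + " "
                    + mainClass + " " + rank + " mpj.conf niodev";
            // Assumption: the remote command is run over password-less ssh, as runmpj.sh requires.
            result.add(Arrays.asList("ssh", host, remote));
        }
        return result;
    }
}
```

Each returned list could then be started with new ProcessBuilder(cmd).inheritIO().start(), one process per rank.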
-
MPJ Express test suite
MPJ Express contains a comprehensive test suite that tests the
functionality of almost
every MPI function. This test suite consists mainly of mpiJava test cases,
MPJ JGF benchmarks, and MPJ microbenchmarks.
The mpiJava test cases were originally developed by IBM and later translated to
Java. As this software follows the mpiJava API,
these test cases can be used with little modification.
The MPJ JGF benchmarks are developed and maintained by
EPCC at the University of Edinburgh.
MPJ Express redistributes these benchmarks as part of its test suite.
The original copyrights and license remain intact, as can be seen
in the source files of these benchmarks in $MPJ_HOME/test/jgf_mpj_benchmarks.
MPJ Express also redistributes micro-benchmarks developed by
Guillermo Taboada.
The suite is located in the $MPJ_HOME/test directory. The test cases have
been changed from their original versions in order to automate testing.
TestSuite.java is the main class; it calls each of the test cases in
this directory. The build.xml file in the test directory compiles
all test cases and places test.jar in the lib directory.
By default, the JGF MPJ benchmarks and MPJ micro-benchmarks are disabled.
To run them, edit $MPJ_HOME/test/TestSuite.java and uncomment these tests.
Note that after changing TestSuite.java, you have
to recompile the test suite by executing 'ant' in the test
directory.
-
cd mpj-user
- This document assumes that mpj-user is the working
directory for MPJ Express users; the name mpj-user itself
has no significance. For this section, mpj-user should
contain 'mpj.conf', the configuration file required if you
are running the code without the runtime (manually). If you are using the
runtime, this directory should instead contain the 'machines' file, which
lists the machines where MPJ Express daemons are running.
-
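Running test cases with MPJ Express runtime
- Linux:
mpjrun.sh -np 2 -jar $MPJ_HOME/lib/test.jar
- Windows:
mpjrun.bat -np 2 -jar %MPJ_HOME%/lib/test.jar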
- Running test cases without the runtime (manually)
- Write a configuration file called 'mpj.conf'. Further details about
the configuration file and its format can be found in
the section "Running MPJE programs without MPJ Express
runtime (manually)".
-
Start the tests
- Linux:
runmpj.sh mpj.conf $MPJ_HOME/lib/test.jar
-
'runmpj.sh' requires password-less SSH access to the machines in the
configuration file.
-
Alternatively, the Windows directions below also work on Linux.
- Windows and Linux:
-
For each machine listed in mpj.conf, log in to that Windows
or Linux
machine and type:
-
Linux:
java -jar $MPJ_HOME/lib/test.jar <rank> mpj.conf niodev
-
Windows:
java -jar %MPJ_HOME%/lib/test.jar <rank> mpj.conf niodev
- The <rank> argument should be 0 for process 0 and
1 for process 1. It must match the rank given for
that machine in the configuration file (mpj.conf); check the
entry format of the configuration file to be sure
of the rank.
-
You may view the sample output of the test cases at
http://dsg.port.ac.uk/projects/mpj/docs/res/t-<VERSION>.txt
For version 0.26, this translates to:
http://dsg.port.ac.uk/projects/mpj/docs/res/t-0.26.txt
-
Compiling MPJ Express source code and test suite
- Prerequisites (for compiling and running the code)
- Java 1.5 (stable) or higher
- Verify this by executing:
- 'java -version' (should report stable 1.5
or higher)
- MPJ Express has been developed and tested using
Java 1.5. With a newer JDK, the code can still be compiled for an
older release using javac's '-source <release>' and '-target <release>' options.
- 'javac' (should display usage information)
- Apache Ant 1.6.2 or higher
- Verify this by executing 'ant -version', which should
display the Ant version.
- Perl (Optional)
-
Some of the Java code in MPJ Express is generated from Perl templates.
The build file regenerates these Java files from the Perl templates if
it detects Perl on the machine, so installing Perl is a good idea if you
want to do development with MPJ Express.
- Perl for Windows has to be downloaded and installed separately.
- build.xml points to the Perl executable. You may
need to edit the perl.executable property in build.xml to point to your
Perl executable.
- Compiling MPJ Express source code
- From the $MPJ_HOME directory, execute
ant
- This produces mpj.jar, daemon.jar, and starter.jar in the
lib directory.
- Compiling MPJ Express test-code
cd test
ant
- This produces test.jar in lib directory.
-
Tested Platforms
-
Gentoo GNU/Linux (kernel 2.6.10)
-
Debian GNU/Linux 'Sarge' (kernel 2.4.30)
-
SuSE 9.0 GNU/Linux (kernel 2.4.21)
-
Red Hat Fedora Core 4 GNU/Linux (kernel 2.6.12)
-
Red Hat Linux 7.3 GNU/Linux (kernel 2.4.19-openmosix)
-
Windows XP (Service Pack 2)
-
Windows XP with cygwin (Service Pack 2)
-
Java-docs
The Javadocs are in $MPJ_HOME/doc/javadocs.
-
Contact and support
-
To support users of this software, we have set up a
mailing list. Users
are encouraged to subscribe and share their experiences of using
MPJ Express.
In case of any problems, please make sure that you have read the
documentation (including this README). If your
question(s) remain unanswered, feel free to post to the list.
-
Alternatively, the users can contact us directly by email.
-
Miscellaneous
-
Turning debugging on and off
-
MPJ Express uses log4j for logging.
By default, the logging is turned
off. Logging is controlled by editing certain Java source files and
recompiling. The two ways below turn logging off; reversing the change
turns it back on.
-
Less efficient way:
-
Edit $MPJ_HOME/src/mpi/MPI.java and uncomment this line
to turn logging off:
//rep.setThreshold((Level) Level.OFF ) ;
Calling the
setThreshold(..) method of the
LoggerRepository associated with the root logger sets the threshold level
for logging. If this level is Level.OFF,
all logging is dropped by the root logger. By default,
the level is Level.ALL.
-
Recompile the code
-
More efficient way:
-
Edit src/mpi/MPI.java and change the value of the
static boolean DEBUG flag to false.
-
Recompile the code.
-
This approach is preferred for benchmarking MPJ Express.
-
The runtime infrastructure also uses log4j. By default, logging
for the runtime is turned on. To turn it off, edit
src/runtime/starter/MPJRun.java and src/runtime/daemon/MPJDaemon.java
and change the value of the DEBUG flag.
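Why the DEBUG flag is the cheaper of the two options can be seen in a small sketch. Here java.util.logging stands in for log4j, and the class DebugGuard is hypothetical, not MPI.java. With only a threshold such as Level.OFF, the logger discards each message, but the logging call still runs and the message string is still built. A guard of the form "if (DEBUG && ...)" skips all of that work; and if the flag is a static final compile-time constant, javac can omit the guarded code entirely. Whether MPI.DEBUG is declared final is not stated in this document.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration (java.util.logging stands in for log4j; this is not MPI.java). */
final class DebugGuard {
    static final boolean DEBUG = false;  // compile-time constant, so javac can drop guarded code
    private static final Logger LOG = Logger.getLogger(DebugGuard.class.getName());
    static int messagesBuilt = 0;        // how many log messages were constructed

    static {
        LOG.setLevel(Level.OFF);         // the "threshold" approach: the logger discards everything
    }

    private DebugGuard() {}

    private static String describe(int rank) {
        messagesBuilt++;
        return "process " + rank + " entered the selector loop";
    }

    /** Threshold only: the message is still built, then thrown away by the logger. */
    static void logWithThreshold(int rank) {
        LOG.fine(describe(rank));
    }

    /** Flag guard: with DEBUG false, neither the message nor the logger call is evaluated. */
    static void logWithGuard(int rank) {
        if (DEBUG && LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(rank));
        }
    }
}
```

In a tight communication loop, the per-call cost of building discarded strings is exactly what the threshold approach cannot avoid, which is consistent with this document preferring the DEBUG flag for benchmarking.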
- MPJ Express runtime daemons can be debugged as follows:
- Edit $MPJ_HOME/conf/wrapper.conf file.
- Change the value of wrapper.logfile.loglevel from "NONE" to
"DEBUG".
- The output of mpjboot, mpjhalt, and other daemon activities
can now be seen in the $MPJ_HOME/logs/wrapper.log file. This information
is useful for diagnosing and fixing daemon errors.
- Changing protocol switch limit
-
MPJ Express uses two communication protocols:
'eager-send', which
is used for transferring small messages, and the
rendezvous protocol, which is used for transferring large messages. The
default protocol switch limit is 128 KBytes (131072 bytes). It can be
changed before execution in the following ways, depending on whether you
are running processes manually or using the runtime.
-
Running MPJE applications manually (without the
runtime): edit the
configuration file (for example, $MPJ_HOME/conf/mpj2.conf) to change the
protocol switch limit. See the comments in this configuration
file: the second entry, which is 131072 unless you have
changed it, is the protocol switch limit in bytes.
-
Running MPJE applications with the runtime: use the -psl <val> switch to change
the protocol switch limit.
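The effect of the switch limit can be sketched as a simple size comparison. This document says the protocol changes "at this message size", so the sketch sends messages whose size is at least the limit with rendezvous; whether the boundary value itself uses eager-send or rendezvous in MPJ Express is an assumption, and the class name is hypothetical.

```java
/** Hypothetical sketch of choosing a protocol from the message size and the switch limit. */
final class ProtocolChoice {
    enum Protocol { EAGER_SEND, RENDEZVOUS }

    static final int DEFAULT_SWITCH_LIMIT = 128 * 1024;   // 131072 bytes, the documented default

    private ProtocolChoice() {}

    static Protocol choose(long messageBytes, int switchLimit) {
        // Small messages go eagerly; messages at or above the limit use rendezvous (assumed boundary).
        return messageBytes < switchLimit ? Protocol.EAGER_SEND : Protocol.RENDEZVOUS;
    }
}
```

In general, eager-send avoids a handshake for small messages at the cost of buffering them at the receiver, while rendezvous first coordinates with the receiver before moving a large payload. That is the usual trade-off behind such a limit, not a description of niodev internals; raising -psl pushes more messages onto the eager path.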
-
For debugging purposes, sometimes it is useful to run the daemons in
console mode. This can be achieved in the following way:
-
cd $MPJ_HOME/bin
- On UNIX systems, execute
./mpjdaemon_linux_x86_32 console
This starts the daemon on a 32-bit x86 processor; choose the appropriate
script for your machine.
- On Windows, execute
cd %MPJ_HOME%/bin
wrapper.exe -c ../conf/wrapper.conf
-
With the default settings, attempting to start MPJ Express daemons on UltraSPARC
Solaris, PowerPC (PPC) Linux, or PPC Mac OS X results in an error
like this:
mpjboot machines
Starting mpjd...
./mpjdaemon_linux_x86_32: line 1: ./daemon_linux_x86_32: cannot execute binary file
The reason is that, by default, x86-based code is called, which does
not work on PPC or UltraSPARC processors.
We are currently writing smarter scripts that call
the appropriate libraries based on the processor architecture and
operating system. In the meantime, this problem can be fixed as
follows:
- Solaris
a. Edit $MPJ_HOME/bin/mpjboot and $MPJ_HOME/bin/mpjhalt
b. Comment out the line ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_32 start;"
c. Uncomment the line #ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_solaris_sparc_64 start;"
d. cd $MPJ_HOME/lib
e. cp libwrapper.so_solaris_sparc_64 libwrapper.so
- PPC64 Linux
a. Edit $MPJ_HOME/bin/mpjboot and $MPJ_HOME/bin/mpjhalt
b. Comment out the line ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_32 start;"
c. Uncomment the line #ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_ppc_64 start;"
d. cd $MPJ_HOME/lib
e. cp libwrapper.so_linux_ppc_64 libwrapper.so
- PPC32 Mac OS X
a. Edit $MPJ_HOME/bin/mpjboot and $MPJ_HOME/bin/mpjhalt
b. Comment out the line ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_32 start;"
c. Uncomment the line #ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_macosx_ppc_32 start;"
d. cd $MPJ_HOME/lib
e. cp libwrapper.jnilib_macosx_ppc_32 libwrapper.jnilib
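Until such scripts exist, the choice they would make can be sketched as a lookup from the JVM's os.name and os.arch system properties to the script and library names listed above. The mapping from property values ("Linux", "SunOS", "Mac OS X"; "i386", "sparcv9", "ppc64", "ppc", and so on) is an assumption based on common JVM conventions, the class DaemonPlatform is hypothetical, and only the four platforms named in this section are covered; anything else, including 64-bit x86, is reported as unsupported.

```java
import java.util.Locale;

/** Hypothetical helper mapping os.name/os.arch to the daemon script and wrapper library named above. */
final class DaemonPlatform {
    final String daemonScript;     // script in $MPJ_HOME/bin
    final String librarySource;    // file in $MPJ_HOME/lib to copy, or null for the default x86 setup
    final String libraryTarget;    // name to copy it to, or null

    private DaemonPlatform(String daemonScript, String librarySource, String libraryTarget) {
        this.daemonScript = daemonScript;
        this.librarySource = librarySource;
        this.libraryTarget = libraryTarget;
    }

    static DaemonPlatform forPlatform(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (os.equals("linux") && (arch.equals("x86") || arch.matches("i[3-6]86"))) {
            return new DaemonPlatform("mpjdaemon_linux_x86_32", null, null);   // shipped default
        }
        if (os.equals("sunos") && arch.startsWith("sparc")) {                  // assumed 64-bit hardware
            return new DaemonPlatform("mpjdaemon_solaris_sparc_64",
                    "libwrapper.so_solaris_sparc_64", "libwrapper.so");
        }
        if (os.equals("linux") && arch.equals("ppc64")) {
            return new DaemonPlatform("mpjdaemon_linux_ppc_64",
                    "libwrapper.so_linux_ppc_64", "libwrapper.so");
        }
        if (os.equals("mac os x") && arch.equals("ppc")) {
            return new DaemonPlatform("mpjdaemon_macosx_ppc_32",
                    "libwrapper.jnilib_macosx_ppc_32", "libwrapper.jnilib");
        }
        throw new UnsupportedOperationException("no daemon script listed for " + osName + "/" + osArch);
    }

    /** Platform of the JVM running this code. */
    static DaemonPlatform current() {
        return forPlatform(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

A boot script built on this would run the chosen daemonScript with "start" over ssh and, when librarySource is non-null, copy it to libraryTarget, mirroring steps b to e above.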
-
For the API differences between mpiJava 1.2.x and MPJ Express, read
$MPJ_HOME/doc/APICHANGES.txt
-
We would like to thank:
-
Hong Ong for his
input to the initial design of niodev in particular and the software
in general,
-
Guillermo Taboada for
alpha testing of the software
-
Mohsan Jameel for testing of the software
-
There is a known, partially understood problem on Windows and Solaris that
causes MPJ processes to hang. It is normally observed as MPJ test cases
hanging without completing or throwing any error message.
If you encounter this problem, we would appreciate more debugging
information, which can be obtained as follows.
Edit $MPJ_HOME/src/xdev/niodev/NIODevice.java, go to line 3673,
and uncomment the line "ioe1.printStackTrace() ;". Line 3673
refers to MPJ Express release 0.27 and may change in
future releases. The relevant code snippet looks like this:
catch (Exception ioe1) {
if(mpi.MPI.DEBUG && logger.isDebugEnabled() ) {
logger.debug(" error in selector thread " + ioe1.getMessage());
}
//ioe1.printStackTrace() ;
} //end catch(Exception e) ...
if(mpi.MPI.DEBUG && logger.isDebugEnabled()) {
logger.debug(" last statement in selector thread");
}
} //end run()
}; //end selectorThread which is an inner class
Now, when the test cases are executed again, stack traces will be
printed periodically. Most of these relate to closed-socket
exceptions, which are normal. If the code hangs,
the latest stack trace that is not about a closed socket
is probably the cause of the hang. Please email us the output
so that we can fix the problem. A stack trace that leaves
MPJ Express hanging on Solaris is as follows:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
at java.nio.channels.SelectionKey.isAcceptable(SelectionKey.java:342)
at xdev.niodev.NIODevice$2.run(NIODevice.java:3330)
at java.lang.Thread.run(Thread.java:595)
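For readers investigating this trace: in java.nio, calling readyOps() or isAcceptable() on a SelectionKey that has already been cancelled (for example because another thread closed its channel) throws CancelledKeyException. A common defensive pattern, shown below as a self-contained sketch and not as the NIODevice code or its fix, is to check isValid() before testing readiness and to treat cancellation as an ordinary event. Because another thread can cancel the key between the two calls, the sketch also catches the exception. The class name SafeKeys is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

/** Hypothetical sketch: testing a SelectionKey safely when it may have been cancelled. */
final class SafeKeys {
    private SafeKeys() {}

    /** Returns true only if the key is still valid and ready to accept a connection. */
    static boolean safeIsAcceptable(SelectionKey key) {
        if (!key.isValid()) {
            return false;                 // cancelled or channel closed: skip this key
        }
        try {
            return key.isAcceptable();
        } catch (CancelledKeyException e) {
            return false;                 // cancelled by another thread after the isValid() check
        }
    }

    /** Demo: registers a server channel, cancels its key, then checks it without throwing. */
    static boolean demo() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));   // any free local port
            server.configureBlocking(false);
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            key.cancel();
            // Calling key.isAcceptable() directly here would throw CancelledKeyException.
            return safeIsAcceptable(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```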
-
Some users have noticed that it takes a long time to bootstrap
MPJ Express processes. For example,
user@machine:~/mpj-user> mpjrun.sh -np 6 -jar $MPJ_HOME/lib/test.jar
16:15:43.400 EVENT Starting Jetty/4.2.23
16:15:43.415 EVENT Started HttpContext[/]
16:15:43.419 EVENT Started SocketListener on 0.0.0.0:15000
16:15:43.419 EVENT Started org.mortbay.http.HttpServer@23ac23ac
16:15:43.420 EVENT Starting Jetty/4.2.23
16:15:43.420 EVENT Started HttpContext[/]
16:15:43.421 EVENT Started SocketListener on 0.0.0.0:15001
16:15:43.421 EVENT Started org.mortbay.http.HttpServer@50265026
[ pause for a minute or two ]
Starting process <0> on
Starting process <1> on
[ pause for a minute or two ]
Starting process <2> on
Starting process <3> on
[ pause for a minute or two ]
Starting process <4> on
Starting process <5> on
[ job starts ]
Thanks to Andy Botting, one of the users who identified this
problem. The problem may be related to name resolution, and we are
currently working on a fix.
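If name resolution is the cause, a node should take noticeably long to resolve its own host name. The sketch below is a hypothetical diagnostic; the class name is ours and is not part of MPJ Express. It times a forward lookup with InetAddress.getLocalHost() followed by a reverse lookup with getCanonicalHostName(). Running it on each compute node may help locate a slow resolver. We are assuming that these lookups resemble the ones MPJ Express performs during bootstrap.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NameLookupTimer {
    // Times a forward lookup of the local host name followed by a reverse
    // lookup of the resulting address; returns the elapsed time in milliseconds.
    static long timeLocalLookupMillis() {
        long start = System.nanoTime();
        try {
            InetAddress local = InetAddress.getLocalHost(); // host name -> address
            local.getCanonicalHostName();                    // address -> host name
        } catch (UnknownHostException e) {
            System.err.println("local host lookup failed: " + e.getMessage());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("local name lookup took " + timeLocalLookupMillis() + " ms");
    }
}
```

A delay of many seconds on one node, compared with a few milliseconds on others, would suggest checking that node's hosts file or DNS configuration.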
-
The merge operation is implemented with limited functionality. The processes
in both the local group and the remote group *have* to specify the 'high'
argument, and the value specified by the local-group processes must be the
opposite of the value specified by the remote-group processes.
-
A message sent with MPI.PACK as the datatype can only be received with
MPI.PACK as the datatype. MPI.Unpack(..) can then be used to unpack the
individual datatypes.
-
A buffered-mode send with MPI.PACK as the datatype does not actually
use the buffer attached with the MPI.Buffer_attach(..) method.
-
Cartcomm.Dims_Create(..) is implemented with limited functionality.
According to the MPI specification, this method does not modify the
non-zero elements of the 'dims' array argument. In this release of MPJE,
all elements of the 'dims' array are modified, regardless of whether
they are zero or non-zero.
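Users who need the standard semantics can compute 'dims' themselves. The plain-Java sketch below is one way to do it. The class and method names are hypothetical and not part of MPJ Express. It leaves non-zero entries untouched and fills the zero entries so that the product of all entries equals the number of processes. The balancing step is a greedy approximation, so for some inputs it may not find the most balanced factorization that a full MPI implementation would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DimsCreate {
    // Hypothetical helper: fills the zero entries of dims so that the product of
    // all entries equals nnodes; non-zero entries are left unchanged.
    static void dimsCreate(int nnodes, int[] dims) {
        int fixed = 1;
        List<Integer> freeSlots = new ArrayList<>();
        for (int i = 0; i < dims.length; i++) {
            if (dims[i] == 0) freeSlots.add(i);
            else fixed *= dims[i];
        }
        if (nnodes % fixed != 0) {
            throw new IllegalArgumentException("nnodes is not a multiple of the fixed dims");
        }
        int remaining = nnodes / fixed;
        if (freeSlots.isEmpty()) {
            if (remaining != 1) throw new IllegalArgumentException("product of dims != nnodes");
            return;
        }
        // Prime factors of the remaining count, largest first
        List<Integer> primes = new ArrayList<>();
        for (int p = 2; (long) p * p <= remaining; p++) {
            while (remaining % p == 0) { primes.add(p); remaining /= p; }
        }
        if (remaining > 1) primes.add(remaining);
        primes.sort((a, b) -> b - a);
        // Greedy balancing: give each factor to the free slot with the smallest value
        int[] values = new int[freeSlots.size()];
        Arrays.fill(values, 1);
        for (int prime : primes) {
            int smallest = 0;
            for (int j = 1; j < values.length; j++) {
                if (values[j] < values[smallest]) smallest = j;
            }
            values[smallest] *= prime;
        }
        // Free slots receive the values in non-increasing order
        Arrays.sort(values);
        for (int j = 0; j < values.length; j++) {
            dims[freeSlots.get(j)] = values[values.length - 1 - j];
        }
    }

    public static void main(String[] args) {
        int[] dims = {0, 3, 0};
        dimsCreate(6, dims);
        System.out.println(Arrays.toString(dims)); // [2, 3, 1]
    }
}
```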
-
Request.Cancel(..) is not implemented in this release.
-
MPJ applications should not print more than 500 characters on a single
line, for example with one System.out.print(..) call. This is not a serious
limitation, because the same content can be printed as, say, five lines of
100 characters each using System.out.println(..).
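A minimal sketch of this workaround follows. It is a hypothetical helper (LinePrinter, not part of MPJ Express) that splits a long string into pieces of at most 500 characters and prints each piece with System.out.println(..). Unlike a single print(..), it inserts a line break between pieces.

```java
import java.util.ArrayList;
import java.util.List;

public class LinePrinter {
    // Per-line limit taken from the note above
    static final int MAX_LINE = 500;

    // Splits text into consecutive pieces of at most maxLen characters
    static List<String> chunks(String text, int maxLen) {
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxLen) {
            pieces.add(text.substring(start, Math.min(text.length(), start + maxLen)));
        }
        return pieces;
    }

    // Prints text with several println(..) calls instead of one long print(..)
    static void printLong(String text) {
        for (String piece : chunks(text, MAX_LINE)) {
            System.out.println(piece);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1200; i++) {
            sb.append('x');
        }
        printLong(sb.toString()); // prints lines of 500, 500 and 200 characters
    }
}
```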
-
Some users may see the following exception while trying to start the
mpjrun module, typically when running the mpjrun.bat script. The error
occurs because the mpjrun module cannot contact the daemon and tries to
clean up its resources. In doing so, it tries to delete a file named
'mpjdev.conf' using the File.deleteOnExit() method, which appears not to
work on Windows, possibly because of permission issues.
Exception in thread "main" java.lang.RuntimeException: Another mpjrun module is already running on this machine
	at runtime.starter.MPJRun.<init>(MPJRun.java:135)
	at runtime.starter.MPJRun.main(MPJRun.java:925)
This issue can be resolved by deleting the mpjdev.conf file, which is
located in the directory that contains your main class or JAR file.
For example, if you are running "-jar ../lib/test.jar", the file will
be in the ../lib directory.
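Deleting the file by hand is enough. For scripted clean-up, here is a minimal Java sketch that assumes the stale file sits next to the main class or JAR, as described above. The class name is hypothetical. Unlike File.deleteOnExit(), which only schedules deletion for normal JVM termination, Files.deleteIfExists removes the file immediately and reports whether it existed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StaleConfCleaner {
    // Deletes mpjdev.conf from the directory that holds the main class or JAR.
    // Returns true if a stale file was found and removed.
    static boolean removeStaleConf(Path appDir) throws IOException {
        return Files.deleteIfExists(appDir.resolve("mpjdev.conf"));
    }

    public static void main(String[] args) throws IOException {
        // e.g. "java StaleConfCleaner ../lib" when running "-jar ../lib/test.jar"
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        if (removeStaleConf(dir)) {
            System.out.println("removed stale " + dir.resolve("mpjdev.conf"));
        } else {
            System.out.println("no mpjdev.conf in " + dir);
        }
    }
}
```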
-
Permission issues when using the MPJ Express runtime on Windows
-
Problem: Users may run into problems starting daemons on Windows.
When MPJE processes are started manually, their owner is the user who
started them, so the log files they produce are owned by that user. The
daemon, on the other hand, is installed as a SYSTEM service. When the
daemon starts, it may therefore be unable to write to its log file,
because the logs directory is owned by the user; for the daemon to write
there, the directory has to be globally writable. Even if the daemon does
start, the MPJE processes it launches may be unable to write their process
log files, because these files are owned by the user, while the processes
started by the daemon are also SYSTEM processes. The problem therefore
arises when users switch from running their code manually to using the
runtime, or possibly vice versa.
-
Solution:
-
This can be avoided by running the MPJ Express daemons as user processes
instead of SYSTEM processes. To restart a daemon as a user process, go to
"Control Panel->Administrative Tools->Services", right-click the MPJ Daemon
service, click Properties, and open the "Log On" tab. For the option
"Log on as:", select "This account", enter the user name and password of
this account, and restart the service.
The daemon should now run as a user process. To verify this, open the
Task Manager (for example by pressing Ctrl-Alt-Delete) and look for the
processes "wrapper.exe" and "java.exe". Their user name should be the
name of this account instead of SYSTEM. Other Java processes may also be
running on the machine, in which case the process list shows several
java.exe entries; "wrapper.exe" is then the only process that identifies
the MPJ Express daemon.
-
Delete all log files before the first execution.
-
Execute the following commands in Cygwin:
-
chmod a+w $MPJ_HOME/logs
-
chmod a+x $MPJ_HOME/lib/*.dll
-
If wrapper.log is present, also execute:
chmod a+w $MPJ_HOME/logs/wrapper.log
-
Mixing local loading and remote loading may result in a ClassNotFoundException
for the mpi.MPI class ($MPJ_HOME/src/mpi/MPI.java). This class appears in
the exception stack trace because it is the entry point to the MPJE classes.
This can happen when your MPJE application is in your working directory,
CLASSPATH contains ".", and you are using remote loading. In this case, the
MPJE daemon loads the application with the local loader (because the working
directory contains the application), and the application then tries to load
the MPJE classes through a URL. Because the application was loaded by the
local loader (the default), it cannot load MPJE classes from a URL. To avoid
this error, do not keep your application in the working directory when using
the remote loader. If your program reads a file, it may be a good idea to keep
this file separate from your application classes, or to copy it to a temporary
directory and specify that directory as the working directory using the
-wdir switch.
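We assume this failure follows from Java's standard parent-first class loader delegation. The JDK-only sketch below does not reproduce the MPJE daemon's loaders, and the class name and URL are hypothetical. It asks a URLClassLoader for a class that is also on the local classpath. The class ends up defined by the local (parent) loader, and classes that exist only behind the URL, such as mpi.MPI in this scenario, cannot be resolved from that loader.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderDelegationDemo {
    // Asks a URLClassLoader (standing in for a "remote" loader) for className and
    // returns the loader that actually defined the class. With parent-first
    // delegation, the parent (local) loader is consulted before the URL is searched.
    static ClassLoader definingLoader(String className, URL remoteBase)
            throws ClassNotFoundException, IOException {
        ClassLoader local = LoaderDelegationDemo.class.getClassLoader();
        try (URLClassLoader remote = new URLClassLoader(new URL[] {remoteBase}, local)) {
            return remote.loadClass(className).getClassLoader();
        }
    }

    // True if className can be resolved through the given loader
    static boolean visibleFrom(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL standing in for the location the daemon serves classes from
        URL remoteBase = URI.create("http://localhost:8080/mpj/lib/").toURL();
        ClassLoader definer = definingLoader(LoaderDelegationDemo.class.getName(), remoteBase);
        System.out.println("defined by local loader: "
                + (definer == LoaderDelegationDemo.class.getClassLoader()));
        // Prints false unless mpi.MPI is also on the local classpath
        System.out.println("mpi.MPI visible: " + visibleFrom(definer, "mpi.MPI"));
    }
}
```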