This is a very early draft.


MATLAB is an excellent match for HTCondor. Millions of MATLAB jobs, possibly tens of millions, have been successfully run under HTCondor over the years. The specifics of working with MATLAB will vary from site to site, but here are some general guidelines.

This page assumes basic familiarity with MATLAB and HTCondor, and that you can set up and maintain at least a small HTCondor pool.

Licensing

Perhaps the biggest challenge to running MATLAB under HTCondor is licensing. MATLAB is proprietary software with strict licensing terms. You might have one or two licenses for your desktop computers, but to run hundreds of simultaneous MATLAB jobs you would require hundreds of licenses.

Your institution may already have acquired suitable licensing for MATLAB. If you have a fixed number of licenses, HTCondor's concurrency limits can help by capping how many MATLAB jobs run at once.
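Here is a minimal sketch of the concurrency-limit setup. It assumes a limit named matlab and a site with 50 licenses; both are assumptions, not values from this page. The pool administrator adds MATLAB_LIMIT = 50 to the central manager's configuration and runs condor_reconfig. Each job then asks for one unit of the limit with concurrency_limits. HTCondor runs at most 50 such jobs at a time, and the rest stay idle in the queue until units free up. The script below only writes the submit file. my_task is a hypothetical wrapper like the ones in the Executables section.

```shell
#!/bin/sh
# Hypothetical submit description file: every job in this cluster holds one
# unit of the "matlab" concurrency limit while it runs. The pool-wide total is
# set separately in the central manager's configuration, for example
#   MATLAB_LIMIT = 50
# where 50 stands in for your actual license count.
cat > licensed_task.sub <<'EOF'
executable = my_task
transfer_input_files = my_task.m
concurrency_limits = matlab
queue 200
EOF
```

Limit names are not tied to license servers. HTCondor simply counts the running jobs that claim the matlab limit, so the number you configure should match what your license actually allows.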

Another option is to use the MATLAB Compiler to create an executable that runs with the MATLAB Compiler Runtime (MCR). The MATLAB Compiler is an optional toolbox for MATLAB. In general, executables created with the MATLAB Compiler are not subject to MATLAB's license, so you are free to run as many copies in parallel as you like. Check your MATLAB licensing to confirm this.

If you are using third party MATLAB add-ons, you will need to check the licensing on them as well.

Executables

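The MATLAB executable and its supporting libraries must be available on every computer on which your HTCondor jobs might run. If you use the MATLAB Compiler, those computers need the MCR instead of the full MATLAB installation, but the techniques for making it available are the same. There are three general approaches.

Install MATLAB at the same path on every execute machine, and hard-code that path in a wrapper script. The submit description file:

executable = my_task
transfer_input_files = my_task.m
queue

The wrapper script, my_task:

#! /bin/sh
exec /opt/matlab/bin/matlab -nodisplay ./my_task.m

Install MATLAB wherever is convenient on each machine, and have each machine advertise its location in its HTCondor configuration. Appending to STARTD_ATTRS, rather than overwriting it, preserves any attributes already listed there:

MATLAB_PATH  = /opt/matlab/bin
STARTD_ATTRS = $(STARTD_ATTRS) MATLAB_PATH

In the submit description file, $$(MATLAB_PATH) is replaced with the value advertised by the machine the job matches, and is passed to the wrapper as its first argument:

executable = my_task
arguments = $$(MATLAB_PATH)
transfer_input_files = my_task.m
queue

The wrapper script, my_task:

#! /bin/sh
exec "$1"/matlab -nodisplay ./my_task.m

Transfer the MCR along with each job. The compiled MATLAB example below takes this approach.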
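The sketch below writes both files for the advertised-path approach, using the file names my_task and my_task.m from above. It adds two things that are not in the original. First, a requirements expression, which is an assumed policy, keeps jobs away from machines that do not advertise MATLAB_PATH. Second, the wrapper exits with a clear message instead of failing obscurely when no matlab is found at the advertised path.

```shell
#!/bin/sh
# Write the submit description file. The quoted 'EOF' stops the shell from
# expanding $$(MATLAB_PATH); HTCondor fills it in from the matched machine.
cat > my_task.sub <<'EOF'
executable = my_task
arguments = $$(MATLAB_PATH)
transfer_input_files = my_task.m
# Only match machines that advertise MATLAB_PATH (assumed policy)
requirements = (MATLAB_PATH =!= undefined)
queue
EOF

# Write the wrapper. $1 is the MATLAB bin directory from the machine ad.
cat > my_task <<'EOF'
#! /bin/sh
if [ ! -x "$1/matlab" ]; then
    echo "my_task: no executable matlab in '$1'" >&2
    exit 1
fi
exec "$1"/matlab -nodisplay ./my_task.m
EOF
chmod +x my_task
```

Submit the result with condor_submit my_task.sub.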

MATLAB Compilation

To compile foo.m with the MATLAB Compiler, use a command like the following. Each -R passes the option after it to the runtime, so the compiled program behaves as though that option had been given when the job starts. The runtime options are explained further below.

mcc -m -R -singleCompThread -R -nodisplay -R -nojvm -nocache foo.m

Invoking

MATLAB may try to create a graphical environment. HTCondor does not support graphical environments, and it makes no sense to open a user interface for a job that no one is watching. You may need some combination of -nosplash, -nodisplay, and possibly -nojvm to keep MATLAB from creating one. Compiled MATLAB jobs should not need these options.

Unless you have special arrangements to use multiple CPU cores, you will want -singleCompThread so that MATLAB only uses a single core.

-nosplash          Disable the splash screen. (HTCondor has no GUI support.)
-nodisplay         Disable the GUI. (HTCondor has no GUI support.)
-nojvm             Do not start the Java virtual machine. This also disables the GUI, and may speed up startup.
-singleCompThread  Use a single computational thread, so MATLAB uses only one CPU core (plays nicely when sharing a computer).
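The following is a hedged sketch of a wrapper helper that applies the options above. The function name run_matlab and its two arguments are hypothetical. It runs the script through MATLAB's -r option inside a try/catch, so that an error in the script becomes exit status 1, which HTCondor records as the job's exit code. It also ends with an explicit exit, so MATLAB does not sit waiting for another command.

```shell
#!/bin/sh
# Hypothetical helper: run a MATLAB script non-interactively under HTCondor.
#   $1 = path to the matlab launcher
#   $2 = script name without the .m suffix (e.g. my_task for my_task.m)
run_matlab() {
    matlab_bin="$1"
    script="$2"
    # -nosplash/-nodisplay/-nojvm: no GUI on an execute machine.
    # -singleCompThread: one computational thread for the one assigned core.
    # -r: the try/catch turns a MATLAB error into exit status 1, and the
    #     final exit(0) keeps MATLAB from waiting for more commands.
    "$matlab_bin" -nosplash -nodisplay -nojvm -singleCompThread \
        -r "try, $script, catch err, disp(getReport(err)), exit(1), end, exit(0)"
}
```

For example, a wrapper script could end with run_matlab /opt/matlab/bin/matlab my_task. Its exit status is then MATLAB's. This assumes MATLAB starts in the job's scratch directory, so that my_task.m is on its path.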

Parallelism

In the most common configuration, HTCondor does not directly support "parallel" jobs, jobs that might use a system like MPI or multiple threads to take advantage of multiple CPUs at once. HTCondor can launch jobs that use multiple processes or threads, but by default HTCondor will offer them a single CPU core to run on. (They may get lucky and be able to use more than one CPU core on a computer, but that should not be relied on.)

A local installation may provide additional options for parallel jobs, usually by letting a job request two or more CPU cores on a single computer with request_cpus in the submit description file. Your local administrators should be able to tell you what is available.

Given the default configuration, it is usually better to break your work down into multiple independent jobs. For example, if you are processing 10,000 images, instead of a single MATLAB job that processes them, perhaps you could have 10,000 jobs that each process 1 image. HTCondor is then able to schedule your jobs across multiple computers or at least multiple cores on a single computer, giving you the speed benefits of parallelism.
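For the image example above, here is a sketch that queues one job per image. Everything in it is hypothetical: the images live in ./images, process_image.m handles a single image, and run_image.sh is a wrapper that calls MATLAB on it. It uses the submit language's queue ... from form, which creates one job for each line of images.txt and sets the image variable to that line.

```shell
#!/bin/sh
# List the image file names (without the directory), one per line.
# Transferred files land in the job's scratch directory under their
# base name, so the job should be given the base name.
: > images.txt
for f in images/*.png; do
    if [ -f "$f" ]; then
        basename "$f" >> images.txt
    fi
done

# The out directory must already exist for the output and error files.
mkdir -p out

# One job per line of images.txt. The quoted 'EOF' keeps the shell from
# expanding $(image) and $(Process), which are HTCondor submit macros.
cat > images.sub <<'EOF'
executable = run_image.sh
arguments = $(image)
transfer_input_files = process_image.m, images/$(image)
output = out/$(Process).out
error = out/$(Process).err
log = images.log
queue image from images.txt
EOF
```

Each job processes a single image, so HTCondor can spread the work across every core the pool offers.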

For older versions of MATLAB (before R2009bSP1), put this in your MATLAB script to ensure that MATLAB uses only one core:

lastN = maxNumCompThreads(1);

If you are using the MATLAB Compiler, but want to use multiple threads when doing development, you could use something like this to limit MATLAB to one thread only for compiled versions:

if isdeployed
    lastN = maxNumCompThreads(1);
end

With R2009bSP1 and newer, combining the code above with -R -singleCompThread causes an error.

The code above does not work on newer versions of MATLAB, because maxNumCompThreads is deprecated. Instead, pass the -singleCompThread option. If you are using mcc (the MATLAB Compiler), add -R -singleCompThread to your compiler options.

Even then, a compiled job is not strictly single-threaded. It will still start 5 threads, with nearly all of the CPU time spent on one of them, and Java adds a few more. The best you can do is stop MATLAB from starting one computational thread per core, which is what it does on its own.

Example

This example assumes that MATLAB is available in /opt/Matlab/bin.

COMPLETELY UNTESTED

executable = /opt/Matlab/bin/matlab
# MATLAB is already installed on the execute machine; do not copy it from the submit machine
transfer_executable = false
arguments = -nodisplay -nojvm -singleCompThread my-script.m
transfer_input_files = my-script.m
output = my-script.output
error = my-script.error
log = my-script.log
environment = "DISPLAY=:0.0 HOME=. MATLAB_PREFDIR=."
queue

Compiled MATLAB Example (completely tested)

Given a successful mcc run, as described above, and assuming the MATLAB runtime is not pre-installed on any of the execute machines, you can transfer the runtime along with the HTCondor job. There is an example of this in CHTC in /home/gthain/CompiledMatlabExample. First, put the whole runtime into a single tar file, called m.tgz. Then, edit the run_foo.sh wrapper (which was created by the compiler) to add the lines

tar xzf m.tgz
mkdir cache
chmod 0777 cache
export MCR_CACHE_ROOT=`pwd`/cache

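at the beginning of the script, after the #! line. The tar command unpacks the runtime into the job's scratch directory. The remaining lines give MATLAB a private cache. When MATLAB runs, it wants to create a cache directory under the user's home directory, which may not exist on an execute machine, or may conflict with other MATLAB jobs running at the same time. Then create an HTCondor submit description file that looks something like the following. run_foo.sh takes the location of the runtime as its first argument, here ./mathworks-R2009bSP1 as unpacked from m.tgz: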

universe = vanilla

executable = run_foo.sh
arguments = ./mathworks-R2009bSP1

should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = m.tgz, foo

output = out
error  = err
log    = log
queue
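The packing step described above can be sketched as a small helper. The name make_runtime_tarball is hypothetical. It uses tar -C so that paths inside m.tgz start at the runtime directory itself. As a result, tar xzf m.tgz in the job's scratch directory recreates ./mathworks-R2009bSP1, the path that the submit file passes to run_foo.sh.

```shell
#!/bin/sh
# make_runtime_tarball PARENT NAME
#   Packs PARENT/NAME into m.tgz with archive paths starting at NAME/, so that
#   unpacking m.tgz in the job's scratch directory recreates ./NAME.
make_runtime_tarball() {
    parent="$1"
    name="$2"
    if [ ! -d "$parent/$name" ]; then
        echo "make_runtime_tarball: $parent/$name is not a directory" >&2
        return 1
    fi
    tar czf m.tgz -C "$parent" "$name"
}
```

For example, make_runtime_tarball /opt/MATLAB mathworks-R2009bSP1, where /opt/MATLAB stands in for wherever the runtime is installed on the submit machine.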

Example to run MATLAB under HTCondor on Windows

[ Copied from a post to the htcondor-users email list ]

I've just got this to work with Matlab R2012a 64-bit on Windows 7 64-bit under Condor 7.6.6. This is the .bat wrapper:

set SCRIPT=%1
set MATLABEXE=c:\matlab2012\bin\win64\matlab
set MATLAB=c:\matlab2012
set TMP=%CD%
set TEMP=%CD%
set USERPROFILE=%CD%
set MATLAB_PREFDIR=%CD%\My Documents\MATLAB

mkdir "My Documents"
mkdir "My Documents\MATLAB"

set PATH=%WINDIR%\system32
set PATH=c:\matlab2012\bin\win64;%PATH%

start /wait %MATLABEXE% -noFigureWindows -nodesktop -nosplash -nojvm -r %SCRIPT%

When I used the -wait option it complained about a clash with the -nojvm option. When I got rid of -nojvm it said it didn't recognise -wait although it still seemed to work. Anyway I've stuck with start /wait to be on the safe side. Pleasingly there are no error messages or warnings returned now.

Also, you may need to copy the pathdef.m file to the Condor execute folder for the toolboxes to work (if you use them) e.g.

   copy c:\matlab2012\toolbox\local\pathdef.m .

This is useful in building standalone executables using the Matlab Compiler Toolbox (the mcc command) actually on a pool PC.

The upshot is that the temporary Condor execute account(s) don't have a proper profile so you need to set up an artificial one for Matlab to work (thanks to MathWorks for pointing this out).

If you don't include a "quit" command at the end of the M-file, the Matlab interpreter will hang around for ever waiting for the next command and the jobs will never complete. Hindsight - always 20/20.

Additional Resources

Many other sites are using MATLAB under HTCondor. Here are links to the documentation from just a few.