How to have execute machines belong to multiple pools

Known to work with HTCondor version: 7.0

You may already be familiar with "flocking," which allows a submit machine to send jobs to multiple HTCondor pools. It is also possible to have execute machines belong to multiple HTCondor pools. One reason to do this would be to create a super-pool that contains all of the execute nodes from several existing HTCondor pools. Some motivation for this follows.

Suppose several departments each have their own HTCondor pool, but there is a desire to share resources across departments and with other users on the campus. One perfectly good solution is to use traditional flocking to send jobs to multiple pools. Each condor_schedd daemon needs to have the other pools added to the FLOCK_TO list in its configuration in order to use all of the resources. Whenever a new pool is added to the federation, the FLOCK_TO lists must be updated. If users want to see the status of the resources or the usage statistics, they must know to query the individual pools in the FLOCK_TO list. If campus-wide fair-sharing is desired (except on machines that you own, of course), this becomes awkward, because each user has a separate user priority and accumulated usage within each pool. Another small annoyance is that the job's rank expression is only evaluated within individual pools, not across resources from multiple pools. Similarly, if jobs must wait for other jobs to finish (for example, because of a long MaxJobRetirementTime), it can easily happen that a job gets matched to preempt some other job on a busy machine in one pool and then has to wait for that job to retire; in the meantime, a machine may be sitting ready and idle in some other pool.
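
For reference, here is a minimal sketch of what traditional flocking looks like on each submit machine; the central manager host names are hypothetical:

# Send excess jobs to the other departments' pools.  Every schedd needs
# its own copy of this list, kept up to date as pools join or leave
# the federation.
FLOCK_TO = cm.physics.example.edu, cm.chem.example.edu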

Another option is to make one pool of all the machines: replace the individual central managers with a single central manager. There will be one collector and one negotiator, so there is only one place to query pool status, one place to send jobs to, and one global matchmaker that has access to information about all of the machines. On the down side, this could result in lower quality of service to users in the existing pools, because there is only one negotiator serving everybody. It may be slower (especially if somebody puts something in their job requirements that causes an inefficient auto-clustering of jobs). Also, if this single, large pool is not well managed, downtime would prevent users from being able to access their own resources. Each department probably also wishes to retain high priority on its own machines. The startd RANK expression is a good way to do this, but it has the disadvantage of operating via preemption, which is a second-round scheduling mechanism, rather than via user priority, which is a first-round mechanism. This means (as of HTCondor 7.0) that department users may sometimes find that the negotiator first hands out their machines to someone else, and then in the next negotiation cycle their job gets matched to the machine and the lower-ranked job gets preempted.
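
For example, the departmental preference just mentioned is typically expressed with a startd RANK expression along these lines (a sketch; the UID domain value is a placeholder):

# Prefer jobs from this department's users; a match with a higher RANK
# may preempt a running lower-ranked job.
RANK = (TARGET.UidDomain =?= "physics.example.edu") * 10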

All these difficulties motivate the focus of this HOWTO section, which is to combine the two approaches described so far: flocking and having one large pool. The idea is to leave the central managers of the existing pools in place, but also to create one new central manager to which all of the execute machines also report. This provides usage accounting across all of the resources together, and it serves as a convenient top-level pool to submit jobs to when users want access to all possible resources. Users in the departments with their own existing HTCondor pool might prefer to have their own pool remain the default pool for their job submission, but the super-pool could be added to their FLOCK_TO list. This way, they get the quality of service they were already enjoying from their own central manager, but excess jobs may be conveniently sent to all of the other resources by flocking to one place. The problem of guaranteeing high priority to department users on their own machines can also be addressed by treating matches made by the department negotiator as higher priority than those made by the super-pool. Since the department negotiator has its own independent notion of user priorities, it can rely on the better user priorities of department members to guarantee them first priority on their own machines, rather than (or in addition to) relying on startd RANK to do this. This avoids the slight inefficiency of department members losing in the first round of negotiation to outsiders who happen to have better user priority but lower startd RANK.
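
With the super-pool collector in place, there is a single place to query for status and accumulated usage across all of the resources. For example (the super-pool host name here is hypothetical):

condor_status -pool superpool.example.edu
condor_userprio -pool superpool.example.edu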

Here is an example configuration for the super-pool's central manager.

# Insert NegotiatorMatchExprNegotiatorName="SuperPool" into matches
# that this negotiator makes.  This is used by the startd to give
# the local negotiator priority over the super negotiator.
NegotiatorName = "SuperPool"
NEGOTIATOR_MATCH_EXPRS = NegotiatorName

# Configure authorization settings to permit startds in sub-pools to join
# the super-pool and to allow submission of jobs from all appropriate
# places.
ALLOW_READ = <insert machines allowed to read here>
ALLOW_WRITE = <insert machines allowed to write here>
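
These authorization lists depend entirely on your site's security setup; with simple host-based authorization they might be filled in roughly as follows (the host patterns are placeholders for your own domains):

# Hypothetical host-based settings: anyone on campus may query the
# collector; the departments' machines may advertise and submit.
ALLOW_READ = *.example.edu
ALLOW_WRITE = *.cs.example.edu, *.physics.example.edu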

And here is an example configuration for a sub-pool:

# Insert NegotiatorMatchExprNegotiatorName="<LocalPoolName>" into matches
# that this negotiator makes.  This is used by the startd to give
# the local negotiator priority over the super negotiator.
NegotiatorName = "InsertLocalPoolNameHere"
NEGOTIATOR_MATCH_EXPRS = NegotiatorName

# For advertising to super-pool
SUPER_COLLECTOR=<insert super-collector here>
LOCAL_COLLECTOR=<insert local-collector here e.g. $(CONDOR_HOST)>

# The local negotiator should only ever report to the local collector.
NEGOTIATOR.COLLECTOR_HOST = $(LOCAL_COLLECTOR)
# The startds should report to both collectors.
STARTD.COLLECTOR_HOST = $(LOCAL_COLLECTOR),$(SUPER_COLLECTOR)

# Trust both negotiators.  For the startd, COLLECTOR_HOST expands to the
# STARTD.COLLECTOR_HOST list above, so this covers the local and super
# collectors.
ALLOW_NEGOTIATOR = $(COLLECTOR_HOST)

# Ensure that external users get a big priority factor (see
# REMOTE_PRIO_FACTOR); users whose accounting domain does not match
# ACCOUNTANT_LOCAL_DOMAIN are treated as remote.  If you don't have a
# uniform uid domain for all local users, then you will need some
# external process that updates priority factors using condor_userprio.
ACCOUNTANT_LOCAL_DOMAIN = $(UID_DOMAIN)
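# As a hypothetical illustration of such an external update, an
# administrator could periodically run something like:
#   condor_userprio -pool <insert local-collector here> -setfactor visitor@other.domain 1000000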

# Flocking to super-pool
FLOCK_TO=$(SUPER_COLLECTOR)

# Advertise in the machine ad the name of the negotiator that made the match
# for the job that is currently running.  We need this in SUPER_START.
CurJobPool = "$$(NegotiatorMatchExprNegotiatorName)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) CurJobPool
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) CurJobPool

# We do not want the super-negotiator to preempt local-negotiator matches.
# Therefore, only match jobs if:
#      1. the new match is from the local pool
#   OR 2. the existing match is not from the local pool
SUPER_START = NegotiatorMatchExprNegotiatorName =?= $(NegotiatorName) || \
              MY.CurJobPool =!= $(NegotiatorName)
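# Note: =?= and =!= are the ClassAd "is" / "isn't" operators, which never
# evaluate to UNDEFINED.  In particular, when no job is running,
# CurJobPool is undefined, so MY.CurJobPool =!= $(NegotiatorName)
# evaluates to TRUE and new matches are allowed.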

START = ($(START)) && ($(SUPER_START))
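
Once this is deployed, a quick way to spot-check that running jobs are tagged with the pool that matched them is to query the machine ads for the CurJobPool attribute advertised above (the collector host name is hypothetical):

condor_status -pool superpool.example.edu -format "%s\t" Name -format "%s\n" CurJobPool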