Job ClassAd Information Flow in HTCondor

This document describes how job ClassAds are exchanged and updated in HTCondor as of version 7.5.6. It covers the "new" shadow/starter, i.e. every universe except standard and grid.

schedd --> negotiator

During the negotiation cycle, the schedd sends job ads to the negotiator for matchmaking. $$ references are not expanded (because there is no target ad yet). The negotiator does not cache this information. Each job ad is used immediately and then thrown away after attempting to match it.

schedd --> shadow

The schedd sends the job ClassAd to the shadow when the shadow starts up.

The copy of the job ad given to the shadow has all of its $$ references expanded.
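
As a rough illustration of what expansion means here, the following Python sketch substitutes simple $$(Attr) references in a job attribute value, using values from a matched machine ad. It is a toy model and not HTCondor's implementation: the function name expand_dollar_dollar and the dict-based ads are assumptions made for this example, and it handles only the plain $$(Attr) form with exact-case attribute names.

```python
import re

# Matches the simple $$(AttrName) form only (an assumption of this sketch).
_DOLLAR_DOLLAR = re.compile(r"\$\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_dollar_dollar(value, machine_ad):
    """Replace each $$(Attr) in value with that attribute from machine_ad."""
    def substitute(match):
        name = match.group(1)
        if name not in machine_ad:
            # Without a target value the reference cannot be expanded.
            raise KeyError(f"no attribute {name!r} in target ad")
        return str(machine_ad[name])

    return _DOLLAR_DOLLAR.sub(substitute, value)


# Hypothetical matched machine ad, reduced to two attributes.
machine_ad = {"Arch": "X86_64", "OpSys": "LINUX"}
print(expand_dollar_dollar("analyze.$$(OpSys).$$(Arch)", machine_ad))
```

The sketch also shows why the negotiator's copy cannot be expanded: until a match is made there is no machine_ad to take values from.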

While the shadow exists, it, rather than the schedd, is responsible for evaluating job policy expressions.

When condor_qedit modifies the job ad, the schedd updates the shadow's copy of the job ad. (New in 7.5.6.)

Whenever the shadow sends an update of the ad to the schedd, it reads the schedd's value for TimerRemove and updates its own copy of the ad to match it.

shadow --> startd

The shadow sends a copy of the job ad to the startd at claim activation time.

shadow --> starter

The shadow sends a copy of the job ad to the starter before startup of the job.

starter --> shadow

The starter periodically monitors the job's resource usage and sends updates of these attributes to the shadow. The frequency is controlled by STARTER_UPDATE_INTERVAL (default 5 minutes) and STARTER_INITIAL_UPDATE_INTERVAL (default 8 seconds).
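
To make the two knobs concrete, here is a small Python sketch of when periodic updates would fire. It assumes that the first update comes STARTER_INITIAL_UPDATE_INTERVAL seconds after the job starts and that later updates follow every STARTER_UPDATE_INTERVAL seconds; that timing model and the function name update_times are assumptions of this example, not taken from the starter's code.

```python
def update_times(initial_interval, interval, job_runtime):
    """Return the offsets (seconds since job start) of periodic updates.

    Assumes the first update fires after initial_interval and later
    ones every interval seconds, for as long as the job is running.
    """
    times = []
    t = initial_interval
    while t < job_runtime:
        times.append(t)
        t += interval
    return times


# Defaults from the text: 8 seconds initially, then every 5 minutes.
print(update_times(8, 300, 1000))
```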

On suspension or unsuspension, an update is sent immediately.

The starter does a final update when the job completes.

If any update fails, the starter disconnects from the shadow and retries the update on reconnect. It gives up only when the startd kills it, either because the claim lease expired or because the claim was preempted.

The attributes that are updated are: RemoteSysCpu, RemoteUserCpu, ImageSize, DiskUsage, ExceptionHierarchy, ExceptionName, ExceptionType, ExitBySignal, ExitSignal, ExitCode, ExitReason, JobCoreDumped, and JobState. (The Exception* attributes are for the java universe.)
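
The effect of such an update on the shadow's copy of the ad can be pictured with the Python sketch below. Ads are modeled as plain dicts, and the helper apply_starter_update is hypothetical; in particular, dropping any attribute outside the list is an assumption of this sketch rather than documented shadow behavior.

```python
# The attribute names listed in the text.
UPDATED_ATTRS = {
    "RemoteSysCpu", "RemoteUserCpu", "ImageSize", "DiskUsage",
    "ExceptionHierarchy", "ExceptionName", "ExceptionType",
    "ExitBySignal", "ExitSignal", "ExitCode", "ExitReason",
    "JobCoreDumped", "JobState",
}


def apply_starter_update(shadow_ad, update):
    """Return a copy of shadow_ad with the listed attributes refreshed."""
    merged = dict(shadow_ad)
    for name, value in update.items():
        if name in UPDATED_ATTRS:  # assumption: other attributes are ignored
            merged[name] = value
    return merged


shadow_ad = {"Owner": "alice", "ImageSize": 1000}
update = {"ImageSize": 2048, "RemoteUserCpu": 12.5, "Owner": "mallory"}
print(apply_starter_update(shadow_ad, update))
```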

starter --> startd

The starter sends periodic updates to the startd so that the startd can use updated attributes when evaluating its policy. Failed updates are simply dropped. The update interval is controlled by the same knobs that control updates from starter to shadow.

There is a final update from the starter to the startd when the job is done. If this fails, it is simply dropped.

shadow --> schedd

The shadow periodically sends updates to the schedd. The interval is controlled by the configuration variable SHADOW_QUEUE_UPDATE_INTERVAL (default 15 minutes). If an update fails, it is simply dropped.

The shadow can push changes to some attributes immediately instead of waiting for the next periodic update. These attributes are NumJobStarts and NumJobReconnects. This behavior is disabled by default, because SHADOW_LAZY_QUEUE_UPDATE defaults to True; setting it to False enables the immediate updates.

When the job is finished, the shadow does a final update to the schedd. If this update fails, the shadow waits and retries. This behavior is controlled by SHADOW_JOB_CLEANUP_RETRY_DELAY (default 30 seconds) and SHADOW_MAX_JOB_CLEANUP_RETRIES (default 5). If the update still fails after all retries, the shadow exits with a failure code that causes the job to be requeued and (usually) run again.
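
The retry logic for the final update can be sketched in Python as follows. This is a minimal model, not the shadow's code: the function final_update_with_retries and its send_update callback are hypothetical, and the sketch assumes one initial attempt followed by at most SHADOW_MAX_JOB_CLEANUP_RETRIES retries.

```python
import time


def final_update_with_retries(send_update, retry_delay=30, max_retries=5,
                              sleep=time.sleep):
    """Attempt the final update, retrying after each failure.

    send_update is a callable returning True on success. Returns True
    once an attempt succeeds, or False when the initial attempt and all
    max_retries retries have failed (the case in which the text says the
    shadow exits with a failure code and the job is requeued).
    """
    for attempt in range(max_retries + 1):
        if send_update():
            return True
        if attempt < max_retries:
            sleep(retry_delay)  # wait before the next retry
    return False
```

Passing sleep as a parameter keeps the sketch testable without real delays, for example final_update_with_retries(lambda: False, sleep=lambda seconds: None).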

schedd --> history

The schedd dumps the job ad into the (optional) history file when the job leaves the queue. If this fails, it is simply dropped.

startd --> history

When a claim is deactivated (i.e. when the starter exits), the startd writes its copy of the job ad into the startd's (optional) history file. If this fails, it is simply dropped.

qedit --> schedd

Updates to the job by condor_qedit (or remote HTCondor-C) are made to the schedd's copy of the job. The schedd then signals the shadow and gridmanager to pull the updated attributes into their own copies of the ad.

schedd --> condor_q

condor_q sees the copy of the job ad in the schedd (or quill). $$ references are not expanded, but extra attributes exist in the ad to indicate what the result of expansion would be (used for reconnect if the schedd restarts while the job is running).

schedd --> job_router

job_router follows the schedd's transaction log (job_queue.log) and builds its own in-memory copy of the job queue, using the quill libraries.
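
The general idea of rebuilding a queue by replaying a transaction log can be illustrated with the Python sketch below. The record layout is invented for this example (tuples with the operations "new", "set", "delete" and "destroy"); it is not the on-disk format of job_queue.log, and the function replay is hypothetical.

```python
def replay(log_records):
    """Build an in-memory job queue (job id -> ad dict) from log records."""
    queue = {}
    for op, key, *rest in log_records:
        if op == "new":
            queue[key] = {}
        elif op == "set":
            name, value = rest
            queue.setdefault(key, {})[name] = value
        elif op == "delete":
            (name,) = rest
            queue.get(key, {}).pop(name, None)
        elif op == "destroy":
            queue.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {op!r}")
    return queue


# Hypothetical records: job 1.0 is created and updated, job 2.0 leaves the queue.
records = [
    ("new", "1.0"),
    ("set", "1.0", "JobStatus", 1),
    ("set", "1.0", "JobStatus", 2),
    ("new", "2.0"),
    ("set", "2.0", "Owner", "alice"),
    ("destroy", "2.0"),
]
print(replay(records))
```

A follower built this way only has to apply records appended since it last read the log, which keeps its copy current without querying the schedd.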