Recovery and Failure Notification Features

The Apple Qmaster distributed processing system includes built-in features that attempt recovery when a problem occurs and that notify you when a recovery is attempted.

Recovery Features

The recovery actions described next happen automatically when a failure occurs in the Apple Qmaster distributed processing system. As the administrator, you do not need to enable or configure these features.

If a Service Stops Unexpectedly

If either the cluster controller service or the processing service enabled on a service node stops unexpectedly, the Apple Qmaster distributed processing system restarts the service. To avoid the risk of endless stopping and restarting, the system restarts the failed service a maximum of four times. The first two times, it restarts the service right away. If the service stops abruptly a third or fourth time, the system restarts it only if it had been running for at least 10 seconds before it stopped.
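The restart rules above can be sketched as a small decision function. This is only an illustration of the rules as stated in the text, not Apple Qmaster code; the function name, parameter names, and constants are hypothetical.

```python
# Illustrative sketch of the restart rules described above.
# All names here are hypothetical; this is not Apple Qmaster code.

MAX_RESTARTS = 4         # the service is restarted at most four times
IMMEDIATE_RESTARTS = 2   # the first two restarts happen right away
MIN_UPTIME_SECONDS = 10  # uptime required before a third or fourth restart


def should_restart(previous_restarts: int, uptime_seconds: float) -> bool:
    """Decide whether a service that just stopped would be restarted.

    previous_restarts: how many times the service was already restarted.
    uptime_seconds: how long the service ran before this latest stop.
    """
    if previous_restarts >= MAX_RESTARTS:
        # The restart limit has been reached; give up.
        return False
    if previous_restarts < IMMEDIATE_RESTARTS:
        # First or second stop: restart unconditionally.
        return True
    # Third or fourth stop: restart only if the service ran long enough.
    return uptime_seconds >= MIN_UPTIME_SECONDS
```

For example, under these assumptions a service that stops for the third time after running only 5 seconds would not be restarted, while one that ran for 30 seconds would be.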

If a Batch Is Interrupted

When a service stops suddenly while in the middle of processing an Apple Qmaster batch, the cluster controller resubmits the interrupted batch in a way that prevents the reprocessing of any batch segments that were complete before the service stopped. The cluster controller delays resuming the batch for about a minute from the time it loses contact with the service.
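The resubmission behavior above can be pictured with a short sketch. It assumes that a batch is tracked as an ordered list of segment identifiers and that the delay is exactly 60 seconds; the text says only "about a minute", and how the cluster controller actually tracks segments is not documented here. All names are hypothetical.

```python
# Illustrative sketch of resuming an interrupted batch.
# Segment identifiers, names, and the exact delay are assumptions.

RESUME_DELAY_SECONDS = 60  # "about a minute"; the exact value is assumed


def segments_to_resubmit(all_segments, completed_segments):
    """Return the segments that still need processing, in original order."""
    done = set(completed_segments)
    return [segment for segment in all_segments if segment not in done]


def resume_time(contact_lost_at: float) -> float:
    """Return the earliest time (in seconds) at which the batch resumes."""
    return contact_lost_at + RESUME_DELAY_SECONDS
```

In this sketch, a batch with segments s1 through s4 in which s1 and s3 were complete before the service stopped would have only s2 and s4 resubmitted.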

If a Batch Fails

When the service is running but a batch fails to process, a service exception occurs. When this happens, the cluster controller resubmits the batch immediately, up to a maximum of two times. If the batch fails on the third submission, the distributed processing system stops resubmitting it. In Batch Monitor, the batch is moved to the History table, where the Status column indicates that a failure occurred.
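The resubmission limit above works out to three submissions in total: the original submission plus two resubmissions. A minimal sketch of that counting follows. It assumes that a failure surfaces as a Python exception; the function name and status strings are hypothetical and are not taken from Batch Monitor.

```python
# Illustrative sketch of the resubmission limit described above.
# Names and status strings are hypothetical.

MAX_RESUBMISSIONS = 2  # from the text: at most two resubmissions


def run_with_resubmission(process_batch):
    """Call process_batch() until it succeeds or the limit is reached.

    Returns a (status, submissions) tuple, where submissions counts the
    original submission plus any resubmissions.
    """
    total_allowed = 1 + MAX_RESUBMISSIONS
    for submission in range(1, total_allowed + 1):
        try:
            process_batch()
            return ("succeeded", submission)
        except Exception:
            # A failure triggers an immediate resubmission, if any remain.
            continue
    return ("failed", total_allowed)
```

A batch that fails every time is therefore reported as failed after its third submission.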

Failure Notification

The Apple Qmaster distributed processing system reports problems in two ways: by email notification and through log files.

Email Notification

When a processing service stops unexpectedly, Apple Qmaster sends a notification email to the address that was entered in the Apple Qadministrator Cluster Preferences dialog for that cluster. If no address was entered there, the email is sent to the address in the Internet settings of the computer on which the cluster controller is enabled.

Note: Apple Qmaster does not currently support SMTP servers that require authentication.
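The choice of recipient described above is a simple fallback, sketched here. The function and parameter names are hypothetical, and treating a blank entry the same as a missing one is an assumption; the text does not say what happens when neither address is available, so the sketch returns None in that case.

```python
# Illustrative sketch of choosing the notification recipient.
# Names are hypothetical; blank-entry handling is an assumption.

from typing import Optional


def notification_address(
    cluster_preferences_address: Optional[str],
    controller_internet_address: Optional[str],
) -> Optional[str]:
    """Pick the address that receives the failure notification."""
    for candidate in (cluster_preferences_address, controller_internet_address):
        # Prefer the Cluster Preferences entry, then the address from the
        # Internet settings of the cluster controller's computer.
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```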

Log Files for Individual Jobs or Batches

If a particular job or batch fails, a log file describing the failure is generated. You can find the name and location of this log file through Batch Monitor.

To find the name and location of a log file:
  1. Select the batch or job in the History table of the Batch Monitor window.
  2. Click the Info icon.

If any log files were generated because of failures in the processing of the item, the names and locations of those logs are shown.

Notification and Log Labels

The following table lists the service labels used in the email notifications and logs.

Processing service type              Notification label
Local Compressor service
Distributed Compressor service
Distributed Apple Qmaster service