Jobs

Overview

Jobs (also known as Tasks under the QueueRunner model) are units of work stored in the database. Most often they are used for work that runs unattended on the backend, but they may also be used where a record of the parameters is needed for future reference, even though the actual Task runs in real time.

History

There have been three variations on how jobs are executed.

Direct Launch (No Longer Used)

The Python front-end set up a Job in the database, then called the backend to start, pause, or stop the job. This assumed a single backend node, with no concept of queuing multiple jobs for serial operation or parallelization.

QueueRunner - ADAPT Java Based (Deprecated)

QueueTasks and QueueRunner replaced the initial Jobs system, providing a queue and the ability to run multiple nodes/processes in parallel. QueueRunner was implemented in ADAPT and consumed Jobs, each implemented as an individual QueueTask class in ADAPT.

QueueRunner - Python

In an attempt to minimize or remove our use of ADAPT, the latest iteration implements QueueRunner as a Python daemon. Individual Jobs are implemented as Plugins, which provide a level of separation from the rest of the platform.

Job Settings

These settings are stored in the Job itself, and are typically separated from the actual Job functionality.

Setting Name   Description
Name           Not used internally. This is meant as a book-keeping item for later reference by the user. For example, a researcher could name each task after the experiment number written out in their notes, making it easy to find the task and review its settings later.

Job Parameters

JobParameters are Key/Value parameters assigned to Jobs. These are stored in the tools_jobparameters table with the fields:

Field    Type
job_id   Integer
name     varchar(255)
value    varchar(255)

JobParameters contain extra JobType-specific data that is not part of the Job table. Each JobType has its own list of required and optional parameters.

Note

Prior to August 2013, JobParameters required a unique Name for each parameter entry on a job (i.e., unique on (job_id, name)). Now multiple entries with the same name may exist, requiring some changes to processing. Most parameters will still require exactly one entry, but some will allow multiple.
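
Downstream code therefore has to treat parameters as multi-valued. A minimal sketch of that handling, assuming parameters arrive as (name, value) rows; the helper names here are hypothetical:

    from collections import defaultdict

    def group_parameters(rows):
        """Group (name, value) JobParameter rows by name.

        Since multiple entries may now share a name, each name maps to a
        list of values rather than a single value.
        """
        params = defaultdict(list)
        for name, value in rows:
            params[name].append(value)
        return params

    def require_one(params, name):
        """Fetch a parameter that must have exactly one entry."""
        values = params.get(name, [])
        if len(values) != 1:
            raise ValueError("expected exactly one %r parameter, found %d"
                             % (name, len(values)))
        return values[0]

    # Two entries with the same name are now legal:
    rows = [("PluginName", "Example"), ("mediaFileId", "12"), ("mediaFileId", "13")]
    params = group_parameters(rows)
    print(require_one(params, "PluginName"))  # Example
    print(params["mediaFileId"])              # ['12', '13']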

Job Logs

These are short messages for a Job, sent back from processing.

A separate database table stores messages that may be useful to end-users and admins. These Logs can be inserted at any time, so they are useful for showing intermittent status updates during long processes. Each log entry consists simply of a date and a message (String). While very large messages could be stored in the database here, Logs are intended for short messages, and we will probably limit the message to an arbitrary length to keep large amounts of data from accumulating in the database.

In ADAPT (Java Backend), entries are added with a call to NesterJob.SaveJobLogMessage.
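
For illustration, a minimal Python sketch of inserting a (truncated) log entry; the table layout follows the date/message description above, but the exact schema, column names, and length limit shown here are assumptions:

    import sqlite3
    from datetime import datetime

    MAX_LOG_MESSAGE_LENGTH = 1024  # arbitrary cap; the real limit is undecided

    def save_job_log_message(conn, job_id, message):
        """Insert a short status message for a Job, truncating long text."""
        conn.execute(
            "INSERT INTO tools_joblogs (job_id, date, message) VALUES (?, ?, ?)",
            (job_id, datetime.utcnow().isoformat(), message[:MAX_LOG_MESSAGE_LENGTH]),
        )
        conn.commit()

    # In-memory demo:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tools_joblogs (job_id INTEGER, date TEXT, message TEXT)")
    save_job_log_message(conn, 42, "Processed 10 of 100 files")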

Job Result Files

Result Files allow the Jobs to return larger sets of data back to the user than what would be practical through the JobLogs alone. Result Files are listed in the ViewJob page for the user to download. Any file type should be supported (text, image, audio, etc.).

In ADAPT, this is done with a call to NesterJob.UploadJobResultFile.

Since Job processing may occur on separate systems (e.g., Stampede), these files have to be transferred back to the webserver. We do this by uploading the file through the Nester API (api/v0/jobResultFile).
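
A rough sketch of that upload from Python; the endpoint path comes from above, but the form field names and token handling are assumptions about the API, not its actual contract:

    import requests

    API_BASE = "http://vagrantdev.arloproject.com/api/v0/"  # ARLO_API_V0_URL

    def upload_job_result_file(job_id, path, token):
        """POST a result file back to the webserver through the Nester API."""
        with open(path, "rb") as f:
            resp = requests.post(
                API_BASE + "jobResultFile",
                data={"jobId": job_id, "token": token},  # hypothetical field names
                files={"file": f},
            )
        resp.raise_for_status()
        return resp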

QueueRunner

QueueRunner is ARLO’s new (as of April 2013) job management system. It is a custom Job Queue: originally implemented as part of ADAPT, it is now being migrated to a stand-alone Python daemon.

Background

I had heavily debated using a standard messaging queue system (think RabbitMQ) but decided against it for the sake of system deployment simplicity. This is likely something we will revisit in the future, but for now we have the QueueRunner.

We refer to QueueRunner as the overall system for managing tasks, which includes the QueueRunner process, the QueueTask classes (ADAPT), and the job Plugins (Python).

This is an extension to the original Jobs system in ARLO (database tables tools_jobs, tools_jobtypes, tools_jobstatustypes, and tools_jobparameters).

QueueRunner uses the same Job tables in the database, with some extra fields. The execution process, however, is completely different. A ‘Task’ is created in the database and left for the backend (the QueueRunner client), which periodically scans the database looking for Queued tasks that need to be executed.

QueueTasks are pre-defined operations that perform a function on the backend. Each defined task consists of a separate class in the Java backend that inherits from the generic QueueTask class. When the QueueRunner finds a task that needs to be executed, it starts the task in the corresponding class.

It is likely that multiple QueueRunner nodes will be running. For example, when deploying to the supercomputer, each node will invoke the QueueRunner to allocate work to itself. It is also possible that each deployment may have a different set of handlers. For example, on the supercomputing nodes we may want to handle tag discovery tasks but not media file imports; conversely, on the web server nodes we may not want to burden the system with tag discovery, but do want to handle the file import tasks.

Operation

To schedule a task, an entry is created in tools_jobs. A partial summary of the fields:

Field            Description
Type             Which type of job this is, i.e., which QueueTask handler will process it.
Status           New jobs are created with the status “Queued”.
startedDate      Set when the job is picked up by the QueueTask handler and started.
lastStatusDate   Updated whenever the QueueTask handler updates the job status, either switching status (start, stop, error, completed) or updating the execution percentage.
Priority         Integer specifying relative priority; lower numbers have higher priority. The typical default is 0. (See Priority below.)

If the job has any corresponding JobParameters needed for execution (likely), then these are added to tools_jobparameters.

When QueueRunner finds a Queued job, it will try to assign the task to itself, using an atomic database update to change the status_id from “Queued” to “Running”. Once it has assigned the task to itself, it will complete the task and update the database accordingly (see the Job Status options).
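
A minimal sketch of that claim step, shown here with sqlite3 for illustration; the status ids follow the Job Status table below, but the column names are assumptions about the schema:

    import sqlite3

    QUEUED, RUNNING = 2, 3  # status ids from the Job Status table

    def claim_next_job(conn, job_type_id):
        """Atomically claim the highest-priority Queued job of a given type.

        The UPDATE only succeeds while the row is still Queued, so two
        QueueRunner nodes can never claim the same job.
        """
        row = conn.execute(
            "SELECT id FROM tools_jobs"
            " WHERE type_id = ? AND status_id = ?"
            " ORDER BY priority ASC LIMIT 1",
            (job_type_id, QUEUED),
        ).fetchone()
        if row is None:
            return None  # nothing queued for this type
        cur = conn.execute(
            "UPDATE tools_jobs SET status_id = ? WHERE id = ? AND status_id = ?",
            (RUNNING, row[0], QUEUED),
        )
        conn.commit()
        # rowcount == 0 means another node won the race; retry on the next scan.
        return row[0] if cur.rowcount == 1 else None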

Job Plugins

For the Python implementation of QueueRunner, we use a plugin framework for implementing Jobs.
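
The plugin interface itself isn’t specified here, so the following is only a hypothetical sketch of what a plugin might look like; the class and method names are assumptions:

    class JobPlugin(object):
        """Hypothetical plugin base class; the real interface may differ."""

        def __init__(self, job_id, parameters):
            self.job_id = job_id
            self.parameters = parameters  # dict of name -> list of values

        def run(self):
            raise NotImplementedError


    class EchoPlugin(JobPlugin):
        """Trivial plugin that just prints its parameters."""

        def run(self):
            for name, values in sorted(self.parameters.items()):
                print("Job %d: %s = %r" % (self.job_id, name, values))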

Launching a plugin-based Job consists of:

  • Setting the Job Type to “JobPlugin” (15)
  • Adding a JobParameter “PluginName” set to the name of the corresponding Job plugin.
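
Continuing the earlier sqlite3 sketch, launching such a Job might look like the following; the type id 15 and the “PluginName” parameter come from above, while the column names remain assumptions:

    JOB_TYPE_PLUGIN = 15  # the "JobPlugin" JobType
    QUEUED = 2

    def launch_plugin_job(conn, plugin_name, priority=0):
        """Create a Queued plugin-based Job plus its PluginName parameter."""
        cur = conn.execute(
            "INSERT INTO tools_jobs (type_id, status_id, priority) VALUES (?, ?, ?)",
            (JOB_TYPE_PLUGIN, QUEUED, priority),
        )
        job_id = cur.lastrowid
        conn.execute(
            "INSERT INTO tools_jobparameters (job_id, name, value) VALUES (?, ?, ?)",
            (job_id, "PluginName", plugin_name),
        )
        conn.commit()
        return job_id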

Stopping Jobs

To stop running jobs:

  • “Queued” Tasks can be instantly changed to “Stopped”.
  • “Running” Tasks can’t simply be updated; currently they need to complete on their own.

Resuming Jobs

“Stopped” Jobs can be changed to “Queued”.

Priority

Jobs execute based on ‘priority’, an Integer specifying the relative priority of the job. Lower numbers have higher priority (think of Linux’s ‘nice’), with the default priority being 0.

Within the existing QueueRunner, job priority is specific to, and implemented within, the Task handlers. In the current implementation the JobType takes precedence, and the ‘priority’ field only comes into play among jobs of the same type.

For example, if there are ten ImportAudioFile tasks, and ten SupervisedTagDiscovery tasks, the imports will have priority over the discovery tasks, regardless of the Priority value.

The Priority was originally implemented for Supervised Tag Discovery, where some users may launch Jobs to analyze thousands of files. In the initial implementation, priority is assigned based on the number of files to search, so a small job (of, say, 10 files) will be scheduled ahead of a larger one (of, say, 1,000).

Job Status

Id   Status     Description
1    Unknown    Job is assumed not to be part of the QueueRunner system, or is in an invalid state, and will be ignored.
2    Queued     Task is pending execution.
3    Running    Task has been assigned and is being executed.
4    Error      An error occurred; the task is stopped and will not complete.
5    Stopped    Task was manually stopped before beginning execution.
6    Complete   Task has completed successfully.
7    Assigned   Task is assigned to a client for execution, but has not started. Under normal conditions, Tasks should not exist in this state for extended periods (only between being assigned within the API processing and being started by the QueueRunner client).
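
For reference, the same table expressed as a small Python enum, along with the manual transitions described in Stopping/Resuming Jobs above:

    from enum import IntEnum

    class JobStatus(IntEnum):
        UNKNOWN = 1
        QUEUED = 2
        RUNNING = 3
        ERROR = 4
        STOPPED = 5
        COMPLETE = 6
        ASSIGNED = 7

    # Manual transitions allowed from the UI:
    STOPPABLE = {JobStatus.QUEUED}   # Queued  -> Stopped
    RESUMABLE = {JobStatus.STOPPED}  # Stopped -> Queued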

Launching QueueRunner (Python Based)

Currently, the Python QueueRunner daemon only runs plugin-based Jobs. The ADAPT daemon will continue to run the existing Java-based jobs for now, though eventually these will likely be migrated to the new QueueRunner as well.

Settings

Settings are by default sourced from the Django settings.py file, and may be overridden on the command line.

Name              Description

General Settings

ARLO_API_V0_URL   Base URL for the v0 API (e.g., http://vagrantdev.arloproject.com/api/v0/).

QUEUE_RUNNER Dictionary Settings

API_TOKEN         Authentication Token for ARLO API access.
CYCLE_TIME        (Seconds) If 0, don’t cycle; exit if no queued jobs are found.
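
A sketch of how that resolution might look, assuming a configured Django environment; the flag names here are illustrative, not the daemon’s actual options:

    import argparse

    from django.conf import settings  # requires DJANGO_SETTINGS_MODULE to be set

    def load_settings(argv=None):
        """Resolve QueueRunner settings: settings.py defaults, CLI overrides."""
        queue_runner = getattr(settings, "QUEUE_RUNNER", {})
        parser = argparse.ArgumentParser(description="ARLO QueueRunner daemon")
        parser.add_argument("--api-url",
                            default=getattr(settings, "ARLO_API_V0_URL", None))
        parser.add_argument("--api-token",
                            default=queue_runner.get("API_TOKEN"))
        parser.add_argument("--cycle-time", type=int,
                            default=queue_runner.get("CYCLE_TIME", 0),
                            help="seconds between queue scans; 0 = exit when idle")
        return parser.parse_args(argv)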

Defined JobTypes

Note

The following covers the Java-based ADAPT implementation. Job processing is being migrated to a new Python-based QueueRunner client, implemented along with a plugin-based system for defining JobTypes.

QueueTask Based Jobs (ADAPT)

QueueTask is a base class for building handlers. It contains the database interface and allocation logic. Specific handlers will then build the task handling on top of this.

QueueTask Name                               Database Id / Name                        Docs
QueueTask_SupervisedTagDiscovery             1 - SupervisedTagDiscovery (Deprecated)
                                             2 - RandomWindowTagging (Deprecated)
                                             3 - TagAnalysis (Deprecated)
                                             4 - AdaptAnalysis (Deprecated)
                                             5 - Transcription (Deprecated)
                                             6 - UnknownTagDiscovery (Deprecated)
QueueTask_UnsupervisedTagDiscovery           7 - UnsupervisedTagDiscovery              UnsupervisedTagDiscovery
QueueTask_Test                               8 - Test                                  ADAPT Test Job: a task specifically for testing during development. Any parameters may be used and will be ignored.
QueueTask_ImportAudioFile                    9 - ImportAudioFile                       ImportAudioFile
QueueTask_SupervisedTagDiscoveryParent       10 - SupervisedTagDiscoveryParent         SupervisedTagDiscovery
QueueTask_SupervisedTagDiscoveryChild        11 - SupervisedTagDiscoveryChild          SupervisedTagDiscovery
QueueTask_ExpertAgreementClassification      12 - ExpertAgreementClassification        Expert Agreement Classification
QueueTask_WekaJobWrapper                     13 - WekaJobWrapper                       WekaJobWrapper
QueueTask_LibraryMediaFileValidation         14 - LibraryMediaFileValidation           LibraryMediaFileValidation