
Task Queue Java API Overview

With the Task Queue API, applications can perform work outside of a user request but initiated by a user request. If an app needs to execute some background work, it may use the Task Queue API to organize that work into small, discrete units, called Tasks. The app then inserts these Tasks into one or more Queues. App Engine automatically detects new Tasks and executes them when system resources permit.

The Task Queue service is currently released as an experimental feature. This means the API and behavior of the service may change in ways that are not compatible with earlier releases, even for minor releases of the runtime environments. Please try out this feature and let us know what you think!

Using Task Queues in Java

A Java app sets up queues using a configuration file named queue.xml, in the WEB-INF/ directory inside the WAR. See Java Task Queue Configuration. If an app does not have a queue.xml file, it has a queue named default with some default settings.

To enqueue a task, you get a Queue using the QueueFactory, then call its add() method. You can get a named queue specified in the queue.xml file using the getQueue() method of the factory, or you can get the default queue using getDefaultQueue(). You can call the Queue's add() method with a TaskOptions instance (produced by TaskOptions.Builder), or you can call it with no arguments to create a task with the default options for the queue.

Note: While this feature is experimental, the Java package for the Task Queue API is com.google.appengine.api.labs.taskqueue. When the feature is formally added to the main API, the package path will change.

The following code adds a task to a queue with options:

import com.google.appengine.api.labs.taskqueue.Queue;
import com.google.appengine.api.labs.taskqueue.QueueFactory;
import static com.google.appengine.api.labs.taskqueue.TaskOptions.Builder.*;

// ...
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(url("/worker").param("key", key));

The default queue will call the request handler at the URL /worker with the parameter key at the rate set in the queue.xml file, or the default rate of 5 tasks per second.

Task Concepts

In App Engine background processing, a task is a complete description of a small unit of work. This description consists of two parts:

  • A data payload which parameterizes the task.
  • Code which implements the task.

As an example, consider a calendaring application which needs to notify an invitee, via email, that an event has been updated. The particular 'email notification task' for this might be defined as:

  • Task - Email Notification
    • data: the email address and name of the invitee, along with a description of the event
    • code: function which substitutes the relevant strings into an email template and then sends the mail.

Perhaps there are multiple invitees who need to be updated for a given event. The developer may choose to create a new notification Task for each attendee individually:

  • Task 1
    • data: invitee1's email address
    • code: email_function (prepares email contents and sends)
  • Task 2
    • data: invitee2's email address
    • code: email_function (prepares email contents and sends)
  • Task 3
    • data: invitee3's email address
    • code: email_function (prepares email contents and sends)
  • ...

As this example shows, multiple tasks may share the same code and differ only in their data payload. Similarly, multiple tasks may share the same data payload but reference different code, as in this example for an e-commerce site:

  • Task 1 - Send Receipt to buyer
    • data: an order_description object
    • code: email_buyer_receipt function
  • Task 2 - Initiate Transaction
    • data: an order_description object (same as Task 1)
    • code: charge_order function
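
The pairing of data and code described above can be sketched in plain Java. This is only an illustration of the concept, not the Task Queue API: the TaskSketch and TaskSketchDemo classes and the log messages are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of a task: a data payload plus the code that consumes it.
final class TaskSketch<T> {
    final T data;
    final Consumer<T> code;

    TaskSketch(T data, Consumer<T> code) {
        this.data = data;
        this.code = code;
    }

    void run() {
        code.accept(data);
    }
}

final class TaskSketchDemo {
    // Builds two tasks that share one payload but reference different code,
    // runs them in order, and returns a log of what each one did.
    static List<String> runOrderTasks(String orderDescription) {
        List<String> log = new ArrayList<>();
        Consumer<String> emailBuyerReceipt = order -> log.add("receipt sent for " + order);
        Consumer<String> chargeOrder = order -> log.add("charged " + order);

        List<TaskSketch<String>> tasks = new ArrayList<>();
        tasks.add(new TaskSketch<>(orderDescription, emailBuyerReceipt));
        tasks.add(new TaskSketch<>(orderDescription, chargeOrder));
        for (TaskSketch<String> task : tasks) {
            task.run();
        }
        return log;
    }
}
```

The opposite case (same code, different data) would reuse one Consumer across several TaskSketch instances, each with its own payload.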

More examples of Tasks include:

  • Feed Aggregator. A feed reader application needs to fetch, automatically and periodically, the contents of various news feeds from across the Internet. A single task might consist of:
    • data: the URL of a feed and the timestamp when it was last checked
    • code: a function which performs a URL Fetch to retrieve a feed, parse its contents, and insert new items into the database
  • Schema Migration. A new version of an application needs to programmatically iterate through old entities in the datastore and update them to a new schema. A single task might consist of:
    • data: the key of an Entity which needs to be updated
    • code: a function which analyzes the Entity's model, adds or modifies properties, and updates it in the Datastore.

Tasks as Offline Web Hooks

For App Engine to support concrete Task instances, two mechanisms are needed:

  • Data Storage: a language-agnostic container for arbitrary data
  • Code Reference: a language-agnostic mechanism for referencing arbitrary code, along with a means to pass in parameters.

Fortunately, the Internet provides such a solution already, in the form of an HTTP request and its response. The data payload is the content of the HTTP request, such as web form variables, XML, JSON, or encoded binary data. The code reference is the URL itself; the actual code is whatever logic the server executes in preparing the response.

Revisiting the calendar app example above, the tasks can be revised as:

  • Task 1
    • data: an HTTP POST message containing a form variable of email_address = attendee1's email
    • code: a URL which, when requested with the HTTP POST, will execute code on the server side that sends the mail

Using this model, App Engine's Task Queue API allows you to specify tasks as HTTP Requests (both the contents of the request as its data, and the target URL of the request as its code reference). Programmatically referring to a bundled HTTP request in this fashion is sometimes called a "web hook."
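
To make the web hook model concrete, the following sketch bundles a worker URL with form variables and encodes them as a POST body. It uses only the standard Java library and is not part of the Task Queue API; the WebHookSketch class and the /tasks/send_mail path are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a task expressed as an HTTP request.
final class WebHookSketch {
    // Code reference: the relative URL of the handler.
    final String workerUrl;
    // Data payload: form variables, kept in insertion order.
    final Map<String, String> formParams = new LinkedHashMap<>();

    WebHookSketch(String workerUrl) {
        this.workerUrl = workerUrl;
    }

    WebHookSketch param(String name, String value) {
        formParams.put(name, value);
        return this;
    }

    // Encodes the payload as an application/x-www-form-urlencoded body.
    String encodeBody() {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> entry : formParams.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }
}
```

For the calendar example, new WebHookSketch("/tasks/send_mail").param("email_address", "attendee1@example.com") would describe Task 1: the URL names the code and the encoded body carries the data.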

Importantly, the offline nature of the Task Queue API allows you to specify web hooks ahead of time, without waiting for their actual execution. Thus, an application might create many web hooks at once and then hand them off to App Engine; the system will then process them asynchronously in the background (by 'invoking' the HTTP request). This web hook model enables efficient parallel processing - App Engine may invoke multiple tasks, or web hooks, simultaneously.

To summarize, the Task Queue API allows a developer to execute work in the background, asynchronously, by chunking that work into offline web hooks. The system will invoke those web hooks on the application's behalf, scheduling for optimal performance by possibly executing multiple web hooks in parallel. This model of granular units of work, based on the HTTP standard, allows App Engine to efficiently perform background processing in a way that works with any programming language or web application framework.

Worker URLs and Task Names

As mentioned above, a Task references its implementation via URL. For example, a task which fetches and parses an RSS feed might use a worker URL called /app_worker/fetch_feed. With the App Engine Task Queue API, you may use any URL as the worker for a task, so long as it is within your application; all Task worker URLs must be specified as relative URLs.

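import com.google.appengine.api.labs.taskqueue.Queue;
import com.google.appengine.api.labs.taskqueue.QueueFactory;
import com.google.appengine.api.labs.taskqueue.TaskOptions.Method;
import static com.google.appengine.api.labs.taskqueue.TaskOptions.Builder.*;

// ...
        Queue queue = QueueFactory.getDefaultQueue();

        // A task that names only its worker URL.
        queue.add(url("/path/to/my/worker"));

        // A task whose parameters travel in the query string of a GET request.
        queue.add(url("/path?a=b&c=d").method(Method.GET));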

In addition to a Task's contents (its data payload and worker URL), it is also possible to specify a Task's name. This provides a lightweight mechanism for ensuring once-only semantics. Once a Task with name N is written, any subsequent attempts to insert a Task named N will fail. Eventually (at least seven days after the task successfully executes), the task will be deleted and the name N can be reused.
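
The once-only behavior of task names can be modeled with a small in-memory registry. This is a sketch of the semantics described above, not the service's implementation: the TaskNameRegistry class is hypothetical, and it assumes a name is released exactly seven days after execution, whereas the service only promises "at least" seven days.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of task-name reservation; not the real service.
final class TaskNameRegistry {
    // Assumption: a name is released exactly seven days after successful execution.
    static final long TOMBSTONE_MILLIS = 7L * 24 * 60 * 60 * 1000;

    // Maps a task name to the time it executed, or to null while it is pending.
    private final Map<String, Long> executedAt = new HashMap<>();

    // Returns true if the name was free and is now reserved, false otherwise.
    synchronized boolean tryInsert(String name, long nowMillis) {
        if (executedAt.containsKey(name)) {
            Long done = executedAt.get(name);
            boolean expired = done != null && nowMillis - done >= TOMBSTONE_MILLIS;
            if (!expired) {
                return false;
            }
        }
        executedAt.put(name, null);
        return true;
    }

    // Records that the named task executed successfully at the given time.
    synchronized void markExecuted(String name, long nowMillis) {
        if (executedAt.containsKey(name)) {
            executedAt.put(name, nowMillis);
        }
    }
}
```

A second tryInsert for the same name fails while the task is pending and for seven days after markExecuted; after that, the name can be reserved again.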

Task Request Headers

Requests from the Task Queue service contain the following HTTP headers:

  • X-AppEngine-QueueName, the name of the queue (possibly default)
  • X-AppEngine-TaskName, the name of the task, or a system-generated unique ID if no name was specified
  • X-AppEngine-TaskRetryCount, the number of times this task has been retried; for the first attempt, this value is 0
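
A handler might collect these headers into a small value object, as in the sketch below. The TaskHeadersSketch class is hypothetical, and it reads from a plain Map so that it stays self-contained; in a servlet, the same values would come from HttpServletRequest.getHeader(). Treating a missing retry header as 0 is an assumption of this example.

```java
import java.util.Map;

// Hypothetical value object for the task headers listed above.
final class TaskHeadersSketch {
    final String queueName;
    final String taskName;
    final int retryCount;

    private TaskHeadersSketch(String queueName, String taskName, int retryCount) {
        this.queueName = queueName;
        this.taskName = taskName;
        this.retryCount = retryCount;
    }

    // Reads the three headers from a map keyed by the exact header names.
    static TaskHeadersSketch from(Map<String, String> headers) {
        String retries = headers.get("X-AppEngine-TaskRetryCount");
        // Assumption: a missing retry header is treated as a first attempt.
        int retryCount = (retries == null) ? 0 : Integer.parseInt(retries.trim());
        return new TaskHeadersSketch(
                headers.get("X-AppEngine-QueueName"),
                headers.get("X-AppEngine-TaskName"),
                retryCount);
    }

    boolean isFirstAttempt() {
        return retryCount == 0;
    }
}
```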

Securing URLs for Tasks

You can prevent users from accessing the URLs of tasks by restricting access to administrator accounts. Task queues can access admin-only URLs. You can read about restricting URLs at Security and Authentication. For example, you could use the following in web.xml to restrict everything starting with /tasks/ to admin-only:

    <security-constraint>
        <web-resource-collection>
            <url-pattern>/tasks/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>admin</role-name>
        </auth-constraint>
    </security-constraint>

For more on the format of web.xml, see the documentation on the deployment descriptor.

To test a task web hook, sign in as an administrator and visit the URL of the handler in your browser.

Task Execution

Once you have created a Task and inserted it into a queue for processing, App Engine will execute it as soon as possible (subject to the app's scheduling criteria, if specified). Since a task is a web hook, its lifecycle is the same as any other request in App Engine: it may use the same APIs and is subject to the same constraints as an 'online' request from a user's browser. Notably, this means that the lifetime of a single task's execution is limited to 30 seconds. If your task's execution nears the 30 second limit, App Engine will raise an exception, which you may catch in order to quickly save your work or log your progress.
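
Rather than relying only on the exception, a handler can track its own time budget and stop before the limit. The sketch below shows one possible approach using only the standard library; the BudgetedWorkerSketch class is hypothetical, and the idea of splitting work into steps is an assumption about how your task is structured.

```java
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical helper that stops work before a time budget runs out.
final class BudgetedWorkerSketch {
    // Runs steps in order until the time budget is used up.
    // Returns how many steps completed, so the caller can save that position
    // (for example, by enqueuing a follow-up task for the remaining steps).
    static int runWithinBudget(List<Runnable> steps, long budgetMillis, LongSupplier clockMillis) {
        long start = clockMillis.getAsLong();
        int completed = 0;
        for (Runnable step : steps) {
            if (clockMillis.getAsLong() - start >= budgetMillis) {
                break; // Out of budget: stop before starting another step.
            }
            step.run();
            completed++;
        }
        return completed;
    }
}
```

In a real handler you might pass System::currentTimeMillis as the clock and a budget somewhat below 30 seconds (the exact margin is a judgment call). The check happens only between steps, so each step must itself be short.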

When the developer inserts a new task into a queue, the order in which that task will execute (relative to other tasks) is governed by the contents and properties of that queue. However, it is possible to specify certain properties (such as an ETA) which request special scheduling on a per-task basis.

If the execution of a particular Task fails (by returning any HTTP status code outside of the range 200-299), App Engine will retry it until it succeeds. The system will back off gradually so as not to flood your application with too many requests, but it will retry failed tasks at least once a day.
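
The success rule and the shape of the backoff can be sketched as follows. The success check follows the 200-299 rule stated above. The delay schedule is purely an assumption for illustration (doubling per retry, capped at one day), since the actual schedule used by App Engine is not specified here. RetryPolicySketch is a hypothetical name.

```java
// Hypothetical model of task retry behavior; the real schedule is not documented here.
final class RetryPolicySketch {
    static final long ONE_DAY_SECONDS = 24L * 60 * 60;

    // A task attempt succeeds only if the handler returns a 2xx status code.
    static boolean isSuccess(int httpStatus) {
        return httpStatus >= 200 && httpStatus <= 299;
    }

    // Assumed schedule: double the delay after each retry, capped at one day
    // so that a failed task is still retried at least once a day.
    static long delaySeconds(int retryCount, int baseSeconds) {
        int shift = Math.max(0, Math.min(retryCount, 30));
        long delay = ((long) baseSeconds) << shift;
        return Math.min(delay, ONE_DAY_SECONDS);
    }
}
```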

When implementing the code for Tasks (as worker URLs within your app), it is important that you consider whether the task is idempotent. App Engine's Task Queue API is designed to invoke a given task only once; however, it is possible in exceptional circumstances that a Task may execute multiple times (e.g. in the unlikely case of major system failure). Thus, your code must ensure that repeated execution has no harmful side effects.
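
One common way to make a worker idempotent is to remember which tasks have already been handled and skip repeats. The sketch below keeps that record in memory to stay self-contained; IdempotentWorkerSketch is a hypothetical class, and a real app would need durable storage (such as the datastore) for the record, which this example does not show.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker that ignores repeated deliveries of the same task.
final class IdempotentWorkerSketch {
    // Names of tasks already handled (in memory only, for illustration).
    private final Set<String> completed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger emailsSent = new AtomicInteger();

    // Returns true if the work ran, false if this task was already handled.
    boolean handle(String taskName, String emailAddress) {
        if (!completed.add(taskName)) {
            return false; // Repeated delivery: skip the side effect.
        }
        emailsSent.incrementAndGet(); // Stand-in for actually sending the mail.
        return true;
    }

    int emailsSent() {
        return emailsSent.get();
    }
}
```

The task name from the X-AppEngine-TaskName header is one candidate for the key, since it is unique per task.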

If a task performs sensitive operations (such as modifying important data), the developer may wish to protect the worker URL to prevent a malicious external user from calling it directly. This is possible by marking the worker URL as admin-only in the app configuration.

Queue Concepts

Thus far, this document has explained how Tasks can be used to encapsulate small chunks of work for asynchronous execution. When used in large numbers, Tasks provide an efficient and powerful tool for background processing. However, the developer may need to manage the execution of these tasks, in particular to control the rate at which tasks are invoked so as not to exhaust resources.

The Task Queue API provides Queues as a container for tasks. All new tasks must be inserted into a queue; a developer can influence task execution by modifying properties of the queue. As an example, a developer may wish to ensure that an app sends no more than 10 emails per second. This can be accomplished with a queue called email-throttle, configured with a rate of 10 invocations per second. The app's code then inserts all tasks which send email into this email-throttle queue. The App Engine system respects the configuration of the email-throttle queue and invokes its tasks at a rate of less than or equal to 10 per second, even if thousands of tasks are inserted at a single instant.
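
To illustrate how a rate limit can hold even when thousands of tasks arrive at once, here is a minimal token bucket sketch. It assumes a token bucket model, which this document does not confirm as App Engine's actual mechanism; TokenBucketSketch and its parameters are hypothetical.

```java
// Hypothetical token bucket; an assumed model, not App Engine's implementation.
final class TokenBucketSketch {
    private final double ratePerSecond;
    private final double capacity;
    private double tokens;
    private long lastRefillMillis;

    TokenBucketSketch(double ratePerSecond, int capacity, long startMillis) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity; // Start with a full bucket.
        this.lastRefillMillis = startMillis;
    }

    // Returns true if one task may be invoked at the given time.
    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = Math.max(0L, nowMillis - lastRefillMillis);
        // Add tokens for the elapsed time, never exceeding the capacity.
        tokens = Math.min(capacity, tokens + elapsed * ratePerSecond / 1000.0);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a rate of 10 per second and a capacity of 10, a burst of a thousand requests at one instant yields only 10 invocations; the rest wait until tokens are refilled.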

Beyond influencing the rate of task execution, Queues also determine the order in which tasks are consumed by the system (notwithstanding the special task scheduling parameters). Fundamentally, Queues deliver a best-effort FIFO order (first in, first out): a developer creates new tasks and inserts them at the tail of the queue, and the system removes tasks from the head for execution. However, the system attempts to deliver the lowest latency possible for any given task via specially optimized notifications to the scheduler. Thus, in the case that a queue has a large backlog of tasks, the system's scheduling may "jump" new tasks to the head of the queue.

Although a queue defines a general FIFO ordering, tasks are not executed entirely serially. Multiple tasks from a single queue may be executed simultaneously by the scheduler, so the usual locking and transaction semantics need to be observed for any work performed by a task.

It is important to understand that Queues are a mechanism for manipulating tasks in aggregate; Queues do not dictate the contents of any given Task. It is possible for a single Queue to contain many different types of tasks, which have varying data payloads and worker URLs.

The Default Queue

For convenience, App Engine provides a default queue to all applications. A developer may use this queue immediately without any additional configuration. This queue automatically has a throughput rate of five task invocations per second; however, you may configure its properties in the same fashion as any user-defined queue by adding configuration for a queue named default. Code may always insert new Tasks into the default queue, but if you wish to disable execution of these tasks, you may do so by adding the queue to your configuration and lowering its rate to zero.

Queue Default URLs

You can specify the worker URL for a Task with the url() method of TaskOptions. If you do not specify a worker URL, the Task uses a default worker URL named after the queue:

/_ah/queue/queue_name

As an example, for a queue named email-worker-queue, you may implement a default request handler at /_ah/queue/email-worker-queue (within your application). Any new Task inserted into the email-worker-queue which does not have a worker URL of its own (in other words, it's purely data with no code reference) will be invoked using the URL of /_ah/queue/email-worker-queue.

A queue's default URL is used if, and only if, a Task does not have a worker URL of its own. If a task does have its own worker URL, then it will only be invoked at that URL, never another. (Once inserted into a Queue, a Task is immutable.)

Please note that if a Task does not have a worker URL, then the Task will be invoked against the queue's default URL even if there is currently no handler defined for the queue's default URL! In this case, the system would attempt to invoke the Task with a nonexistent URL which would fail with a 404 (this 404, along with the exact URL that was tried, will be available in your application's logs). Due to the failure state of this 404, the system will save the Task and retry it until it is eventually successful. You can clear (or 'flush') tasks that can't complete successfully using the Administration Console.
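
The rule for choosing between a task's own URL and the queue's default URL is simple enough to express directly. The WorkerUrlResolver class below is a hypothetical illustration of that rule, not part of the API; treating an empty string the same as a missing URL is an assumption of this sketch.

```java
// Hypothetical illustration of how the worker URL for a task is chosen.
final class WorkerUrlResolver {
    // Returns the task's own URL if it has one, else the queue's default URL.
    static String resolve(String queueName, String taskUrl) {
        // Assumption: an empty URL counts as "no worker URL".
        if (taskUrl != null && !taskUrl.isEmpty()) {
            return taskUrl;
        }
        return "/_ah/queue/" + queueName;
    }
}
```

For example, resolve("email-worker-queue", null) yields /_ah/queue/email-worker-queue, whether or not a handler exists at that path.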

Tasks and App Versions

All versions of an application share the same task queues. The version of the app used to perform a task depends on how the task was enqueued.

If the version of the app that enqueues a task is the default version (http://app-id.appspot.com or a Google Apps domain), the queue will use the default version of the app to perform the task. This holds even if the default version changes after the task is enqueued: the queue will then use the task's URL path with the new default version of the app.

If the version of the app that enqueues a task is not the default version when the task is enqueued (such as http://3.latest.app-id.appspot.com/), the queue will use that version of the app to perform the task, regardless of which version is the default version.
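
The two cases above reduce to a single decision, sketched here. TaskVersionSketch is a hypothetical class, and representing versions as plain strings is an assumption made for illustration.

```java
// Hypothetical illustration of which app version performs a task.
final class TaskVersionSketch {
    // enqueuingVersion: the version that added the task.
    // defaultAtEnqueue: the default version when the task was added.
    // defaultAtExecution: the default version when the task runs.
    static String versionToRun(String enqueuingVersion, String defaultAtEnqueue, String defaultAtExecution) {
        boolean enqueuedByDefault = enqueuingVersion.equals(defaultAtEnqueue);
        // Tasks from the default version follow the default; others stay pinned.
        return enqueuedByDefault ? defaultAtExecution : enqueuingVersion;
    }
}
```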

Managing Task Queues

You can manage task queues for an application using the Administration Console. You can use the Console to list queues and their configuration, inspect the tasks currently waiting to be executed, and manually delete individual tasks or every task in a queue. This is useful if a task cannot be completed successfully and is stuck waiting to be retried.

To manage queues, sign in to the Administration Console and select "Task Queues." Note that the default queue only appears in the Console after the app has enqueued a task to it for the first time.

Task Queues and the Development Server

When your app is running in the development server, tasks are automatically executed at the appropriate time just as in production. However, there are minor differences in behavior between the development server and production that you should be aware of. First, the development server does not respect the "rate" and "bucket-size" attributes of your queues. As a result, tasks will be executed as close to their scheduled execution times as possible, and setting a rate of 0 will not prevent tasks from being automatically executed. Second, the development server does not retry tasks. Finally, the development server does not preserve queue state across server restarts. We hope to implement support for these features in the development server in a future release.

To disable automatic execution of tasks, set the "task_queue.disable_auto_task_execution" jvm flag:

--jvm_flag=-Dtask_queue.disable_auto_task_execution=true

You can examine and manipulate tasks from the developer console:

http://localhost:8080/_ah/admin/taskqueue

To execute tasks, select the queue by clicking on its name, then select the tasks to execute. To clear a queue without executing any tasks, click the "Flush Queue" button.

Quotas and Limits

Execution of a task counts toward the following quotas:

  • Requests
  • CPU Time
  • Incoming Bandwidth
  • Outgoing Bandwidth

The act of executing a task consumes bandwidth-related quotas for the request and response data, just as if the request handler were called by a remote client. When the task queue processes a task, the response data is discarded.

For more information on quotas, see Quotas, and the "Quota Details" section of the Admin Console.

In addition to quotas, the following limits apply to the use of Task Queues:

Limit                                                        Amount
task object size                                             10 kilobytes
number of active queues (not including the default queue)    10
total queue execution rate                                   20 task invocations per second
maximum countdown/ETA for a task                             30 days from the current date and time

Note: Quotas and limits for the Task Queue service may change while the feature is still experimental.
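
An app might check a task against the size and ETA limits before enqueuing it. The sketch below hard-codes the current values from the table; TaskLimitsSketch is a hypothetical helper, and it assumes that a kilobyte means 1024 bytes, which the table does not specify.

```java
// Hypothetical pre-flight check against the limits listed above.
final class TaskLimitsSketch {
    // Assumption: "10 kilobytes" is interpreted as 10 * 1024 bytes.
    static final int MAX_TASK_BYTES = 10 * 1024;
    static final long MAX_COUNTDOWN_MILLIS = 30L * 24 * 60 * 60 * 1000;

    // Returns true if the task size and countdown are both within limits.
    static boolean fitsLimits(int taskSizeBytes, long countdownMillis) {
        return taskSizeBytes >= 0
                && taskSizeBytes <= MAX_TASK_BYTES
                && countdownMillis >= 0
                && countdownMillis <= MAX_COUNTDOWN_MILLIS;
    }
}
```

Because these limits may change while the feature is experimental, constants like these should be treated as a snapshot rather than a contract.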

Status of the Task Queue API

The App Engine Task Queue API is currently in an experimental state, released as an App Engine Labs feature. We are eager to get your feedback so that we may improve the API. During this initial, experimental phase, the API will be located under the App Engine Labs package:

com.google.appengine.api.labs.taskqueue

In particular, we are looking for feedback on the following:

  • the usability of the API itself
  • the behavior and policies which App Engine uses when executing Tasks, e.g.:
    • how much Task throughput per second your app needs
    • how long you need Tasks to run
    • queue lifecycle management
  • what debugging tools and reporting you need to maximize the utility of Task Queues

Once the API is finalized, we'll move it out of Labs to its final location at:

com.google.appengine.api.taskqueue

Of course, we'll give multiple weeks' worth of notice before this happens and we'll work with developers on the best transition possible. Please keep in mind that the following may change before the Task Queue API leaves Labs:

  • Quotas and Limits - most likely, we'll increase the number of Tasks your app can use in a given day
  • The API itself - we may need to make changes to the API, due to usability or bugs
  • Billing - we may change how we charge for usage of the Task Queue API.