Dequeuing Settings
Normally, workers would dequeue a batch of tasks from the queue, process them, report as completed (or failed), then repeat on the next batch. The more workers you have, the more tasks you can process at once. There are no technical limits on the maximum number of inflight tasks. Also, workers dequeue tasks as fast as they can process them. However, this is not always desirable. Moab allows you to limit the rate of dequeuing and the number of inflight tasks if needed.
Concurrency Limiting (number of inflight tasks)
Sometimes it is necessary to limit the number of concurrently processing tasks because of, for example, technical
limitations on a number of connections to an underlying database or contractual limitations on a 3rd party integration.
This can be achieved by setting dequeuing_settings.max_inflight_tasks
parameter at the queue level. Default 0
means unlimited. Moab will ensure that at most max_inflight_tasks
tasks are in INFLIGHT
state at any given moment in
time. Please keep in mind that it does not mean that all those tasks are actively being processed by workers, since any
worker can die and some tasks will remain in INFLIGHT
state until keepalive_timeout_in_seconds
expires. When the limit
is reached, Dequeue
returns a successful empty response.
The mechanism for limiting the number of inflight tasks is implemented inside the Moab queue on the server side and the number is maintained regardless of how many workers are pulling tasks from the queue simultaneously.
Rate Limiting
Rate of dequeuing can be limited with Token Bucket algorithm. dequeuing_settings.rate_limiting.max_tokens
is the size
of the bucket. dequeuing_settings.rate_limiting.interval
and dequeuing_settings.rate_limiting.interval_unit
specify
the interval at which the bucket is fully refilled. Units can be SECONDS
, MINUTES
or HOURS
. Here are some examples:
// 1k tasks per second
{
"max_tokens": 1000,
"interval": 1,
"interval_unit": "SECONDS"
}
// 2k tasks per 1 minute
{
"max_tokens": 2000,
"interval": 1,
"interval_unit": "MINUTES"
}
// 15 tasks per hour
{
"max_tokens": 15,
"interval": 1,
"interval_unit": "HOURS"
}
It is possible to spend all tokens at once. Consider the difference between 1 task per second and 10 tasks per 10 seconds. Both buckets are refilled at the same rate (1 token per second), but when buckets are full the latter can release more tokens at once.
If both are configured, rate limiting works together with max_inflight_tasks
as expected: Dequeue
returns a
successful empty response if either of limits is reached.
The mechanism for rate limiting is also implemented inside the Moab queue on the server side and the rate is spread between all the workers which are pulling tasks from the queue. The distribution of the rate between workers is non-deterministic, only the total rate is guaranteed to be limited.
Pause
Entire dequeuing can be paused simply by setting dequeuing_settings.dequeuing_paused
to true. Any worker who pulls a
paused queue will receive a successful empty response. This can be used by an operator in case of emergency when a queue
has tasks in it, but there is a known issue that prevents those tasks from being completed successfully (for example, a
dependency is temporarily unavailable). Returning a successful empty response mitigates any panic behavior from workers.
It looks just like an empty queue for them, no errors, no need to retry requests.
Please note that Dequeue
method is eventually consistent. Queues definitions are cached to reduce the load on control
plane. So any change made by UpdateQueue
, such as changing dequeuing settings, will be propagated here in about 10 seconds.