I want to make out with Graphite’s face.
The baseline on the left is when I started measuring a worker that was feeding off a Delayed::Job queue w/ millions of jobs in it. The first jump in performance is when I realized the index that comes w/ Delayed::Job for MongoMapper does not include locked_by (which it needs to when you start to have millions of jobs in the queue).
Due to the way Delayed::Job is built, when you have millions of jobs of equivalent priority that are all eligible for execution based on their run_at timestamp, the query is going to scan thousands of eligible jobs in the index to see if any are “locked_by” the current worker. (In my case, I have 20-40 workers feeding off this queue.) Adding a compound index on priority, locked_by, and run_at (priority_1_locked_by_1_run_at_1) drastically reduced the time to find any locked jobs.
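For reference, here is a rough sketch of what that index looks like as a key spec. The name priority_1_locked_by_1_run_at_1 is just MongoDB's default naming: each field plus its direction, joined with underscores. The `INDEX_SPEC` constant and `index_name` helper are made up for illustration, and the commented-out `ensure_index` line assumes your job model is reachable as `Delayed::Job`, which may not match your backend.

```ruby
# Key spec for the compound index: field name plus direction (1 = ascending).
# INDEX_SPEC and index_name are illustrative names, not part of Delayed::Job.
INDEX_SPEC = [[:priority, 1], [:locked_by, 1], [:run_at, 1]]

# Rebuild the default index name MongoDB derives from a key spec.
def index_name(spec)
  spec.map { |field, direction| "#{field}_#{direction}" }.join("_")
end

puts index_name(INDEX_SPEC)

# With MongoMapper loaded and connected, creating the index would look
# roughly like this (assumption: the job model constant is Delayed::Job):
#
#   Delayed::Job.ensure_index(INDEX_SPEC)
```

Running it prints the same index name mentioned above, which is a quick way to check that the field order matches what you expect.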
The second jump (towards the very end - when things get more stable) is when I made these three commits to my branch of delayed job for mongo mapper. The first was the conceptual change. #2 fixed a syntax error. #3 fixed a brainfart of an omission.
- https://github.com/ryana/delayed_job_mongo_mapper/commit/84ce19678e3258f60bab745cec52ca80e18489f4
- https://github.com/ryana/delayed_job_mongo_mapper/commit/45dfa0093d2f1e78b070dc70a27d176185ecb834
- https://github.com/ryana/delayed_job_mongo_mapper/commit/6b53c0beb9f157de5ddd90ad23acc422a3b041a4
When there are millions of jobs in a queue, and the jobs tend to execute very quickly, the number of jobs locked by a worker that is looking for a job is usually going to be zero. Even though I significantly decreased the time it takes to find a job locked by a particular worker, it still takes a lot longer than saying “give me 1 job that isn’t locked by anyone”. You’ll find one of those jobs almost immediately. This second (series of) modification(s) cut down on the frequency of checking for locked jobs, and it turned my slow query log from a full-on pants party to a ghost town. Glorious.
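To make the idea concrete, here is a toy, in-memory sketch of "check for my locked jobs less often". It is not the code from the commits above: `Job`, `SketchWorker`, and the `check_every` knob are all invented for illustration, and a plain array stands in for the MongoDB collection.

```ruby
# Hypothetical stand-in for a job document; only the fields the sketch needs.
Job = Struct.new(:id, :locked_by)

class SketchWorker
  attr_reader :locked_checks

  # check_every is an invented setting: how many reservations go by
  # between the expensive "anything locked by me?" lookups.
  def initialize(name, jobs, check_every: 10)
    @name = name
    @jobs = jobs
    @check_every = check_every
    @reservations = 0
    @locked_checks = 0
  end

  def reserve
    @reservations += 1

    # Expensive path, taken only occasionally: look for a job this
    # worker already holds a lock on.
    if (@reservations % @check_every).zero?
      @locked_checks += 1
      mine = @jobs.find { |job| job.locked_by == @name }
      return mine if mine
    end

    # Cheap path: grab the first job nobody has locked.
    job = @jobs.find { |candidate| candidate.locked_by.nil? }
    job.locked_by = @name if job
    job
  end
end

jobs = Array.new(100) { |i| Job.new(i, nil) }
worker = SketchWorker.new("worker-1", jobs, check_every: 10)

25.times do
  job = worker.reserve
  jobs.delete(job) # the job "runs" quickly and leaves the queue
end

puts "locked-job checks: #{worker.locked_checks}, jobs left: #{jobs.size}"
```

In this toy run the worker only does the locked-job lookup on every tenth reservation; everything else takes the cheap "first unlocked job" path. How often the real worker should check is a tuning decision I'm not making any claims about here.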
Graphite: you are fucking awesome.

