I want to make out with Graphite’s face.
The baseline on the left is when I started measuring a worker that was feeding off a Delayed::Job queue w/ millions of jobs in it. The first jump in performance came when I realized the index that ships w/ Delayed::Job for MongoMapper does not include locked_by (which it needs once you have millions of jobs in the queue).
Due to the way Delayed::Job is built, when you have millions of jobs of equal priority that are all eligible to run based on their run_at timestamp, the query is going to scan thousands of eligible jobs to see whether any are "locked_by" the current worker. (In my case, I have 20-40 workers feeding off this queue.) Adding a compound index on priority, locked_by, and run_at (priority_1_locked_by_1_run_at_1) drastically reduced the time it takes to find any locked jobs.
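Here's a toy model of why putting locked_by into the index matters. This is plain Python, not MongoDB or MongoMapper code: the "index" is just a sorted list of (priority, locked_by, run_at, id) tuples, and the assumption that unlocked jobs carry an empty-string locked_by is mine (it keeps the tuples comparable). Because priority and locked_by lead the key, every job a given worker holds at a given priority sits in one contiguous slice, so two binary searches find it instead of walking every eligible job. MongoDB's real query planner is more involved; this only illustrates the ordering argument.

```python
from bisect import bisect_left, bisect_right

# Each job is (priority, locked_by, run_at, job_id); run_at is an integer timestamp.
# Assumption for this sketch: unlocked jobs use locked_by = "".


def build_index(jobs):
    """Sort keys the way a {priority: 1, locked_by: 1, run_at: 1} index orders them."""
    return sorted(jobs)


def locked_by_worker(index, priority, worker, now):
    """Jobs at `priority` locked by `worker` with run_at <= now, via two binary searches."""
    lo = bisect_left(index, (priority, worker))  # first entry for this (priority, worker)
    hi = bisect_right(index, (priority, worker, now, float("inf")))  # past run_at == now
    return index[lo:hi]


def locked_by_worker_scan(jobs, priority, worker, now):
    """What an index without locked_by forces: test every eligible job one by one."""
    return sorted(j for j in jobs if j[0] == priority and j[1] == worker and j[2] <= now)


# 1000 unlocked, already-eligible jobs plus a few locked ones.
jobs = [(0, "", run_at, run_at) for run_at in range(1000)]
jobs += [(0, "worker-7", 5, 1000), (0, "worker-7", 2000, 1001), (0, "worker-3", 3, 1002)]
index = build_index(jobs)

print(locked_by_worker(index, 0, "worker-7", now=1500))  # [(0, 'worker-7', 5, 1000)]
```

Both lookups return the same answer; the difference is that the scan touches all ~1000 eligible entries, while the indexed version jumps straight to the worker-7 slice.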
The second jump (toward the very end, where things get more stable) came when I made these three commits to my branch of delayed_job_mongo_mapper. The first was the conceptual change, #2 fixed a syntax error, and #3 fixed a brainfart of an omission.
- https://github.com/ryana/delayed_job_mongo_mapper/commit/84ce19678e3258f60bab745cec52ca80e18489f4
- https://github.com/ryana/delayed_job_mongo_mapper/commit/45dfa0093d2f1e78b070dc70a27d176185ecb834
- https://github.com/ryana/delayed_job_mongo_mapper/commit/6b53c0beb9f157de5ddd90ad23acc422a3b041a4
When there are millions of jobs in a queue and the jobs tend to execute very quickly, the number of jobs locked by a worker that is looking for a job is usually going to be zero. Even though I significantly decreased the time it takes to find a job locked by a particular worker, it still takes a lot longer than saying "give me 1 job that isn't locked by anyone", because you'll find one of those almost immediately. This second series of modifications cut down on how often workers check for locked jobs, and it turned my slow query log from a full-on pants party into a ghost town. Glorious.
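A rough sketch of the idea in Python, assuming a hypothetical in-memory `FakeQueue` that just counts which kind of query gets run. It does not reproduce the commits above; the `Worker` class and its `locked_check_interval` parameter are made-up names. The worker only pays for the slow "jobs locked by me" lookup on every Nth reservation and otherwise goes straight to the cheap "any unlocked job" query.

```python
class FakeQueue:
    """Hypothetical stand-in for the job collection; it counts query types."""

    def __init__(self, job_ids):
        self.unlocked = list(job_ids)
        self.locked = {}           # worker name -> list of job ids it holds
        self.locked_queries = 0    # the slow "jobs locked by me" lookup
        self.unlocked_queries = 0  # the fast "any unlocked job" lookup

    def find_locked_by(self, worker):
        self.locked_queries += 1
        held = self.locked.get(worker, [])
        return held[0] if held else None

    def lock_next_unlocked(self, worker):
        self.unlocked_queries += 1
        if not self.unlocked:
            return None
        job = self.unlocked.pop(0)
        self.locked.setdefault(worker, []).append(job)
        return job

    def complete(self, worker, job):
        self.locked[worker].remove(job)


class Worker:
    def __init__(self, name, queue, locked_check_interval=50):
        self.name = name
        self.queue = queue
        self.locked_check_interval = locked_check_interval
        self.reservations = 0

    def reserve_job(self):
        # Only every Nth reservation runs the expensive locked_by query.
        check_locked = self.reservations % self.locked_check_interval == 0
        self.reservations += 1
        if check_locked:
            job = self.queue.find_locked_by(self.name)
            if job is not None:
                return job
        return self.queue.lock_next_unlocked(self.name)


def drain(job_count, interval):
    """Run one worker until the queue is empty; return (done, locked_q, unlocked_q)."""
    queue = FakeQueue(range(job_count))
    worker = Worker("worker-7", queue, interval)
    done = 0
    while True:
        job = worker.reserve_job()
        if job is None:
            break
        queue.complete(worker.name, job)
        done += 1
    return done, queue.locked_queries, queue.unlocked_queries


print(drain(1000, 50))  # (1000, 21, 1001)
print(drain(1000, 1))   # (1000, 1001, 1001)
```

With an interval of 1 the worker behaves like the original "always check locked jobs first" loop; with 50 it does the same work with 21 locked_by lookups instead of 1001.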
Graphite: you are fucking awesome.
Hey mom, I’m on TV again! (in the background…)
SimpleDB gotcha
While carefully reading through the AWS Customer Agreement we found this interesting paragraph:
5.8.2 […] We may delete, without liability of any kind, any of your Amazon SimpleDB Content that has not been accessed in the previous 6 months.
Ouch!
SimpleDB keeps surprising us, but for our EC2 cluster management platform Scalarium we switched to CouchDB and Redis some time ago. Turns out SimpleDB is sometimes too simple.
— @jweiss
I want to replace MySQL w/ Mongo. Convince me I’m wrong.
My main reason for wanting to do it is MongoDB’s schema-less nature. With no schema migrations to run, MongoDB would give me zero-downtime deploys, and the map/reduce and sharding support they are starting to harden is just SICK. I’ve been thinking about what it fundamentally lacks that would deter me, and the only thing I’ve come up with is transactions. But I think that’s manageable, and it still puts me on the Mongo side of the fence.
Does anyone have any opinions on this? Has anyone built any web apps where MongoDB is the sole database? How did you handle things that normally require transactions?
