Ryan Angilly

A business guy who became a really good software developer first by accident

My slides from Mongo Boston

    • #10gen
    • #document based
    • #gridfs
    • #mongo boston
    • #mongodb
    • #nosql
    • #oss
    • #rails
    • #ruby
    • #sharding
    • #punchbowl
    • #mypunchbowl
    • #punchbowl.com
  • 1 year ago

“Open Source Software” isn’t about source code

While thinking about the slides for my upcoming talk at Mongo Boston, I had an interesting thought.  It’s nothing earth shattering, but I wanted to write it down and see where it took me.

Let’s take a look at the open source ecosystem surrounding MongoDB, and for this example we’ll focus on the Ruby space and some of the stuff I’ve played with at Punchbowl.  You have MongoDB itself, which is written in C++.  There’s MongoMapper, an ORM.  There’s Rack-GridFS, a Rack middleware for directly accessing files stored in GridFS.  There’s OpenIDAuthentication, a library for doing OpenID auth in MongoDB.  There’s Roachclip, a plugin I wrote for MongoMapper which combines the fun of Thoughtbot’s Paperclip image processing with the ability to store all the assets in GridFS through Joint.  There are literally hundreds of open source software projects out there that anyone can pick up and use in a hobby or in a business.

Some of this software has documentation.  Some doesn’t.  It’s all open source, though.  So you can download it, crack it open in a text editor, and just go figure it out.  See a problem or notice a lack in functionality?  Most of this software is using version control (hi Github); just fork and fix it.  What used to be:

  http://github.com/original_author/sweet_repo

now becomes:

  http://github.com/you/sweet_repo

Maybe your contributions will get pulled into the mainline, maybe your version will remain a fork and start to get used by others, or maybe nobody else will ever use it but you. 

Such is the life of open source source code.  In my opinion, it’s pretty damn cool.  But this coolness is NOTHING compared to what open source is really about.

What I’ve started to realize over the past year is that “Open Source Software” isn’t about the last part of the Github URI: the repository name.  Open Source Software is really about the second to last part: The Author.

Open Source Software is about the people.  MongoDB isn’t just a C++ repository; it’s Kristina, Mike, Kyle, Dwight & Elliot (among others).  MongoMapper, OpenIDAuthentication, Rack-GridFS, Roachclip & Paperclip aren’t just Ruby libraries; they are John Nunemaker, Brandon Keepers, Blake Carlson, yours truly & Jon Yurek (again, among others).

There is obviously value in the code: it does stuff.  The code can also teach you stuff.  The code can show you how to properly build shareable, modular and properly tested pieces of software.  The code can teach you the black arts of meta-programming and the Ruby object model.

But I assure you that no matter how much awesomeness is in the code, there is at least 7 times more awesomeness in the brains of the people who built it.  Those people who first had a need for it.  The people who struggled through the bugs.  The people who had a dozen false starts that might not get reflected by the current HEAD.  That experience is PRICELESS.  And it’s all available if you just ask.

So, ask.  Say hi.

Every single one of those people, and countless more, are “Open Source”.  You can get at them on mailing lists, IRC, and Github Issues.  You can email them directly or tweet at them.  Most of the time, they’ll answer (and don’t get pissed if they don’t answer — sometimes people get busy).

So if you are using Open Source Software just for the source code, you’re missing out.  Use these people.  Ask questions.  Get a conversation going.  That’s the real meaning of Open Source Software.

    • #ruby
    • #mongo
    • #mongodb
    • #mongomapper
    • #paperclip
    • #thoughtbot
    • #name dropping
    • #github
    • #irc
    • #twitter
  • 1 year ago

3 reasons to use MongoDB

Note: This is a precursor post to my talk at Mongo Boston on September 20th.  It’s gonna be at Microsoft’s NERD (which I hear is COMPLETELY AWESOME).  If you haven’t signed up yet, stop being lame.  It’s gonna be awesome.  http://www.10gen.com/conferences/mongoboston2010

People have asked me why to use MongoDB.  I used to answer with “it’s SO FREAKING AMAZING!!”, and talk about how “new” and “cool” and “hawt” it felt.  I would wax on about how it felt like the first time I used Rails and so forth.

That was cheating.  It’s a crappy answer of no value.  So today, I am going to give you three reasons to use MongoDB over MySQL, Postgres, Tokyo Cabinet, or CouchDB.

1. Simple queries

MongoDB is a document store with no transactions and no joins.  When an application warrants using this type of database[1], the result is that your queries become much simpler.  They are easier to write.  They are easier to tune.  They make it easier for developers to do their job.  In Punchbowl land, ‘users’ have ‘events.’  There is a table for each, with a user_id on the events table.  Let’s say I want to get all the users who have published an event.

In an SQL database, I have two tables: users and events.  I could write this query like so:

SELECT `users`.* FROM `users` INNER JOIN `events` ON `events`.`user_id` = `users`.`id` WHERE `events`.`published_at` IS NOT NULL GROUP BY `users`.`email`;

Analogously, in a MongoDB database, let’s say I have just one collection: users.  Each user document has an attribute called ‘events’, which is a list of embedded documents.  It looks something like this in JSON:

{
  "name" : "Ryan Angilly",
  "events" : [
    {
      "title" : "First one!"
    },
    {
      "title" : "Whoa!"
    },
    {
      "title" : "Oh hi",
      "published_at" : true
    }
  ]
}

To perform the same query in MongoDB query syntax:

db.users.find( { 'events.published_at': {$ne: null} } )

Simpler.  Simpler to read.  Simpler to write.  I glossed over the fact that we are drastically changing how we store our data, but that’s the whole point.  And you can clearly see how it makes things easier to understand.
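
To make the comparison concrete, here's a rough sketch in plain JavaScript (not real MongoDB or MySQL calls) of the work each query implies.  The sample data, the example.com addresses, the date strings, and the helper names `publishersViaJoin` and `publishersEmbedded` are all made up for illustration.

```javascript
// Relational shape: two "tables" linked by user_id (hypothetical sample data).
const usersTable = [
  { id: 1, email: "ryan@example.com" },
  { id: 2, email: "sam@example.com" },
];
const eventsTable = [
  { user_id: 1, title: "First one!", published_at: null },
  { user_id: 1, title: "Oh hi", published_at: "2010-09-01" },
  { user_id: 2, title: "Draft", published_at: null },
];

// Roughly what the INNER JOIN + GROUP BY does: match rows across tables, then dedupe.
function publishersViaJoin(users, events) {
  const publishedUserIds = new Set(
    events.filter((e) => e.published_at !== null).map((e) => e.user_id)
  );
  return users.filter((u) => publishedUserIds.has(u.id));
}

// Document shape: events are embedded in each user, so only one collection is touched.
const usersCollection = [
  {
    email: "ryan@example.com",
    events: [{ title: "First one!" }, { title: "Oh hi", published_at: "2010-09-01" }],
  },
  { email: "sam@example.com", events: [{ title: "Draft" }] },
];

// A user qualifies if at least one embedded event has a non-null published_at.
function publishersEmbedded(users) {
  return users.filter((u) =>
    u.events.some((e) => e.published_at !== undefined && e.published_at !== null)
  );
}

console.log(publishersViaJoin(usersTable, eventsTable).map((u) => u.email));
console.log(publishersEmbedded(usersCollection).map((u) => u.email));
```

Both calls pick out the same single user.  If you want the "at least one embedded event matches" reading spelled out in a real query, `$elemMatch` is the operator to look at; it's worth checking the MongoDB docs for how negation operators like `$ne` treat arrays.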

2. Sharding

Sharding is a simple concept.  If you have lots of data and you are getting disk-bound and/or running out of space, take your data and split it across several machines.  You get more disk throughput and more storage.  In a perfect world, as your storage and performance needs grow, just add more shards.

MongoDB is pretty close to this perfect world.  If you have a mongod process running, and you want to set up sharding, you:

1) Bring up a new machine
2) Start a new mongod process to act as a member of your shard cluster
3) Start a new mongod process to act as a separate ‘config’ database for maintaining configuration information about which data are in which shard
4) Start a mongos process & tell it how to find the current db, the new shard member, the config database
5) Enter ~5 commands to enable sharding on whichever databases and collections you want
6) Modify your apps to connect to the mongos process instead of the old mongod process
7) Profit.

All interprocess communication is done over IP, so the configuration mongod process and mongos process can run on their own machines or run on the same machine as one of your shard members.  This can be done with no downtime, guys.  And you don’t have to have an eye towards sharding when you start.  You can take a regular old mongod process and it will “just work.”
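
If it helps to picture how the pieces fit together, here's a toy model in plain JavaScript.  It is not MongoDB code and not how MongoDB is implemented internally: `configChunks` stands in for the config database, the `shards` object stands in for the shard mongod processes, and `shardFor` plays the role of mongos.  The key ranges and names are assumptions I made up for the sketch.

```javascript
// Toy model of the pieces described above. Not MongoDB code.
// "Config" data: which shard owns which range of the shard key (hypothetical ranges).
const configChunks = [
  { min: "", max: "m", shard: "shardA" },       // keys from "" up to (not including) "m"
  { min: "m", max: "\uffff", shard: "shardB" }, // keys from "m" upward
];

// Each entry stands in for one mongod holding part of the data.
const shards = { shardA: [], shardB: [] };

// The router plays the role of mongos: look up the owning shard for a key.
function shardFor(key) {
  const chunk = configChunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

// Writes and reads go through the router, so the app never picks a shard itself.
function insert(doc) {
  shards[shardFor(doc.email)].push(doc);
}

function findByEmail(email) {
  return shards[shardFor(email)].find((d) => d.email === email);
}

insert({ email: "alice@example.com", name: "Alice" });
insert({ email: "ryan@example.com", name: "Ryan" });
console.log(shardFor("alice@example.com"), shardFor("ryan@example.com"));
console.log(findByEmail("ryan@example.com").name);
```

The point of the sketch is step 6 in the list: the application only ever talks to the router, which is why pointing your app at mongos instead of mongod is the only app-side change.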

There are solutions to this problem in MySQL [2], but they require massaging data at a layer above the database.  The database itself does not support this feature.  Also, you don’t have to think about sharding until you need it.  You don’t have to pre-optimize.  When you don’t need sharding, just start up a mongod process and go.  When you do need sharding, fire up a few more machines, and issue a few commands.  No downtime.

A common quip that I’ve heard and read is something to the effect of: “How many people reading this post actually have enough data to worry about the need to shard?  Not many.”  My response to this is simple.  Most people who use MySQL w/ master/slave replication probably don’t need that either.  Lots of apps could get away with sqlite and a cronjob that backed the file up every hour.  But MySQL & master/slave replication is the status quo, so we all do it.  Now, think about HD video, geolocation, realtime messaging, augmented reality, closer-to-realtime-satellite imagery.  Think about all that data, and how much faster people will want it (and mashups & derivatives of it), 5 years from now.  Then think about what database you want to start using right now.

3. GridFS

For reasons that I’m not experienced enough to talk about, you don’t store files in MySQL.  Let’s say you have an application where a user can upload a profile pic.  The standard practice is to store the path to that file in the database, and store the file itself on the filesystem (a shared filesystem if you have multiple app servers) or on S3.  If you use a filesystem, some kind of backup is usually performed as well.

With GridFS, you store files in the database.  MongoDB was built to do this.  Why is this a “reason to use MongoDB”?  Because MongoDB has replication and sharding of collections built-in.  And guess what?  You can apply that stuff seamlessly to GridFS collections as well.  When you store assets in MongoDB, you get all the replication and sharding capability for free.  Want to back up your user assets?  Just replicate the GridFS collections.  Running out of space on your NFS share?  Have fun dealing with IT.  Running out of space on your MongoDB GridFS installation?  Bring up another machine and shard that collection.
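
The reason replication and sharding apply so naturally is that GridFS keeps a file as ordinary documents: one metadata document plus a series of chunk documents.  Here's a toy in-memory sketch of that idea in JavaScript (Node).  The field names follow the GridFS convention (`files_id`, `n`, `data`, `length`, `chunkSize`), but the arrays, the helper names `storeFile` and `readFile`, and the tiny 5-byte chunk size are assumptions for illustration, not a real driver API.

```javascript
// Toy sketch of the GridFS idea: a file becomes one metadata document plus N chunk documents.
// The in-memory arrays stand in for the files and chunks collections.
const filesCollection = [];
const chunksCollection = [];

// Split the buffer into fixed-size chunks and record each one as its own document.
function storeFile(id, filename, buffer, chunkSize) {
  filesCollection.push({ _id: id, filename, length: buffer.length, chunkSize });
  for (let n = 0; n * chunkSize < buffer.length; n++) {
    chunksCollection.push({
      files_id: id,
      n, // chunk sequence number, used to put the file back together in order
      data: buffer.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
}

// Reassemble a file by collecting its chunks and sorting them by sequence number.
function readFile(id) {
  const parts = chunksCollection
    .filter((c) => c.files_id === id)
    .sort((a, b) => a.n - b.n)
    .map((c) => c.data);
  return Buffer.concat(parts);
}

storeFile(1, "profile.txt", Buffer.from("hello gridfs"), 5);
console.log(chunksCollection.length);
console.log(readFile(1).toString());
```

Since the chunks are just documents in a collection, anything the database can do to a collection (replicate it, shard it) it can do to your files.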

Storing assets in a database is the way we should be doing it from now on.

FINI

So there you have it.  MongoDB is teh awesome because of a simple query syntax, the ability to shard data across machines easily, and the ability to store files in GridFS while taking advantage of replication & sharding.

If you can make it to Mongo Boston, sign up and come say hi.  I’ll be the guy getting lynched by the MySQL and CouchDB fanboys.

[1] “Well when the crap is that?!” you may be asking.  Look for my next post about when you should be using MongoDB.

[2] http://axonflux.com/mysql-sharding-for-5-billion-p

    • #mongodb
    • #mongo
    • #mongo boston
    • #sharding
    • #gridfs
    • #mysql
    • #document-store
    • #sql
  • 1 year ago

What is MongoDB?

Note: This is a precursor post to my talk at Mongo Boston on September 20th.  It’s gonna be at Microsoft’s NERD (which I hear is COMPLETELY AWESOME).  If you haven’t signed up yet, stop being lame.  It’s gonna be awesome.  http://www.10gen.com/conferences/mongoboston2010

A lot of people come up to me and ask about MongoDB.  Here’s a 101 for those of you still totally in the dark.  

MongoDB is a database

It’s just like MySQL in the sense that you run a daemon, that daemon creates files on a filesystem, and you access it over a network via a client.  A single mongod process runs on one machine, and can have many databases.  A database can have many collections (“tables” in MySQL-speak).  You can write to it & you can query based on attributes of records.  Out of the box, it comes with support for replication and sharding.  It has support for atomic operations.  There are clients for it written in pretty much every popular language.

MongoDB is “schema-less”

In MySQL, you create a table w/ a pre-defined set of typed attributes (Create a ‘users’ table w/ name:string, email:string).  When you write a new record to a table in MySQL, you specify attribute/value pairs: name = ryan.  If you don’t specify a certain attribute (email, in this case) that’s usually ok.  That record will just get a default value for that attribute.  This default value could be null or an empty string or a predefined string, but it’s still there, and it has a value.  Every record in the table will always have the same set of attributes.  If you try to write a record to a table and include an attribute not in that table’s definition (wicked_attractive = true) you’ll get a nasty error.

In MongoDB, you create a “collection” with no pre-defined attributes (Create a ‘users’ collection).  When you write a new document to a collection in MongoDB, you also specify attribute/value pairs: name = ryan.  But in this case, there are no default values.  There is no email attribute on that document.  If you try to write a document to a collection with an attribute that no other document has (wicked_attractive = true), MongoDB will be ok with it.  This is a key point:  documents within the same collection can have different sets of attributes.
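
Here's a tiny sketch of that key point in plain JavaScript.  The `users` array and the `insertUser` helper are stand-ins I made up for a collection and an insert; this is not a MongoDB client, just the shape of the idea.

```javascript
// In-memory stand-in for a 'users' collection; not a MongoDB client.
const users = [];

// No schema check on insert: whatever attributes the document has are stored as-is.
function insertUser(doc) {
  users.push({ ...doc });
}

insertUser({ name: "ryan" });
insertUser({ name: "sam", email: "sam@example.com", wicked_attractive: true });

// The first document simply has no email attribute; there is no default value.
console.log("email" in users[0]);
// The second document carries attributes the first one has never heard of.
console.log(Object.keys(users[1]));
```

Contrast that with the MySQL case above, where the first record would still carry an email column holding some default value.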

MongoDB is a “document-store”

This is closely related to being “schema-less.”  In MySQL, you define a set of attributes for a table.  Rows get inserted into tables, and the rows are 1 dimensional.  What I mean by 1 dimensional is that all of the pieces of data in a row are first class citizens.  The number of pieces of information in a row equals the number of attributes defined for that table.

MongoDB lets you store arbitrarily complex documents (think JSON).  The following document can be stored in the users collection:

{
  name: 'Ryan',
  email: 'ryan@angilly.com',
  likes: ['mongodb', 'skiing', 'Red Sox', 'Boulder chicks'],
  dislikes: ['humidity', 'Sarah Palin', 'bigotry', 'The Yankees'],
  current_outfit: {
    pants: 'blue shorts',
    shirt: false,
    shoes: 'flip-flops',
    undies: "wouldn't you like to know"
  }
}

In this case, there are 5 “top level” attributes, but 14 “pieces of data.”

Along with standard DB types (string, integer, float, datetime, boolean), MongoDB also has arrays and hashes as native types.  In this ‘users’ document, you have an embedded ‘current_outfit’ document, but ‘current_outfit’ isn’t a collection.  It’s just an embedded document inside of this particular user document.  You also have lists of likes and dislikes.  The elements in a list do not have to be the same type.

You can put indexes on “deep” attributes.  In MySQL, you can put an index on `users`.`email` to speed up queries on that attribute.  In MongoDB, you can put indexes on any attribute in the document.  In our previous example, for… example…, you can put an index on users.current_outfit.shirt and quickly query to see who is topless.  If you put an index on an array type (users.likes), you’d be able to quickly query for any user who ‘liked’ ‘twitter’, and quickly get a result.
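
To show why an index on a nested or array attribute makes those lookups quick, here's a toy index in plain JavaScript.  A real database index is a more sophisticated structure than a `Map`, and the `people` data and the helper names `valueAt` and `buildIndex` are hypothetical; the sketch only illustrates the idea of keying documents by a "deep" value, with one entry per element when the value is an array.

```javascript
// Hypothetical sample documents shaped like the user document above.
const people = [
  { name: "Ryan", likes: ["mongodb", "skiing"], current_outfit: { shirt: false } },
  { name: "Sam", likes: ["twitter", "mongodb"], current_outfit: { shirt: "t-shirt" } },
];

// Walk a dotted path like "current_outfit.shirt" down into nested documents.
function valueAt(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Toy index: maps each value found at the path to the documents containing it.
// Array values get one entry per element, so any single element can be looked up.
function buildIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    const value = valueAt(doc, path);
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc);
    }
  }
  return index;
}

const byShirt = buildIndex(people, "current_outfit.shirt");
const byLike = buildIndex(people, "likes");

console.log(byShirt.get(false).map((d) => d.name)); // who is topless
console.log(byLike.get("twitter").map((d) => d.name)); // who likes twitter
```

Once the index exists, each lookup is a single `get` instead of a scan over every document, which is the whole appeal.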

MongoDB is “NoSQL”

To query MySQL, you use (surprise, surprise) SQL:

SELECT * FROM `users` WHERE `users`.`email` = 'ryan@angilly.com' LIMIT 1;

SQL is a very powerful language, where different types of joins give you the power to issue a single query that effectively spans multiple tables, and can return a result set with data from multiple tables.

There is no SQL in MongoDB.  You query a MongoDB database by issuing a query command and passing along a hash.  In the mongo shell console (which is JavaScript) we use JSON:

db.users.find({email: 'ryan@angilly.com'}).limit(1);

There are $ operators for doing different types of inequalities, lat/long distance calculations, regex matches, etc….
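
If the "pass along a hash" idea feels abstract, here's a toy matcher in plain JavaScript that handles plain equality plus a tiny subset of `$` operators.  It's a sketch of the query-as-data idea only: the `find` and `matches` helpers, the `accounts` sample data, and the ages are assumptions of mine, and real MongoDB supports many more operators with richer semantics.

```javascript
// Toy matcher for a small subset of query operators; real MongoDB supports many more.
const operators = {
  $gt: (actual, expected) => actual > expected,
  $lt: (actual, expected) => actual < expected,
  $ne: (actual, expected) => actual !== expected,
};

// A document matches when every field in the query hash is satisfied.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const actual = doc[field];
    const isOperatorHash =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorHash) return actual === condition; // plain equality
    return Object.entries(condition).every(([op, expected]) => {
      if (!(op in operators)) throw new Error(`unsupported operator ${op}`);
      return operators[op](actual, expected);
    });
  });
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

// Hypothetical sample data.
const accounts = [
  { email: "a@example.com", age: 30 },
  { email: "b@example.com", age: 22 },
];

console.log(find(accounts, { email: "a@example.com" }).length);
console.log(find(accounts, { age: { $gt: 25 } }).map((d) => d.email));
```

The query is just a data structure, so your application can build it up, inspect it, or pass it around like any other hash.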

When you issue a query to a MongoDB database, you cannot ask for stuff from two collections at once.  There are no joins.  However, keeping with our last example, if you query for a user with the email ‘ryan@angilly.com’ you would get back the entire user document we stored — with likes, dislikes, and current_outfit included.  This is what people mean when they say “not having joins is ok because you don’t need them.”  You can embed arbitrarily complex data inside a document, and get it all at once.

MongoDB is different (has downsides)

MongoDB is different.  And anytime something is different, it has downsides from what you’re used to.  Out of the box, MongoDB will acknowledge a write has completed before it’s on disk (although this is tunable on a write-by-write basis).  MongoDB does not have transaction support (but after designing an app from the ground up with documents, you find you rarely need them).  MongoDB will not make you more attractive to the opposite sex (although I hear they are working on it for 1.8).

I hope this has given you some insight into what MongoDB is.  If you can make it to Mongo Boston, come say hi.  I’ll be the wicked attractive topless guy.  http://www.10gen.com/conferences/mongoboston2010

    • #mongodb
    • #mongoboston
    • #nosql
    • #sql
    • #database
    • #10gen
    • #schema-less
  • 1 year ago

My talk on @mongodb at #mongoNYC a few weeks ago.  Audio is a little tough, but it came out pretty good otherwise I think

    • #mongodb
    • #nerdalert
    • #speaking
  • 1 year ago

I’m presenting MongoDB for Dummies at MongoNYC

Just confirmed that I’ll be speaking at MongoNYC on May 21st.  You can register here: http://www.10gen.com/event_mongony_10may21

Meghan from 10gen was nice enough to provide a 25% off discount code.  Just register with “punchbowl” and you’ll get hooked up.

My talk will be a reprise of the talk I did at MongoSF last week, with some additional information on the ‘Top Secret Project’ that I mention in the slides.

    • #MongoDB
    • #nerdalert
    • #presentation
    • #MongoNYC
  • 2 years ago

I’m speaking at MongoSF

I’m wicked excited to announce that I’m flying out to San Francisco at the end of the month to speak at MongoSF.  The conference is a single day, multi-track conference that’s going to run the gamut: programming workshops, MongoDB internal discussions, the current state of the OSS ecosystem around MongoDB, and (the reason I’m going) several presentations on examples of production deployments.

To be honest, when I saw the rest of the people presenting at this conference I gulped a little bit.  There are some giants presenting, and I’m psyched to hear about all the cool stuff that is going on at the cutting edge.

What I’m also psyched about is the wide range of production deployment talks.  On the “this is how you use MongoDB like a Big Dog” end of things there is a talk I am very excited to hear by David Mytton, founder of Boxed Ice, entitled Humongous Data at Server Density: Approaching 1 Billion Documents in MongoDB which I’m sure will be a blast.

And at the other end: my talk :-)

Back in October, we made the decision at Punchbowl to use MongoDB alongside MySQL as part of a non-mission-critical new application we were building (since then it has become mission-critical).  I’ll be discussing the decision process to use MongoDB, implementation details of how we got up and running, unexpected issues we ran into, and how they were resolved.  I’ll also give some insight into how I feel the entire process has benefitted our engineering team and our company’s bottom line.

And because I’m opinionated, I might also throw in 5 minutes of “IMHO” with regards to some of the NoSQL vs. SQL “debate” that has been brewing in the blogosphere over the past six months.

There are two audiences I’ll be addressing in my talk.  First and foremost, I’ll be speaking to developers who have never used MongoDB.  Developers that think MongoDB looks cool from all the blog posts they’ve read, but are hesitant about “going through all the trouble.”

Second, I’ll be dropping in a few nuggets aimed at the business guys & gals: engineering managers, founders, CEOs, etc….  The people who guard their developer man-hours like hawks and who have even more hesitation, but without any of the romantic feelings about “how cool MongoDB sounds”.  I’ll try to assuage your anxieties and show you that, in the end, giving developers the chance to experiment with this stuff will pay dividends down the road.

So sign up for MongoSF today if you haven’t already.  I’m excited to see you there.

    • #MongoDB
    • #cool
    • #ruby
    • #speaking
    • #startups
  • 2 years ago

damnit ruby.

    • #ruby
    • #MongoDB
    • #Nerdalert
  • 2 years ago

I want to replace MySQL w/ Mongo. Convince me I’m wrong.

My main reason for wanting to do it is MongoDB’s schema-less nature.  MongoDB would give me zero downtime deploys, and the map/reduce sharding they are starting to harden is just SICK.  I’ve been thinking about things that it fundamentally lacks that would deter me, and the only thing I’ve come up with is that it lacks transactions.  But I think that’s manageable, and still puts me on the Mongo side of the fence.

Does anyone have any opinions on this?  Has anyone built any web apps where MongoDB is the sole database?  How did you handle things that normally require transactions?

    • #mysql
    • #mongodb
    • #dba
    • #nerdery
  • 2 years ago

About

Hi, I'm Ryan, and I build stuff on the internet. I'm currently building Signal Genius.

I blog about my failed startup, MessageSling, at The Day Series.

Things I used to do:

  • Built and launched FourthSegment
  • Hacked at Punchbowl.com.
  • Founded MessageSling.com.
  • Spent several years at EMC
