Q&A: Kyle Banker Author of MongoDB in Action

in Books, Q & A, Ruby

Kyle Banker is a Software Engineer at 10gen (the company responsible for creating and maintaining MongoDB).

At 10gen, Kyle maintains the MongoDB Ruby Driver and supports the Ruby developer community. He is also currently authoring the upcoming book MongoDB in Action.

Kyle has given a number of talks at various conventions as well as at MongoDB conferences in San Francisco, New York, London, Paris, and more. You can view some of Kyle’s presentations in our video sections: Ruby / Schema Design

Follow Kyle on Twitter @Hwaet

Intros

LM: Can you tell us a little about yourself?

Sure.

Studied philosophy and lit in college. Taught high school English and choir. Then became a software developer. Got interested in databases, fell in love with MongoDB, built try.mongodb.org, and currently have the great fortune of working for 10gen.

Ruby Driver

LM: You maintain the MongoDB Ruby driver … how closely does the driver match the core MongoDB features? Are any features missing?

I think that the driver has a pretty nice API, overall, which isn’t far from what you get in the MongoDB shell. Those of us at 10gen who work on drivers have tried hard to provide a consistent API, making it easy to move from one driver to another.

We’re currently trying to decide how the drivers can best support replica sets. It’s likely that you’ll soon be able to configure your driver connection to send reads to secondary nodes; you’ll also be able to set a write concern for a given connection, database, and operation.

LM: Is there anything unique or interesting about the Ruby driver? Were there any challenges in creating it?

Well, I can’t claim to be the original author of the driver. One of 10gen’s early employees wrote the first version; Mike Dirolf then took it over and authored the C extensions for BSON.

But over the past year, I’ve done some pretty significant refactoring, rewritten the GridFS implementation, and increased performance by about 30%. I’ve also done a lot of work on a JRuby branch, which, it turns out, isn’t as fast as we’d like. So I’m going to be trying a different approach. As you might be able to tell, the biggest challenges with Ruby surround performance.

MongoDB Core

10gen’s Kyle Banker Presenting MongoDB at RailsConf

LM: In your opinion, what is the most interesting or exciting feature currently in the works for the next version of MongoDB?

The next stable version of MongoDB will be 1.8; 1.7 development releases will start appearing this week. The one definite improvement that I can speak about for 1.8 is single-server durability.

We’ve heard enough concern about MongoDB’s lack of single-server durability to warrant this new feature; it’ll be implemented in the standard way, as a kind of append log that’s replayable in the event of a server crash.

Personally, I’d still much rather rely on replication for durability since that maintains the highest performance and ensures the fastest recovery. But for those who need it, the single-server option will be there.
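To make the append-log idea concrete, here is a toy sketch in plain JavaScript. It is not MongoDB's implementation: the names journal, applyOp, write, and replay are made up for illustration, and a real server would flush the log to disk instead of keeping it in an array.

```javascript
// Toy append-only log: every write is recorded before it is applied.
// All names here are hypothetical; this is not MongoDB's journal format.
const journal = [];

// Apply a single operation to an in-memory key/value state.
function applyOp(state, op) {
  if (op.type === "set") state[op.key] = op.value;
  if (op.type === "delete") delete state[op.key];
  return state;
}

// Record the operation first, then apply it.
function write(state, op) {
  journal.push(JSON.stringify(op)); // a real system would flush this to disk
  return applyOp(state, op);
}

// After a crash the in-memory state is gone; replaying the log rebuilds it.
function replay(entries) {
  return entries.reduce((state, line) => applyOp(state, JSON.parse(line)), {});
}

const live = {};
write(live, { type: "set", key: "a", value: 1 });
write(live, { type: "set", key: "b", value: 2 });
write(live, { type: "delete", key: "a" });

// Pretend the server crashed: rebuild the state from the journal alone.
const recovered = replay(journal);
console.log(recovered);
```

Replaying the entries in order is what makes the log useful after a crash: the rebuilt state matches whatever had been written before the server went down.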

LM: MongoDB’s development process is somewhat unique given that it’s open source and features are driven by user request/demand … yet it is produced by a commercial company. What advantages and/or disadvantages has this model created?

Probably the biggest advantage is that there are currently twelve engineers who are paid to work on the project full-time. This means that the core server and drivers advance at a pretty brisk pace; bugs are frequently fixed the same day they’re reported; and users get tons of free, near-instantaneous, expert support via the mongodb-user list and IRC.

That the technology is open-source means that developers get to use and learn from it for free. And for those enterprises that need support or training, there’s a highly qualified company there to provide it.

I’ve seen some people take issue with the core server’s AGPL licensing (the drivers are Apache-licensed). But all this means in practice is that if you modify the core server software, you have to publish the changes. That may be perceived as a disadvantage; I view it more as a fair trade-off.

MongoDB In Action

LM: You are currently in the process of writing a MongoDB book, MongoDB in Action. What are your goals in writing this book? Do you have a particular audience in mind?

I love a good technical book, especially if it reads well, bears re-reading, and ventures into the realm of its parent topic. David Flanagan’s books are like this. If you’ve read JavaScript: The Definitive Guide or The Ruby Programming Language, you see that these books talk a lot about programming languages in general and have a way of making every word count. If I can approach that ideal, I’ll consider MongoDB in Action a success.

The book is geared toward application developers. It includes a lot about schema design, including a design-pattern appendix, and it uses e-commerce as its primary domain for the examples. But substantial portions are also devoted to deployment, replication, auto-sharding, and troubleshooting.

LM: Given MongoDB’s uncommon process and pace of development, how did you choose what areas and features to focus on while making sure your book stands the test of time?

Certainly technical books are anything but timeless. But features like MongoDB’s query language, indexing strategy, and data model are all pretty much set in stone, so I don’t anticipate any obsolescence there. One advantage of this book is that it’s going to be published after the release of MongoDB 1.8.

Since replica sets and sharding only appeared in production within the last month, we’ll see a lot of maturation and new features over the next couple months, and all of this will make its way into MongoDB in Action.

LM: Is there a publish/due date for a book yet? When might we expect to be able to get our hands on it?

Yes. The book will be available in print form by February 2011. The writing is nearly half-way complete, and chapters are already being released as part of Manning’s early-access program (http://manning.com/banker).

Extra Credit

LM: As a self-professed linguaphile: what is your favorite word and why? (And hwæt doesn’t count.)

Well, I can’t say that I have a favorite word, but since you mention “hwæt,” let me say that I do really love single-syllable Germanic words: they’re one of the great features of English. Here are a few choice morsels: bridge, daft, oak, glib, knob, chest, and smidge. Say those aloud a few times and you’ll see what I mean.


Q&A: Ishaan Kumar Creator of MongoVUE

in GUI, Q & A

One “soft spot” of MongoDB is its lack of any official GUI tools.

There are a number of web-based tools out there, and their feature set is growing, but until recently there were no clients for the most popular OS in the world: Windows.

MongoVUE

That is all starting to change, and one person helping that happen is Ishaan Kumar, creator of MongoVUE, a Windows GUI for MongoDB.

While still in its early stages, MongoVUE is a great tool to help you learn and administer MongoDB. We reached out to Ishaan to ask him a few questions about MongoVUE, its features and its future …

Stay tuned for more Q & As in the coming weeks and months; we have a number of them lined up with high-profile people in the MongoDB community!

Feel free to ask any further questions of Ishaan in the comments or visit the MongoVUE blog or download it here.

Q & A With Ishaan Kumar

Can you tell us a little bit about yourself?

I am a software developer based in India and have worked on a number of different technologies and programming languages. I work on MongoVUE in my spare time.

Why did you create MongoVUE?

When I started using MongoDB, I searched online for GUI tools or applications. I found a few web-based projects but no native applications for Windows. So I decided to create one, as I strongly feel that a web-based app cannot match the speed and flexibility of a desktop app.

What interesting challenges have you encountered using Mongo with a Windows desktop app?

A few of my users are successfully using MongoVUE for administering their production environment.

Sam Corder and other programmers have done excellent work with the C# driver which is used in MongoVUE. That made interacting with the server easy.

The challenge was to keep the GUI fast and spiffy. So I resorted to using the Lazy Loading pattern and asynchronous programming to increase (perceived) speed. A bit of work is still pending in this area – for example, making the GUI non-blocking when a command is executed.
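For readers unfamiliar with the two techniques Ishaan mentions, here is a minimal sketch in JavaScript. It is not MongoVUE's code: LazyNode and fakeFetch are hypothetical names, and the fake fetch stands in for a slow call to the server (say, listing the collections of a database when a tree node is expanded).

```javascript
// Hypothetical GUI tree node: children are fetched only on first expand.
class LazyNode {
  constructor(name, fetchChildren) {
    this.name = name;
    this.fetchChildren = fetchChildren; // async function supplied by the caller
    this.childrenPromise = null;        // nothing is loaded until requested
  }

  // Returns a promise right away; the fetch runs once and is reused afterwards.
  expand() {
    if (this.childrenPromise === null) {
      this.childrenPromise = this.fetchChildren(this.name);
    }
    return this.childrenPromise;
  }
}

let fetchCount = 0;

// Stand-in for a slow server call; the returned names are invented.
const fakeFetch = async (dbName) => {
  fetchCount += 1;
  return [`${dbName}.users`, `${dbName}.logs`];
};

const node = new LazyNode("test", fakeFetch);
```

Calling node.expand() twice triggers only one fetch, and because it hands back a promise the caller is free to keep the interface responsive while the data arrives.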

What are some of your short term goals for MongoVUE?

There are a few basic features missing in MongoVUE, like

  • Inline editing / Updating
  • GridFS support

The short term goal is to get these built in MongoVUE and to enhance the usability of existing features. You can see the roadmap here.

What are some of your longer term goals for MongoVUE?

The project goal is to keep innovating and building good features in MongoVUE. Some items that I’d like to explore long term are:

  • Support for Linux through Mono.
  • An interactive shell like the one that comes with MongoDB.
  • Data Visualizations – ability to fetch data not just as a series of documents but as interactive charts and models. Some preliminary work in this area will be available in the next release 0.3.1.
  • Data Modeling and Schema design. Although MongoDB is schemaless, there is much value in having a documented schema model for collaboration and communication among team members.

Putting MongoVUE aside, what excites you most about MongoDB?

MongoDB’s web scale and rich query language make it a compelling choice for large projects. What excites me most about MongoDB is that it unleashes a new paradigm for representing and storing data models that were difficult to achieve in the relational world.



Quick Tip: MongoDB Distinct Count

in Querying, Quick Tip

One query SQL users are pretty used to writing is DISTINCT with COUNT() to get the number of distinct (unique) rows that match a statement like …

SELECT COUNT(DISTINCT(PageURL)) FROM LogTable;

This rather simple query will get us back the number of unique PageURLs in our LogTable … not an amazingly useful query but you get the point.

Getting Back Distinct Counts

Since MongoDB is often used for logging you might find you will often need such a query to get quick stats your boss just asked you for (but he “needed like yesterday”) … so how do we do this in MongoDB?

This is where things get a little weird if you are used to SQL. Basically (as of the writing of this blog post) there is no single command that does the same thing in MongoDB.

However that doesn’t mean it isn’t fairly easy to get your count … you just need to start thinking a little differently!

MongoDB provides both count() and distinct() …

> db.logCollection.count();
> db.logCollection.distinct("pageURL");

But how do we use both at the same time? Remember the MongoDB shell is also a JavaScript interpreter, meaning we aren’t restricted to queries … we can also write code to mingle in with our queries.

JavaScript is Your Friend

There are two basic ways of doing this: using straightforward JavaScript (for simple queries) or Map-Reduce.

For this quick tip we’ll just handle how to get your distinct count with JavaScript’s length.

Fire up your shell and type in:

> db.logCollection.distinct("pageURL").length;

This will fetch the distinct pageURL values in your Collection and then get the “length” of the array (of values) that is returned … providing you with the distinct count.

Sweet!

Note: don’t add the () to length or you will throw an error.
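If you want to see why this works without a database handy, here is a small sketch in plain JavaScript. The logCollection array and the distinct() helper below are stand-ins written for this example (they are not the MongoDB API); they only mimic the fact that the shell's distinct() hands back an ordinary array, and that length is a property of that array, not a function.

```javascript
// Hypothetical in-memory stand-in for db.logCollection (not a real MongoDB call).
const logCollection = [
  { pageURL: "/home" },
  { pageURL: "/about" },
  { pageURL: "/home" },
];

// Mimics what the shell's distinct() returns: a plain array of unique values.
function distinct(docs, field) {
  return [...new Set(docs.map((doc) => doc[field]))];
}

const urls = distinct(logCollection, "pageURL");

// length is a property of the array, so there are no parentheses after it.
const distinctCount = urls.length;
console.log(urls, distinctCount);
```

Writing urls.length() instead would fail, because length holds a number and a number cannot be called.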


MongoDB+PHP: Install and Connect

in PHP

This is Part 1 of a series of short posts on how to use MongoDB with PHP.

Installing MongoDB support for PHP is really easy! Here’s a short howto guide for installing MongoDB for PHP and performing a simple query.

Install MongoDB for PHP Support

Unix/Linux

Via your command line run pecl:

$ sudo pecl install mongo

If you get an error saying the system can’t find `phpize` then you may need to install the PHP dev package …

$ sudo aptitude install php5-dev

You will then need to edit your php.ini file and add the mongo.so extension:

extension=mongo.so

Just restart your webserver and you are done.

Windows

Installing on Windows is just as easy. First, you will need to get the binaries from here and then add the proper .dll to your php.ini:

extension=php_mongo.dll

Create Test Collection

Now we will create a test collection in the default ‘test’ database … bring up your MongoDB shell by running:

$ mongo

Then, insert a document like the following:

MongoDB shell version: 1.6.1
connecting to: test
> doc = { "hello" : "world", "php" : "is alive!" };
> db.phptest.save(doc);

Now let’s just query it to make sure everything is good …

> db.phptest.findOne();
{
        "_id" : ObjectId("4c75add543dd9c1e48177f49"),
        "hello" : "world",
        "php" : "is alive!"
}

Great!

Connect to MongoDB

Next, we’ll connect to our MongoDB server.

Create a PHP file and add the following:

<?php
$connection = new Mongo();
$db = $connection->test;
$collection = $db->phptest;
?>

This will connect us to the ‘test’ Database and then to our ‘phptest’ Collection.

Query and Display MongoDB Data

Now, add these lines below your connection:

<?php

$connection = new Mongo();
$db = $connection->test;
$collection = $db->phptest;

$obj = $collection->findOne();
echo "<h1>Hello " . $obj["hello"] . "!</h1>";

echo "<h2>Show result as an array:</h2>";
echo "<pre>";
print_r($obj);
echo "</pre>";

echo "<h2>Show result as JSON:</h2>";
echo "<pre>";
echo json_encode($obj);
echo "</pre>";

?>

Hello World

Save your PHP page and bring it up in your browser.

You should get back:

  • Hello world!
  • An array showing your document.
  • As well as your document in JSON.

What Just Happened?

We ran the findOne(); command against our Collection …

$obj = $collection->findOne();

We got back $obj which contained the Document we inserted earlier. Then we got the value of the “hello” attribute (which is “world”) of the object $obj …

echo "<h1>Hello " . $obj["hello"] . "!</h1>";

Then we pass the same object to print_r() … to see it as an array and then to json_encode() … to see it as JSON.

What’s Next …

In the next part of this series we’ll handle more complex queries as well as inserts and updates.


MongoDB 32bit is Limited to ~2.5GB, Why?

in Administration

If you have a 32bit server and are running MongoDB you may have noticed that MongoDB 32bit is limited to ~2.5GB databases. Why?

Simple answer … MongoDB embraces the future, and the future is 64bit! The nerdy (and more complicated) answer is down to how MongoDB’s storage engine works.

Storage Engine

The MongoDB storage engine (that stores and queries the database files) utilizes something called memory-mapped files (see Wikipedia or MSDN). This allows for greatly increased performance.

Here is an easy-to-swallow quote:

Modern operating systems support the mapping of files to virtual memory, “MMF” in short. The effect of MMF is that the entire contents of a file appears as in-memory.

Basically, the data on disk (which is slow to read from) is “mapped” in memory (which is way faster to read from).

So using this method allows for super-speedy reads, etc. (which is why we love MongoDB, isn’t it?)

Or, to quote MSDN (Warning: nerd alert) …

Memory-mapped files (MMFs) … allow applications to access files on disk in the same way they access dynamic memory through pointers … you can map a view of all or part of a file on disk to a specific range of addresses within your process’s address space … So, writing data to a file can be as simple as assigning a value to a dereferenced pointer.

32bits is so 1999 or 1989 or whatever …

The problem arises when using 32 bit processors: if you have a database bigger than 2 point somethin’ GB … suddenly it’s not supported and you can’t replicate it!?

Why? 32bits are why …

A 32-bit architecture such as Intel’s IA-32 can only directly address 4 GiB or smaller portions of files.

A 32-bit computer can address 2^32 = 4,294,967,296 bytes of memory, or 4 gibibytes (GiB) …

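Since a 32bit process can’t “address” more than 4 GiB in total, and part of that address space is taken up by the operating system and the program itself, MongoDB can only map roughly 2.5GB of database files … so anything bigger isn’t supported.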

64bits is so now (er or 1976)!

64bit servers have actually been around for a long time (since the 70s in fact) but only now have they moved from “super-computers” like the amazingly comfy Cray-1 (see picture to the right) to downright affordable “commodity” hardware.

Note: 64bit servers no longer come with built-in couches.

One of the big advantages is that a 64bit computer can “address” a lot more memory: 2^64 bytes (or 16 exbibytes) to be exact … need I say, that is a lot.
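If you would like to check the address-space figures quoted in this post yourself, the arithmetic fits in a few lines of JavaScript. This is only a sketch of the math (the constant names are made up here); the 64-bit numbers use BigInt because they are too large for ordinary JavaScript numbers to hold exactly.

```javascript
// Address-space arithmetic for the 32-bit and 64-bit figures in this post.
const GIB = 1024 ** 3;   // bytes in one gibibyte
const EIB = 1024n ** 6n; // bytes in one exbibyte (BigInt: beyond exact Number range)

const bytes32 = 2 ** 32;   // everything a 32-bit pointer can reach
const bytes64 = 2n ** 64n; // everything a 64-bit pointer can reach

console.log(bytes32, "bytes =", bytes32 / GIB, "GiB");
console.log(bytes64.toString(), "bytes =", (bytes64 / EIB).toString(), "EiB");
```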

Here is what MongoDB itself has to say:

By not supporting more than 2gb on 32-bit, we’ve been able to keep our code much simpler and cleaner.  This greatly reduces the number of bugs, and reduces the time that we need to release a 1.0 product. The world is moving toward all 64-bit very quickly.  Right now there aren’t too many people for whom 64-bit is a problem, and in the long term, we think this will be a non-issue.

So perhaps you should consider stepping up to 64bit!
