Getting Started With MongoDB GridFS


One really useful built-in feature of MongoDB is its GridFS filesystem.

This filesystem within MongoDB was designed for … well, holding files, especially files over 4MB … why 4MB?

Well, BSON objects are limited to 4MB in size (BSON is the format that MongoDB uses to store its database information; later MongoDB releases raise the limit to 16MB), so GridFS stores larger files across multiple chunks.

As Kristina Chodorow of 10gen puts it:

GridFS breaks large files into manageable chunks. It saves the chunks to one collection (fs.chunks) and then metadata about the file to another collection (fs.files). When you query for the file, GridFS queries the chunks collection and returns the file one piece at a time.

Why would you want to break large files into “chunks”? A lot of it comes down to efficient memory & disk usage.

Chunks ‘O Random-Access Memory

Gee, mister. You’re even hungrier than I am.

Say you want to store larger files (maybe a 2GB video). When you perform a query on that file, all 2GB needs to be loaded into memory … say you have a bigger file, 10GB, 25GB, etc. … it’s quite likely you’d run out of usable RAM or not have that much RAM available at all!

So GridFS solves this problem by streaming the data back to the client in chunks … this way you’d never need to use more than 4MB of RAM.

Other Reasons to Use GridFS

Some other niceties of GridFS are …

  • If you are using replication or autosharding, your GridFS files will be seamlessly sharded or replicated for you.
  • Since MongoDB datafiles are broken into 2 GB chunks, MongoDB will automatically break your files into OS manageable pieces.
  • You won’t have to worry about OS limitations like ‘weird’ filenames or a large number of files in one directory, etc.
  • MongoDB will auto generate the MD5 hash of your file and store it in the file’s document. This is useful for comparing the stored file with its MD5 hash to see if it was uploaded correctly, or if it already exists in your database.
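
The MD5 check in the last bullet can be sketched in a few lines of plain PHP. This is only an illustration: matchesStoredMd5() is a made-up helper name, and in practice the expected hash would come from the md5 field of the file's document in fs.files.

```php
<?php
// Hypothetical helper (not part of the MongoDB driver): compares a local
// file's MD5 with the md5 value GridFS stored in the file's document.
function matchesStoredMd5($localPath, $storedMd5)
{
    // md5_file() returns a 32-character lowercase hex string, or false on failure
    $localMd5 = md5_file($localPath);

    return $localMd5 !== false && $localMd5 === strtolower($storedMd5);
}

// Usage with a throwaway temp file standing in for a real upload
$tmp = tempnam(sys_get_temp_dir(), "gridfs");
file_put_contents($tmp, "This is test file data!");

var_dump(matchesStoredMd5($tmp, md5("This is test file data!"))); // bool(true)
unlink($tmp);
```

A mismatch would suggest the upload was truncated or altered, while a match against an existing document's md5 hints that the file is already stored.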

Command Line: mongofiles

An easy way to get started and see how GridFS works is to use the mongofiles command line utility (if you downloaded the MongoDB binaries you should find this tool in the bin directory).

To make things easy, mongofiles accepts RESTful looking commands, for example …

$ ./mongofiles -d myfiles put 03-smbd-menu-screen.mp3
connected to: 127.0.0.1

added file: {
   _id: ObjectId('4ce9ddcb45d74ecaa7f5a029'),
   filename: "03-smbd-menu-screen.mp3",
   chunkSize: 262144,
   uploadDate: new Date(1290395084166),
   md5: "7872291d4e67ae8b8bf7aea489ab52c1",
   length: 1419631 }

done!

This uploaded (PUT) the 03-smbd-menu-screen.mp3 file to a database called myfiles (it could be any database.)

This file now resides in the myfiles DB in the fs.files Collection. We can confirm this by passing the list command.

$ ./mongofiles -d myfiles list
connected to: 127.0.0.1
03-smbd-menu-screen.mp3 1419631

Hurrah! We have our file in there … you can also query it via the MongoDB Shell like so …

> use myfiles;
> db.fs.files.find({});
{
   "_id" : ObjectId("4ce9ddcb45d74ecaa7f5a029"),
   "filename" : "03-smbd-menu-screen.mp3",
   "chunkSize" : 262144,
   "uploadDate" : "Mon Nov 22 2010 03:04:44 GMT+0000 (UTC)",
   "md5" : "7872291d4e67ae8b8bf7aea489ab52c1",
   "length" : 1419631
}

Note: the size, upload date & md5 are all produced for you, which is pretty handy.
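
To see how the file above was split, you can work out the chunk count from the chunkSize and length fields in that document. A quick sketch in plain PHP (the variable names are just for illustration):

```php
<?php
// Values copied from the fs.files document shown above
$length    = 1419631; // file size in bytes
$chunkSize = 262144;  // bytes per chunk (256KB)

// Every chunk is full except possibly the last one
$chunkCount    = (int) ceil($length / $chunkSize);
$lastChunkSize = $length - ($chunkCount - 1) * $chunkSize;

echo "chunks: $chunkCount\n";              // chunks: 6
echo "last chunk: $lastChunkSize bytes\n"; // last chunk: 108911 bytes
```

So fs.chunks should hold six documents for this file, each one pointing back to the file through its files_id field.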

Uploading a File (or Data) via MongoDB Driver

Likely a more realistic way of storing files in GridFS will be via one of the many available language drivers. Each driver handles GridFS a little differently but the concepts are the same.

The first thing you need to sort out is whether you are going to upload actual files or create files from strings of data.

For example, an application that allows a user to upload a video directly from their computer would use the file method … however, an application that takes a profile image (for example) and compresses and resizes it before storing it would likely use the string of data method.

The File Method

In this example we’ll assume the file is already in your filesystem in the /tmp/ dir, but the file could be wherever your web-server/PHP is configured to access.

To work with GridFS files in PHP you use the MongoGridFS class; more information can be found in the documentation.

We will use MongoGridFS::storeFile but you could also use MongoGridFS::put (which works like the command line example.)

<?php

// Connect to Mongo and set DB and Collection
$mongo = new Mongo();
$db = $mongo->myfiles;

// GridFS
$grid = $db->getGridFS();

// The file's location in the File System
$path = "/tmp/";
$filename = "03-smbd-menu-screen.mp3";

// Note metadata field & filename field
$storedfile = $grid->storeFile($path . $filename,
                 array("metadata" => array("filename" => $filename),
                 "filename" => $filename));

// Return newly stored file's Document ID
echo $storedfile;

?>

The String of Data Method

The string of data method is very similar, only we’ll pass a string instead of a file path, so use the code above but call storeBytes instead.

$storedfile = $grid->storeBytes("This is test file data!",
                 array("metadata" => array("filename" => $filename),
                 "filename" => $filename));

You could of course pass any string, such as the string representation of an image (or an encoded file as a string), where we’ve put “This is test file data!” …

A Little About Metadata

For PHP it doesn’t really matter, but since other drivers handle things slightly differently it’s best to write any metadata to its own metadata field as well as a separate filename field, as we have done in the example above.

You can put any file metadata that makes sense for your use in the metadata field.
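
Anything stored under metadata can be queried later with MongoDB's dot notation. A minimal sketch, assuming $grid is the MongoGridFS object from the upload example; findByMetadataFilename() is a hypothetical helper name, not a driver method.

```php
<?php
// Sketch: look a file up by a value stored in its metadata sub-document.
// $grid is expected to be the MongoGridFS object from the examples above;
// findByMetadataFilename() is a hypothetical helper name.
function findByMetadataFilename($grid, $filename)
{
    // Dot notation reaches into the embedded "metadata" document
    return $grid->findOne(array("metadata.filename" => $filename));
}

// Usage (requires a running MongoDB and the mongo extension):
// $file = findByMetadataFilename($grid, "03-smbd-menu-screen.mp3");
```

The same pattern should work for any other key you choose to store in the metadata field.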

Stream Back Files

Now that our file or files are loaded into GridFS, streaming back the file is fairly simple …

  • Connect to MongoDB
  • Do a findOne() on the file
  • Load it into memory using getBytes()
  • Set the proper headers
  • Stream the file back to the browser

So, here is how we’d stream back an image in PHP …

Stream an Image from GridFS to the Browser

Warning: this will load the file into memory. If the file is bigger than your memory, this will cause problems!

<?php
// Connect to Mongo and set DB and Collection
$mongo = new Mongo();
$db = $mongo->myfiles;     

// GridFS
$gridFS = $db->getGridFS();     

// Find image to stream
$image = $gridFS->findOne("chunk-screaming.jpg");

// Stream image to browser
header('Content-type: image/jpeg');
echo $image->getBytes();

?>
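
If the warning above worries you, one way around it is to read the file's chunks yourself and echo them one at a time. The following is a rough sketch rather than the driver's official streaming method. It assumes the legacy PHP driver used in this post, where MongoGridFS exposes its chunks collection as $gridFS->chunks and each chunk's data field is a MongoBinData object with the raw bytes in ->bin. streamChunks() is a made-up helper name.

```php
<?php
// Sketch: send a GridFS file one chunk at a time instead of calling getBytes().
// $chunks is anything iterable that yields chunk documents in order.
function streamChunks($chunks)
{
    $sent = 0;
    foreach ($chunks as $chunk) {
        echo $chunk["data"]->bin;             // only this chunk is held in memory
        $sent += strlen($chunk["data"]->bin);
        flush();                              // push the bytes towards the client
    }
    return $sent;                             // total bytes written
}

// Usage against a real server (requires the mongo extension):
// $image  = $gridFS->findOne("chunk-screaming.jpg");
// $chunks = $gridFS->chunks
//     ->find(array("files_id" => $image->file["_id"]))
//     ->sort(array("n" => 1));   // "n" is the chunk's position in the file
// header('Content-type: image/jpeg');
// streamChunks($chunks);
```

Keep in mind that if PHP output buffering is switched on, the output may still pile up in the buffer before it reaches the browser.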

With a little adjustment you could stream back an mp3, or video, or prompt for a file download, etc.
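
As one example of such an adjustment, prompting for a download mostly comes down to sending different headers. A small sketch: downloadHeaders() is a hypothetical helper, and the commented usage assumes $file is a MongoGridFSFile returned by findOne().

```php
<?php
// Sketch: build headers that make the browser offer a download.
// downloadHeaders() is a hypothetical helper, not a driver function.
function downloadHeaders($filename, $size)
{
    // basename() drops any directory part; double quotes are swapped out
    // so they cannot break the quoted filename in the header
    $safeName = str_replace('"', "'", basename($filename));

    return array(
        "Content-Type: application/octet-stream",
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        "Content-Length: " . (int) $size,
    );
}

// Usage with a MongoGridFSFile:
// foreach (downloadHeaders($file->getFilename(), $file->getSize()) as $h) {
//     header($h);
// }
// echo $file->getBytes();
```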

Other Ways to Search for GridFS Files

You could also use the Document’s ID …

$image = $gridFS->findOne(
         array("_id" => new MongoId("4ceb167810f1d50a80e1c71c"))
         );

That will likely be how your application would look up a file in a real world system.

You can use any valid MongoDB findOne() query in its place as well, or use find() to get back a GridFS cursor of files; you can find out more about that here.
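
To give a feel for the cursor, here is a short sketch that lists each file's name and size, much like mongofiles list does. It assumes the legacy PHP driver, where iterating the result of find() yields MongoGridFSFile objects; listFiles() is a hypothetical helper name.

```php
<?php
// Sketch: list files from a GridFS cursor as "filename size" lines.
// $cursor is expected to be what $gridFS->find() returns.
function listFiles($cursor)
{
    $lines = array();
    foreach ($cursor as $file) {
        $lines[] = $file->getFilename() . " " . $file->getSize();
    }
    return $lines;
}

// Usage: every file uploaded in the last day (the query is just an example)
// $since = new MongoDate(time() - 86400);
// print_r(listFiles($gridFS->find(array("uploadDate" => array('$gte' => $since)))));
```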

Deleting Files

You delete files in much the same way. There are actually a couple of ways to remove GridFS files, but we’ll just use one of the easiest …

Be really careful about passing the correct query to remove or you might just find yourself removing all your files! You can also use MongoGridFS::delete and pass the Document’s ID only.

<?php

// Connect to Mongo and set DB and Collection
$mongo = new Mongo();
$db = $mongo->myfiles;

// GridFS
$gridFS = $db->getGridFS();

// Remove the file with this ID
$removeFile = $gridFS->remove(
                array("_id" => new MongoId("4ceb167810f1d50a80e1c71c"))
              );

?>
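
The MongoGridFS::delete alternative mentioned above takes the _id itself rather than a query array, which leaves less room for an overly broad query. A hedged sketch: deleteFileById() is a hypothetical wrapper, and the format check is an extra precaution added for this example.

```php
<?php
// Sketch: remove one file by its _id using MongoGridFS::delete().
// deleteFileById() is a hypothetical wrapper name.
function deleteFileById($gridFS, $hexId)
{
    // Refuse anything that is not a 24-character hex ObjectId string,
    // so a bad value never reaches the database
    if (!preg_match('/^[0-9a-f]{24}$/i', $hexId)) {
        return false;
    }
    return $gridFS->delete(new MongoId($hexId));
}

// Usage (requires a running MongoDB and the mongo extension):
// deleteFileById($gridFS, "4ceb167810f1d50a80e1c71c");
```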

Wrap Up

Hopefully you can now get started with GridFS and see if it will work well for your application … remember, if you stream files back using getBytes() (as in the image example above), they will be loaded into memory in full rather than streamed in chunks … so be careful!

Have fun.
