Quick Tip: MongoDB Distinct Count

in Querying, Quick Tip

One query SQL users are pretty used to writing is DISTINCT with COUNT() to get the number if distinct (unique) rows that match a statement like …

SELECT COUNT(DISTINCT(PageURL)) FROM LogTable;

This rather simple query will get us back the number of unique PageURL’s in our LogTable … not an amazingly useful query but you get the point.

Getting Back Distinct Counts

Since MongoDB is often used for logging you might find you will often need such a query to get quick stats your boss just asked you for (but he “needed like yesterday”) … so how do we do this in MongoDB?

This is where things get a little weird if you are used to SQL.  Basically (as of the writing of this blog post) there is no actual way to do the same thing in MongoDB.

However that doesn’t mean it isn’t fairly to get your count … you just need to start thinking a little differantly!

MongoDB provides both count() and distinct() …

> db.logCollection.count();
> db.logCollection.distinct("pageURL");

But how do we use both at the same time? Remember the MongoDB shell is also a JavaScript interpretor, meaning we aren’t only restricted to queries … we can also writing code to mingle in with our queries.

JavaScript is Your Friend

There are two basic ways of doing this, using straight forward JavaScript (for simple queries) or Map-Reduce (link | link link).

For this quick tip we’ll just handle how to get your distinct count with Javascript’s length.

Fire up your shell and type in:

> db.logCollection.distinct("pageUrl").length;

This will query the number of distinct pageUrl’s in your Collection and then get the “length” of the cursor (of documents) that is returned … providing you with the distinct count.

Sweet!

Note: don’t add the () to length or you will throw an error.

10 Comments
  • Xinyu Wang1

    Very helpful. Thanks.

    • http://learnmongo.com/ Learn Mongo

      No problem, hope you enjoy the blog!

  • Gividumbuster

    Hi, great tip!
    Do you have any idea how you can perform .length using the PHP driver?

    • Shine Lee

      i have the same question

      • Hjbcn

         Just array count the result, I guess ^_^

  • http://mushfiq.com/ Mushfiq

    But it dont works when you have a large set of documents.And returns Error: uncaught exception: distinct failed: { “errmsg” : “assertion: distinct too big, 4mb cap”, “ok” : 0 } 
    distinct too big, 4mb cap

  • Velia

    What about if I want to know the total per row?.. I mean something like this in mysql:

    SELECT pageURL, COUNT(pageURL)
    FROM LogCollection
    GROUP by pageURL;
     

    • LAFK

      Velia,

      I’d go with map-reduce. You will have to spent 30 minutes learning it, then another 5 with tweaking examples, as counting elements is basic map-reduce example.

      Generally – do not think SQL with Mongo, you will have a hard time. It’s a different beast, as this tip shows as well.

      Justin
      Thank you. I tried combining find and distinct and count and thanks to your post realized that’s not the best way to go for my purpose. :-)

      Good thing to remember: Mongo and Javascript. Doh!

    • Surya

      Use mongo 2.2+ aggregation framework like:
      db.logCollection.aggregate([{ $group : { _id : "$pageUrl", total : { $sum : 1 } } }]) 

  • Otreby

    smart ! thanks for the tip!