StackOverflow Answer Series
This is a new “concept” post based on StackOverflow Questions/Answers.
Feel free to comment below if this format is helpful & informative!
This Answer covers the following MongoDB topics:
- Data Files
- Compacting Data Files
- The db.repairDatabase() command
- Capped Collections
Is there a way to auto compact MongoDB Data Files?
The MongoDB documentation says:
To compact this space, run db.repairDatabase() from the mongo shell (note this operation will block and is slow).
I’m wondering how to make MongoDB free deleted disk space automatically.
In general, if you don’t need to shrink your datafiles you shouldn’t shrink them at all. This is because “growing” your datafiles on disk is a fairly expensive operation, and the more space MongoDB can allocate in datafiles, the less fragmentation you will have.
Also, doing any sort of “shrink” is likely to be a rather expensive operation, and it will likely lock your database while it’s running!
So, you should try to avoid shrinking and provide as much disk space as possible for the database.
However, if you must shrink the database, you should keep two things in mind.
- MongoDB grows its data files by doubling, so the datafiles may be 64MB, then 128MB, etc., up to 2GB (at which point it stops doubling and keeps each subsequent file at 2GB).
- As with almost any database, to do operations like shrinking you’ll need to schedule a separate job; there is no “autoshrink” in MongoDB. In fact, of the major NoSQL databases (hate that name), only Riak will autoshrink. So, you’ll need to create a job using your OS’s scheduler to run a shrink. You could use a bash script, or have a cron job run a PHP script, etc.
```
$ mongo foo bar.js
```

Where `bar.js` contains:

```javascript
// Get the current collection sizes.
var storage = db.foo.storageSize();
var total = db.foo.totalSize();

print('Storage Size: ' + tojson(storage));
print('TotalSize: ' + tojson(total));
print('-----------------------');
print('Running db.repairDatabase()');
print('-----------------------');

// Run repair.
db.repairDatabase();

// Get the new collection sizes.
var storage_a = db.foo.storageSize();
var total_a = db.foo.totalSize();

print('Storage Size: ' + tojson(storage_a));
print('TotalSize: ' + tojson(total_a));
```
This will run and return something like …
```
MongoDB shell version: 1.6.4
connecting to: foo
Storage Size: 51351
TotalSize: 79152
-----------------------
Running db.repairDatabase()
-----------------------
Storage Size: 40960
TotalSize: 65153
```
Run this on a schedule (during non-peak hours) and you are good to go.
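For example, the repair script could be wired up with cron. This is just a sketch — the script path, log path, and schedule are assumptions you’d adapt to your own setup:

```
# Hypothetical crontab entry: run the repair script every Sunday at 03:00
# (a presumed off-peak window). "foo" is the database name and
# /opt/scripts/bar.js is wherever the script above was saved.
0 3 * * 0 /usr/bin/mongo foo /opt/scripts/bar.js >> /var/log/mongo-repair.log 2>&1
```

Remember that repairDatabase() blocks while it runs, so pick a window where that lock is acceptable.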
However there is one other option, capped collections.
Capped collections are fixed-size collections that have a very high performance auto-FIFO age-out feature (age-out is based on insertion order). They are a bit like the “RRD” (round-robin database) concept, if you are familiar with that.
In addition, capped collections automatically, with high performance, maintain insertion order for the objects in the collection; this is very powerful for certain use cases such as logging.
Basically, you can limit the size of a collection (or the number of documents in it) to, say, 20GB, and once that limit is reached MongoDB will start to throw out the oldest records, replacing them with newer entries as they come in.
This is a great way to keep a large amount of data, discarding the older data as time goes by and keeping the amount of disk space used constant.
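As a sketch, here is how a capped collection might be created and queried from the mongo shell (the collection name and limits are illustrative, and this needs a running mongod to try):

```javascript
// Create a capped collection limited to ~20GB on disk and, optionally,
// at most 10 million documents -- whichever limit is hit first applies.
db.createCollection("log", { capped: true, size: 20 * 1024 * 1024 * 1024, max: 10000000 });

// Capped collections preserve insertion order; sorting by $natural
// returns documents in that order (-1 for newest-first).
db.log.find().sort({ $natural: -1 }).limit(10);

// Check whether an existing collection is capped.
db.log.isCapped();
```

Note that the size is fixed at creation time, so you’d want to pick a limit that matches the retention you actually need.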