Running a MongoDB replica set involves occasional maintenance and upgrades. Sometimes this means taking the primary node offline or converting it to a secondary node.
Learn more about the different replica set roles in Replica Sets: Member Roles and Types.
This guide will walk you through the steps needed to safely perform this task.
Why Would You Need to Step Down a Primary?
There are several reasons you might need to step down a primary:
- Server Maintenance: Routine checks, updates, or hardware upgrades.
- Software Upgrades: Upgrading the MongoDB version.
- Backups: Performing comprehensive backups without affecting the primary node’s performance.
Identifying the Primary Node
Before stepping down your primary node, you need to identify which node is currently serving as the primary.
You can do this by running the rs.hello() command and looking for the primary field:
> rs.hello().primary
localhost:27022
The output will display the address of the primary node.
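The same information is also available in the member list returned by rs.status(). The sketch below is a minimal illustration, not a shell built-in: findPrimary is a hypothetical helper, and it assumes a document shaped like rs.status() output, where each entry in members carries name and stateStr fields. The sample addresses are made up for the example.

```javascript
// Hypothetical helper, not part of mongosh: returns the address of the
// member whose stateStr is "PRIMARY", or null if no primary is reported.
// Assumes `status` is shaped like the document returned by rs.status().
function findPrimary(status) {
  const members = (status && status.members) || [];
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

// Illustrative sample data; in mongosh you would call findPrimary(rs.status()).
const sampleStatus = {
  members: [
    { name: "localhost:27020", stateStr: "SECONDARY" },
    { name: "localhost:27021", stateStr: "SECONDARY" },
    { name: "localhost:27022", stateStr: "PRIMARY" },
  ],
};
console.log(findPrimary(sampleStatus)); // prints localhost:27022
```

Returning null instead of throwing makes it easy to detect the brief window during an election when no member reports itself as primary.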
Stepping Down the Primary
The “step-down” process hands over the primary role in a controlled way. While it runs, the primary stops accepting writes, and write operations already in progress are killed rather than completed, so applications should be prepared to retry them. The primary only steps down once at least one electable secondary is caught up with its oplog.
Learn more about MongoDB’s oplog in Understanding the local Database in MongoDB.
To initiate this process, connect to your primary node and run the rs.stepDown() command:
> rs.stepDown()
{
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1673935033, i: 1 }),
    signature: {
      hash: ...,
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1673935033, i: 1 })
}
Called without arguments, the command uses its default settings. The primary steps down once an electable secondary is caught up, allowing an eligible secondary to become the new primary.
Customizing the Step-Down Process
You can customize the step-down process using two parameters:
- stepDownSecs: The time (in seconds) during which the former primary cannot become primary again. The default is 60 seconds.
- secondaryCatchUpPeriodSecs: The time (in seconds) given for secondaries to catch up with the primary. The default is 10 seconds.
Here’s how to use these options:
> rs.stepDown(stepDownSecs, secondaryCatchUpPeriodSecs)
For example, to set the step-down period to 120 seconds and the secondary catch-up period to 20 seconds, you would run:
> rs.stepDown(120, 20)
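The same two settings can also be sent through the replSetStepDown admin command, which rs.stepDown() wraps. The sketch below only builds the command document. buildStepDownCommand is a hypothetical helper name, and the check that the step-down period is longer than the catch-up period is an assumption about the server's validation rather than something covered in this guide.

```javascript
// Hypothetical helper: builds the document for the replSetStepDown command.
// Defaults mirror the documented defaults of 60 and 10 seconds.
function buildStepDownCommand(stepDownSecs = 60, secondaryCatchUpPeriodSecs = 10) {
  // Assumption: the step-down period must be longer than the catch-up period.
  if (secondaryCatchUpPeriodSecs >= stepDownSecs) {
    throw new RangeError(
      "stepDownSecs must be greater than secondaryCatchUpPeriodSecs"
    );
  }
  return { replSetStepDown: stepDownSecs, secondaryCatchUpPeriodSecs };
}

const stepDownCmd = buildStepDownCommand(120, 20);
console.log(JSON.stringify(stepDownCmd));
// prints {"replSetStepDown":120,"secondaryCatchUpPeriodSecs":20}

// In mongosh, connected to the primary, the document would be passed as:
//   db.adminCommand(stepDownCmd)
```

Validating the values before sending them keeps a typo in a maintenance script from turning into a rejected command in the middle of a maintenance window.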
What Happens if No Secondary is Caught Up?
If no secondary is sufficiently caught up to take over as the primary, the current primary will not step down and the command returns an error. This safeguard ensures that your cluster does not end up without a primary, which would disrupt write operations.
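In a maintenance script, it can help to treat a refused step-down as an expected outcome rather than a crash. The sketch below is one possible way to do that. tryStepDown is a hypothetical wrapper, stepDownFn stands in for a call such as () => rs.stepDown(120, 20), and the code assumes that a refused step-down surfaces as a thrown error. The failure here is simulated.

```javascript
// Hypothetical wrapper: `stepDownFn` stands in for a call such as
// () => rs.stepDown(120, 20) in mongosh. Assumes a refused step-down
// surfaces as a thrown error.
function tryStepDown(stepDownFn) {
  try {
    stepDownFn();
    return { steppedDown: true, reason: null };
  } catch (err) {
    // The node remains primary; keep the reason so the operator can retry later.
    return { steppedDown: false, reason: err.message };
  }
}

// Simulated failure, for illustration only.
const outcome = tryStepDown(() => {
  throw new Error("no electable secondaries caught up");
});
console.log(outcome.steppedDown, outcome.reason);
```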
To avoid step-down issues, regularly monitor the replication status of your secondary nodes and make sure they are in sync. If needed, increase the secondaryCatchUpPeriodSecs parameter to give secondaries more time to catch up during step-down. In cases where manual intervention is required, you can force resynchronization or increase the priority of secondary nodes.
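One rough way to check whether secondaries are in sync is to compare each member's optimeDate in the rs.status() output. The sketch below is an approximation under stated assumptions: secondaryLagSeconds is a hypothetical helper, the input is assumed to be shaped like rs.status() output with name, stateStr, and optimeDate fields, and the sample values are invented. The shell's own rs.printSecondaryReplicationInfo() reports similar information interactively.

```javascript
// Hypothetical helper: estimates how far each secondary is behind the primary,
// in seconds, from a document shaped like rs.status() output.
function secondaryLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary reported, so lag cannot be computed
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSecs: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}

// Illustrative sample data; in mongosh you would pass rs.status().
const lagSample = {
  members: [
    { name: "localhost:27020", stateStr: "SECONDARY", optimeDate: new Date("2023-01-17T05:57:13Z") },
    { name: "localhost:27021", stateStr: "SECONDARY", optimeDate: new Date("2023-01-17T05:57:08Z") },
    { name: "localhost:27022", stateStr: "PRIMARY", optimeDate: new Date("2023-01-17T05:57:13Z") },
  ],
};

const lags = secondaryLagSeconds(lagSample);
// Secondaries lagging by more than the catch-up period deserve a closer look.
const slow = lags.filter((l) => l.lagSecs > 10).map((l) => l.name);
console.log(lags, slow);
```

With this sample data, the first secondary shows no lag and the second is 5 seconds behind, so neither exceeds the default 10-second catch-up period.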
You will want to ensure your replica set’s priority and voting configuration is optimal for smooth failovers. Also, make sure to plan maintenance windows during low activity periods and inform stakeholders of potential impacts to manage expectations.
Lastly, regularly test your step-down and failover processes in a staging environment to identify and fix potential issues. Combined with proactive monitoring, this helps ensure that secondaries are prepared to become primary, maintaining high availability and resilience in your MongoDB cluster.
After Successfully Stepping Down
After the primary successfully steps down, an election is held and a new primary is elected. The former primary becomes a secondary and cannot be elected again until the stepDownSecs period has passed. This process ensures that your MongoDB cluster remains operational and maintains high availability, even during maintenance or upgrade tasks.
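Before starting maintenance on the old primary, it is worth confirming that the handover actually happened. The sketch below shows one possible check. failoverCompleted is a hypothetical helper, it assumes a document shaped like rs.status() output, and the sample addresses are invented for the example.

```javascript
// Hypothetical helper: checks a document shaped like rs.status() output and
// reports whether a member other than `formerPrimary` now holds the primary role.
function failoverCompleted(status, formerPrimary) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return Boolean(primary) && primary.name !== formerPrimary;
}

// Illustrative sample data taken after a step-down of localhost:27022.
const failoverSample = {
  members: [
    { name: "localhost:27020", stateStr: "PRIMARY" },
    { name: "localhost:27021", stateStr: "SECONDARY" },
    { name: "localhost:27022", stateStr: "SECONDARY" },
  ],
};
console.log(failoverCompleted(failoverSample, "localhost:27022")); // prints true
```

The function returns false both when the old node is still primary and when no primary is reported yet, so a script can simply poll it until it returns true.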
Conclusion
Stepping down a MongoDB primary node is a critical operation that requires careful planning and execution. By following the steps outlined above, you can ensure a smooth transition and maintain the high availability of your MongoDB cluster. Always remember to check that your secondaries are up to date before initiating the step-down process to avoid any disruptions in service.