This past week we were reminded of the importance of having an effective backup process after GitLab.com suffered a major data loss. A directory containing around 300 GB (a huge amount!) of critical production data was almost completely deleted through human error. And while there were 5 different methods in place that were supposedly backing up and replicating that data, none of them were working effectively. That’s failure on an epic scale when dealing with your critical business data.
For this parable on the importance of data backups though, let’s take a step back. While most of us are probably familiar with what we mean when we say “backup”, we all need to have the same working definition. At the highest level, data backup is the process of copying or archiving data so that it can be restored should a loss happen.
There are multiple ways data loss can occur: human error (like what happened with GitLab), hardware failure, theft, natural disaster, or malware. Regardless of how it happens, data loss will likely lead to unhappy customers, employee downtime, and the loss of revenue: the IT disaster trifecta.
To expand on the definition above, most good IT people will tell you that an effective backup process consists of more than just making a copy of your data and calling it a day. Good backups happen constantly and automatically. They are “offline”, which is to say that the backups are disconnected from the production system in such a way that the same “oops” that takes out the main copy cannot cascade and also take out the backups. They are “off-site” or “geographically redundant” to protect against things like floods and fires. And they’re also monitored and tested all the time, because taking a backup every hour isn’t helpful if a subtle error renders those backups useless, as GitLab learned to their sorrow. It’s just not enough to back up your data to a flash drive once a week and hope for the best.
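To make the “monitored and tested” part a little more concrete, here is a minimal sketch in Python of the kind of automated check a team might run after every backup. It assumes a few things that are not from the GitLab story: that each backup is a single file, that a SHA-256 checksum was recorded when the backup was made, and that “too old” is defined by a limit you choose. The names `sha256_of` and `check_backup` are hypothetical, made up for this illustration.

```python
import hashlib
import time
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(
    backup: Path,
    expected_sha256: str,
    max_age_hours: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means every check passed.

    `now` is a Unix timestamp and defaults to the current time; it is a
    parameter only so the age check can be exercised in tests.
    """
    if not backup.is_file():
        return [f"missing: {backup}"]

    problems = []
    stat = backup.stat()

    # An empty file usually means the backup job "succeeded" at writing nothing.
    if stat.st_size == 0:
        problems.append("backup is empty")

    # A backup that stopped updating is as dangerous as no backup at all.
    current = time.time() if now is None else now
    age_hours = (current - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(
            f"stale: last modified {age_hours:.1f} hours ago (limit {max_age_hours})"
        )

    # Compare against the checksum recorded when the backup was taken.
    if sha256_of(backup) != expected_sha256:
        problems.append("checksum mismatch")

    return problems
```

A scheduled job could call `check_backup` on the newest backup and alert someone whenever the returned list is not empty. Keep in mind that this only catches missing, empty, stale, or altered files. The only real proof that a backup works is regularly restoring it somewhere and confirming the data is usable.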
What happens if your computer crashes and your last backup was 5 days ago? All the documents and data created since that backup are potentially gone forever. When data loss occurs, it can create a ripple effect of technical and business failures caused by the inability to access the lost data. The first-order effect on your business is the obvious one: the total loss of information described above. The second is that even if you can get that data back, it could take a considerable amount of time to retrieve it, and any apps, processes, or employees that depend on that data can’t be productive during the downtime. The third is that those apps and processes could end up failing or becoming further corrupted if the data they depend on is gone forever or takes a long time to restore. All of these things lead to downtime, lost customers, and lost revenue.
Having a capable team whose job is to make sure that your data is being backed up correctly, and who can quickly spot and address issues, is invaluable to your company. If something were to happen, this team will make sure that you are back up and running with all of your data restored as soon as possible, helping you avoid that disaster trifecta.
Fortunately for GitLab.com, they were able to restore all but 6 hours of data, but thousands of projects, user comments, user accounts, and more were lost forever. For a company in the business of hosting customer data, that’s about as bad as it gets. But it’s a good example for the rest of us of why having an effective backup process is vital to the success of your company.