Google confirmed that a few hours ago, a “significant subset” of Gmail users faced issues such as high latency, error messages or unexpected behavior while using the service. The problem was quickly rectified, and it comes just a day after a major outage that took down Gmail, YouTube, Google Drive, Google Docs and the complete Google Authentication system that powers Google’s Single Sign-On. While users were able to access their mailboxes on Gmail today, some encountered delays while sending messages. Google hasn’t said whether today’s issue was limited to certain parts of the world or whether it was in any way related to the earlier outage. Incidentally, there are reports that Google’s cloud gaming platform Stadia also suffered an outage a few hours ago, but it isn’t clear if the Gmail and Stadia issues are related.
“The problem with Gmail has been resolved. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better,” said Google in a service update message in the early hours of today, IST. There are no details on what the problem was or what caused the delay in sending messages from Gmail. The previous major outage, Google had confirmed, was down to what it calls an internal storage quota issue. “Today, at 3.47AM PT Google experienced an authentication system outage for approximately 45 minutes due to an internal storage quota issue. Services requiring users to log in experienced high error rates during this period. The authentication system issue was resolved at 4:32AM PT. All services are now restored,” said a Google spokesperson in an official statement at the time. I am no network administrator, but to me that sounds like Google ran out of storage space somewhere in the chain. Perhaps a filled-up hard drive in one of the servers led to a chain reaction of services going on the blink.
Usually, server management involves having measures in place to keep storage free automatically, with a red flag calling for human intervention if one of the storage modules in a server is acting up for some reason. It seems neither happened in this case for Google. There would have been multiple levels of monitoring in place, and clearly the alarm wasn’t sounded, either by the systems or by humans, when it should have been. This doesn’t reflect well on a cloud computing platform for enterprises and individuals that is competing with Amazon Web Services and Microsoft Azure.
The problem with Google Authentication going offline meant that the damage wasn’t limited to Google’s own apps and services. Remember that simple Sign In With Google option that you would’ve used so many times on apps and services? The Single Sign-On is also used by third-party services and apps, which meant anyone using Google credentials to log in to a third-party app or website, otherwise unrelated to Google, would have hit a dead end. It may be time for Google to take a fresh look at the tools it has in place, and ensure that the failsafe measures, whatever they may be, kick in when needed. At a time when millions around the world are still working remotely because of the Coronavirus pandemic, the reliance on digital tools, apps and services is at an all-time high. These outages do not help user confidence, particularly among enterprises.