

Google vs. Microsoft: Lessons on handling a Cloud #fail

InfoWorld Tech Watch | March 3, 2011
This week's Gmail outage reminded us all that the cloud ain't perfect. But you may draw some inspiration from a look at how Google reacted.


InfoWorld's Leon Erlanger tells the tale of Google's response (at infoworld.com/t/cloud-computing/gmail-lesson-dont-use-stats-minimize-angry-users-107): The company's App Status Dashboard reported the outage shortly after it occurred on Sunday. At 3 p.m. Eastern time on Sunday, Google told us, "We're investigating reports of an issue with Google Mail." Google continued to post updates to the Dashboard report every two or three hours through the night and into the next day. At Crash + 5 hours the Dashboard reported, "This issue affects less than 0.08% of the Google Mail userbase... Affected users may be temporarily unable to sign in while we repair their accounts." At Crash + 22 hours, Google revised its damage estimate to 0.02 percent of Google Mail users -- perhaps 40,000 accounts. At Crash + 32 hours, a Google VP posted a full explanation (at gmailblog.blogspot.com/2011/02/gmail-back-soon-for-everyone.html) of the problem and details about what was being done to correct it.

Now consider how Microsoft handled a similar incident: On Dec. 30 of last year, Microsoft suffered a massive SQL Server failure [5] that affected 17,355 Hotmail accounts. As I reported at the time, Microsoft's response left much to be desired.

At Crash + 8 hours, I saw sporadic reports of Hotmail problems. It wasn't at all clear whether the problems were random or systemic. Hotmail, like Gmail and other email services, shuts users out for brief periods to perform system maintenance, so temporary minor outages are hard to distinguish from major ones. At Crash + 12 hours, we still had nothing official from Microsoft. The Windows Live Solution Center [6] fielded hundreds of complaints. In the absence of an official statement, the beleaguered staff at the Solution Center faced an avalanche of criticism and angst, and resorted to answering each post with essentially the same cut-and-paste response.

I didn't see any official acknowledgement of the problem -- much less a status report on the resolution -- until Jan. 3, when Chris Jones on the Inside Windows Live blog posted an explanation (at windowsteamblog.com/windows_live/b/windowslive/archive/2011/01/03/hotmail-email-access-issue-now-resolved.aspx): "Beginning on December 30th we had an issue with Windows Live Hotmail that impacted 17,355 accounts. Customers impacted temporarily lost the contents of their mailbox through the course of mailbox load balancing between servers. We identified the root cause and restored mail to the impacted accounts as of yesterday evening, January 2nd."

That's how it played out. On the Microsoft side, at Crash + 4 days we received confirmation that the problem had been resolved at Crash + 3 days. At the time, many Hotmail users reported that they still didn't have their mail back.

 
