
How Netflix survived the Amazon EC2 reboot

Joab Jackson | Oct. 7, 2014
Sometimes the best path to success is to learn how to avoid failure.

The AWS reboot, however, would be the first true test of Cassandra's reliability. The entire cloud database engineering team was on alert.

In the end, thanks to Chaos Monkey testing, nearly all of the Cassandra nodes remained online. Of the 218 Cassandra nodes that were rebooted, only 22 did not return to a full operational state, and those were successfully restarted with minimal human intervention.

"Repeatedly and regularly exercising failure, even in the persistence layer, should be part of every company's resilience planning," the blog concluded. "If it wasn't for Cassandra's participation in Chaos Monkey, this story would have ended much differently."

