Microsoft has fired a shot heard around the globe, so to speak, in data management with the debut of Azure Cosmos DB at the recent Microsoft Build 2017 developer conference in Seattle. The cloud database, hosted on Azure, is positioned for elastic scaling and globally available data. The project was started in 2010 by Dharma Shukla, a distinguished engineer at Microsoft.
InfoWorld Editor at Large Paul Krill spoke with Shukla during the conference to get his perspectives on the technology.
InfoWorld: Why is this project, which began more than six years ago, going to the public now?
Shukla: It's a very complex system. The goal we had was to build a globally distributed database system, which makes the data automatically available wherever the users are so that you can be in any part of the world and you get really low latency, really fast performance when you access your data. You can scale throughput and storage elastically, pay only for what you need across the world. Cloud is perfect for it. When you write your app, your app is automatically available everywhere, wherever the datacenters are.
But when it comes to data, data is locked. Data doesn't get distributed automatically, so that was the mission for Cosmos DB, and the reason it took so long is it's like climbing Mount Everest. All along, we had internal applications using it. We also released a service called DocumentDB in 2015, which was a milestone along the journey. Since then, we've been adding more and more capabilities, and now it has culminated in Cosmos DB, the service that we released.
InfoWorld: With DocumentDB, those customers were moved over to Cosmos?
Shukla: Yes, they are automatically Cosmos DB customers.
InfoWorld: How is Cosmos DB an advance over DocumentDB?
Shukla: There is significant enhancement. We've been steadily adding features, and as a cloud database we hardened the features. We tested it with internal customers, then selected external customers before opening it to everyone. We've been working on many capabilities, which are the database engine capabilities, as well as distributed systems capabilities. DocumentDB was a snapshot in that journey. A lot has been added.
To give you some examples, [there is] support for multiple data models in the database engine to support key-value, graph, and other data types, and support for elastically scaling throughput at different granularities. In DocumentDB, it was only at a per-second granularity. These are hard problems.
For example, if you're scaling throughput, say I want a million transactions per second across four different geographical regions. Then, with the next line of code, you say that instead of a million transactions per second you need 20 million transactions per second. As a customer, you provision these. You scale your system, a single table or database, with this much throughput.
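The elasticity Shukla describes — changing provisioned throughput with a single call — can be sketched with a minimal stand-in for such a provisioning API. The class, method, and region names below are illustrative only, not the actual Cosmos DB SDK:

```python
class ProvisionedContainer:
    """Illustrative stand-in (not the Cosmos DB SDK) for a globally
    distributed container whose throughput is provisioned by the caller."""

    def __init__(self, name, throughput_tps, regions):
        self.name = name
        self.throughput_tps = throughput_tps  # provisioned transactions/sec
        self.regions = list(regions)          # regions the data is replicated to

    def scale_throughput(self, new_tps):
        # Elastic scaling: the caller changes one number; the service
        # would be responsible for repartitioning behind the scenes.
        self.throughput_tps = new_tps
        return self.throughput_tps


# A single table provisioned at 1M transactions/sec across four regions
# (hypothetical region names)...
orders = ProvisionedContainer(
    "orders",
    1_000_000,
    ["westus", "northeurope", "southeastasia", "brazilsouth"],
)

# ...scaled to 20M transactions/sec with one line, as in Shukla's example.
orders.scale_throughput(20_000_000)
```

The point of the sketch is the shape of the contract: the application declares a throughput number per table or database, and scaling is a one-line change rather than a capacity-planning exercise.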