This vendor-written tech primer has been edited to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.
Over the past half decade, the big data flame has spread like wildfire throughout the enterprise, and the IT department has not been immune. The promise of data-driven initiatives capable of transforming IT from a support function to a profit center has sparked enormous interest.
After all, datacenter scale, complexity, and dynamism have rapidly outstripped the ability of siloed, infrastructure-focused IT operations management to keep pace. IT big-data analytics has emerged as the new IT operations-management approach of choice, promising to make IT smarter and leaner. Nearly all next-generation operational intelligence products incorporate data analytics to some degree. However, as many enterprises are learning the hard way, big data doesn’t always result in success.
While the Four Vs of big data – volume, velocity, variety, and veracity – are intended to serve as pillars upon which to construct big data efforts, there’s a fifth V that needs to be included, and that’s value. Every big data initiative should begin with the question “What value do I want to derive from this effort?” How a group or organization answers that question should deeply inform the means by which that end is achieved. To date, however, value has very much been the silent V.
So how should organizations go about deriving the greatest value from their data? Three key areas deserve close attention:
* Understand data gravity. The term “data gravity” was coined by Dave McCrory, the CTO of Basho Technologies, and refers to the pull that data exerts on related services and applications. According to McCrory, data exerts this gravitational pull in two key ways. First, without data, applications and services are virtually useless. For this reason, application and service providers naturally gravitate toward data, and the bigger the data set, the more applications and services it will attract.
Second, the bigger the data set, the harder it is to move. Generally it’s more efficient and cost-effective to perform processing near where the data resides. We’ve seen large companies use cloud-based services for IT operations data. If the data itself originates in the same cloud, this approach is fine. Even data generated on-premises can be stored and analyzed in the cloud if it’s small enough. For large amounts of data generated outside the cloud, however, problems arise. For example, one organization had to purchase dedicated bandwidth just to upload the telemetry. Even then, data volumes were at times so high that the local forwarders fell behind, and it took hours for the data to become available. In cases such as this one, it’s important to understand data gravity and process the data near where it’s generated.
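To put rough numbers on data gravity, here is a minimal Python sketch. It is not drawn from the organization described above: the function names, the 80 percent link-efficiency default, and the sample figures are all illustrative assumptions. It estimates how long a given volume takes to upload, and how much backlog builds up when telemetry is generated faster than the link can carry it.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move data_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of nominal bandwidth that is actually
    usable (protocol overhead, contention); 0.8 is a placeholder, not a
    measured value.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def backlog_gb(generated_gb_per_hour: float, link_mbps: float,
               hours: float, efficiency: float = 0.8) -> float:
    """Estimate data still waiting to upload after sustained generation."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    # Convert usable link speed from megabits per second to GB per hour.
    uploaded_gb_per_hour = link_mbps * efficiency * 3600 / 8 / 1000
    # Backlog only grows when generation exceeds what the link can drain.
    return max(0.0, (generated_gb_per_hour - uploaded_gb_per_hour) * hours)


if __name__ == "__main__":
    # Hypothetical site: 50 GB of telemetry per hour over a 100 Mbps link.
    print(f"Hours to upload 1 TB: {transfer_hours(1000, 100):.1f}")
    print(f"Backlog after 8 hours: {backlog_gb(50, 100, 8):.1f} GB")
```

With these hypothetical inputs, generation outpaces the link, so the backlog grows every hour. That is the forwarder lag described above, and it marks the point at which processing the data near its source is likely the cheaper option.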