This vendor-written tech primer has been edited to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
Managing modern networked systems and applications is daunting because infrastructure is complex and things can go wrong in so many parts of the technology stack -- servers, storage, network devices, applications, hypervisors, APIs, DNS, etc. How can you address the challenge?
A good place to start: problems that can solve themselves, should.
This is called "self-healing" in the systems management space. As more of our infrastructure has become virtualized, the opportunity for systems to work around and correct their own issues has grown considerably in recent years.
The simplest example of self-healing is automatically restarting a service or process that stops or otherwise becomes unresponsive. Keep in mind that this is a workaround, and that automated activity of all sorts needs to be logged and monitored in turn. If an application leaks memory so badly that it must be restarted automatically several times a day, the restart is not the fix; it's a Band-Aid that mitigates the impact while the responsible developers fix the application.
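The restart-and-log pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular tool's behavior: `RestartBudget`, `heal`, and the idea of escalating after a set number of restarts in a sliding window are hypothetical names and thresholds chosen for the example. The health check and restart action are passed in as plain functions, so the same logic could wrap a process check, an HTTP probe, or a service manager call.

```python
import logging
import time
from collections import deque
from typing import Callable, Deque

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("self_heal")


class RestartBudget:
    """Counts automatic restarts inside a sliding time window.

    The limit and window are illustrative assumptions, not defaults
    taken from any real tool.
    """

    def __init__(self, max_restarts: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.clock = clock  # injectable so the logic can be tested
        self.history: Deque[float] = deque()

    def record(self) -> bool:
        """Record one restart; return True if the budget is now exceeded."""
        now = self.clock()
        self.history.append(now)
        # Forget restarts that have aged out of the window.
        while now - self.history[0] > self.window_s:
            self.history.popleft()
        return len(self.history) > self.max_restarts


def heal(name: str, is_healthy: Callable[[], bool],
         restart: Callable[[], None], budget: RestartBudget) -> str:
    """One watchdog pass. Returns "ok", "restarted" or "escalate"."""
    if is_healthy():
        return "ok"
    restart()
    # Log every automated action so the automation itself is monitored.
    log.warning("auto-restarted %s", name)
    if budget.record():
        # Frequent restarts mean the underlying bug still needs a real fix.
        log.error("%s restarted %d times within %.0fs; escalating",
                  name, len(budget.history), budget.window_s)
        return "escalate"
    return "restarted"
```

The escalation path is the point of the design: the watchdog keeps the service up, but once restarts become routine it raises the problem to people instead of silently absorbing it.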
But because many applications span many systems, making them self-healing means moving beyond simple automation in response to alerts to full orchestration. Automation lets you perform single tasks; orchestration lets you initiate a workflow that carries out an entire process.
That orchestrated workflow should consist of composable automated tasks, but it often needs a perspective larger than that of a single server or service. One server may know it needs to restart a downed process on itself, but higher-level orchestration would decide to throw away that old server, start up a new VM, tell the application on it where to find its database, verify that its services start correctly, update DNS and add it to the load balancer.
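The server-replacement workflow described above can be modeled as an ordered list of small, composable steps run by a tiny orchestrator. This is a sketch under loud assumptions: the `Infra` class is a hypothetical in-memory stand-in for real hypervisor, DNS and load balancer APIs, and every step name is invented for illustration. Real tools add retries, rollback and parallelism that are omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Infra:
    """Hypothetical in-memory stand-in for hypervisor, DNS and LB APIs."""
    vms: Dict[str, dict] = field(default_factory=dict)
    dns: Dict[str, str] = field(default_factory=dict)
    pool: List[str] = field(default_factory=list)  # load balancer members


Step = Tuple[str, Callable[[Infra, dict], bool]]


def launch_vm(infra: Infra, ctx: dict) -> bool:
    infra.vms[ctx["new"]] = {"db": None, "running": True}
    return True


def point_at_db(infra: Infra, ctx: dict) -> bool:
    # Tell the application on the new VM where its database lives.
    infra.vms[ctx["new"]]["db"] = ctx["db_host"]
    return True


def verify_services(infra: Infra, ctx: dict) -> bool:
    vm = infra.vms[ctx["new"]]
    return vm["running"] and vm["db"] is not None


def update_dns(infra: Infra, ctx: dict) -> bool:
    infra.dns[ctx["hostname"]] = ctx["new"]
    return True


def add_to_pool(infra: Infra, ctx: dict) -> bool:
    infra.pool.append(ctx["new"])
    return True


def retire_old(infra: Infra, ctx: dict) -> bool:
    # Throw the old server away only once its replacement is serving.
    if ctx["old"] in infra.pool:
        infra.pool.remove(ctx["old"])
    infra.vms.pop(ctx["old"], None)
    return True


REPLACE_SERVER: List[Step] = [
    ("launch_vm", launch_vm),
    ("point_at_db", point_at_db),
    ("verify_services", verify_services),
    ("update_dns", update_dns),
    ("add_to_pool", add_to_pool),
    ("retire_old", retire_old),
]


def run_workflow(infra: Infra, ctx: dict, steps: List[Step]) -> List[str]:
    """Run composable steps in order, halting at the first failure."""
    done: List[str] = []
    for name, step in steps:
        if not step(infra, ctx):
            raise RuntimeError(f"step {name!r} failed; completed: {done}")
        done.append(name)
    return done
```

In this sketch the decision to discard the old server is carried out last, after the new one passes verification and is in DNS and the load balancer, so a failed replacement halts before live traffic is touched. That ordering is a design choice for the example, not a requirement of any particular orchestration tool.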
With solid orchestration, you can even shift services automatically to a different data center when one goes down.
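Cross-data-center failover boils down to a routing decision: find the highest-priority site that is healthy and point the service at it. The sketch below assumes a priority-ordered list of sites and a health-check function; the `routing` dictionary is a hypothetical stand-in for global DNS or traffic-manager records, and the site names are made up.

```python
from typing import Callable, Dict, List, Optional


def pick_datacenter(preferred: List[str],
                    is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy data center in priority order, if any."""
    for dc in preferred:
        if is_up(dc):
            return dc
    return None


def failover(service: str, preferred: List[str],
             is_up: Callable[[str], bool],
             routing: Dict[str, str]) -> Optional[str]:
    """Point `service` at the best healthy data center.

    `routing` is a hypothetical stand-in for global DNS or traffic-manager
    records. Returns the chosen data center, or None if all are down.
    """
    target = pick_datacenter(preferred, is_up)
    if target is None:
        # Nowhere healthy to go: leave routing untouched for a human.
        return None
    routing[service] = target
    return target
```

Because the same function runs on every check, traffic also moves back to the preferred site once it recovers. Whether automatic fail-back is desirable is a policy decision the sketch does not address.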
In the DevOps space, this kind of orchestration is especially valuable because teams must also perform complex software deployments frequently. A number of free and commercial tools have been developed to meet this need, including open source options such as Ansible, Salt and Rundeck, as well as extensions to Puppet and Chef.
Automating repetitive configuration and process tasks not only yields higher-quality results (e.g., fewer mistakes) but also frees up staff to work on higher-ROI issues instead of spending valuable technical resources on what is effectively menial labor.
The most frequent reason cited for not implementing orchestration or even basic automation is "lack of time and resources to do so," but even a modest amount of automation can save enough time to prove its worth quickly.
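Whether automation "proves its worth quickly" can be checked with simple break-even arithmetic: divide the time spent building the automation by the time it saves per run. The function below is a minimal sketch of that calculation; the inputs are whatever estimates you have for your own task, and it ignores ongoing maintenance, so the real break-even point may come later.

```python
import math


def break_even_runs(build_hours: float, manual_minutes: float,
                    automated_minutes: float) -> int:
    """Runs needed before time saved covers the time spent automating.

    Ignores maintenance effort, so treat the result as optimistic.
    """
    saved_per_run = manual_minutes - automated_minutes
    if saved_per_run <= 0:
        raise ValueError("automation must be faster than the manual task")
    return math.ceil(build_hours * 60 / saved_per_run)
```

For purely hypothetical figures of 8 hours to automate a 20-minute manual task that then takes 2 minutes, `break_even_runs(8, 20, 2)` returns 27, so a task performed daily would pay back in about a month.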