Siemens uses a combination of frameworks including Apache Spark and TensorFlow to develop specific machine learning methods for each individual analytic task. Experimentation with these models is encouraged in a separate and secure working environment.
"What we have to create an analytical model is a sandbox, where data scientists can play with the data and identify the structures of the model," says Kress. "Once that is clear and we want to operationalise the model, then we have a classical three-tier structure of develop, test and operationalise."
This continuous integration and deployment process draws on the same underlying data lake, so even in the sandbox data scientists can see all the data that exists and work out how to combine data points to surface the insights they need. That creative process results in an analytics model that can be continuously deployed into Siemens' railway monitoring.
Siemens ingests slightly over 50,000 data points per second from its rail services, and has to store the data for extended periods of time. The complexity and variety of its analytical workloads make Teradata essential once the models are deployed.
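The scale of that ingest rate can be put in perspective with a quick back-of-envelope calculation. The 50,000 points per second figure comes from the article; the 64-byte record size and one-year retention window are illustrative assumptions, not Siemens figures:

```python
# Rough sizing for the ingest rate quoted in the article:
# ~50,000 data points per second, retained for extended periods.
POINTS_PER_SECOND = 50_000
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
BYTES_PER_POINT = 64             # assumption: timestamp + sensor id + value + metadata
RETENTION_DAYS = 365             # assumption: one-year retention

points_per_day = POINTS_PER_SECOND * SECONDS_PER_DAY
points_per_year = points_per_day * RETENTION_DAYS
raw_bytes_per_year = points_per_year * BYTES_PER_POINT

print(f"points per day:  {points_per_day:,}")    # 4,320,000,000
print(f"points per year: {points_per_year:,}")   # 1,576,800,000,000
print(f"raw TB per year: {raw_bytes_per_year / 1e12:.1f}")  # 100.9
```

Even under these conservative assumptions the raw feed alone approaches a hundred terabytes a year before indexing or replication, which is why long-term storage and workload isolation dominate the platform discussion below.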
"I need to be able to balance all those different workloads and keep the system stable," Kress says. "If I would do that on Hadoop and one of my guys comes in there and puts a big workload there, no customer for the next two days will get any response. That's not acceptable."