"They often own the data sources discovered in steps 2 or 3," Keep says. "They know what tables the data lives in, how it's formatted, how it's extracted. They know if there's a clean way of getting data out without interrupting core data systems."
Step 5: Develop the single view model
This critical step governs everything that follows, but Keep notes it's less daunting if you've successfully completed the upfront discovery work. Identify the types of data you need, where that data lives and how you need to query it.
"Here we might look at exactly what data is mandatory and what's optional," Keep says. "For your application, email address, date of birth and credit card number might be mandatory. The social media account might be optional. Then figure out what data needs to be indexed. That's going to speed up the queries that the consuming applications are going to want to run. This is where a database with a flexible data model really, really helps. We don't need to know what all the optional fields are, we can add them as we go. We just need the mandatory data."
Step 6: Data loading and standardization
Once you have your single-view data model in place, you need to define how you want data represented within that single view. That means designing common field names for the attributes you're capturing. Different data sources might label the same attribute 'DoB,' 'Date of Birth' or 'Birthdate.' You need to standardize those field names.
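One way to picture the field-name standardization is as a lookup from each source system's label to a single canonical name. The sketch below is a minimal illustration: the alias table uses the three labels mentioned above, while the canonical name `date_of_birth` and the `standardize` helper are assumptions made for the example.

```python
# Source-system labels mapped to one canonical field name.
# The canonical name "date_of_birth" is an assumed convention.
FIELD_ALIASES = {
    "dob": "date_of_birth",
    "date of birth": "date_of_birth",
    "birthdate": "date_of_birth",
}


def standardize(record):
    """Return a copy of `record` with canonical, lower-case field names."""
    standardized = {}
    for key, value in record.items():
        normalized = key.strip().lower()
        # Fall back to a snake_case version of the original label.
        canonical = FIELD_ALIASES.get(normalized, normalized.replace(" ", "_"))
        standardized[canonical] = value
    return standardized


crm_record = standardize({"DoB": "1980-01-01", "Email": "mat@example.com"})
billing_record = standardize({"Date of Birth": "1980-01-01"})
loyalty_record = standardize({"Birthdate": "1980-01-01"})
```

All three records end up with the same `date_of_birth` key, so downstream queries only need to know one field name.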
"In stage six, what we actually do is make sure we're transforming all the data from our source systems so it's matching this standardization," Keep says. "It starts with the initial data load."
"With the initial load, you've got an empty single-view database and you pull in all the data from your source systems so it meets the requirements you've defined," he adds. "Then you'll capture updates to your single view. You might do that in batch, but what we're seeing more commonly now is they want a much fresher view. For that, [Apache] Kafka is very popular now. It provides a near real-time version of the data. That's what we call the delta load."
Step 7: Match, merge and reconcile
Even though you standardized your data in the previous step, you'll still need matching algorithms to identify records from different source systems that refer to the same entity but don't line up exactly. For instance, a business travel application may draw on records that refer to 'Mat Keep,' 'Mr. Keep' and 'Matthew Keep.' Your single-view application needs to match, merge and reconcile those records.
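A deliberately simple sketch of that matching and merging, using the three name variants above, might look like the following. The rules are assumptions chosen for illustration: strip honorifics, require the same surname, and accept first names where one is a prefix of the other. A real matcher would likely weigh further attributes before merging, and the `same_person` and `merge` helpers are hypothetical.

```python
HONORIFICS = {"mr", "mrs", "ms", "dr"}


def name_tokens(name):
    """Lower-case the name, drop trailing periods and remove honorifics."""
    tokens = [t.strip(".").lower() for t in name.split()]
    return [t for t in tokens if t not in HONORIFICS]


def same_person(name_a, name_b):
    """Heuristic match: same surname and compatible first names."""
    a, b = name_tokens(name_a), name_tokens(name_b)
    if not a or not b or a[-1] != b[-1]:
        return False
    first_a, first_b = a[:-1], b[:-1]
    if not first_a or not first_b:
        # Assumption: a surname-only record is treated as a match.
        return True
    return first_a[0].startswith(first_b[0]) or first_b[0].startswith(first_a[0])


def merge(records):
    """Combine matched records, keeping the longest name as canonical."""
    merged = {}
    for record in records:
        for key, value in record.items():
            if key == "name":
                if len(value) > len(merged.get("name", "")):
                    merged["name"] = value
            else:
                merged.setdefault(key, value)  # first source wins
    return merged


records = [
    {"name": "Mat Keep", "email": "mat@example.com"},
    {"name": "Mr. Keep", "loyalty_id": "L1"},
    {"name": "Matthew Keep"},
]

matched = [r for r in records if same_person(records[0]["name"], r["name"])]
golden_record = merge(matched)
```

Treating a surname-only record as a match is the riskiest rule here, since it would also merge unrelated people who share a surname. That is why reconciliation normally relies on more than the name alone.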