BONN – May 15, 2017

Big Data Guide: How companies can begin using big data and analytics in a structured manner

The Big Data Guide produced by Cross-Business-Architecture Lab e.V. offers companies a free set of guidelines that they can use to successfully design big data and advanced analytics processes.

Finding out how customers react to marketing campaigns across different channels is no trivial undertaking. Still, it’s even more difficult to determine how a campaign needs to be changed in order to elicit positive reactions. Those who work in marketing departments want to know even more than that, however. For example, they also want to know which customers will react to campaigns, and how, and how long it will take for positive feedback to be reflected in higher sales. None of these questions can be answered with certainty without big data analytics.

The Cross-Business-Architecture Lab has developed a framework that can help businesses structure big data projects correctly. This framework takes into account both the subjects of analysis (i.e. the product(s) or customer(s) in question, or the company’s own organization) and the objectives the big data analyses are meant to achieve. The objective might be to personalize a product, better understand a customer, or improve the efficiency of one’s own organization. In the scenario described here, the initial objectives are to better serve the customer and to make the company’s marketing operations more efficient. Use of the big data solution offers benefits in terms of both objectives. The framework helps project managers determine which sources of data should be mined, which interfaces need to be written, and how to design the visualization tools that will later present analysis results to project participants.

Whether or not a big data application can be successfully introduced depends heavily on whether decision makers are able to realistically assess the associated capabilities at their company. If the assessment of existing knowledge is too optimistic, the implementation of the project in question can be delayed considerably. If this happens, then it is also possible that expertise will need to be procured on short notice during the project, or that errors will need to be corrected later on down the line. CBA Lab uses the term big data maturity to refer to a company’s ability to perform big data analyses (see the chart below).

Here, maturity relates to five areas: strategy and roadmap; governance; reference architecture; infrastructure; and development, testing, and maintenance. Each area is rated on one of five maturity levels: ad-hoc, repeatable, defined, managed, and optimized. Alexander Hildenbrand is the head of CBA Lab’s Industrial Analytics working group, which developed the Big Data Guide. He explains the capability model as follows: “If an organization believes that its big data strategy is fully developed, when in reality the organization is only just beginning to formulate a strategy, then that organization is going to run into big problems. In such a situation, no assessment can be made as to whether specific use cases fit the strategy, or whether the technology employed is suited to the roadmap, the architecture, and the company’s goals. This can lead to miscalculations and the duplication of work in the best case, and to the cancellation of a project in the worst.”
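As an illustration of how such an assessment might be recorded, consider the following sketch. The five areas and five maturity levels come from the text; the function, the example ratings, and everything else are invented for illustration and are not part of the guide.

```python
# Illustrative sketch only: a maturity assessment as plain data,
# using the five areas and five levels described in the article.

LEVELS = ["ad-hoc", "repeatable", "defined", "managed", "optimized"]

AREAS = [
    "strategy and roadmap",
    "governance",
    "reference architecture",
    "infrastructure",
    "development, testing, and maintenance",
]

def maturity_gap(self_assessment, audited):
    """Return areas where the self-assessed level exceeds the audited one --
    the over-optimism the guide warns about."""
    gaps = {}
    for area in AREAS:
        claimed = LEVELS.index(self_assessment[area])
        actual = LEVELS.index(audited[area])
        if claimed > actual:
            gaps[area] = (audited[area], self_assessment[area])
    return gaps

# Example: a company believes its strategy work is "managed",
# but an honest review rates it "ad-hoc".
claimed = {a: "defined" for a in AREAS}
claimed["strategy and roadmap"] = "managed"
actual = {a: "defined" for a in AREAS}
actual["strategy and roadmap"] = "ad-hoc"

print(maturity_gap(claimed, actual))
# -> {'strategy and roadmap': ('ad-hoc', 'managed')}
```

Surfacing such gaps early is exactly the point of the maturity assessment: overestimated capabilities are what delay projects and force expertise to be procured mid-project.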

That’s why the suitability of the existing IT landscape needs to be assessed before a use case can be implemented. This starts with the infrastructure stack and extends to the analysis of processes, applications, interfaces, and the development infrastructure. The skills that exist in the IT department also need to be assessed, and responsibilities have to be defined. “The most complex part of all big data projects involves the creation of the data model and the preparations for data processing,” says Hildenbrand. “Without the model, the algorithms won’t work.”

This “suitability test” is based on CBA Lab’s Big Data Capability Model (see the chart below). This model has seven levels that involve everything from the data source to data access and business scenarios, and how these relate to five specific IT areas: system management, system operation, data governance, data security, and big data integration.

Data source refers to all the different possible data types – e.g. ranging from structured to unstructured and from internal to external. When a use case such as the campaign analysis described above is implemented, all those involved in the project must report on which data sources are available to them and the extent to which they are capable of processing the data from these sources for the planned analyses. Social media is becoming more and more important as a data source in this regard. Most social media data is unstructured and needs to be handled differently than structured data from an internal relational database. This leads to the question of how the data is to be stored: Are suitable storage and file systems available for use by the company? Which systems are still needed to ensure data can be stored in a manner appropriate for each data type? What impact might the storage/retention of data have on adjacent systems – i.e. systems for data governance, security, and system management?

The next question that needs to be addressed relates to how data is to be accessed. The processes and requirements associated with the batch method are obviously very different from those for streaming or continuous event processing. It’s clear that different processes and applications will use different mechanisms to access data. This means that system operators must decide which and how many data access options they want to permit. This decision, too, will influence the IT landscape and the skill set the IT team will need to have.
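The batch/streaming distinction can be illustrated with a minimal sketch – all function and variable names here are invented and are not from the guide:

```python
# Illustrative only: the same records consumed via a batch interface
# vs. a streaming interface.

def read_batch(records):
    """Batch access: the full data set is loaded, then processed as a whole
    (e.g. a nightly export)."""
    return list(records)

def read_stream(records):
    """Streaming access: records are handed to the consumer one by one as
    they arrive (e.g. per clickstream event)."""
    for record in records:
        yield record

events = ["click", "view", "purchase"]

assert read_batch(events) == ["click", "view", "purchase"]
assert next(read_stream(events)) == "click"  # consumer sees events as they come
```

The operational consequences differ accordingly: batch access needs bulk storage and scheduled jobs, while streaming access needs always-on infrastructure and different monitoring and failure-handling skills.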

Such questions and assessments continue on the model’s other levels – e.g. analysis techniques, preparation and processing, and business scenarios. “It’s a game that always has several dimensions,” Hildenbrand explains. “Our capability model provides the necessary structure and offers a framework for developing the right ideas for this process.” In this regard, Hildenbrand also stresses the fact that for CBA Lab, big data is primarily about data storage/retention and access, while analysis is more of an issue for analytics departments.

Reference architecture for big data

Once a big data team has determined which capabilities it already has in place and which still need to be developed, it moves on to the next step – the creation of a big data reference architecture. The goal of the reference architecture is to improve communication among everyone involved in a big data project by creating a visual overview and establishing common terminology. The reference architecture defines:

  • Technology decision-making approaches

  • Product decision-making approaches

  • Security principles

  • Rules for data integration

  • Responsibilities, structure of committees, councils, etc.

  • Management and control structures; processes

The reference architecture should also take into account dependencies on other reference architectures in use. For example, an existing cloud architecture will likely have a major influence on the infrastructure to be used for the big data architecture being developed. A checklist is helpful when developing a reference architecture; it should include the following questions:

  • Should the big data function to be created be centralized or decentralized?

  • How will the data be managed? In a data lake, a data silo, etc.?

  • Which user groups should big data services be made available to?

  • What type of data interfaces to the data sources should be set up, and how many interfaces should there be?

  • How many users is the system being planned for?

  • Which licenses will be needed?

  • What is to be procured and what is to be developed in-house?

  • What lifecycle costs are acceptable, and how can they be calculated?
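One way to make this checklist actionable is to record the answers for a given project as a simple structured record. The sketch below is illustrative only – the field names and example answers are invented, not prescribed by the guide:

```python
# Illustrative sketch: checklist answers for one project, kept as plain data
# so they can be reviewed alongside the reference architecture.

reference_architecture_checklist = {
    "organization": "centralized",         # centralized vs. decentralized
    "data_management": "data lake",        # data lake, data silo, ...
    "user_groups": ["marketing", "sales"],
    "source_interfaces": 4,                # interfaces to data sources
    "planned_users": 200,
    "licenses": ["analytics platform"],
    "buy_vs_build": {"platform": "buy", "data model": "build"},
    "lifecycle_cost_ceiling_eur": 500_000,
}

# Simple completeness check before sign-off: no question left unanswered.
unanswered = [key for key, value in reference_architecture_checklist.items()
              if value in (None, "", [])]
print(unanswered)  # -> []
```

Keeping the answers in one place makes it easy to spot open questions before the architecture is finalized.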


Five dimensions and 13 modules

CBA Lab’s Big Data Guide addresses five dimensions and contains 13 modules relating to big data and analytics. The dimensions are as follows: Strategy, Technology, IT Processes and Policies, Security, and Compliance. Each dimension is subdivided into modules that can be viewed as work packages.

“Strategy” consists of the modules...

  • Alignment: Joint development of the (big data) strategy with the business side of the organization in order to better understand customer expectations and behavior, and to use this information to benefit the company. An initial step in this direction is to create a cross-functional team consisting of data subject matter experts. These experts then need to find out which requirements various departments have in common with regard to data and which requirements are department-specific.

  • Use cases and data identification: Identification and prioritization of suitable use cases, and development of big data scenarios from a technical and business perspective.

  • Capabilities and maturity: This involves questions relating to the required and existing capabilities for big data / advanced analytics.

  • Strategy and roadmap: Development of a strategy and an implementation plan.

“Technology” consists of the steps...

  • Development of a reference architecture, without which big data projects cannot be successful. Different use cases can necessitate the use of different architectures. This means that a rough reference architecture can be developed, after which specific architecture details can be added/changed for the various use cases.

  • Creation of the required platform and infrastructure: Technology (hardware, software, tools) for big data analysis operations.

  • Creation of a big data development environment and establishment of the required expertise.

The “IT Processes and Policies” dimension consists of...

  • Development of a data governance model.

  • Development of big data development processes and a system to manage them.

  • Development of a big data operating model.

  • Development of a big data support process.

The “Security and Compliance” dimension consists of the steps...

  • Assessment of additional risks and their effects.

  • Review of legal obligations and contractual provisions.
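The dimension and module structure above can be summarized as a nested mapping. This is an illustrative sketch only: the module names are abbreviated from the text, and the grouping of Security and Compliance under one heading follows the article's own section headings.

```python
# Illustrative summary of the guide's structure, abbreviated from the article.
MODULES = {
    "Strategy": [
        "Alignment",
        "Use cases and data identification",
        "Capabilities and maturity",
        "Strategy and roadmap",
    ],
    "Technology": [
        "Reference architecture",
        "Platform and infrastructure",
        "Development environment and expertise",
    ],
    "IT Processes and Policies": [
        "Data governance model",
        "Development processes and their management",
        "Operating model",
        "Support process",
    ],
    "Security and Compliance": [
        "Risk assessment",
        "Legal and contractual review",
    ],
}

total = sum(len(modules) for modules in MODULES.values())
print(total)  # -> 13
```

Viewing the modules this way underlines Hildenbrand's point that they are work packages, not a fixed sequence.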

“Of course, you also have to determine which use case will offer the greatest benefits to the company and customers,” Hildenbrand explains. “That’s what you measure the costs and complexity against.” Various evaluation techniques can be used to make such a determination, although as Hildenbrand points out, you can’t always be completely sure that the business case you choose will pay off.

“Big data needs to be trusted,” says Hildenbrand. “We can predict with relative certainty whether a big data analysis will lead to new knowledge. However, this knowledge will not pay off for a company unless it is used effectively by sales and marketing teams, for example, or in other departments, and this use leads to higher revenues, lower costs, or a shorter time to market.” Hildenbrand also says the modules do not necessarily have to be employed sequentially: “For example, it can be helpful to gain initial experience before defining a strategy.”


You can download the Big Data Guide here:

