Designing a Data-Centric Industry 4.0 System

What are the three types of data, and which one is most important?

May 24, 2019

Industry 4.0 will change the way goods are manufactured. Concepts such as predictive maintenance, demand forecasting, and digital twins not only reduce downtime and quality excursions, but can also help optimize production efficiency. The effort involved should pay off: PricewaterhouseCoopers (PwC) expects Industry 4.0 efforts to create $400 billion of efficiency gains.

This evolution to Industry 4.0 involves many cutting-edge technologies. The landscape is novel and challenging. As a result, even the largest Fortune 500 companies can find it daunting to develop standard architectures for collecting and processing streams of manufacturing data.

There is no one-size-fits-all strategy for developing an Industry 4.0 architecture. Critical importance must be placed on classifying data and designing systems to process the various types of data available. Industry 4.0 unlocks a great deal of data, and not all data is created equal.

In order to understand this, one must start with the realization that most industrial data is streaming and time-based. For example, a vacuum oven might generate a vacuum data point every second while a manufacturing execution system (MES) might generate a route completion data point every five minutes. While the content and form of the two data points differ, they’re both streams.

Streaming data can be characterized by attributes such as bandwidth and usage. Once streams are grouped by similar attributes, three logical classes of industrial data emerge: edge, local, and remote.

Industry 4.0 streaming data types.

Edge Data

Edge data is high-speed, real-time, and transient. The aforementioned vacuum oven might generate vacuum and temperature data at a very high rate. Before Industry 4.0, this data would have been down-sampled and, therefore, most of it thrown away. Streams of data that didn’t contain engineering outcomes, such as machine logic, would have been completely lost.

The issue with throwing away this data, of course, is the loss of a great deal of potentially valuable insight. Assume, for instance, the vacuum oven begins to struggle when the vacuum pump is having trouble. If the high-bandwidth data and machine logic had been saved, a pattern or signature could have been identified to diagnose this condition and create a maintenance request in real time.
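As a rough illustration of what such a signature might look like, the sketch below flags moments when the oven pressure is falling too slowly during pump-down. The function name, the window size, and the `min_drop` threshold are assumptions made for this example, not values from any particular oven or platform, and the check is only meaningful while the pump is actively drawing a vacuum.

```python
from collections import deque


def slow_pumpdown_alerts(samples, window=5, min_drop=1.0):
    """Yield timestamps where pressure fell by less than `min_drop`
    across the last `window` samples.

    `samples` is an iterable of (time, pressure) pairs in time order.
    All names and thresholds here are hypothetical.
    """
    recent = deque(maxlen=window)  # sliding window of recent pressures
    for t, pressure in samples:
        recent.append(pressure)
        # Compare the oldest and newest readings once the window is full.
        if len(recent) == window and recent[0] - recent[-1] < min_drop:
            yield t


# A healthy pump drops pressure quickly; a struggling one barely moves it.
healthy = [(0, 100.0), (1, 90.0), (2, 80.0), (3, 70.0)]
struggling = [(0, 100.0), (1, 99.0), (2, 98.5), (3, 98.4)]
print(list(slow_pumpdown_alerts(healthy, window=3, min_drop=5.0)))
print(list(slow_pumpdown_alerts(struggling, window=3, min_drop=5.0)))
```

In practice each alert would feed a maintenance request rather than a print statement, and the signature itself would be derived from recorded high-rate data.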

In addition, processing on combinations of sensor streams, or sensor fusion, can yield very valuable transactional information. Imagine the parts entering the oven have visible barcodes. By combining a barcode scanner, control logic, and the oven operating characteristics, operators have full visibility into the state of that part in the process. By combining this context with process characteristics and extracting trends, real-time predictive models can be built to help optimize both machine efficiency and product quality.
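A minimal sketch of this kind of sensor fusion, assuming barcode scans arrive as (time, part ID, event) tuples and oven readings as (time, temperature) pairs. The field names and the `fuse_scans_with_readings` helper are hypothetical; a real edge platform would do this incrementally on live streams rather than on lists.

```python
def fuse_scans_with_readings(scans, readings):
    """Combine barcode scans with oven temperature readings.

    scans:    list of (time_s, part_id, event), event is "enter" or "exit"
    readings: list of (time_s, temp_c)
    Returns one record per part that has both entered and exited.
    """
    entered = {}   # part_id -> time the part entered the oven
    records = []
    for t, part, event in sorted(scans):
        if event == "enter":
            entered[part] = t
        elif event == "exit" and part in entered:
            start = entered.pop(part)
            # Temperatures observed while this part was in the oven.
            temps = [temp for rt, temp in readings if start <= rt <= t]
            records.append({
                "part": part,
                "dwell_s": t - start,
                "peak_temp_c": max(temps) if temps else None,
            })
    return records


scans = [(0, "A", "enter"), (10, "A", "exit")]
readings = [(1, 100.0), (5, 150.0), (12, 200.0)]
print(fuse_scans_with_readings(scans, readings))
```

Each record ties a specific part to the process conditions it actually experienced, which is the context a predictive model would need.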

The challenge with edge data, of course, is bandwidth. Saving the tens or even hundreds of thousands of sensor and logic streams throughout a plant is costly and impractical. The only way to unlock the attributes and insights available is to place high-speed analytics at the source of data generation.

A useful edge analytics platform combines a real-time streaming analytics layer, data conditioning tools, real-time visualizations, and advanced machine learning capabilities. Streaming analytics engines generally can’t use traditional programming languages due to the event-driven nature of streaming sensor data, so each platform has its own approach to developing streaming analytics. These engines come in many forms, but are generally either complex event processing (CEP)-based or rules-based.

Because managing hundreds or thousands of edges individually can be impractical, edge analytics must be coupled with centralized management for distributed orchestration and remote analytics authoring. Information extracted from streaming analytics, which exists as a higher-level classification (Local or Remote), can then be shipped to any other system or database.

Pitfalls to watch out for in edge platforms are lack of capability, difficult-to-use configurations, and proprietary analytics that can’t be moved out of a provider’s ecosystem. Trade-offs also exist between streaming analytics implementations. CEP engines are more flexible and can model more complex processes, but tend to require more up-front configuration work. Rules-based engines tend to be limited to simple alerts and condition monitoring, but are much quicker and easier to get running.
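The difference can be sketched in a few lines. Below, a rules-based check judges each event on its own, while a CEP-style check matches a pattern across events over time: a pump start that is not followed by a "vacuum OK" signal within a timeout. The event names, the tuple layout, and both functions are assumptions made for illustration and do not reflect any specific engine's syntax.

```python
def rule_alerts(events, limit=200.0):
    """Rules-based: flag any single temperature event above a limit."""
    return [t for t, kind, value in events if kind == "temp" and value > limit]


def cep_pump_timeouts(events, timeout_s=30):
    """CEP-style: flag a 'pump_on' not followed by 'vacuum_ok' in time.

    Events must be in time order. A timeout is only noticed when a
    later event arrives, which is a simplification of this sketch.
    """
    pending = None  # time of the pump_on still waiting for vacuum_ok
    alerts = []
    for t, kind, value in events:
        if pending is not None and t - pending > timeout_s:
            alerts.append(pending)
            pending = None
        if kind == "pump_on":
            pending = t
        elif kind == "vacuum_ok":
            pending = None
    return alerts


events = [
    (0, "pump_on", None),
    (10, "temp", 210.0),
    (20, "vacuum_ok", None),
    (100, "pump_on", None),
    (140, "temp", 150.0),
]
print(rule_alerts(events))        # over-temperature events
print(cep_pump_timeouts(events))  # pump starts that never reached vacuum
```

The rule needs one line of configuration; the pattern needs state, ordering, and a time constraint, which is where the extra up-front work in CEP engines tends to come from.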

A real-time machine-driven statistical process control plot driven by sensor fusion CEP data.

Local Data

Local data has been the traditional supervisory control and data acquisition (SCADA) focus for decades. Unlike edge data, local data is not transient. But in order to reduce bandwidth and storage needs, local data is usually down-sampled. It’s also generally stored in a location close to the edge. This is the data that’s useful to keep for dashboards, process debugging, and process context that isn’t real-time critical. In the vacuum oven, for instance, the down-sampled temperature and pressure data would be classified as local data and kept for root-cause analysis.
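Down-sampling can be as simple as averaging raw samples into fixed time buckets. The sketch below assumes (time in seconds, value) pairs and a 60-second bucket; both the `downsample` helper and the bucket size are illustrative choices, and real systems may keep minimum, maximum, or last values instead of the mean.

```python
from collections import defaultdict


def downsample(samples, bucket_s=60):
    """Average (time_s, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start_s, mean_value), sorted by time.
    """
    buckets = defaultdict(list)
    for t, value in samples:
        bucket_start = int(t // bucket_s) * bucket_s
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals))
            for start, vals in sorted(buckets.items())]


raw = [(0, 1.0), (30, 3.0), (60, 5.0)]
print(downsample(raw))  # one averaged point per minute
```

Averaging is also what hides short transients, which is why the high-rate analysis has to happen at the edge before this step.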

Many edge platforms contain their own local time series databases. Time series databases differ from relational databases in that they’re optimized to store and retrieve aperiodic time series data. A time stamp generally provides the relationship between different data series. In addition to retaining critical data, this database can be used to store and drive dashboards, or drive local analytics and alerts. Retention policies in many popular databases can also be set so that the data only persists for a specific period of time.
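To make the retention idea concrete, here is a toy in-memory series that discards points older than a fixed retention period each time a new point arrives. The `RetainedSeries` class is hypothetical and stands in for what a real time series database would do through its own retention configuration.

```python
from collections import deque


class RetainedSeries:
    """Toy time series that keeps only the last `retention_s` seconds."""

    def __init__(self, retention_s):
        self.retention_s = retention_s
        self.points = deque()  # (time_s, value) in arrival order

    def append(self, t, value):
        """Add a point, then drop anything older than the retention window."""
        self.points.append((t, value))
        cutoff = t - self.retention_s
        while self.points and self.points[0][0] < cutoff:
            self.points.popleft()

    def query(self, start, end):
        """Return retained points with start <= time <= end."""
        return [(t, v) for t, v in self.points if start <= t <= end]


series = RetainedSeries(retention_s=10)
for t, v in [(0, 1.0), (5, 2.0), (12, 3.0), (20, 4.0)]:
    series.append(t, v)
print(series.query(0, 100))  # only points from the last 10 seconds remain
```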

Remote (Cloud/Data Lake) Data

The key difference with remote data isn’t just frequency; it’s also the nature of the data. Getting data to a cloud, or even an organization’s data lake, and storing it there tends to be very expensive and uses valuable bandwidth. As a result, the data in any remote system should be limited to data that must be accessed by large numbers of geographically distributed users. As a rule of thumb, the bandwidth of this data should be less than one thousandth of the data available at the edge.
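The rule of thumb is easy to turn into a budget. The sketch below uses made-up plant figures (10,000 streams at 100 samples per second and 8 bytes per sample) purely as assumptions to show the arithmetic: the edge produces 8,000,000 bytes per second, so the remote ceiling is 8,000 bytes per second.

```python
def remote_budget(streams, samples_per_s, bytes_per_sample, divisor=1000):
    """Return (edge_bytes_per_s, remote_ceiling_bytes_per_s).

    `divisor=1000` encodes the one-thousandth rule of thumb; the
    inputs are whatever a given plant actually generates.
    """
    edge = streams * samples_per_s * bytes_per_sample
    return edge, edge // divisor


# Hypothetical plant: 10,000 streams, 100 samples/s, 8 bytes per sample.
edge, remote = remote_budget(10_000, 100, 8)
print(edge, remote)
```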

Remote data, therefore, generally tends to be part and process transactional data. Process status, route completions, and quality characteristics are examples of data useful in a central data lake or a cloud.

Much of the sensor fusion information extracted from raw data at the edge through a streaming analytics platform ends up belonging to the remote/cloud class of data. Going back to the vacuum oven example, consider the ability to capture the exact state of any part or process. A streaming analytics engine can repackage the information from these states, then automate MES and material transactions, as well as mine any associated quality characteristics. These transactions and quality characteristics would be useful to other plants as well as supply chain leadership and would be well suited to remote data usage classifications.

Conclusion

An effective Industry 4.0 architecture must focus on three types of data: edge, local, and remote. Designing for only one type of data puts engineers at risk of losing critical process information and valuable insights.

Streaming analytic logic (each line is a stream).

Edge processing must exist as close to the source of data as possible. In a manufacturing plant with a robust network capable of handling terabytes of streaming data, having one edge stack per plant might be acceptable. In cases where ingestion speed is critical, such as with high-speed vibration data, or in cases where the edge is used to make a decision or a control action, edge analytics must exist as close as possible to the asset.

An edge solution, coupled with an architecture and strategy leveraging all three types of data, is necessary for any effective Industry 4.0 architecture.

Lou Loizides has more than a decade of experience in manufacturing, primarily focusing on Industry 4.0. He’s currently innovating with the use of edge analytics as the manufacturing subject matter expert for Foghorn Systems.