A Guide to Stream Processing

Reading Time: 26 minutes

About This White Paper

The goal of streaming systems is to process large volumes of data and extract useful insights from it before it is saved to long-term storage. The traditional approach to processing data at scale is batching, which rests on the premise that all the data is available in the system of record before processing starts. If a failure occurs, the whole job can simply be restarted.
While simple and robust, the batching approach introduces significant latency between gathering the data and being ready to act upon it. The goal of stream processing is to overcome this latency. It processes live, raw data as soon as it arrives, and in doing so must meet the challenges of incremental processing, scalability and fault tolerance.

This white paper introduces you to the domain of stream processing, covering these topics:

Use cases that benefit from stream processing
Building blocks of a stream processing solution
Key concepts used when building a streaming pipeline: definition of the dataflow, keyed aggregation, windowing
Runtime aspects and tradeoffs between performance and correctness
Overview of distributed stream processing engines
Hands-on examples based on Hazelcast Jet®
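
As a first taste of two of the concepts listed above, keyed aggregation and windowing, the following minimal sketch counts events per key within fixed-size (tumbling) windows. It uses plain Java collections rather than the Hazelcast Jet API, and the `Event` class, the method names and the sample data are hypothetical names chosen for illustration only. A real streaming engine would also have to deal with late events, distribution and fault tolerance, none of which is shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowCount {

    /** Hypothetical event type: a grouping key plus an event timestamp in milliseconds. */
    static final class Event {
        final String key;
        final long timestampMs;

        Event(String key, long timestampMs) {
            this.key = key;
            this.timestampMs = timestampMs;
        }
    }

    /** Returns the start of the tumbling window that contains the given timestamp. */
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /**
     * Counts events per window and per key. Each event updates the running
     * counts as it is seen, which mirrors incremental processing.
     */
    static Map<Long, Map<String, Long>> countPerWindowAndKey(List<Event> events, long windowSizeMs) {
        Map<Long, Map<String, Long>> counts = new TreeMap<>();
        for (Event e : events) {
            long start = windowStart(e.timestampMs, windowSizeMs);
            counts.computeIfAbsent(start, k -> new TreeMap<>())
                  .merge(e.key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("sensor-a", 100),
                new Event("sensor-b", 400),
                new Event("sensor-a", 900),
                new Event("sensor-a", 1200));

        // One-second windows; prints {0={sensor-a=2, sensor-b=1}, 1000={sensor-a=1}}
        System.out.println(countPerWindowAndKey(events, 1000));
    }
}
```

The outer map is keyed by window start time and the inner map by event key, so each event touches exactly one counter. The sketch keeps every window in memory indefinitely; a production pipeline would emit and discard a window once it is considered complete.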

Who Should Read It?

This paper is intended for software architects and developers who are planning or building systems that use stream processing, fast batch processing, data processing microservices or distributed java.util.stream.

What’s In This White Paper?

Fast Processing of Infinite and Big Data
What is Stream Processing
When to Use Stream Processing
The Building Blocks of Stream Processing
Transformations
Windowing
Running Jobs
Fault Tolerance
Sources and Sinks
Overview of Stream Processing Platforms