Full session (30 minutes)
Engineering
data pipelines
big data

Data processing at scale poses broad challenges: each framework has its own programming model, and batch and streaming workloads are often handled differently. On the operations side, building and maintaining a cluster for distributed data processing carries significant overhead and cost, even with a semi-managed cloud provider solution.

Google developed a unified batch and streaming data processing SDK for Cloud Dataflow, which was later open-sourced as Apache Beam. Beam supports multiple execution engines (runners), including Apache Spark, Apache Flink, and Google Cloud Dataflow.

In this talk, we’ll see how a unified model helps solve many data processing use cases at scale, and how the Google Cloud Dataflow managed service reduces operational overhead to nearly zero as your data and business needs grow.
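As a rough illustration of the unified-model idea the talk describes (plain Python, not the actual Beam API): the same transformation logic, written once, can consume either a bounded collection (batch) or an unbounded generator (streaming), with only the source differing. The function and source names below are hypothetical.

```python
import itertools
from typing import Iterable, Iterator

def word_lengths(records: Iterable[str]) -> Iterator[int]:
    """One transformation, written once, reused for batch and streaming."""
    for record in records:
        yield len(record.strip())

# Batch: a bounded, in-memory source.
batch_source = ["alpha", "bravo", "charlie"]
print(list(word_lengths(batch_source)))  # [5, 5, 7]

# Streaming: an unbounded source (a generator); here we take just 3 items.
def stream_source() -> Iterator[str]:
    n = 0
    while True:
        yield f"event-{n}"
        n += 1

print(list(itertools.islice(word_lengths(stream_source()), 3)))  # [7, 7, 7]
```

In Beam proper, the same separation holds: a `PTransform` is defined once, and the runner decides how to execute it over bounded or unbounded `PCollection`s.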

Uri Shamay (cmpxchg16)