While scheduling a limited number of data workflows is generally manageable, scaling to hundreds of workflows with dependencies and diverse job types requires substantial custom engineering, adds complexity, and consumes expensive resources. Serverless architectures offer an alternative to traditional resource management.
Tomer Levi explains how the data engineering team at Fundbox uses AWS Step Functions, Docker containers, and Spark to build a live serverless data orchestration platform. Tomer further describes AWS Step Functions state machines, their limitations, and how to overcome them by building custom job scheduling and dependency features. Finally, the talk illustrates how resource bottlenecks were overcome using Docker containers and AWS Fargate.