Full session (30 minutes)
Engineering
Infrastructure
Data
So you need to move some data from here to there within a given time frame - easy! You write a few scripts and let cron run them. Then you realize you need a more robust solution, with dependencies between tasks and better visibility into them, so you build a few pipelines in Jenkins to handle that. As the data and business needs grow, handling scheduled tasks for different purposes at scale becomes too complex. You now have to account for:
- Task dependencies (fan-in / fan-out)
- Task priorities & SLA
- Task modularity
- Multiple data sources
- Task parallelization
- Visualization of tasks over time
- Task logging
- Task timeouts
- Retries on failure
- Notifications on task error
In this talk we will take a deep dive into Apache Airflow and see how it helps us tame that complexity.
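
To make two items from the list above concrete (fan-in / fan-out dependencies and retries on failure), here is a minimal sketch in plain Python. It is not Airflow's API and not how Airflow is implemented. The function `run_pipeline`, the task names, and the `max_retries` parameter are all hypothetical, chosen only for illustration. It assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each task on failure.

    tasks:        dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns the list of task names in the order they completed.
    """
    # Include every task in the graph, even those with no upstream tasks.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}

    completed = []
    # static_order() yields each task only after all of its upstreams.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                break
            except Exception:
                # Give up once the first try plus all retries have failed.
                if attempt == max_retries + 1:
                    raise
        completed.append(name)
    return completed


# Hypothetical pipeline: one extract step fans out to two cleaning
# steps, which fan in to a single load step that fails once.
attempts = {"load": 0}


def flaky_load():
    attempts["load"] += 1
    if attempts["load"] < 2:
        raise RuntimeError("transient failure")


tasks = {
    "extract": lambda: None,
    "clean_users": lambda: None,
    "clean_orders": lambda: None,
    "load": flaky_load,
}
dependencies = {
    "clean_users": {"extract"},
    "clean_orders": {"extract"},
    "load": {"clean_users", "clean_orders"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

Even this toy version runs everything sequentially in one process and has no scheduling, logging, timeouts, or notifications, which is exactly the gap a workflow orchestrator such as Airflow is meant to fill.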