Reversim Summit 2019

Full session (30 minutes)

Engineering

ELT

Maintenance

At Outbrain we’ve noticed that big data processing is tricky: it’s a challenge to estimate delivery times, deliver high-quality ETLs, and maintain them over time. These challenges make life hard for everyone involved: engineers, data scientists, managers, and the business. Many of these difficulties may stem from our sparse use of well-known software methodologies in the ETL domain, such as good variable names, useful comments, testing, and runtime alerts. We’ll compare how prevalent these methodologies are in Outbrain’s codebase versus its Hive jobs and Spark jobs, and show how we are bringing more of our software best practices to ETLs.

Applying Software Engineering Methodologies to Data Processing

Gal Lerman