Full session (30 minutes)

At Outbrain we’ve noticed that big data processing is tricky. It’s a challenge to estimate delivery time, deliver high-quality ETLs and maintain them over time. These challenges make life hard for all parties involved: engineers, data scientists, managers and the business. Many of these difficulties may stem from our scarce use of well-known software methodologies in the ETL domain, such as good variable names, useful comments, testing and runtime alerts. We’ll compare the prevalence of these methodologies in Outbrain’s code base, Hive jobs and Spark jobs, and see how we bring more of our software best practices to ETLs.

Gal Lerman