The whole point of running a data lake is to let users consume the data and gain powerful business insights. But when you have many users, things will sometimes go wrong. And when they do, you should be ready.
In my talk, I'll present an incident that happened to us a few months ago, and how monitoring our users' activity with the ELK stack helped us identify the source of the problem.
I'll also explain why it took us so long to find the cause, and the practical conclusions we drew from it.