Reversim Summit 2019

Full session (30 minutes)

Data Science

Engineering

Data scientists are adopting modern frameworks such as Jupyter, TensorFlow and Dask, while data engineers mostly rely on legacy tools: traditional databases, BI, data warehousing and Hadoop-powered data lakes. Can we really keep going with two separate silos and disciplines? GPUs are widely used to accelerate machine learning, and new libraries like NVIDIA RAPIDS make it possible to process data frames on the GPU and accelerate work done with packages like pandas and scikit-learn. However, processing data across multiple servers, or working with larger datasets, is more challenging. In this session, we will demonstrate how to automate the deployment of big data tools side by side with a modern data science stack and GPUs, so that multiple jobs and interactive notebooks can run over the same cluster.

The End of Silos

Yaron Haviv