A common misconception is that data science is only about choosing the right model for the problem at hand. In fact, considerable time and effort go into more fundamental challenges. One major bottleneck in machine learning is obtaining reliable labeled data to train the model on.
How do you learn from partially labeled data? How do you deal with a data set that may contain mislabeled observations? How do you create a feedback loop on your model's predictions?
These are some of the labeling challenges that we deal with at PerimeterX. Many of them don't have a single, clear solution and can be approached from different angles. Our talk will review some academic approaches to these issues alongside case studies from PerimeterX.