This post is part of a series called “Deep Learning Dissected,” contributed by IBM, that explores the challenges in adopting deep learning-based cognitive systems.
I just returned from the SC17 conference in Denver, the premier supercomputing and high-performance computing industry event. The venue is fitting, because the first manifestation of data science and big data analytics began in the high-performance computing (HPC) space. The conference also brings together some of the biggest “Star Wars” fans, and many showcases and demos had a Star Wars theme.
In the early days of HPC, problems were so big that we couldn’t build a single system large enough to address them. That is why HPC pioneers, the first Jedi Knights of data science, gave us clustering in the first place. But the evolution of computing and the rise of a Dark Side of unstructured data now require data scientist Jedi to once again use their knowledge of deep learning, their Force, to restore balance to the Universe.
A Long Time Ago, In a Galaxy Not That Far Away
Computer vision is rapidly being adopted across a diverse set of industries. Assisted or autonomous driving is getting tons of media attention, but I’m also meeting banks that use computer vision to stop masked people from damaging ATMs, large-scale manufacturers that use it to identify errors early in their production process, and doctors who use it to help identify anomalies in medical images.
There is no doubt that image classification and object detection have broad applicability. But one needs tons of data for neural networks to become accurate enough for production use. The COCO data set, which is often used for training object detection models, consists of more than 200,000 labeled images and 1.5 million object instances. The unspoken truth is that data scientists spend 80 percent of their time preparing data for such massive data sets, and they characterize it as the least enjoyable part of their job.
This is because you can’t just dump data in a repository and expect the model to learn from it. You need to label or annotate the data and transform it into the proper format before any effective training begins. Unsupervised training does enable inference without labeled data, but as my colleague Jean-Francois Puget recently pointed out, the methods are not predictive, and you are better off spending a month labeling data than figuring out an unsupervised learning algorithm.
Tools exist that facilitate the annotation process, and there is, of course, Mechanical Turk. As IBM computer vision engineer Nick Bourdakos discusses, these tools still require a large amount of manual work, whether you do it yourself or outsource it. For all his brilliance and with the tools at his disposal, it still took Nick four hours of non-stop work to annotate 309 images for his Millennium Falcon and TIE Fighters object detection model.
A New Hope for Data Labeling
I propose a different path. The same way a Jedi trains to learn the ways of the Force, then leverages the Force to bring balance to the world, I’m proposing data scientists leverage their Force, the knowledge of algorithms, to minimize an undesirable task: cumbersome data annotation. Through active learning, a data scientist needs to label only a subset of the data by hand, then trains a neural network model on that subset and uses it to label the remaining, larger portion of the data set.
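To make the workflow concrete, here is a minimal Python sketch of the label-a-subset, train, auto-label loop described above. It is not PowerAI Vision code: a toy nearest-centroid classifier on made-up 2-D feature vectors stands in for the neural network, and the function names, the 0.8 confidence threshold, and the synthetic data are all assumptions for illustration. Predictions that fall below the threshold are set aside for a human to audit.

```python
import math
import random


def train_centroids(features, labels):
    """Fit a toy nearest-centroid model: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (label, confidence) using a softmax over negative distances."""
    weights = {y: math.exp(-math.dist(x, c)) for y, c in centroids.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def auto_label(centroids, unlabeled, threshold=0.8):
    """Split predictions into confident auto-labels and items needing audit."""
    accepted, needs_audit = [], []
    for x in unlabeled:
        label, conf = predict_with_confidence(centroids, x)
        target = accepted if conf >= threshold else needs_audit
        target.append((x, label, conf))
    return accepted, needs_audit


def make_points(center, n):
    """Hypothetical stand-in for image features: noisy points around a center."""
    return [[random.gauss(center[0], 1.0), random.gauss(center[1], 1.0)]
            for _ in range(n)]


random.seed(0)

# Step 1: a small, manually labeled subset.
labeled_x = make_points((0, 0), 10) + make_points((5, 5), 10)
labeled_y = ["falcon"] * 10 + ["tie_fighter"] * 10

# Step 2: train on the labeled subset only.
model = train_centroids(labeled_x, labeled_y)

# Step 3: auto-label the larger unlabeled pool, flagging uncertain items.
unlabeled = make_points((0, 0), 40) + make_points((5, 5), 40)
accepted, needs_audit = auto_label(model, unlabeled)

print(f"auto-labeled: {len(accepted)}, sent to audit: {len(needs_audit)}")
```

After the audit, the corrected labels would be merged with the original subset and the model retrained on the combined set.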
Having returned from SC17, our team was also inspired to create a Star Wars object detection model. Using our PowerAI Vision platform, it took 13 minutes to manually label 55 images, 6 minutes to train a neural network with those labeled images, and 7 minutes for the model to auto-label nearly twice the number of images, for a total of 146 labeled images. Had we applied the model against more film, the number of auto-labeled images would have been even higher.
There was an additional 5 minutes of auditing and fixing the labels, which brings the active learning workflow to 31 minutes. Add another 22 minutes to retrain the model, and the total, 53 minutes, is still far less than the multiple hours Nick Bourdakos had to put in to annotate his images, even with existing tools. The result is an object detection model that identifies not only the Millennium Falcon and TIE Fighters, but also Rey, Finn, and BB-8 in the same film clip. That’s three more “objects” than Nick’s model identifies. The Force is strong with active learning and PowerAI Vision.
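As a quick sanity check, the snippet below simply adds up the step durations reported above and compares labeling throughput. The grouping of steps is one reading of those figures, and the comparison is rough: Nick’s four hours cover manual annotation only, while the figure for our run also includes training and auto-labeling time.

```python
# Step durations in minutes, as reported in the text above.
steps = {
    "manual labeling (55 images)": 13,
    "initial training": 6,
    "auto-labeling": 7,
    "audit and fixes": 5,
}
retrain_minutes = 22

labeling_minutes = sum(steps.values())                    # 31
end_to_end_minutes = labeling_minutes + retrain_minutes   # 53

# Image counts from the text: 55 labeled by hand, 146 labeled in total.
manual_images, total_images = 55, 146
auto_labeled_images = total_images - manual_images        # 91

# Labeled images per hour (assumption: "four hours" is treated as exactly 4.0).
assisted_rate = total_images / (labeling_minutes / 60)
manual_rate = 309 / 4.0

print(f"labeling workflow: {labeling_minutes} min")
print(f"including retraining: {end_to_end_minutes} min")
print(f"auto-labeled images: {auto_labeled_images}")
print(f"assisted: {assisted_rate:.1f} images/hour")
print(f"manual:   {manual_rate:.1f} images/hour")
```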
An Active Force Awakens
Active learning, with an intuitive user interface, solves a twofold dilemma: the cumbersome process of actually labeling massive amounts of data, and the need to engage subject matter experts in this process. Each of the real-life examples I highlighted above leveraged an organization’s internal data, with supervised learning, to train its computer vision models.
While data scientists are often tasked with collecting labeled data, it is doctors who recognize anomalies in medical images and subject matter experts who recognize faulty parts early in a production cycle, not the data scientists. I don’t expect doctors or quality engineers to write Python scripts to label images, but they can easily audit images that have been auto-labeled by a model trained on a smaller data set.
Want to learn more about how you can apply active learning? Watch my presentation at SC17 about adopting enterprise-ready deep learning with PowerAI Vision.
Feature image by Jack Moreh via FreeRange Stock.