Machine learning has increasingly become a commodity. Early adopters are losing their first-mover advantage, and the COVID-19 pandemic seems to have accelerated the trend. According to McKinsey’s report “The future of work after COVID-19,” two-thirds of the senior executives surveyed are stepping up investment in AI. The competition is no longer about who does AI, but about who does it better and faster.
If you look at software development over the decades, one of the cornerstones of success has been the speed of iteration. DevOps, Agile, and Scrum methodologies all aim to increase velocity. AI is no different: the faster the feedback loop, the quicker products improve and the greater their competitive edge in the market. Companies are widely adopting practices such as MLOps, AIOps, and DataOps to increase velocity in machine learning projects.
AI is driving a drastic change in the software industry: we tell the computer what to do, but not how. Engineers no longer write the code themselves; they feed data to a model, which in turn “writes the code.” Data is the key. But before supervised learning models can ingest the data, a crucial step is needed: making dumb data smart. Take self-driving vehicles: in every image fed to an ML model, the pedestrians, bikes, road signs, and vehicles have to be labeled by a human. This labeling process accounts for over 80% of the time spent on most AI and machine learning projects.
The key to faster iteration, therefore, is faster annotation. Data labeling is usually done via crowdsourcing platforms, where workers all over the world label the data by hand. The issue for a company is that it can only scale labeling in proportion to the number of people it can hire and the money it can invest.
It turns out that AI itself can help with this challenge. Computers began outperforming humans at image recognition about half a decade ago, and that capability is now being used at massive scale for automated data labeling. The market is heating up. Scale AI, a leading company in the space, recently raised a $325M Series E; it serves customers such as Airbnb, Nvidia, Toyota, Samsung, and Etsy.
Scale AI wants to help companies by taking labeling off their plate so that they can focus on the more strategic parts of their machine learning workflow: it takes the customer’s data and returns it labeled. Labelbox, the leading competitor, takes a different approach. It is developing a platform that customers use to label their own data. Most of the labeling is done automatically by machine learning models, leaving only the edge cases to humans.
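The human-in-the-loop idea of letting models label most of the data and sending only edge cases to people can be sketched as confidence-based routing. This is a minimal illustration, not Labelbox’s implementation: it assumes a model that outputs a label with a confidence score, and the `Prediction` type, the `route_predictions` function, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A model's proposed label for one item (hypothetical structure)."""
    item_id: str
    label: str
    confidence: float  # model's probability for `label`, in [0, 1]


def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-accepted labels and edge cases for humans.

    The 0.9 default threshold is an illustrative assumption; in practice it
    would be tuned against a human-labeled validation set.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    auto_labeled = {}
    needs_review = []
    for p in predictions:
        if p.confidence >= threshold:
            # Confident prediction: accept the model's label as-is.
            auto_labeled[p.item_id] = p.label
        else:
            # Uncertain prediction (an "edge case"): queue for a human annotator.
            needs_review.append(p.item_id)
    return auto_labeled, needs_review


if __name__ == "__main__":
    batch = [
        Prediction("img_001", "pedestrian", 0.97),
        Prediction("img_002", "bike", 0.62),
        Prediction("img_003", "road_sign", 0.91),
        Prediction("img_004", "vehicle", 0.40),
    ]
    auto, review = route_predictions(batch)
    print("auto-labeled:", auto)
    print("needs human review:", review)
```

With the sample batch, `img_001` and `img_003` are labeled automatically, while `img_002` and `img_004` go to human review. Raising the threshold sends more items to people and reduces wrong automatic labels; lowering it saves annotation time at the cost of letting more model errors through.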
“With the streamlined design of Labelbox, we are able to cut costs on labeling by as much as 50% while maintaining the highest quality in our training data and get to training our models faster. With human-in-the-loop model-assisted labeling, we expect another huge reduction in time and costs to the labeling process,” noted Edward Kim, Data Analyst and AI at Sharper Shape, a Labelbox customer.
With continuous delivery of software updates becoming the norm and the amount of collected data growing exponentially, companies need to be able to label data as fast as they acquire it. Companies that speed up data annotation will iterate on their products faster, which will keep them ahead of the competition. Those who stick with hand labeling alone are going to struggle and eventually fail.
Feature image by alan9187 from Pixabay.