Sama, an impact employer, has released a new productized training solution for AI data annotation. The solution builds employees’ AI skills, improving tag accuracy by 16% and shape accuracy by 15% on average, and reducing overall project ramp time by up to 50%.
What is AI Data Annotation?
Artificial intelligence is the backbone of modern industrial innovation and development. But have you ever wondered what the “backbone” of the AI applications that surround us is? It’s AI data annotation.
AI data annotation is the process of labeling or tagging data to train artificial intelligence (AI) models. It is the foundation on which intelligent systems are built: carefully labeled data gives AI models the information they need to learn and understand the world. Simply put, it’s like teaching a machine to understand and interpret information the way a human would.
There are three key aspects of working with AI data annotation:
Data Collection: Gathering relevant data, such as images, videos, text, or audio files.
Annotation: Labeling or tagging specific elements within the data. For example, in image annotation, you might draw bounding boxes around objects or label them with specific categories.
Quality Assurance: Reviewing and verifying the accuracy of the annotations to ensure the training data is reliable.
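To make the annotation and quality assurance steps concrete, here is a minimal Python sketch of what one annotated image and a basic automated check might look like. Everything in it is an assumption made for illustration: the `BoundingBox` and `AnnotatedImage` types, the `ALLOWED_LABELS` set, and the `qa_issues` function are hypothetical and do not describe any particular vendor’s format or tooling.

```python
from dataclasses import dataclass, field

# Hypothetical label set for an example driving-scene project.
ALLOWED_LABELS = {"car", "pedestrian", "cyclist"}


@dataclass
class BoundingBox:
    label: str     # category assigned by the annotator
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float


@dataclass
class AnnotatedImage:
    path: str
    image_width: int
    image_height: int
    boxes: list[BoundingBox] = field(default_factory=list)


def qa_issues(item: AnnotatedImage) -> list[str]:
    """Return basic quality problems found in one annotated image."""
    issues = []
    for i, box in enumerate(item.boxes):
        if box.label not in ALLOWED_LABELS:
            issues.append(f"box {i}: unknown label '{box.label}'")
        if box.width <= 0 or box.height <= 0:
            issues.append(f"box {i}: empty box")
        if (box.x < 0 or box.y < 0
                or box.x + box.width > item.image_width
                or box.y + box.height > item.image_height):
            issues.append(f"box {i}: outside image bounds")
    return issues
```

A check like this only catches mechanical errors, such as a mistyped label or a box drawn past the image edge. Judging whether a box is on the right object still requires a human reviewer or a comparison against verified answers.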
AI models are trained on high-quality annotated data, and the trained models then perform automated tasks. Common examples include text-to-speech, voice recognition, and image recognition. Likewise, data annotation is commonly applied to images, video, speech, and more.
Data annotation is a complex and time-consuming process that often requires human expertise. However, it is a critical step in developing advanced AI applications.
Here’s where Sama’s role becomes important.
Sama has announced the company-wide rollout of its new flexible, scalable productized training platform. The company is regarded as a leader in purpose-built, responsible enterprise AI, with agile data labeling for model development and supervised fine-tuning.
More about the training solution by Sama
The proprietary training solution builds on Sama’s commitment to excellence and an industry-leading 99% client acceptance rate by reducing project ramp times by up to 50% while increasing individual annotators’ tag accuracy by 16% and shape accuracy by 15% on average. For Sama’s enterprise clients, this results in higher-quality models going into production faster, saving both time and capital. For Sama employees, this new platform improves the training experience, offers greater understanding of data annotation and AI development principles, and builds their skills for successful long-term careers in the digital economy.
At the time of the announcement, Duncan Curtis, SVP of AI product and technology at Sama, said, “Basic data annotation training is just that: basic. It encourages rote memorization instead of truly learning the ins and outs of correct annotation. At Sama, we have always believed in the power of project-specific training to increase quality and reduce rework. This new iteration of our training platform takes that a step further: it’s more scalable and well-suited for even the most complex tasks, including automotive and Generative AI data annotation, even when client parameters may change mid-project.”
Duncan added, “Our annotators can now actively learn at their preferred pace and receive useful feedback for fuller comprehension. These sessions will help them not only excel on current AI projects but build and master new skills, which will prepare them for future AI innovations and development needs.”
Fitting AI Data Annotation into a Responsible AI Framework
Training for complex tasks, such as annotating LiDAR or sensor fusion data, previously required lengthy courses and a significant amount of time to give trainees the detailed feedback they needed to master the skills. Sama’s productized training ties into its responsible AI framework by emphasizing the role of data annotation work as a stepping stone to long-term careers in the digital economy.
By building a talent pipeline that is actively learning and mastering concepts, Sama is investing in its own workforce. That same talent pipeline, primarily consisting of women and underrepresented communities, allows AI developers to more easily access a broader range of perspectives about how AI should be developed and what needs to be corrected, promoting more responsible and ethical models overall.
“We envision Kenya as a growing hub for new AI innovations and talent, able to reap the economic benefits of AI while ensuring that models are developed with diverse perspectives. For this vision to become reality, digital skilling is a must. This latest development is a clear signal of Sama’s deep commitment and investment in local talent,” said Maxwell Okello, CEO of the American Chamber of Commerce (AmCham) Kenya.
Sama’s AutoQA™ platform: The Key to Success
The new training platform begins with annotation tasks that have gold answers, which the customer or trainer has verified. During training, Sama’s AutoQA™ platform autonomously compares an annotator’s answers to these ground truth responses and can offer specific instruction on where to improve. If annotators feel stuck, they also have access to hints, such as a brief display of the correct shapes. They can track their own progress, and that of others, in real time. Early results show a 15% increase in shape accuracy and a 16% increase in tag accuracy compared to previous Sama training modules, reducing the odds of delays caused by rework. Project ramp time (the time from when a contract is signed to when work on a project begins) has been reduced by up to 50% with these new features.
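As a rough illustration of how a comparison against gold answers can work, the Python sketch below scores one annotator’s bounding boxes against verified ground truth and produces feedback for each miss. Sama has not published AutoQA’s internals, so this is not its implementation: the `Box` type, the `iou` and `score_against_gold` functions, the choice of intersection-over-union as the shape metric, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box with a class tag."""
    tag: str
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    iw = min(a.x2, b.x2) - max(a.x1, b.x1)
    ih = min(a.y2, b.y2) - max(a.y1, b.y1)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def score_against_gold(answers, gold, iou_threshold=0.5):
    """Compare answers to gold boxes pairwise (same order is assumed).

    Returns (tag_accuracy, shape_accuracy, feedback_messages).
    """
    if len(answers) != len(gold):
        raise ValueError("answers and gold must have the same length")
    if not gold:
        raise ValueError("gold must not be empty")
    tag_hits = shape_hits = 0
    feedback = []
    for i, (ans, ref) in enumerate(zip(answers, gold)):
        if ans.tag == ref.tag:
            tag_hits += 1
        else:
            feedback.append(f"item {i}: tag '{ans.tag}' should be '{ref.tag}'")
        overlap = iou(ans, ref)
        if overlap >= iou_threshold:
            shape_hits += 1
        else:
            feedback.append(
                f"item {i}: shape overlap {overlap:.2f} is below {iou_threshold}"
            )
    n = len(gold)
    return tag_hits / n, shape_hits / n, feedback


# Example: one correct box, one with the wrong tag and the wrong position.
gold = [Box("car", 0, 0, 10, 10), Box("pedestrian", 20, 20, 30, 40)]
answers = [Box("car", 0, 0, 10, 8), Box("cyclist", 50, 50, 60, 60)]
tag_acc, shape_acc, notes = score_against_gold(answers, gold)
```

Keeping tag accuracy and shape accuracy as separate scores mirrors the way the results above are reported, and the per-item messages stand in for the specific instruction a trainee would receive.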
In addition, the platform has built-in flexibility to adjust to changing client needs. When instructions or criteria change during the middle of a project, Sama can update training instructions and easily deploy re-training modules to the entire workforce. This allows for a smooth transition to follow the new criteria and can reduce rework.
This new solution joins a suite of products designed to scale to all project sizes, including some of the largest open-source models in the world. Sama employs a human-in-the-loop (HITL) approach to constantly and consistently provide models with feedback from expert annotators, validating a model’s behavior and ensuring it is performing to standards. This feedback occurs during the entire model development process, including data creation, supervised fine-tuning, LLM optimization and ongoing model evaluation, ensuring clients can develop models in a more responsible way.
Sama’s work is backed by SamaAssure™, the industry’s highest quality guarantee, which routinely delivers a 98% first batch acceptance rate.
More about Sama
Sama is a global leader in data annotation solutions for computer vision, generative AI and large language models. Our solutions minimize the risk of model failure and lower the total cost of ownership through an enterprise-ready, ML-powered platform and SamaIQ™, which combines actionable data insights uncovered by proprietary algorithms with a highly skilled on-staff team of over 5,000 data experts. 40% of FAANG companies and other major Fortune 50 enterprises, including GM, Ford and Microsoft, trust Sama to help deliver industry-leading ML models.
Driven by a mission to expand opportunities for underserved individuals through the digital economy, Sama is a certified B-Corp and has helped more than 68,000 people lift themselves out of poverty. An MIT-led Randomized Controlled Trial has validated its training and employment program.