What Is Active Learning?

Finding the right self-driving training data doesn’t have to take a swarm of human labelers.

by DANNY SHAPIRO

Reading one book on a particular subject won’t make you an expert. Nor will reading multiple books containing similar material. Truly mastering a skill or area of knowledge requires lots of information coming from a diversity of sources.

The same is true for autonomous driving and other AI-powered technologies.

The deep neural networks responsible for self-driving functions require exhaustive training, both in situations they’re likely to encounter during daily trips and in unusual ones they’ll hopefully never come across. The key to success is making sure they’re trained on the right data.

What’s the right data? Situations that are new or uncertain. No repeating the same scenarios over and over.

Active learning is a training data selection method for machine learning that automatically finds this diverse data. It builds better datasets in a fraction of the time it would take humans to curate them.

It works by employing a trained model to go through collected data, flagging frames it’s having trouble recognizing. These frames are then labeled by humans and added to the training data. This increases the model’s accuracy in difficult situations, such as perceiving objects in tough conditions.

Finding the Needle in the Data Haystack

The amount of data needed to train an autonomous vehicle is enormous. Experts at RAND estimate that vehicles need 11 billion miles of driving to perform just 20 percent better than a human. This translates to more than 500 years of nonstop driving in the real world with a fleet of 100 cars.
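The 500-year figure can be sanity-checked with a short calculation. This is a rough sketch, not the study's own method: the 11 billion miles and the 100-car fleet come from the text above, while the 25 mph average speed is an assumption introduced here to make the arithmetic concrete.

```python
# Back-of-the-envelope check of the fleet-years figure quoted above.
TARGET_MILES = 11e9          # from the RAND estimate cited in the text
FLEET_SIZE = 100             # cars, from the text
AVG_SPEED_MPH = 25.0         # ASSUMPTION: average speed is not stated in the article
HOURS_PER_YEAR = 24 * 365    # nonstop driving, as the text describes


def years_to_reach(target_miles, fleet_size, avg_speed_mph):
    """Years of nonstop driving the whole fleet needs to log target_miles."""
    miles_per_year = fleet_size * avg_speed_mph * HOURS_PER_YEAR
    return target_miles / miles_per_year


years = years_to_reach(TARGET_MILES, FLEET_SIZE, AVG_SPEED_MPH)
print(f"{years:.0f} years of nonstop driving")
```

Under that assumed speed the result lands just above 500 years, consistent with the claim. A lower average speed would push the number higher.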

And not just any driving data will do. Effective training data must contain diverse and challenging conditions to ensure the car can drive safely.

If humans were to annotate this driving data to find these scenarios, the 100-car fleet driving just eight hours a day would require more than 1 million labelers to manage frames from all the cameras on each vehicle, a gargantuan effort. In addition to the labor cost, the compute and storage resources needed to train DNNs on this data would be infeasible.

The combination of data annotation and curation poses a major challenge to autonomous vehicle development. By applying AI to this process, it’s possible to cut down on the time and cost spent on training, while also increasing the accuracy of the networks.

Why Active Learning

There are three common methods for selecting autonomous driving DNN training data. Random sampling extracts frames from a pool of data at uniform intervals, capturing the most common scenarios but likely leaving out rare patterns.

Metadata-based sampling uses basic tags (for example, rain, night) to select data, making it easy to find commonly encountered difficult situations, but missing unique frames that aren’t easily classified, like a tractor trailer or man on stilts crossing the road.

Caption: Not all data is created equal. Example of a common highway scene (top left) vs. some unusual driving scenarios (top right: cyclist doing a wheelie at night, bottom left: truck towing trailer towing quad, bottom right: pedestrian on jumping stilts).

Finally, manual curation uses metadata tags combined with visual browsing by human annotators — a time-consuming task that can be error-prone and difficult to scale.
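The first two methods above are simple enough to sketch in a few lines, which also shows why both can miss a rare scene. This is an illustrative sketch only: the `Frame` class, the tag names, and the frame ids are hypothetical and do not come from any real pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical stand-in for one recorded camera frame."""
    frame_id: int
    tags: set = field(default_factory=set)


def uniform_interval_sample(frames, step):
    """Random-sampling baseline: keep every `step`-th frame."""
    return frames[::step]


def metadata_sample(frames, wanted_tags):
    """Metadata baseline: keep frames that carry at least one wanted tag."""
    wanted = set(wanted_tags)
    return [f for f in frames if f.tags & wanted]


# Ten frames; a few have basic condition tags.
frames = [Frame(i) for i in range(10)]
frames[3].tags.add("rain")
frames[7].tags.update({"night", "rain"})
frames[8].tags.add("night")

# Suppose frame 5 shows something unusual that no tag describes.
RARE_FRAME_ID = 5

every_third = [f.frame_id for f in uniform_interval_sample(frames, step=3)]
rainy = [f.frame_id for f in metadata_sample(frames, {"rain"})]

print(every_third)  # ids picked at uniform intervals
print(rainy)        # ids picked by the "rain" tag
print(RARE_FRAME_ID in every_third or RARE_FRAME_ID in rainy)
```

In this toy pool the untagged rare frame is picked by neither method, which is the gap the text describes.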

Active learning makes it possible to automate the selection process while choosing valuable data points. It starts by training a dedicated DNN on already-labeled data. The network then sorts through unlabeled data, selecting frames that it doesn’t recognize, thereby finding data that would be challenging to the autonomous vehicle algorithm.

That data is then reviewed and labeled by human annotators, and added to the training data pool.
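The selection step described above can be sketched as uncertainty sampling. The article does not say how the network decides which frames it "doesn't recognize", so this sketch swaps in a common stand-in: ranking frames by the entropy of the model's predicted class probabilities. The frame ids, class order, and probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` frames with the most uncertain predictions.

    predictions: dict mapping frame id -> list of class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda frame_id: entropy(predictions[frame_id]),
        reverse=True,  # highest entropy (least confident) first
    )
    return ranked[:budget]


# Hypothetical model outputs over (car, pedestrian, bicycle) for unlabeled frames.
predictions = {
    "frame_001": [0.98, 0.01, 0.01],  # confident
    "frame_002": [0.40, 0.35, 0.25],  # unsure
    "frame_003": [0.90, 0.05, 0.05],  # fairly confident
    "frame_004": [0.34, 0.33, 0.33],  # nearly uniform, most unsure
}

to_label = select_for_labeling(predictions, budget=2)
print(to_label)
```

The selected frames would then go to human annotators and into the training pool, after which the model is retrained and the selection repeats. Other uncertainty measures, such as disagreement among several models, fit the same loop.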

Active learning has already shown it can improve the detection accuracy of self-driving DNNs over manual curation. In our own research, we’ve found that the increase in precision when training with active learning data can be 3x for pedestrian detection and 4.4x for bicycle detection relative to the increase for data selected manually.

Advanced training methods like active learning, as well as transfer learning and federated learning, are most effective when run on a robust, scalable AI infrastructure. This makes it possible to manage massive amounts of data in parallel, shortening the development cycle.

NVIDIA will be providing developers access to these training tools as well as our rich library of autonomous driving deep neural networks on the NVIDIA GPU Cloud container registry.

