UC Berkeley has released what it describes as the largest self-driving dataset ever made available to the general public. The huge dataset contains 100,000 video sequences that engineers and others in the burgeoning industry can use to further develop self-driving technologies.
The dataset, called ‘BDD100K’, is available to download. Each video in the dataset is roughly 40 seconds long and recorded at 720p resolution and 30 frames per second.
Each video is accompanied by GPS data recorded on mobile phones, which indicates the approximate driving trajectory. All the videos were collected at various locations across the United States.
Dataset covers a range of driving conditions
The publicly available videos are a rich treasure trove to work from because they span a wide range of weather conditions, including sunny, rainy and even hazy. The dataset's balance between daytime and nighttime footage has also been praised.
Beyond building self-driving cars, the dataset also offers an opportunity to work on detecting pedestrians on roads and pavements. The videos contain more than 85,000 pedestrian instances, which gives a solid base for this task.
The open source dataset is organized and sponsored by the Berkeley DeepDrive Industry Consortium, a group dedicated to investigating state-of-the-art technologies in computer vision and machine learning for automotive applications. Berkeley wasn't kidding when it said this was the largest publicly available dataset ever.
800 times larger than Baidu's data
In March, Baidu released what was a massive dataset for the time, but Berkeley's release is 800 times larger than Baidu’s, 4,800 times larger than Mapillary’s and 8,000 times larger than KITTI. Datasets like these are expected to be a boon for developers working on perception systems for autonomous vehicles.
Demand for datasets of this kind has been consistently high, and there is no doubt that some interesting work will come from Berkeley's generosity. To coincide with the release of the open source dataset, Berkeley has set up three challenges.
Check out the challenges for Road Object Detection, Drivable Area Segmentation and Domain Adaptation of Semantic Segmentation on their website. The challenges will let emerging autonomous vehicle projects be compared against the work of other leading data scientists in the field.
Autonomous driving is one of the fastest-growing areas of technology. From small university-based teams to big guns like Google and Uber, everyone is determined to be the first to crack the technology that will bring driverless cars to our city streets.
Self-driving cars have had a bad rap recently after an autonomous Uber car hit and killed a pedestrian in Tempe, Arizona. Uber subsequently paused its self-driving development program, but the pause is not expected to last long.
The release of this huge dataset gives researchers and scientists a more diverse pool of data to draw on as they work to overcome the challenges facing self-driving cars. Berkeley researchers have suggested that they will add to the dataset in the future, expanding beyond monocular video to include panoramic and stereo video as well as data from other sensors such as LiDAR and radar.