Imagine a world where all cars, buses and trains operate on their own: no driver, no bus driver, no train conductor, only smart technology. Imagine commuting in your car in the morning, having a cup of coffee and reading the news while it drives you to work. This seemingly unattainable futuristic fantasy is moving toward reality. The technology required for autonomous driving is booming, and we are gradually seeing vehicles with more and more autonomous features.
Since the first successful demonstrations in the 1980s (Dickmanns & Mysliwetz (1992); Dickmanns & Graefe (1988); Thorpe et al. (1988)), the field of autonomous vehicles has made great progress. Despite these advances, fully autonomous navigation in arbitrarily complex environments is still considered to be decades away. There are two reasons. First, an autonomous driving system operating in a complex dynamic environment requires artificial intelligence that can generalize to unpredictable situations and make inferences in real time. Second, informed decision-making requires accurate perception; at present, most existing computer vision systems have a non-negligible error rate, which is unacceptable for autonomous navigation.
Self-driving cars are a type of unmanned ground vehicle (UGV) with the transportation capabilities of a traditional car. As automated vehicles, self-driving cars can sense their environment and navigate without human operation. Fully self-driving cars have not yet been commercialized; most of them are prototypes and demonstration systems, although some mature technologies have been transferred to mass-produced models. Nevertheless, self-driving cars are gradually becoming a reality, which has sparked considerable ethical discussion.
Self-driving cars can sense their environment with technologies such as radar, LiDAR, GPS, and computer vision. An advanced control system converts the sensing data into appropriate navigation paths, obstacles and relevant signage. By definition, self-driving cars update their map information from sensor input, so the vehicle can keep track of its location even when conditions change or it drives into an unknown environment.
In the USA, the National Highway Traffic Safety Administration (NHTSA) has adopted a six-level classification for driving automation (2016 version):
Level 0: No automation. The driver controls all mechanical and physical functions of the vehicle at all times; only functions unrelated to active driving, such as warning devices, are included.
Level 1: Driver assistance. The driver controls the vehicle, but the system can automate a single function at a time, such as adaptive cruise control or lane keeping.
Level 2: The driver mainly controls the vehicle, but the system automates several functions together to significantly reduce the operational burden, such as adaptive cruise control (ACC) combined with automatic car-following and lane departure warning, automatic emergency braking (AEB), and blind-spot detection combined with parts of a collision-avoidance system.
Level 3: The driver must be ready to take control of the vehicle at any time. During automated driving assistance, for instance while following another car, the driver can temporarily refrain from operating the vehicle; but when the car detects a situation that requires the driver, it immediately hands back control, and the driver must take over in any situation the system cannot handle.
Level 4: The driver can let the vehicle drive itself fully when conditions permit. Once automated driving is activated, no intervention is generally needed: the vehicle can follow the set road rules (for example, highways with smooth traffic flow and standardized, clearly marked road signs and lane lines) and perform tasks such as turning, changing lanes and accelerating on its own. Exceptions include severe weather, blurred road markings, accidents, or the end of the automated-driving section, in which case the system gives the driver "sufficient transition time" to take over; the driver should monitor the vehicle's operation. This level can also include automated valet-parking functions (an automated car with a steering wheel).
Level 5: The driver does not need to be in the car and never controls it. The vehicle can activate its driving systems by itself and perform all safety-critical functions without being restricted to designed road conditions, including driving with nobody inside and without being directed by a driver's will: it makes its own decisions (an automated car without a steering wheel).
This classification is aligned with SAE International's release of a system based on the same levels, from driver assistance to fully automated driving. Its design concept is a "who is doing what" taxonomy.
Autonomous driving systems usually follow a classic, modular pipeline.
The first stage is the perception module (perception stack), which fuses information from three-dimensional and two-dimensional sensors into a "world model". The world model summarizes this information in a map, capturing the position of each object at each moment relative to the road surface, lane lines and so on, and predicts candidate paths for the next moment. It is followed by the planning module, which makes decisions. Decision-making is itself hierarchical: coarse-grained decisions determine how to get from point A to point B, a job similar to GPS routing, while fine-grained decisions include which lane to take, whether to temporarily occupy the opposite lane to finish overtaking, and how much speed to set. Finally, there is the control module, which drives all the controllers: high-level controllers such as the electronic stability program (ESP), down to the most basic controllers that accelerate and brake each wheel.
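As an illustration, the three modules can be sketched as a toy pipeline in Python. Everything here (the 10 m stopping distance, the proportional gain, the data types) is a hypothetical placeholder for exposition, not part of any real stack:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Obstacle:
    position: Tuple[float, float]  # (x, y) in metres, vehicle frame

@dataclass
class WorldModel:
    obstacles: List[Obstacle]

def perceive(sensor_points: List[Tuple[float, float]]) -> WorldModel:
    """Perception: summarize raw sensor returns into a world model."""
    return WorldModel(obstacles=[Obstacle(p) for p in sensor_points])

def plan(world: WorldModel, cruise_speed: float) -> float:
    """Planning: pick a target speed; stop if an obstacle is close ahead."""
    if any(abs(o.position[0]) < 10.0 for o in world.obstacles):
        return 0.0          # fine-grained decision: stop for a close obstacle
    return cruise_speed     # otherwise keep the coarse-grained cruise plan

def control(current_speed: float, target_speed: float) -> float:
    """Control: a proportional controller outputs an acceleration command."""
    return 0.5 * (target_speed - current_speed)

world = perceive([(5.0, 1.0)])       # one sensor return 5 m ahead
target = plan(world, cruise_speed=15.0)
accel = control(current_speed=10.0, target_speed=target)
```

The point of the modular design is exactly this separation: each stage consumes the previous stage's output through a narrow interface, so components can be developed and validated independently.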
The computer vision technology currently under development is critical to autonomous driving systems. For an autonomous driving system to make correct decisions, the following computer vision tasks are mainly involved:
The first is vehicle positioning: measuring the movement of the vehicle and locating it on the map. This work is done by the visual odometry system and the localization system. The difference between the two is that visual odometry estimates the vehicle's motion relative to the previous time step, while localization is a global estimate of the vehicle's position on the map. Localization is often accurate to the centimeter level. Since the distances between the vehicle and fixed objects in the map (such as telephone poles) are then known, the vehicle can already perform fairly good path planning based on this information.
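A minimal sketch of how a visual odometry system accumulates per-step relative motion into a global pose, assuming a simplified 2D pose (x, y, heading) and step motions expressed in the vehicle frame:

```python
import math

def integrate_odometry(pose, step):
    """Compose one relative motion (a visual-odometry output) onto a
    global pose. pose = (x, y, theta); step = (dx, dy, dtheta) where
    dx is forward and dy is leftward in the vehicle frame."""
    x, y, th = pose
    dx, dy, dth = step
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)

pose = (0.0, 0.0, 0.0)
# Drive 1 m forward while turning 90 degrees left, then 1 m forward again.
for step in [(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]:
    pose = integrate_odometry(pose, step)
```

Because each step's error compounds through this composition, pure odometry drifts over time, which is exactly why the separate, globally anchored localization system is needed.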
Then there is 3D visual reconstruction. The reconstruction range is usually 50-80 meters, depending on the driving speed. Most state-of-the-art autonomous driving systems use LiDAR for 3D reconstruction, but a small number of teams are trying to recover three-dimensional information directly from images. Since image data is comparatively noisy, reconstruction based entirely on images is a more challenging task.
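For the image-based route, the classic stereo relation Z = f·B/d recovers metric depth from disparity. A minimal sketch; the focal length and baseline below are approximate KITTI-like values used only for illustration:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Recover metric depth from stereo disparity: Z = f * B / d.
    focal_px: focal length in pixels; baseline_m: camera separation in
    metres; disparity_px: horizontal pixel shift between the two views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Approximate KITTI-like setup: f ~ 721 px, baseline ~ 0.54 m.
z = stereo_depth(721.0, 0.54, 7.0)   # a 7-pixel disparity
```

Note how depth is inversely proportional to disparity: at 50-80 m the disparity is only a few pixels, so a sub-pixel matching error translates into metres of depth error, which is the noise problem mentioned above.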
In addition to reconstruction, the system also needs a full understanding of what is happening in front of the vehicle. It must therefore perform object detection, and further classify each detected object; detection and classification together help predict an object's future trajectory. There are several ways to detect and classify. The most common is to draw a bounding box around each object, but since autonomous driving requires motion planning in the three-dimensional physical world, at least a three-dimensional bounding box is required.
More precise than bounding boxes are instance segmentation and semantic segmentation. When an object is concave, or is a tunnel that must be traversed, a bounding box is clearly not enough. Instance segmentation groups all the pixels belonging to each individual instance of certain target categories. It is usually performed on two-dimensional images, but three-dimensional versions also exist; three-dimensional instance segmentation is basically equivalent to object reconstruction. Semantic segmentation, in contrast, assigns a semantic label to each pixel in the image without distinguishing different instances of the same category. Panoptic segmentation can basically be regarded as a combination of instance segmentation and semantic segmentation: it also covers categories that have no instances but form a whole, such as sky and vegetation. The sky cannot be framed by a bounding box, and vegetation normally needs to be avoided, but the system should also know that a car rushing onto a lawn in an emergency is not a serious problem (compared with hitting a tree or a pedestrian). Semantic information is therefore necessary.
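The difference between the three outputs can be shown on a toy 2x4 "image": semantic segmentation gives one class per pixel, instance segmentation gives per-object ids (0 here marking "stuff" with no instances, like sky or road), and panoptic segmentation combines both:

```python
# Semantic map: a class label for every pixel of a tiny 2x4 image.
semantic = [["sky",  "sky",  "car", "car"],
            ["road", "road", "car", "road"]]
# Instance map: a per-object id; 0 means "no instance" (stuff classes).
instance = [[0, 0, 1, 1],
            [0, 0, 1, 0]]

# Panoptic segmentation: every pixel gets a (class, instance id) pair,
# so countable objects and amorphous "stuff" live in one labelling.
panoptic = [[(semantic[r][c], instance[r][c]) for c in range(4)]
            for r in range(2)]
```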
Next is motion estimation: based on the previous frame or several frames, estimate the position of each point in the field of view, or each object, in the next frame. Some objects, such as vehicles, have easily predictable motion, so a motion model can predict them with high accuracy. Others, such as pedestrians, change their trajectories very suddenly, making a motion model harder to establish. Even so, action prediction over a short time interval (2-3 seconds) plays a vital role in decision-making in crowded scenes with many dynamic objects.
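The simplest motion model for easily predictable objects such as vehicles is constant velocity. A sketch, assuming a track is a list of timestamped (t, x, y) observations:

```python
def predict_constant_velocity(track, horizon_s):
    """Extrapolate a track's last observed velocity over a short horizon.
    track: list of (t, x, y) observations, at least two entries.
    Returns the predicted (x, y) after horizon_s seconds."""
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    vx = (x1 - x0) / (t1 - t0)
    vy = (y1 - y0) / (t1 - t0)
    return (x1 + vx * horizon_s, y1 + vy * horizon_s)

# A vehicle observed moving 10 m/s along x; predict 2 s ahead.
pred = predict_constant_velocity([(0.0, 0.0, 0.0), (1.0, 10.0, 0.0)], 2.0)
```

For pedestrians this model breaks down quickly, which is why richer models (interaction-aware, learned, or multi-hypothesis) are used for short 2-3 second horizons in crowded scenes.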
The above tasks were described independently, but in practice the systems that gather this information do not operate in isolation. Contextual reasoning therefore also helps produce more accurate predictions. For example, a group of pedestrians usually waits for a red light and crosses the road together, and when one car tries to merge, another car will brake to give way. With such external information and prior knowledge as constraints, understanding complex scenes becomes easier.
Finally, a field that has attracted comparatively little attention is reasoning under uncertainty. The data obtained by human senses or vehicle sensors inevitably contains uncertainty. How to assess uncertainty accurately while balancing "minimizing risk" against "completing the task" is therefore an important topic. Ideally, all of the detection, segmentation, reconstruction, and positioning tasks above should be performed under uncertainty constraints, so that the system knows what errors it may make before proceeding.
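One common way to make the risk/task trade-off concrete is to pick the action with the lowest expected cost under the perception system's uncertainty. A toy sketch with hypothetical cost values:

```python
def expected_cost_decision(p_obstacle, c_collision=1000.0, c_brake=1.0):
    """Choose the action with lower expected cost under perception
    uncertainty: proceeding risks a collision with probability
    p_obstacle, while braking always incurs a small comfort cost.
    The cost values are arbitrary placeholders for illustration."""
    cost_proceed = p_obstacle * c_collision
    cost_brake = c_brake
    return "brake" if cost_brake < cost_proceed else "proceed"

# Even a 1% obstacle probability makes braking the cheaper expected action,
# because the asymmetry of the costs encodes "minimize risk".
action = expected_cost_decision(0.01)
```

The key point is that this only works if p_obstacle is well calibrated: a perception system that is overconfident about empty roads defeats the whole scheme, which is why the upstream tasks should report uncertainty rather than hard labels.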
Demonstration systems for self-driving cars can be traced back to the 1920s and 1930s. The first truly automated car was developed by the Tsukuba Mechanical Engineering Laboratory in 1977; the vehicle used analog computer technology for signal processing and tracked white street markings through two cameras installed on the vehicle. Prototype cars appeared in the 1980s: Carnegie Mellon University's Navlab and ALV projects were funded by DARPA in 1984, with first results released in 1985, and the EUREKA Prometheus project of Mercedes-Benz and the Bundeswehr University Munich began in 1987. From the 1960s to the second DARPA challenge in 2005, research on autonomous vehicles in the United States was mainly funded by DARPA, the US Army and the US Navy.
In the 2007 DARPA Urban Challenge (the qualification event and the 85 km urban finals), Boss, developed by Carnegie Mellon University, won the competition. It used on-board sensors (a global positioning system, lasers, radars and cameras) to track other vehicles and detect static obstacles, and localized itself relative to a road model. A three-tier planning system combined mission, behavioral, and motion planning to drive in an urban environment: the mission planning layer considered which streets to use to achieve mission goals; the behavioral layer determined when to change lanes, handled priorities at intersections, and performed error-recovery operations; and the motion planning layer chose actions to move toward the local goal while avoiding obstacles. This system has had an important influence on the design of current autonomous driving systems.
In 2011, Sebastian Thrun and colleagues applied the system they developed for the 2007 DARPA Urban Challenge to more realistic environments. They used three unsupervised algorithms to automatically calibrate a 64-beam rotating LiDAR, which was more accurate than cumbersome manual measurement. They then generated high-resolution environment maps for online localization with centimeter accuracy. Improved perception and recognition algorithms enabled the car to track and classify obstacles such as cyclists, pedestrians and vehicles, and to detect traffic lights. The new planning system used this input to generate thousands of candidate trajectories per second, dynamically selecting the best path, and the improved controller continuously selected throttle, brake and steering commands to maximize comfort. These algorithms were tested in varied conditions such as sunlight or rain, day or night, and the system successfully recorded hundreds of miles of autonomous operation under various realistic conditions.
In 2012, the KITTI dataset was released, mainly to promote the application of visual recognition systems to robots, including autonomous vehicles. Using their autonomous driving platform, the authors created a very comprehensive dataset for tasks such as stereo, optical flow, visual odometry/SLAM and 3D object detection. The release of this dataset has greatly advanced autonomous driving research.
In 2015, Jianxiong Xiao et al. proposed a direct perception method to estimate driving affordances. They suggest mapping the input image to a small number of key perceptual indicators directly related to the affordance of road and traffic conditions for driving. Their representation provides a compact yet complete scene description, enabling a simple controller to drive autonomously. They trained a deep convolutional neural network on 12 hours of human driving recorded in a video game and showed that the model could drive a car in very diverse virtual environments. They also trained a car-distance estimation model on the KITTI dataset; the results show that the direct perception method extends well to real driving images.
In 2016, Karol Zieba et al. trained a convolutional neural network (CNN) to map raw pixels from a single front camera directly to steering commands. The system also operates in areas where visual guidance is unclear, such as parking lots and unpaved roads. It automatically learns internal representations of the necessary processing steps, such as detecting useful road features, using only the human steering angle as the training signal. Compared with an explicit decomposition of the problem (such as lane-marking detection, path planning and control), their end-to-end system optimizes all processing steps simultaneously. They argue that this ultimately leads to better performance, because internal components self-optimize to maximize overall system performance rather than optimizing intermediate criteria chosen by humans.