Object Detection System Based on Data Flow from LG SVL Simulator and Deep Neural Networks
Abstract
INTRODUCTION 7
1 Overview 15
1.1 History of Object Detection 15
1.2 Development of Object Detection Technology 17
1.2.1 Development of Traditional Object Detection 17
1.2.2 Development of object detection technology based on deep learning 18
1.3 Significance of Object Detection for Autonomous Driving 19
1.4 Introduction to SVL simulator 20
1.5 The main work of this paper 23
1.6 The structure of this paper 23
2 Best method choice 25
2.1 Selection of self-driving car simulators 25
2.2 Selection of Object Detection Algorithms 29
2.2.1 R-CNN series 30
2.2.2 YOLO series 35
2.2.3 Comparison of R-CNN series algorithms and YOLO series algorithms 47
2.2.4 Comparison of YOLOv5 algorithm and YOLOv4 algorithm 47
2.3 Selection of label tools 49
3 Practical research work and data graphs 52
3.1 Create an environment based on LG SVL simulator 52
3.1.1 Creation of the scene 52
3.2 Data set arrangement 55
3.2.1 Clear incomplete data 57
3.2.2 Data deduplication 58
3.2.3 Data set division 60
3.2.4 Data Labeling 60
3.3 Neural Network Training 61
3.3.1 Configuration file 62
3.3.2 Training 62
4 Results and analysis of data 64
4.1 Confusion Matrix 64
4.2 Labels 65
4.3 F1_curve 65
4.4 P curve 67
4.5 P-R curve 67
4.6 Detection effect 69
CONCLUSIONS 70
REFERENCES 71
INTRODUCTION
From the moment computers were invented, people began thinking about how to make them intelligent. Artificial intelligence has since become a vibrant field with many active research topics and practical applications that benefit nearly every aspect of life. The field continues to grow rapidly, and the hope is that subjective, hard-to-formalize tasks such as recognizing images can eventually be handled automatically with its help.
In the early days of artificial intelligence, computers readily handled problems that are difficult or even impossible for humans to solve but that can be described by formal mathematical rules. The problems AI now faces are the opposite: hard to describe in formal notation, yet effortless for humans. For example, people can easily understand what another person is saying or recognize objects in an image, whereas a computer given only formal rules cannot make such judgments.
Computers are good at abstract, formal tasks that humans find mentally demanding; already in the last century a computer beat human players at chess. Only in recent years, however, have computers reached human-level performance in speech and image recognition. Human thinking develops on top of a great deal of knowledge about the outside world, and much of that knowledge is subjective and hard to express in formal structures. Like humans, computers need to acquire knowledge on the same order of magnitude to behave intelligently. A key challenge for artificial intelligence researchers is therefore how to teach computers to learn from this subjective, informal knowledge.
Early research projects tried knowledge-base methods: knowledge was stored as structured symbols in a nearly exhaustive way, and logic rules were designed so that computers could reason over the statements made with those symbols. It is easy to see that such a project is time-consuming and labor-intensive, and its failure is to be expected, because the symbols and statements are chosen subjectively by humans, who have so far been unable to construct rules that accurately describe the world.
In modern times, some researchers have explored a different solution to these intuitive problems: devise a way for computers to learn from massive amounts of experience by building hierarchical structures in which each layer is defined by relatively simple relationships to the one below. This scheme lets the computer capture experience from an immense body of knowledge autonomously, and it avoids the need for humans to specify what should be learned, since humans cannot fully know which characteristics matter. The idea of hierarchy is inspired by how the human brain works: the computer learns complex features by composing simple models. We call this deep learning [1], because the computer ultimately constructs a "deep" graph of layers connected by simple rules.
Artificial intelligence systems need the ability to fit models from raw data, that is, to learn independently. We usually call this ability machine learning [2]. Using machine learning, computers can fit approximate models of real-world things and make judgments about similar things. Mainstream classical machine learning proceeds in two steps: a feature set is first selected by hand, and then the original data is submitted to the model, which fits a decision rule based on the selected features.
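As a minimal sketch of this two-step workflow (the features, data, and labels below are purely illustrative assumptions, not results from this work), one might hand-pick a few numeric features of an image region and fit an off-the-shelf classifier on them:

```python
# Hand-crafted-feature workflow: a human chooses the features,
# the model only learns a decision rule over those features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Step 1: manually chosen features, e.g. [height_px, width_px, mean_brightness]
# of a detected image region; labels are 1 = "car", 0 = "not car".
X = np.array([[120, 200, 0.4],
              [ 80, 150, 0.6],
              [ 30,  25, 0.9],
              [ 35,  20, 0.8]])
y = np.array([1, 1, 0, 0])

# Step 2: fit a decision model on the selected features.
model = LogisticRegression().fit(X, y)
print(model.predict([[110, 190, 0.5]]))   # judge a new, similar case
```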
Real-world things are incredibly complex, and it is difficult for people to see through to their essence: it is hard to know which features are essential, or even what the natural features are. The lesson from studying biological neural networks is that machines can autonomously discover the regularities hidden in knowledge, rather than having knowledge simply instilled into them, which they would forget like a naughty child after the lesson. Humans find it challenging to extract highly abstract features from raw data, while computers, by loosely imitating the human brain, can represent complex concepts with compositions of simpler models and thus solve the critical problem of feature extraction. Deep learning has gradually developed into a family of algorithms centered on the artificial neural network [3].
Artificial neural networks have raised unprecedented expectations for artificial intelligence research. The artificial neural network is not a modern invention; its feasibility was already verified in the 1950s. Why has it only gained broad acceptance in recent years? Understanding this is part of the motivation for studying deep network algorithms. First, ever more data is being stored, and the datasets dedicated to studying neural networks keep growing in size and quality, while people urgently need algorithms that can discover the essence of things in massive amounts of data. Second, large-scale models have emerged: after decades of development, the number of neuron connections in a model has reached the billions, gradually approaching the number of connections in the human brain, and this growth is expected to continue steadily for years to come. Finally, the decision-making ability of artificial neural network models keeps improving, and accuracy records on major datasets are constantly being broken. Deep learning is developing rapidly, but it is still young, and many unexplored research directions and practical applications await discovery.
Since the birth of deep learning, many companies and individuals have joined research in this field, and books and applications on deep learning have sprung up in recent years; reading these works is a quick way to grasp the hot topics and trends in the field. In 2015, an article titled "Deep Learning" published in Nature pushed deep learning to a new peak. Many universities and research institutes have since invested in deep learning research, fruitful results have been obtained, and new ideas continue to emerge.
Object detection [4] is one of the leading research directions in computer vision and has been studied by the academic community for a long time. Taking 2012 as the dividing line, mainstream object detection algorithms before 2012 were based on traditional methods that extract hand-designed features, chiefly the Viola-Jones algorithm [5], the HOG feature algorithm [6], and the DPM model [7]. After 2012, following the great success of AlexNet, object detection algorithms based on deep learning gradually became dominant. They can be divided into two types. The first are two-stage algorithms based on candidate regions [8], such as R-CNN [9], Fast R-CNN [10], and Faster R-CNN [11]; these algorithms first generate candidate boxes and then classify and regress them, so their accuracy is relatively high but their speed is slow. The second are one-stage, regression-based algorithms [13] such as YOLO [12] and RetinaNet, which directly predict the categories and positions of targets; their detection speed is fast, but their accuracy is relatively lower.
In 2001, Viola and Jones applied Haar-like wavelet features to face detection. Wavelet features had been used for detection tasks before, but they designed more targeted features and improved the AdaBoost algorithm so that the trained classifiers could be cascaded. The algorithm left such a strong mark on the history of face detection that it became known as the Viola-Jones detector [14]. Navneet Dalal and Bill Triggs proposed Histograms of Oriented Gradients (HOG) for pedestrian detection. HOG computes the horizontal and vertical gradients of the image pixels, derives a gradient orientation for each pixel, and then accumulates and normalizes the local gradient statistics to obtain a feature vector for each local image region. Gradient and edge orientations represent the shape of the detected object well even when the exact gradient and edge positions are unknown, which keeps the descriptor robust to the pose and appearance of people in the image. The Deformable Part Model (DPM) [15], proposed by Felzenszwalb in 2010, is a part-based object detection algorithm that is robust to object deformation. Its basic idea is consistent with the HOG feature algorithm: it builds on HOG features but keeps only the cell units of the image and normalizes the local cells directly instead of block-normalizing, and it uses an SVM classifier together with different part configurations to address the multi-view and deformation problems of object detection.
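As a rough illustration of this classical hand-crafted-feature pipeline (the window size, HOG parameters, and placeholder data below are illustrative assumptions, not the exact settings used by Dalal and Triggs), HOG descriptors can be computed with scikit-image and fed to a linear SVM:

```python
# Classical pedestrian-detection pipeline: hand-crafted HOG features + linear SVM.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(window):
    """Gradient-orientation histogram of one 128x64 grayscale window."""
    return hog(window,
               orientations=9,            # number of gradient-direction bins
               pixels_per_cell=(8, 8),    # cells whose gradients are pooled
               cells_per_block=(2, 2),    # blocks used for normalization
               block_norm="L2-Hys")

# pos_windows / neg_windows stand for 128x64 crops with and without pedestrians.
pos_windows = [np.random.rand(128, 64) for _ in range(10)]   # placeholder data
neg_windows = [np.random.rand(128, 64) for _ in range(10)]

X = np.array([hog_descriptor(w) for w in pos_windows + neg_windows])
y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))

clf = LinearSVC().fit(X, y)     # the detector then scores each sliding window
```

At detection time, the same descriptor is computed for every sliding window in the image, and the SVM score decides whether the window contains a pedestrian.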
In 2014, Ross Girshick and colleagues proposed R-CNN, which introduced convolutional neural networks into object detection and successfully carried their success in image classification over to detection. For a given input image, the algorithm extracts 2000 category-independent candidate regions, warps each region to the fixed input size required by the CNN proposed by Krizhevsky et al., extracts a 4096-dimensional feature vector for each region, and finally feeds the features to an SVM for classification to produce the final result. The method improved detection quality dramatically, reaching a mean average precision of 53.3% on VOC 2012. However, the shortcomings of R-CNN are also apparent. First, features are extracted separately for every candidate region with no shared computation, which wastes resources and makes detection too slow; second, the model must be trained in stages, which is cumbersome. To address these shortcomings, Girshick borrowed ideas from SPPnet and proposed Fast R-CNN, which outperforms both R-CNN and SPPnet in detection, no longer requires staged training, and no longer needs to cache extracted features during detection. Building on Fast R-CNN, Shaoqing Ren et al. proposed Faster R-CNN, which consists of two parts: a fully convolutional network that generates candidate regions, and a Fast R-CNN detector that performs detection and classification. Its most prominent feature is the Region Proposal Network (RPN), which replaces the selective search algorithm for generating detection boxes and reduces the computation needed to generate candidate regions. Given an image of any size as input, Faster R-CNN first scales it to a fixed size, sends it through a CNN to obtain a feature map, lets the RPN compute the detection boxes, and leaves the remaining part of the network to complete the classification.
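For readers who want to try the two-stage pipeline directly, torchvision ships a pretrained Faster R-CNN; the minimal inference sketch below is only illustrative (the image path is a placeholder, and depending on the torchvision version the pretrained flag may instead be called weights):

```python
# Running a pretrained two-stage detector (Faster R-CNN) from torchvision.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()                                   # inference mode

img = to_tensor(Image.open("example.jpg"))     # placeholder image path
with torch.no_grad():
    out = model([img])[0]                      # RPN proposals + Fast R-CNN head

for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
    if score > 0.5:                            # keep only confident detections
        print(label.item(), round(score.item(), 2), box.tolist())
```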
Although Faster R-CNN improved the detection speed, it is still not fast enough for many practical applications. The R-CNN series divides object detection into multiple stages that must be trained separately, so the detection performance of the model is hard to optimize as a whole. To address these shortcomings of the R-CNN series, Joseph Redmon et al. proposed the YOLO (You Only Look Once) model. The algorithm first resizes the input image and then sends it through a single convolutional neural network, which directly outputs the classification and bounding-box regression results as the final detection. YOLO turns object detection into a regression problem; the model has a simple structure, is easy to train, detects quickly, and generalizes well. To overcome the remaining shortcomings of YOLOv1, the author proposed YOLOv2, which added Batch Normalization (BN) to speed up convergence and reduce overfitting, borrowed the anchor mechanism from Faster R-CNN, and replaced VGG-16 with a new backbone network, Darknet-19. YOLOv3 introduced residual units and deepened the network further, proposing the Darknet-53 network (based on Darknet-19) for feature extraction; it also used Feature Pyramid Networks (FPN) to achieve multi-scale detection and strengthen the detection of small objects. Given that two-stage algorithms are accurate but slow while one-stage algorithms are fast but less precise, Tsung-Yi Lin et al. proposed the RetinaNet model [16] to balance accuracy and speed. They introduced the Focal Loss, a modification of the cross-entropy loss; by adjusting the loss function, a one-stage model can reach the same accuracy as Faster R-CNN.
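The Focal Loss down-weights well-classified examples so that training concentrates on hard ones. A minimal PyTorch sketch for the binary case is shown below; the defaults α = 0.25 and γ = 2 follow the RetinaNet paper, while the example inputs are illustrative assumptions:

```python
# Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits and targets have the same shape; targets are 0/1."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()      # modulated cross-entropy

# Confident correct predictions contribute almost nothing to the loss,
# so the gradient is dominated by hard, misclassified examples.
logits = torch.tensor([4.0, -3.0, 0.1])
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```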
Many large companies around the world are engaged in deep learning research, including Google, Microsoft, Apple, and Amazon. Most of them both run research projects, such as Google Brain, and build practical applications, such as Microsoft's chatbot XiaoIce and Apple's voice assistant Siri. China's deep learning research started later but has tremendous momentum to catch up internationally: Alibaba, Tencent, Baidu, Huawei, and other large technology companies have stepped up research, and many start-ups are applying the technology in industries such as biology, medical care, and advertising.
In this thesis, by comparing the two-stage R-CNN series with the one-stage YOLO series, we find that the YOLO series is better suited to real-time detection. We then compared the advantages and disadvantages of YOLOv1 through YOLOv5 in detail and narrowed the choice to YOLOv4 and YOLOv5. YOLOv5 is a one-stage object detection algorithm that adds several new improvements on top of YOLOv4 and greatly increases its speed and accuracy: Mosaic data augmentation, adaptive anchor-box calculation, and adaptive image scaling at the input; the Focus and CSP structures in the backbone; the SPP and FPN+PAN structures in the neck; and the GIoU loss at the output together with DIoU-NMS for filtering predicted boxes. After a detailed comparison we found that, although YOLOv5 is slightly weaker than YOLOv4 in raw performance, it is much stronger in flexibility and speed and has a decisive advantage for rapid model deployment.
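As an illustration of the GIoU loss mentioned above, the sketch below penalizes the empty area of the smallest box enclosing the prediction and the ground truth; the [x1, y1, x2, y2] box format is an assumption of this sketch, and YOLOv5's own implementation differs in detail (it also supports DIoU/CIoU variants):

```python
# GIoU loss for axis-aligned boxes given as [x1, y1, x2, y2] tensors.
import torch

def giou_loss(pred, target, eps=1e-7):
    # Intersection rectangle
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # Smallest box enclosing both prediction and ground truth
    cx1 = torch.min(pred[..., 0], target[..., 0])
    cy1 = torch.min(pred[..., 1], target[..., 1])
    cx2 = torch.max(pred[..., 2], target[..., 2])
    cy2 = torch.max(pred[..., 3], target[..., 3])
    enclose = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()      # 0 for a perfect match, larger otherwise

print(giou_loss(torch.tensor([[0., 0., 2., 2.]]),
                torch.tensor([[1., 1., 3., 3.]])))
```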
After training YOLOv5 on the dataset we collected in the LG SVL simulator, we obtained an initial real-time detection system with high precision and low latency. Its shortcomings are also pronounced, however: because most of the data comes from the LG SVL autonomous-driving simulator, the dataset lacks diversity, so while the recognition of cars is excellent, the classification results for the other categories are not yet satisfactory.
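For reference, once such weights have been trained they can be loaded and applied to a single simulator frame via torch.hub, as in the hedged sketch below; the weight file name and frame path are placeholders, and the exact API depends on the Ultralytics YOLOv5 version used:

```python
# Loading trained YOLOv5 weights and running detection on one simulator frame.
import torch

# "best.pt" stands for the weights produced by training on the LG SVL dataset.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.25                              # confidence threshold for detections

results = model("frame_000123.png")            # placeholder simulator frame
results.print()                                # per-class counts and timing
print(results.pandas().xyxy[0])                # boxes, scores, and class names
```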
In future work, I will apply the lessons learned here: enrich the dataset, continue deeper research on object detection, keep studying the relevant algorithms, and improve the existing algorithm to further raise the detection accuracy and real-time performance of the task. I intend to focus this research direction on object detection for autonomous driving and to contribute to the field of self-driving cars.





