“Embrace end-to-end, or leave the smart driving industry in a few years.”
After several years in the smart driving industry, an engineer who spoke under the pseudonym Qin Feng had grown accustomed to its intense competition. But with the advent of end-to-end (E2E) large model technology, he felt that the first to feel the impact would be not human drivers but engineers like himself.
This anxiety is not unique to Qin Feng. Many smart driving engineers told 36Kr that, to stay abreast of new technology, they study the latest industry papers and attend courses on Bilibili during overtime, with some even revisiting graduate textbooks.
The emergence of E2E large models has set off a technological explosion in the smart driving industry this year. In January, Tesla broadened the deployment of the V12 test version of its Full Self-Driving (FSD) software package, making it available to standard users. This version employs an E2E neural network, and many users have reported that its performance is impressive, more human-like than previous iterations.
Tesla CEO Elon Musk has characterized E2E as the ability to output driving commands relying solely on image inputs. Although industry insiders told 36Kr that they do not believe Tesla’s E2E solution is as radical and magical as it sounds, it remains an enticing proposition. Chinese players increasingly believe that, driven by large models, massive computing power, and vast data, artificial intelligence-powered systems will be able to drive like humans.
Sensing this emerging trend, Chinese car companies and smart driving firms have already begun to take action. Leading players such as Huawei, Xpeng Motors, Nio, Li Auto, and BYD have invested significant manpower and resources to advance E2E solutions. Li Auto and Nio have even established dedicated departments to expedite implementation.
The competition for top talent is also intensifying. When Xiaomi Auto launched its first car, it recruited Wang Naiyan, the former CTO of TuSimple China, to catch up. An industry insider told 36Kr that Huawei even uses related patents to pinpoint talent for targeted recruitment.
However, the flip side of this coin is that E2E heavily relies on data-driven methods rather than sheer manpower. Tesla’s team of about 300 people is regarded as a model among leading players. In contrast, the smart driving teams of leading Chinese firms currently have nearly a thousand employees. BYD, which is rapidly advancing in smart driving, boasts a software team of 3,000 people, with Huawei not far behind. In good times, engineers can generally expect an annual salary package of RMB 1 million.
Many engineers believe that if the effectiveness of E2E solutions is further validated, layoffs will likely occur. “200–300 people will be enough,” a former core member of a nascent automaker’s smart driving team told 36Kr firmly. Even fresh graduates with a deep learning background may have an advantage over some engineers entering E2E projects.
Headhunters have also felt the industry’s talent surplus: car companies’ smart driving teams are no longer opening new positions, and existing headcount needs to be streamlined. One headhunter reported switching tracks to recruit talent for robotics companies.
A smart driving engineer, who wished to be known by the pseudonym Tian Wei, told 36Kr that, compared to perception and prediction modules, engineers working on planning and control modules would be more impacted under the current trajectory. This is because E2E solutions differ significantly from traditional alternatives. Traditional solutions are divided into multiple modules such as perception, localization, mapping, prediction, and planning and control, with functions mainly driven by engineers’ code. The perception and planning and control departments usually account for the majority of the team.
The defining characteristic of E2E solutions is the shift in emphasis from engineer-written code to data-driven methods. Ideally, the system takes images as input and directly outputs vehicle control commands, with all intermediate steps handled by neural networks.
Judging by the progress of leading Chinese players, the first step after adopting an E2E solution is to consolidate the multiple modules of the traditional solution into two large neural networks, with the conversion so far focused primarily on the perception and prediction components. The next step is to integrate perception, prediction, decision-making, and planning, which the industry calls “One Model.”
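As a rough illustration of the contrast described above, the toy sketch below places a hand-coded modular stack next to a single learned function. Everything in it is hypothetical: the `Control` type, the stage functions, the 5-meter braking rule, and the linear `EndToEndPolicy` are stand-ins chosen for brevity, not any company’s actual design. A real system would use deep neural networks trained on large volumes of video.

```python
from dataclasses import dataclass
from typing import List

# Toy stand-in for a camera frame: each value is the distance (in meters)
# to whatever is visible in one image column. Purely illustrative.
Frame = List[float]


@dataclass
class Control:
    steering: float  # radians; 0.0 means straight ahead
    throttle: float  # 0.0 (no throttle) to 1.0 (full throttle)


# --- Traditional modular stack: behavior comes from engineer-written code ---

def perceive(frame: Frame) -> float:
    """Perception: report the distance to the nearest obstacle."""
    return min(frame)


def predict(distance_m: float) -> float:
    """Prediction: assume (hypothetically) the obstacle closes by 1 m."""
    return distance_m - 1.0


def plan_and_control(predicted_m: float) -> Control:
    """Planning and control: a hand-written rule with an assumed 5 m margin."""
    throttle = 0.0 if predicted_m < 5.0 else 0.5
    return Control(steering=0.0, throttle=throttle)


def modular_drive(frame: Frame) -> Control:
    """Chain the hand-written stages, as a modular stack would."""
    return plan_and_control(predict(perceive(frame)))


# --- End-to-end: one function whose behavior comes from learned parameters ---

class EndToEndPolicy:
    """Maps a frame directly to a control command.

    A single linear layer stands in for the neural network; in practice the
    weights would be learned from driving data rather than set by hand.
    """

    def __init__(self, weights: List[float], bias: float) -> None:
        self.weights = weights
        self.bias = bias

    def __call__(self, frame: Frame) -> Control:
        score = sum(w * x for w, x in zip(self.weights, frame)) + self.bias
        throttle = max(0.0, min(1.0, score))  # clamp to the valid range
        return Control(steering=0.0, throttle=throttle)
```

In the modular version, changing behavior means editing rules such as the 5-meter margin. In the E2E version, it means retraining the weights on more data, which is why the work shifts from writing code to supplying data.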
This new technical route also brings a new talent profile to car companies’ smart driving teams.
An industry insider told 36Kr that E2E large model teams need fewer people, but the talent threshold has become higher. Large models require teams with strong deep learning backgrounds. “During development, very strong infrastructure talent is needed, with a deep understanding of perception and planning and control modules, knowledge of different chip computing platforms’ support capabilities, and various AI inference frameworks.”
However, only a small portion of people are responsible for model building and training. “Probably 90% of the team is providing data for E2E, as well as data closed-loop toolchain support.” The team working on the large model itself is very lean. This is why pioneering AI technology companies like OpenAI, with only 200–300 people, could launch a large language model like ChatGPT, changing the global AI landscape.
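One common reading of a “data closed loop” is a pipeline that mines logged drives for cases where the model disagreed with the human driver and feeds those clips back into training. The sketch below illustrates only that reading. The `Clip` fields, the `mine_hard_cases` helper, and the 0.1-radian threshold are hypothetical and not taken from any company’s toolchain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    clip_id: str
    human_steering: float  # steering the human driver applied (radians)
    model_steering: float  # steering the model would have output (radians)


def mine_hard_cases(clips: List[Clip], threshold: float = 0.1) -> List[Clip]:
    """Keep clips where the model deviates from the human by more than threshold.

    The threshold is an assumed value for illustration only.
    """
    return [
        c for c in clips
        if abs(c.human_steering - c.model_steering) > threshold
    ]


# Hypothetical logged clips; only clip "b" shows a large disagreement.
logged = [
    Clip("a", human_steering=0.00, model_steering=0.02),
    Clip("b", human_steering=0.30, model_steering=0.05),
    Clip("c", human_steering=-0.20, model_steering=-0.18),
]
retrain_queue = mine_hard_cases(logged)
```

Here only the clip with a large disagreement is queued for retraining. Production toolchains would add storage, labeling, and scheduling around a filter like this.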
For engineers, the impact of E2E technology varies. An industry insider told 36Kr that, of the two major modules of perception and planning and control, perception already relied on deep learning technology. Although the visual detection route has shifted from convolutional neural networks (CNNs) to Transformer-based bird’s-eye-view (BEV) models, the impact on perception engineers is not significant.
But for planning and control engineers, joining an E2E large model team is almost like switching tracks. Traditional planning and control engineers mainly have several specializations: path prediction, path optimization, rule post-processing, and vehicle control. “These are quite subdivided disciplines and are generally unrelated. Except for the path prediction module, engineers specializing in other areas generally lack a deep learning background.”
Tian Wei told 36Kr that, if planning and control engineers want to transition to developing E2E large models, one direction is model training itself, but this requires a very strong deep learning background. “It is possible that fresh graduates who study deep learning have a better understanding of the models than you.”
A second direction is data mining and processing to improve E2E large models. “But once the toolchain infrastructure is built and the model structure stabilizes, people may no longer be needed.” A third is model post-processing. The output of E2E large models could still be unreliable, and a small number of engineers will be needed to write rules that cover such cases.
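The post-processing role mentioned above can be pictured as a thin rule layer that checks the model’s raw output before it reaches the vehicle. The following is a minimal sketch under assumed limits. `ModelOutput`, `apply_safety_rules`, and every threshold are hypothetical, and `gap_m` is assumed to come from an independent distance measurement.

```python
from dataclasses import dataclass


@dataclass
class ModelOutput:
    steering: float  # radians
    throttle: float  # 0.0 to 1.0
    brake: float  # 0.0 to 1.0


MAX_STEERING = 0.5  # assumed steering limit, radians
MIN_GAP_M = 5.0  # assumed minimum gap to the obstacle ahead, meters


def apply_safety_rules(raw: ModelOutput, gap_m: float) -> ModelOutput:
    """Override unreliable model output with hand-written rules."""
    # Rule 1: clamp steering to the allowed range.
    steering = max(-MAX_STEERING, min(MAX_STEERING, raw.steering))
    # Rule 2: if the obstacle ahead is too close, cut throttle and brake fully.
    if gap_m < MIN_GAP_M:
        return ModelOutput(steering=steering, throttle=0.0, brake=1.0)
    # Otherwise pass the model's throttle and brake through unchanged.
    return ModelOutput(steering=steering, throttle=raw.throttle, brake=raw.brake)
```

For example, `apply_safety_rules(ModelOutput(0.9, 0.8, 0.0), gap_m=3.0)` clamps the steering to 0.5 and replaces the throttle command with full braking, while an output already within limits passes through untouched.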
Engineers’ anxiety also stems from this. “On one hand, the E2E large model itself does not need so many people. On the other hand, everyone wants to do E2E, but the company’s production business still needs people to operate.”
An unnamed smart driving developer expressed regret over missing the opportunity to join the E2E project group due to his company’s current production commitments. However, he is conflicted: even if he joined the E2E team, he would be in a non-core position. Staying in his current role allows him to accumulate significant experience, which might still be relevant to traditional car companies for a few more years.
The risk, however, is that once E2E solutions become widespread, his accumulated expertise could become obsolete. “Maybe I will have to leave the smart driving industry,” he lamented.
To prepare for the transition, Tian Wei has started taking graduate-level deep learning courses. He bought classic deep learning textbooks and a graphics card so that he could implement the simple image recognition algorithms they describe. “At least I need to digest the knowledge first to understand how the model itself operates,” he said. After two months of study and practice, he felt he could begin to understand some open-source E2E large model code.
Tian Wei is not alone in his anxiety; his employer is equally concerned. The company collaborates with an automaker on smart driving production solutions but also has an internal team advancing E2E solutions. According to Tian Wei, with thousands of hours of video data, an E2E demo can be trained. However, the company can only produce a demo proving feasibility, far from achieving mass production.
This new technological divergence will first manifest in resource allocation. Musk emphasized the importance of data for E2E, stating, “Training with one million video cases is barely enough. Two million is slightly better, three million will make you say ‘wow,’ and ten million will be unbelievable.” Additionally, Musk made large-scale purchases of Nvidia graphics cards for training, claiming that by the end of the year, Tesla’s AI training computing power would be equivalent to 90,000 Nvidia H100 GPUs.
For smart driving companies still struggling to be profitable, the threshold is quite high. Without cooperation with car companies, collecting sufficient training data independently is difficult. Moreover, cloud training chips are scarce in China, and many car companies are buying them at high prices. “Production projects and financing are still unclear, making it difficult to invest in E2E solutions long-term.”
Another smart driving engineer felt similarly helpless. After working on his company’s E2E project for half a year, he received a notice to suspend the project. The company needs to focus its efforts and resources on developing an urban mapless smart driving solution. E2E large model development consumes too many resources.
While the new technology has yet to fully arrive in China, its impact on the talent structure and ecosystem of the smart driving industry is already emerging. Leading players will still strive to jump on this bandwagon, heralding an era where giants will master data, chip, and talent resources.
KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Li Anqi for 36Kr.