SenseAuto’s end-to-end AI model push: Wang Xiaogang emphasizes no plan B

Written by 36Kr English

SenseTime co-founder Wang Xiaogang said the company must win the automotive AI battle, which now centers on end-to-end large model deployment.

Header photo source: SenseTime.

“I often tell my colleagues that a team’s life spans just six months—only by surviving beyond that can we sustain it,” said Wang Xiaogang, president of SenseAuto, the intelligent automotive division of SenseTime.

The automotive industry has undergone a rapid series of changes in recent years, with new technologies emerging one after another. In less than three years, intelligent driving systems have extended their coverage from highways to urban and rural roads nationwide. Players who are not vigilant risk being left behind and losing their chance to keep competing. Wang hopes this warning will motivate his team to keep pace with the industry.

Under Wang’s charge, SenseTime launched SenseAuto in 2021, entering the intelligent automotive market as a tier-one supplier. As co-founder and chief scientist of SenseTime, Wang heads the company’s research on artificial intelligence.

SenseAuto’s primary revenue source was once its intelligent cockpit business, through which it partnered with original equipment manufacturers (OEMs) like SAIC Motor and Chery on hundreds of mass-produced models. However, intelligent driving is the broader and more rapidly evolving field in the automotive industry.

End-to-end large models are among the major factors influencing the direction of intelligent driving today. According to Wang, end-to-end large models are crucial for the team to achieve breakthroughs in intelligent driving.

While automakers and tier-one suppliers entered 2023 competing to develop low-cost, versatile intelligent driving systems, the race changed course in May that year. That month, Elon Musk said that Tesla would work toward updating its autonomous driving system with end-to-end large models, gradually shifting the industry’s R&D focus in intelligent driving.

End-to-end large models aim to integrate all processes of intelligent driving into a unified model, allowing the direct production of final results based on raw data inputs, significantly enhancing the generality of intelligent driving systems.
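The architectural contrast can be sketched in a few lines of illustrative pseudocode-style Python. This is not SenseAuto’s implementation; every function here is a hypothetical stub, included only to show how a chain of separately engineered stages differs from a single learned function mapping raw inputs to a decision.

```python
# Hypothetical sketch contrasting a modular intelligent-driving pipeline
# with a unified end-to-end model. All functions are illustrative stubs,
# not SenseAuto's actual code.

def detect_objects(frames):
    """Perception stage (stub): finds objects in raw camera frames."""
    return [{"kind": "car", "x": 12.0} for _ in frames]

def predict_trajectories(objects):
    """Prediction stage (stub): forecasts each object's next position."""
    return [{**obj, "x_next": obj["x"] - 1.0} for obj in objects]

def plan_path(forecast):
    """Planning stage (stub): applies a hand-written braking rule."""
    return "brake" if any(o["x_next"] < 15.0 for o in forecast) else "cruise"

def modular_pipeline(frames):
    """Hand-engineered stages chained through fixed interfaces and rules."""
    return plan_path(predict_trajectories(detect_objects(frames)))

def end_to_end_model(frames):
    """One learned function: raw sensor input in, driving decision out.
    In practice this would be the output of a single jointly trained
    network rather than a constant."""
    return "brake"

print(modular_pipeline(["frame_0"]))  # -> brake
```

The practical point of the contrast is maintenance cost: each hand-written stage and rule in the modular version must be updated separately, whereas the end-to-end version improves by retraining one model on more data.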

This wave of deep integration between artificial intelligence and intelligent driving presented SenseAuto with an opportunity to leverage the advantages of large models.

During the Beijing Auto Show in April this year, SenseAuto demonstrated UniAD, its general-purpose end-to-end large model for integrated perception and decision-making in autonomous driving. Reportedly, using only vision sensors and navigation maps, UniAD-equipped vehicles can navigate both urban and rural roads effectively.

Wang told 36Kr that SenseTime’s exploration of end-to-end intelligent driving began with its collaboration with Honda. In 2017, Honda presented SenseTime with a challenge to achieve intelligent driving using only cameras and without high-precision maps. “We realized end-to-end intelligent driving at Honda’s test track back then. Since then, the team has continued researching end-to-end models.”

This collaboration marked the beginning of SenseAuto’s investment in large model R&D. In 2018, SenseTime established a supercomputing center in Shanghai, which now boasts over 45,000 GPUs with a total computing power of 12,000 PFLOPS, enabling continuous 30-day stable training of large models. Ample computing resources mean that SenseAuto’s model iterations are nearly unrestricted.
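As a back-of-envelope check on the figures cited above (assuming the 12,000 PFLOPS is an aggregate across the whole cluster), the totals imply roughly 267 TFLOPS per GPU, a figure in the range of modern datacenter training accelerators:

```python
# Sanity check on the cited cluster figures: total compute divided by
# GPU count gives per-GPU throughput. Assumes PFLOPS is an aggregate.
total_pflops = 12_000
gpu_count = 45_000

tflops_per_gpu = total_pflops * 1_000 / gpu_count  # 1 PFLOPS = 1,000 TFLOPS
print(round(tflops_per_gpu))  # -> 267
```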

Model training relies on road data. Wang told 36Kr that, during the development and testing phases of mass-produced vehicle models, the team defines a data operation standard, collecting a complete set of data for end-to-end large model training. Once the vehicles enter the market, SenseTime will have access to richer road data.

To obtain high-quality non-public data, SenseAuto has also developed a method of modeling the world using AI video generation techniques, capable of creating specified scenes as needed for model training.

To expand on its efforts, SenseAuto restructured its team, adding new members from automotive companies and tier-one suppliers to enhance its delivery capabilities.

Unlike most intelligent driving solution providers, SenseAuto is not averse to providing “white box” deliveries. Wang believes that only when automakers fully understand the technology and the limitations of existing solutions will they actively collaborate with the team to co-develop and accelerate product iteration.

SenseAuto aims to integrate end-to-end large models into cars by 2025. For Wang, this is a must-win battle with “no plan B.”

The insights above were derived from an interview conducted by 36Kr with Wang. The following transcript is a translation of that interview and has been edited and consolidated for brevity and clarity.

36Kr: What is driving the transition of autonomous driving algorithms from rule-based to AI-based?

Wang Xiaogang (WX): Firstly, rule-based autonomous driving may encounter thousands of road scenarios daily, each requiring different rules. Over time, the purposes of the initial rules may be forgotten, and maintaining them consumes immense resources. AI-driven large models can increase R&D efficiency tenfold.

Secondly, the multimodal data flow and real-time reasoning of models like GPT-4 can significantly enhance the human-computer interaction (HCI) experience. Previously, rule-based interactions were fixed, monotonous, and not intelligent enough. Now that models can interact naturally through the car’s interior and exterior cameras, multimodal interaction aligns well with automotive use scenarios.

36Kr: Is a segmented end-to-end model truly an end-to-end large model?

WX: No, it isn’t. A segmented model’s capabilities are weak and can’t comprehend complex scenarios, only solving simplified tasks. Such a model doesn’t need a large network for data feeding and lacks a human-like brain.

For example, bees excel at specific simple tasks based on their biological habits, but their brains are very simple and lack the general capabilities humans have. In new scenarios, bees can’t invent new tools to solve problems, akin to segmented end-to-end models versus integrated end-to-end models. The former’s neural network models are small and only solve specific tasks.

36Kr: End-to-end large models have high upper limits and unpredictable lower limits. How do you control the lower limit?

WX: Initially, rules must be used as a fallback. As end-to-end models develop, the number of rules decreases. It’s a process of adding and deleting code, but stronger large models will require fewer rules.

Today, SenseAuto’s lane keeping perception is excellent, so many rules have been removed. If future scenarios become complex, rules will be added again. However, robust large models will gradually need fewer rules.

In reality, ChatGPT also has many fallback rules when deriving various applications. The core of end-to-end large models is general capability, which enables completing more tasks.

36Kr: Some believe that large-scale production of mapless intelligent driving solutions by automakers benefits end-to-end intelligent driving more, whereas SenseTime directly leaps to end-to-end. What’s the difference?

WX: Most end-to-end large models use lightweight map solutions with simple annotations. Switching technical routes is costly, akin to rebuilding the R&D system from scratch.

All rule-based intelligent driving solutions require thousands of algorithm engineers constantly writing rules and patches to maintain the system. After mass production, these solutions need ongoing maintenance. Changing the technical route means restarting the R&D process.

Rule-based solutions have complex algorithms on the car end, while end-to-end large models have simple car-end network algorithms and complex backend tasks, including data looping, training, and cleaning, to maintain model training stability.

36Kr: Where does SenseAuto source the data needed to train models?

WX: Developing end-to-end large models is a long-term process that needs to be phased. SenseTime both collects data itself and collaborates with automakers.

For the mass-produced vehicle models SenseAuto collaborates on, a data operation standard is defined during development and testing. These rule-based intelligent driving systems provide comprehensive data for end-to-end large model training.

When the models hit the market, data flows back, and we deeply cooperate with automakers to clean and select richer road data.

The deeper the data collection process goes, the harder it is to obtain specific desired data, and costs increase. SenseAuto accommodates this by using a world model generated by AI video generation for data collection.

As for the cost of data collection using this method, SenseTime collaborates with different industries to share costs and develop technologies together.

36Kr: What is automakers’ attitude toward data sharing when SenseAuto promotes it?

WX: Automakers are currently willing to share data with us because SenseAuto serves a clear purpose. When automakers know what issues exist, they are willing to provide relevant data to solve the problems. However, they haven’t seen the more general capabilities of end-to-end large models yet. If they do, they will be more motivated to explore data with us.

36Kr: What is the talent profile needed for end-to-end large model development?

WX: The platform system of end-to-end large models is crucial, requiring the team to have strong and comprehensive engineering capabilities. For model training, the team should be innovative and find ways to iterate quickly. When delivering the final solution, an experienced team is needed as a fallback.

36Kr: From an industry perspective, what is the ideal team size for end-to-end large model development?

WX: Many teams developing end-to-end large models allocate the majority of their headcount to manage data collection, testing, and analysis. A team of dozens involved directly in large model work is already large.

36Kr: What do you think the future structure of the automotive industry will be?

WX: The bulk of collaborations will be among automakers, chip companies, and AI companies, with other segments like hardware and tier-one integrators possibly being absorbed.

36Kr: What is SenseAuto’s business model?

WX: SenseAuto has three main businesses: intelligent driving, intelligent cockpit, and AI cloud services. Essentially, SenseAuto provides capabilities to automakers.

I believe the end goal is to empower automakers with foundational capabilities and develop various differentiated applications by harnessing data, rather than delivering standardized products.

36Kr: Unlike other tier-one suppliers, SenseAuto doesn’t seem to require automakers to possess intelligent driving capabilities. Is that correct?

WX: Automakers need to understand the technology. SenseAuto can deliver solutions in “white box” format to automakers. Only when they understand it can they generate valuable data based on their needs, invest resources accordingly, enhance the large model, and advance the entire system.

If automakers rely on tier-one suppliers to solve problems, they will never achieve leapfrog technological development.

End-to-end large models bring general capabilities to intelligent driving, generating new applications with vast imagination and expansion space beyond single-task understanding.

36Kr: Does this mean SenseAuto’s current business model doesn’t emphasize delivery?

WX: Realizing grand ambitions is a process that requires steady steps: maintaining quality and building trust with clients. SenseAuto’s internal priority is to put clients and quality first and to respond to client needs promptly.

36Kr: How does SenseAuto enhance its delivery capability?

WX: Previously, we had many AI talents. Now, we have brought in numerous experienced professionals from tier-one suppliers and automakers.

In terms of organizational structure, we have R&D personnel at the back end and a comprehensive delivery team at the front end, ensuring sufficient resources for delivery. Our quality control system is also actively being built.

36Kr: How do you allocate your efforts at SenseTime?

WX: Most of my efforts are focused on SenseAuto, with a lot of interaction with the group’s R&D.

Today, cars are crucial for implementing large models because the core of large models is the human-computer interaction experience. Currently, the only human-computer interfaces are phones, cars, and robots.

Phones only offer text-based interaction due to cost considerations, limiting multimodal interactions. Robots’ interaction scenarios are related to those of cars, but robots haven’t reached large-scale production; they also provide limited feedback and can’t form a complete loop.

Cars offer the best scenarios for multimodal interaction, comprising both interior and exterior interactions, and have the potential of high production volumes. Consumers’ acceptance of multimodal large models will continue to grow. Inside the car, users can interact with multimodal large models. Outside, large models can extend the user’s vision, providing information about traffic, buildings, text, and more.

36Kr: Is the 2025 end-to-end large model deployment for SenseAuto a must-win battle?

WX: Yes, there is no plan B. I often tell my colleagues that a team’s life spans just six months—only by surviving beyond that can we sustain it. We have long-term goals for the next five to ten years, but our life is always just six months at a time.

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Tian Zhe for 36Kr.
