VLA Large Model Applications in Automotive and Robotics Research Report, 2025
ResearchInChina releases "VLA Large Model Applications in Automotive and Robotics Research Report, 2025":
The report summarizes and analyzes the technical origin, development stages, application cases and core characteristics of VLA large models.
It reviews 8 typical VLA implementation solutions, as well as typical VLA large models in the fields of intelligent driving and robotics, and summarizes 4 major trends in VLA development.
It analyzes the VLA application solutions in the field of intelligent driving of companies such as Li Auto, XPeng Motors, Chery Automobile, Geely Automobile, Xiaomi Auto, DeepRoute.ai, Baidu, Horizon Robotics, SenseTime, NVIDIA, and iMotion.
It reviews more than 40 large model frameworks or solutions, including robot general base models, multimodal large models, data generalization models, VLM models, VLN models, VLA models and robot world models.
It analyzes the large models and VLA large model application solutions of companies such as AgiBot, Galbot, Robot Era, Estun, Unitree, UBTECH, Tesla Optimus, Figure AI, Apptronik, Agility Robotics, XPeng IRON, Xiaomi CyberOne, GAC GoMate, Chery Mornine, Leju Robotics, LimX Dynamics, AI2 Robotics, and X Square Robot.
A Vision-Language-Action (VLA) model is an end-to-end artificial intelligence model that integrates three modalities: vision, language, and action. Through a unified multimodal learning framework, it combines perception, reasoning, and control, and directly generates executable physical-world actions (such as robot joint movements or vehicle steering commands) from visual input (such as images or video) and language instructions (such as task descriptions).
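The input/output contract described above can be illustrated with a toy sketch. It is not any vendor's model: `Observation`, `Action`, and `ToyVLAPolicy` are hypothetical names, the "perception" step is a simple brightness centroid, and the "reasoning" step is a keyword lookup standing in for a language model. The point is only the shape of the interface: image plus instruction in, executable action out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Observation:
    """Visual input: a tiny grayscale image as rows of pixel intensities (0..1)."""
    image: Sequence[Sequence[float]]


@dataclass
class Action:
    """Executable output: steering in [-1, 1] (left to right) and a speed factor."""
    steering: float
    speed: float


class ToyVLAPolicy:
    """Hypothetical stand-in for a VLA model: perception, reasoning, control in one call."""

    def perceive(self, obs: Observation) -> float:
        # Perception (toy): horizontal offset of the brightness centroid, in [-1, 1].
        width = len(obs.image[0])
        total = sum(sum(row) for row in obs.image)
        if width < 2 or total == 0:
            return 0.0
        centroid = sum(x * v for row in obs.image for x, v in enumerate(row)) / total
        return 2.0 * centroid / (width - 1) - 1.0

    def reason(self, instruction: str) -> float:
        # Reasoning (toy): keyword lookup in place of a large language model.
        text = instruction.lower()
        if "stop" in text:
            return 0.0
        return 0.5 if "slow" in text else 1.0

    def act(self, obs: Observation, instruction: str) -> Action:
        # Control: steer toward the bright region at the speed the instruction implies.
        return Action(steering=self.perceive(obs), speed=self.reason(instruction))
```

For example, `ToyVLAPolicy().act(Observation([[0.0, 0.0, 1.0]]), "drive slowly")` steers fully toward the bright right-hand column at half speed. A real VLA model replaces both toy steps with a single jointly trained network.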
In July 2023, Google DeepMind launched the RT-2 model, which adopts the VLA architecture. By combining large language models with multimodal data training, it gives robots the ability to perform complex tasks. Its task accuracy nearly doubled compared with the first-generation model (from 32% to 62%), and it achieved breakthrough zero-shot learning in scenarios such as garbage classification.
The concept of VLA was quickly noticed by automobile companies and rapidly applied to the field of automotive intelligent driving. If "end-to-end" was the hottest term in the intelligent driving field in 2024, then "VLA" will be the one in 2025. Companies such as XPeng Motors, Li Auto, and DeepRoute.ai have released their respective VLA solutions.
When XPeng Motors released the G7 model in July, it took the lead in announcing the mass production of VLA in vehicles. Li Auto plans to equip the i8 model with VLA, which is expected to be revealed at the press conference on July 29. Enterprises such as Geely Automobile, DeepRoute.ai and iMotion are also developing VLA.
Li Auto and XPeng Motors have given different answers on whether VLA models should be distilled first or undergo reinforcement learning first when applied in vehicles
At the pre-sale conference for XPeng Motors' G7, He Xiaopeng used the brain and the cerebellum as metaphors to explain the roles of traditional end-to-end solutions and VLA. He said that the traditional end-to-end solution plays the role of the cerebellum, "making the car able to drive", while VLA introduces a large language model and plays the role of the brain, "making the car drive well".
XPeng Motors and Li Auto have taken slightly different routes in VLA application: Li Auto first distills the cloud-based base large model, and then performs reinforcement learning on the distilled end-side model; XPeng Motors first performs reinforcement learning on the cloud-based base large model, and then distills it to the vehicle end.
In May 2025, Li Xiang mentioned in AI Talk that Li Auto's cloud-based base model has 32 billion parameters and is distilled into a 3.2-billion-parameter model for the vehicle end. The distilled model then undergoes post-training and reinforcement learning on driving scenario data, and in the fourth stage the final driver Agent will be deployed on both the vehicle end and the cloud.
XPeng Motors has also divided its "factory" for training and deploying VLA models into four workshops: the first workshop is responsible for pre-training and post-training of the base model; the second workshop is responsible for model distillation; the third workshop continues pre-training the distilled model; the fourth workshop deploys XVLA to the vehicle end. Dr. Liu Xianming, head of XPeng's world base model, said that XPeng Motors has trained "XPeng World Base Models" in the cloud at multiple parameter scales, such as 1 billion, 3 billion, 7 billion, and 72 billion.
Which route is better suited to intelligent driving remains to be seen, and will depend on how each manufacturer's VLA solution performs once deployed in vehicles.
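The difference between the two routes is purely one of ordering, which a minimal sketch can make concrete. Everything here is illustrative: `Model`, `distill`, and `reinforce` are hypothetical placeholders that only record which step ran and at what size, not real training code. The 32-billion and 3.2-billion figures are the Li Auto numbers quoted above, reused for both orderings purely so the two can be compared side by side.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Model:
    """Placeholder for a model: its size and the training steps applied so far."""
    params_billion: float
    history: Tuple[str, ...] = ()


def distill(model: Model, target_params_billion: float) -> Model:
    # Placeholder: a real pipeline would train a smaller student on the teacher's outputs.
    step = f"distill@{model.params_billion}B"
    return Model(target_params_billion, model.history + (step,))


def reinforce(model: Model) -> Model:
    # Placeholder: a real pipeline would run reinforcement learning on driving data.
    step = f"rl@{model.params_billion}B"
    return Model(model.params_billion, model.history + (step,))


def distill_then_rl(base: Model, target_params_billion: float) -> Model:
    """Ordering the text attributes to Li Auto: RL runs on the small on-vehicle model."""
    return reinforce(distill(base, target_params_billion))


def rl_then_distill(base: Model, target_params_billion: float) -> Model:
    """Ordering the text attributes to XPeng Motors: RL runs on the large cloud model."""
    return distill(reinforce(base), target_params_billion)


cloud_base = Model(params_billion=32.0)
route_a = distill_then_rl(cloud_base, 3.2)
route_b = rl_then_distill(cloud_base, 3.2)
```

Both routes end with a model of the same size on the vehicle; what differs is where the reinforcement learning cost is paid. In `route_a` it is applied to the 3.2-billion-parameter model, while in `route_b` it is applied to the 32-billion-parameter model before compression.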
Recently, research teams from McGill University, Tsinghua University, Xiaomi Corporation, and the University of Wisconsin-Madison jointly released a comprehensive review article on VLA models in the field of autonomous driving, "A Survey on Vision-Language-Action Models for Autonomous Driving". The article divides the development of VLA into four stages: Pre-VLA (VLM as explainer), Modular VLA, End-to-end VLA and Augmented VLA, clearly showing the characteristics of VLA in different stages and the gradual development process of VLA.
Over 100 robot VLA models are exploring different paths
VLA large models in automobiles have tens of billions of parameters and run on nearly 1,000 TOPS of computing power. By comparison, AI computing chips in the robotics field are still optional, and the number of parameters in training data sets is mostly between 1 million and 3 million. There is also debate over mixing real data with simulated synthetic data, and over technical routes. One reason is that there are hundreds of millions of cars on the road, while the number of actually deployed robots is very small. Another important reason is that robot VLA models focus on exploring the micro world: compared with the macro world model of automobiles, robot application scenarios involve richer multimodal perception, more complex actions, and finer-grained sensor data.
There are more than 100 VLA models and related data sets in the robotics field, and new papers keep emerging as teams explore different paths.
Exploration 1: VTLA framework integrating tactile perception
In May 2025, research teams from the Institute of Automation of the Chinese Academy of Sciences, Samsung Beijing Research Institute, Beijing Academy of Artificial Intelligence (BAAI), and the University of Wisconsin-Madison jointly released a paper on VTLA for insertion manipulation tasks. The research shows that integrating visual and tactile perception is crucial when robots perform contact-intensive tasks with high precision requirements. By integrating visual, tactile and language inputs, combined with a temporal enhancement module and a preference learning strategy, VTLA outperformed traditional imitation learning methods and single-modal models in contact-intensive insertion tasks.
Exploration 2: VLA model supporting multi-robot collaborative operation
In February 2025, Figure AI released Helix, a general-purpose Embodied AI model. Helix can run collaboratively on humanoid robots, enabling two robots to cooperate on a shared, long-horizon manipulation task. In the video shown at the press conference, Figure AI's robots worked together smoothly to put away fruit: the robot on the left pulled the fruit bowl over, the robot on the right put the fruit in, and then the robot on the left returned the fruit bowl to its original position.
Figure AI emphasized that this only scratches "the surface of possibilities", and that the company is eager to see what happens when Helix is scaled up 1,000 times. According to Figure AI, Helix can run entirely on embedded low-power GPUs and can be commercially deployed immediately.
Exploration 3: Offline end-side VLA model in the robotics field
In June 2025, Google released Gemini Robotics On-Device, a VLA multimodal large model that runs locally and offline on embodied robots. The model handles visual input, natural language instructions, and action output together, and it keeps operating stably even without a network connection.
It is particularly worth noting that the model has strong adaptability and versatility. Google pointed out that Gemini Robotics On-Device is the first robot VLA model that opens the fine-tuning function to developers, enabling developers to conduct personalized training on the model according to their specific needs and application scenarios.
VLA robots have been applied in a large number of automobile factories
When the macro world model of automobiles is integrated with the micro world model of robots, the real era of Embodied AI will come.
As Embodied AI enters the VLA stage, automobile enterprises have a natural first-mover advantage. The Tesla Optimus, XPeng IRON, and Xiaomi CyberOne robots draw heavily on their makers' experience in intelligent driving, sensor technology, machine vision and other fields, and build on the technology accumulated in intelligent driving. The XPeng IRON robot is equipped with XPeng Motors' AI Hawkeye vision system, end-to-end large model, Tianji AIOS and Turing AI chip.
At the same time, automobile factories are currently the main application scenario for robots. Tesla Optimus robots are currently used mainly in Tesla's battery workshops. Apptronik cooperates with Mercedes-Benz, and its Apollo robots work in Mercedes-Benz factories on car manufacturing tasks such as handling, assembly and other physical work. At the model level, Apptronik has established a strategic cooperation with Google DeepMind, and Apollo has integrated Google's Gemini Robotics VLA large model.
On July 18, UBTECH released the hot-swappable autonomous battery replacement system for the humanoid robot Walker S2, which enables Walker S2 to achieve 3-minute autonomous battery replacement without manual intervention.
According to public reports, many car companies, including Tesla, BMW, Mercedes-Benz, BYD, Geely Zeekr, Dongfeng Liuzhou Motor, Audi FAW, FAW Hongqi, SAIC-GM, NIO, XPeng, Xiaomi, and BAIC Off-Road Vehicle, have deployed humanoid robots in their automobile factories. Humanoid robots from companies such as Figure AI, Apptronik, UBTECH, AI2 Robotics, and Leju are widely used in automobile and parts production and assembly, logistics and transportation, equipment inspection, and factory operation and maintenance. In the near future, AI robots will be the main "labor force" in "unmanned factories".