Cockpit Agent Engineering Research: Breakthrough from Digital AI to Physical AI
Cockpit Agent Engineering Research Report, 2025 starts with the status quo of cockpit agents, summarizes the technical roadmaps of the R&D and engineering stages and the characteristics of agents from leading OEMs, and predicts the future trends and priorities of cockpit agent applications.
Action: The Last-Mile Mission
Since foundation models were first installed in vehicles in 2023, cockpit AI assistants have taken on different tasks at different stages. In 2025, cockpit AI assistants focus on action, meaning they "help users get things done" rather than merely "give suggestions", marking an important step in the transformation from "assistants" to "agents".
One typical scenario for cockpit AI assistants in 2025 is ordering food at restaurants:
In 2024, when a user wanted to order coffee, a cockpit AI assistant could only find nearby coffee shops on the map for the user to select manually and navigate to; ordering and payment had to be completed entirely by the user, with no help from the AI assistant.
By 2025, when a user orders coffee, the cockpit AI assistant can confirm the user's intention and automatically complete a series of operations such as ordering and payment without further user input, thus optimizing the user experience.
The entire process involves technologies related to long-term memory, tool calling, and multi-agent collaboration.
1. Case 1: Tool Calling
In early 2024, OpenAI's Function Calling was the mainstream technology cockpit agents used to call tools, enabling direct interaction between a single model and a single tool.
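For orientation, here is a minimal sketch of that single-model/single-tool pattern using the OpenAI Python SDK (v1.x); the find_coffee_shops tool, its schema, and the model name are illustrative assumptions, not a shipping cockpit API:

```python
# Sketch of single-model/single-tool Function Calling (OpenAI SDK v1.x).
# The find_coffee_shops tool is a hypothetical illustration.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "find_coffee_shops",
        "description": "Find coffee shops near a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Order me a latte nearby."}],
    tools=tools,
)

# Instead of free text, the model returns a structured tool call;
# the host application executes it and feeds the result back.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```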
The Model Context Protocol (MCP), introduced by Anthropic in November 2024, addresses "multi-component collaboration" on the basis of Function Calling, broadening its application scenarios and improving its efficiency.
In April 2025, Google proposed the A2A (Agent2Agent) protocol to further standardize the communication and collaboration between different agents.
For example, the 2025 agent application solution of Lixiang Tongxue includes an MCP/A2A technical framework (the other framework is CUA):
MCP/A2A: The IVI agent acts as the leader of the multi-agent system (MAS), assigning tasks to third-party agents, which then complete their respective workflows (see the sketch after this list).
CUA (Cockpit Using Agent): The operating system calls a multimodal foundation model to understand instructions/tasks, decomposes and plans them, generates the final action, and calls applets and apps to complete them. In the payment scenario, for example, after a series of understanding and planning steps, Lixiang Tongxue calls the API to connect to Alipay's automotive assistant and completes the payment through the relevant applet in Alipay's ecosystem.
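The leader/worker dispatch described above can be pictured with a short, hypothetical sketch; the Task type, skill names, and registry below are stand-ins, not Lixiang Tongxue's or the A2A protocol's actual interfaces:

```python
# Hypothetical leader/worker dispatch: the IVI agent decomposes a request
# and delegates subtasks to third-party agents. All names are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    skill: str      # capability required, e.g. "order_coffee", "pay"
    payload: dict

class LeaderAgent:
    """IVI agent acting as leader of the multi-agent system (MAS)."""
    def __init__(self) -> None:
        self.registry: Dict[str, Callable[[Task], dict]] = {}

    def register(self, skill: str, agent: Callable[[Task], dict]) -> None:
        # In a real A2A setup, capabilities would be discovered, not hardcoded.
        self.registry[skill] = agent

    def dispatch(self, tasks: List[Task]) -> List[dict]:
        # Route each subtask to the agent that declares the needed skill.
        return [self.registry[t.skill](t) for t in tasks]

leader = LeaderAgent()
leader.register("order_coffee", lambda t: {"status": "ordered", **t.payload})
leader.register("pay", lambda t: {"status": "paid", "channel": "alipay_applet"})

print(leader.dispatch([Task("order_coffee", {"item": "latte"}),
                       Task("pay", {"amount": 28})]))
```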
During training, the Lixiang Tongxue team uses MCP to manage tool services in the reward-module optimization of the agent reinforcement phase, for example using MCP Hub to provide a catalog of callable tool resources for training tasks and business requests.
In the next phase, Lixiang Tongxue plans to strengthen its multimodal capabilities and implement COA (Chain of Action), in which the same model continuously reasons about how to call external tools to solve a problem and then takes action, further improving the synergy between the tool calling, reasoning, and action modules.
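A COA-style loop might look like the sketch below, in which one (stubbed) model alternates between a thought and a tool call until it judges the task complete; model_step and the tool set are hypothetical stand-ins, not Lixiang Tongxue internals:

```python
# Illustrative Chain of Action (COA) loop: think -> act -> observe, repeated.
def model_step(goal: str, history: list) -> dict:
    # A real system would query the multimodal foundation model here;
    # this stub walks a fixed plan so the sketch stays runnable.
    plan = [("search_menu", {"shop": "coffee"}),
            ("place_order", {"item": "latte"}),
            ("pay", {"channel": "alipay"})]
    if len(history) < len(plan):
        return {"thought": f"step {len(history) + 1}", "action": plan[len(history)]}
    return {"thought": "goal reached", "action": None}

TOOLS = {  # hypothetical external tools
    "search_menu": lambda shop: {"items": ["latte", "mocha"]},
    "place_order": lambda item: {"order_id": 42},
    "pay": lambda channel: {"paid": True},
}

def chain_of_action(goal: str, max_steps: int = 8) -> list:
    history = []
    for _ in range(max_steps):
        step = model_step(goal, history)
        if step["action"] is None:           # the model judges the task complete
            break
        name, kwargs = step["action"]
        observation = TOOLS[name](**kwargs)  # act, then feed the result back
        history.append((step["thought"], name, observation))
    return history

for thought, tool, obs in chain_of_action("order a latte and pay"):
    print(thought, "->", tool, obs)
```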
2. Case 2: GUI Agent
A GUI agent (graphical user interface agent) is an LLM agent that processes user commands or requests in natural language, understands the current state of the GUI through screenshots or UI element trees, and performs actions that simulate human-computer interaction, allowing it to operate across various software interfaces.
A GUI agent typically includes modules such as the operating environment, prompt engineering, model inference, action, and memory.
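These modules can be pictured as a minimal perceive-decide-act cycle; in the hypothetical sketch below, the perception and decision steps are stubs standing in for real screenshot/UI-tree parsing and multimodal model inference:

```python
# Minimal, hypothetical perceive-decide-act cycle of a GUI agent.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class UIElement:
    elem_id: str
    label: str
    bounds: Tuple[int, int, int, int]   # (x, y, width, height)

@dataclass
class GUIAgent:
    memory: list = field(default_factory=list)   # history of performed actions

    def perceive(self, screenshot) -> List[UIElement]:
        # Real systems parse a screenshot or the UI element tree here.
        return [UIElement("btn_1", "Medium Latte", (40, 200, 300, 80)),
                UIElement("btn_2", "Pay", (40, 600, 300, 80))]

    def decide(self, command: str, elements: List[UIElement]) -> UIElement:
        # Stand-in for model inference over a prompt built from the command
        # and the perceived elements.
        return next(e for e in elements if e.label.lower() in command.lower())

    def act(self, element: UIElement) -> Tuple[int, int]:
        x, y, w, h = element.bounds
        tap = (x + w // 2, y + h // 2)    # simulate a human tap at the center
        self.memory.append((element.elem_id, tap))
        return tap

agent = GUIAgent()
elements = agent.perceive(screenshot=None)
target = agent.decide("tap pay", elements)
print("tap at", agent.act(target))
```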
GUI agent technology is still far from fully mature, but some OEMs, including Li Auto, Geely, and Xiaomi, have already started to deploy it.
In the ordering scenario above, Lixiang Tongxue leverages GUI agent technology when selecting a meal package, so that it can operate on-screen components automatically without user intervention. The Lixiang Tongxue team has pointed out that the operation accuracy of the GUI agent also affects the final action of the CUA framework (because the payment process requires reading screenshots, which involves the GUI agent). If the accuracy is too low, it is difficult to guarantee a stable experience for complex tasks such as registering for parking and paying parking fees.
Xiaomi, for example, has launched a GUI agent framework, "BTL-UI", which applies the Group Relative Policy Optimization (GRPO) algorithm within a Markov decision process (MDP). At each time step, the agent receives the current screen state, the user command, and the historical interaction record, then outputs a structured BTL response, converting the multimodal input into a comprehensive output that includes visual attention zones, reasoning processes, and command execution.
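The group-relative part of GRPO is straightforward to illustrate: rewards for a group of sampled responses to the same screen/command pair are normalized against the group's own mean and standard deviation, so no separate value network is needed. A minimal sketch with illustrative numbers:

```python
# Group-relative advantage as used in GRPO-style training (illustrative).
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    # Normalize each sampled response's reward against the group statistics.
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Four sampled BTL responses to one (screen state, command) pair.
print(grpo_advantages([0.9, 0.4, 0.7, 0.1]))
```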
Its implementation methods and core technologies include:
Bionic interaction framework: Based on the BTL-UI model, it simulates human visual attention allocation (blinking), logical reasoning (thinking), and precise execution (action), supporting complex multi-step tasks such as cross-application calls and multimodal interactions.
Automated data generation: It automatically analyzes screenshots, identifies the interface elements most relevant to user commands, and generates high-quality attention annotations for these zones.
BTL reward mechanism: It evaluates each intermediate cognitive stage in detail, checking whether the AI correctly identifies the relevant interface elements, performs sound logical reasoning, and generates accurate operation instructions.
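A staged reward in this spirit might be composed as in the sketch below; the per-stage checkers and weights are illustrative assumptions, not Xiaomi's published design:

```python
# Hypothetical staged BTL reward: score each cognitive stage separately
# (blink = attention, think = reasoning, link/action = execution), then combine.
def btl_reward(response: dict, truth: dict, weights=(0.3, 0.3, 0.4)) -> float:
    blink = float(response["attention_zone"] == truth["target_zone"])
    think = float(truth["key_step"] in response["reasoning"])
    link = float(response["action"] == truth["action"])
    w_b, w_t, w_l = weights
    return w_b * blink + w_t * think + w_l * link

response = {"attention_zone": "btn_pay",
            "reasoning": "the user wants to pay, so tap the Pay button",
            "action": ("tap", "btn_pay")}
truth = {"target_zone": "btn_pay",
         "key_step": "tap the Pay button",
         "action": ("tap", "btn_pay")}
print(btl_reward(response, truth))   # 1.0 only when all three stages are correct
```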
OEMs are currently transitioning from L2 reasoners to L3 agents, with L3 further divided into four stages.
According to OpenAI's definition of AGI levels, Chinese OEMs are currently transitioning from L2 reasoners to L3 agents. Each stage poses different problems to solve and has corresponding characteristics.
At present, most OEMs' cockpit AI assistants have delivered "professional services" to a certain extent. The next goal is to achieve "emotional resonance" and overcome the hurdle of "proactive prediction".
For "emotional resonance", NIO offers "Nomi" as a leading player.
In 2025, most AI assistants' emotional chats are implemented primarily through tone changes simulated by TTS technology, terminology from the knowledge base (such as colloquial interjections), and preset emotional scenario workflows. Compared to other cockpit agents, Nomi has two unique advantages:
1. Physical shell: Nomi can materialize more than 200 dynamic expressions through its physical shell "Nomi Mate" (upgraded to version 3.0 as of November 2025), delivering emotional value in the real world. For example, when Nomi interacts with people by voice, it simulates the head movements people make when talking to each other; on hearing a sound, it turns its head toward the source along an arc-shaped trajectory, as a person would.
2. Emotional settings:
In terms of architecture, a dedicated "emotion engine" module is set up. Through three sub-modules, namely "contextual intelligence", "personalized intelligence", and "emotional expression", it uses voice, vision, and multimodal perception to perform contextual arbitration, derive an understanding of the current situation, and produce natural, human-like reactions in emotional scenarios (a simplified sketch of this arbitration follows below).
In terms of settings, Nomi can be given a personality. Based on these settings, it performs search associations through a GPT-like streaming prediction model, exhibiting unique situational responses and providing a personalized experience for each user (such as simulating multiple MBTI personalities, in contrast to Lixiang Tongxue, which is set as ENFJ).
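A simplified sketch of the arbitration idea, with hypothetical sub-module scores and personas (the real engine fuses voice, vision, and multimodal signals):

```python
# Hypothetical contextual arbitration across the emotion engine's sub-modules:
# each proposes a (reaction, confidence) pair and the arbiter picks one.
def contextual_intelligence(signals: dict) -> tuple:
    # Reads the current situation, e.g. a stressed tone of voice.
    if signals.get("voice_tone") == "stressed":
        return ("comfort", 0.8)
    return ("neutral", 0.3)

def personalized_intelligence(profile: dict) -> tuple:
    # A preset persona (MBTI-style) biases the response register.
    if profile.get("persona") == "ENFP":
        return ("playful", 0.6)
    return ("calm", 0.4)

def arbitrate(signals: dict, profile: dict) -> str:
    candidates = [contextual_intelligence(signals),
                  personalized_intelligence(profile)]
    reaction, _ = max(candidates, key=lambda c: c[1])   # highest confidence wins
    return reaction   # handed to the emotional-expression module (voice/face)

print(arbitrate({"voice_tone": "stressed"}, {"persona": "ENFP"}))   # -> comfort
```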
After achieving "proactive prediction," cockpit agents make a breakthrough from digital AI to physical AI.
Starting from L3.5+, generalization becomes one of the factors limiting agents' ability to flexibly handle multi-scenario tasks. To improve generalization across scenarios, agents should not only learn policies (what actions to take in a given state) but also learn dynamic environment models (how the world will change after an action is performed), so that they can make predictions while interacting directly with the environment.
To avoid limitations caused by the shortage of high-quality data, one solution is to learn in a real physical environment to achieve a breakthrough from digital AI to physical AI.
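The policy-plus-world-model idea can be shown with a deliberately toy one-dimensional example: alongside whatever policy it follows, the agent fits a dynamics model from its own interaction data, and the prediction error is the learning signal. All numbers below are illustrative:

```python
# Toy world-model learning: fit s' ≈ coef_s*s + coef_a*a from interaction.
import random

class WorldModel:
    def __init__(self):
        self.coef_s, self.coef_a = 0.0, 0.0   # unknown dynamics coefficients

    def predict(self, s: float, a: float) -> float:
        return self.coef_s * s + self.coef_a * a

    def update(self, s: float, a: float, s_next: float, lr: float = 0.05):
        err = s_next - self.predict(s, a)     # prediction error drives learning
        self.coef_s += lr * err * s
        self.coef_a += lr * err * a

def true_env(s: float, a: float) -> float:    # real dynamics, unknown to the agent
    return 0.9 * s + 0.5 * a

model, s = WorldModel(), 1.0
for _ in range(2000):
    a = random.uniform(-1, 1)                 # exploratory actions
    s_next = true_env(s, a)
    model.update(s, a, s_next)                # learn how the world changes
    s = s_next

print(round(model.coef_s, 2), round(model.coef_a, 2))   # approaches 0.9, 0.5
```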
For example, the Lixiang Tongxue team has found that the effect of data on improving the model's capabilities diminishes once massive Internet data has been used to train the base model; that is, the marginal benefit of the scaling law in model pre-training declines.
Therefore, the Lixiang Tongxue team has changed the training method for the next stage: it will focus on the interaction between the model and the physical world. Through reinforcement learning, the model will judge the correctness of its own thinking process and accumulate experience and data through interaction with the environment.
Fei-Fei Li's team at World Labs has proposed "augmented interactive agents", which feature multimodal capabilities with "cross-reality-agnostic" integration and incorporate an emergence mechanism.
In training these agents, Fei-Fei Li's team introduces an "in-context prompt" or "implicit reward function" to capture the key features of expert behavior. The agents are then trained for task execution on physical-world behavior data collected as expert demonstrations in the form of "state-action pairs".
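Learning from such state-action pairs can be illustrated with behavior cloning in its simplest form, shown on a toy discrete task; the demonstrations and the policy below are purely illustrative:

```python
# Toy behavior cloning from expert demonstrations recorded as (state, action).
from collections import Counter, defaultdict

# Expert demonstrations gathered in the physical world (illustrative).
demos = [("door_closed", "open_door"), ("door_open", "walk_through"),
         ("door_closed", "open_door"), ("door_open", "walk_through"),
         ("light_off", "switch_on")]

# Simplest possible cloned policy: the most frequent expert action per state.
counts = defaultdict(Counter)
for state, action in demos:
    counts[state][action] += 1
policy = {s: c.most_common(1)[0][0] for s, c in counts.items()}

print(policy["door_closed"])   # -> open_door
```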
In 2025, most OEMs chose a multi-agent approach to build their cockpit AI systems. Multi-agent collaboration is also one way to improve agents' generalization: through "domain specialization + scenario linkage + group learning", the generalization limitations of existing agents can be overcome along multiple dimensions.
For example, GAC's "Beibi" agent can recognize intent in complex scenarios through multi-agent collaboration built on foundation-model intent recognition, tackling problems of vertical agents such as the lack of a unified interaction entry and inefficient collaboration. It eliminates the need for users to operate multiple agents separately (such as adjusting navigation and air conditioning individually), thus improving collaboration efficiency. Its pipeline, sketched after the steps below, works as follows:
Build the core agent: Fine-tune a pre-trained language model on a preset dataset of automotive scenarios (such as vehicle control and navigation instruction records) to obtain an intent recognition model; then build an "intent understanding agent" on top of it, adding a caching service to improve response speed.
Parse user intent: Receive user commands (voice or touch), obtain the intent recognition result from the intent understanding agent (1-3 intents with corresponding confidence scores, e.g., "find a gas station" at 0.85 and "adjust temperature" at 0.9), and cache both the commands and the results.
Call collaborative agents: Make collaborative decisions based on the current scenario (such as driving status and weather), call the target agents related to the intents (such as the navigation and vehicle control agents) to work together, and receive each agent's action results.
Arbitrate, feed back, and execute: Arbitrate based on historical confidence scores (the agents' past success rates) and the current action results, falling back to the intent recognition model when no historical scores exist; finally, feed the result back to the actuation system (such as the IVI or voice broadcast) to complete the operation.
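A simplified, hypothetical sketch of this four-step pipeline; the agents, thresholds, and historical scores are illustrative, not GAC's implementation:

```python
# Hypothetical intent parse -> dispatch -> arbitration pipeline.
HISTORY = {"navigation": 0.92, "vehicle_control": 0.88}   # past success rates

AGENTS = {
    "navigation": lambda intent: {"agent": "navigation", "result": "route set"},
    "vehicle_control": lambda intent: {"agent": "vehicle_control",
                                       "result": "AC set to 22"},
}

def parse_intents(command: str) -> list:
    # Stand-in for the fine-tuned intent recognition model (returns 1-3 intents).
    intents = []
    if "gas station" in command:
        intents.append(("navigation", 0.85))
    if "temperature" in command:
        intents.append(("vehicle_control", 0.90))
    return intents

def handle(command: str, threshold: float = 0.5) -> list:
    results = []
    for domain, conf in parse_intents(command):
        if conf < threshold:
            continue                          # drop low-confidence intents
        outcome = AGENTS[domain]((domain, conf))
        # Arbitration: weight current confidence by the agent's track record.
        outcome["score"] = conf * HISTORY.get(domain, conf)
        results.append(outcome)
    return sorted(results, key=lambda r: r["score"], reverse=True)

for result in handle("find a gas station and set the temperature to 22"):
    print(result)   # fed back to the actuation system (IVI / voice broadcast)
```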