Autonomous Driving Simulation and World Model Research Report, 2026
Autonomous driving simulation research: a test system driven by "simulation test + world model" has become R&D infrastructure.
The "Autonomous Driving Simulation and World Model Research Report, 2026" focuses on the core technologies, industry trends, and mainstream solutions in simulation and world models. It covers the complete simulation testing system (X-in-the-loop testing from MIL to VIL, scenario library construction, etc.) as well as the evolution of world model solutions from OEMs and Tier 1 suppliers. It analyzes 14 Chinese and 13 foreign mainstream simulation platform and world model solution providers, maps out the synergy between simulation testing and world models, and, through case research, demonstrates the core value of world models in data cost reduction, scenario generalization, and decision reasoning.
The national standard GB/T 47025-2026 Intelligent and Connected Vehicle - Simulation Test Methods and Requirements for Automated Driving Function was released and officially implemented on January 28, 2026. It applies to Category M and N vehicles with autonomous driving functions or autonomous driving systems, and specifies simulation test methods, test requirements, and general criteria for autonomous driving functions. The standard defines 48 dedicated test items in 7 categories. Together with the previously released GB/T 41798-2022 (Intelligent and Connected Vehicles - Track Testing Methods and Requirements for Automated Driving Function) and GB/T 44719-2024 (Intelligent and Connected Vehicle - Methods and Requirements of Road Test for Automated Driving Functions), GB/T 47025-2026 forms a complete "simulation-field-road" three-in-one verification system.
To speed up mass production of L3/L4 autonomous vehicles, the verification of mature autonomous driving algorithms usually follows a "golden ratio" of 99.9% simulation tests, 0.09% closed-field tests, and 0.01% public road tests. The recommended national standard GB/T 47025-2026 requires that the error between the sensor model and the real vehicle be ≤5%, that the consistency between the dynamics model and the real vehicle be ≥95%, and that traffic participant behavior be modeled with high fidelity, among other requirements. This signals that the autonomous driving industry has entered a new stage of development in which compliance for market access and safety come first. Simulation testing is no longer an auxiliary R&D tool; it has become a required link in product access, certification, and safety evidence.
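As a rough illustration of how these figures fit together, the following sketch splits a verification budget by the quoted golden ratio and checks a fidelity report against the two quoted thresholds. The budget unit (equivalent kilometers), the relative-error definition, and the names allocate_test_budget, FidelityReport, and meets_thresholds are assumptions for illustration only; GB/T 47025-2026 defines its own metrics and procedures, which this does not reproduce.

```python
import math
from dataclasses import dataclass

# Verification mix quoted in the report: simulation / closed field / public road.
GOLDEN_RATIO = {"simulation": 0.999, "closed_field": 0.0009, "public_road": 0.0001}
assert math.isclose(sum(GOLDEN_RATIO.values()), 1.0)  # shares must add up to 100%


def allocate_test_budget(total_km: float) -> dict:
    """Split a total verification budget (assumed unit: equivalent km) by the golden ratio."""
    return {stage: total_km * share for stage, share in GOLDEN_RATIO.items()}


@dataclass
class FidelityReport:
    sensor_rel_error: float      # assumed metric: |sim - real| / |real|
    dynamics_consistency: float  # assumed metric: agreement score in [0, 1]


def relative_error(sim: float, real: float) -> float:
    """Relative deviation of a simulated measurement from the real-vehicle measurement."""
    return abs(sim - real) / abs(real)


def meets_thresholds(report: FidelityReport) -> bool:
    # Quoted limits: sensor model error <= 5%, dynamics consistency >= 95%.
    return report.sensor_rel_error <= 0.05 and report.dynamics_consistency >= 0.95
```

For a budget of 10 million equivalent kilometers, this allocation leaves roughly 9,000 km for closed fields and 1,000 km for public roads, which shows how strongly the ratio shifts the evidence burden onto simulation.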
Meanwhile, the world model, a generative AI model, builds internal representations to understand the dynamic laws of the real world (including physical characteristics and spatial attributes), and can generate video content from inputs such as text, images, videos, and motion data. It is rapidly showing great application potential in fields such as autonomous driving and robotics, and is becoming a core technical pillar that enables intelligent systems to leap to high-level perception and decision-making capabilities.
1. The simulation platform evolves into a “training environment” with high fidelity, physical consistency, and dynamic interaction.
The simulation system has been repositioned from a traditional test execution tool to core data infrastructure supporting algorithm training. The simulation environment is also evolving from visual resemblance to behavioral authenticity, emphasizing physics-based sensor simulation (photons, electrical signals, multi-echo returns), accurate material properties (reflectivity, roughness), and vehicle dynamics and traffic flow that obey physical laws, in a bid to bridge the "Sim-to-Real" gap.
In terms of high fidelity, simulation platform companies continue to upgrade their verification capabilities and make high-confidence simulation more detailed. For example, Keymotek's aiSim 6 provides physics-based sensor simulation for cameras (nonlinear response, CMOS noise) and LiDAR (Gaussian beams, multi-echo returns, weather attenuation), and follows the ASAM OpenMATERIAL 3D standard to define precise physical material properties. Furthermore, its self-developed PBR Splatting technology can relight 3DGS models dynamically, switching the same road segment between daytime, dusk, and nighttime lighting. This turns a reconstructed segment into a "dynamically configurable training environment" and achieves "physical dynamic neural rendering."
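To make two of these sensor effects concrete, the following is a minimal sketch using common textbook models: two-way Beer-Lambert attenuation for LiDAR returns in fog, the Koschmieder relation to derive an extinction coefficient from visibility, and shot noise plus read noise for a CMOS pixel. These are assumptions chosen for illustration, not aiSim 6's implementation; camera nonlinear response, multi-echo returns, and beam shape are not modeled, and all function names are hypothetical.

```python
import math
import random


def extinction_from_visibility(visibility_m: float) -> float:
    """Koschmieder relation (2% contrast threshold): alpha ~= 3.912 / V, in 1/m."""
    return 3.912 / visibility_m


def lidar_return_power(p0: float, range_m: float, alpha_per_m: float) -> float:
    """Two-way Beer-Lambert attenuation: P = P0 * exp(-2 * alpha * R).

    Geometric 1/R^2 falloff and target reflectivity are deliberately left out.
    """
    return p0 * math.exp(-2.0 * alpha_per_m * range_m)


def noisy_pixel(signal_e: float, read_noise_e: float, rng: random.Random) -> float:
    """One pixel value in electrons: photon shot noise (Gaussian approximation of a
    Poisson process, variance = signal) plus Gaussian read noise, clipped at zero."""
    shot = rng.gauss(0.0, math.sqrt(signal_e))
    read = rng.gauss(0.0, read_noise_e)
    return max(0.0, signal_e + shot + read)
```

Under this simplified model, fog with 200 m visibility gives alpha of about 0.0196 per meter, so a return from 50 m keeps only about 14% of its clear-air power. The Gaussian approximation of shot noise is reasonable at high photon counts but would need a true Poisson draw in very dark scenes.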
Notably, aiSim 6 applies the Navier-Stokes equations, which describe fluid motion, to the physical simulation of environmental particles, introducing physical environmental disturbances into the synthetic data pipeline. This enables realistic simulation of leaves stirred up by the airflow of passing vehicles, water splashing from the pavement in rain, and the dynamic interaction between steam from manhole covers and traffic participants, addressing the lack of physical realism in edge-case scenarios.
In terms of physical consistency, PilotD Technology's high-fidelity physical simulation is a good example. The company has independently developed a self-evolving, "dual-turbine"-driven data training platform. It uses a high-fidelity world model to generate multimodal data such as visual data and point clouds for closed-loop training of the robot brain. Meanwhile, its data credibility verification technology, the "Physical Judge" system, checks the physical plausibility of generated data, screening the data while simultaneously driving closed-loop retraining of the world model. Powered by this self-evolving data dual turbine, the embodied AI (EAI) cerebrum iterates and evolves fully automatically as increasingly physically accurate synthetic data is injected, improving the algorithm's adaptability and generalization in complex real-world scenarios.
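The report does not describe how the "Physical Judge" works internally. As a hedged illustration of the general idea of screening generated data for physical plausibility, the following sketch rejects generated trajectories whose finite-difference speed or acceleration exceeds fixed bounds, and returns the failures so they could be fed back for retraining. The Track structure, the bounds (60 m/s, 12 m/s²), and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """A generated agent trajectory: positions in meters sampled every dt seconds."""
    xs: list
    ys: list
    dt: float


def _rate(values, dt):
    """Finite-difference derivative of a sampled sequence."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def physically_plausible(t: Track, v_max: float = 60.0, a_max: float = 12.0) -> bool:
    """Reject tracks with non-finite positions or speed/acceleration above hypothetical bounds."""
    vx, vy = _rate(t.xs, t.dt), _rate(t.ys, t.dt)
    ax, ay = _rate(vx, t.dt), _rate(vy, t.dt)
    speeds = [math.hypot(a, b) for a, b in zip(vx, vy)]
    accels = [math.hypot(a, b) for a, b in zip(ax, ay)]
    finite = all(math.isfinite(v) for v in t.xs + t.ys)
    return finite and all(s <= v_max for s in speeds) and all(a <= a_max for a in accels)


def screen(batch):
    """Split generated samples into usable training data and failures to feed back."""
    kept = [t for t in batch if physically_plausible(t)]
    rejected = [t for t in batch if not physically_plausible(t)]
    return kept, rejected
```

A production checker would likely cover far more than kinematics (collisions, occlusion, sensor consistency), but the screen-then-feed-back structure mirrors the closed loop described above.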
PilotD's self-developed, fully physics-based optical core modeling technology reproduces the optical physics of the data with high fidelity. The company uses it to train a multimodal world-model data generation architecture that is high-fidelity in both dynamics and optics, providing AI companies with high-fidelity synthetic data solutions.
In terms of dynamic interaction, SYNKROTRON's OASIS Traffic, a traffic flow synthetic data platform for advanced autonomous driving, is built on real roadside data. It uses AI to generate adversarial traffic flows covering 60 high-interaction scenarios, quantifies hazard levels using TTC (time to collision) and PET (post-encroachment time), and covers over 30% of long-tail corner cases. It can generate massive dynamic traffic scenario datasets spanning typical areas, typical traffic scenarios, dynamic participants, and both natural and adversarial behaviors.
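The two hazard metrics are standard surrogate safety measures. The following sketch uses their common definitions: TTC as gap divided by closing speed in a car-following pair, and PET as the time between one road user leaving a conflict zone and another entering it. The grading thresholds and the function names are hypothetical and are not OASIS Traffic's actual hazard scale.

```python
import math


def ttc(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Time to collision (s) for a following vehicle; infinite if the gap is not closing."""
    closing_speed = v_follow - v_lead
    return gap_m / closing_speed if closing_speed > 0 else math.inf


def pet(t_first_exit: float, t_second_enter: float) -> float:
    """Post-encroachment time (s): time between the first road user leaving a
    conflict zone and the second road user entering it."""
    return t_second_enter - t_first_exit


LEVELS = ("low", "high", "critical")


def _grade(value: float, critical: float, high: float) -> int:
    """Map a time margin to an index into LEVELS (smaller margin = worse)."""
    if value < critical:
        return 2
    if value < high:
        return 1
    return 0


def hazard_level(ttc_s: float = math.inf, pet_s: float = math.inf) -> str:
    """Worst of the TTC and PET grades; thresholds are hypothetical."""
    grade = max(_grade(ttc_s, critical=1.5, high=3.0), _grade(pet_s, critical=1.0, high=2.0))
    return LEVELS[grade]
```

For example, a car closing at 10 m/s on a vehicle 20 m ahead has a TTC of 2 s and would be graded "high" here. Passing infinity for a metric that does not apply (no crossing conflict, or no closing gap) keeps the two measures independent.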
2. The world model transforms from an auxiliary tool to a core foundation.
World models aim to internalize physical laws such as gravity, collision, and causality, to overcome limitations of traditional simulation tools such as weak long-term consistency and interpretability, and to capture the "common sense" of the world. For example, GigaAI's GigaWorld-1 shows strong adherence to physics and can accurately simulate complex physical interactions such as gravity and collision. Li Auto's MindVLA-o1 uses a native 3D ViT and a predictive latent world model to understand positional relationships and motion patterns of objects in three-dimensional space. It also uses the world model to generate massive, high-fidelity, diverse training data, compensating for the extreme scarcity of real physical interaction data and promoting "Sim2Real" transfer.
Fusion trend: VLA + world model + reinforcement learning
In autonomous driving, the world model has been upgraded from a standalone data generator to the core cognition and reasoning center of the autonomous driving system, deeply integrated with VLA and reinforcement learning. In algorithm training, VLA handles perception and semantic understanding, the world model rolls out and predicts the future, and reinforcement learning autonomously optimizes decisions in the virtual world. The three work in concert.
For example, QCraft's unified "VLA + world model" architecture can reuse end-to-end capabilities already validated at million-vehicle mass-production scale, while its language capabilities let it accurately understand environmental text, complex scenarios, and voice commands, achieving triple alignment of model decisions, teleoperation, and HMI. Its world prediction model then accurately predicts the behavior of traffic participants, changes in road structure, and the evolution of dynamic scenarios, and plans the optimal driving trajectory accordingly.
As the "cloud matrix" of VLA 2.0, XPeng X-World is a physical AI simulator that can "think" about driving scenarios. It uses the world model to generate massive numbers of scenarios for training and evaluation, shifting the R&D paradigm from "stacking real-vehicle testing" to "stacking compute for training." The model is built on the leading video generation model WAN 2.2 with a customized DiT backbone. Its key innovation is a perspective-time self-attention mechanism that forces the model, during generation, to jointly model the temporal dimension and the spatial-geometric relationships among the seven surround-view camera perspectives. This keeps the generated virtual world consistent across perspectives and prevents objects from clipping through one another or becoming misaligned between views. Underneath, a 3D causal variational autoencoder (VAE) with a high compression ratio greatly reduces the computational overhead of processing multi-channel video streams and supports long-horizon modeling.
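The report does not publish X-World's attention design. As a hedged illustration of the general idea of attending jointly over camera perspectives and time, the following sketch flattens tokens shaped (time, view, patch, channel) into one sequence so that every patch can attend to patches in all seven views and all frames. It is a single-head, non-causal toy with random projection matrices and no positional encoding; the function names are hypothetical. Because full joint attention scales quadratically with T x V x N tokens, real systems may restrict or factorize it.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def view_time_attention(tokens: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray):
    """Joint self-attention across views and time.

    tokens: (T, V, N, D) = time steps, camera views, patches per view, channels.
    All T*V*N patches form one sequence, so each patch can attend to every
    patch in every view and every frame.
    """
    T, V, N, D = tokens.shape
    x = tokens.reshape(T * V * N, D)
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    out = attn @ v
    return out.reshape(T, V, N, -1), attn
```

With 4 frames, 7 views, and 3 patches per view, the attention matrix is 84 x 84, and each row mixes information from all views and frames, which is the property that discourages cross-view misalignment.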
Core Foundation Cases of World Models:
In autonomous driving, the world model adopts a dual-engine architecture of "cloud training + vehicle-side inference": the cloud handles large-scale training and scenario generation, while the vehicle handles real-time decision-making and rapid response. For example, on April 24, 2026, Huawei released Qiankun ADS 5, which uses WEWA 2.0 to improve game-theoretic training and learning efficiency by 10 times and reduce collision risk by 50%. Huawei's cloud computing power has reached 60 EFLOPS in 2026, a 21-fold increase over the 2023 level, supporting R&D for high-level autonomous driving.
In Huawei's WEWA architecture, the cloud-based WE (World Engine) handles virtual scenario training and model parameter updates. Powered by diffusion generative models, it generates, learns, and validates simultaneously. It can controllably generate a variety of rare scenarios, including cut-ins by adjacent vehicles, dart-outs, and sudden braking by lead vehicles, realizing the shift from humans training AI to AI training itself. The vehicle-side WA (World Action Model) handles real-time path planning and control.
As a world model, Pony.ai's PonyWorld 2.0 has self-diagnosis and directional evolution capabilities: the AI can independently diagnose its own shortcomings and proactively guide data collection, which makes it the core of the paradigm shift in R&D and training. Specifically, PonyWorld 2.0 works with the intent-semantics layer of Pony.ai's vehicle-side model to trace back and attribute every driving decision automatically. The system can automatically identify the root cause of a problem and feed the diagnosis directly into model training.
Based on self-diagnosis results, PonyWorld 2.0 can automatically identify specific scenarios where the accuracy of the world model is insufficient, and proactively generate directional data collection tasks. For example, the system can automatically push instructions: "Please focus on collecting mixed traffic scenario data of non-motorized vehicles and pedestrians under backlight conditions at designated intersections during specific periods." The R&D and testing teams thus collaborate efficiently around the “accuracy requirements” of the world model to achieve directional data collection and model iteration guided by AI.
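How PonyWorld 2.0 turns diagnoses into collection tasks is not described in detail. The following hedged sketch shows one plausible shape for that step: bucket world-model evaluation results by scenario, flag buckets whose prediction error is too high or whose sample count is too low, and emit collection requests ordered by error. The Scenario fields, thresholds, and request wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario key used to bucket world-model evaluation results."""
    location: str
    lighting: str
    participants: str
    time_window: str


def collection_tasks(errors: dict, samples: dict,
                     max_error: float = 0.10, min_samples: int = 500) -> list:
    """Turn per-scenario world-model prediction errors into prioritized collection requests.

    A scenario is flagged when its error exceeds max_error or it has fewer than
    min_samples recorded samples; flagged scenarios are ordered worst error first.
    """
    weak = [s for s, e in errors.items() if e > max_error or samples.get(s, 0) < min_samples]
    weak.sort(key=lambda s: errors[s], reverse=True)
    return [
        f"Collect {s.participants} data under {s.lighting} at {s.location} during {s.time_window} "
        f"(error={errors[s]:.2f}, samples={samples.get(s, 0)})"
        for s in weak
    ]
```

Keying on location, lighting, participants, and time window mirrors the example instruction above, where the request names a designated intersection, backlight conditions, mixed non-motorized traffic, and a specific period.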
In embodied AI (EAI), the world model has evolved from a "data engine" into the "cerebrum" or "simulator" of EAI agents, capable of physical reasoning, action planning, and task decision-making.
For example, GigaWorld-Policy, the World-Action Model (WAM) developed by GigaAI, adopts an action-centric paradigm. Unlike traditional WA architectures that rely on lengthy, inefficient video prediction chains, it breaks the cross-modal coupling bottleneck and, through architectural optimization, dramatically improves inference efficiency.
It pioneers a hybrid "Complex Training & Simplified Inference" paradigm:
During the training phase, GigaWorld-Policy uses a causal mask mechanism to achieve unified modeling of action tokens and future visual tokens, allowing action prediction to fully benefit from the high-density supervision signals provided by future visual dynamics.
During the inference phase, the model discards the video prediction branch entirely and keeps only a lightweight action generation module. This removes the need to run inference over long sequences of visual tokens, fundamentally eliminating the structural computational redundancy that cross-modal architectural coupling causes in traditional WA models.
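The two phases above can be reconciled with a suitably shaped attention mask. The following toy sketch assumes a hypothetical token layout of [observation | action | future-visual] in which action tokens never attend to future-visual tokens, while future-visual tokens attend to all actions. Under that assumption, dropping the visual tokens at inference leaves the action outputs unchanged. This is an illustration of the masking idea, not GigaWorld-Policy's published design; the toy has no learned weights, so it demonstrates only the masking property, not the extra supervision that future-visual losses would provide during training.

```python
import numpy as np


def build_mask(n_obs: int, n_act: int, n_vis: int) -> np.ndarray:
    """Boolean attention mask (True = row token may attend to column token).

    Hypothetical layout: [observation | action | future-visual] tokens.
    Action tokens never see future-visual tokens, so dropping those tokens
    at inference cannot change the action outputs.
    """
    n = n_obs + n_act + n_vis
    a0, v0 = n_obs, n_obs + n_act
    m = np.zeros((n, n), dtype=bool)
    m[:, :a0] = True                                                # every token sees the observation
    m[a0:v0, a0:v0] = np.tril(np.ones((n_act, n_act), dtype=bool))  # causal over actions
    m[v0:, a0:v0] = True                                            # future frames see every action
    m[v0:, v0:] = np.tril(np.ones((n_vis, n_vis), dtype=bool))      # causal over future frames
    return m


def masked_attention(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head self-attention without learned projections (toy model)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


def action_outputs(obs: np.ndarray, act: np.ndarray, vis=None) -> np.ndarray:
    """Training passes future-visual tokens; inference passes vis=None."""
    parts = [obs, act] if vis is None else [obs, act, vis]
    n_vis = 0 if vis is None else len(vis)
    out = masked_attention(np.concatenate(parts), build_mask(len(obs), len(act), n_vis))
    return out[len(obs):len(obs) + len(act)]
```

Calling action_outputs with and without the visual tokens yields the same action representations, which is why the inference path can skip the long visual sequence entirely.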
Compared with current mainstream WA models (such as Motus and Cosmos Policy), GigaWorld-Policy achieves a 10x improvement in inference speed while maintaining policy quality, meeting the real-time requirements of high-frequency closed-loop robot control. Its average success rate in real-world tasks approaches 85%, more than 30 percentage points higher in absolute terms than strong competitors such as Cosmos Policy.
On April 29, 2026, GensPi Technology officially released MotuBrain, a general-purpose world-action model. Positioned as a general-purpose cerebrum for EAI robots, it offers cross-robot adaptability, multi-task generalization, and long-horizon task execution, enabling a single brain to serve multiple functions and robot types. MotuBrain's core breakthrough is its unified modeling of the "world it sees" and the "actions it will perform," so the robot can not only understand its environment but also predict how it will change and generate executable action policies. MotuBrain ranked first on two authoritative international benchmarks, RoboTwin 2.0 and WorldArena. On WorldArena it achieved an overall EWM score of 63.77 and led across several key motion dimensions, including Motion Quality, Flow Score, and Motion Smoothness.
3. "Simulation test + world model"-driven test system has become R&D infrastructure.
In the autonomous driving data closed loop and test system, simulation testing and world models are complementary: each offsets the other's technical shortcomings and extends the other's capability boundaries.
In autonomous driving and EAI, simulation testing and world models are moving from "separation" to "deep integration." The industry has begun to establish unified standards and build integrated "reconstruction + generation + simulation + training" platforms, so that simulation capabilities developed for autonomous driving can be reused in EAI, supporting a broader physical AI ecosystem.
Currently, world models (especially generative world models) have become the core "power plant" of simulation platforms, driving AI-powered automatic generation of simulation scenarios and producing massive, diverse scenarios (especially long-tail and rare ones) and high-fidelity sensor data at low cost and high quality.
On April 24, 2026, 51Sim's SimOne 4.0 was comprehensively rebuilt and upgraded for the physical AI era. It builds a "4DGS reconstruction + generative world model" technology base that automatically constructs interactive, editable, and scalable virtual simulation assets from real vehicle data, enabling large-scale scenario generation. SimOne 4.0 covers five core links (data, training, inference, verification, and delivery) to help AI enter the physical world safely and efficiently. Moreover, SimOne 4.0 deeply integrates NVIDIA Omniverse NuRec, a neural rendering solution, at the product level, building a complete data-driven pipeline from real data collection through neural scene reconstruction to closed-loop simulation execution. In 51Sim's end-to-end data-driven closed-loop solution, the confidence levels of dynamics, LiDAR, and camera simulation reach 95%, 95%, and 90% respectively, and the consistency between simulation testing and field testing reaches 92%.
SimOne 4.0 supports multiple GPU architectures. It has been systematically adapted to and deeply optimized for the MTT S5000, Moore Threads' flagship integrated AI training and inference GPU. The platform runs large-scale 4DGS and world model training tasks with high concurrency, delivering high-quality reconstruction and model training for complex dynamic scenarios in a short time and driving the continuous evolution of world models and VLA. To date, SimOne has served more than 100 customers in EAI-related fields such as autonomous driving, smart equipment, and robotics.
In January 2026, AGIBOT released Genie Sim 3.0, an open-source simulation platform driven by its large language model. Based on NVIDIA Isaac Sim, the platform provides a high-fidelity simulation environment and natural language-driven scenario generation capabilities. It can provide a full-process closed-loop solution from digital asset generation, scenario generalization, data collection to automatic evaluation, significantly speeding up the model training and verification process and reducing dependence on physical hardware.
Genie Sim 3.0 has several highlights. First, it offers a digital-twin-level high-fidelity simulation environment that is the first to deeply integrate 3D reconstruction, visual generation, and physics engines, unifying visual realism and physical accuracy. Second, it pioneers natural-language-driven scenario generation and generalization: developers can input natural language instructions to have the platform automatically generate and generalize thousands of training and test scenarios within minutes and run large-scale parallel training. In addition, the platform provides a full range of open-source simulation datasets (more than 200 tasks totaling tens of thousands of hours), efficient data collection solutions, and a three-dimensional evaluation system built on 100,000+ simulation scenarios. Notably, AGIBOT's world model, Genie Envisioner (GE), is built on NVIDIA Cosmos and realizes an end-to-end closed loop from perception to action. GE uses a unified video generative world model as its core, bringing policy learning, evaluation, and simulation into one framework. By deeply integrating Cosmos Predict 2 into its self-developed action-conditioned world model architecture, AGIBOT gives GE-Sim strong general visual and physical priors.
The integration of simulation testing and world models essentially builds a flywheel closed loop of data generation - algorithm training - model verification - continuous evolution. In the two fields of autonomous driving and EAI, the integration paths are highly consistent, both pointing to the ultimate goal of "physical AI": allowing the system to complete closed-loop learning from cognition to action in the virtual world, and then seamlessly migrate to the physical world.