Cutting supply chain AI training time by 400×: A breakthrough in digital twins
Reinforcement learning holds huge promise for supply chains, but until now training has been painfully slow. With a new algorithm called Picard Iteration, what once took roughly 10 hours can now be done in about 2 minutes.
In modern supply chains, every decision — where to hold stock, how to route an order, when to replenish — is part of a vast puzzle. Getting these decisions right can mean faster delivery, lower costs, and less waste. But training algorithms to make those decisions efficiently has always faced a bottleneck: simulations take far too long.
The challenge: slow-learning systems
Companies like Amazon, Walmart, and Alibaba use digital twins, exact virtual replicas of their real-life processes, and are already experimenting with reinforcement learning (RL) to optimize their supply chains. RL algorithms work by simulating countless “what if” scenarios, learning over time which strategies perform best.
The problem? These simulations are inherently sequential: one step must finish before the next begins. For large supply chains, simulating just one month of operations can take hours. Training an algorithm to maturity often requires thousands of such runs. That means weeks or even months before you see usable results — too slow for today’s pace of business.
The breakthrough: Picard Iteration
A new method developed by researchers at Esade, MIT, Columbia, and UBC changes this dynamic. Their algorithm, called Picard Iteration, takes what used to be a serial, step-by-step process and makes it parallel.
Instead of simulating a supply chain as one long chain of events, the method divides the problem into chunks that can be processed at the same time. Each “chunk” makes an informed guess about its neighbors, then updates its guess as new information comes in. After a few rounds, the whole system converges to the same outcome as the slow, sequential method — but dramatically faster.
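To make the idea concrete, here is a minimal sketch of chunked Picard iteration on a toy order-fulfillment problem. It is an illustration under stated assumptions, not the researchers' implementation: the warehouse model, the greedy `policy`, and the function names are all hypothetical, and the chunks are processed in an ordinary Python loop where a real system would run them in parallel (for example on a GPU).

```python
import numpy as np

def policy(inventory, cost_row):
    """Toy policy (an assumption): ship from the cheapest warehouse in stock."""
    in_stock = inventory > 0
    if not in_stock.any():
        return -1  # order cannot be fulfilled
    return int(np.argmin(np.where(in_stock, cost_row, np.inf)))

def simulate_sequential(inv0, costs):
    """Baseline: process orders one by one, updating inventory as we go."""
    inv = inv0.copy()
    actions = np.empty(len(costs), dtype=int)
    for t, row in enumerate(costs):
        a = policy(inv, row)
        actions[t] = a
        if a >= 0:
            inv[a] -= 1
    return actions

def simulate_picard(inv0, costs, num_chunks):
    """Chunked Picard iteration: guess all actions, then refine chunk by chunk."""
    T, W = costs.shape
    actions = np.full(T, -1)  # initial guess: no order is fulfilled
    bounds = np.linspace(0, T, num_chunks + 1).astype(int)
    for iteration in range(1, num_chunks + 1):
        # Stock consumed before each step, according to the current guess.
        onehot = np.zeros((T, W), dtype=int)
        served = actions >= 0
        onehot[np.flatnonzero(served), actions[served]] = 1
        used_before = np.vstack([np.zeros((1, W), dtype=int),
                                 np.cumsum(onehot, axis=0)])
        new_actions = np.empty_like(actions)
        # Each chunk depends only on the previous guess, so the chunks are
        # independent of one another and could run at the same time.
        for start, end in zip(bounds[:-1], bounds[1:]):
            inv = inv0 - used_before[start]  # guessed state at chunk start
            for t in range(start, end):
                a = policy(inv, costs[t])
                new_actions[t] = a
                if a >= 0:
                    inv[a] -= 1
        if np.array_equal(new_actions, actions):
            return actions, iteration  # fixed point reached
        actions = new_actions
    return actions, num_chunks

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    costs = rng.random((200, 4))       # 200 orders, 4 warehouses
    inv0 = np.array([30, 40, 50, 20])  # starting stock per warehouse
    seq = simulate_sequential(inv0, costs)
    par, iters = simulate_picard(inv0, costs, num_chunks=8)
    print("same result:", np.array_equal(seq, par), "| iterations:", iters)
```

In this sketch the first chunk always starts from the true initial inventory, so after each round at least one more chunk is guaranteed to match the sequential run. The loop therefore needs at most `num_chunks` rounds, and it stops earlier whenever the guesses stop changing.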
In tests, Picard Iteration sped up simulations by about 400×. What previously took roughly 10 hours can now be done in about 2 minutes.
Why this matters for practitioners
For supply chain leaders, speed translates directly into agility. With faster simulations, you can:
- Test more policies: Try out new fulfillment or routing strategies daily, not monthly.
- Adapt quickly: Respond to market shocks, promotions, or disruptions with rapid re-optimization.
- Scale AI: Apply reinforcement learning to real-world supply chain problems that were previously impractical.
The implications go beyond supply chains. Any system where decisions unfold over time — from energy grids to logistics to finance — could benefit.
The road ahead
Digital twins are already transforming how businesses model their operations. By removing the simulation bottleneck, Picard Iteration makes it possible to move from experimentation to execution much faster.
In an environment where companies make thousands of decisions every second, the ability to train decision-making AI 400× faster is not just an efficiency gain — it’s a strategic advantage. Read the full research article here: Speeding up Policy Simulation in Supply Chain RL.