The Most Overestimated Q Value Regularization in High-Dimensional Discrete Action Spaces for Offline Reinforcement Learning
Summary
This paper addresses a problem in offline reinforcement learning (RL, a type of AI training that learns from pre-collected data without needing new real-world interaction): Q value overestimation (the AI incorrectly rating certain actions as better than they actually are), which causes training problems in robotic tasks with many possible actions. The researchers propose MQR (most overestimated Q value regularization), an algorithm that penalizes only the single action with the largest overestimation rather than penalizing all actions equally, and report that it achieves a 99.04% success rate in real-world robotic grasping tasks.
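To make the contrast between penalizing every action and penalizing a single action concrete, here is a minimal sketch in Python. The summary does not give the MQR loss, so this is not the paper's formulation: the function names, the use of the logged (dataset) action as the reference point, and the idea of measuring overestimation as the gap above that action's Q value are all assumptions made for illustration.

```python
from typing import Sequence


def uniform_penalty(q_values: Sequence[float], logged_action: int) -> float:
    """Hypothetical baseline: every action contributes equally.

    Returns the mean Q value over all actions minus the Q value of the
    action that actually appears in the offline dataset.
    """
    return sum(q_values) / len(q_values) - q_values[logged_action]


def max_only_penalty(q_values: Sequence[float], logged_action: int) -> float:
    """Hypothetical MQR-style term: only one action contributes.

    Assumption: the "most overestimated" action is taken to be the
    non-logged action with the highest Q value. The penalty is its gap
    above the logged action's Q value, clipped at zero.
    """
    others = [q for a, q in enumerate(q_values) if a != logged_action]
    return max(max(others) - q_values[logged_action], 0.0)


# Toy Q values for one state with four discrete actions (made-up numbers).
q = [1.0, 5.0, 2.0, 0.5]
print(uniform_penalty(q, logged_action=2))   # 0.125
print(max_only_penalty(q, logged_action=2))  # 3.0
```

In this toy state the uniform term is small because the low Q values dilute the one inflated value, while the max-only term is driven entirely by the inflated action. With many discrete actions that dilution would be stronger, which is one plausible reading of why the summary highlights high-dimensional action spaces.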
Classification
Original source: http://ieeexplore.ieee.org/document/11304592
First tracked: June 8, 2026 at 02:01 AM
Classified by LLM (prompt v3) · confidence: 85%