# Reinforcement learning with calibration rewards