25 05 | Xiaobao Wu

Seven papers accepted to ACL 2025.
We release a survey on learning from rewards, covering reinforcement learning (as in RLHF, DPO, and GRPO), reward-guided decoding, and post-hoc correction.
One paper accepted to ICML 2025.