DeepSeek R1 Explained to your grandma
CC posted on 2025/01/27
#model #policy #reinforcement #reasoning #r1 #reward