zhangxy-2019

critique-GRPO Public

[ICML 2026 Spotlight] Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Python, 67 stars, 3 forks

RetroAgent Public

RETROAGENT: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Python, 23 stars, 3 forks

sgp-tod Public

Python, 14 stars, 1 fork

Self-Alignment-for-Factuality Public

Python, 5 stars, 1 fork

Effective-Knowledge-Injection Public

Python, 3 stars, 1 fork

DeepRL-Tutorials Public

Forked from ucla-rlcourse/DeepRL-Tutorials

Contains high-quality implementations of Deep Reinforcement Learning algorithms written in PyTorch

Jupyter Notebook