Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-explanation" (Wang et al., 2024)
Code for the paper accepted at COLING 2025: "Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem" (Wang et al., 2025)
Code for the paper accepted at ACL 2025 Findings: "FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation" (Wang et al., 2025)
The GitHub repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models"