[Submitted on 6 Nov 2025 (v1), last revised 30 Nov 2025 (this version, v4)]
Addressing divergent representations from causal interventions on neural networks, by Satchel Grant and 3 other authors
Abstract: A common approach to mechanistic interpretability is to causally manipulate model representations via targeted interventions in order to understand …