Postdoc, Department of Computer Science, ETHZ - ETH Zurich
1 paper at NeurIPS 2025
We show, in theory and in practice, that once causal abstraction permits non-linear alignment transformations, any neural network (even a randomly initialized one) can be perfectly aligned with any algorithm, rendering this interpretability approach meaningless unless the transformations are constrained.
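The claim can be illustrated with a toy sketch of my own (not the paper's code, and a deliberate simplification of causal abstraction): take the activations of a frozen, never-trained random network on 2-bit inputs, and fit a small non-linear map (an MLP probe trained with gradient descent) from those activations to the output of a target algorithm, here XOR. The expressive map aligns the random activations with the algorithm's variable, which is exactly why unconstrained non-linear maps carry no interpretive weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# All 2-bit inputs and the target algorithm's output: XOR.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

# Frozen *random* "network": one random tanh layer, never trained.
W_rand = rng.normal(size=(2, 8))
b_rand = rng.normal(size=8)
H = np.tanh(X @ W_rand + b_rand)  # random activations, shape (4, 8)

# Non-linear alignment map: a tiny MLP probe trained on the frozen activations.
W1 = rng.normal(scale=0.5, size=(8, 16))
b1 = np.zeros(16)
w2 = rng.normal(scale=0.5, size=16)
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(2000):                  # full-batch gradient descent on BCE loss
    A1 = np.tanh(H @ W1 + b1)
    p = sigmoid(A1 @ w2 + b2)          # predicted probability of XOR = 1
    g = (p - y) / len(y)               # d(BCE)/d(logit)
    gw2 = A1.T @ g
    gb2 = g.sum()
    gA1 = np.outer(g, w2) * (1 - A1**2)
    gW1 = H.T @ gA1
    gb1 = gA1.sum(axis=0)
    w2 -= lr * gw2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

pred = (sigmoid(np.tanh(H @ W1 + b1) @ w2 + b2) > 0.5).astype(float)
accuracy = float((pred == y).mean())
print(accuracy)
```

The random network was never trained on XOR, yet the non-linear probe recovers the algorithm's output from its activations; a linear alignment map would not have this trivializing power at fixed width, which is the constraint the TL;DR alludes to.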