9 papers across 3 sessions
We draw attention to the final-model-only setting for training data attribution, propose a further-training gold standard for it, and show how various gradient-based methods approximate further training.
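To make the gold standard concrete, here is a minimal sketch on a toy linear model (the data, step counts, and learning rate are illustrative assumptions, not the paper's setup): continue training the final model with and without a candidate example, and score the example by the change in test loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data; everything here is illustrative, not the paper's setup.
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_test, y_test = rng.normal(size=d), 0.0

def grad(w, Xb, yb):
    # Gradient of mean squared error on a batch.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

def further_train(w0, keep, steps=200, lr=0.01):
    # Continue training from the final model w0 on the kept examples only.
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad(w, X[keep], y[keep])
    return w

# "Final model": pretend training already converged to the least-squares solution.
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
test_loss = lambda w: (x_test @ w - y_test) ** 2

# Gold-standard score for example i: effect of dropping it during further training.
i = 0
keep_all = np.arange(n)
keep_drop = np.delete(keep_all, i)
score = test_loss(further_train(w_hat, keep_drop)) - test_loss(further_train(w_hat, keep_all))
print(f"further-training attribution of example {i}: {score:+.6f}")
```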
We use influence functions to attribute and suppress training examples that promote toxic behaviors in LLMs.
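For readers new to the area: the classic influence-function estimate (standard background from Koh and Liang, 2017, not specific to any single paper here) of how upweighting a training example $z$ changes the loss at a test point $z_{\text{test}}$ is

$$
\mathcal{I}(z, z_{\text{test}}) = -\nabla_\theta \mathcal{L}(z_{\text{test}}, \hat\theta)^\top H_{\hat\theta}^{-1}\, \nabla_\theta \mathcal{L}(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 \mathcal{L}(z_i, \hat\theta).
$$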
We propose LayerIF, a framework that employs Influence Functions for LLM layer quality estimation. Our method captures task-specific layer importance and improves both expert allocation in LoRA-MoE and layer-wise sparsity distribution in LLM pruning.
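A crude sketch of the layer-scoring idea (illustrative assumptions throughout: a toy MLP, and a plain gradient-alignment score with an identity Hessian standing in for LayerIF's actual influence estimator):

```python
import torch

torch.manual_seed(0)

# Tiny illustrative model; LayerIF targets LLM layers, this only shows the shape of the idea.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
loss_fn = torch.nn.MSELoss()

def per_param_grads(x, y):
    # Gradients of the loss on one example, one tensor per parameter.
    loss = loss_fn(model(x), y)
    return torch.autograd.grad(loss, model.parameters())

x_train, y_train = torch.randn(1, 4), torch.randn(1, 1)
x_val, y_val = torch.randn(1, 4), torch.randn(1, 1)

g_train = per_param_grads(x_train, y_train)
g_val = per_param_grads(x_val, y_val)

# Layer score: gradient alignment restricted to that layer's parameters
# (an identity-Hessian stand-in for a per-layer influence estimate).
scores = {}
for (name, _), gt, gv in zip(model.named_parameters(), g_train, g_val):
    layer = name.split(".")[0]
    scores[layer] = scores.get(layer, 0.0) + (gt * gv).sum().item()
print(scores)  # larger magnitude -> layer matters more for this (train, val) pair
```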
We evaluate Rescaled Influence Functions (RIF), a fast and accurate alternative to traditional influence functions for data attribution, particularly effective in high-dimensional settings where standard influence methods fail.
We scale influence-function-based data valuation to recent LLMs and their massive training datasets.
We propose a fine-grained influence-function framework to trace how training data in the SFT phase shapes LLM reasoning on math and code tasks.
This paper introduces distributional training data attribution, a data attribution framework that accounts for stochasticity in deep learning training, enabling a mathematical justification for why influence functions work in this setting.
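A toy sketch of the distributional view (the model, seed count, and summary statistics are illustrative assumptions, not the paper's protocol): retrain under many seeds with and without one example, then compare the two distributions of test loss rather than two point estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 60, 3
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_test, y_test = rng.normal(size=d), 0.0

def train(keep, seed, steps=300, lr=0.05):
    # SGD with seed-dependent minibatch order: the source of training stochasticity here.
    r = np.random.default_rng(seed)
    idx = np.asarray(keep)
    w = np.zeros(d)
    for _ in range(steps):
        b = r.choice(idx, size=8)
        w -= lr * 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
    return (x_test @ w - y_test) ** 2

# Distributions of test loss over seeds, with and without example 0.
full = [train(range(n), s) for s in range(20)]
drop = [train(range(1, n), s) for s in range(20)]
print(f"mean shift: {np.mean(drop) - np.mean(full):+.4f}  "
      f"(std with: {np.std(full):.4f}, without: {np.std(drop):.4f})")
```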
We balance fairness and predictive accuracy by decomposing feature representations and analyzing their impact using influence functions.
We apply the EKFAC preconditioner to Neumann series iterations to arrive at an unbiased inverse-Hessian-vector product (iHVP) approximation for TDA, improving the performance of both influence functions and unrolled differentiation.
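A minimal numerical sketch of the iHVP machinery (a diagonal preconditioner stands in for EKFAC purely for illustration; EKFAC is a Kronecker-factored curvature approximation, not a diagonal one): preconditioned Neumann iterations whose fixed point is $H^{-1}v$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic damped "Hessian"; positive definite by construction.
d = 50
A = rng.normal(size=(d, d))
H = A @ A.T / d + 1.0 * np.eye(d)   # damping term keeps H well conditioned
P_inv = 1.0 / np.diag(H)            # P^{-1} for the diagonal stand-in preconditioner P
v = rng.normal(size=d)

def preconditioned_neumann_ihvp(v, alpha=0.5, iters=200):
    """Approximate H^{-1} v via the preconditioned Neumann recursion
    s_{k+1} = s_k + alpha * P^{-1} (v - H s_k),
    whose fixed point satisfies H s = v. Converges when
    0 < alpha < 2 / lambda_max(P^{-1} H)."""
    s = alpha * (P_inv * v)
    for _ in range(iters):
        s = s + alpha * (P_inv * (v - H @ s))
    return s

approx = preconditioned_neumann_ihvp(v)
exact = np.linalg.solve(H, v)
print(f"relative error: {np.linalg.norm(approx - exact) / np.linalg.norm(exact):.2e}")
```

The preconditioner only changes how fast the recursion converges, not its fixed point, which is why a good curvature approximation like EKFAC helps without biasing the limit.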