Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#1308 Spotlight

Provable Gradient Editing of Deep Neural Networks

NeurIPS OpenReview

Abstract

In explainable AI, DNN gradients are used to interpret predictions; in safety-critical control systems, gradients can encode safety constraints; and in scientific-computing applications, gradients can encode physical invariants. While recent work on provable editing of DNNs has focused on input-output constraints, the problem of enforcing hard constraints on DNN gradients has remained unaddressed.
We present ProGrad, the first efficient approach for editing the parameters of a DNN to provably enforce hard constraints on the DNN gradients. Given a DNN f_θ with parameters θ and a set of pairs (x, ψ) of inputs and corresponding linear gradient constraints, ProGrad finds new parameters θ′ such that the gradient ∇_x f_θ′(x) satisfies ψ for every pair, while minimizing the change ‖θ′ − θ‖.
The key contribution is a novel conditional variable gradient formulation for DNNs, which relaxes the NP-hard provable gradient-editing problem to a linear program (LP), enabling ProGrad to use an off-the-shelf LP solver to enforce the gradient constraints efficiently and effectively.
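The LP reduction can be illustrated on a toy case (a hypothetical sketch, not the paper's algorithm): for a one-layer linear model f(x) = w·x, the input gradient is w itself, so a linear constraint a·∇f ≥ b on the gradient becomes a linear constraint on the edited weights w′, and minimizing the ℓ1 change ‖w′ − w‖₁ is a standard LP with auxiliary variables t ≥ |w′ − w|:

```python
import numpy as np
from scipy.optimize import linprog

def edit_gradient_lp(w, a, b):
    """Find w' minimizing ||w' - w||_1 subject to a @ w' >= b (toy LP sketch)."""
    n = len(w)
    # variables: [w' (n entries), t (n entries)]; objective: sum(t)
    c = np.concatenate([np.zeros(n), np.ones(n)])
    # t >= w' - w   ->   w' - t <= w
    # t >= w - w'   ->  -w' - t <= -w
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])
    b_ub = np.concatenate([w, -w])
    # gradient constraint a @ w' >= b   ->   -a @ w' <= -b
    A_ub = np.vstack([A_ub, np.concatenate([-a, np.zeros(n)])])
    b_ub = np.append(b_ub, -b)
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]

w = np.array([0.5, -1.0])     # original weights: a @ w = -0.5, constraint violated
a = np.array([1.0, 1.0])
w_new = edit_gradient_lp(w, a, 0.2)
print(a @ w_new >= 0.2 - 1e-6)  # constraint on the gradient now holds
```

For general DNNs the gradient depends nonlinearly on the parameters, which is what makes the exact problem NP-hard; the paper's conditional-variable relaxation is what recovers an LP in that setting.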
We experimentally evaluated ProGrad by enforcing:
  1. hard Grad-CAM constraints on ImageNet ResNet DNNs;
  2. hard Integrated Gradients constraints on Llama 3 and Qwen 3 LLMs;
  3. hard gradient constraints in training a DNN to approximate a target function as a proxy for safety constraints in control systems and physical invariants in scientific applications.
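The Integrated Gradients attributions constrained in (2) can be sketched in a few lines (illustrative only, using a toy model with an analytic gradient rather than an LLM): IG_i(x) = (x_i − x0_i) · ∫₀¹ ∂f/∂x_i(x0 + α(x − x0)) dα, approximated by a Riemann sum.

```python
import numpy as np

def f(x):                       # toy model: f(x) = x1 * x2
    return x[0] * x[1]

def grad_f(x):                  # its analytic input gradient
    return np.array([x[1], x[0]])

def integrated_gradients(x, x0, steps=1000):
    """Midpoint-rule approximation of Integrated Gradients attributions."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = x0 + alphas[:, None] * (x - x0)       # straight line from x0 to x
    grads = np.stack([grad_f(p) for p in path])
    return (x - x0) * grads.mean(axis=0)

x = np.array([2.0, 3.0])
x0 = np.zeros(2)
ig = integrated_gradients(x, x0)
# Completeness axiom: attributions sum to f(x) - f(x0)
print(np.isclose(ig.sum(), f(x) - f(x0)))
```

A hard Integrated Gradients constraint is then a condition on these attribution values, e.g. a linear inequality over the vector ig.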
The results highlight ProGrad's unique capability to enforce hard constraints on DNN gradients.