Poster Session 5 · Friday, December 5, 2025, 11:00 AM → 2:00 PM · Poster #3101

Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function


Abstract

Modern machine learning algorithms, especially deep learning-based techniques, typically require careful hyperparameter tuning to achieve their best performance. Despite intense interest in practical techniques such as Bayesian optimization and random search for automating this laborious and compute-intensive task, the fundamental learning-theoretic complexity of tuning hyperparameters for deep neural networks remains poorly understood.
Motivated by this gap, we initiate the formal study of hyperparameter-tuning complexity in deep learning through a recently introduced data-driven setting. Here we are given a series of learning tasks drawn from an underlying distribution, and the goal is to tune the hyperparameters to perform well on average over that distribution.
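To make the setting concrete, the following minimal sketch (with hypothetical helpers `sample_task` and `train_and_validate` standing in for the task distribution and the inner training loop; neither is from the paper) performs empirical risk minimization over a grid of hyperparameter values:

```python
import numpy as np

# Data-driven hyperparameter tuning, sketched as ERM over a hyperparameter grid.
# `sample_task()` draws a problem instance from the task distribution;
# `train_and_validate(task, alpha)` trains the model parameters on `task` with
# hyperparameter `alpha` and returns the resulting validation utility.
def erm_over_hyperparameters(alpha_grid, num_tasks, sample_task, train_and_validate):
    tasks = [sample_task() for _ in range(num_tasks)]
    avg_utility = [
        np.mean([train_and_validate(task, alpha) for task in tasks])
        for alpha in alpha_grid
    ]
    # Return the hyperparameter with the best average utility over sampled tasks.
    return alpha_grid[int(np.argmax(avg_utility))]
```

Sample complexity here asks how many tasks `num_tasks` must be drawn before the empirically best hyperparameter is near-optimal over the whole distribution.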
A major difficulty is that the utility function, viewed as a function of the hyperparameter, is highly volatile; moreover, it is defined only implicitly, through an optimization problem over the model parameters. To tackle this challenge, we introduce a new technique for characterizing the discontinuities and oscillations of the utility function on any fixed problem instance as the hyperparameter varies; our analysis draws on tools from algebraic geometry, differential geometry, and constrained optimization. We use this characterization to show that the learning-theoretic complexity of the corresponding family of utility functions is bounded.
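The following toy example (our illustration, not the paper's construction) shows the kind of volatility at issue: the inner optimization selects the best of a few candidate parameter settings, so the utility jumps whenever the argmin switches as the hyperparameter `alpha` varies.

```python
import numpy as np

def inner_losses(alpha):
    # Three hypothetical candidate parameter settings, each with an
    # alpha-dependent training loss and a fixed validation utility.
    train = np.array([(alpha - 0.2) ** 2, (alpha - 0.5) ** 2 + 0.01, abs(alpha - 0.8)])
    valid = np.array([0.9, 0.3, 0.6])
    return train, valid

def utility(alpha):
    # The utility is defined implicitly through the inner optimization: it is
    # the validation utility of whichever candidate minimizes training loss.
    train, valid = inner_losses(alpha)
    return valid[np.argmin(train)]

alphas = np.linspace(0.0, 1.0, 1001)
u = np.array([utility(a) for a in alphas])
jumps = alphas[:-1][np.abs(np.diff(u)) > 1e-6]
print("discontinuities near alpha =", np.round(jumps, 3))
```

On this toy instance the utility is piecewise constant with jumps near alpha ≈ 0.367 and 0.735; the paper's technique bounds how many such discontinuities and oscillations the utility function can exhibit for the neural-network families it studies.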
We instantiate our results to provide sample complexity bounds for two concrete applications: tuning a hyperparameter that interpolates between neural activation functions, and setting the kernel parameter in graph neural networks.
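As an illustration of the first application, one natural way such an interpolation hyperparameter can enter the architecture is as a convex combination of two base activations; the sketch below assumes this form (the paper's exact parameterization may differ), interpolating between ReLU and tanh:

```python
import numpy as np

# A hypothetical interpolated activation sigma_alpha = alpha*sigma1 + (1-alpha)*sigma2.
# At alpha = 1 this is exactly ReLU, at alpha = 0 exactly tanh, and intermediate
# hyperparameter values trade off between the two nonlinearities.
def interpolated_activation(x, alpha, sigma1=lambda z: np.maximum(z, 0.0), sigma2=np.tanh):
    return alpha * sigma1(x) + (1.0 - alpha) * sigma2(x)

x = np.linspace(-2.0, 2.0, 5)
for alpha in (0.0, 0.5, 1.0):
    print(f"alpha={alpha}: {np.round(interpolated_activation(x, alpha), 3)}")
```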