Blog: Stability of Learning Algorithms Part 1.
This is a walk-through of how to derive generalization bounds based on the stability of learning algorithms. Understanding the proof helps you extend or modify the results to your needs. For instance, I had an unbounded loss function, and since the standard generalization results require a bounded loss, I was stuck. So this is my effort to state the proof as clearly as possible, along with the assumptions needed.
Setup: Let $z = (x, y) \in \mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ and let $S = \{z_1, \ldots, z_m\}$ be a training set of $m$ such points. Let $A$ be a symmetric learning algorithm, i.e., the learned model $A_S$ does not depend on the order of the data points in the training set. We define two more notations as follows (illustrated in the code sketch after this list):
Removing the $i$-th data point from the set $S$: $S^{\setminus i} = \{z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_m\}$.
Replacing the $i$-th data point in the set $S$ by an independent copy $z_i'$: $S^{i} = \{z_1, \ldots, z_{i-1}, z_i', z_{i+1}, \ldots, z_m\}$.
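To make the notation concrete, here is a minimal Python sketch of these two operations (the function names `remove_point` and `replace_point` are my own, purely for illustration):

```python
# A training set is just a list of m samples z_i = (x_i, y_i).
S = [(0.1, 1), (0.5, 0), (0.9, 1)]  # toy training set with m = 3

def remove_point(S, i):
    """The set S with its i-th point removed (S^{\\i} in the text)."""
    return S[:i] + S[i + 1:]

def replace_point(S, i, z_new):
    """The set S with its i-th point replaced by z_new (S^{i} in the text)."""
    return S[:i] + [z_new] + S[i + 1:]

print(remove_point(S, 1))             # [(0.1, 1), (0.9, 1)]
print(replace_point(S, 1, (0.3, 1)))  # [(0.1, 1), (0.3, 1), (0.9, 1)]
```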
Let $D$ be an unknown distribution from which data points are sampled to form a training set $S$. Here, we assume all samples (including the replacement sample $z_i'$) are drawn i.i.d. from $D$ unless mentioned otherwise. Let $\mathbf{E}_S[\cdot]$ denote the expectation of a function of $S$ when the $m$ samples forming $S$ are drawn from $D$. Similarly, let $\mathbf{E}_z[\cdot]$ denote the expectation of a function of $z$ when $z$ is sampled according to $D$.
We define the generalization error or risk as follows: $$ R(A_S) = \mathbf{E}_z\big[\ell(A_S, z)\big], $$ where $\ell(A_S, z)$ denotes the loss of the model $A_S$ on the point $z$.
Empirical error is defined as follows: $$ R_{emp}(A_S) = \frac{1}{m}\sum_{i=1}^{m} \ell(A_S, z_i) $$
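As a quick sanity check, here is a toy Monte Carlo sketch of the two quantities (the mean-estimator learner, the clipped loss, and the Gaussian distribution are my own choices for illustration; the true risk is approximated with a large fresh sample):

```python
import numpy as np

rng = np.random.default_rng(0)

def A(S):
    """A toy symmetric learning algorithm: predict the sample mean."""
    return np.mean(S)

def loss(h, z, M=1.0):
    """Squared error clipped at M, so the loss is bounded: 0 <= loss <= M."""
    return min((h - z) ** 2, M)

m = 50
S = rng.normal(0.0, 0.5, size=m)   # training set sampled from D = N(0, 0.25)
h = A(S)

R_emp = np.mean([loss(h, z) for z in S])      # empirical error on S
fresh = rng.normal(0.0, 0.5, size=100_000)    # fresh draws from D
R = np.mean([loss(h, z) for z in fresh])      # Monte Carlo estimate of the risk

print(f"R_emp = {R_emp:.4f}, R ~= {R:.4f}")
```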
Generalization Bounds based on Stability of Learning Algorithms:
Let’s assume that our algorithm $A$ has uniform stability $\beta$, i.e., it satisfies, for all training sets $S$ and all $i \in \{1, \ldots, m\}$, $$ \sup_{z} \Big|\ell(A_S, z) - \ell(A_{S^{\setminus i}}, z)\Big| \leq \beta $$
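Here is a small sketch (my own illustration, not from any reference) that empirically probes $\beta$ for the toy mean estimator above with an absolute-error loss. Note that it checks only one $S$ and a finite grid of $z$, so it gives a lower estimate of the true supremum, not a certificate:

```python
import numpy as np

rng = np.random.default_rng(1)

def A(S):
    """Toy symmetric algorithm: the sample mean."""
    return np.mean(S)

def loss(h, z):
    """Absolute error; 1-Lipschitz in h, which keeps the probe simple."""
    return abs(h - z)

m = 100
S = rng.uniform(0.0, 1.0, size=m)
z_grid = np.linspace(0.0, 1.0, 201)  # finite probe of the sup over z

h = A(S)
loo = [A(np.delete(S, i)) for i in range(m)]  # models trained on each S^{\i}

# Largest observed loss change caused by removing one training point.
beta_hat = max(abs(loss(h_i, z) - loss(h, z)) for h_i in loo for z in z_grid)
print(f"empirical beta ~= {beta_hat:.5f}  (expect O(1/m), 1/m = {1/m:.5f})")
```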
To derive generalization bounds for uniformly stable algorithms, we only need McDiarmid’s inequality. Let $\mathcal{X}$ be some set, let $x_1, \ldots, x_m$ be independent random variables taking values in $\mathcal{X}$, and let $f: \mathcal{X}^m \to \mathbb{R}$ satisfy $\sup_{x_1, \ldots, x_m, x_i'} \big|f(x_1, \ldots, x_m) - f(x_1, \ldots, x_i', \ldots, x_m)\big| \leq c_i$ for each $i$. Then the inequality is given as (more detail here): $$ \mathbf{P}\Big(f - \mathbf{E}[f] \geq \epsilon\Big) \leq e^{-\frac{2\epsilon^2}{\sum_{i=1}^{m} c_i^2}} $$
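A standard sanity check (not part of the original argument): if $f$ is the empirical mean of $[0,1]$-valued variables, then changing one coordinate moves $f$ by at most $c_i = 1/m$, and McDiarmid’s inequality recovers Hoeffding’s bound: $$ f(x_1, \ldots, x_m) = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sum_{i=1}^{m} c_i^2 = \frac{1}{m} \quad\implies\quad \mathbf{P}\Big(f - \mathbf{E}[f] \geq \epsilon\Big) \leq e^{-2m\epsilon^2}. $$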
The strategy is to bound $\mathbf{E}_S\big[R(A_S) - R_{emp}(A_S)\big]$ using $\beta$, and to bound $\big|f(S) - f(S^i)\big|$ for $f(S) = R(A_S) - R_{emp}(A_S)$ using $\beta$ and the bound $M$ on the loss. We will derive some lemmas that give exactly the quantities needed to apply McDiarmid’s inequality.
Since the samples are i.i.d., renaming $z_i \leftrightarrow z_i'$ leaves the joint distribution unchanged, so $\mathbf{E}_S\big[\ell(A_S, z_i)\big] = \mathbf{E}_{S, z_i'}\big[\ell(A_{S^i}, z_i')\big]$, and we have $$ \mathbf{E}_S\big[R(A_S) - R_{emp}(A_S)\big] = \frac{1}{m}\sum_{i=1}^{m} \mathbf{E}_{S, z_i'}\Big[\ell(A_S, z_i') - \ell(A_{S^i}, z_i')\Big] \leq 2\beta, $$ where the last step applies uniform stability twice: going from $S$ to $S^i$ via $S^{\setminus i}$ changes the loss by at most $\beta$ at each step.
Next, assume the loss is bounded: $0 \leq \ell(A_S, z) \leq M$ for all $S$ and $z$. Then replacing one point changes $f(S) = R(A_S) - R_{emp}(A_S)$ by at most $$ \big|f(S) - f(S^i)\big| \leq \big|R(A_S) - R(A_{S^i})\big| + \big|R_{emp}(A_S) - R_{emp}(A_{S^i})\big| \leq 2\beta + \Big(2\beta + \frac{M}{m}\Big) = 4\beta + \frac{M}{m}, $$ since the risk changes by at most $2\beta$, the $m-1$ unchanged terms of the empirical error contribute at most $2\beta$, and the replaced term is bounded by $\frac{M}{m}$. Note: this step introduces $M$ (the bound on the loss) into the equation. If your loss is unbounded, then you may still be able to bound $\big|\ell(A_S, z_i) - \ell(A_{S^i}, z_i')\big|$ directly and just replace $M$ by that bound in the generalization bound.
Applying McDiarmid’s inequality with $f(S) = R(A_S) - R_{emp}(A_S)$, $c_i = 4\beta + \frac{M}{m}$, and $\mathbf{E}_S[f] \leq 2\beta$ yields the generalization bound for uniformly stable algorithms:
$$ \mathbf{P}\Bigg( \Big(R(A_S)-R_{emp}(A_S)\Big) - 2\beta \geq \epsilon \Bigg) \leq \underbrace{e^{-\frac{2\epsilon^2}{m(4\beta+\frac{M}{m})^2}}}_{\delta} $$ $$ \implies \mathbf{P}\Bigg( \Big(R(A_S)-R_{emp}(A_S)\Big) \leq 2\beta + (4m\beta+M)\sqrt{\frac{\log \frac{1}{\delta}}{2m}} \Bigg) \geq 1-\delta $$
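To get a feel for the numbers, here is a small sketch (my own illustration) that evaluates the bound for a stability of the form $\beta = \kappa/m$, the rate one typically gets for regularized algorithms; the values of $\kappa$, $M$, and $\delta$ below are arbitrary choices:

```python
import numpy as np

def stability_bound(m, beta, M, delta):
    """With probability >= 1 - delta:
    R(A_S) - R_emp(A_S) <= 2*beta + (4*m*beta + M) * sqrt(log(1/delta) / (2*m))."""
    return 2 * beta + (4 * m * beta + M) * np.sqrt(np.log(1 / delta) / (2 * m))

M, kappa, delta = 1.0, 0.5, 0.05
for m in (100, 1_000, 10_000, 100_000):
    beta = kappa / m  # hypothetical O(1/m) uniform stability
    print(f"m = {m:>6}:  gap <= {stability_bound(m, beta, M, delta):.4f}")
```

With $\beta = O(1/m)$ the bound decays as $O\big(\sqrt{\log(1/\delta)/m}\big)$; note that if $\beta$ decays slower than $1/\sqrt{m}$, the $4m\beta$ term dominates and the bound becomes vacuous.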