
This is part of exercise 2.13 in the book Foundations of Machine Learning (page 28). You can refer to chapter 2 for the notation.

Consider a family of concept classes $\left\{\mathcal{C}_{s}\right\}_{s}$ where $\mathcal{C}_{s}$ is the set of concepts in $\mathcal{C}$ with size at most $s$. Suppose we have a PAC-learning algorithm $\mathcal{A}$ that can be used for learning any concept class $\mathcal{C}_{s}$ when $s$ is given. Can we convert $\mathcal{A}$ into a PAC-learning algorithm $\mathcal{B}$ that does not require knowledge of $s$? This is the main objective of this problem.

To do this, we first introduce a method for testing a hypothesis $h$ with high probability. Fix $\epsilon>0$, $\delta>0$, and $i \geq 1$, and define the sample size $n$ by $n=\frac{32}{\epsilon}\left[i \log 2+\log \frac{2}{\delta}\right]$. Suppose we draw an i.i.d. sample $S$ of size $n$ according to some unknown distribution $\mathcal{D}$. We will say that a hypothesis $h$ is accepted if its empirical error on $S$ is at most $\frac{3}{4} \epsilon$, and that it is rejected otherwise. Thus, $h$ is accepted iff $\widehat{R}(h) \leq \frac{3}{4} \epsilon$.
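For reference, here are the multiplicative Chernoff bounds in the form I believe parts (a) and (b) intend (for a sample of size $n$, true error $R(h)$, empirical error $\widehat{R}(h)$, and any $\gamma \in (0,1)$; check the constants against the book's appendix):

$$\mathbb{P}_{S \sim \mathcal{D}^{n}}\left[\widehat{R}(h) \geq (1+\gamma) R(h)\right] \leq e^{-n R(h) \gamma^{2} / 3}, \qquad \mathbb{P}_{S \sim \mathcal{D}^{n}}\left[\widehat{R}(h) \leq (1-\gamma) R(h)\right] \leq e^{-n R(h) \gamma^{2} / 2}.$$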

(a) Assume that $R(h) \geq \epsilon$. Use the (multiplicative) Chernoff bound to show that in that case $\mathbb{P}_{S \sim \mathcal{D}^{n}}[h \text{ is accepted}] \leq \frac{\delta}{2^{i+1}}$.
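My derivation for (a), in case it helps: since $R(h) \geq \epsilon$, acceptance implies $\widehat{R}(h) \leq \frac{3}{4}\epsilon \leq \left(1-\frac{1}{4}\right)R(h)$, so the lower-tail Chernoff bound with $\gamma = \frac{1}{4}$ and the choice of $n$ give

$$\mathbb{P}_{S \sim \mathcal{D}^{n}}[h \text{ is accepted}] \leq e^{-n R(h)/32} \leq e^{-n \epsilon/32} = e^{-i \log 2 - \log \frac{2}{\delta}} = \frac{\delta}{2^{i+1}}.$$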

(b) Assume that $R(h) \leq \epsilon / 2$. Use the (multiplicative) Chernoff bound to show that in that case $\mathbb{P}_{S \sim \mathcal{D}^{n}}[h \text{ is rejected}] \leq \frac{\delta}{2^{i+1}}$.
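Similarly for (b), as I computed it: rejection means $\widehat{R}(h) > \frac{3}{4}\epsilon = (1+\gamma)R(h)$ with $\gamma = \frac{3\epsilon}{4 R(h)} - 1 \geq \frac{1}{2}$. The upper-tail exponent $\frac{n R(h) \gamma^{2}}{3} = \frac{n\left(\frac{3}{4}\epsilon - R(h)\right)^{2}}{3 R(h)}$ is decreasing in $R(h)$ on $\left(0, \frac{3}{4}\epsilon\right)$, so over $R(h) \leq \epsilon/2$ it is smallest at $R(h) = \epsilon/2$, where it equals $\frac{n\epsilon}{24}$. Hence

$$\mathbb{P}_{S \sim \mathcal{D}^{n}}[h \text{ is rejected}] \leq e^{-n \epsilon / 24} \leq e^{-n \epsilon / 32} = \frac{\delta}{2^{i+1}}.$$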

(c) Algorithm $\mathcal{B}$ is defined as follows: we start with $i=1$ and, at each round $i \geq 1$, we guess the parameter size $s$ to be $\widetilde{s}=\left\lfloor 2^{(i-1) / \log \frac{2}{\delta}}\right\rfloor$. We draw a sample $S$ of size $n$ (which depends on $i$) to test the hypothesis $h_{i}$ returned by $\mathcal{A}$ when it is trained with a sample of size $S_{\mathcal{A}}(\epsilon / 2,1 / 2, \widetilde{s})$, that is, the sample complexity of $\mathcal{A}$ for a required precision $\epsilon / 2$, confidence $1 / 2$, and size $\widetilde{s}$ (we ignore the size of the representation of each example here). If $h_{i}$ is accepted, the algorithm stops and returns $h_{i}$; otherwise it proceeds to the next iteration. Show that if at iteration $i$ the estimate $\widetilde{s}$ is larger than or equal to $s$, then $\mathbb{P}\left[h_{i} \text{ is accepted}\right] \geq 3 / 8$.
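To make sure I'm reading the loop in (c) correctly, here is a minimal Python sketch of algorithm $\mathcal{B}$ as I understand it. The names `learnA` and `sample` are hypothetical stand-ins (not from the book): `learnA` wraps training $\mathcal{A}$ on $S_{\mathcal{A}}(\epsilon/2, 1/2, \widetilde{s})$ examples, and `sample` draws i.i.d. labeled examples from $\mathcal{D}$.

```python
import math

def algorithm_B(learnA, sample, epsilon, delta, max_rounds=50):
    """Sketch of algorithm B from exercise 2.13(c).

    learnA(eps, conf, s_guess) -- hypothetical wrapper around A: trains it on
        S_A(eps, conf, s_guess) examples, returns a hypothesis h(x) -> label.
    sample(n) -- hypothetical sampler: n i.i.d. labeled pairs (x, y) from D.
    """
    for i in range(1, max_rounds + 1):
        # Guess the size parameter: s~ = floor(2^((i-1)/log(2/delta))).
        s_guess = math.floor(2 ** ((i - 1) / math.log(2 / delta)))
        # Run A with precision eps/2, confidence 1/2, and the guessed size.
        h = learnA(epsilon / 2, 1 / 2, s_guess)
        # Fresh test sample of size n = (32/eps) * (i*log 2 + log(2/delta)).
        n = math.ceil((32 / epsilon) * (i * math.log(2) + math.log(2 / delta)))
        S = sample(n)
        errors = sum(1 for x, y in S if h(x) != y)
        # Accept iff the empirical error is at most (3/4)*eps.
        if errors <= (3 / 4) * epsilon * n:
            return h
    return None  # gave up after max_rounds (a cap not present in the exercise)
```

The `max_rounds` cap is only there so the sketch terminates; the exercise's algorithm keeps iterating until some $h_i$ is accepted.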

Questions (a) and (b) are easy to prove, but I have trouble with question (c). More specifically, I don't know how to use the condition that $\widetilde{s} \geq s$. Can anyone help?