# [NeurIPS 2019 Outstanding New Directions Paper Award] Nagarajan & Kolter on Uniform Convergence

This blog features NeurIPS 2019 Outstanding New Directions Paper Award winners Vaishnavh Nagarajan and J. Zico Kolter. They explain their negative results showing that many existing bounds on the performance of deep learning algorithms don’t do what they claim.

October 21, 2020
Category: Research Spotlights

This episode features NeurIPS 2019 Outstanding New Directions Paper Award winners, Vaishnavh Nagarajan and J. Zico Kolter from Carnegie Mellon University, co-authors of “Uniform convergence may be unable to explain generalization in deep learning”. They shared surprising results from their recent theoretical analysis and what it implies for deep learning research.

This award was introduced this year by the Outstanding Paper Committee to "highlight work that distinguished itself in setting a novel avenue for future research".

As the committee states: "The paper presents what are essentially negative results showing that many existing (norm-based) bounds on the performance of deep learning algorithms don’t do what they claim. They go on to argue that they can’t do what they claim when they continue to lean on the machinery of two-sided uniform convergence. While the paper does not solve (nor pretend to solve) the question of generalization in deep neural nets, it is an instance of 'the fingerpost' (to use Francis Bacon’s phrase) pointing the community to look in a different place."

Vaishnavh Nagarajan is a fifth-year Ph.D. student working on theoretically understanding generalization in deep learning, in both supervised and unsupervised learning setups. Zico Kolter is an associate professor at CMU and also serves as Chief Scientist of AI Research for the Bosch Center for AI in Pittsburgh.

### Paper At A Glance

Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds: in practice, these bounds can *increase* with the training dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot explain generalization, even if we take into account the implicit bias of GD *to the fullest extent possible*. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small ε in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than 1−ε. Through these findings, we cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
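The abstract's central claim can be sketched in standard learning-theory notation (the symbols below are mine, not the paper's: $S$ is a training set of size $m$, $L_{\mathcal{D}}$ and $L_S$ are test and training error, and $\mathcal{H}_{\mathrm{GD}}$ is the set of classifiers that GD actually outputs):

```latex
% Two-sided uniform convergence asks that the worst-case gap between
% test and training error over a hypothesis class be small:
\sup_{h \in \mathcal{H}} \bigl| L_{\mathcal{D}}(h) - L_S(h) \bigr|
  \le \epsilon_{\mathrm{unif}}(m, \delta).

% The paper's negative result: even after shrinking the class to
% \mathcal{H}_{\mathrm{GD}}, whose members all have test error < \epsilon,
% the best guarantee this technique can give remains vacuous:
\epsilon_{\mathrm{unif}}(m, \delta) \ge 1 - \epsilon.
```

The force of the result is in the second line: shrinking the hypothesis class is the usual way to tighten a uniform convergence bound, and the paper shows that even the smallest class consistent with GD's output does not help.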

### Full Transcripts

Robin.ly Host - Margaret Laffan

I'm here at NeurIPS 2019 with Vaishnavh Nagarajan and Zico Kolter. Vaishnavh, you won the Outstanding New Directions Paper Award and your paper is titled "Uniform convergence may be unable to explain generalization in deep learning". Can you please tell us a bit more about this?

Vaishnavh Nagarajan

Yeah, sure. In this paper, we studied one of the biggest open challenges in deep learning theory, which is called the "generalization puzzle". A lot of deep network models have many more parameters than training data points, and standard intuition from classical learning theory suggests that these sorts of models should not perform very well on unseen data. However, we've observed that in practice, these models achieve state-of-the-art generalization performance on test data. Understanding this counterintuitive behavior is the generalization puzzle. A lot of theoretical work has tried to understand this puzzle using a particular tool called "uniform convergence"; that's the phrase in the title. However, even though there's been a lot of work, we still haven't struck on the exact answer to this puzzle. In this work, we take a step back and say that this tool of uniform convergence may not really help us crack the puzzle. So the high-level message is that we should perhaps try to use other kinds of mathematical tools, not just uniform convergence.
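The puzzle Nagarajan describes can be reproduced in a toy numerical sketch (mine, not from the paper): a linear model with ten times more parameters than training points fits its noisy training data exactly, yet still predicts well on fresh data. Gradient descent from zero initialization on squared loss converges to the minimum-norm interpolating solution, which `np.linalg.pinv` computes directly; the spiked feature scaling below is an assumption chosen to make the benign behavior visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized setup: 500 parameters, only 50 training points.
n, d = 50, 500
w_true = np.zeros(d)
w_true[:5] = 1.0                      # only 5 informative coordinates

X_train = rng.normal(size=(n, d))
X_train[:, :5] *= 10.0                # informative features have high variance
y_train = X_train @ w_true + 0.1 * rng.normal(size=n)

# Gradient descent from zero init on squared loss converges to the
# minimum-norm solution that interpolates the training data.
w_hat = np.linalg.pinv(X_train) @ y_train

X_test = rng.normal(size=(1000, d))
X_test[:, :5] *= 10.0
y_test = X_test @ w_true

train_mse = np.mean((X_train @ w_hat - y_train) ** 2)
test_mse = np.mean((X_test @ w_hat - y_test) ** 2)
print(f"train MSE: {train_mse:.2e}")  # essentially zero: the model interpolates
print(f"test MSE:  {test_mse:.2f}  (variance of y_test: {np.var(y_test):.0f})")
```

Despite fitting the training noise perfectly, the test error stays far below the variance of the targets, whereas classical parameter-counting intuition would predict severe overfitting.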

Robin.ly Host - Margaret Laffan

When you think about the other mathematical tools, what comes to mind?

Vaishnavh Nagarajan

There are tools such as algorithmic stability, for example. But it's still not clear if those will work, either. I personally think we might have to come up with a completely different tool from scratch, perhaps using some of the counterexamples that we provide in our paper. We provide some examples where this tool, uniform convergence, fails. So I believe we could work from those examples and come up with a completely different tool, or maybe even augment some existing tools with more clever modifications.

Robin.ly Host - Margaret Laffan

Zico, from your perspective, why does this deserve the outstanding paper award? What's the biggest contribution of this research?

J Zico Kolter

One thing that I'm really excited about in this research is that it's actually fundamentally a negative result. We're showing that something doesn't work, which is great to see in AI, because right now AI is at a time where everything seems to work. Everything is working amazingly well. But, as I was saying, we still don't really understand why these things work at a fundamental mathematical level. So a paper that comes along and says that what we thought was maybe the key to figuring this out is really not going to give us the full answer is really exciting. It's really exciting to have these fundamentally negative results because ultimately, negative results drive the community forward. Knowing which things work and which things don't plants a signpost that can direct the community in a different direction. So what I'm most excited about in this paper is the possibility of changing directions and changing how we think about this problem in a way that hopefully will change our perspective on how we understand deep learning.

Robin.ly Host - Margaret Laffan

In terms of the research that you did there, are there any limitations to your current research? We're also curious about your future research.

Vaishnavh Nagarajan

Sure. There are two limitations that I can highlight. One is that we show this negative result in certain settings, so you cannot immediately say that uniform convergence would fail in all settings. But the hope is that the intuition suggests this tool may not work in other, more general settings either. The other limitation is that we don't actually solve this puzzle or provide alternatives; we provide a negative result about an existing tool. So in the future, it would be great to use the intuition from our paper to actually develop new learning-theoretic tools to explain the puzzle. For future work, I've been working on trying to go beyond this tool of uniform convergence to understand the generalization puzzle.

Robin.ly Host - Margaret Laffan

Another curiosity question here: Where would you typically see examples of business applications of generalization?

Vaishnavh Nagarajan

The idea of generalization is one of the most fundamental objectives of doing machine learning or deep learning. You train a model on a training data set, and you expect it to perform well on unseen data. That's the main goal, and we've seen that deep learning is able to manage this somehow. In order to build better and better deep learning algorithms, we need to understand why they really do as well as they do when it comes to generalization. So that's where I think this line of work, including our work, will be important. It will add to our fundamental understanding of how these systems work and help us improve them.

Robin.ly Host - Margaret Laffan

Zico, from the advisor role that you play at Carnegie Mellon, what are you seeing from your students in terms of the progression maturity of the research?

J Zico Kolter

Sorry, can you elaborate on that a little more? In terms of maturity of the research?

Robin.ly Host - Margaret Laffan

How are you seeing the research mature from your students?

J Zico Kolter

I think that one of the most exciting things about deep learning - and I hope this answers your question - is that there is this wide spectrum, from very theoretical work, which frankly has been lacking in deep learning and which this work and others in this field are building up, to very, very applied work. These things are all really intertwined in machine learning, and really have been for a very long time. It's been true of machine learning very generally that the theoretical tools we develop can inform applications, and the whole thing can come together in a really amazing way that drives the field forward. What I'm excited about in having a group working on these topics is that some students can focus on much more theoretical questions and analyze the mathematical principles behind deep learning, whereas some can focus on much more applied questions: how can we apply these techniques and AI to problems like smart energy systems? I have a student working on seeing if you could apply this to nuclear fusion, of all things. There are really unlimited ideas and concepts here. And in my role at the Bosch Center for AI, we're very actively looking into applications of AI, to make AI more robust and deployable in the real world. So what I'm excited about in terms of seeing this field mature is seeing the more theoretical aspects inform, and ultimately lead to, more practical applications of the work, as well as seeing those successes feed back and inspire new theory.

Robin.ly Host - Margaret Laffan

Vaishnavh, what got you excited about AI, deep learning, machine learning?

Vaishnavh Nagarajan

My initial experience with research was in classical learning theory and the fundamentals of machine learning theory, but at the same time, deep learning was growing more and more popular. What really attracted me towards deep learning was the vacuum in the theory at that point, and at the same time, deep learning has had so much impact. So, if I were to work on the theoretical aspects of deep learning, it would not only cater to my passion towards theory, but also it would help me have impact on the field.

Robin.ly Host - Margaret Laffan

Zico, my final question for you is, from a research perspective again, where do you see this evolving in the next number of years?

J Zico Kolter

I think it's always hard to make predictions about the future of AI. I was entrenched in the field of machine learning when the deep learning revolution happened, and I never would have dreamed that we would have reached this level that we have right now. So I want to preface all predictions by saying that I don't have the most established track record here.

Having said that, if I were to hazard a guess as to where the field is moving, I think we're moving to a point where the tools and the lessons that we're learning from deep learning are more and more applicable, and can be applied to more and more domains that are more structured and more influenced by classical programming. So we can stop thinking about deep learning as this magic box of linear operators mixed with nonlinear operators, iterated a number of times, and start thinking about machine learning more as: if these were general programs that just have unknown parameters, how can we learn those parameters? How can we integrate structured programming with machine learning to achieve the best of both worlds when it comes to the structure and interpretability of classical programming, and the flexibility and end-to-end data-driven nature of machine learning? I think these two things really can come together in future work in machine learning. That's what I'm most excited about pushing forward.

Robin.ly Host - Margaret Laffan

For you, Vaishnavh, how has the response been to your paper from your peer group?

Vaishnavh Nagarajan

I've had a lot of exciting discussions with people who have read this paper, some who had certain disagreements and some who were quite surprised by what we showed. Many of us, including myself, have been working with these uniform convergence-based tools to understand generalization; I have published papers using those tools. But now that we have a negative result, it can be a bit confusing. On one hand, we thought this technique would help us understand generalization, but now there seems to be a little bit of suspicion. So I've had a lot of exciting discussions. A lot of people have also appreciated the fact that this gives a high-level idea of where we could head from here. So yeah, I've had a great experience. At the poster session, a lot of people who don't really work in theory came by and tried to understand what's going on, and it was awesome to explain these theoretical ideas to them.

Robin.ly Host - Margaret Laffan

Congratulations again for the significant award. Thank you both for joining us today.

Vaishnavh Nagarajan

Thanks a lot.

J Zico Kolter

Thank you very much.