Max Welling @ University of Amsterdam & Qualcomm: The Future of Distributed Learning
Max Welling, professor at the University of Amsterdam and VP of Technologies at Qualcomm Netherlands, shares insights on machine learning, computing, AI, and more.
Robin.ly Exclusive Interview at NeurIPS 2019
Prof. Max Welling, research chair in machine learning at the University of Amsterdam and VP of Technologies at Qualcomm Netherlands, shared insightful perspectives on distributed machine learning, edge computing, the differences in AI research between Europe and the US, and highlights from six research papers accepted at NeurIPS 2019.
Prof. Welling has published over 300 peer-reviewed articles in machine learning, computer vision, deep learning, Bayesian inference, generative modeling, and graph convolutional networks. He received his Ph.D. in quantum physics under the supervision of Nobel laureate Gerard 't Hooft at Utrecht University. He also serves as a senior fellow at the Canadian Institute for Advanced Research.
NeurIPS 2019 Paper Highlights
Margaret Laffan: We're here with Professor Max Welling from the University of Amsterdam at NeurIPS 2019. Professor, six papers that you advised got accepted by NeurIPS this year. Is there any paper that you would like to highlight to us?
Max Welling: First, I want to say that it's like having six children and being asked which one is your favorite. With that caveat, maybe I can say there are two papers which are on a somewhat similar topic. One of them won a competition, the Fast MRI Competition, where you want to reconstruct a high-resolution MRI image from far fewer observations. This can be used, for instance, to make the time you spend in an MRI machine a lot shorter, which would cut costs.
But maybe even more exciting, there are now also MRI machines whose goal is to do real-time imaging and, at the same time, radiation therapy, let's say for cancer. If you're moving in the machine that is doing the radiation therapy, then clearly you want a system that moves with the breathing of the patient, because then you'll hit the tissue that you want to hit. Missing it is very bad, because you'll hit healthy tissue and not the tissue that you should be killing. So with that in mind, we participated in that competition and set out to accelerate MRI reconstruction by a factor of eight.
The paper we present here at NeurIPS to achieve that is called “Invertible Recurrent Inference Machine”, and it does that reconstruction using deep learning. Before, people mostly used compressed sensing, which is a different technique where you don't learn from other data. We used the deep learning approach very successfully, and that is the work now being presented here at NeurIPS.
At the same time, we have another, related paper which also implements this idea. For many tasks there are a lot of classical engineering solutions out there which work really well, but not as well as one might want, because human imagination is always a bit limited, I guess. The complexities of the real world are always bigger than what you can put in a model, but the data does contain that information. So the philosophy is to say: let's not throw away those models, which were built by humans, but use them, and just use deep learning to correct the errors in them. That's the philosophy we're implementing. And then there's another paper that looks at graphical models and figures out how to do this in that context.
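To make the "keep the human-built model, learn only its errors" idea concrete, here is a minimal sketch. It is not the method from the papers: the linear `classical_model`, the quadratic `true_process`, and the one-parameter least-squares correction (standing in for a neural network) are all hypothetical choices made for illustration.

```python
def classical_model(x):
    """Hand-built model (hypothetical): captures only the linear effect."""
    return 2.0 * x


def true_process(x):
    """Stand-in for the real world: has a quadratic term the model misses."""
    return 2.0 * x + 0.5 * x * x


def fit_correction(xs, ys):
    """Fit a correction c * x^2 to the classical model's residuals by least squares.

    In practice a neural network would play the role of this learned correction.
    """
    residuals = [y - classical_model(x) for x, y in zip(xs, ys)]
    numerator = sum(r * x * x for r, x in zip(residuals, xs))
    denominator = sum(x ** 4 for x in xs)
    return numerator / denominator


def hybrid_model(x, c):
    """Classical prediction plus the learned correction."""
    return classical_model(x) + c * x * x


def mean_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic observations of the "real world".
xs = [i / 10 for i in range(1, 21)]
ys = [true_process(x) for x in xs]

c = fit_correction(xs, ys)
classical_error = mean_squared_error(classical_model, xs, ys)
hybrid_error = mean_squared_error(lambda x: hybrid_model(x, c), xs, ys)
```

The classical model is never retrained or discarded here; only the part of the data it fails to explain is learned.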
1. Distributed Machine Learning
Margaret Laffan: What are some of the trends and challenges of machine learning that you've observed over the past 10 years? I know that's a broad time frame, but at a high level, what have you seen over the last 10 years, and how has machine learning progressed during this time?
Max Welling: Clearly, like any other field, machine learning is also subject to fashion, right? There is a five-to-ten-year cycle where people get really excited about a certain topic, either because the theory is very beautiful or because it just works really well. I started when graphical models and independent component analysis were the talk of the day, then came support vector machines, nonparametric methods, Bayesian methods, and nonparametric Bayesian methods, and then deep learning. So what you see is that the field is subject to these fashions, and I think that's fine, because we zoom in on a new, very promising tool, we work it out, and we get the most out of it. And then somebody else has to come up with a really smart idea to move the ship again a little bit.
Margaret Laffan: Your previous startup Scyfer was acquired by Qualcomm in 2017. Edge computing, of course, is a big topic. Why, in your opinion, is it such a big topic? What can we achieve with edge computing?
Max Welling: What we see is that a lot of the data is collected in a very distributed way. We have sensors in our bodies, in our cars, in our homes, on the streets. So the data is collected in a very distributed way, and we may not always want to share that data with big corporations for them to build the services that we use.
One thing that you can do with federated learning, or distributed learning, or edge computing, is to keep that data out of the cloud and keep it on your device, or in your home, or in your factory, or something like that. You don't have to share it. And then you train a model in a distributed way. Whenever I do have to share something, say I want to send some summary to the cloud or to some other device, I add noise to it or do something else so that it becomes privacy-safe. So for privacy reasons, it is good to do computation close by and not do everything in the cloud.
There are also reasons like latency. If I'm driving a car and I need to respond immediately to something that's happening on the road, let's say a kid walks onto the road, then I don't want to rely on a connection that could be bad for the cloud to compute for me: this is a kid, you should brake. So just for safety reasons and to minimize the latency, you want to do it on the device itself.
Those are the important reasons, I think, why you want to do a lot of the computation and storage of your data close by: reliability, latency, and privacy.
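The pattern described above (raw data stays on the device, and only a noised summary is shared) can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any production system: the function names, the Gaussian noise, and the `noise_scale` value are hypothetical, and the noise is not calibrated to any formal privacy guarantee.

```python
import random


def local_summary(data, noise_scale, rng):
    """Runs on the device: compute a summary (here, the mean) and add noise.

    The raw `data` never leaves this function; only the noised value is returned.
    """
    local_mean = sum(data) / len(data)
    return local_mean + rng.gauss(0.0, noise_scale)


def aggregate(summaries):
    """Runs on the server: it only ever sees the noised summaries."""
    return sum(summaries) / len(summaries)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# 200 hypothetical devices, each holding 50 private measurements around 5.0.
devices = [[rng.gauss(5.0, 1.0) for _ in range(50)] for _ in range(200)]

summaries = [local_summary(data, noise_scale=0.5, rng=rng) for data in devices]
estimate = aggregate(summaries)
```

Each individual summary is noticeably perturbed, but because the noise is independent across devices it largely averages out in the aggregate, so the server can still obtain a useful estimate.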
Margaret Laffan: What do you see in terms of the cost of doing that?
Max Welling: I think it will probably scale well. I'm more worried about the cost of running very large neural networks in the cloud, because there's this funny phenomenon that bigger is better with a neural network. The more compute we throw at it and the bigger we make our models, somehow the better they perform. We don't know precisely why that is, but we do know that they will use more and more energy to do the computations for us.
And at some point, that's just not a viable economic model anymore. I already hear companies say, we might saturate there; we might just not be able to use more complex models, because the revenue we are getting is less than the investment in energy that we have to make. So there's a big move towards making everything more energy-efficient: shrinking the models, quantizing them, running them at low precision, and so on, so that they can run on your device, and cheaply. I have my phone anyway, and I plug it into the outlet at night. It's doing nothing there, it is just sitting there. Why not do a nice computation and warm your room? You're just turning electricity into heat, and at the same time doing useful computation.
So I think there are very interesting synergies you can think of. What if we build a bunch of GPUs and you hang them in your home? Then, if you want to do a computation, you're part of a network: you turn a knob and say, I want my home heated to this temperature, and you just allow computations to happen on that device. You nicely heat your room, which you need to do anyway, but at the same time you're participating in a large computation. I think that's a much more scalable model.
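As a rough illustration of what "quantizing them and running them at low precision" can mean, the sketch below applies symmetric uniform 8-bit quantization to a handful of weights. It is a generic textbook scheme chosen for illustration, not necessarily what any particular company uses; the weight values and function names are made up.

```python
def quantize(weights, num_bits=8):
    """Map float weights to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.31, -1.20, 0.05, 0.87, -0.44]  # hypothetical layer weights
quantized, scale = quantize(weights)
restored = dequantize(quantized, scale)

# Rounding moves each weight by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing one small integer per weight instead of a 32-bit float cuts memory and allows cheaper integer arithmetic, at the price of a bounded rounding error per weight.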
2. From Research To Commercialization
Margaret Laffan: Many leaders we spoke with talked about the impact of machine learning on climate change, and I know that there are going to be some workshops on that later this week here at NeurIPS. You've served on the board of the NeurIPS Foundation since 2015. How have you seen it evolve in the last four years?
Max Welling: First of all, there's exceptional growth. But there's something else very interesting, which basically happened last year: a very strong move to be more inclusive and more diverse. Last year we had a name change. The name was “NIPS”, it wasn't felt to be appropriate anymore, and we turned it into “NeurIPS”.
And you can see the number of things that we have done this year to make the conference more inclusive. Lots of minority groups have their own sub-conference, like “Black in AI”, “Women in AI”, “LatinX in AI” and “Queer in AI”, which I think is really good. And right here, we have child care and all these things. So I think we have evolved a lot over the last year to make inclusion a really important theme for NeurIPS.
Margaret Laffan: Of course, NeurIPS represents the best AI research in the world. Where do you see the gap between research and commercialization right now?
Max Welling: The first thing is that the gap is closing in some interesting ways. I also work for Qualcomm, so half of my time is spent at Qualcomm. And of course, there I can observe first-hand how papers developed in academia end up, within a day through arXiv, on the desks of researchers at a company, who implement them and run with them if they work. It is an extremely efficient mechanism that takes results from academia and moves them into corporate research.
But the opposite direction is also happening: the companies are hiring a lot of talent from the talent pool, and they're building their own research labs and contributing to the ecosystem. They're contributing papers and results and open-source software like TensorFlow and PyTorch. But they're also giving back by helping to organize conferences; many of the program chairs and board members work at companies. So in some sense, they're also donating time in that way.
Margaret Laffan: You have conducted AI research in both Europe and North America. Can you share some of the differences and similarities that you've experienced between these two regions?
Max Welling: Yeah, it's interesting. I think there's an Anglo-Saxon model, and then there is a Continental Europe model. I think the UK is much more on the Anglo-Saxon model. For the Anglo-Saxon model, what happens is that, as an assistant professor, you start your own research group, so you're basically completely independent. You're on your own, and you grow that group. You also have to, of course, get your own funding in. I think that's a really good model because it gives freedom to the new researchers that are coming.
In Continental Europe, you often see a hierarchical model: a very large group, with full professors, associate professors and assistant professors in some kind of pyramid structure. The advantage of that, in principle, is that the group can be more coherent and tackle a really large problem together. The downside, I think, is that it diminishes the freedom of the new researchers. As a senior researcher, it is really important to listen to young researchers, because they bring new ideas and fresh directions. If they get forced to work on the topic that the full professor likes to work on, then I think that process is inhibited. So I mostly like the Anglo-Saxon model in that respect.
3. 2020 Trends in AI & Deep Learning
Margaret Laffan: The final question for you, Professor Welling: What do you expect the next major trends in AI and deep learning to be in 2020?
Max Welling: One obvious one, I guess, is reinforcement learning; we see it a lot. You just have to look at the numbers to see that reinforcement learning is on the rise as a topic. But what is perhaps less obvious is that people are trying to build reinforcement learning algorithms that interact with the world and also generalize well if you move them into a new environment, that is, a new situation and context. That's what we think of when we say “artificial general intelligence”. If you train something on one specific task and it does that very well, but it completely fails when you move it into a new context, that's narrow AI. Humans are clearly much more flexible: we learn something in one context, and when we get put into a new context that we've never seen before, we can still do very well. So we also want our artificial agents to have this property.
One clear direction that I see, though I think it's not very much on the radar, is that research in causality is being used to achieve that. The idea is to try to figure out what the true physics of the world is, what causes what. If you have this causal structure of the world, you understand much more about the actual world, and if you then move to a new context, you can generalize a lot better in that new context. So I see there's a lot of movement in that domain.
I think we will also see a continued push to make deep learning and machine learning more energy-efficient; that trend will continue. I think it will be very important because Moore's law is hitting a ceiling, and somehow we have to innovate in order to keep the growth going. So we are building chips which are extremely specialized to one particular task. And then, of course, all these specialized chips need to work together. So there's a whole interesting set of challenges that has to be tackled.
Margaret Laffan: A lot to look forward to.
Max Welling: Yes, absolutely. The work is not finished.
Margaret Laffan: Professor Welling, thank you so much for joining us today.
Max Welling: Thank you. It was very interesting.