[CVPR 2019 AI Talk] Gang Hua, VP & Chief Scientist @ Wormpex AI Research
Dr. Hua shared inspiring thoughts on the trends and challenges of computer vision and the future of Artificial Intelligence.
Dr. Gang Hua is the VP and Chief Scientist at Wormpex AI Research, the research branch of the fast-growing Chinese convenience store chain Bianlifeng. Prior to Wormpex, Gang was the director of Computer Vision Science at Microsoft, and an Associate Professor at Stevens Institute of Technology. He’s an IEEE Fellow, an IAPR Fellow and an ACM Distinguished Scientist. His research focuses on computer vision, pattern recognition, machine learning, robotics, and more.
Dr. Hua served as the Program Chair of the CVPR 2019 conference. This episode is a live recording of our interview with Dr. Hua at the conference. He shared inspiring thoughts on the trends and challenges of computer vision and the future of Artificial Intelligence.
Margaret Laffan: Gang, it’s such a pleasure to meet you today. We're delighted that you can take time out of your busy schedule at CVPR to join us for this conversation. So thank you for joining us.
Thanks. I think it's my great pleasure indeed. Thank you.
Margaret Laffan: Well, we're looking forward to a very active discussion here. First, I want to discuss your field with you: you're an accomplished computer vision scientist and, of course, one of the chairs here at CVPR. When we think about computer vision research, can you tell us what type of progress you've seen over the last number of years?
First, I wouldn't say I’m well accomplished. I think I'm doing fine in this community. This is a great community to be in. In terms of research progress, I would say if we look back at the last several years, one of the biggest advances is how deep learning has successfully entered computer vision and driven a lot of progress. But we should say that, as we use deep learning and leverage the power of data to solve a lot of our problems, there is one side of the research which has been neglected a little bit, and we're happy to see it coming back: really modeling the physics. By combining that with the power of statistical learning and deep learning methods, we are going to make even more progress.
Margaret Laffan: Absolutely. And we know when you come to CVPR, there are so many choices that you have in terms of the program of activities and the content and the education. How do you maintain that integrity of the prospectus so that the best academics are able to present at the conference?
Sure. So I would first say that CVPR remains one of the top research communities mainly because of the people. The quality of the research in this community is largely maintained by our well-established reviewing process. Our program chairs, our area chairs and our responsible reviewers are the ones who ensure that we have high standards for our research. The review process is fairly objective, and people are responsible and give constructive feedback. And of course there are our authors, who write all those great papers that go through this strict reviewing process; we then select the best of these high-quality papers to present at this five-day conference. I think that's how we ensure the quality of our research and the reputation of the community.
Margaret Laffan: And we’ve interviewed many of them here today and throughout the week. They feel so proud and honored to be selected and to talk about what they've learned in their research and how they've spent their time, so this really is a great place to meet and share that brain trust.
Margaret Laffan: So we might look at some other areas in terms of your career trajectory and progression today as well. You've worked in three main areas of vision understanding, facial recognition and vision creation at Microsoft. What are the breakthroughs and challenges in these areas brought by deep learning?
For the first part, vision understanding, we are really trying to understand scenes from videos. If we look at progress in this research area, I would say it has benefited from a lot of other technological components in computer vision, such as object detection and recognition, human detection, human emotion analysis, human pose estimation, all those things. Because understanding is very comprehensive, you potentially need to put several computer vision technologies together to come up with an understanding of an action or event. So that research area has benefited from advances in the fundamental components.
For facial recognition, I would say it’s the combination of data and precise models, like deep networks, that has made tremendous progress and made commercialization possible. But if we look into facial recognition, there is indeed a long history of establishing benchmarks, going back as early as the 1990s; it was actually the US government that moved things forward along that line.
In terms of content creation, I would say that has also benefited from a family of deep learning methods called deep generative models. Generative models are not new in computer vision; there has been a lot of research on generative models for content generation. But the arrival of these deep generative models, which made it so easy to fit any type of data distribution, gave a huge push in this direction. I would say we are trying to make the process of creating artistic content more accessible to users. I think there has been really, really huge progress made in these three areas in the past.
Margaret Laffan: Absolutely. And then when you think about the progress that's been made, certainly from academia, from your experience at Microsoft, what applications - either business or consumer - have you seen that excite you the most about how far we've come? And what is being demonstrated today in the real world?
Sure. So it’s a great question indeed. I would say in the computer vision field, if you look back over the past 30 years, there has been a lot of discussion about what would be the killer application of computer vision. Of course, you can think about a lot of scenarios, for example military use. Those are really killer, but I'm not excited about those directions. What really excites me is the role computer vision plays in digitizing the physical world. In the past 30 years, we were largely in the internet economy. On the internet, everything is digitized. But there is a huge gap. Consider this meeting room here: our activities, what we are doing, are not digitized. Think about a retail environment: the customers’ behaviors, how they interact with the products. Those are not digitized. So with the technology advancements, I would say that computer vision could play a central role in digitizing our environments, so that we can make better digital decisions all the way, and then use those intelligent decisions to make our lives better. So I think that's the application of computer vision that makes me feel excited.
Margaret Laffan: We’re always talking about where we've come from. And then we're always talking about the future as well, including the current situation. So regarding the future of AI, what can we learn from the latest progress in neuroscience research? Where do you see this best applied?
Yeah, that's a great question. I would say that even though deep learning is, potentially, biologically inspired, it is still far away from our human brain system. If we really try to see how research from each side can help the other, I would say people are finding a lot of evidence, and some of it is well established. Even 30 years ago, people already found that the first several layers of a convolutional net learn Gabor filters, which is verified by research on the human visual system. The first several layers of visual processing units in the human brain are actually doing Gabor filtering on our visual input.
But in terms of how neuroscience can guide AI research more synergistically, I would say we must look into it in a holistic way. For example, there are certain things that people understand more and more, such as consciousness as a concept of the human brain. Consciousness is a thinking process; potentially, it's something unique to humans and certain types of animals. Can we ever build a machine which has consciousness? That's a big question to ask, right? At this moment, I wouldn't say we have any solutions, but people are making some strides toward that: understanding how the conscious thinking process comes about, and then trying to build computers or computational models which are at least inspired by the conscious thinking process and derive some logic from it. I would say that maybe one day, along our way to artificial general intelligence, we may need to think about whether there is a chance for us to really build a machine with consciousness. Only at that point could we say that our AI systems have really caught up with human intelligence. But at this moment, I would say we have a long way to go. And I definitely hope to see the findings in neuroscience guiding more of our research in AI. At this moment, I would say the interaction is still fairly limited. I hope to see more of it.
Margaret Laffan: Again, I won't ask you for a timeline.
Yeah, the timeline is always difficult to predict. But what we are glad to see is researchers from those two communities having more conversations. I would say, maybe in 50 years, we can build a machine with consciousness. I don't know, maybe sooner. It’s super challenging.
Margaret Laffan: Let's talk about autonomous driving as well, because that's one of the most important applications of computer vision. When we think about the work that is going into that entire research and development effort, Waymo has famously said: When you're 90% there, you still have 90% to go. And yet we know that 90% takes 10x the effort. And then you think about all the edge cases that surround us. With all of this happening, and knowing that those edge cases require so much work and effort, what would be the new approaches to learning, beyond deep learning, to solve these cases?
This indeed is a very difficult question to answer. I would say I don't have an absolute answer at this stage. I remember Professor Jitendra Malik said, back in maybe CVPR 2005: In computer vision, 90% of the cases are easy to solve; they are boring questions. So researchers really should look into the problems remaining in the other 10% or 5%. Of course, today we rely heavily on machine learning to address a lot of computer vision challenges. But for this 10%, I would say we will need to approach them in a very systematic way, because we could sort them into two general categories. Some of this 10% follow common patterns: there are a lot of such corner cases, but they are just very difficult in general, involving occlusion or other types of physical phenomena. And there are, of course, really rare cases, where you just don't have a sufficient amount of data to train your system. We essentially need to draw on our knowledge to identify and make sense of those corner cases. I think it’s a kind of learning paradigm shift. There's a term for it, transfer learning: transferring knowledge from one task to another.
And those are in the right direction, but I wouldn’t say we have addressed the problem. If we look into even newer learning paradigms, humans learn from conversation in language. For example, the two of us are having a good conversation here and we can learn from each other. But with a machine, we cannot just describe something and then, all of a sudden, our machine learning model starts to know what it is. I would say we need something potentially like that to really address the remaining 10%.
Margaret Laffan: Is there a timeline?
No. This kind of thing is hard to predict. I would say in the next 20, 30 years, maybe. We are going to make progress there with the collective intelligence from this community and also other communities. Now is maybe the stage where everything comes back together.
Margaret Laffan: You bring up an important aspect of this as well, which is around human-centered approaches. When we think about that, can you explain more about active learning? How do you see that coming together, especially when you resolve the reasoning challenge for machine learning?
Sure. So I think of active learning conceptually; I'm not talking about any specific active learning approach. The concept of active learning is really that your learning machine participates in the learning process in an active way. That means it knows in what aspects it doesn't function well, so that it proactively asks for more input or figures out ways to digest more knowledge. That's the gist of active learning, and it makes learning more efficient, because the learner is constantly aware of what knowledge it lacks. So it would either proactively ask a human to provide more input or figure out a solution by itself from a huge knowledge base. I would say that is how learning really should be. Looking at the current machine learning paradigm, we’re feeding in a lot of data, and the machine learner just fits a model to whatever data you feed it. Then, running some tests, you would find that it doesn't work in some corner cases, but the model itself is not aware of whether it is making a mistake or not. Sometimes it does not even give you a good confidence measure. So being able to do active learning means that we need to build models which are aware of which parts they are not confident enough about, so that they can focus the learning on those aspects and improve.
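Editor's illustration: the idea of a learner that knows where it is unsure and asks for input there can be sketched with uncertainty sampling, one common form of active learning. This is only a minimal sketch and not a method Dr. Hua describes in the interview. The sample ids, the probabilities, and the function names `entropy` and `select_queries` are all hypothetical, and a real system would retrain the model after each round of labeling.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(predictions, budget):
    """Pick the `budget` unlabeled samples the model is least confident about.

    `predictions` maps a sample id to the model's predicted class
    probabilities. Higher entropy means the model is less sure, so those
    samples are the ones to send to a human annotator.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs on four unlabeled images (three classes each).
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction
    "img_002": [0.40, 0.35, 0.25],  # uncertain
    "img_003": [0.70, 0.20, 0.10],  # fairly confident
    "img_004": [0.34, 0.33, 0.33],  # almost uniform, most uncertain
}

to_label = select_queries(predictions, budget=2)
print(to_label)  # → ['img_004', 'img_002']
```

The sketch depends on the model producing a meaningful confidence measure, which is exactly the property the answer above says current models often lack.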
Margaret Laffan: And bring it up to that certain level? And then you go again to get the next level of maturity surrounding that. Again, a little bit of a way to go, right?
Sure. There’s still a long way to go.
Margaret Laffan: We will hold onto that and meet you in 50 years and have a coffee and have a conversation. Let’s see how far we've come.
Sure. Well, maybe earlier, if it comes earlier.
Margaret Laffan: But there's a lot in this conversation with you where we're hearing around maturity of models, maturity of computer vision, facial recognition, all these areas coming together. So it brings a curiosity question for me, given your current role. You've recently joined a retail startup Wormpex as the VP and Chief Scientist. So congratulations on your new position. We'd love to know more about who Wormpex is? What do you do? What's your business vision? And where do you see yourselves going in the retail space?
Sure, I actually put some information on my LinkedIn profile. So Wormpex AI Research is the research branch of one of the largest convenience store chains, a newly established convenience store chain in China. Its Chinese name is Bianlifeng (便利蜂). We established this research institute mainly to build technologies which can digitize the operation of the whole convenience store chain. When we look into the convenience store business, it is a really conventional business. But with today's technology, we are trying to see how we can digitize each stage of the operation process, from storefront to warehouse to manufacturing, so that we can have an end-to-end digital decision system: make digital decisions and feed those intelligent decisions back into the physical operations. That way we can be more efficient and save a lot of costs, make our whole logistics system more efficient, and drive our profit margin higher and higher. So that's essentially what my research institute is focusing on.
Of course, as a research institute, we have three missions, which I set when I established it. The first mission is that we want to be business focused. We want the business to drive the definition of which technologies we prioritize building, and we want the technology we develop to drive the business operation to be better and better. The second mission is that we want to build state-of-the-art technologies. We want to set a high bar, so that we can really establish ourselves in the technology space. And the third mission is that we want to do exploratory things to secure the future. That means we will also have a certain freedom to do a little bit of exploratory research to explore what is possible. In that way, we can take forward steps and try to be prepared for maybe the next wave of technology advancement. So I hope this explains our mission to you in a clear way.
Margaret Laffan: Absolutely, thank you. What I'm trying to visualize is as a consumer, if I was to walk into your convenience store in two years’ time, what might I expect to be my experience?
That's a great question. Indeed, I would say the best AI technology for the customer is that, when you step into our store, you see the products you like. You won't even be aware we did anything; it's just an intuitive process. Your experience should be: “Oh, I want to buy something.” I get into the store: “Okay, that thing is there.” I can easily find it and I pick it up. I can finish my shopping maybe in five minutes or two minutes with a very convenient checkout.
Margaret Laffan: And when I think about that in the US, I mean, we have Amazon Go now. That, again, is all around that ease and convenience of shopping, primarily the payments. So what you're doing is slightly different, if I hear correctly. It can be that, but it is much more around that intuitiveness: knowing your customer, having a good sense of their profile and what they're purchasing, some personalization. I'm completely intrigued. I feel we could talk about this for a long time. But at this point, we're gonna have to wrap it up and say thank you so much for your time. This was a great conversation. I hope you enjoyed it too.
Yeah, I really enjoyed it. Thank you. It was great.
Host: Thank you for your time today.