[CVPR 2019 AI Talk] Gang Hua, VP & Chief Scientist @ Wormpex AI Research

Dr. Hua shared inspiring thoughts on the trends and challenges of computer vision and the future of Artificial Intelligence.

October 21, 2020
Tech Leaders

Dr. Gang Hua is the VP and Chief Scientist at Wormpex AI Research, the research branch of Bianlifeng, a fast-growing Chinese convenience store chain. Prior to Wormpex, Gang was the Director of Computer Vision Science at Microsoft and an Associate Professor at Stevens Institute of Technology. He is an IEEE Fellow, an IAPR Fellow, and an ACM Distinguished Scientist. His research focuses on computer vision, pattern recognition, machine learning, robotics, and more.

Dr. Hua served as the Program Chair of the CVPR 2019 conference. This episode is a live recording of our interview with Dr. Hua at the conference. He shared inspiring thoughts on the trends and challenges of computer vision and the future of Artificial Intelligence.

Full interview:

[CVPR 2019 AI Talk] Gang Hua, VP & Chief Scientist @ Wormpex AI Research

Highlight 1: Killer Applications of Computer Vision

Highlight 2: How can AI research benefit from progress in neuroscience?

Full Transcripts

Margaret Laffan: Gang, it's such a pleasure to meet you today. We're delighted that you can take time out of your busy schedule at CVPR to join us for this conversation. So thank you for joining us.

Gang Hua:

Thanks. I think it's my great pleasure indeed. Thank you.

Margaret Laffan: Well, we're looking forward to a very active discussion here. So you know, I want to discuss first with you: you're an accomplished computer vision scientist, and of course, you're one of the chairs here at CVPR. When we think about computer vision research, can you tell us what type of progress you've seen over the last number of years?

Gang Hua:

First, I wouldn't say I'm all that accomplished. I think I'm doing fine in this community, and it's a great community to be in. In terms of research progress, I would say that if we look back at the last several years, one of the biggest advances is how deep learning has successfully made its way into computer vision and driven a lot of progress. But we should also say that, as we use deep learning and leverage the power of data to solve a lot of our problems, one side of the research has been neglected a little bit, and we're happy to see it coming back: really modeling the physics. By combining that with the power of statistical learning and deep learning methods, we are going to make even more progress.

Margaret Laffan: Absolutely. And we know when you come to CVPR, there are so many choices that you have in terms of the program of activities, the content, and the education. How do you maintain the integrity of the program so that the best academics are able to present at the conference?

Gang Hua:

Sure. I would first say that CVPR remains one of the top research communities mainly because of the people. The quality of the research in this community is largely maintained by our well-established reviewing process. Our program chairs, our area chairs, and our responsible reviewers are the ones who ensure that we hold our research to high standards. The review process is fairly objective; people are responsible and give constructive feedback. And of course, our authors wrote all those great papers that ran through this strict reviewing process, and then we select the best of these high-quality papers to present at this five-day conference. I think that's how we ensure the quality of our research and the reputation of the community.

Margaret Laffan: And we've interviewed many of them here today and throughout the week, and they feel so proud and honored to be selected, to talk about what they've learned in their research and how they've spent their time. So this really is a great place to meet, for that brain trust and that sharing as well.

Margaret Laffan: So we might look at some other areas in terms of your career trajectory and progression today as well. You've worked in three main areas at Microsoft: vision understanding, facial recognition, and visual content creation. What are the breakthroughs and challenges in these areas brought by deep learning?

Gang Hua:

I would say, for the first part, vision understanding, we were really working on trying to understand scenes from videos. If we look into progress in this research area, it has benefited from a lot of other technological components in computer vision, such as object detection and recognition, human detection, human emotion analysis, human pose estimation, all those things. Because scene understanding is very comprehensive, you potentially need to put several computer vision technologies together to come up with an understanding of an action or event. So that research area has benefited from advancements in fundamental components.

For facial recognition, I would say it's the combination of data and powerful models, deep learning and deep networks, that made tremendous progress and made commercialization possible. But if we look into facial recognition, there is indeed a long history of establishing benchmarks, going back as early as the 1990s; it was actually the US government that moved things forward along that line.

In terms of content creation, I would say that has also benefited from a family of deep learning methods called deep generative models. Generative models are not new in computer vision; there is a lot of research work around generative models for content generation. But the arrival of these deep generative models made it so easy to fit almost any type of data distribution, which gave this direction a huge push. I would say we are trying to make the process of creating artistic content more accessible to users. There has really been huge progress made in these three areas in the past few years.

Margaret Laffan: Absolutely. And then when you think about the progress that's been made, certainly from academia and from your experience at Microsoft, what applications, either business or consumer, have you seen that excite you the most about how far we've come? And what is being demonstrated today in the real world?

Gang Hua:

Sure, it's a great question indeed. I would say in the computer vision field, if you look into the past 30 years, there have been a lot of discussions about what would be the killer application of computer vision. Of course, you can think about a lot of scenarios, for example, in military usage; those are literally killer applications, but I'm not excited about those directions. What really makes me excited is computer vision playing a role in digitizing the physical world. In the past 30 years, we were largely in the internet economy: you're on the internet, everything's digitized. But a huge part of the world isn't. Consider this meeting room here: our activities, what we are doing, are not digitized. Think about a retail environment: the customers' behaviors, how they interact with the products. Those are not digitized. So with the technology advancements, I would say that computer vision could play a central role in digitizing our environments, so that we can make better digital decisions along the way, and then use those intelligent decisions to make our lives better. So I think that's what makes me excited about the applications of computer vision.

Margaret Laffan: We're always talking about where we've come from, and we're always talking about the future as well, including the current situation. So regarding the future of AI, what can we learn from the latest progress in neuroscience research? Where do you see this best applied?

Gang Hua:

Yeah, that's a great question. I would say, even though deep learning is potentially biologically inspired, it is still far away from the human brain. If we really try to see how research on each side can help the other, I would say people have found a lot of evidence that is well established. Even 30 years ago, people had already found that the first several layers of a convolutional net learn Gabor filters, which is really verified by research on the human visual system. Actually, the first several layers of visual processing units in our human brain are effectively applying Gabor filters to what we see.
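(Editor's note: for readers unfamiliar with the Gabor filters Dr. Hua mentions, here is a minimal NumPy sketch, not from the interview, of one such filter: a Gaussian envelope modulating an oriented sinusoid, the kind of edge and texture detector that emerges in the first layers of a trained convolutional net. Parameter names are illustrative.)

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, psi=0.0):
    """Real part of a 2-D Gabor filter: a Gaussian envelope
    modulating a sinusoidal carrier at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier oscillates along theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)
    return envelope * carrier

# A small filter bank at several orientations, analogous to the
# oriented receptive fields found in early visual cortex.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving an image with such a bank responds strongly to edges and textures at the matching orientation and scale, which is what the learned first-layer CNN filters end up resembling.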

But in terms of how neuroscience can guide AI research more synergistically, I would say that must come from looking at it in a holistic way. For example, there are certain things that people understand more and more, such as consciousness as a concept of the human brain. Consciousness is a thinking process; potentially, it's something unique to humans and certain types of animals. Can we ever build a machine which has consciousness? That's a big question to ask, right? At this moment, I wouldn't say we have any solutions, but people are making some strides: understanding how the conscious thinking process arises, and then trying to build computers or computational models which are at least inspired by the conscious thinking process, and deriving some logic out of those. I would say potentially, maybe one day, along our way to artificial general intelligence, we may need to think about whether there is a chance for us to really build a machine with consciousness. Only at that point could we say that our AI systems have really caught up with human intelligence. But at this moment, I would say we have a long way to go. I definitely hope to see findings in neuroscience guiding more of our research in AI, but right now the interaction is still fairly limited. I hope to see more of it.

Margaret Laffan: Again, I won't ask you for a timeline.

Gang Hua:

Yeah, the timeline is always difficult to predict. But what we are glad to see is researchers from those two communities having more conversations. I would say, maybe in 50 years, we can build a machine with consciousness. I don't know, maybe sooner. It's super challenging.

Margaret Laffan: Let's talk about autonomous driving as well, because that's one of the most important applications of computer vision. When we think about all the work that goes into that research and development, Waymo has famously said: when you're 90% there, you still have 90% to go. And we know that last stretch takes 10x the effort. Then you think about all the edge cases that surround us. With all of this happening, and knowing that those edge cases require so much work and effort, what would be the new approaches to learning, beyond deep learning, to solve these cases?

Gang  Hua:

This is indeed a very difficult question to answer. I would say I don't have an absolute answer at this point. I remember Professor Jitendra Malik said, back in maybe CVPR 2005: in computer vision, 90% of the cases are easy to solve; they are boring questions. So researchers really should look into the problems remaining in the last 10% or 5%. Of course, today we rely heavily on machine learning to address a lot of computer vision challenges. But for this 10%, I would say we will need to approach it in a very systematic way, because we can divide it into two general categories. Some of this 10% follows common patterns: they are corner cases, but just generally difficult ones, with occlusion or other types of physical phenomena. And then there are the really rare cases, where you just don't have a sufficient amount of data to train your system. For those, we essentially need to use our knowledge, digest from our knowledge, to identify and make sense of those corner cases. I think that's a kind of learning paradigm shift; there is work along this line, transfer learning, transferring knowledge from one task to another.

And those are in the right direction. But I wouldn't say we have solved the problem. If we look into even newer learning paradigms: humans learn from language conversations. For example, the two of us are having a good conversation here, and we can learn from each other. But for a machine, we cannot really describe something and then, all of a sudden, have the machine learning model start to know what it is. I would say we need something potentially like that to really address the remaining 10%.

Margaret Laffan: Is there a timeline?

Gang Hua:

No. This kind of thing is hard to predict. I would say in the next 20 or 30 years, maybe. We are going to make progress there with the collective intelligence from this community and other communities as well. It's now a stage where maybe everything comes back together.

Margaret Laffan: You bring up an important aspect of this as well, which is around human-centered approaches. When we think about that, can you explain more about active learning? How do you see that coming together, especially in resolving the reasoning challenge for machine learning?

Gang Hua:

Sure. I'm speaking of active learning conceptually; I'm not talking about any specific active learning approach. The concept of active learning is really that your learning machine participates in the learning process in an active way. That means it knows in what aspects it doesn't function well, so it proactively asks for more input or figures out ways to digest more knowledge. That's the gist of active learning, and it makes learning more efficient, because the learner is constantly aware of what knowledge is lacking. So it will either proactively ask a human to provide more input, or figure out a solution by itself from a huge knowledge base. I would say that is how learning really should be. Looking at the current machine learning paradigm, we feed in a lot of data, and the machine learner just stuffs whatever data you feed it into a model. Then, running some tests, you would find that in some corner cases it doesn't work, but the model itself is not aware; it doesn't know whether it is making a mistake or not. Sometimes it does not even give you a good confidence measure. So being able to do active learning means that we need to build models which are aware of where they are not confident enough, so that they can focus the learning on those aspects and improve.
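(Editor's note: the "ask where you're not confident" idea Dr. Hua describes is commonly realized as uncertainty sampling. Below is a minimal NumPy sketch, an illustration rather than any specific system he refers to, where the learner ranks unlabeled examples by its own predictive confidence and queries a human about the least confident ones.)

```python
import numpy as np

def least_confident_query(probs, k=5):
    """Uncertainty sampling: given predicted class probabilities for a
    pool of unlabeled examples (shape [n_samples, n_classes]), return
    the indices of the k examples the model is least confident about,
    i.e. the ones an active learner would send to a human for labels."""
    confidence = probs.max(axis=1)      # top-class probability per example
    return np.argsort(confidence)[:k]   # lowest-confidence first

# Toy pool: the model is sure about some examples, unsure about others.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident
    [0.40, 0.35, 0.25],   # uncertain -> should be queried
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],   # most uncertain
])
print(least_confident_query(probs, k=2))   # -> [3 1]
```

In a full loop, the queried examples are labeled, added to the training set, the model is retrained, and the process repeats, concentrating labeling effort exactly where the model admits it is weakest.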

Margaret Laffan: And bring it up to that certain level? And then you go again to get the next level of maturity surrounding that. Again, a little bit of a way to go, right?

Gang Hua:

Sure. There’s still a long way to go.

Margaret Laffan: We will hold you to that and meet you in 50 years, have a coffee and a conversation. Let's see how far we've come.

Gang Hua:

Sure. Well, maybe earlier, if it comes  earlier.

Margaret Laffan: There's a lot in this conversation where we're hearing about maturity of models, maturity of computer vision, facial recognition, all these areas coming together. So it brings up a curiosity question for me, given your current role. You've recently joined the retail startup Wormpex as VP and Chief Scientist, so congratulations on your new position. We'd love to know more: who is Wormpex? What do you do? What's your business vision? And where do you see yourselves going in the retail space?

Gang Hua:

Sure, I actually put some information in my LinkedIn profile. Wormpex AI Research is the research branch of one of the largest newly established convenience store chains in China. Its Chinese name is Bianlifeng (便利蜂). We established this research institute mainly to build technologies which can digitize the operation of the whole convenience store chain. If we look at the convenience store business, it is a really conventional business. But with today's technology, we are trying to see how we can digitize each stage of the operation process, from storefront to warehouse to manufacturing, so that we can have an end-to-end digital decision system: make digital decisions and feed those intelligent decisions back into the physical operations. That way we can be more efficient and save a lot of costs, drive our whole logistics system to be more efficient, and drive our profit margin higher and higher. That's essentially what my research institute is focusing on.

Of course, as a research institute, we have three missions; when I established it, I set up three missions. The first is that we want to be business focused. We want business to drive the definition of what technology we prioritize building, but we also want the technology we develop to drive the business operation to be better and better. The second mission is that we want to build state-of-the-art technologies. We want to set a high bar, so that we can really establish ourselves in the technology space. And the third mission is that we want to do exploratory things to secure the future. That means we will also have a certain freedom to do a little bit of exploratory research, to explore what is possible. In that way, we can take steps forward and try to be prepared for maybe the next wave of technology advancement. I hope this explains our mission clearly.

Margaret Laffan: Absolutely, thank you. What I'm trying to visualize is, as a consumer, if I were to walk into your convenience store in two years' time, what might I expect my experience to be?

Gang Hua:

That's a great question indeed. I would say the best AI technology for the customer is that, when you step into our store, you will see the products you like. You won't even be aware we did anything; it's just an intuitive process. Your experience should simply be: "Oh, I want to buy something." I get into the store: "Okay, that thing is there." I can easily find it and pick it up. I can finish my shopping in maybe five minutes, or two minutes, with a very convenient checkout.

Margaret Laffan: And when I think about that in the US, I mean, we have Amazon Go now. That, again, is all around that ease and convenience of shopping, primarily the payments. So what you're doing is slightly different, if I hear correctly. It can be that, but it's much more around that intuitiveness: knowing your customer, having a good sense of their profile and what they're purchasing, some personalization. I'm completely intrigued. I feel we could talk about this for a long time. But at this point, we're gonna have to wrap it up and say thank you so much for your time. This was a great conversation. I hope you enjoyed it too.

Gang Hua:

Yeah, I really enjoyed it. Thank you. It was great.

Host: Thank you for your time today.

For more videos on CVPR 2019, check out our collection.

Robin.ly CVPR 2019 talks - Crossminds.ai


Robin.ly is a content platform dedicated to helping engineers and researchers develop leadership, entrepreneurship, and AI insights to scale their impacts in the new tech era.