One of the main challenges for artificial intelligence remains unsupervised learning, at which humans are much better than machines, and which we link to another challenge: bringing deep learning to higher-level cognition.
Yoshua Bengio from the University of Montreal is immediately recognizable as one of the key figures in the deep learning revolution that's taken place in the last five years. And so for those of you who think he's new to the field, that he just jumped in this century, he's been at it for more than 30 years. One of his earliest papers, "Data Driven Execution of Multi-Layer Networks for Automatic Speech Recognition," was published in 1988.
And many of the important advances that we see today in speech, vision, text and images, and machine translation are directly attributable to Yoshua's work and that of his students.
And that recognition is evident in many ways. If you start with something like citations, in the last year alone his work has been cited more than 33,000 times. That's a career's worth of citations for dozens of people, and just a one-year sample of Yoshua's influence.
The other way in which his work is felt is in the long stream of algorithmic innovations he's produced. Most recently, these have come in unsupervised learning, notably the work on generative adversarial networks, and in attention mechanisms, such as the gating used in machine translation, which really opens up whole other doors to working with a variety of other data structures.
And then perhaps the form of influence that's a little more hidden is his tremendous work in education and in supporting the community.