Self-taught AI Demonstrates Similarities to Brain Function

 Self-supervised learning lets a neural network decide for itself what is important. The technique may be part of what makes our own brains so effective.

For almost a decade, many of the most capable artificial intelligence systems have been trained using massive databases of labeled data.

To "train" an artificial neural network to reliably tell a tabby from a tiger, each picture might be tagged "tabby cat" or "tiger cat."

The approach has been tremendously effective, yet grossly inadequate.


Such "supervised" training requires data that has been laboriously labeled by humans, and neural networks sometimes take shortcuts, learning to correlate the labels with minimal and sometimes superficial information. For example, a neural network might conclude that a picture shows a cow simply because there is grass in it, since cows are typically photographed in fields.


"We're breeding a generation of algorithms which are like undergrads who skipped class for the entire semester and then cram the night before the final," said Alexei Efros, a computer scientist at the University of California, Berkeley. "They don't truly understand the topic, yet they pass the exam."


Furthermore, for researchers interested in the intersection of animal and machine intelligence, this "supervised learning" may be limited in what it can reveal about biological brains. 

Animals, including humans, do not learn from labeled data sets. They explore the environment on their own for the most part, and as a result, they build a deep and strong awareness of the world.



Some computational neuroscientists are now investigating neural networks trained with little or no human-labeled data. 


These "self-supervised learning" algorithms have proved tremendously effective at modeling human language and, more recently, image recognition. Recent research has shown that computational models of the mammalian visual and auditory systems built with self-supervised learning correspond more closely to brain function than their supervised-learning equivalents. 


According to some neuroscientists, artificial neural networks seem to be revealing some of the real techniques human brains utilize to learn.


Inadequate Supervision

Brain models influenced by artificial neural networks reached maturity approximately ten years ago, around the same time a neural network called AlexNet transformed the challenge of categorizing unfamiliar photos. 


That network, like all neural networks, was composed of layers of artificial neurons, which are computational units that make connections with varying strengths, or "weights." If a neural network fails to properly categorize a picture, the learning algorithm changes the weights of the connections between the neurons to reduce the likelihood of misclassification in the following round of training. 


The method repeats this with all of the training photos, adjusting the weights until the network's error rate is acceptably low.
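The loop described above can be sketched in a few lines. This is a toy stand-in, not AlexNet: a single-layer classifier on synthetic 2-D "images" with human-provided labels, whose connection weights are nudged after each pass to reduce misclassification. All names and sizes are illustrative.

```python
import numpy as np

# Toy supervised training loop: labeled data in, weight adjustments out.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # toy "images": 2-D points
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # human-provided labels

w = np.zeros(2)                               # connection weights
b = 0.0
lr = 0.5
for epoch in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predicted class probability
    w -= lr * X.T @ (p - y) / len(y)          # misclassification error
    b -= lr * np.mean(p - y)                  # drives the weight updates

pred = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
error_rate = np.mean(pred != y)
print(error_rate)
```

The loop ends when the error rate is low enough; here, a fixed number of passes over the same labeled set suffices for this separable toy problem.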


Around the same time, neuroscientists used neural networks like AlexNet and its descendants to create the first computer models of the monkey visual system. 

When monkeys and artificial neural nets were given identical pictures, for example, the activity of actual neurons and artificial neurons exhibited an interesting correlation. Following that were artificial hearing and odor-detecting models.



However, as the area developed, researchers recognized the limits of supervised training. 


For example, Leon Gatys, a computer scientist at the University of Tübingen in Germany at the time, and his colleagues took a photograph of a Ford Model T and then superimposed a leopard skin pattern over the shot, resulting in a weird but readily recognized image. 



A prominent artificial neural network correctly identified the original picture as a Model T but classified the altered image as a leopard. It had fixated on the texture and had no concept of the shape of an automobile (or of a leopard, for that matter).


Self-supervised learning procedures are intended to prevent such issues. In this strategy, humans do not label the data.

 Friedemann Zenke, a computational neuroscientist at the Friedrich Miescher Institute for Biomedical Research in Basel, Switzerland, said that "the labels originate from the data itself." 


Self-supervised algorithms, in essence, leave gaps in the data and rely on the neural network to fill the blanks. 

The training method for a so-called large language model, for example, shows the neural network the first few words of a phrase and asks it to predict the next word. 


When fed a vast corpus of material from the internet, the model seems to acquire the language's grammatical structure, displaying exceptional linguistic abilities – without needing external labels or supervision.
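The "labels originate from the data itself" idea can be made concrete with a deliberately tiny stand-in for a language model: a bigram counter whose training pairs are simply each word and the word that follows it in raw text. The corpus and function names below are invented for illustration.

```python
from collections import Counter, defaultdict

# Self-supervision for language: the "label" for each position
# is just the next word in the text itself.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Build (context -> next word) counts directly from the raw text.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1   # no human labeling anywhere

def predict_next(word):
    """Return the continuation seen most often after `word` in training."""
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # "cat" follows "the" most often in this corpus
```

A real large language model replaces the counter with a deep network and the toy corpus with a vast slice of the internet, but the objective is the same: predict what comes next, with the text supplying its own supervision.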

In computer vision, a similar endeavor is ongoing. In late 2021, Kaiming He and his colleagues demonstrated their "masked auto-encoder," which builds on a technique pioneered by Efros' team in 2016. 


The self-supervised learning technique masks photos randomly, hiding about three-quarters of each. 

The masked auto-encoder converts the unmasked sections into latent representations, which are compressed mathematical descriptions of an object. 


(In the instance of a picture, the latent representation may be a mathematical description that captures the form of an item in the image, among other things). 

The representations are then converted back into complete pictures via a decoder.



The encoder-decoder combo is trained using the self-supervised learning method to convert masked pictures to their full representations.


 Any disparities between the original and reconstructed pictures are supplied back into the system to assist it in learning. 


This procedure is repeated for each batch of training photos until the system's error rate is acceptable. 

When a trained masked auto-encoder was shown a picture of a bus with about 80% of it covered up, it was able to correctly figure out how the bus was built.
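The mask-compress-reconstruct cycle can be sketched with a toy linear version of the objective; this is an illustrative stand-in, not the actual masked auto-encoder. Roughly three-quarters of each synthetic "image" is hidden, the visible pixels are compressed into a latent vector, and a decoder reconstructs the full image, with the disparity from the original driving learning. For simplicity the encoder is a fixed random projection and only the decoder is trained, and the images share a low-rank structure so hidden pixels are predictable from visible ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_latent = 16, 8
basis = rng.normal(size=(2, n_pix))
basis /= np.linalg.norm(basis, axis=1, keepdims=True)
images = rng.normal(size=(128, 2)) @ basis      # images share a 2-D structure

W_enc = rng.normal(scale=0.25, size=(n_pix, n_latent))  # fixed toy "encoder"
W_dec = np.zeros((n_latent, n_pix))                     # trained toy "decoder"
lr = 1.0

losses = []
for step in range(1000):
    mask = rng.random(images.shape) > 0.75      # keep ~25% of pixels visible
    visible = images * mask                     # masked input
    latent = visible @ W_enc                    # compressed description
    recon = latent @ W_dec                      # reconstructed full image
    err = recon - images                        # disparity with the original
    losses.append(np.mean(err ** 2))
    W_dec -= lr * latent.T @ err / len(images)  # learn from the disparity

print(losses[0], losses[-1])
```

The reconstruction error falls as training proceeds because the decoder learns the shared structure of the images, exactly the kind of regularity that lets the real model guess how a mostly covered bus is built.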



Some neuroscientists find parallels between how humans learn and systems like this. 

"There's no question that 90% of what the brain does is self-supervised learning," said Blake Richards, a computational neuroscientist at McGill University and the Quebec Artificial Intelligence Institute Mila.

 Biological brains are thought to constantly forecast, for example, an object's future position as it moves or the next word in a phrase, much as a self-supervised learning system tries to predict the gap in a picture or a piece of text. 


And our brains learn from their errors on their own as well—just a tiny portion of our brain's input comes from an external source, effectively stating, "wrong response."

Take, for example, the visual systems of humans and other primates. 



Neuroscientists have been confused about why the visual systems of animals have two different pathways: the ventral visual stream, which detects objects and faces, and the dorsal visual stream, which analyzes movement. 

These pathways are called the "what" and "where" pathways, respectively.


Richards and his colleagues developed a self-supervised model that suggests a solution. 

They built an AI by combining two distinct neural networks: the first, based on the ResNet architecture, was designed for processing images; the second, known as a recurrent network, could keep track of a sequence of prior inputs in order to predict the next one. 



To train the combined AI, the scientists began with a series of, say, 10 frames from a movie, and let the ResNet analyze each one individually. 


The recurrent network then predicted the latent representation of the 11th frame, based on the first ten. 


The self-supervised learning algorithm compared the forecast to the actual value and told the neural networks to adjust their weights to improve the prediction.
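A drastically simplified version of this next-frame objective can be sketched as follows. This is a toy stand-in for the real architecture: a fixed linear "encoder" plays the role of the ResNet, a trainable linear map plays the role of the recurrent predictor, and the "movie" is a short sequence of smoothly decaying frames. All sizes and dynamics are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_frames, frame_dim, latent_dim = 11, 6, 3

W_enc = rng.normal(scale=0.5, size=(frame_dim, latent_dim))  # fixed "ResNet"

frames = np.empty((n_frames, frame_dim))
frames[0] = rng.normal(size=frame_dim)
for t in range(1, n_frames):
    frames[t] = 0.9 * frames[t - 1]            # simple, predictable dynamics

W_pred = np.zeros((latent_dim, latent_dim))    # trainable "recurrent" predictor
lr = 0.02
for epoch in range(500):
    for t in range(n_frames - 1):
        z_t = frames[t] @ W_enc                # latent of the current frame
        z_next = frames[t + 1] @ W_enc         # target comes from the data
        err = z_t @ W_pred - z_next            # compare forecast to reality
        W_pred -= lr * np.outer(z_t, err)      # adjust weights toward target

final_err = np.linalg.norm(frames[9] @ W_enc @ W_pred - frames[10] @ W_enc)
print(final_err)
```

Because the toy dynamics are predictable, the prediction error on the 11th frame shrinks toward zero; the real system faces the same objective on natural video, where only learned structure makes the future guessable.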


Richards' team discovered that an AI trained with a single ResNet was effective at object identification but not at movement classification. 


However, when scientists divided the same ResNet into two paths (without affecting the overall number of neurons), the AI created representations for objects in one and movement in the other, allowing downstream classification of both features—much as human brains do.


To put the AI to the test, the researchers gave it a series of films that had previously been given to mice at the Allen Institute for Brain Science in Seattle. 


Mice, like primates, have brain areas that are specialized for both static imagery and movement. 

While the mice watched the movies, the Allen researchers watched how the visual cortex of their brains worked.


Richards' team discovered parallels in how the AI and biological brains responded to the videos. 

During training, one of the artificial neural network's paths looked like the ventral parts of the mouse's brain that detect objects, and the other looked like the dorsal parts that focus on movement.


According to Richards, the findings indicate that the human visual system includes two specialized routes that assist in forecasting the visual future; a single pathway is insufficient.


Human auditory system models give a similar story.

 In June, a team led by Jean-Rémi King, a research scientist at Meta AI, trained Wav2Vec 2.0, an AI that uses a neural network to turn audio into latent representations. The researchers mask some of these representations, which are then fed into a component neural network called a transformer. 



During training, the transformer predicts the masked information. 

Throughout the process, the AI learns to convert sounds into latent representations—again, no labels are required. 
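The masking objective can be illustrated with a deliberately tiny stand-in: a toy waveform is turned into a sequence of latent vectors (here, simple frame averages), one latent is masked, and a context model must guess it from its neighbors. In this sketch the "context model" is plain linear interpolation; the real Wav2Vec 2.0 uses a learned encoder and a transformer.

```python
import numpy as np

t = np.linspace(0, 1, 50)
audio = np.sin(2 * np.pi * t)          # a smooth toy "sound"

# Toy "latent representations": non-overlapping frame averages
# (a stand-in for the learned convolutional encoder).
latents = np.array([audio[i:i + 5].mean() for i in range(0, 45, 5)])

masked_idx = 4                          # hide one latent frame
context_pred = (latents[masked_idx - 1] + latents[masked_idx + 1]) / 2

# The prediction error is what the self-supervised objective minimizes.
print(abs(context_pred - latents[masked_idx]))
```

Because the waveform varies smoothly, even naive interpolation recovers the hidden latent reasonably well; the transformer's job is to learn far richer context than this, over real speech.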


Scientists used 600 hours of speech data to train the network. According to King, that is roughly the amount a young child is exposed to in the first two years of life.



After training the system, the researchers fed it portions of audiobooks in English, French, and Mandarin. The researchers then compared the AI's performance against data from 412 humans, a mix of native speakers of the three languages, who had their brains scanned in an fMRI scanner while listening to identical lengths of audio. 


Despite the noisy and low-resolution fMRI images, King says his neural network and human brains "not only correlate with one another, but also correlate in a systematic fashion": activity in the AI's early layers corresponds to activity in the primary auditory cortex, while activity in the AI's deepest layers corresponds to activity in higher brain areas, in this instance the prefrontal cortex. 

"It's very lovely data," Richards added. 

"It's not conclusive, but it's another strong piece of evidence that suggests that, yes, we learn a lot of language by trying to guess what will be said next."


Untreated Pathologies 

Not everyone is persuaded. Josh McDermott, a computational neuroscientist at MIT, has worked on models of visual and auditory perception utilizing both supervised and self-guided learning. 


His group has created what he refers to as "metamers," which are synthetic audio and visual signals that are incomprehensible to humans. 


To an artificial neural network, however, metamers seem indistinguishable from genuine signals. This suggests that the representations formed in the deeper layers of the neural network, even with self-supervised learning, do not match the representations in human brains. 


These techniques for self-supervised learning "are advances in the sense that you can acquire representations that can enable a lot of recognition behaviors without having all these labels," McDermott said. However, they retain many of the pathologies of supervised models.


The algorithms themselves need further development as well. In Meta AI's Wav2Vec 2.0, for example, the AI only predicts latent representations for a few tens of milliseconds of sound – less time than it takes to speak a perceptually distinguishable noise, much less a word. 



Understanding brain function will require more than self-supervised learning. 

For one thing, the brain is densely packed with feedback connections, while existing models have few if any. 


The logical next step would be to use self-supervised learning to train highly recurrent networks, a difficult task, and then compare the activity in such networks to real brain activity. 


The activity of artificial neurons in self-supervised learning models must also be matched to the activity of particular real neurons. "Hopefully, [our] conclusions will be verified with single-cell recordings in the future," King added.


If the observed parallels between brains and self-supervised learning models hold true for additional sensory tasks, it will be even more evidence that whatever magic our brains are capable of needs some type of self-supervised learning. 


At least, that's the kind of lovely theory we'd want to work with.
