Can AI draw inspiration from infant learning? Maybe it can!
We may be unknowingly doing some things right with foundation models such as GPT-3, BERT, and DALL-E 2. A recent paper, based on an in-depth study of how infants learn, presents three key conclusions that point to directions we could take to make the next leap in machine intelligence. The conclusions were:
The first conclusion emphasizes that not only the training algorithm and the data matter for infant learning, but also the instantiation of the architecture and the starting conditions. This could perhaps be why pre-trained, pre-instantiated architectures (transformer- or convolution-based) in foundation models (BERT, GPT-3, DALL-E 2, etc.) have started to show great results in machine intelligence too.
The process of human evolution may have arrived at these "optimal" architectures for infant learning, which get instantiated right in the womb. Likewise, our years of experimenting with DNN architectures could have led us to some "optimal" architectures for machine learning, which are yielding highly encouraging results. These results rival human capabilities, albeit on narrow tasks, in language, vision, image captioning, and image generation.
The second conclusion is that infants learn from rich multimodal inputs and their statistical relations. Multimodal statistical correlations allow us to reason about what a "brown dog" would look like without ever having seen one, by combining the statistical correlation between images and the concept of "dog" with the concept of the color "brown." Complex reasoning and commonsense, which come so naturally to infants even at 12 months, remain quite difficult for machines. Infants can quickly distinguish between objects and agents, infer the agents' goals, and hence interact with an agent purposefully. Reasoning is an area of active research both in the cognitive, developmental, and intuitive psychology community (AGENT, BIB, VOE, etc.) and in the AI community (Neuro-Symbolic AI).
The use of multimodal representations for machine learning was, of course, a natural step for the community, which quickly realized that multimodal input streams lead to more robust representations. This has led to new kinds of multimodal, multi-task foundation models such as GATO, DALL-E 2, and MMF. DALL-E 2's ability to generate images from disparate concepts shows how some of these foundation models are starting to understand cross-modality relationships.
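The "brown dog" idea above can be sketched in code. This is a toy illustration only: the vectors below are hypothetical, hand-made embeddings, whereas models like CLIP or DALL-E 2 learn a shared text-image space from data. It shows how composing the embeddings of two concepts the model has seen separately can retrieve an image of their never-seen combination.

```python
# Toy sketch of cross-modal composition in a shared embedding space.
# All vectors are hypothetical 3-d embeddings made up for illustration.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical text-concept embeddings.
text = {
    "dog":   [1.0, 0.1, 0.0],
    "brown": [0.0, 0.1, 1.0],
}

# Hypothetical image embeddings in the same space.
images = {
    "photo_of_white_dog":  [0.9, 0.2, 0.1],
    "photo_of_brown_dog":  [0.7, 0.1, 0.7],
    "photo_of_brown_shoe": [0.1, 0.0, 0.9],
}

# Compose "brown" + "dog", even if no brown dog was ever seen as a pair.
query = [d + b for d, b in zip(text["dog"], text["brown"])]

# The composed query lands nearest the brown-dog image.
best = max(images, key=lambda name: cosine(query, images[name]))
print(best)  # -> photo_of_brown_dog
```

The composition here is simple vector addition; real systems learn richer ways to bind concepts, but the principle of exploiting cross-modal statistical structure is the same.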
The third conclusion is that the infant brain is an evolving, actively learning system that changes as its developmental stages require. Locomotion and caregiver inputs play key roles in whether the infant actively explores new signals or exploits and fine-tunes existing ones. One challenge DNN architectures suffer from is their limited ability to change the network as learning progresses; today this is done only in limited ways, through regularization techniques, layer-wise training, and the like.
Can we incorporate active learning into DNN models, so that the system proactively chooses which new features to explore, or exploits a known feature by drilling into its lower-level details?
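One standard form this takes today is pool-based active learning with uncertainty sampling: the model, rather than a human curriculum, picks the next example to learn from. The sketch below is a minimal, hypothetical illustration; the pool and its probabilities are stand-ins for a trained DNN's predictive distribution.

```python
# Minimal sketch of pool-based active learning via uncertainty sampling.
# The unlabeled pool and class probabilities are hypothetical stand-ins
# for a real model's predictions over real data.
from math import log

def entropy(probs):
    """Shannon entropy of a predictive distribution (higher = more uncertain)."""
    return -sum(p * log(p) for p in probs if p > 0)

# Hypothetical unlabeled pool: item -> model's current class probabilities.
pool = {
    "sample_a": [0.98, 0.02],  # model is confident -> little to gain by labeling
    "sample_b": [0.55, 0.45],  # model is unsure    -> most informative to label
    "sample_c": [0.80, 0.20],
}

def select_query(pool):
    """Explore: request a label for the example the model is least sure about."""
    return max(pool, key=lambda item: entropy(pool[item]))

print(select_query(pool))  # -> sample_b
```

In a full loop, the selected example would be labeled by an oracle, added to the training set, and the model retrained, trading off this kind of exploration against exploiting what the model already represents well.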
Prasenjit Day leads the innovation arm of Merlyn Mind and heads our India operations. He lives in Bangalore, India.