“We shape our tools and thereafter our tools shape us.”
― Marshall McLuhan
Artificial intelligence (AI) systems are generally designed to solve one traditional AI task. While such weak systems are undoubtedly useful as decision-support tools, future AI systems will be strong and general, consolidating common sense and general problem-solving capabilities (the a16z podcast “Brains, Bodies, Minds … and Techno-Religions” offers some great examples of what general artificial intelligence could be capable of). To achieve general intelligence, a human-like ability to use previous experience to solve arising problems, AI agents’ “brains” would need, much as biological brains do, to evolve that experience across a variety of new tasks. This is where Universe comes in.
In December, OpenAI introduced Universe, a software platform for training an AI’s general intelligence to become skilled at any task that a human can do with a computer. Universe builds upon OpenAI’s Gym, a toolkit for developing and comparing reinforcement learning algorithms (the environment acts as a tutor, providing periodic feedback, a “reward,” to an agent, which in turn encourages or discourages subsequent actions). Universe essentially allows any program to be turned into a Gym environment by launching it behind a virtual desktop, which removes the requirement that Universe have direct access to the program’s source code and other protected internal data.
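To make this concrete, here is a minimal agent loop in the spirit of the examples that accompanied the Universe release; the environment ID `flashgames.DuskDrive-v0` and the single Docker-backed remote are illustrative choices, and the “agent” simply holds down one key:

```python
import gym
import universe  # importing universe registers its environments with gym

# Wrap a Flash racing game, running behind a remote VNC desktop, as a Gym environment.
env = gym.make('flashgames.DuskDrive-v0')
env.configure(remotes=1)  # launch one local Docker-backed remote desktop
observation_n = env.reset()

while True:
    # A trivial policy: press the up-arrow key in every remote.
    # A real agent would choose keyboard/mouse events from the pixel observations.
    action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    env.render()
```

The agent sees what a human would see (screen pixels) and acts as a human would act (keyboard and mouse events), which is what makes the platform program-agnostic.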
OpenAI sees this kind of interaction as a way of validating artificial intelligence: many applications are essentially micro virtual worlds, and exposing AI learning techniques to them will lead to better-trained agents, capable of tackling a diverse range of (game) problems quickly and well. Being able to master new, unfamiliar environments in this way is a first step toward general intelligence, allowing AI agents to “anticipate” rather than staying forever stuck in a single-task loop.
However, as much as Universe is a unique experience vessel for artificial intelligence, it is a uniquely *visual* one, enabling an agent to interact with any external software application via pixels (using a keyboard and mouse), with each of these applications constituting a different HCI environment source. It provides access to a vast digital universe full of a variety of *visual* training tasks.
But isn’t it missing out on all the fun of full tactile experience? Shouldn’t there be a digitized somatosensory training platform for AI agents, so they can recognize and interpret the myriad tactile stimuli needed to grasp the experience of a physical world? The somatosensory system is the part of the central nervous system involved in decoding a wide range of tactile stimuli (object recognition, texture discrimination, sensory-motor feedback, and eventually inter-social communication exchange) for our perception of and reaction to stimuli originating outside and inside our body, and for the perception and control of body position and balance. One of the more essential aspects of general intelligence, the one that gives us a common-sense understanding of the world, is being placed in an environment and being able to interact with the things in it: embedded in all of us is the instinctual ability to tell apart the mechanical forces acting upon the skin (the temperature, texture, and intensity of tactile stimuli).
Our brain is indeed the core of all human thought and memory, constantly organizing, identifying, and perceiving the environment that surrounds us and interpreting it through our senses as a flow of data. And yet, studies have taught us that multiple senses stimulate the central nervous system. An estimated 78% of all the data flow perceived by the brain is visual, while the remainder originates from sound (12%), touch (5%), smell (2.5%), and taste (2.5%), and that is assuming we have deciphered all of the known senses. So by training general AI purely via visual interaction, will we be getting 78% of general artificial intelligence? Enter the theory of “embodied cognition.”
Embodied cognition
Embodied cognition is a research theory holding that having an active body, situated in a structured environment suited to the kinds of tasks the brain has to perform, makes a vast difference to adaptive task success. Here I use the term to refer to the existence of a memory system that encodes data about the agent’s motor and sensory competencies, stressing the importance of action for cognition, such that the agent can tangibly interact with the physical world. The aspects of the agent’s body beyond its brain play a significant causative and physically integral role in its cognitive processing. The only way to understand the mind, how it works, and subsequently how to train it is to consider the body, and what helps the body and mind function as one.
This approach is in line with a biological learning pattern based on “Darwinian selection,” which proposes that intelligence can only be measured in the context of the surrounding environment of the organism studied: “…we must always consider the embodiment of any intelligent system. The preferred embodiment reflects that the mind and its surrounding environment (including the physical body of the individual) are inseparable and that intelligence only exists in the context of its surrounding environment.”
Stacked neural networks must emulate evolution’s hierarchical complexity (Commons, 2008)
Current neural networks (NNs) are indeed inspired by known evolutionary processes of executing tasks and share some properties of biological NNs in the attempt to tackle general problems, but only as architectural inspiration, without necessarily copying a real biological system any more closely. One of the first design steps toward AI NNs that can closely imitate general intelligence is to follow the model of hierarchical complexity (HC) in terms of data acquisition. Stacked NNs based on this model could imitate evolution’s environmental/behavioral processes and reinforcement learning (RL). However, computer-implemented systems or robots generally do not exhibit generalized higher learning adaptivity: the capacity to go from learning one ability to learning another without dedicated programming.
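As a rough illustration of what “stacking” by order of hierarchical complexity could mean architecturally, the sketch below (PyTorch-style, with invented module names and arbitrary layer sizes) feeds each stage only the representation produced by the stage beneath it. It is an assumption-laden sketch of the idea, not a reference implementation of the Commons model:

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One order of hierarchical complexity: transforms the representation
    produced by the stage below into a more abstract one."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class StackedHCNetwork(nn.Module):
    """Each higher stage sees only the output of the stage immediately
    beneath it, mirroring the combining/ordering of lower-stage behavior."""
    def __init__(self, dims):
        super().__init__()
        self.stages = nn.ModuleList(
            [Stage(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])]
        )

    def forward(self, x, up_to_stage=None):
        # Evaluate the stack bottom-up, optionally stopping at a given stage.
        for stage in self.stages[:up_to_stage]:
            x = stage(x)
        return x

# Illustrative sizes: raw sensory input (e.g. flattened pixels) refined through
# three increasingly abstract stages; an RL reward signal would drive training.
model = StackedHCNetwork([1024, 256, 64, 16])
features = model(torch.randn(8, 1024), up_to_stage=2)  # stop after the second stage
```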
Established NNs are limited for two reasons. The first is that AI models are based on the notion of Turing machines, and almost all AI models operate on words or text; but Turing machines are not enough to really produce intelligence. At the lowest stages of development, systems need effectors that produce a variety of responses (movement, grasping, emoting, and so on), and they must have extensive sensors to take in more from the environment. The second is that even though Carpenter and Grossberg’s (1990, 1992) neural networks were meant to model simple behavioral processes, the processes they modeled were too complex, which resulted in NNs that were relatively unstable and not highly adaptable. When one looks at evolution, however, one sees that the first NNs that existed were found, for example, in Aplysia, Cnidarians (Phylum Cnidaria), and worms. They were specialized to perform just a few tasks, even though some general learning was possible.
Animals, including humans, pass through a series of ordered stages of development (see “Introduction to the Model of Hierarchical Complexity,” World Futures, 64: 444-451, 2008). Behaviors performed at each higher stage of development always successfully address task requirements that are more hierarchically complex than those of the immediately preceding order of hierarchical complexity. Movement to a higher stage of development occurs when the brain combines, orders, and transforms the behaviors used at the preceding stage. This combining and ordering of behaviors must therefore be non-arbitrary.
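To make the combining-and-ordering idea tangible, here is a toy sketch (the primitive names and the state dictionary are hypothetical) in which a higher-stage behavior is defined only as a fixed, ordered composition of behaviors already mastered at the preceding stage:

```python
from typing import Callable, Dict, List

Behavior = Callable[[Dict], Dict]  # a behavior transforms the agent's state

def compose_stage(name: str, lower_behaviors: List[Behavior]) -> Behavior:
    """Build a higher-stage behavior as a fixed, ordered composition of
    behaviors already mastered at the preceding stage."""
    def higher_behavior(state: Dict) -> Dict:
        for behavior in lower_behaviors:  # the order is part of the definition
            state = behavior(state)
        return state
    higher_behavior.__name__ = name
    return higher_behavior

# Hypothetical stage-1 sensorimotor primitives.
def reach(state): return {**state, "arm": "extended"}
def grasp(state): return {**state, "hand": "closed"}
def lift(state):  return {**state, "object_height": "raised"}

# A stage-2 behavior coordinates the stage-1 primitives into "pick up".
pick_up = compose_stage("pick_up", [reach, grasp, lift])
print(pick_up({"object": "cup"}))
# {'object': 'cup', 'arm': 'extended', 'hand': 'closed', 'object_height': 'raised'}
```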
Somatosensory system emulation
Neuroscience has classified specific regions, processes, and interactions involved in memory and thought down to the molecular level. Neurons and synapses are both actively involved in thought and memory, and with the help of brain imaging technology (e.g. Magnetic Resonance Imaging (MRI), also known as Nuclear Magnetic Resonance Imaging or Magnetic Resonance Tomography (MRT)), brain activity can be analyzed at the molecular level. All perceived data in the brain is represented in the same way, through the electrical firing patterns of neurons. The learning mechanism is also the same: memories are constructed by strengthening the connections between neurons that fire together, using a biochemical process known as long-term potentiation. Recently, atomic magnetometers have enabled the development of inexpensive and portable MRI instruments that work without the large magnets used in traditional MRI machines to image parts of the human anatomy, including the brain. There are over 10 billion neurons in the brain, each of which has synapses involved in memory and learning, and these too can be analyzed by brain imaging methods, soon in real time.

It has been shown that new brain cells are created whenever one learns something new by physically interacting with one’s environment: whenever a stimulus from the environment, or a thought, makes a significant enough impact on the brain’s perception, new neurons are created. During this process, synapses carry out electrochemical activity that directly reflects activity related to both memory and thought, including the tactile point of sensation. The sense of touch, weight, and all other tactile sensory stimuli need to be implemented as the concrete “it” value assigned to an agent by the nominal concept. By reconstructing 3D neuroanatomy from molecular-level data, sensory activity in the brain can be detected, measured, stored, and reconstructed from a subset of the neural projections generated by an automated segmentation algorithm, so as to convey the neurocomputational sensation to an AI agent. The existence of such a somatosensory, Universe-like database, focused on training AI agents beyond visual interaction, may bring us closer to 100% general AI.
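The “fire together, wire together” mechanism mentioned above can be caricatured in a few lines of NumPy. This is a generic Hebbian update with a decay term, not a model of actual long-term potentiation or of any specific somatosensory dataset; the layer sizes, threshold, and rates are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_pre, n_post = 64, 32          # e.g. tactile receptors -> "somatosensory" units
W = np.zeros((n_post, n_pre))   # synaptic weights, initially silent
eta, decay = 0.01, 0.001        # learning rate and slow forgetting

for step in range(1000):
    pre = (rng.random(n_pre) < 0.1).astype(float)   # sparse tactile input
    post = (W @ pre > 0.5).astype(float)            # thresholded response
    post += (rng.random(n_post) < 0.05)             # spontaneous activity
    # Hebbian rule: strengthen synapses whose pre- and post-synaptic units
    # are active together; all weights slowly decay otherwise.
    W += eta * np.outer(post, pre) - decay * W
    W = np.clip(W, 0.0, 1.0)
```

A somatosensory training platform would, in effect, have to supply the `pre` side of such a loop: streams of touch, texture, and force data rich enough for an agent to build associations the way visual pixels let it do so in Universe.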