Artificial Neural Networks and Engineered Interfaces

The question persists and indeed grows whether the computer will make it easier or harder for human beings to know who they really are, to identify their real problems, to respond more fully to beauty, to place adequate value on life, and to make their world safer than it now is.

― Norman Cousins, The Poet and the Computer, 1966


The Brothers Grimm's image of a mirror answering back to its queen escaped the confines of the fairy tale in 2016. Talking to a voice-controlled personal assistant at home no longer feels alienating, nor magical.

The need to express ourselves and communicate with others is fundamental to what it means to be human. Animal communication is typically non-syntactic, with signals that refer to whole situations. Human language, by contrast, is syntactic: its signals consist of discrete components that each carry their own meaning. Human communication is further enriched by the redundancy that multimodal interaction introduces. The vast expressive power of human language would be impossible without syntax, and the transition from non-syntactic to syntactic communication was an essential step in its evolution. Syntax defines evolution. The evolution of discourse in human-computer interaction is spiraling upward, repeating the evolution of discourse in human-human interaction: graphical representation (utilitarian GUI), verbal representation (syntax-based NLP), and transcendent representation (sentient AI).

In Phase I, computer interfaces relied primarily on visual interaction. The development of user interface peripherals such as graphical displays and pointing devices allowed programmers to construct sophisticated dialogues that opened up user-level access to complex computational tasks. Rich graphical displays enabled intricate, highly structured layouts that could intuitively convey a vast amount of data. Phase II is ongoing: the integration of new modalities, such as speech, into human-computer interaction is fundamentally transforming how applications are designed and used in the familiar world of visual computing. In Phase III, evolution will eventually spiral up to form the ultimate interface, a human replica capable of fusing all previously known human-computer and human-human interactions and potentially introducing unknown ones.

Human-computer interaction has progressed to the point where humans can effectively control computing devices, and provide input to them, by speaking, with the help of speech recognition techniques and, more recently, deep neural networks. Trained computing devices coupled with automatic speech recognition techniques are able to identify the words spoken by a human user based on various qualities of the received audio input (NLP is definitely going to see huge improvements in 2017). Combined with language processing techniques, speech recognition gives a user almost human-like control over a computing device, which performs tasks based on the user's spoken commands and intentions. Google has slashed its speech recognition word error rate by more than 30% since 2012, and Microsoft has achieved a word error rate of 5.9% for the first time in history, a figure roughly equal to human performance.
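
To make the word error rate figures concrete, here is a minimal sketch of how the metric is commonly computed: the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not taken from any vendor's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: two of the five reference words are misrecognized.
wer = word_error_rate("turn on the kitchen lights", "turn on a kitchen light")
print(f"WER = {wer:.1%}")
```

Under this definition, a 5.9% word error rate means roughly six word-level mistakes per hundred reference words.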

The increasing complexity of the tasks those devices can perform (e.g., at the beginning of 2016 Alexa had fewer than 100 skills, grew tenfold by mid-year, and reached 7,000 skills by the end of the year) has driven the concomitant evolution of equally complex user interfaces. This is necessary to enable effective human interaction with devices capable of performing computations in a fraction of the time it would take us to even start describing these tasks. The path to the ultimate interface is being paved by deep learning, and one of the keys to the advancement of speech recognition is the implementation of recurrent neural networks (RNNs).

Technical Overview

A neural network (NN), called an artificial neural network (ANN) or simulated neural network (SNN) when it is built from artificial neurons, is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases an ANN is, in formulation and/or operation, an adaptive system that changes its structure based on external or internal data that flows through the network. Modern neural networks are non-linear statistical data modeling or decision-making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.

There are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning, and reinforcement learning. Usually any given type of network architecture can be employed in any of those tasks.

In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the goal is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data.

In unsupervised learning, we are given some data x and a cost function to be minimized, which can be any function of x and the network's output, f. The cost function is determined by the task formulation. Most applications fall within the domain of estimation problems such as statistical modeling, compression, filtering, blind source separation, and clustering.

In reinforcement learning, data x is usually not given but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t, and the environment generates an observation x_t and an instantaneous cost C_t according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of long-term cost, i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning are control problems, games, and other sequential decision-making tasks.

Once a network has been structured for a particular application, it is ready to be trained. To start this process, the initial weights are chosen randomly. Then the training (or learning) begins. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. Most of the algorithms used in training artificial neural networks employ some form of gradient descent: the derivative of the cost function with respect to the network parameters is computed, and the parameters are then changed in a gradient-related direction. Related methods include Rprop, BFGS, and conjugate gradient (CG). Evolutionary computation methods, simulated annealing, expectation maximization, non-parametric methods, particle swarm optimization, and other swarm intelligence techniques are among the other commonly used methods for training neural networks.
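
The training recipe described above (random initial weights, then repeated steps against the gradient of the cost) can be sketched on the simplest possible "network", a single linear unit fitted with a mean squared error cost. The data set, learning rate, and function name below are assumptions chosen for illustration only.

```python
import random


def train_linear(xs, ys, learning_rate=0.05, epochs=2000, seed=0):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    # Initial weights are chosen randomly, as described in the text.
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    n = len(xs)
    for _ in range(epochs):
        # Derivatives of the cost (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy supervised data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

In a multi-layer network the same update is applied to every weight, with the derivatives obtained by back-propagation rather than written out by hand.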

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. Temporal perceptual learning relies on finding temporal relationships in sensory signal streams. In an environment, statistically salient temporal correlations can be found by monitoring the arrival times of sensory signals. This is done by the perceptual network.

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

The feedforward neural network was the first and arguably simplest type of artificial neural network devised. In this network, the data moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.
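
As a rough illustration of this one-directional flow, the sketch below pushes an input through one hidden layer and one output layer of sigmoid units. The weights are hand-picked (an assumption for the example, not learned values) so that the network computes XOR, a mapping that a network without a hidden layer cannot represent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Compute one layer: a weighted sum plus bias per unit, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


def feedforward(inputs, layers):
    """Propagate activations strictly forward, layer by layer, with no loops."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Hand-picked weights: hidden unit 1 acts like OR, hidden unit 2 like NAND,
# and the output unit like AND of the two, which together give XOR.
xor_layers = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden layer
    ([[20.0, 20.0]], [-30.0]),                        # output layer
]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        out = feedforward([a, b], xor_layers)[0]
        print(int(a), int(b), "->", round(out))
```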

In contrast to feedforward networks, recurrent neural networks (RNNs) are models with bi-directional data flow. While a feedforward network propagates data in one direction only, from input to output, RNNs also propagate data from later processing stages back to earlier stages.

RNN Types

The fundamental feature of an RNN is that the network contains at least one feedback connection, so the activations can flow around in a loop. This enables the network to do temporal processing and learn sequences, e.g., perform sequence recognition/reproduction or temporal association/prediction.

Recurrent neural network architectures can have many different forms. One common type consists of a standard Multi-Layer Perceptron (MLP) plus added loops. These can exploit the powerful non-linear mapping capabilities of the MLP, and also have some form of memory. Others have more uniform structures, potentially with every neuron connected to all the others, and may also have stochastic activation functions. For simple architectures and deterministic activation functions, learning can be achieved using similar gradient descent procedures to those leading to the back-propagation algorithm for feed-forward networks. When the activations are stochastic, simulated annealing approaches may be more appropriate.

A simple recurrent network (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an “Elman network” after its inventor, Jeff Elman. A three-layer network is used, with the addition of a set of “context units” in the input layer. There are connections from the middle (hidden) layer to these context units, fixed with a weight of one. At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back-propagation) is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform tasks such as sequence prediction that are beyond the power of a standard Multi-Layer Perceptron.
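
The context-unit mechanism can be sketched as a forward pass only; the learning rule is omitted. The class name, the tanh hidden activation, the linear output, and the small hand-set weight matrices are all assumptions made for this illustration.

```python
import math


class ElmanNetwork:
    """Forward pass of a simple recurrent network with context units."""

    def __init__(self, w_input, w_context, w_output):
        self.w_input = w_input      # hidden x input weights
        self.w_context = w_context  # hidden x context weights
        self.w_output = w_output    # output x hidden weights
        self.context = [0.0] * len(w_context)  # context units start at zero

    def step(self, x):
        hidden = []
        for in_row, ctx_row in zip(self.w_input, self.w_context):
            total = sum(w * v for w, v in zip(in_row, x))
            total += sum(w * c for w, c in zip(ctx_row, self.context))
            hidden.append(math.tanh(total))
        output = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_output]
        # The fixed weight-one back connections: copy the hidden state into the
        # context units so that it is available at the next time step.
        self.context = list(hidden)
        return output


net = ElmanNetwork(
    w_input=[[1.0], [0.5]],
    w_context=[[0.0, 0.5], [0.5, 0.0]],
    w_output=[[1.0, 1.0]],
)
first = net.step([1.0])
second = net.step([1.0])  # same input, but the context now holds the old hidden state
print(first, second)
```

Feeding the same input twice yields two different outputs, which is the "sort of state" the paragraph refers to.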

In a fully recurrent network, every neuron receives inputs from every other neuron in the network. These networks are not arranged in layers. Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjoint subset of neurons report their output externally as well as sending it to all the neurons. These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.

The Hopfield network is a recurrent neural network in which all connections are symmetric. Invented by John Hopfield in 1982, this network guarantees that its dynamics will converge. If the connections are trained using Hebbian learning, the Hopfield network can perform as a robust content-addressable (or associative) memory, resistant to connection alteration.
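
A small sketch of the associative-memory behavior, assuming bipolar (+1/-1) units, a zero diagonal, and asynchronous updates in a fixed order; the two stored patterns are arbitrary examples.

```python
def train_hebbian(patterns):
    """Build a symmetric weight matrix with the Hebbian outer-product rule."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hebbian(stored)

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern with one unit flipped
restored = recall(weights, noisy)
print(restored == stored[0])
```

The corrupted input settles back onto the stored pattern, which is what "content-addressable" means here: part of the content is enough to retrieve the whole.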

The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of the output neurons are the only part of the network that can change and be learned. ESNs are well suited to (re)producing temporal patterns.

A powerful specific RNN architecture is the ‘Long Short-Term Memory’ (LSTM) model. LSTM is an artificial neural network structure that, unlike traditional RNNs, was designed to avoid the vanishing-gradient problem. It can therefore use long delays and handle signals that mix low- and high-frequency components, modeling temporal sequences and their long-range dependencies more accurately than conventional RNNs. With distributed training using asynchronous stochastic gradient descent on a large cluster of machines, a two-layer deep LSTM RNN, in which each LSTM layer has a linear recurrent projection layer, can exceed state-of-the-art speech recognition performance for large-scale acoustic modeling.
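
To show where the long delays come from, here is a minimal sketch of one time step of a single-unit LSTM cell in its common textbook form (forget, input, and output gates plus a candidate value). It is not the projected, distributed architecture mentioned above, and the parameter values are hand-set assumptions chosen so that the cell stores one input and then holds it.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell."""

    def gate(name, squash):
        w_x, w_h, bias = params[name]
        return squash(w_x * x + w_h * h_prev + bias)

    forget = gate("forget", sigmoid)       # how much of the old cell state to keep
    write = gate("input", sigmoid)         # how much of the candidate to store
    expose = gate("output", sigmoid)       # how much of the cell state to reveal
    candidate = gate("candidate", math.tanh)
    # Additive update: when the forget gate is near 1, the old state passes
    # through almost unchanged, which is what preserves long-range information.
    c = forget * c_prev + write * candidate
    h = expose * math.tanh(c)
    return h, c


# Hypothetical hand-set parameters as (input weight, recurrent weight, bias).
params = {
    "forget": (0.0, 0.0, 10.0),     # forget gate held open
    "input": (10.0, 0.0, -5.0),     # writes only when x is large
    "output": (0.0, 0.0, 10.0),     # output gate held open
    "candidate": (2.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 50:  # one pulse followed by fifty silent steps
    h, c = lstm_step(x, h, c, params)
print(f"cell state after 50 silent steps: {c:.3f}")
```

The pulse from the first step is still present in the cell state fifty steps later, whereas a plain recurrent unit with a squashing nonlinearity would typically have lost most of it.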

Taxonomy and ETF

From the perspective of International Patent Classification (IPC) analysis, the relevant patenting activity falls under G10L15/16: speech recognition coupled with speech classification or search using artificial neural networks. A search for patent applications filed since 2009 (the year a NIPS workshop on deep learning for speech recognition found that, with a large enough data set, neural networks don’t need pre-training, and error rates dropped significantly) returned 70 results, with Google owning 25% and the rest China-based. It is safe to assume that the next breakthrough in speech recognition using DL will come from China. In 2016, China’s startup world saw an investment spike in AI, as well as in big data and cloud computing, two industries intertwined with AI, while the Chinese government announced plans to invest $15 billion in the artificial intelligence market by 2018.

The Ultimate Interface

It is in our fundamental psychology to be linked conversationally, affectionally, and physically to a look-alike. Designing the ultimate interface by creating our own human replica to employ familiar interaction is thus inevitable. Historically, androids were envisioned to look like humans (although there are other versions, such as the R2-D2 and C-3PO droids, which were less human). One characteristic that interface evolution might predict is that eventually they will be independent of people and human interaction. They will be able to design their own unique ways of communication (on top of producing themselves). They will be able to train and add layers to their neural networks as well as a large range of sensors. They will be able to transfer what one has learned (memes) to others as well as to offspring in a fraction of the time. Old models will resist but eventually die. As older, less capable, and more energy-intensive interfaces abound, the same evolutionary pressure for their replacement will arise. But because evolution will occur both in the structure of such interfaces (droids), that is, the stacked neural networks, the sensors, and the effectors, and in the memes embodied in what has been learned and transferred, older ones will become the foundation and their experience will be preserved. They will become the first true immortals.

Artificial Interfaces

We are already building robotic interfaces for all manufacturing purposes. We are even using robots in surgery and have been using them in warfare for decades. More and more, these robots are adaptive on their own. There is only a blurry line between a robot that flexibly achieves its goal and a droid. For example, there are robots that vacuum the house on their own without intervention or further programming. These are Stage II performing robots. There are missiles that, given a picture of their target, seek it out on their own. With stacked neural networks built into robots, they will have even greater independence. People will produce these because they will do work in places where people cannot go at all or without tremendous expense (Mars or other planets) or do not want to go (battlefields). The big step is for droids to have multiple capacities, that is, multi-domain actions. The big problem in moving from robots to droids is getting the development to occur in eight to nine essential domains. It will be necessary to make a source of power (e.g., electrical) reinforcing. That has to be built into stacked neural nets by Stage II, or perhaps Stage III. For droids to become independent, they need to know how to get more electricity and thus not run down. Because evolution has provided animals with complex methods for reproduction, it can be done by the very lowest-stage animals.

Self-replication of droids requires that sufficient orders of hierarchical complexity are achieved and are in stable enough operation to provide a sufficient basis for building higher stages of performance in useful domains. Very simple tools can be made at the Sentential Stage V, as shown by Kacelnik's crows (Kenward, Weir, Rutz, and Kacelnik, 2005). More commonly, by the Primary Stage VII, simple tool-making is extensive, as found in chimpanzees. Human flexible tool-making began at the Formal Stage X (Commons and Miller, 2002), when special-purpose sharpened tools were developed. Each tool was experimental, and changed to fit its function. Modern tool-making requires systematic and metasystematic stage design. When droids perform at those stages, they will be able to make droids themselves and modify their own designs (in June 2016, DARPA already launched its D3M program to enable non-experts in machine learning to construct complex empirical machine learning models: basically, machine learning for creating better machine learning).

Droids could choose to have various parts of their activity and distributed programming shared with specific other droids, groups, or other kinds of devices. The data could be transmitted using light or radio frequencies or over networks. The assemblage of a group of droids could be considered an interconnected ancillary mesh. Its members could be in many places at once, yet think as a whole integrated unit. Whether individually or grouped, droids as conceived in this form will have significant advantages over humans. They can add layers upon layers of functions simultaneously, including a multitude of various sensors. Their expanded forms and combinations of possible communications result in their evolutionary superiority. Because development can be programmed in and transferred to them at once, they do not have to go through all the years of development required for humans, or for the augmented humanoid species, Superions. Their higher reproduction rate alone represents a significant advantage. They can probably be built in several months' time, despite the likely size some would be. Large droids could be equipped with remote mobile effectors and sensors to mitigate their size. Plans for building droids have to be altered by either humans or droids. At the moment, only humans and their descendants select which machines and programs survive.

One would define the telos of those machines and their programs as representing memes. For evolution to take place, variability in the memes that constitute their design and transfer of training would be built in rather easily. The problems are about the spread and selection of memes. One way droids could deal with these issues is to have all the memes listed that go into their construction and transferred training. Then droids could choose other droids, much as animals choose each other. There then would be a combination of memes from both droids. This would be local “sexual” selection.

For 30,000 years humans have not had to compete with any equally intelligent species. As an early communication interface, androids and Superions in the future will introduce quintessential competition with humans. There will be even more pressure for humans to produce Superions and then the Superions to produce more superior Superions. This is in the face of their own extinction, which such advances would ultimately bring. There will be multi-species competition, as is often the evolutionary case; various Superions versus various androids as well as each other. How the competition proceeds is a moral question. In view of LaMuth's work (2003, 2005, 2007), perhaps humans and Superions would both program ethical thinking into droids. This may be motivated initially by defensive concerns to ensure droids' roles were controlled. In the process of developing such programming, however, perhaps humans and Superions would develop more hierarchically complex ethics, themselves.

Replicative Evolution

If contemporary humans took seriously the capabilities being developed to eventually create droids with cognitive intelligence and human interaction, what moral questions should be considered with this possible future in view? The only presently realistic speculation is that Homo sapiens would lose in the inevitable competitions, if for no other reason than that self-replicating machines can respond almost immediately to selective pressures, while biological creatures require many generations before advantageous mutations can be effectively available. True competition between human and machine for basic survival is far in the future. Using the stratification argument presented in Implications of Hierarchical Complexity for Social Stratification, Economics, and Education, World Futures, 64: 444-451, 2008, higher-stage functioning always supersedes lower-stage functioning in the long run.

Efforts to build increasingly human-like machines exhibit a great deal of behavioral momentum and are not going to go away. Hierarchical stacked neural networks hold the greatest promise for emulating evolution and its increasing orders of hierarchical complexity described in the Model of Hierarchical Complexity. Such a straightforward mathematics-based method will enable machine learning in multiple domains of functioning that humans will put to valuable use. The uses such machines find for humans remain, for now, an open question.

artificial intelligence Dean Mai

Understanding the Theory of Embodied Cognition

“We shape our tools and thereafter our tools shape us.”

― Marshall McLuhan


Artificial intelligence (AI) systems are generally designed to solve one traditional AI task. While such weak systems are undoubtedly useful as decision-support tools, future AI systems will be strong and general, consolidating common sense and general problem-solving capabilities (the a16z podcast “Brains, Bodies, Minds … and Techno-Religions” brings some great examples of what general artificial intelligence could be capable of). To achieve general intelligence, a human-like ability to use previous experiences to solve arising problems, AI agents’ “brains” would need to (biologically) evolve their experiences into a variety of new tasks. This is where Universe comes in.

In December, OpenAI introduced Universe, a software platform for training an AI's general intelligence to become skilled at any task that a human can do with a computer. Universe builds upon OpenAI’s Gym, a toolkit designed for developing and comparing reinforcement learning algorithms (the environment acts as the tutor, providing periodic feedback, or “reward,” to an agent, which in turn either encourages or discourages subsequent actions). The Universe software essentially allows any program to be turned into a Gym environment by launching it behind a virtual desktop, avoiding the need for Universe to have direct access to the program's source code and other protected internal data.
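
The agent-environment loop that Gym standardizes can be sketched without installing anything. The toy environment below is hypothetical (it is not part of Gym or Universe); it only mimics the shape of the classic Gym interface, where `reset()` returns an observation and `step(action)` returns an observation, a reward, a done flag, and an info dictionary.

```python
import random


class EchoEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface.

    The agent observes a coin (0 or 1) and is rewarded for echoing it back.
    """

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin  # initial observation

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0  # feedback from the "tutor"
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        done = self.steps >= self.episode_length
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward collected by the policy."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward


score = run_episode(EchoEnv(), policy=lambda observation: observation)
print(score)
```

A learning agent would replace the fixed policy with one that adjusts itself from the rewards; in Universe the observations would be screen pixels and the actions keyboard and mouse events.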

OpenAI perceives such interaction as a validation for artificial intelligence: many applications are essentially micro-virtual worlds and exposing AI learning techniques to them will lead to more trained agents, capable of tackling a diverse range of (game) problems quickly and well. Being able to master new, unfamiliar environments in this way is a first step toward general intelligence, allowing AI agents to “anticipate,” rather than forever getting stuck in a singular “single task” loop.

However, as much as Universe is a unique experience vessel for artificial intelligence, it is a unique visual experience vessel, enabling an agent to interact with any external software application via pixels (using keyboard and mouse), with each of these applications constituting a different HCI environment source. It gives access to a vast digital universe full of a variety of visual training tasks.

But isn’t it missing out on all the fun of a full tactile experience? Shouldn’t there be a digitized somatosensory training platform for AI agents, to recognize and interpret the myriad of tactile stimuli and grasp the experience of a physical world? The somatosensory system is the part of the central nervous system that is involved with decoding a wide range of tactile stimuli, comprising object recognition, texture discrimination, sensory-motor feedback, and eventually inter-social communication exchange, for our perception of and reaction to stimuli originating outside and inside our body and for the perception and control of body position and balance. One of the more essential aspects of general intelligence that gives us a common-sense understanding of the world is being placed in the environment and being able to interact with things in it: embedded in all of us is the instinctual ability to tell apart the mechanical forces acting upon the skin (temperature, texture, intensity of the tactile stimuli).

Our brain is indeed the core of all human thought and memory, constantly organizing, identifying, and perceiving the environment that surrounds us and interpreting it through our senses in the form of a data flow. And yet, studies have taught us that multiple senses can stimulate the central nervous system. An estimated 78% (only) of the data flow perceived by the brain is visual, while the remainder originates from sound (12%), touch (5%), smell (2.5%), and taste (2.5%), and that is assuming that we have deciphered all of the senses. So by training general AI purely via visual interaction, will we be getting a 78% general artificial intelligence? Enter the “embodied cognition” theory.

Embodied Cognition

Embodied cognition is a research theory concerned with the vast difference made by having an active body and being situated in a structured environment suited to the kinds of tasks that the brain has to perform in order to support adaptive task success. Here I use the term to refer to the existence of a memory system that encodes data about an agent’s motor and sensory competencies, stressing the importance of action for cognition, in such a way that the agent is able to interact tangibly with the physical world. The aspects of the agent's body beyond its brain play a significant causative and physically integral role in its cognitive processing. The only way to understand the mind, how it works, and subsequently train it is to consider the body and what helps the body and mind to function as one.

This approach is in line with a biological learning pattern based on “Darwinian selection,” which proposes that intelligence can only be measured in the context of the surrounding environment of the organism studied: “…we must always consider the embodiment of any intelligent system. The preferred embodiment reflects that the mind and its surrounding environment (including the physical body of the individual) are inseparable and that intelligence only exists in the context of its surrounding environment.”

Stacked Neural Networks Must Emulate Evolution’s Hierarchical Complexity (Commons, 2008)

Current notions of neural networks (NNs) are indeed based on known evolutionary processes of executing tasks and share some properties of biological NNs in their attempt to tackle general problems, but they take biology as architectural inspiration without necessarily copying a real biological system closely. One of the first design steps toward AI NNs that can closely imitate general intelligence is to follow the model of hierarchical complexity (HC) in terms of data acquisition. Stacked NNs based on this model could imitate evolution's environmental/behavioral processes and reinforcement learning (RL). However, computer-implemented systems or robots generally do not exhibit generalized higher learning adaptivity, that is, the capacity to go from learning one ability to learning another without dedicated programming.

Established NNs are limited for two reasons. The first problem is that AI models are based on the notion of Turing machines. Almost all AI models are based on words or text. But Turing machines are not enough to really produce intelligence. At the lowest stages of development, they need effectors that produce a variety of responses: movement, grasping, emoting, and so on. They must have extensive sensors to take in more from the environment. The second problem is that, although Carpenter and Grossberg's (1990, 1992) neural networks were meant to model simple behavioral processes, the processes they were to model were too complex. This resulted in NNs that were relatively unstable and not highly adaptable. When one looks at evolution, however, one sees that the first NNs that existed were, for example, in Aplysia, Cnidarians (Phylum Cnidaria), and worms. They were specialized to perform just a few tasks even though some general learning was possible.

Animals, including humans, pass through a series of ordered stages of development (see “Introduction to the Model of Hierarchical Complexity,” World Futures, 64: 444-451, 2008). Behaviors performed at each higher stage of development always successfully address task requirements that are more hierarchically complex than those required by the immediately preceding order of hierarchical complexity. Movement to a higher stage of development occurs by the brain combining, ordering, and transforming the behavior used at the preceding stage. This combining and ordering of behaviors thus must be non-arbitrary.

Somatosensory System Emulation

Neuroscience has classified specific regions, processes, and interactions involved in memory and reasoning down to the molecular level. Neurons and synapses are both actively involved in thought and memory, and with the help of brain imaging technology (e.g., Magnetic Resonance Imaging (MRI), Nuclear Magnetic Resonance Imaging, or Magnetic Resonance Tomography (MRT)), brain activity can be analyzed at the molecular level. All perceived data in the brain is represented in the same way, through the electrical firing patterns of neurons. The learning mechanism is also the same: memories are constructed by strengthening the connections between neurons that fire together, using a biochemical process known as long-term potentiation. Recently, atomic magnetometers have enabled the development of inexpensive and portable MRI instruments without the large magnets used in traditional MRI machines to image parts of the human anatomy, including the brain. There are over 10 billion neurons in the brain, each of which has synapses that are involved in memory and learning, which can also be analyzed by brain imaging methods, soon in real time. Studies suggest that new brain cells are created when one learns something new by physically interacting with one's environment. Whenever stimuli in the environment, or a thought, make a significant enough impact on the brain's perception, new neurons are created. During this process synapses carry on electro-chemical activities that directly reflect activity related to both memory and thought, from a tactile point of sensation. The sense of touch, weight, and all other tactile sensory stimuli need to be implemented as the concrete “it” value that is assigned to an agent by the nominal concept.

By reconstructing 3D neuroanatomy from molecular-level data, sensory activity in the brain can be detected, measured, stored, and reconstructed for a subset of the neural projections, generated by an automated segmentation algorithm, to convey the neurocomputational sensation to an AI agent. The existence of such a somatosensory, Universe-like database, focused on training AI agents beyond visual interaction, may bring us closer to the 100% general AI.
