More recently, researchers in cognitive psychology and artificial intelligence (AI) have explored intuitive physics in infants’ object perception, asking whether linking intuitive physics approaches to AI research could yield further theoretical and practical applications, in particular autonomous systems that learn and think like humans. The particular context of intuitive physics explored herein is infants’ innate understanding, soon after birth, of how inanimate objects persist in time and space and otherwise follow the principles of persistence, inertia and gravity (the spatio-temporal configuration of physical concepts), an understanding that arises via domain-specific perceptual causality (Caramazza & Shelton, 1998). The overview is structured around intuitive physics techniques that use cognitive (neural) networks, with the objective of harnessing our understanding of how artificial agents may emulate aspects of human (infant) cognition in a general-purpose physics simulator for a wide range of everyday judgments and tasks.
Such neural networks (deep learning networks in particular) can generally be characterized as models composed of many simple units organized in a number of layers of representation, whose connection strengths are gradually refined as more data is introduced. By mimicking the brain’s biological neural networks, computational models that rapidly learn, improve and apply their learning to new tasks in unstructured real-world environments could play a major role in enabling future software and hardware (robotic) systems to make better inferences from smaller amounts of training data.
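The idea of gradually refining connection strengths as data arrives can be illustrated with a deliberately tiny example. The sketch below is not a deep network: it assumes a single linear unit trained by stochastic gradient descent on a squared-error loss, and the function name, training data, learning rate and epoch count are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: one linear unit trained by stochastic gradient descent.
# The data, learning rate, and epoch count are assumptions chosen for this example.

def train_linear_unit(samples, learning_rate=0.05, epochs=200):
    """Fit target ~ weight * x + bias by repeatedly nudging both parameters."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = weight * x + bias
            error = prediction - target
            # Gradient of the squared error 0.5 * error**2 w.r.t. each parameter.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias


# Hypothetical training data generated from the rule y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
weight, bias = train_linear_unit(samples)
```

Each pass over the data moves the weight and bias slightly toward values that reduce the prediction error. Deep networks apply the same principle to many layers of units at once.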
At a general level, intuitive physics, naïve physics or folk physics (terms used here synonymously) is the universally similar human perception of fundamental physical phenomena, or the intuitive (innate) understanding all humans have about objects in the physical world. More specifically, intuitive physics is defined as "...the knowledge underlying the human ability to understand the physical environment and interact with objects and substances that undergo dynamic state changes, making at least approximate predictions about how observed events will unfold" (Kubricht, Holyoak & Lu, 2017).
During the past few decades, motivated by technological advances (brain imaging, eye gaze detection and reaction time measurement in particular), several researchers have used empirical physiological data to establish guiding principles on how innate core concepts constrain the knowledge systems that emerge in the infant brain: the principles of gravity, inertia, and persistence (with its corollaries of solidity, continuity, cohesion, boundedness, and unchangeableness). To quantify infants’ innate reaction to a particular stimulus, researchers have relied on the concept of habituation, a decrease in responsiveness to a stimulus after repeated exposure to it (e.g., a shorter total looking time at a repeatedly presented face, object or image). Habituation is thus operationalized as the amount of time an infant allocates to a stimulus, with less familiar stimuli receiving more attention: when a new stimulus is introduced and perceived as different, the infant responds to it for longer (Eimas, Siqueland, Jusczyk, & Vigorito, 1971). In the context of intuitive physics, to understand how ubiquitous infants’ intuitive understanding is, developmental researchers rely on violations of expectation about physical phenomena. If infants understand the implicit rules, then the more a newly introduced stimulus violates their expectations, the longer they will attend to it (suggesting that the looking preference reflects the infant’s ability to discriminate between the expected and the unexpected event).
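The looking-time logic described above can be made concrete with a small sketch. It assumes a relative habituation criterion (mean looking time over the last few trials falling to at most half of the mean over the first few) and a simple preference score for the test events. The function names, the window size, the 50% ratio and the example looking times are hypothetical illustrations, not values taken from the studies cited here.

```python
from statistics import mean


def is_habituated(looking_times, window=3, ratio=0.5):
    """True once the last `window` trials average at most `ratio` of the first `window`."""
    if len(looking_times) < 2 * window:
        return False  # not enough trials to compare two separate windows
    baseline = mean(looking_times[:window])
    recent = mean(looking_times[-window:])
    return recent <= ratio * baseline


def novelty_preference(expected_time, unexpected_time):
    """Share of test looking time spent on the unexpected event (0.5 = no preference)."""
    total = expected_time + unexpected_time
    if total <= 0:
        raise ValueError("total looking time must be positive")
    return unexpected_time / total


# Hypothetical looking times in seconds across successive habituation trials.
trials = [20.0, 18.0, 16.0, 12.0, 9.0, 8.0, 7.0]
habituated = is_habituated(trials)
# Hypothetical test-phase looking times: 6 s at the expected event, 12 s at the unexpected one.
preference = novelty_preference(6.0, 12.0)
```

A preference score reliably above 0.5 across infants would be read, under these assumptions, as longer looking at the unexpected event.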
Core principles
A variety of studies and theoretical work have defined what physical principles are and explored how they are represented during human infancy. The principle of inertia captures infants’ expectation that objects in motion follow an uninterrupted path without sporadic changes in velocity or direction (Kochukhova & Gredeback, 2007; Luo, Kaufman & Baillargeon, 2009). The principle of gravity refers to infants’ expectation that objects fall after being released (Needham & Baillargeon, 1993; Premack & Premack, 2003). Lastly, the principle of persistence guides infants’ expectation that objects obey continuity (objects cannot spontaneously appear or disappear into thin air), solidity (two solid objects cannot occupy the same space at the same time), cohesion (objects cannot spontaneously break apart as they move), boundedness (objects cannot spontaneously fuse with another object), and unchangeableness (objects cannot spontaneously change shape, pattern, size, or color) (Spelke et al., 1992; Spelke, Phillips & Woodward, 1995; Baillargeon, 2008). Extensive evidence from research on cognitive development in infancy shows that, across a wide range of situations, infants as young as two months old can predict the outcomes of physical interactions involving gravity, object permanence and conservation of shape and number (Spelke, 1990; Spelke, Phillips & Woodward, 1995).
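Two of the persistence corollaries, continuity and solidity, lend themselves to a simple computational reading. The sketch below assumes objects moving along one dimension, with each object described by its center position in every frame. The function names, the maximum plausible step per frame and the object widths are hypothetical parameters introduced only for illustration.

```python
def violates_continuity(positions, max_step=1.0):
    """True if the object jumps farther than `max_step` between consecutive frames."""
    return any(abs(b - a) > max_step for a, b in zip(positions, positions[1:]))


def violates_solidity(positions_a, positions_b, width_a=1.0, width_b=1.0):
    """True if the two objects' 1-D extents overlap in any frame."""
    min_gap = (width_a + width_b) / 2.0  # centers closer than this imply overlap
    return any(abs(a - b) < min_gap for a, b in zip(positions_a, positions_b))


# Hypothetical tracks: a smoothly moving object and one that "teleports".
smooth_track = [0.0, 0.5, 1.0, 1.5]
teleporting_track = [0.0, 0.5, 5.0]

# Hypothetical pair of objects approaching each other; the second pair passes through.
approaching = ([0.0, 1.0, 2.0], [5.0, 4.0, 3.5])
interpenetrating = ([0.0, 1.0, 2.0], [5.0, 3.0, 2.4])
```

A scene that triggers either check corresponds, in this toy setting, to the kind of "unexpected" event used in violation-of-expectation experiments.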
The concept of continuity was originally proposed and described by Elizabeth Spelke, one of the cognitive psychologists who established the intuitive physics movement. Spelke defined and formalized various experimental frameworks for object perception, such as occlusion and containment, both of which hinge on the continuity principle: infants’ innate recognition that objects exist continuously in time and space. Building on this existing knowledge, research on early development could lead to further insights into how humans attain their physical knowledge across childhood, adolescence and adulthood. For example, in one of their early containment event tests, Hespos and Baillargeon showed that infants who watched a tall cylinder fit into a tall container were unfazed by the expected physical outcome; in contrast, when infants were shown the tall cylinder being placed into a much shorter cylindrical container, the unexpected outcome confounded them. These findings demonstrated that infants as young as two months expect that containers cannot hold objects that physically exceed them in height (Hespos & Baillargeon, 2001). In the occlusion event test, infants’ object tracking mechanism was demonstrated by way of a moving toy mouse and a screen. The infants were first habituated to the toy moving back and forth behind a screen; then a part of the screen was removed so that the moving toy would come into the infants’ view. When the screen was removed, three-month-old infants were surprised because the mouse failed to be hidden when behind the screen.
In a test of the solidity concept, Baillargeon demonstrated that infants as young as three months of age, habituated to the expected event of a screen rotating back and forth between 0° and 180° until it was blocked by a box placed in its path (causing it to reverse direction and preventing it from completing its full range of motion), looked longer at the unexpected event in which the screen rotated up and then continued to rotate through the physical space where the box was positioned (Baillargeon, 1987).
Analogous to the findings demonstrating that infants are sensitive to violations of object solidity, the concept of cohesion captures infants’ ability to comprehend that objects are cohesive and bounded. Kestenbaum and colleagues demonstrated that infants successfully perceive the boundaries of partially overlapping or adjacent objects, and dishabituate when objects’ boundaries do not correspond in position to their actual physical limits (Kestenbaum, Termine, & Spelke, 1987).
Lastly, there is converging evidence that infants at the age of two months, and possibly earlier, have already developed appearance-based expectations about objects, such as that an object does not spontaneously change its color, texture, shape or size. When infants at the age of six months were presented with an Elmo face, they were able to discriminate a change in the area of the face (Brannon, Lutz, & Cordes, 2006).
Innateness
Evidently, infants possess the cognitive ability to discriminate between expected and unexpected object behavior and interaction from very early on. This innate knowledge of physical concepts has been argued to allow infants to track objects over time and discount physically implausible trajectories or states, which contributes to flexible generalization of knowledge to new tasks, surroundings and scenarios. In an evolutionary context, one may assume that this ability has been refined into an adaptive mechanism that allows infants to survive in new environments (Leslie & Keeble, 1987).
In this regard, the notion of innateness, first introduced by Plato, has long been the subject of debate in the psychology of intuitive physics. Previous studies have debated two accounts. On the first, "connectivity precedes function": the human brain comes prewired with a network of domain-specific connections that precedes the development of cortical regions specialized for specific cognitive functions and inputs (e.g., ones that control face recognition, scene processing or spatial depth inference) (Kamps, Hendrix, Brennan & Dilks, 2019). On the second, "function precedes connectivity": specific cognitive functions arise collectively from accumulating visual inputs and experiences (Arcaro & Livingstone, 2017). In one recent study, the researchers used resting-state functional magnetic resonance imaging (rs-fMRI), which measures the blood oxygenation level-dependent signal to evaluate spontaneous brain activity in a resting state, to assess connections between brain regions in infants as young as 27 days of age. The researchers reported that the face recognition and scene-processing cortical regions were interconnected, suggesting that innateness caused the formation of domain-specific functional modules in the developing brain. Additional supporting studies using auditory and tactile stimuli have also shown discriminatory responses in congenitally blind adults, presenting evidence that face- and scene-sensitive regions develop in the visual cortex without any visual input and, thus, may be innate (Büchel, Price, Frackowiak, & Friston, 1998). Contrary to the notion that connectivity precedes function, empirical work on infant monkeys has shown a discrepancy between the apparent innateness of visual maps and prewired domain-specific connections, suggesting that experience caused the formation of domain-specific functional modules in the infant monkeys’ temporal lobe (Arcaro & Livingstone, 2017).
Thus, the framework of intuitive physics is not restricted to humans: similar cognitive expectations are often invoked in other living species and even, subject to training, in computational models.
Intuitive physics and artificial intelligence
Despite recent progress in the field of artificial intelligence, humans are still arguably better than computational systems at general-purpose reasoning and at a broad range of object perception tasks that require inferences based on limited or no experience, such as spatial layout understanding, concept learning and concept prediction. The notion of intuitive physics has been a significant focus of artificial intelligence research as part of the effort to extend the cognitive abilities underlying human knowledge to algorithm-driven reasoning, decision-making and problem-solving. A fundamental challenge in robotics and artificial intelligence today is building robots that can imitate human spatial or object inference and adapt to an everyday environment as successfully as an infant. Specifically, as part of recent advances in machine learning and deep learning, researchers have begun to explore how to build neural “intuitive physics” models that can make predictions about stability, collisions, forces and velocities from static and dynamic visual inputs, or from interactions with a real or simulated environment. Such knowledge-based, probabilistic simulation models could therefore be used both to understand the cognitive and neural underpinnings of naive physics in humans and to provide artificial intelligence systems (e.g., autonomous vehicles) with higher levels of perception, inference and reasoning capabilities.
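As a rough illustration of what a probabilistic simulation of stability might look like, the sketch below estimates how likely a stack of blocks is to fall when the observer’s estimate of each block’s position is noisy. It is a toy under stated assumptions rather than any published model: blocks are rigid, have equal mass and equal width, and are stacked along one horizontal dimension. Perceptual uncertainty is assumed to be Gaussian, and the function names, noise level, sample count and seed are hypothetical.

```python
import random


def is_stable(centers, width=1.0):
    """centers: x-centers of equal-mass, equal-width blocks, listed bottom to top.

    The stack is treated as stable if, at every level, the center of mass of all
    blocks above lies within the horizontal extent of the supporting block.
    """
    for i in range(len(centers) - 1):
        above = centers[i + 1:]
        center_of_mass = sum(above) / len(above)
        if abs(center_of_mass - centers[i]) > width / 2.0:
            return False
    return True


def probability_of_falling(centers, width=1.0, noise=0.1, samples=1000, seed=0):
    """Monte Carlo estimate: fraction of noisy percepts of the stack judged unstable."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    falls = 0
    for _ in range(samples):
        perceived = [x + rng.gauss(0.0, noise) for x in centers]
        if not is_stable(perceived, width):
            falls += 1
    return falls / samples


# Hypothetical towers: one nearly aligned, one with a large overhang at each level.
aligned_tower = [0.0, 0.1, 0.2]
leaning_tower = [0.0, 0.9, 1.8]
```

With zero noise the estimate reduces to the deterministic check; with noise it yields a graded judgment between 0 and 1, which is one way to express approximate rather than exact physical prediction.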
Intuitive physics, or the spatio-temporal configuration of physical concepts of objects (arrangements of objects, material classification of objects, and the motion of objects and substances or its absence), forms a fundamental building block of complex cognitive frameworks, which motivates its further investigation, analysis and understanding. In the field of artificial intelligence specifically, there has been growing interest in the origins and development of such frameworks, an attempt originally described by Hayes: "I propose the construction of a formalization of a sizable portion of common-sense knowledge about the everyday physical world: about objects, shape, space, movement, substances (solids and liquids), time..." (Hayes, 1985).
However, despite its potential benefits for solving physics-related tasks, the practical emulation of intuitive physics remains immature: neural “intuitive physics” models are still not fully developed or understood in computational settings, focus mainly on controlled physics-engine reconstruction and, in contrast to infant learning, require a vast amount of training data. Given that existing computational models are narrow problem solvers that complete the same task precisely over and over again, emulating infants’ intuitive physics abilities could give technology researchers and developers the opportunity to design solutions for a broader set of conditions with less training data, fewer resources and less time than is currently required (e.g., in self-driving technology development). For deep networks trained on physics-related data, it is yet to be shown whether models can correctly integrate object concepts and generalize acquired knowledge (general physical properties, forces and Newtonian dynamics) beyond their training contexts in an unstructured environment.
Future directions
Further attempts to integrate intuitive physics and deep learning models are desirable, specifically in the domain of object perception. By characterizing the differences between how infants acquire knowledge via an “intuitive physics engine” and how artificial agents do, such an engine could one day be adapted into existing and future deep learning networks. Even at a very young age, human infants seem to possess a remarkable (innate) set of skills for learning rich conceptual models. Whether such models can be successfully built into artificial systems with the type and quantity of data accessible to infants is not yet clear. However, the combination of intuitive physics and machine (deep) learning could be a significant step towards computational models that learn in a more human-like way.