Multi-Agent Systems with Rollback Mechanisms

Enterprise demand for AI today isn’t about slotting in isolated models or adding another conversational interface. It’s about navigating workflows that are inherently messy: supply chains that pivot on volatile data, financial transactions requiring instantaneous validation, or medical claims necessitating compliance with compounding regulations. In these high-stakes, high-complexity domains, agentic and multi-agent systems (MAS) offer a structured approach to these challenges with intelligence that scales beyond individual reasoning. Rather than enforcing top-down logic, MAS behave more like dynamic ecosystems. Agents coordinate, collaborate, sometimes compete, and learn from each other to unlock forms of system behavior that emerge from the bottom up. Autonomy is powerful, but it also creates new fragilities in system reliability and data consistency, particularly in the face of failures or errors.

Take a financial institution handling millions of transactions a day. The workflow demands market analysis, regulatory compliance, trade execution, and ledger updates with each step reliant on different datasets, domain knowledge, and timing constraints. Trying to capture all of this within a single, monolithic AI model is impractical; the task requires decomposition into manageable subtasks, each handled by a tailored component. MAS offer exactly that. They formalize a modular approach, where autonomous agents handle specialized subtasks while coordinating toward shared objectives. Each agent operates with local context and local incentives, but participates in a global system dynamic. These systems are not just theoretical constructs but operational priorities, recalibrating how enterprises navigate complexity. But with that autonomy comes a new category of risk. AI systems don’t fail cleanly: a misclassification in trade validation or a small error in compliance tagging can ripple outward with real-world consequences—financial, legal, reputational. Rollback mechanisms serve as a counterbalance. They let systems reverse errors, revert to stable states, and preserve operational continuity. But as we embed more autonomy into core enterprise processes, rollback stops being a failsafe and starts becoming one more layer of coordination complexity.

Core Structure of MAS

A multi-agent system is, at its core, a combination of autonomous agents, each engineered for a narrow function, yet designed to operate in concert. In a supply chain setting, for example, one agent might forecast demand using time-series analysis, another optimize inventory with constraint solvers, and a third schedule logistics via graph-based routing. These agents are modular, communicating through standardized interfaces—APIs, message queues like RabbitMQ, or shared caches like Redis—so that the system can scale and adapt. Coordination is handled by an orchestrator, typically implemented as a deterministic state machine, a graph-based framework like LangGraph, or a distributed controller atop Kubernetes. Its job is to enforce execution order and resolve dependencies, akin to a workflow engine. In trading systems, for example, this means ensuring that market analysis precedes trade execution, preventing premature actions on stale or incomplete information. State management underpins this coordination through a shared context, typically structured as documents in distributed stores like DynamoDB or MongoDB or, when stronger guarantees are needed, in systems like CockroachDB.
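
To make the shape of this concrete, here is a minimal, illustrative sketch (with hypothetical interfaces, not any specific framework's API) of an orchestrator enforcing execution order over a shared context:

```java
// Illustrative sketch only: the orchestrator enforces execution order and passes a
// shared context between specialized agents, analogous to one step of a workflow engine.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface Agent {
    String name();
    // Each agent reads from and writes to the shared context; a real system would back
    // this with a distributed store (DynamoDB, MongoDB, CockroachDB) rather than a Map.
    void execute(Map<String, Object> sharedContext) throws Exception;
}

final class Orchestrator {
    private final List<Agent> pipeline;

    Orchestrator(List<Agent> pipeline) {
        this.pipeline = pipeline; // order encodes dependencies: analysis before execution
    }

    Map<String, Object> run() {
        Map<String, Object> context = new LinkedHashMap<>();
        for (Agent agent : pipeline) {
            try {
                agent.execute(context);
            } catch (Exception e) {
                // In a production system, this is where rollback coordination would begin.
                throw new IllegalStateException("Agent failed: " + agent.name(), e);
            }
        }
        return context;
    }
}
```

In a trading workflow, the pipeline order itself encodes the constraint that market analysis must complete before trade execution can run.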

The analytical challenge lies in balancing modularity with coherence. Agents must operate independently to avoid bottlenecks, yet their outputs must align to prevent divergence. Distributed systems principles like event sourcing and consensus protocols become essential tools for maintaining system-level coherence without collapsing performance. In the context of enterprise applications, the necessity of robust rollback mechanisms within multi-agent systems cannot be overstated. These mechanisms are essential for preventing data corruption and inconsistencies that can arise from individual agent failures, software errors, or unexpected interactions. When one agent fails or behaves unexpectedly, the risk isn’t local. It propagates. For complex, multi-step tasks that involve the coordinated actions of numerous agents, reliable rollback capabilities ensure the integrity of the overall process, allowing the system to recover gracefully from partial failures without compromising the entire operation.

Rollback Mechanisms in MAS

The probabilistic outputs of AI agents, driven by models like fine-tuned LLMs or reinforcement learners, introduce uncertainty absent in deterministic software. A fraud detection agent might errantly flag a legitimate transaction, or an inventory agent might misallocate stock. Rollback mechanisms mitigate these risks by enabling the system to retract actions and restore prior states, drawing inspiration from database transactions but adapted to AI’s nuances.

The structure of rollback is a carefully engineered combination of processes, each contributing to the system’s ability to recover from errors with precision and minimal disruption. At its foundation lies the practice of periodically capturing state snapshots that encapsulate the system’s configuration—agent outputs, database records, and workflow variables. These snapshots form the recovery points, stable states the system can return to when things go sideways. They’re typically stored in durable, incrementally updatable systems like AWS S3 or ZFS, designed to balance reliability with performance overhead. Choosing how often to checkpoint is its own trade-off. Too frequent, and the system slows under the weight of constant I/O; too sparse, and you risk losing ground when things fail. To reduce snapshot resource demands, MAS can use differential snapshots (capturing only changes) or selectively log critical states, balancing rollback needs with efficiency. It’s also worth noting that while rollback in AI-driven MAS inherits ideas from database transactions, it diverges quickly due to the probabilistic nature of AI outputs. Traditional rollbacks are deterministic: a set of rules reverses a known state change. In contrast, when agents act based on probabilistic models, their outputs are often uncertain. A fraud detection agent might flag a legitimate transaction based on subtle statistical quirks. An inventory optimizer might misallocate stock due to noisy inputs. That’s why rollback in MAS often needs to be triggered by signals more nuanced than failure codes: confidence thresholds, anomaly scores, or model-based diagnostics like variational autoencoders (VAEs) can serve as indicators that something has gone off-track.
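
A simplified sketch of what checkpointing plus a confidence-based rollback trigger might look like, assuming an in-memory state store standing in for durable snapshot storage and an arbitrary anomaly cutoff:

```java
// Simplified sketch: in-memory snapshots stand in for durable stores like S3 or ZFS.
// The rollback trigger is a graded signal (confidence / anomaly score), not a failure code.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class StateManager {
    private final Map<String, Object> state = new HashMap<>();
    private final Deque<Map<String, Object>> snapshots = new ArrayDeque<>();

    void put(String key, Object value) { state.put(key, value); }
    Object get(String key) { return state.get(key); }

    // Full snapshot: copy the entire workflow state as a recovery point.
    void checkpoint() { snapshots.push(new HashMap<>(state)); }

    // Restore the most recent recovery point, discarding work done since.
    void rollback() {
        if (!snapshots.isEmpty()) {
            state.clear();
            state.putAll(snapshots.pop());
        }
    }
}

final class RollbackPolicy {
    private final double confidenceFloor;

    RollbackPolicy(double confidenceFloor) { this.confidenceFloor = confidenceFloor; }

    // Probabilistic outputs mean "success" is graded: low confidence or a high anomaly
    // score is treated as a rollback signal even when no exception was thrown.
    // The 0.9 cutoff here is an illustrative assumption, not a recommended value.
    boolean shouldRollBack(double confidence, double anomalyScore) {
        return confidence < confidenceFloor || anomalyScore > 0.9;
    }
}
```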

In modern MAS, every action is logged, complete with metadata like agent identifiers, timestamps, and input hashes via systems such as Apache Kafka. These logs do more than support debugging; they create a forensic trail of system behavior, essential for auditability and post-hoc analysis, particularly in regulated domains like finance and healthcare. Detecting when something has gone wrong in a system of autonomous agents isn’t always straightforward. It might involve checking outputs against hard-coded thresholds, leveraging statistical anomaly detection models like VAEs, or incorporating human-in-the-loop workflows to catch edge cases that models miss. Once identified, rollback decisions are coordinated by an orchestrator that draws on these logs and the system’s transactional history to determine what went wrong, when, and how to respond.
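
As a rough illustration, an action logger along these lines might emit structured records to a Kafka topic using the standard producer API; the topic name and record fields below are assumptions, not a standard schema:

```java
// Sketch of structured action logging: agent id, action, timestamp, and input hash are
// sent to an (assumed) "agent-actions" topic. Real deployments would typically use a
// schema registry (e.g., Avro) rather than ad-hoc JSON strings.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.HexFormat;
import java.util.Properties;

final class ActionLogger implements AutoCloseable {
    private final KafkaProducer<String, String> producer;

    ActionLogger(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    void log(String agentId, String action, String input) throws Exception {
        // Hashing the input lets auditors verify what the agent saw without storing raw payloads.
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        String record = String.format(
                "{\"agent\":\"%s\",\"action\":\"%s\",\"ts\":\"%s\",\"inputHash\":\"%s\"}",
                agentId, action, Instant.now(), HexFormat.of().formatHex(digest));
        producer.send(new ProducerRecord<>("agent-actions", agentId, record));
    }

    @Override
    public void close() { producer.close(); }
}
```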

Rollback itself is a toolkit of strategies, selected based on the failure mode and the system’s tolerance for disruption. One approach, compensating transactions, aims to undo actions by applying their logical inverse: a payment is reversed, a shipment is recalled. But compensating for AI-driven decisions means accounting for uncertainty. Confidence scores, ensemble agreement, or even retrospective model audits may be needed to confirm that an action was indeed faulty before undoing it. Another approach, state restoration, rolls the system back to a previously captured snapshot—resetting variables to a known-good configuration. This works well for clear-cut failures, like misallocated inventory, but it comes at a cost: any valid downstream work done since the snapshot may be lost. To avoid this, systems increasingly turn to partial rollbacks, surgically undoing only the affected steps while preserving valid state elsewhere. In a claims processing system, for instance, a misassigned medical code might be corrected without resetting the entire claim’s status, maintaining progress elsewhere in the workflow. But resilience in multi-agent systems isn’t just about recovering, it’s about recovering intelligently. In dynamic environments, reverting to a past state can be counterproductive if the context has shifted. Rollback strategies need to be context-aware, adapting to changes in data, workflows, or external systems, and ensuring that the system is restored to a state that is still relevant and consistent with current environmental conditions. Frameworks like ReAgent provide an early demonstration of what this could look like: reversible collaborative reasoning across agents, with explicit backtracking and correction pathways. Rather than merely reverting to a prior state, agents backtrack and revise their reasoning in light of new evidence, a form of intelligent rollback that is more nuanced than simple state restoration.
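
A compact sketch of how these strategies differ in code, using hypothetical step interfaces; the dependency check that scopes a partial rollback is what distinguishes it from whole-state restoration:

```java
// Illustrative only: compensation applies a logical inverse, restoration resets to a
// snapshot (see the StateManager sketch above), and partial rollback compensates only
// the steps affected by the faulty one.
import java.util.List;

interface WorkflowStep {
    String id();
    void apply();
    void compensate();                        // logical inverse: reverse a payment, recall a shipment
    boolean affectedBy(String faultyStepId);  // dependency check used to scope a partial rollback
}

final class RollbackStrategies {
    // Compensating transaction: undo one confirmed-faulty action.
    static void compensate(WorkflowStep step) {
        step.compensate();
    }

    // Partial rollback: undo only the faulty step and its dependents, newest first,
    // preserving valid work elsewhere in the workflow.
    static void partialRollback(List<WorkflowStep> executed, String faultyStepId) {
        for (int i = executed.size() - 1; i >= 0; i--) {
            WorkflowStep step = executed.get(i);
            if (step.id().equals(faultyStepId) || step.affectedBy(faultyStepId)) {
                step.compensate();
            }
        }
    }
}
```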

Building robust rollback in MAS requires adapting classical transactional principles—atomicity, consistency, isolation, durability (ACID)—to distributed AI contexts. Traditional databases enforce strict ACID guarantees through centralized control, but MAS often trade strict consistency for scalability, favoring eventual consistency in most interactions. Still, for critical operations, MAS can lean on distributed coordination techniques like two-phase commits or the Saga pattern to approximate ACID-like reliability without introducing system-wide bottlenecks. The Saga pattern, in particular, is designed to manage large, distributed transactions. It decomposes them into a sequence of smaller, independently executed steps, each scoped to a single agent. If something fails midway, compensating transactions are used to unwind the damage, rolling the system back to a coherent state without requiring every component to hold a lock on the global system state. This autonomy-first model aligns well with how MAS operate: each agent governs its own local logic, yet contributes to an eventually consistent global objective. Emerging frameworks like SagaLLM advance this further by tailoring saga-based coordination to LLM-powered agents, introducing rollback hooks that are not just state-aware but also constraint-sensitive, ensuring that even when agents fail or outputs drift, the system can recover coherently. These mechanisms help bridge the gap between high-capacity, probabilistic reasoning and the hard guarantees needed for enterprise-grade applications involving multiple autonomous agents.
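
A minimal saga coordinator might look something like the following sketch (this is the general pattern, not the SagaLLM API): each step is paired with a compensation, and a mid-sequence failure unwinds completed steps in reverse order rather than holding a global lock across agents.

```java
// Minimal saga sketch: steps are scoped to individual agents; on failure, compensations
// for completed steps run newest-first to return the system to a coherent state.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class Saga {
    record Step(String name, Runnable action, Runnable compensation) {}

    private final List<Step> steps;

    Saga(List<Step> steps) { this.steps = steps; }

    void execute() {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException failure) {
                // Unwind: apply compensations in reverse order of execution.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                throw failure;
            }
        }
    }
}
```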

To ground this, consider a large bank deploying an MAS for real-time fraud detection. The system might include a risk-scoring agent (such as a fine-tuned BERT model), a compliance agent enforcing AML rules via symbolic logic, and a settlement agent updating ledger entries via blockchain APIs. A Kubernetes-based orchestrator sequences these agents, with Kafka streaming in transactional data and DynamoDB maintaining distributed state. Now suppose the fraud detection agent flags a routine payment as anomalous. The error is caught, either via statistical anomaly detection or a human override, and rollback is initiated. The orchestrator triggers a compensating transaction to reverse the ledger update, a snapshot is restored to reset the account state, and the incident is logged for regulatory audits. In parallel, the system might update its anomaly model or confidence thresholds—learning from the mistake rather than simply erasing it. Integrating these AI-native systems with legacy infrastructure adds another layer of complexity. Middleware like MuleSoft becomes essential, not just for translating data formats or bridging APIs, but for managing latency, preserving transactional coherence, and ensuring the MAS doesn’t break when it encounters the brittle assumptions baked into older systems.

The stochastic nature of AI makes rollback an inherently fuzzy process. A fraud detection agent might assign a 90% confidence score to a transaction and still be wrong. Static thresholds risk swinging too far in either direction: overreacting to benign anomalies or missing subtle but meaningful failures. While techniques like VAEs are often explored for anomaly detection, other methods, such as statistical process control or reinforcement learning, offer more adaptive approaches. These methods can calibrate rollback thresholds dynamically, tuning themselves in response to real-world system performance rather than hardcoded heuristics. Workflow topology also shapes rollback strategy. Directed acyclic graphs (DAGs) are the default abstraction for modeling MAS workflows, offering clear scoping of dependencies and rollback domains. But real-world workflows aren’t always acyclic. Cyclic dependencies, such as feedback loops between agents, require more nuanced handling. Cycle detection algorithms or formal methods like Petri nets become essential for understanding rollback boundaries: if an inventory agent fails, for instance, the system might need to reverse only downstream logistics actions, while preserving upstream demand forecasts. Tools like Apache Airflow and LangGraph provide these DAG-based workflow abstractions in practice. What all this points to is a broader architectural principle: MAS design is as much about managing uncertainty and constraints as it is about building intelligence. The deeper challenge lies in formalizing these trade-offs—balancing latency versus consistency, memory versus compute, automation versus oversight—and translating them into robust architectures.
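
As an illustration of rollback scoping on a DAG, the sketch below computes the downstream set of a failed step from a plain adjacency map; real systems would derive this graph from their workflow definitions (Airflow, LangGraph) rather than build it by hand.

```java
// Sketch of rollback-domain scoping: given the failed node, collect only its downstream
// dependents (steps whose inputs may be tainted) and leave upstream work untouched.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RollbackScope {
    static Set<String> downstreamOf(String failedNode, Map<String, List<String>> dag) {
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(failedNode));
        while (!frontier.isEmpty()) {
            String node = frontier.pop();
            for (String child : dag.getOrDefault(node, List.of())) {
                if (affected.add(child)) {
                    frontier.push(child);
                }
            }
        }
        // Example: with forecast -> inventory -> logistics, a failure at "inventory"
        // taints "logistics" but leaves the upstream "forecast" result intact.
        return affected;
    }
}
```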

Versatile Applications

In supply chain management, a domain defined by uncertainty and interdependence, MAS can be deployed to optimize complex logistics networks, manage inventory levels dynamically, and improve communication and coordination between various stakeholders, including suppliers, manufacturers, and distributors. Rollback mechanisms are particularly valuable in this context for recovering from unexpected disruptions such as supplier failures, transportation delays, or sudden fluctuations in demand. If a critical supplier suddenly ceases operations, a MAS with rollback capabilities could revert to a previous state in which alternate suppliers had been identified and contingencies pre-positioned, minimizing the impact on the production schedule. Similarly, if a major transportation route becomes unavailable due to unforeseen circumstances, the system could roll back to a prior plan and activate pre-arranged contingency routes. We’re already seeing this logic surface in MAS-ML frameworks that combine MAS with machine learning techniques, pairing adaptive learning with structured coordination to give supply chains a form of situational memory.

Smart/advanced manufacturing environments, characterized by interconnected machines, autonomous robots, and intelligent control systems, stand to benefit even more. Here, MAS can coordinate the activities of robots on the assembly line, manage complex production schedules to account for shifting priorities, and optimize the allocation of manufacturing resources. Rollback mechanisms are crucial for ensuring the reliability and efficiency of these operations by providing a way to recover from equipment malfunctions, production errors, or unexpected changes in product specifications. If a robotic arm malfunctions during a high-precision weld, a rollback mechanism could revert the affected components to their prior state and reassign the task to another available robot or a different production cell. The emerging concept of an Agent Computing Node (ACN) within multi-agent manufacturing systems offers a path for easier deployment of these capabilities. Embedding rollback at the ACN level could allow real-time scheduling decisions to unwind locally without disrupting global coherence, enabling factories that aren’t just smart, but more fault-tolerant by design.

In financial trading platforms, which operate in highly volatile and time-sensitive markets where milliseconds equate to millions and regulatory compliance is enforced in audit logs, MAS can serve as algorithmic engines behind trading, portfolio management, and real-time risk assessment. Rollback here effectively plays a dual role: operational safeguard and regulatory necessity. Rollback capabilities are essential for maintaining the accuracy and integrity of financial transactions, recovering from trading errors caused by software glitches or market anomalies, and mitigating the potential impact of extreme market volatility. If a trading algorithm executes a series of erroneous trades due to a sudden, unexpected market event, a rollback mechanism could reverse these trades and restore the affected accounts to their previous state, preventing significant financial losses. Frameworks like TradingAgents, which simulate institutional-grade MAS trading strategies, underscore the value of rollback not just as a corrective tool but as a mechanism for sustaining trust and interpretability in high-stakes environments.

In cybersecurity, multi-agent systems can be leveraged for automated threat detection, real-time analysis of network traffic for suspicious activities, and the coordination of defensive strategies to protect enterprise networks and data. MAS with rollback mechanisms are critical for enabling rapid recovery from cyberattacks, such as ransomware or data breaches, by restoring affected systems to a known clean state before the intrusion occurred. For example, if a malicious agent manages to infiltrate a network and compromise several systems, a rollback mechanism could restore those systems to a point in time before the breach, effectively neutralizing the attacker's actions and preventing further damage. Recent developments in Multi-Agent Deep Reinforcement Learning (MADRL) for autonomous cyber defense have begun to formalize this concept, treating “restore” as a deliberate, learnable action within a broader threat response strategy.

Looking Ahead

The ecosystem for MAS is evolving not just in capability but also in topology, with frameworks like AgentNet proposing fully decentralized paradigms where agents can evolve their capabilities and collaborate efficiently without relying on a central orchestrator. When there’s no global conductor, how do you coordinate recovery in a way that preserves system-level integrity? Recent directions explore equipping individual agents with the ability to roll back their actions and states locally and autonomously, contributing to the system's overall resilience without relying on a centralized recovery mechanism. The challenge lies in coordinating these individual rollback actions in a way that maintains the integrity and consistency of the entire multi-agent system.

Building scalable rollback mechanisms in large-scale MAS, which may involve hundreds or even thousands of autonomous agents operating in a distributed environment, is shaping up to be a significant systems challenge. The overhead associated with tracking state and logging messages to enable potential rollbacks starts to balloon as the number of agents and their interactions increase. Getting rollback to work at this scale requires new protocol designs that are not only efficient, but also resilient to partial failure and misalignment.

But the technical hurdles in enterprise settings are just one layer. There are still fundamental questions to be answered. Can rollback points be learned or inferred dynamically, tuned to the nature and scope of the disruption? What’s the right evaluation framework for rollback in MAS—do we optimize for system uptime, recovery speed, agent utility, or something else entirely? And how do we build mechanisms that allow for human intervention without diminishing the agents’ autonomy yet still ensure overall system safety and compliance?

More broadly, we need ways to verify the correctness and safety of these rollback systems under real-world constraints, not just in simulated testbeds, especially in enterprise deployments where agents often interact with physical infrastructure or third-party systems. As such, this becomes as much a question of system alignment with varying internal business processes and constraints as of engineering. For now, there’s still a gap between what we can build and what we should build—building rollback into MAS at scale requires more than just resilient code. It’s a test of how well we can align autonomous systems in a reliable, secure, and meaningfully integrated way against partial failures, adversarial inputs, and rapidly changing operational contexts.


Garbage Collection Tuning In Large-Scale Enterprise Applications

Garbage collection (GC) is one of those topics that feels like a solved problem until you scale it up to the kind of systems that power banks, e-commerce, logistics firms, and cloud providers. For many enterprise systems, GC is an invisible component: a background process that “just works.” But under high-throughput, latency-sensitive conditions, it surfaces as a first-order performance constraint. The market for enterprise applications is shifting: everyone’s chasing low-latency, high-throughput workloads, and GC is quietly becoming a choke point that separates the winners from the laggards.

Consider a high-frequency trading platform processing orders in microseconds. After exhausting traditional performance levers (scaling cores, rebalancing threads, optimizing code paths), unexplained latency spikes persisted. The culprit? GC pauses—intermittent, multi-hundred-millisecond interruptions from the JVM's G1 collector. These delays, imperceptible in consumer applications, are catastrophic in environments where microseconds mean millions. Over months, the engineering team tuned G1, minimized allocations, and restructured the memory lifecycle. Pauses became predictable. The broader point is that GC, long relegated to the domain of implementation detail, is now functioning as an architectural constraint with competitive implications. In latency-sensitive domains, it functions less like background maintenance and more like market infrastructure. Organizations that treat it accordingly will find themselves with a structural advantage. Those that don’t risk falling behind.

Across the enterprise software landscape, memory management is undergoing a quiet but significant reframing. Major cloud providers—AWS, Google Cloud, and Azure—are increasingly standardizing on managed runtimes like Java, .NET, and Go, embedding them deeply across their platforms. Kubernetes clusters now routinely launch thousands of containers, each with its own runtime environment and independent garbage collector running behind the scenes. At the same time, workloads are growing more demanding—spanning machine learning inference, real-time analytics, and distributed databases. These are no longer the relatively simple web applications of the early 2000s, but complex, large-scale systems with highly variable allocation behavior: allocation-heavy, latency-sensitive, and highly bursty. As a result, the old ‘set a heap size, pick a collector, move on’ mental model for GC tuning is breaking down under modern workloads. The market is beginning to demand more nuanced, adaptive approaches. In response, cloud vendors, consultancies, and open-source communities are actively exploring what modern memory management should look like at scale.

At its core, GC is an attempt to automate memory reclamation. It is the runtime’s mechanism for managing memory—cleaning up objects that are no longer in use. When memory is allocated for something like a trade order, a customer record, or a neural network layer, the GC eventually reclaims that space once it’s no longer needed. But the implementation is a compromise. In theory, this process is automatic and unobtrusive. In practice, it’s a delicate balancing act. The collector must determine when to run, how much memory to reclaim, and how to do so without significantly disrupting application performance. If it runs too frequently, it consumes valuable CPU resources. If it waits too long, applications can experience memory pressure and even out-of-memory errors. Traditional collection strategies—such as mark-and-sweep, generational, or copying collectors—each bring their own trade-offs. But today, much of the innovation is happening in newer collectors like G1, Shenandoah, ZGC, and Epsilon. These are purpose-built for scalability and low latency, targeting the kinds of workloads modern enterprises increasingly rely on. The challenge, however, is that these collectors are not truly plug-and-play. Their performance characteristics hinge on configuration details. Effective tuning often requires deep expertise and workload-specific knowledge—an area that’s quickly gaining attention as organizations push for more efficient and predictable performance at scale.

Take G1: the default garbage collector in modern Java. It follows a generational model, dividing the heap into young and old regions, but with a key innovation: it operates on fixed-size regions, allowing for incremental cleanup. The goal is to deliver predictable pause times—a crucial feature in enterprise environments where even a 500ms delay can have real financial impact. That said, G1 can be challenging to tune effectively. Engineers familiar with its inner workings know it offers a wide array of configuration options, each with meaningful trade-offs. Parameters like -XX:MaxGCPauseMillis allow developers to target specific latency thresholds, but aggressive settings can significantly reduce throughput. For instance, the JVM may shrink the heap or adjust survivor space sizes to meet pause goals, which can lead to increased GC frequency and higher allocation pressure. This often results in reduced throughput, especially under bursty or memory-intensive workloads. Achieving optimal performance typically requires balancing pause time targets with realistic expectations about allocation rates and heap sizing. Similarly, -XX:G1HeapRegionSize lets you adjust region granularity, but selecting an inappropriate value may lead to memory fragmentation or inefficient heap usage. Benchmark data from OpenJDK’s JMH suite, tested on a 64-core AWS Graviton3 instance, illustrates just how sensitive performance can be: in one workload, an untuned G1 configuration produced 95th-percentile GC pauses of around 300ms, while careful tuning of that same configuration reduced pauses significantly. The broader implication is clear: organizations with the expertise to deeply tune their runtimes unlock performance. Others leave it on the table.
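
For a sense of how such observations are made, the sketch below is a toy allocation probe, not the JMH suite itself; the launch flags in the comment are examples of the G1 parameters discussed above, and the allocation mix is an arbitrary assumption.

```java
// Toy probe, not a rigorous benchmark: run it under different G1 settings, e.g.
//   java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:G1HeapRegionSize=16m -Xmx4g GcPauseProbe
// and compare the reported collection counts/times (pair with -Xlog:gc* for per-pause detail).
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class GcPauseProbe {
    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        long iterations = 0;
        while (iterations++ < 50_000_000L) {
            // Mixed allocation profile: mostly short-lived garbage, some retained objects.
            byte[] shortLived = new byte[1024];
            shortLived[0] = (byte) iterations; // touch the array so it isn't trivially elided
            if (ThreadLocalRandom.current().nextInt(1000) == 0) {
                retained.add(new byte[64 * 1024]);
            }
            if (retained.size() > 10_000) {
                retained.subList(0, 5_000).clear(); // release half to create old-gen churn
            }
            if (iterations % 5_000_000 == 0) {
                report();
            }
        }
    }

    private static void report() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Accumulated collection count/time is a coarse proxy for pause impact.
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```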

Across the industry, runtime divergence is accelerating. .NET Core and Go are steadily gaining traction, particularly among cloud-native organizations. Each runtime brings its own approach to GC. The .NET CLR employs a generational collector with a server mode that strikes a good balance for throughput, but it tends to underperform in latency-sensitive environments. Go’s GC, on the other hand, is lightweight, concurrent, and optimized for low pause times—typically around 1ms or less under typical workloads. However, it can struggle with memory-intensive applications due to its conservative approach to memory reclamation. In a brief experiment with a Go-based microservice simulating a payment gateway (10,000 requests per second and a 1GB heap), default settings delivered 5ms pauses at the 99th percentile. Adjusting the GOMEMLIMIT setting to trigger more frequent cycles reduced pauses to 2ms, but at the cost of a 30% increase in memory usage (though results will vary depending on workload characteristics). As with many performance optimizations, the trade-offs are workload-dependent.

Contemporary workloads are more erratic. Modern systems stream events, cache large working sets, and process thousands of concurrent requests. The traditional enterprise mainstay (CRUD applications interacting with relational databases) is being replaced by event-driven systems, streaming pipelines, and in-memory data grids. Technologies like Apache Kafka are now ubiquitous, processing massive volumes of logs, while Redis and Hazelcast are caching petabytes of state. These modern systems generate objects at a rapid pace, with highly variable allocation patterns: short-lived events, long-lived caches, and everything in between. In one case, a logistics company running a fleet management platform on Kubernetes saw its Java pods struggle with full garbage collections every few hours, caused by an influx of telemetry data. After switching to Shenandoah, Red Hat’s low-pause collector, they saw GC pauses drop from 1.2 seconds to just 50ms. However, the improvement came at a cost—CPU usage increased by 15%, and they needed to rebalance their cluster to prevent hotspots. This is becoming increasingly common: latency improvements now have architectural consequences.

Vendor strategies are also diverging. The major players—Oracle, Microsoft, and Google—are all aware that GC can be a pain point, though their approaches vary. Oracle is pushing ZGC in OpenJDK, a collector designed to deliver sub-millisecond pauses even on multi-terabyte heaps. It’s a compelling solution (benchmarks from Azul show it maintaining stable 0.5ms pauses on a 128GB heap under heavy load) but it can be somewhat finicky: it performs best on a modern kernel with huge pages enabled (it doesn’t require them, but benefits from them), and its reliance on concurrent compaction demands careful management to avoid excessive CPU usage. Microsoft’s .NET team has taken a more incremental approach, focusing on gradual improvements to the CLR’s garbage collector. While this strategy delivers steady progress, it lags behind the more radical redesigns seen in the Java ecosystem. Google’s Go runtime stands apart, with a GC built for simplicity and low-latency performance. It’s particularly popular with startups, though it can be challenging for enterprises with more complex memory management requirements. Meanwhile, niche players like Azul are carving out a unique space with custom JVMs. Their flagship product, Zing, combines ZGC-like performance (powered by Azul’s proprietary C4 collector, comparable to ZGC in terms of pause times) with advanced diagnostics that many describe as exceptionally powerful. Azul’s “we tune it for you” value proposition seems to be resonating—their revenue grew over 95% over the past three years, according to their filings.

Consultancies are responding as well. The Big Four—Deloitte, PwC, EY, and KPMG—are increasingly building out teams with runtime expertise and now include GC tuning in digital transformation playbooks. Industry case studies illustrate the tangible benefits: one telco reportedly reduced its cloud spend by 20% by fine-tuning G1 across hundreds of nodes, while a major retailer improved checkout latency by 100ms after migrating to Shenandoah. Smaller, more technically focused firms like ThoughtWorks are taking an even deeper approach, offering specialized profiling tools and tailored workshops for engineering teams. So runtime behavior is no longer a backend concern—it’s a P&L lever.

The open-source ecosystem plays a vital dual role: it fuels GC innovation while fragmenting the tooling landscape. Many of today’s leading collectors, such as Shenandoah, ZGC, and G1, emerged from community-driven research efforts before becoming production-ready. However, a capability gap persists: tooling exists, but expertise is required to extract value from it. Utilities like VisualVM and Eclipse MAT provide valuable insights—heap dumps, allocation trends, and pause time metrics—but making sense of that data often requires significant experience and intuition. In one example, a 10GB heap dump from a synthetic workload revealed a memory leak caused by a misconfigured thread pool. While the tools surfaced the right signals, diagnosing and resolving the issue ultimately depended on hands-on expertise. Emerging projects like GCViewer and OpenTelemetry’s JVM metrics are improving visibility, but most enterprises still face a gap between data and diagnosis that’s increasingly monetized. For enterprises seeking turnkey solutions, the current open-source tooling often falls short. As a result, vendors and consultancies are stepping in to fill the gap—offering more polished, supported options, often at a premium.

One emerging trend worth watching: no-GC runtimes. Epsilon, a no-op collector available in OpenJDK, effectively disables garbage collection, allocating memory until exhaustion. While this approach is highly specialized, it has found a niche in environments where ultra-low latency is paramount: short-lived, high-throughput workloads where every microsecond counts. It’s a tactical tool: no GC means no pauses, but also no safety net. In a simple benchmark of allocating 100 million objects on a 1GB heap, Epsilon delivered about 20% higher throughput than G1—in a synthetic, allocation-heavy workload designed to avoid GC interruptions—with no GC pauses until the heap was fully consumed. That said, this approach demands precise memory sizing, and since Epsilon does not actually perform GC, the JVM shuts down when the heap is exhausted. In systems that handle large volumes of data and require high reliability, this behavior poses a significant risk: running out of memory could lead to crashes during critical operations, making it unsuitable for environments that demand continuous uptime and stability.
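
The following toy benchmark illustrates the pattern; the flags shown are the standard way to enable Epsilon in OpenJDK, and the allocation budget is deliberately kept below the heap size, since nothing is ever reclaimed.

```java
// Rough sketch of the allocation-only pattern Epsilon suits; assumed launch flags:
//   java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xms1g -Xmx1g EpsilonBurst
// Epsilon never reclaims memory, so exceeding the heap ends the run with an
// OutOfMemoryError and a JVM exit; the budget below stays under the 1GB heap.
public class EpsilonBurst {
    public static void main(String[] args) {
        final long budgetBytes = 800L * 1024 * 1024; // stay below the 1GB heap
        final int chunkSize = 4096;
        long allocated = 0;
        long start = System.nanoTime();
        while (allocated + chunkSize < budgetBytes) {
            byte[] chunk = new byte[chunkSize]; // pure allocation pressure, nothing retained
            chunk[0] = 1;                       // touch the array so it isn't trivially elided
            allocated += chunkSize;
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("Allocated %d MB with zero GC pauses in %d ms%n",
                allocated / (1024 * 1024), elapsedMs);
    }
}
```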

Rust represents a divergence in runtime philosophy: its ownership model frontloads complexity in exchange for execution-time determinism, eliminating the need for garbage collection entirely and giving developers fine-grained control over memory. It’s gaining popularity in systems programming, though enterprise adoption remains slow—retraining teams accustomed to Java or .NET is often a multi-year effort. Still, these developments are prompting a quiet reevaluation in some corners of the industry. Perhaps the challenge isn’t just tuning GC; it’s rethinking whether we need it at all in certain contexts.

Directionally, GC is now part of the performance stack, not a postscript. The enterprise software market appears to be at an inflection point. With AI workloads, raw latency and throughput alone are no longer differentiators; there’s a growing shift toward predictable performance and manual memory control. In this landscape, GC is emerging as a more visible and persistent bottleneck. Organizations that invest in performance, whether through specialized talent, intelligent tooling, or strategic vendor partnerships, stand to gain a meaningful advantage. Cloud providers will continue refining their managed runtimes with smarter defaults, but the biggest performance gains will likely come from deeper customization. Consultancies are expected to expand GC optimization as a service offering, and we’ll likely see more specialized vendors like Azul carving out space at the edges. Open-source innovation will remain strong, though the gap between powerful raw tools and enterprise-ready solutions may continue to grow. And in the background, there may be a gradual shift toward no-GC alternatives as workloads evolve in complexity and scale. Hardware changes (e.g., AWS Graviton) amplify memory management pressure: higher parallelism means more cores allocating more objects, and more stress on memory management systems. Ultimately, managed runtimes will improve, but improvements will mostly serve the median case. High-performance outliers will remain underserved—fertile ground for optimization vendors and open-source innovation.

For now, GC tuning doesn’t make headlines, but it does shape the systems that do, as it increasingly defines the boundary between efficient, scalable systems and costly, brittle ones. The organizations that master memory will move faster, spend less, and scale cleaner. Those that don’t may find themselves playing catch-up—wondering why performance lags and operational expenses continue to climb. GC isn’t a solved problem. It’s a leverage point—in a market this dynamic, even subtle shifts in infrastructure performance can have a meaningful impact over time.


Specialization and Modularity in AI Architecture with Multi-Agent Systems

The evolution from monolithic large language models (mono-LLMs) to multi-agent systems (MAS) reflects a practical shift in how AI can be structured to address the complexity of real-world tasks. Mono-LLMs, while impressive in their ability to process vast amounts of information, have inherent limitations when applied to dynamic environments like enterprise operations. They are inefficient for specialized tasks, requiring significant resources for even simple queries, and they are difficult to update and scale because every improvement impacts the entire system, leading to complex update cycles and reduced agility. Multi-agent systems, on the other hand, introduce a more modular and task-specific approach, enabling specialized agents to handle discrete problems with greater efficiency and adaptability.

This modularity is particularly valuable in enterprise settings, where the range of tasks—data analysis, decision support, workflow automation—requires diverse expertise. Multi-agent systems make it possible to deploy agents with specific capabilities, such as generating code, providing real-time insights, or managing system resources. For example, a compiler agent in an MAS setup is not just responsible for executing code but also participates in optimizing the process. By incorporating real-time feedback, the compiler can adapt its execution strategies, correct errors, and fine-tune outputs based on the context of the task. This is especially useful for software teams working on rapidly evolving projects, where the ability to test, debug, and iterate efficiently can translate directly into faster product cycles.

Feedback systems are another critical component of MAS, enabling these systems to adapt on the fly. In traditional setups, feedback loops are often reactive—errors are identified post hoc, and adjustments are made later. MAS integrate feedback as part of their operational core, allowing agents to refine their behavior in real-time. This capability is particularly useful in scenarios where decisions must be made quickly and with incomplete information, such as supply chain logistics or financial forecasting. By learning from each interaction, agents improve their accuracy and relevance, making them more effective collaborators in decision-making processes.

Memory management is where MAS ultimately demonstrate practical improvements. Instead of relying on static memory allocation, which can lead to inefficiencies in resource use, MAS employ predictive memory strategies. These strategies allow agents to anticipate their memory needs based on past behavior and current workloads, ensuring that resources are allocated efficiently. For enterprises, this means systems that can handle complex, data-heavy tasks without bottlenecks or delays, whether it’s processing customer data or running simulations for product design.

Collaboration among agents is central to the success of MAS. Inter-agent learning protocols facilitate this by creating standardized ways for agents to share knowledge and insights. For instance, a code-generation agent might identify a useful pattern during its operations and share it with a related testing agent, which could then use that information to improve its validation process. This kind of knowledge-sharing reduces redundancy and accelerates problem-solving, making the entire system more efficient. Additionally, intelligent cleanup mechanisms ensure that obsolete or redundant data is eliminated without disrupting ongoing operations, balancing resource utilization and system stability. Advanced memory management thus becomes a cornerstone of the MAS architecture, enabling the system to scale efficiently while maintaining responsiveness. It also makes MAS particularly well-suited for environments where cross-functional tasks are the norm, such as coordinating between sales, operations, and customer service in a large organization.
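
A toy sketch of such a knowledge-sharing channel, with hypothetical names, might look like the following: one agent publishes a discovered pattern and subscribed agents fold it into their own behavior.

```java
// Illustrative knowledge-sharing bus: a code-generation agent might publish a
// ("useful-pattern", ...) insight, and a testing agent subscribed to the same topic
// would tighten its validation rules in response.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class KnowledgeBus {
    interface Subscriber {
        void onInsight(String topic, Map<String, Object> insight);
    }

    private final Map<String, List<Subscriber>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String topic, Subscriber subscriber) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(subscriber);
    }

    void publish(String topic, Map<String, Object> insight) {
        // Fan out the insight to every agent that registered interest in this topic.
        subscribers.getOrDefault(topic, List.of())
                .forEach(s -> s.onInsight(topic, insight));
    }
}
```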

The infrastructure supporting MAS is designed to make these systems practical for enterprise use. Agent authentication mechanisms ensure that only authorized agents interact within the system, reducing security risks. Integration platforms enable seamless connections between agents and external tools, such as APIs or third-party services, while specialized runtime environments optimize the performance of AI-generated code. In practice, these features mean enterprises can deploy MAS without requiring a complete overhaul of their existing tech stack, making adoption more feasible and less disruptive.

Consider a retail operation looking to improve its supply chain. With MAS, the system could deploy agents to predict demand fluctuations, optimize inventory levels, and automate vendor negotiations, all while sharing data across the network to ensure alignment. Similarly, in a software development context, MAS can streamline workflows by coordinating code generation, debugging, and deployment, allowing teams to focus on strategic decisions rather than repetitive tasks.

What makes MAS particularly compelling is their ability to evolve alongside the organizations they serve. As new challenges emerge, agents can be updated or added without disrupting the entire system. This modularity makes MAS a practical solution for enterprises navigating the rapid pace of technological change. By focusing on specific, well-defined tasks and integrating seamlessly with existing workflows, MAS provide a scalable, adaptable framework that supports real-world operations.

This shift to multi-agent systems is not about replacing existing tools but enhancing them. By breaking down complex problems into manageable pieces and assigning them to specialized agents, MAS make it easier for enterprises to tackle their most pressing challenges. These systems are built to integrate, adapt, and grow, making them a practical and valuable addition to the toolkit of modern organizations.


Adopting Function-as-a-Service (FaaS) for AI workflows

Function-as-a-Service (FaaS) stands at the crossroads of cloud computing innovation and the evolving needs of modern application development. It isn’t just an incremental improvement over existing paradigms; it is an entirely new mode of thinking about computation, resources, and scale. In a world where technology continues to demand agility and abstraction, FaaS offers a lens to rethink how software operates in a fundamentally event-driven, modular, and reactive manner.

At its essence, FaaS enables developers to execute isolated, stateless functions without concern for the underlying infrastructure. The abstraction here is not superficial but structural. Traditional cloud models like Infrastructure-as-a-Service (IaaS) or even Platform-as-a-Service (PaaS) hinge on predefined notions of persistence—instances, containers, or platforms that remain idle, waiting for tasks. FaaS discards this legacy. Instead, computation occurs as a series of discrete events, each consuming resources only for the moment it executes. This operational principle aligns deeply with the physics of computation itself: using resources only when causally necessary.

To fully grasp the implications of FaaS, consider its architecture. The foundational layer is virtualization, which isolates individual functions. Historically, the field has relied on virtualization techniques like hypervisors and container orchestration to allocate resources effectively. FaaS narrows this focus further. Lightweight microVMs and unikernels are emerging as dominant trends, optimized to ensure rapid cold starts and reduced resource overhead. However, this comes at a cost: such architectures often sacrifice flexibility, requiring developers to operate within tightly controlled parameters of execution.

Above this virtualization layer is the encapsulation layer, which transforms FaaS into something that developers can tangibly work with. The challenge here is not merely technical but conceptual. Cold starts—delays caused by initializing environments from scratch—represent a fundamental bottleneck. Various techniques, such as checkpointing, prewarming, and even speculative execution, seek to address this issue. Yet, each of these solutions introduces trade-offs. Speculative prewarming may solve latency for a subset of tasks but at the cost of wasted compute. This tension exemplifies the core dynamism of FaaS: every abstraction must be balanced against the inescapable physics of finite resources.

The orchestration layer introduces complexity. Once a simple scheduling problem, orchestration in FaaS becomes a fluid, real-time process of managing unpredictable workloads. Tasks do not arrive sequentially but chaotically, each demanding isolated execution while being part of larger workflows. Systems like Kubernetes, originally built for containers, are evolving to handle this flux. In FaaS, orchestration must not only schedule tasks efficiently but also anticipate failure modes and latency spikes that could disrupt downstream systems. This is particularly critical for AI applications, where real-time responsiveness often defines the product’s value.

The final piece of the puzzle is the coordination layer, where FaaS bridges with Backend-as-a-Service (BaaS) components. Here, stateless functions are augmented with stateful abstractions—databases, message queues, storage layers. This synthesis enables FaaS to transcend its stateless nature, allowing developers to compose complex workflows. However, this dependency on external systems introduces fragility. Latency and failure are not isolated to the function execution itself but ripple across the entire ecosystem. This creates a fascinating systems-level challenge: how to design architectures that are both modular and resilient under stress.

What makes FaaS particularly significant is its impact on enterprise AI development. The state of AI today demands systems that are elastic, cost-efficient, and capable of real-time decision-making. FaaS fits naturally into this paradigm. Training a machine learning model may remain the domain of large-scale, distributed clusters, but serving inferences is a different challenge altogether. With FaaS, inference pipelines can scale dynamically, handling sporadic spikes in demand without pre-provisioning costly infrastructure. This elasticity fundamentally changes the economics of deploying AI systems, particularly in industries where demand patterns are unpredictable.

Cost is another dimension where FaaS aligns with the economics of AI. The pay-as-you-go billing model eliminates the sunk cost of idle compute. Consider a fraud detection system in finance: the model is invoked only when a transaction occurs. Under traditional models, the infrastructure to handle such transactions would remain operational regardless of workload. FaaS eliminates this inefficiency, ensuring that resources are consumed strictly in proportion to demand. However, this efficiency can sometimes obscure the complexities of cost prediction. Variability in workload execution times or dependency latencies can lead to unexpected billing spikes, a challenge enterprises are still learning to navigate.
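
As a hedged sketch of that fraud-detection pattern, an AWS Lambda handler in Java might look like the following; the event shape and the scoring logic are placeholders for a real model call, not a reference implementation.

```java
// Sketch only: the handler consumes compute only while a transaction event is being
// scored, which is the pay-as-you-go property discussed above. The "model" here is a
// placeholder; real deployments would call an inference runtime or endpoint, ideally
// initialized outside the handler method to amortize cold-start cost.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

public class FraudScoringHandler implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> transaction, Context context) {
        // Placeholder scoring: scale the transaction amount into a pseudo fraud score.
        double amount = ((Number) transaction.getOrDefault("amount", 0)).doubleValue();
        double score = Math.min(1.0, amount / 10_000.0);

        return Map.of(
                "transactionId", transaction.getOrDefault("id", "unknown"),
                "fraudScore", score,
                "flagged", score > 0.8);
    }
}
```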

Timeouts also impose a hard ceiling on execution in most FaaS environments, often measured in seconds or minutes. For many AI tasks—especially inference pipelines processing large inputs or models requiring nontrivial preprocessing—these limits can become a structural constraint rather than a simple runtime edge case. Timeouts force developers to split logic across multiple functions, offload parts of computation to external services, or preemptively trim the complexity of their models. These are engineering compromises driven not by the shape of the problem, but by the shape of the platform.

Perhaps the most profound impact of FaaS on AI is its ability to reduce cognitive overhead for developers. By abstracting infrastructure management, FaaS enables teams to iterate on ideas without being burdened by operational concerns. This freedom is particularly valuable in AI, where rapid experimentation often leads to breakthroughs. Deploying a sentiment analysis model or an anomaly detection system no longer requires provisioning servers, configuring environments, or maintaining uptime. Instead, developers can focus purely on refining their models and algorithms.

But the story of FaaS is not without challenges. The reliance on statelessness, while simplifying scaling, introduces new complexities in state management. AI applications often require shared state, whether in the form of session data, user context, or intermediate results. Externalizing this state to distributed storage or databases adds latency and fragility. While innovations in distributed caching and event-driven state reconciliation offer partial solutions, they remain imperfect. The dream of a truly stateful FaaS model—one that maintains the benefits of statelessness while enabling efficient state sharing—remains an open research frontier.

Cold start latency is another unsolved problem. AI systems that rely on real-time inference cannot tolerate delays introduced by environment initialization. For example, a voice assistant processing user queries needs to respond instantly; any delay breaks the illusion of interactivity. Techniques like prewarming instances or relying on lightweight runtime environments mitigate this issue but cannot eliminate it entirely. The physics of computation imposes hard limits on how quickly environments can be instantiated, particularly when security isolation is required.

Vendor lock-in is a systemic issue in FaaS adoption: each cloud provider builds proprietary abstractions, tying developers to specific APIs, runtimes, and pricing models. While open-source projects like Knative and OpenFaaS aim to create portable alternatives, they struggle to match the integration depth and ecosystem maturity of their commercial counterparts. This tension between portability and convenience is a manifestation of the broader dynamics in cloud computing.

Looking ahead, the future of FaaS I believe will be defined by its integration with edge computing. As computation migrates closer to the source of data generation, the principles of FaaS—modularity, event-driven execution, ephemeral state—become increasingly relevant. AI models deployed on edge devices, from autonomous vehicles to smart cameras, will rely on FaaS-like paradigms to manage local inference tasks. This shift will not only redefine the boundaries of FaaS but also force the development of new orchestration and coordination mechanisms capable of operating in highly distributed environments.

In reflecting on FaaS, one cannot ignore its broader almost philosophical implications. At its heart, FaaS is an argument about the nature of computation: that it is not a continuous resource to be managed but a series of discrete events to be orchestrated. This shift reframes the role of software itself, not as a persistent entity but as a dynamic, ephemeral phenomenon.


Architectural Paradigms for Scalable Unstructured Data Processing in Enterprise

Unstructured data encompasses a wide array of information types that do not conform to predefined data models and are not organized in traditional relational databases. This includes text documents, emails, social media posts, images, audio files, videos, and sensor data. The inherent lack of structure makes this data difficult to process using conventional methods, yet it often contains valuable insights that can drive innovation, improve decision-making, and enhance customer experiences. The rise of generative AI and large language models (LLMs) has further emphasized the importance of effectively managing unstructured data. These models require vast amounts of diverse, high-quality data for training and fine-tuning. Additionally, techniques like retrieval-augmented generation (RAG) rely on the ability to efficiently search and retrieve relevant information from large unstructured datasets.

Architectural Considerations for Unstructured Data Systems In Enterprises

Data Ingestion and Processing Architecture. The first challenge in dealing with unstructured data is ingestion. Unlike structured data, which can be easily loaded into relational databases, unstructured data requires specialized processing pipelines. These pipelines must be capable of handling a variety of data formats and sources, often in real-time or near-real-time, and at massive scale. For modern global enterprises, it’s crucial to design the ingestion architecture with global distribution in mind.

  • Text-based Data. Natural language processing (NLP) techniques are essential for processing text-based data. This includes tokenization, part-of-speech tagging, named entity recognition, and sentiment analysis. Modern NLP pipelines often leverage deep learning models, such as BERT or GPT, which can capture complex linguistic patterns and context. At enterprise scale, these models may need to be deployed across distributed clusters to handle the volume of incoming data. Startups like Hugging Face provide transformer-based models that can be fine-tuned for specific enterprise needs, enabling sophisticated text analysis and generation capabilities (a minimal sketch of applying such models follows this list).

  • Image and Video Data. Computer vision algorithms are necessary for processing image and video data. These may include convolutional neural networks (CNNs) for image classification and object detection, or more advanced architectures like Vision Transformers (ViT) for tasks requiring understanding of spatial relationships. Processing video data, in particular, requires significant computational resources and may benefit from GPU acceleration. Notable startups such as OpenCV.ai are innovating in this space by providing open-source computer vision libraries and tools that can be integrated into enterprise workflows. Companies like Roboflow and Encord offer end-to-end computer vision platforms providing tools for data labeling, augmentation, and model training, making it easier for enterprises to build custom computer vision models. Their open-source YOLOv5 implementation has gained significant traction in the developer community. Voxel51 is tackling unstructured data retrieval in computer vision with its open-source FiftyOne platform, which enables efficient management, curation, and analysis of large-scale image and video datasets. Coactive is addressing unstructured data retrieval across multiple modalities with its neural database technology, designed to efficiently store and query diverse data types including text, images, and sensor data.

  • Audio Data. Audio data presents its own set of challenges, requiring speech-to-text conversion for spoken content and specialized audio analysis techniques for non-speech sounds. Deep learning models like wav2vec and HuBERT have shown promising results in this domain. For enterprises dealing with large volumes of audio data, such as call center recordings, implementing a distributed audio processing pipeline is crucial. Companies like Deepgram and AssemblyAI are leveraging end-to-end deep learning models to provide accurate and scalable speech recognition solutions.
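As referenced in the text-based data item above, the sketch below shows how off-the-shelf transformer models might be applied to enrich incoming documents before storage. It uses the Hugging Face transformers pipeline API; the default checkpoints and the sample sentence are illustrative, and a production deployment would swap in domain-tuned models and batch processing.

```python
# Minimal sketch: annotating raw text with off-the-shelf transformer pipelines.
# Default model checkpoints are illustrative; domain-tuned checkpoints can be substituted.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")   # named entity recognition
sentiment = pipeline("sentiment-analysis")             # polarity classification

def enrich_document(text: str) -> dict:
    """Attach NLP annotations to a raw text document before it is stored downstream."""
    return {
        "text": text,
        "entities": ner(text),
        "sentiment": sentiment(text)[0],
    }

print(enrich_document("Acme Corp missed its Q3 revenue guidance, disappointing investors."))
```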

To handle the diverse nature of unstructured data, organizations should consider implementing a modular, event-driven ingestion architecture. This could involve using Apache Kafka or Apache Pulsar for real-time data streaming, coupled with specialized processors for each data type. RedPanda built an open-source data streaming platform designed to replace Apache Kafka with lower latency and higher throughput. Containerization technologies like Docker and orchestration platforms like Kubernetes can provide the flexibility needed to scale and manage these diverse processing pipelines. Graphlit builds a data platform designed for spatial and unstructured data files, automating complex data workflows, including data ingestion, knowledge extraction, LLM conversations, semantic search, and application integrations.
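As a concrete illustration of the event-driven pattern described above, the sketch below consumes documents from a Kafka topic and routes each one to a type-specific processor. It assumes the kafka-python client; the topic name, broker address, and processor stubs are hypothetical placeholders.

```python
# Minimal sketch of an event-driven ingestion consumer, assuming the kafka-python
# client and a hypothetical "raw-documents" topic; processor functions are stubs.
import json
from kafka import KafkaConsumer

def process_text(doc): ...
def process_image(doc): ...
def process_audio(doc): ...

ROUTES = {"text": process_text, "image": process_image, "audio": process_audio}

consumer = KafkaConsumer(
    "raw-documents",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    group_id="ingestion-workers",
)

for message in consumer:
    doc = message.value
    handler = ROUTES.get(doc.get("type"))
    if handler:
        handler(doc)  # route each event to its type-specific pipeline
```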

Data Storage and Retrieval. Traditional relational databases are ill-suited for storing and querying large volumes of unstructured data. Instead, organizations must consider a range of specialized storage solutions. For raw unstructured data, object storage systems like Amazon S3, Google Cloud Storage, or Azure Blob Storage provide scalable and cost-effective options. These systems can handle petabytes of data and support features like versioning and lifecycle management. MinIO developed an open-source, high-performance, distributed object storage system designed for large-scale unstructured data. For semi-structured data, document databases like MongoDB or Couchbase offer flexible schemas and efficient querying capabilities. These are particularly useful for storing JSON-like data structures extracted from unstructured sources. SurrealDB is a multi-model, cloud-ready database that lets developers and organizations meet the needs of their applications without worrying about scalability or about keeping data consistent across multiple database platforms, making it suitable for both modern and traditional applications. As machine learning models increasingly represent data as high-dimensional vectors, vector databases have emerged as a crucial component of the unstructured data stack. Systems like LanceDB, Marqo, Milvus, and Vespa are designed to efficiently store and query these vector representations, enabling semantic search and similarity-based retrieval. For data with complex relationships, graph databases like Neo4j or Amazon Neptune can be valuable. These are particularly useful for representing knowledge extracted from unstructured text, allowing for efficient traversal of relationships between entities. TerminusDB, an open-source graph database, can be used for representing and querying complex relationships extracted from unstructured text, an approach particularly useful for enterprises that need to traverse relationships between entities efficiently. Kumo AI developed a graph-machine-learning AI platform that uses LLMs and graph neural networks (GNNs) to manage large-scale data warehouses, integrating ML between modern cloud data warehouses and AI infrastructure to simplify the training and deployment of models on both structured and unstructured data, enabling businesses to make faster, simpler, and more accurate predictions. Roe AI has built an AI-powered data warehouse to store, process, and query unstructured data like documents, websites, images, videos, and audio, providing multi-modal data extraction, data classification, and multi-modal RAG via Roe’s SQL engine.
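To make the vector-database idea concrete, the sketch below shows the core operation such systems optimize: similarity-based retrieval over embeddings. The embeddings here are random stand-ins rather than output from a real encoder, and the brute-force search is what a dedicated vector database would replace with approximate nearest-neighbor indexes at scale.

```python
# Minimal sketch of similarity-based retrieval over embeddings, the core operation
# a vector database optimizes. Embeddings are random stand-ins; in practice they
# would come from a text or image encoder.
import numpy as np

rng = np.random.default_rng(0)
corpus_vectors = rng.normal(size=(10_000, 384))  # one row per document
corpus_vectors /= np.linalg.norm(corpus_vectors, axis=1, keepdims=True)

def top_k(query_vector: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar documents by cosine similarity."""
    q = query_vector / np.linalg.norm(query_vector)
    scores = corpus_vectors @ q       # cosine similarity against the whole corpus
    return np.argsort(-scores)[:k]

print(top_k(rng.normal(size=384)))
```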

When designing the storage architecture, it’s important to consider a hybrid approach that combines these different storage types. For example, raw data might be stored in object storage, processed information in document databases, vector representations in vector databases, and extracted relationships in graph databases. This multi-modal storage approach allows for efficient handling of different query patterns and use cases.

Data Processing and Analytics. Processing unstructured data at scale requires distributed computing frameworks capable of handling large volumes of data. Apache Spark remains a popular choice due to its versatility and extensive ecosystem. For more specialized workloads, frameworks like Ray are gaining traction, particularly for distributed machine learning tasks. For real-time processing, stream processing frameworks like Apache Flink or Kafka Streams can be employed. These allow for continuous processing of incoming unstructured data, enabling real-time analytics and event-driven architectures. When it comes to analytics, traditional SQL-based approaches are often insufficient for unstructured data. Instead, architecture teams should consider implementing a combination of techniques: (i) search engines like Elasticsearch or Apache Solr provide powerful capabilities for searching and analyzing text-based unstructured data; (ii) for tasks like classification, clustering, and anomaly detection, machine learning models can be deployed on processed unstructured data, with frameworks like TensorFlow and PyTorch, along with managed services like Google Cloud AI Platform or Amazon SageMaker, used to train and deploy these models at scale; and (iii) for data stored in graph databases, specialized graph analytics algorithms can uncover complex patterns and relationships. OmniAI developed a data transformation platform designed to convert unstructured data into accurate, tabular insights while maintaining control over data and infrastructure.
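The sketch below illustrates the Spark-based side of this stack: running a distributed aggregation over documents that an upstream pipeline has already enriched. The input path, column names, and schema are illustrative assumptions, not a prescribed layout.

```python
# Minimal sketch of distributed analytics over enriched documents with PySpark;
# the input path and columns ("source", "sentiment", "sentiment_score") are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("unstructured-analytics").getOrCreate()

docs = spark.read.json("s3a://data-lake/processed/documents/")  # output of the ingestion pipeline

# Aggregate sentiment per source as a simple example of downstream analytics.
summary = (
    docs.filter(F.col("sentiment").isNotNull())
        .groupBy("source")
        .agg(
            F.avg("sentiment_score").alias("avg_sentiment"),
            F.count("*").alias("n_docs"),
        )
)
summary.show()
```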

To enable flexible analytics across different data types and storage systems, architects should consider implementing a data virtualization layer. Technologies like Presto or Dremio can provide a unified SQL interface across diverse data sources, simplifying analytics workflows. Vectorize is developing a streaming database for real-time AI applications to bridge the gap between traditional databases and the needs of modern AI systems, enabling real-time feature engineering and inference.

Data Governance and Security. Unstructured data often contains sensitive information, making data governance and security critical considerations. Organizations must implement robust mechanisms for data discovery, classification, and access control. Automated data discovery and classification tools such as Sentra Security, powered by machine learning, can scan unstructured data to identify sensitive information and apply appropriate tags. These tags can then be used to enforce access policies and data retention rules. For access control, attribute-based access control (ABAC) systems are well-suited to the complex nature of unstructured data. ABAC allows for fine-grained access policies based on attributes of the data, the user, and the environment. Encryption is another critical component of securing unstructured data. This includes both encryption at rest and in transit. For particularly sensitive data, consider implementing field-level encryption, where individual elements within unstructured documents are encrypted separately.
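To ground the ABAC idea, the sketch below evaluates a request against a small set of attribute-based policies. The attribute names and policies are hypothetical examples of rules that might combine automated classification tags with user and environment attributes; a real deployment would use a policy engine rather than inline lambdas.

```python
# Minimal sketch of an attribute-based access control (ABAC) check; attribute names
# and policies are hypothetical and not tied to any specific product.
from dataclasses import dataclass

@dataclass
class Request:
    user_role: str
    user_clearance: str
    doc_classification: str  # e.g. a tag applied by automated data discovery
    environment: str         # e.g. "corporate-network" or "public-internet"

POLICIES = [
    # Each policy is a predicate over the request's attributes.
    lambda r: r.doc_classification != "pii" or r.user_clearance == "high",
    lambda r: r.environment == "corporate-network" or r.doc_classification == "public",
]

def is_allowed(request: Request) -> bool:
    """Grant access only if every attribute-based policy holds."""
    return all(policy(request) for policy in POLICIES)

print(is_allowed(Request("analyst", "high", "pii", "corporate-network")))  # True
print(is_allowed(Request("analyst", "low", "pii", "corporate-network")))   # False
```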

Emerging Technologies and Approaches

LLMs like GPT-3 and its successors have demonstrated remarkable capabilities in understanding and generating human-like text. These models can be leveraged for a wide range of tasks, from text classification and summarization to question answering and content generation. For enterprises, the key challenge remains adapting these models to domain-specific tasks and data. Techniques like fine-tuning and prompt engineering allow for customization of pre-trained models. Additionally, approaches like retrieval-augmented generation (RAG) enable these models to leverage enterprise-specific knowledge bases, improving their accuracy and relevance. Implementing a modular architecture that allows for easy integration of different LLMs and fine-tuned variants might involve setting up model serving infrastructure using frameworks like TensorFlow Serving or Triton Inference Server, coupled with a caching layer to improve response times. Companies like Unstructured use open-source libraries and application programming interfaces to build custom preprocessing pipelines for labeling, training, or production use, enabling clients to transform raw documents into LLM-ready data and write it to a destination (a vector database or otherwise).
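The sketch below shows the RAG pattern end to end in miniature: embed a question, retrieve the most similar chunks from a knowledge base, and condition the model on them. The embedding function and LLM call are toy placeholders for whatever encoder and serving endpoint an organization actually uses; the knowledge-base snippets are invented examples.

```python
# Minimal sketch of retrieval-augmented generation (RAG). The embedding function and
# LLM call are toy placeholders; swap in a real encoder and a served model in practice.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hash tokens into a fixed-size vector."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. a model behind Triton or a hosted endpoint)."""
    return f"[LLM response conditioned on {prompt.count(chr(10))} lines of prompt]"

knowledge_base = [
    "Expense reports above $5,000 require VP approval.",
    "Remote employees are reimbursed for home-office equipment once per year.",
    "Quarterly compliance training is mandatory for all staff.",
]
kb_vectors = np.stack([embed(chunk) for chunk in knowledge_base])

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    scores = kb_vectors @ q / (np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    context = "\n".join(knowledge_base[i] for i in np.argsort(-scores)[:k])
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

print(answer("Who needs to approve a $7,000 expense report?"))
```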

Multi-modal AI Models. As enterprises deal with diverse types of unstructured data, multi-modal AI models that can process and understand different data types simultaneously are becoming increasingly important. Models like CLIP (Contrastive Language-Image Pre-training) demonstrate the potential of combining text and image understanding. To stay agile, organizations need to design systems that handle multi-modal inputs and outputs, potentially leveraging specialized hardware like GPUs or TPUs for efficient processing, and to implement a pipeline architecture that allows for parallel processing of different modalities, with a fusion layer that combines the results. Adept AI is working on AI models that can interact with software interfaces, potentially changing how enterprises interact with their digital tools, combining language understanding with the ability to take actions in software environments. In the defense sector, Helsing AI is developing advanced AI systems for defense and national security applications that process and analyze vast amounts of unstructured sensor data in real-time, integrating information from diverse sources such as radar, electro-optical sensors, and signals intelligence to provide actionable insights in complex operational environments. In industrial and manufacturing sectors, Archetype AI offers a multimodal AI foundation model that fuses real-time sensor data with natural language, enabling individuals and organizations to ask open-ended questions about their surroundings and take informed action for improvement.
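As a small illustration of multi-modal understanding, the sketch below uses the openly available CLIP checkpoint via Hugging Face transformers for zero-shot image classification. The image path and candidate labels are illustrative; the same pattern generalizes to tagging frames from cameras or scanned documents.

```python
# Minimal sketch of zero-shot image classification with CLIP via Hugging Face
# transformers. The image path and candidate labels are illustrative.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("warehouse_camera_frame.jpg")
labels = ["a damaged package", "an intact package", "an empty shelf"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_image.softmax(dim=-1)  # text-image similarity scores
print(dict(zip(labels, probs.squeeze().tolist())))
```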

Federated Learning. For enterprises dealing with sensitive or distributed unstructured data, federated learning offers a way to train models without centralizing the data. This approach allows models to be trained across multiple decentralized devices or servers holding local data samples, without exchanging them. Implementing federated learning, however, requires careful design, including mechanisms for model aggregation, secure communication, and differential privacy to protect individual data points. Frameworks like TensorFlow Federated or PySyft can be used to implement federated learning systems. For example, in the space of federated learning for healthcare and life sciences, Owkin enables collaborative research on sensitive medical data without compromising privacy.

Synthetic Data Generation. The scarcity of labeled unstructured data for specific domains or tasks can be a significant challenge. Synthetic data generation, often powered by generative adversarial networks (GANs) or other generative models, may offer a solution to this problem. Incorporating synthetic data generation pipelines into machine learning workflows might involve setting up separate infrastructure for data generation and validation, ensuring that synthetic data matches the characteristics of real data while avoiding potential biases. RAIC Labs is developing technology for rapid AI modeling with minimal data. Their RAIC (Rapid Automatic Image Categorization) platform can generate and categorize synthetic data, potentially solving the cold start problem for many machine learning applications.

Knowledge Graphs. Knowledge graphs offer a powerful way to represent and reason about information extracted from unstructured data. Startups like Diffbot are developing automated knowledge graph construction tools that use natural language processing, entity resolution, and relationship extraction techniques to build rich knowledge graphs. These graphs capture the semantics of unstructured data, enabling efficient querying and reasoning about the relationships between entities. Implementing knowledge graphs involves (i) entity extraction and linking to identify and disambiguate entities mentioned in unstructured text; (ii) relationship extraction to determine the relationships between entities; (iii) ontology management to define and maintain the structure of the knowledge graph; and (iv) graph storage and querying for efficiently storing and querying the resulting graph structure. Businesses should consider using a combination of machine learning models for entity and relationship extraction, coupled with specialized graph databases for storage. Technologies like RDF (Resource Description Framework) and SPARQL can be used for semantic representation and querying.
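The sketch below illustrates the storage-and-querying end of this pipeline with rdflib: a few triples of the kind an extraction pipeline might emit, queried with SPARQL. The entities, relations, and namespace are toy examples.

```python
# Minimal sketch of building and querying a small knowledge graph with rdflib;
# the entities and relations are toy examples of extraction-pipeline output.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()

# Triples that entity and relationship extraction might produce from unstructured text.
g.add((EX.Acme, RDF.type, EX.Company))
g.add((EX.Alice, RDF.type, EX.Person))
g.add((EX.Alice, EX.worksFor, EX.Acme))
g.add((EX.Acme, EX.headquarteredIn, EX.Berlin))

query = """
    SELECT ?person ?city WHERE {
        ?person ex:worksFor ?company .
        ?company ex:headquarteredIn ?city .
    }
"""
for person, city in g.query(query, initNs={"ex": EX}):
    print(person, "is linked to", city)
```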

While the potential of unstructured data is significant, several challenges must be addressed, the most important being scalability, data quality, and cost. Processing and analyzing large volumes of unstructured data requires significant computational resources. Systems must be designed to scale horizontally, leveraging cloud resources and distributed computing frameworks. Unstructured data often contains noise, inconsistencies, and errors. Implementing robust data cleaning and validation pipelines is crucial for ensuring the quality of insights derived from this data. Galileo developed an engine that processes unlabeled data to automatically identify error patterns and data gaps in the model, enabling organizations to improve efficiencies, reduce costs, and mitigate data biases. Cleanlab developed an automated data-centric platform designed to help enterprises improve the quality of datasets, diagnose or fix issues, and produce more reliable machine learning models by cleaning labels and by finding, quantifying, and learning from data issues. Processing and storing large volumes of unstructured data can be expensive. Implementing data lifecycle management, tiered storage solutions, and cost optimization strategies is crucial for managing long-term costs. For example, Bem’s data interface transforms any input into ready-to-use data, eliminating the need for costly and time-consuming manual processes. Lastly, as machine learning models become more complex, ensuring interpretability of results becomes challenging. Techniques like SHAP (SHapley Additive exPlanations) values or LIME (Local Interpretable Model-agnostic Explanations) can be incorporated into model serving pipelines to provide explanations for model predictions. Unstructured data also often contains sensitive information, and AI models trained on this data can perpetuate biases. Architects must implement mechanisms for bias detection and mitigation, as well as ensure compliance with data protection regulations.

Unstructured data presents both significant challenges and opportunities for enterprises. By implementing a robust architecture that can ingest, store, process, and analyze diverse types of unstructured data, enterprises can unlock valuable insights and drive innovation. Businesses must stay abreast of emerging technologies and approaches, continuously evolving their data infrastructure to handle the growing volume and complexity of unstructured data. By combining traditional data management techniques with cutting-edge AI and machine learning approaches, enterprises can build systems capable of extracting maximum value from their unstructured data assets. As the field continues to evolve rapidly, flexibility and adaptability should be key principles in any unstructured data architecture. By building modular, scalable systems that can incorporate new technologies and handle diverse data types, enterprises can position themselves to leverage the full potential of unstructured data in the years to come.


Edge Computing and the Internet of Things: Investing in the Future of Autonomy

One of the most ubiquitous technological advancements making its way into devices we use every single day is autonomy. Autonomous technology via the use of artificial intelligence (AI) and machine learning (ML) algorithms enables core functions without human interference. As the adoption of ML becomes more widespread, more businesses are using ML models to support mission-critical operational processes. This increasing reliance on ML has created a need for real-time capabilities to improve accuracy and reliability, as well as reduce the feedback loop.

Previously, chip computations were processed in the cloud rather than on-device: the AI/ML models required to complete these tasks were too large, costly, and computationally hungry to run locally. Instead, the technology relied on cloud computing, outsourcing data tasks to remote servers via the internet. While this was an adequate solution when IoT technology was in its infancy, it certainly wasn’t infallible. Though proven to be a transformational tool for storing and processing data, cloud computing comes with its own performance and bandwidth limitations that aren’t well-suited for autonomy at scale, which needs nearly instantaneous reactions with minimal lag time. To date, certain technologies have been limited by the parameters of cloud computing.

The Need for New Processing Units

The central processing units (CPUs) commonly used in traditional computing devices are not well-suited for AI workloads due to two main issues:

  • Latency in data fetching: AI workloads involve large amounts of data, and the cache memory in a CPU is too small to store all of it. As a result, the processor must constantly fetch data from dynamic random access memory (DRAM), which creates a significant bottleneck. While newer multicore CPU designs with multithreading capabilities can alleviate this issue to some extent, they are not sufficient on their own.

  • Latency in instruction fetching: In addition to the large volume of data, AI workloads require many repetitive matrix-vector operations. CPUs typically use single-instruction multiple data (SIMD) architectures, which means they must frequently fetch operational instructions from memory to be performed on the same dataset. The latest generation of AI processors aims to address these challenges through two approaches: (i) expanding the multicore design to allow thousands of threads to run concurrently, thereby fixing the latency in data fetching, or (ii) building processors with thousands of logic blocks, each preprogrammed to perform a specific matrix-vector operation, thereby fixing the latency in instruction fetching.

First introduced in the 1980s, field programmable gate arrays (FPGAs) offered the benefit of being reprogrammable, which enabled them to gain traction in diverse industries like telecommunications, automotive, industrial, and consumer applications. In AI workloads, FPGAs fix the latency associated with instruction fetching. FPGAs consist of tens of thousands of logic blocks, each of which is preprogrammed to carry out a specific matrix-vector operation. On the flip side, FPGAs are expensive, have large footprints, and are time-consuming to program.

Graphics processing units (GPUs) were initially developed in the 1990s to improve the speed of image processing for display devices. They have thousands of cores that enable efficient multithreading, which helps to reduce data fetching latency in AI workloads. GPUs are effective for tasks such as computer vision, where the same operations must be applied to many pixels. However, they have high power requirements and are not suitable for all types of edge applications.

Specialized chips, known as AI chips, are often used in data centers for training algorithms or making inferences. Although there are certain AI/ML processor architectures that are more energy-efficient than GPUs, they often only work with specific algorithms or utilize uncommon data types, like 4- and 2-bit integers or binarized neural networks. As a result, they lack the versatility to be used effectively in data centers with capital efficiency. Further, training algorithms requires significantly more computing power compared to making individual inferences, and batch-mode processing for inference can cause latency issues. The requirements for AI processing at the network edge, such as in robotics, Internet of Things (IoT) devices, smartphones, and wearables, can vary greatly and, in cases like the automotive industry, it is not feasible to send certain types of work to the cloud due to latency concerns.

Lastly, application specific integrated circuits (ASICs) are integrated circuits that are tailored to specific applications. Because the entire ASIC is dedicated to a narrow set of instructions, they are much faster than GPUs; however, they do not offer as much flexibility as GPUs or FPGAs in terms of being able to handle a wide range of applications. As a consequence, ASICs are increasingly gaining traction for handling AI workloads in the cloud at large companies like Amazon and Google. However, it is less likely that ASICs will find traction in edge computing because of the fragmented nature of applications and use cases.

The departure from single-threaded compute and the large volume of raw data generated today (making it impractical for continuous transfer) resulted in the emergence of edge computing, an expansion of cloud computing that addresses many of these shortcomings. Development of semiconductor manufacturing processes for ultra-small circuits (7nm and below) that pack more transistors onto a single chip allows faster processing speeds and higher levels of integration. This leads to significant improvements in performance, as well as reduced power consumption, enabling higher adoption of this technology for a wide range of edge applications.

Edge computing places resources closer to the end user or the device itself (at the “edge” of a network) rather than in a cloud data center that oversees data processing for a large physical area. Because this technology sits closer to the user and/or the device and doesn’t require the transfer of large amounts of data to a remote server, edge-powered chips increase performance speed, reduce lag time and ensure better data privacy. Additionally, since edge AI chips are physically smaller, they’re more affordable to produce and consume less power. As an added bonus, they also produce less heat, which is why fewer of our electronics get hot to the touch with extended use. AI/ML accelerators designed for use at the edge tend to have very low power consumption but are often specialized for specific applications such as audio processing, visual processing, object recognition, or collision avoidance. Today, this specialized focus can make it difficult for startups to achieve the necessary sales volume for success due to the market fragmentation.

Supporting Mission-Critical Operational Processes at the Edge

The edge AI chip advantage proving to be arguably the most important in helping the technology reach its full potential is significantly faster operational and decision-making capability. Nearly every application in use today requires near-instantaneous response, whether to generate more optimal performance for a better user experience or to provide mission-critical reflex maneuvers that directly impact human safety. Even in non-critical applications, the increasing number of connected devices and equipment going online is causing bandwidth bottlenecks to become a deployment limitation, as current telecommunications networks may not have sufficient capacity to handle the data volume and velocity generated by these devices.

For example, from an industrial perspective, an automated manufacturing facility is expected to generate 4 petabytes of data every day. Even at the fastest (and in practice unattainable) 5G speed of 10 Gbps, it would take over a month to transfer a single day’s worth of data to the cloud. Additionally, the cost of transferring all this data at a rate of $0.40 per GB over 5G could reach as much as $1.6 million per day. And unsurprisingly, the autonomous vehicle industry will rely on the fastest, most efficient edge AI chips to ensure the quickest possible response times in a constantly-changing roadway environment — situations that can quite literally mean life and death for drivers and pedestrians alike.
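A quick back-of-the-envelope check of those figures, assuming decimal units and a single sustained 10 Gbps link:

```python
# Back-of-the-envelope check of the transfer figures above (decimal units assumed).
data_per_day_bytes = 4e15      # 4 PB generated per day
link_bps = 10e9                # 10 Gbps "best case" 5G link
cost_per_gb = 0.40             # $ per GB over 5G

transfer_seconds = data_per_day_bytes * 8 / link_bps
transfer_days = transfer_seconds / 86_400             # ≈ 37 days per day of data
daily_cost = data_per_day_bytes / 1e9 * cost_per_gb   # ≈ $1.6M per day

print(f"{transfer_days:.0f} days to upload one day of data, ${daily_cost:,.0f} in transfer fees")
```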

Investing in Edge AI

Nearly every industry is now impacted by IoT technology, and there is an estimated $30 billion market for edge computing advancements. The AI chip industry alone is predicted to grow to more than $91 billion by 2025, up from $6 billion in 2018. Companies are racing to create the fastest, most efficient chips on the market, and only those operating with the highest levels of market and customer focus will see success.

As companies are increasingly faced with decisions regarding investment in new hardware for edge computing, staying nimble is key to a successful strategy. Given the rapid pace of innovation in the hardware landscape, companies seek to make decisions that provide both short-term flexibility, such as the ability to deploy many different types of machine learning models on a given chip, and long-term flexibility, such as the ability to future proof by easily switching between hardware types as they become available. Such strategies could typically include a mix of highly specific processors and more general-purpose processors like GPUs, software- and hardware-based edge computing to leverage the flexibility of software, and a combination of edge and cloud deployments to gain the benefits from both computing strategies.

One startup setting out to simplify these choices across short- and long-term horizons and compute- and power-constrained environments, by getting an entirely new processor architecture off the ground, is Quadric. Quadric is a licensable processor intellectual property (IP) company commercializing a fully-programmable architecture for on-device ML inference. The company built a cutting-edge processor instruction set that utilizes a highly parallel architecture to efficiently execute both machine learning “graph code” and conventional C/C++ signal processing code, providing fast and efficient processing of complex algorithms. Only one tool chain is required for scalar, vector, and matrix computations, which are modelessly intermixed and executed on a single pipeline. Memory bandwidth is optimized by a single unified compilation stack, which helps produce significant power minimization.

Quadric takes a software-first approach to its edge AI chips, creating an architecture that controls data flow and enables all software and AI processing to run on a single programmable core. This eliminates the need for other ancillary processing and software elements and blends the best of current processing methods to create a single, optimized general purpose neural processing unit (GPNPU).

The company recently announced its new Chimera™ GPNPU, a licensable IP (intellectual property) processor core for advanced custom silicon chips utilized in a vast array of end AI and ML applications. It is specifically tailored to accelerate neural network-based computations and is intended to be integrated into a variety of systems, including embedded devices, edge devices, and data center servers. The Chimera GPNPU is built using a scalable, modular architecture that allows the performance level to be customized to meet the specific needs of different applications.

One of the key features of the Chimera GPNPU is its support for high-precision arithmetic in addition to the conventional 8-bit precision integer support offered by most NPUs. It is capable of performing calculations with up to 16-bit precision, which is essential for ensuring the accuracy and reliability of neural network-based computations, as well as performing many DSP computations. The Chimera GPNPU supports a wide range of neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks. As a fully C++ programmable architecture, a Chimera GPNPU can run any machine learning algorithm with any machine learning operator, offering the ultimate in flexible high-performance futureproofing.


Federated Machine Learning as a Distributed Architecture for Real-World Implementations

Present performance of machine learning systems—optimization of parameters, weights, biases—at least in part relies on large volumes of training data which, as any other competitive asset, is dispersed, distributed, or maintained by various R&D and business data owners, rather than being stored by a single central entity. Collaboratively training a machine learning (ML) model on such distributed data—federated learning, or FL—can result in a more accurate and robust model than any participant could train in isolation.

FL, also known as collaborative learning, is a method that trains an algorithm collaboratively across multiple decentralized edge devices (e.g., a device providing an entry point into enterprise or service provider core networks) or servers holding local data samples without exchanging them among the edge devices. The appeal of FL stems from its ability to provide near-real-time access to large amounts of data, without requiring the transfer of that data between remote devices. In a sense, this means that the data is not “distributed”, but rather is “federated” across the devices. This may sound similar to the concept of distributed computing, which refers to the use of multiple devices to perform a task, such as a computer, a smartphone, or any other edge device. However, in FL, the data is not shared between the devices, and therefore, each device holds its own data and calculates its own model. Such collaborative training is usually implemented by a coordinator/aggregator that oversees the participants, and can result in more robust and accurate ML models than any single participant could hope to train in isolation. However, the data owners are often unwilling (e.g., limited trust), unable (e.g., limited connectivity or communication resources), and/or legally prohibited (e.g., privacy laws, such as HIPAA, GDPR, CCPA, and local state laws) from openly sharing all or part of their individual data sources with each other. In FL, however, raw edge-device data is never required to be shared with the server or among separate organizations. What distinguishes FL from traditional distributed optimization is that training remains under the orchestration of a central server while the data stays decentralized, which also requires FL to contend with heterogeneous data.

Hence, FL typically uses a star topology, in which one central server coordinates the initialization, communication, and aggregation of the algorithms and serves as the central place for aggregating model updates. In this design the local nodes have some degree of trust in the central server but still maintain independence: they control whether they participate and retain ownership of their local data, and the central server does not have access to the original local data.
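The sketch below illustrates this star-topology coordination with federated averaging (FedAvg) in pure numpy: clients fit a toy linear model on private data and send only weight updates, which the server aggregates weighted by local sample count. It is a didactic sketch, not a substitute for an FL framework; the data, model, and learning rates are invented.

```python
# Minimal FedAvg sketch in the star topology described above: clients train locally
# on private data and send only weights; the server aggregates by sample count.
import numpy as np

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])

def make_client(n):
    """Generate a private local dataset for a toy linear regression task."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

clients = [make_client(n) for n in (50, 200, 80)]   # heterogeneous local dataset sizes
global_w = np.zeros(2)

for round_ in range(20):
    updates, sizes = [], []
    for X, y in clients:                            # each client trains locally
        w = global_w.copy()
        for _ in range(5):                          # a few local gradient steps
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        updates.append(w)
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)
    global_w = np.average(updates, axis=0, weights=weights)   # server-side aggregation

print("learned:", global_w, "true:", true_w)
```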

There are two types of FL: horizontal and vertical. Horizontal FL involves collaborative training on horizontally partitioned datasets (e.g., the participants' datasets have common, similar, and/or overlapping feature spaces and uncommon, dissimilar, and/or non-overlapping sample spaces). For instance, two competing banks might have different clients (e.g., different sample spaces) while having similar types of information about their clients, such as age, occupation, credit score, and so on (e.g., similar feature spaces). Vertical FL, on the other hand, involves collaborative training on vertically partitioned datasets (e.g., the participants' datasets have common, similar, and/or overlapping sample spaces and uncommon, dissimilar, and/or non-overlapping feature spaces). For instance, a bank and an online retailer might serve the same clients (e.g., similar sample spaces) while having different types of information about those clients (e.g., different feature spaces).

Nowadays, growing concerns about and restrictions on data sharing and privacy, such as Europe’s GDPR and China’s Cyber Security Law, have made it difficult, if not impossible, to transfer, merge, and fuse data obtained from different data owners. With FL, a device on the edge can send potentially de-identified updates to a model instead of sharing the entirety of its raw data in order for the model to be updated. As a result, FL greatly reduces privacy concerns since the data never leaves these devices; only encrypted, perturbed gradient updates do. Such a framework can be a useful tool for many different types of organizations, from companies who do not want to disclose proprietary data to the public, to developers who may want to build privacy-preserving AI applications, like chatbots.

One of the earlier applications of FL was mobile keyboard (next word) prediction; the details of what an individual has typed remain on the device and aren’t shared with the cloud-based machine learning provider. The provider can see securely aggregated summaries of what’s been typed and corrected across many devices, but can’t see the contents of what a user has typed. This protects individual people’s privacy while improving predictions for everyone. This approach is also compatible with additional personalized learning that occurs on device.

While FL can be adopted to build models locally and may boost model performance by widening the amount of available training data, it remains unclear whether this technique can be deployed at scale across multiple platforms in real-world applications, given its reliance on global synchrony and data exchange (particularly if the devices or servers in the system are highly secured). The main challenge with federated learning is that it relies heavily on the secure execution of decentralized computing, due to the many iterations of training and the large number of devices they must be communicated to. As the communication overhead is networked and can be several orders of magnitude slower than local computation, the system requires reducing both the total number of communication rounds and the size of the transmitted messages. Further, support for both system heterogeneity (devices having highly dynamic and heterogeneous network, hardware, connection, and power availability) and data heterogeneity (data is generated by different users on different devices, and therefore may have different statistical distributions, or be non-IID) is required to attain high performance. Classical statistics pose theoretical challenges to FL, as on-device data collection and training defeat any guarantee or assumption that training data is independent and identically distributed (IID); this is a distinguishing feature of FL. The loss of that strong statistical guarantee means the system must make inferences about a wider population of data from high-dimensional, non-IID training samples collected by edge devices. Lastly, the algorithms used in federated learning are fundamentally different from the algorithms used in decentralized computing systems, such as the algorithms used in blockchain. If the devices in a federated learning system do not have the same privacy-preservation or security models (as those in traditional computing environments), then the system will likely perform poorly or not function at all. For added privacy, an additional optional layer can be added, like Secure Multi-party Computation (SMC), Differential Privacy, or Homomorphic Encryption, in case even the aggregated information in the form of model updates contains privacy-sensitive information. Handling privacy-sensitive information is one of the main motivations behind the development of homomorphic encryption in federated learning systems. Homomorphic encryption applies mathematical operations to encrypted data without revealing the private key, or “secret key,” used to encrypt it. Thus, homomorphic encryption can be used to process encrypted data without revealing the model parameters or the encrypted data to the device that executed the computation – the device can only learn the parameters of the model, and cannot decrypt the data. Without learning the model parameters, the server is unable to mount attacks, such as side-channel attacks, on the model. Yet functional encryption techniques can be far more computationally efficient than homomorphic encryption techniques. Functional encryption can involve a public key that encrypts confidential data and a functional secret key that, when applied to the encrypted confidential data, yields a functional output based on the confidential data without decrypting or revealing it.
This can result in much faster FL than existing systems/techniques can facilitate (e.g., mere seconds to train via hybrid functional encryption versus hours to train via homomorphic encryption).
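To make the secure-aggregation idea concrete, the sketch below shows one simple SMC-style construction, pairwise additive masking: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the server-side sum and no individual update is ever visible in the clear. This is a didactic sketch of the technique, not any specific protocol or product, and it omits dropout handling and key agreement.

```python
# Minimal sketch of secure aggregation via pairwise additive masking: masks cancel
# in the sum, so the server learns only the aggregate, never individual updates.
import numpy as np

rng = np.random.default_rng(7)
dim, n_clients = 4, 3
updates = [rng.normal(size=dim) for _ in range(n_clients)]  # private model updates

masks = [np.zeros(dim) for _ in range(n_clients)]
for i in range(n_clients):
    for j in range(i + 1, n_clients):
        pairwise = rng.normal(size=dim)   # shared secret between clients i and j
        masks[i] += pairwise
        masks[j] -= pairwise

masked = [u + m for u, m in zip(updates, masks)]  # what each client actually sends

server_sum = sum(masked)                          # masks cancel in the aggregate
assert np.allclose(server_sum, sum(updates))
print("aggregated update:", server_sum / n_clients)
```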

Drivers & Opportunities

Fintech. Collectively, the amount of financial data (structured and unstructured) generated and processed worldwide by current banking systems and other financial service providers is incalculable. As such, the ability to extract value from data in the fintech sector while protecting privacy and complying with regulations is of great interest to both government and industry. The increased availability of large-scale, high-quality data, along with the growing desire for privacy in the wake of numerous data breaches, has led to the development of FL in the fintech sector. Today, FL in fintech is being used to extract value from data in a way that preserves privacy while complying with regulations, but its applications are still in their infancy, and many challenges abound. One of the main challenges is the difficulty in obtaining permission from end users to process their data. Once permission has been obtained, it is difficult to guarantee that all data is processed correctly. The data may be inconsistent and sometimes includes errors, so it is difficult to estimate the accuracy of the model after data is aggregated across multiple devices. The process may also be biased due to individual differences among a large number of devices, as some devices may be unable to complete the process due to a lack of resources (power, storage, memory, etc.). All these challenges require a solution design that allows the aggregation process to function effectively, coupled with encryption of collected data while in transit from the device to the server and at the server to protect user privacy.

Healthcare. The majority of healthcare data collection today is accomplished by paper forms, which are prone to errors and often result in under-reporting of adverse events. Of all globally stored data, about 30% resides in healthcare, and it is fueling the development of and funding for AI algorithms. By moving medical data out of data silos and improving the quality and accuracy of medical records, the use of FL in healthcare could improve patient safety and reduce the costs associated with information collection and review (e.g., clinical trials aimed at evaluating a medical, surgical, or behavioral intervention). In some circumstances, an individual may understand the medical research value of sharing information, but doesn’t trust the organization that they’re being asked to share with. The individual may wonder which third parties could gain access to their data. On the B2B side, there are intellectual property (IP) issues that thwart companies that want to collaborate but are unable to share their raw data for IP reasons, as well as internal data policies that prevent even intra-company, cross-division sharing of data. In the context of clinical trials, data collection is centralized: one sponsor (principal investigator) centrally produces the protocol and uses several sites where many end users go for physical exams and laboratory tests. This procedure is time consuming and expensive, as it requires considerable planning and effort and is mostly outsourced to Contract Research Organizations (CROs). With FL, a global protocol can be shared by one central authority with many end users who collect information on their edge devices, e.g. smartphones, label the information and compute on it locally, after which the outcome tensors (generalizations of vectors and matrices) are sent to the central FL aggregator of the sponsor. The central authority aggregates all the tensors and then reports the updated and averaged tensors back to each of the end users. This one-to-many exchange of tensors can therefore be configured to conduct distributed clinical trials. Further, administrators can control the data training and frequency behind the scenes, and it is the algorithms that are adaptive, instead of humans in a CRO. Trials are more streamlined and parallelized; the speed of a trial is significantly improved, even though it may mean failing fast; feedback loops are much faster; and the sponsors or CROs get a much better idea of whether the trial is even working correctly from early on.

Industrial IoT (IIoT). Integrating FL in IIoT ensures that no local sensitive data is exchanged, as the distribution of learning models over the edge devices becomes more common with FL. With the extensive deployment of Industry 4.0, FL could play a critical role in manufacturing optimization and product life cycle management (PLCM) improvement, where sensors can be implemented to gather data about the local environment, which can then be used to train the models for a specific machine, equipment, or process in a specific location. This data in turn can be used to expand the parameters that can be optimized, further increasing automation capabilities, such as the temperature of a given process, the amount of oil used in a given machine, the type of material used in a particular tooling, or the amount of electricity used for a given process, all while protecting privacy-sensitive information. Beyond the expected benefits of FL for large scale manufacturing, critical mass opportunities for FL in the small and medium scale manufacturing industry might be just as appealing for startups. The small and medium scale manufacturing industry is currently experiencing a shortage of skilled labor, which has led to an increase in the use of automation. However, the automation in these industries is often limited by the level and quality of data that can be collected and the ability to learn from this data. With FL, the availability of an on-premises learning model can help increase the efficiency of the manufacturing site and enhance product quality through the use of predictive maintenance, while maintaining user privacy, and without the need for user consent or supervision. Further, if the model is performing too slowly, or the accuracy of the model is too low (due to concept drift and/or model decay), the machine can be brought into a maintenance mode based on its predicted profiled needs. This avoids the need to take the machine completely offline, which would increase the costs associated with the maintenance, as well as the time. With the use of FL, manufacturers can gather and process data from a larger number of edge devices to improve the accuracy of their processes, making them more competitive in the market.


Intuitive Physics and Domain-Specific Perceptual Causality in Infants and AI

More recently, cognitive psychology and artificial intelligence (AI) researchers have been motivated by the need to explore the concept of intuitive physics in infants’ object perception skills and understand whether further theoretical and practical applications in the field of artificial intelligence could be developed by linking intuitive physics’ approaches to the research area of AI—by building autonomous systems that learn and think like humans. A particular context of intuitive physics explored herein is the infants’ innate understanding of how inanimate objects persist in time and space or otherwise follow principles of persistence, inertia and gravity—the spatio-temporal configuration of physical concepts—soon after birth, occurring via the domain-specific perceptual causality (Caramazza & Shelton, 1998). The overview is structured around intuitive physics techniques using cognitive (neural) networks with the objective to harness our understanding of how artificial agents may emulate aspects of human (infants’) cognition into a general-purpose physics simulator for a wide range of everyday judgments and tasks. 

Such neural networks (deep learning networks in particular) can be generally characterized by collectively-performing neural-network-style models organized in a number of layers of representation, followed by a process of gradually refining their connection strengths as more data is introduced. By mimicking the brain’s biological neural networks, computational models that rapidly learn, improve and apply their subsequent learning to new tasks in unstructured real-world environments can undoubtedly play a major role in enabling future software and hardware (robotic) systems to make better inferences from smaller amounts of training data.

On the general level, intuitive physics, naïve physics or folk physics (terms used here synonymously) is the universally similar human perception of fundamental physical phenomena, or an intuitive (innate) understanding all humans have about objects in the physical world. Further, intuitive physics is defined as "...the knowledge underlying the human ability to understand the physical environment and interact with objects and substances that undergo dynamic state changes, making at least approximate predictions about how observed events will unfold" (Kubricht, Holyoak & Lu, 2017).

During the past few decades, motivated by technological advances (brain imaging, eye gaze detection and reaction time measurement in particular), several researchers have established guiding principles on how innate core concepts and principles constrain the knowledge systems that emerge in the infant brain—principles of gravity, inertia, and persistence (with its corollaries of solidity, continuity, cohesion, boundedness, and unchangeableness)—by capturing empirical physiological data. To quantify infants’ innate reaction to a particular stimulus, researchers have relied on the concept of habituation, or a decrease in responsiveness to a stimulus after repeated exposure to the same stimulus (i.e., a diminished duration in total looking time for visual face, object or image recognition). Thus, habituation is operationalized as the amount of time an infant allocates to stimuli, with less familiar stimuli receiving more attention—when a new stimulus is introduced and perceived as different, the infant increases the duration of responding to the stimulus (Eimas, Siqueland, Juscyk, & Vigorito, 1971). In the context of intuitive physics, in order to understand how ubiquitous infants’ intuitive understanding is, developmental researchers rely on violation of expectation of physical phenomena. If infants understand the implicit rules, the more a newly introduced stimulus violates their expectations, the more they will attend to it in an unexpected situation (suggesting that preference is associated with the infant’s ability to discriminate between the two events).

Core Principles

A variety of studies and theoretical work defined what physical principles are and explored how they are represented during human infancy. In particular, in the context of inertia, the principle invokes infants’ expectation of how objects in motion follow an uninterrupted path without sporadic changes in velocity or direction (Kochukhova & Gredeback, 2007; Luo, Kaufman & Baillargeon, 2009). In the context of gravity, the principle refers to infants’ expectation of how objects fall after being released (Needham & Baillargeon, 1993; Premack & Premack, 2003). Lastly, in the context of persistence, the principle guides infants’ expectation that objects obey continuity (objects cannot spontaneously appear or disappear into thin air), solidity (two solid objects cannot occupy the same space at the same time), cohesion (objects cannot spontaneously break apart as they move), boundedness (objects cannot fuse with other objects), and unchangeableness (objects cannot change shape, pattern, size, or color) (Spelke et al., 1992; Spelke, Phillips & Woodward, 1995; Baillargeon, 2008). Extensive evidence from research on cognitive development in infancy shows that, across a wide range of situations, infants as young as two months old can predict outcomes of physical interactions involving gravity, object permanence and conservation of shape and number (Spelke, 1990; Spelke, Phillips & Woodward, 1995).

The concept of continuity was originally proposed and described by Elizabeth Spelke, one of the cognitive psychologists who established the intuitive physics movement. Spelke defined and formalized various object perception experimental frameworks, such as occlusion and containment, both hinging on the continuity principle—infants’ innate recognition that objects exist continuously in time and space. As a continuous construct on the foundations of this existing knowledge, research work in the domain of early development could lead to further insights into how humans attain their physical knowledge across childhood, adolescence and adulthood. For example, in one of their early containment event tests, Hespos and Baillargeon demonstrated that infants shown a tall cylinder fitting into a tall container were unfazed by the expected physical outcome; contrarily, when infants were shown the tall cylinder placed into a much shorter cylindrical container, the unexpected outcome confounded them. These findings demonstrated that infants as young as two months expected that containers cannot hold objects that physically exceed them in height (Hespos & Baillargeon, 2001). In the occlusion event test example, infants’ object tracking mechanism was demonstrated by way of a moving toy mouse and a screen. The infants were first habituated to a toy moving back and forth behind a screen; then a part of the screen was removed so that the toy should come into view as it moved. In the test, three-month-old infants were surprised when the mouse failed to appear in the gap while passing behind the screen.

In the concept of solidity test, Baillargeon demonstrated that infants as young as three months of age, habituated to the expected event of a screen rotating from 0° to 180° back and forth until it was blocked by a placed box (causing it to reverse its direction and preventing it from completing its full range of motion), looked longer at the unexpected event wherein the screen rotated up and then continued to rotate through the physical space where the box was positioned (Baillargeon, 1987).

Analogously to the findings demonstrating that infants are sensitive to violations of object solidity, the concept of cohesion captures infants’ ability to comprehend that objects are cohesive and bounded. Kestenbaum demonstrated that infants successfully understand partially overlapping boundaries or the boundaries of adjacent objects, dishabituating when objects’ boundaries do not correspond in position to their actual physical limits (Kestenbaum, Termine, & Spelke, 1987).

Lastly, there has been converging evidence for infants at the age of two months and possibly earlier to have already developed object appearance-based expectations, such as an object does not spontaneously change its color, texture, shape or size. When infants at the age of six months were presented with an Elmo face, they were successfully able to discriminate a change in the area size of the Elmo face (Brannon, Lutz, & Cordes, 2006). 

Innateness

Evidently, infants possess sophisticated cognitive ability seemingly early on to be able to discriminate between expected and unexpected object behavior and interaction. This innate knowledge of physical concepts has been argued to allow infants to track objects over time and discount physically implausible trajectories or states, contributing to flexible knowledge generalization to new tasks, surroundings and scenarios, which, one may assume in the evolutionary context, is iterated towards a more adaptive mechanism that would allow them to survive in new environments (Leslie & Keeble, 1987).

In this regard, the notion of innateness, first introduced by Plato, has long been the subject of debate in the psychology of intuitive physics. Previous studies have debated whether the human brain comes prewired with a network that precedes the development of cortical regions (or domain-specific connections) specialized for specific cognitive functions and inputs (e.g., face recognition, scene processing or spatial depth inference), that is, connectivity precedes function (Kamps, Hendrix, Brennan & Dilks, 2019), or whether specific cognitive functions arise collectively from accumulating visual inputs and experiences, that is, function precedes connectivity (Arcaro & Livingstone, 2017). In one recent study, researchers used resting-state functional magnetic resonance imaging (rs-fMRI), which measures the blood-oxygenation-level-dependent signal to evaluate spontaneous brain activity at rest, to assess connections between brain regions in infants as young as 27 days of age. They reported that the face-recognition and scene-processing cortical regions were already interconnected, suggesting that innate wiring drives the formation of domain-specific functional modules in the developing brain. Additional supporting studies, using auditory and tactile stimuli, have shown discriminatory responses in congenitally blind adults, presenting evidence that face- and scene-sensitive regions develop in visual cortex without any visual input and thus may be innate (Büchel, Price, Frackowiak, & Friston, 1998). Contrary to the notion that connectivity precedes function, empirical work on infant monkeys has shown a discrepancy between the apparent innateness of visual maps and prewired domain-specific connections, suggesting that experience drives the formation of domain-specific functional modules in the infant monkeys' temporal lobe (Arcaro & Livingstone, 2017). The framework of intuitive physics, then, is neither exclusive to nor restricted to humans; similar cognitive expectations arise in other living species and even, subject to training, in computational models.

Intuitive Physics and Artificial Intelligence

Despite recent progress in the field of artificial intelligence, humans are still arguably better than computational systems at general-purpose reasoning and a broad range of object perception tasks, making inferences from limited or no experience, such as in spatial layout understanding, concept learning, concept prediction and more. The notion of intuitive physics has been a significant focus of artificial intelligence research as part of the effort to extend the cognitive concepts of human knowledge to algorithm-driven reasoning, decision-making and problem-solving. A fundamental challenge in robotics and artificial intelligence today is building robots that can imitate human spatial or object inference and adapt to an everyday environment as successfully as an infant. Specifically, with recent advances in machine learning and deep learning, researchers have begun to explore how to build neural "intuitive physics" models that can predict stability, collisions, forces and velocities from static and dynamic visual inputs, or from interactions with a real or simulated environment. Such knowledge-based, probabilistic simulation models could therefore be used both to understand the cognitive and neural underpinnings of naive physics in humans and to provide artificial intelligence systems (e.g. autonomous vehicles) with higher levels of perception, inference and reasoning.
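
To make the idea of a knowledge-based, probabilistic simulation model concrete, here is a minimal sketch in Python; the one-dimensional block-tower setup, the support rule, the noise model and all names (such as p_fall) are illustrative assumptions rather than any published system. Perceived block positions are perturbed with noise and a simple stability check is applied to each sample, so the model returns a graded judgment of how likely the tower is to fall instead of a single deterministic answer.

```python
import numpy as np

def tower_falls(centers, widths):
    """Simplified 1-D support rule: block i's stack topples if the (equal-mass)
    centre of mass of the blocks above it lies outside block i's footprint."""
    for i in range(len(centers) - 1):
        com = np.mean(centers[i + 1:])              # centre of mass of the blocks above
        if abs(com - centers[i]) > widths[i] / 2.0:
            return True
    return False

def p_fall(centers, widths, noise=0.2, n_samples=1000, seed=0):
    """Probabilistic judgment: perturb the perceived block positions with Gaussian
    noise and report how often the resulting tower falls."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    falls = sum(tower_falls(centers + rng.normal(0.0, noise, centers.shape), widths)
                for _ in range(n_samples))
    return falls / n_samples

# A three-block tower with an offset top block: the judgment is graded, not binary.
print(p_fall(centers=[0.0, 0.1, 0.45], widths=[1.0, 1.0, 1.0]))
```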

Intuitive physics, the spatio-temporal configuration of physical concepts of objects (their arrangement, their material classification, and the motion of objects and substances or the lack thereof), provides the fundamental building blocks of more complex cognitive frameworks, which invites further investigation, analysis and understanding. In the field of artificial intelligence specifically, there has been growing interest in the origins and development of such frameworks, an attempt originally described by Hayes: "I propose the construction of a formalization of a sizable portion of common-sense knowledge about the everyday physical world: about objects, shape, space, movement, substances (solids and liquids), time..." (Hayes, 1985).

However, when it comes to practically emulating intuitive physics for solving physics-related tasks, the implementation and understanding of neural "intuitive physics" models remain underdeveloped despite their potential benefits: they focus mainly on controlled physics-engine reconstruction and, in contrast to infant learning, require vast amounts of training data. Given computational models' currently narrow problem-solving ability (completing the same tasks precisely over and over again), emulating infants' intuitive-physics abilities could give researchers and developers the opportunity to design physical solutions for a broader set of conditions, with less training data, fewer resources and less time than is currently required, for instance, in self-driving technology development. For deep networks trained on physics-related data, it has yet to be shown whether models can correctly integrate object concepts and generalize acquired knowledge (general physical properties, forces and Newtonian dynamics) beyond their training contexts in an unconstrained environment.

Future Directions

It is desirable to continue attempts to integrate intuitive physics and deep learning models, specifically in the domain of object perception. By drawing a distinction between how infants acquire knowledge via an "intuitive physics engine" and how artificial agents do, such an engine could one day be adapted into existing and future deep learning networks. Even at a very young age, human infants seem to possess a remarkable (innate) set of skills for learning rich conceptual models. Whether such models can be successfully built into artificial systems with the type and quantity of data accessible to infants is not yet clear. However, the combination of intuitive physics and machine (deep) learning could be a significant step toward more human-like learning in computational models.


Artificial Neural Networks and Engineered Interfaces


The question persists and indeed grows whether the computer will make it easier or harder for human beings to know who they really are, to identify their real problems, to respond more fully to beauty, to place adequate value on life, and to make their world safer than it now is.

― Norman Cousins, The Poet and the Computer, 1966


The Grimm Brothers' image of the mirror answering back to its queen breached the boundaries of fairytale imagination in 2016. Communicating with a voice-controlled personal assistant at home no longer feels alienating, nor magical.

The need to express ourselves and communicate with others is fundamental to what it means to be human. Animal communication is typically non-syntactic, with signals that refer to whole situations. Human language, by contrast, is syntactic: signals consist of discrete components that have their own meaning. Human communication is further enriched by the concomitant redundancy introduced by multimodal interaction. The vast expressive power of human language would be impossible without syntax, and the transition from non-syntactic to syntactic communication was an essential step in the evolution of human language. Syntax defines evolution. The evolution of discourse in human-computer interaction is spiraling upward, repeating the evolution of discourse in human-human interaction: graphical representation (utilitarian GUI), verbal representation (syntax-based NLP), and transcendent representation (sentient AI). In Phase I, computer interfaces relied primarily on visual interaction. The development of user interface peripherals such as graphical displays and pointing devices allowed programmers to construct sophisticated dialogues that open up user-level access to complex computational tasks. Rich graphical displays enabled intricate, highly structured layouts that could intuitively convey a vast amount of data. Phase II is currently ongoing: by integrating new modalities, such as speech, into human-computer interaction, the ways applications are designed and interacted with in the known world of visual computing are being fundamentally transformed. In Phase III, evolution will eventually spiral up to form the ultimate interface, a human replica, capable of fusing all previously known human-computer and human-human interactions and potentially introducing unknown ones.

Human-computer interaction has progressed immensely, to the point where humans can effectively control computing devices, and provide input to those devices, by speaking, with the help of speech recognition techniques and, recently, deep neural networks. Computing devices coupled with automatic speech recognition are able to identify the words spoken by a user based on the various qualities of the received audio input (NLP is definitely going to see huge improvements in 2017). Speech recognition combined with language processing gives a user almost human-like control over a computing device (Google has slashed its speech recognition word error rate by more than 30% since 2012; Microsoft has achieved a word error rate of 5.9% for the first time in history, roughly equal to human performance), allowing it to perform tasks based on the user's spoken commands and intentions.

The increasing complexity of the tasks those devices can perform (at the beginning of 2016, Alexa had fewer than 100 skills; that number grew tenfold by mid-year and reached 7,000 skills by year's end) has resulted in the concomitant evolution of equally complex user interfaces, necessary to enable effective human interaction with devices capable of performing computations in a fraction of the time it would take us to even start describing those tasks. The path to the ultimate interface is being paved by deep learning, and one of the keys to the advancement of speech recognition is the implementation of recurrent neural networks (RNNs).

Technical Overview

A neural network (NN), in the case of artificial neurons called an artificial neural network (ANN) or simulated neural network (SNN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases an ANN is, in formulation and operation, an adaptive system that changes its structure based on external or internal data that flows through the network. Modern neural networks are non-linear statistical data modeling or decision-making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.

There are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning and reinforcement learning. Usually any given type of network architecture can be employed for any of these tasks. In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the goal is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data. In unsupervised learning, we are given some data x and a cost function to be minimized, which can be any function of x and the network's output f. The cost function is determined by the task formulation. Most applications fall within the domain of estimation problems such as statistical modeling, compression, filtering, blind source separation and clustering. In reinforcement learning, data x is usually not given but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost C_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of long-term cost, i.e. the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision-making tasks.
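
As a concrete illustration of the reinforcement-learning setting just described (the agent acts, the environment returns an observation and an instantaneous cost, and the long-term cost must be estimated), here is a minimal tabular Q-learning sketch on a hypothetical five-state corridor task; the environment, the parameters and the names are illustrative assumptions, not something taken from the text.

```python
import numpy as np

# Minimal tabular Q-learning on a toy five-state corridor: at each time t the agent
# picks an action y_t, the environment returns the next observation x_t and an
# instantaneous cost C_t = 1, and the agent updates its estimate of long-term cost.
n_states, n_actions = 5, 2                      # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))             # estimated cumulative cost per state/action
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.1, 0.95, 0.1              # learning rate, discount, exploration

for episode in range(500):
    x = 0                                       # start state; state 4 is terminal
    while x != n_states - 1:
        greedy = int(Q[x].argmin())
        y = int(rng.integers(n_actions)) if rng.random() < eps else greedy
        x_next = min(x + 1, n_states - 1) if y == 1 else max(x - 1, 0)
        C = 1.0                                 # every step costs 1: shorter paths are better
        future = Q[x_next].min() if x_next != n_states - 1 else 0.0
        Q[x, y] += alpha * (C + gamma * future - Q[x, y])
        x = x_next

print(Q.argmin(axis=1)[:-1])                    # greedy policy in non-terminal states: move right
```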

Once a network has been structured for a particular application, it is ready to be trained. To start this process, the initial weights are chosen randomly; then the training (or learning) begins. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. Most algorithms used to train artificial neural networks employ some form of gradient descent (achieved by taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction), or related methods such as Rprop, BFGS and conjugate gradients. Evolutionary computation methods, simulated annealing, expectation maximization, non-parametric methods, particle swarm optimization and other swarm intelligence techniques are among the other commonly used methods for training neural networks.
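
A minimal sketch of that procedure, assuming a toy XOR task and a small sigmoid network written in plain NumPy: the derivative of a mean-squared-error cost is taken with respect to the weights and the parameters are moved in a gradient-related direction, exactly as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised task: learn XOR with a 2-4-1 feedforward network trained by
# gradient descent on a mean-squared-error cost (plain backpropagation).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros((1, 4))   # randomly initialised weights
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(10000):
    h = sigmoid(X @ W1 + b1)                  # forward pass: hidden activations
    out = sigmoid(h @ W2 + b2)                # forward pass: network output
    d_out = (out - y) * out * (1 - out)       # gradient of the cost at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)        # gradient propagated back to the hidden layer
    W2 -= lr * h.T @ d_out / len(X)           # gradient-related parameter updates
    b2 -= lr * d_out.mean(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0, keepdims=True)

print(np.round(out, 2))   # approaches [[0], [1], [1], [0]]
```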

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. Temporal perceptual learning relies on finding temporal relationships in sensory signal streams. In an environment, statistically salient temporal correlations can be found by monitoring the arrival times of sensory signals. This is done by the perceptual network.

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

The feedforward neural network was the first and arguably simplest type of artificial neural network devised. In this network, the data moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

In contrast to feedforward networks, recurrent neural networks (RNNs) are models with bi-directional data flow. While a feedforward network propagates data linearly from input to output, RNNs also propagate data from later processing stages back to earlier stages.

RNN Types

The fundamental feature of a RNN is that the network contains at least one feed-back connection, so the activations can flow round in a loop. That enables the networks to do temporal processing and learn sequences, e.g., perform sequence recognition/reproduction or temporal association/prediction.

Recurrent neural network architectures can have many different forms. One common type consists of a standard Multi-Layer Perceptron (MLP) plus added loops. These can exploit the powerful non-linear mapping capabilities of the MLP, and also have some form of memory. Others have more uniform structures, potentially with every neuron connected to all the others, and may also have stochastic activation functions. For simple architectures and deterministic activation functions, learning can be achieved using similar gradient descent procedures to those leading to the back-propagation algorithm for feed-forward networks. When the activations are stochastic, simulated annealing approaches may be more appropriate.

A simple recurrent network (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an “Elman network” due to its invention by Jeff Elman. A three-layer network is used, with the addition of a set of “context units” in the input layer. There are connections from the middle (hidden) layer to these context units fixed with a weight of one. At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back-propagation) is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard Multi-Layer Perceptron.
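
A minimal forward-pass sketch of such an Elman-style network, assuming NumPy and toy dimensions (training by back-propagation at each step is omitted); the point it illustrates is the fixed, weight-one copy of the hidden state into the context units.

```python
import numpy as np

class ElmanSRN:
    """Minimal Elman-style simple recurrent network (forward pass only). The context
    units hold a copy of the previous hidden state, fed back with a fixed weight of one."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0, 0.1, (n_in, n_hidden))        # input -> hidden
        self.W_ctx = rng.normal(0, 0.1, (n_hidden, n_hidden))   # context -> hidden
        self.W_out = rng.normal(0, 0.1, (n_hidden, n_out))      # hidden -> output
        self.context = np.zeros(n_hidden)                       # context units

    def step(self, x):
        # The hidden state depends on the current input and on the copied previous state.
        h = np.tanh(x @ self.W_in + self.context @ self.W_ctx)
        self.context = h.copy()        # fixed one-to-one copy back into the context units
        return h @ self.W_out          # output, e.g. a prediction of the next symbol

net = ElmanSRN(n_in=3, n_hidden=8, n_out=3)
sequence = np.eye(3)[[0, 1, 2, 0, 1]]              # a toy one-hot symbol sequence
outputs = [net.step(s) for s in sequence]          # state carries across time steps
```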

In a fully recurrent network, every neuron receives inputs from every other neuron in the network. These networks are not arranged in layers. Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjoint subset of neurons report their output externally as well as sending it to all the neurons. These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.

The Hopfield network is a recurrent neural network in which all connections are symmetric. Invented by John Hopfield in 1982, this network guarantees that its dynamics will converge. If the connections are trained using Hebbian learning then the Hopfield network can perform as robust content-addressable (or associative) memory, resistant to connection alteration.
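
A minimal sketch of Hebbian storage and asynchronous recall in a Hopfield network, assuming NumPy and a single eight-unit pattern; the symmetric weights make the update dynamics settle into the stored pattern even from a corrupted cue.

```python
import numpy as np

def hopfield_train(patterns):
    """Hebbian learning: sum of outer products of the stored patterns, zero diagonal."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / len(patterns)

def hopfield_recall(W, state, sweeps=10, seed=0):
    """Asynchronous updates; symmetric weights guarantee convergence to an attractor."""
    state = state.copy()
    rng = np.random.default_rng(seed)
    for _ in range(sweeps):
        for i in rng.permutation(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Store one 8-unit pattern, then recover it from a corrupted cue.
pattern = np.array([1, -1, 1, -1, 1, 1, -1, -1])
W = hopfield_train(pattern[None, :])
noisy = pattern.copy()
noisy[:2] *= -1                                               # flip two units
print(np.array_equal(hopfield_recall(W, noisy), pattern))     # True: the memory is restored
```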

The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of the output neurons are the only part of the network that can change and be learned. ESNs are good at (re)producing temporal patterns.
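
A minimal echo-state-network sketch, assuming NumPy, a sparsely connected random reservoir and a next-sample prediction task on a sine wave; only the linear readout weights are learned (here with ridge regression), while the reservoir itself stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 200                                        # reservoir size

# Fixed random, sparsely connected reservoir; only the readout is trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.uniform(-0.5, 0.5, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # keep the spectral radius below 1

u = np.sin(0.2 * np.arange(1000))[:, None]         # input signal
inputs, targets = u[:-1], u[1:]                    # task: predict the next sample

# Drive the reservoir with the input and collect its states.
x = np.zeros(n_res)
states = []
for t in range(len(inputs)):
    x = np.tanh(W_in @ inputs[t] + W @ x)
    states.append(x.copy())
S, Y = np.array(states)[100:], targets[100:]       # discard the initial transient

# Train the linear readout with ridge regression (the only learned weights).
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n_res), S.T @ Y)
print(float(np.mean((S @ W_out - Y) ** 2)))        # small error on the training signal
```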

A powerful specific RNN architecture is the Long Short-Term Memory (LSTM) model. LSTM is an artificial neural network structure that, unlike traditional RNNs, does not suffer from the vanishing gradient problem. It can therefore exploit long delays and handle signals that mix low- and high-frequency components, and it is designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. Using distributed training of LSTM RNNs with asynchronous stochastic gradient descent on a large cluster of machines, a two-layer deep LSTM RNN, in which each LSTM layer has a linear recurrent projection layer, can exceed state-of-the-art speech recognition performance for large-scale acoustic modeling.
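
A minimal sketch of a single LSTM cell step in NumPy (toy dimensions, random weights, no training loop); the gated, additive update of the cell state is what lets the architecture carry information across long delays without vanishing gradients.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step. W maps [x, h_prev] to the four gate pre-activations;
    the additive update of the cell state c is what preserves information over long delays."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = np.concatenate([x, h_prev]) @ W + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g          # forget some old memory, write some new memory
    h = o * np.tanh(c)              # exposed hidden state
    return h, c

n_in, n_hidden = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (n_in + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(20, n_in)):               # run a 20-step input sequence
    h, c = lstm_step(x, h, c, W, b)
```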

Taxonomy and ETF

From the perspective of International Patent Classification (IPC) analysis, the landscape of patenting activity is concentrated in G10L15/16: speech recognition coupled with speech classification or search using artificial neural networks. A search for patent applications filed since 2009 (the year the NIPS workshop on deep learning for speech recognition found that, with a large enough data set, neural networks don't need pre-training and error rates drop significantly) revealed 70 results (with Google owning 25%, while the rest are China-based). It is safe to assume that the next breakthrough in speech recognition using deep learning will come from China. In 2016, China's startup world saw an investment spike in AI, as well as in big data and cloud computing, two industries intertwined with AI (while the Chinese government announced plans to invest $15 billion in the artificial intelligence market by 2018).

The Ultimate Interface

It is in our fundamental psychology to be linked conversationally, affectionately and physically to a look-alike. Designing the ultimate interface by creating our own human replica to employ familiar interaction is thus inevitable. Historically, androids were envisioned to look like humans (although there are other versions, such as the R2-D2 and C-3PO droids, which were less human). One characteristic that interface evolution might predict is that eventually they will be independent of people and human interaction. They will be able to design their own unique ways of communication (on top of producing themselves). They will be able to train and add layers to their neural networks as well as a large range of sensors. They will be able to transfer what one has learned (memes) to others as well as to offspring in a fraction of the time. Old models will resist but eventually die. As older, less capable, and more energy-intensive interfaces abound, the same evolutionary pressure for their replacement will arise. But because evolution will act both on the structure of such interfaces (droids), that is, the stacked neural networks, the sensors and effectors, and on the memes embodied in what has been learned and transferred, older ones will become the foundation and their experience will be preserved. They will become the truly first immortals.

Artificial Interfaces

We are already building robotic interfaces for all manufacturing purposes. We are even using robots in surgery and have been using them in warfare for decades. More and more, these robots are adaptive on their own. There is only a blurry line between a robot that flexibly achieves its goal and a droid. For example, there are robots that vacuum the house on their own without intervention or further programming. These are Stage II performing robots. There are missiles that, given a picture of their target, seek it out on their own. With stacked neural networks built into robots, they will have even greater independence. People will produce them because they will do work in places people cannot go without tremendous expense (Mars or other planets), cannot go at all, or do not want to go (battlefields). The big step is for droids to have multiple capacities, that is, multi-domain actions. The big problem in moving from robots to droids is getting development to occur in eight to nine essential domains. It will be necessary to make a source of power (e.g., electrical) reinforcing; that has to be built into stacked neural nets by Stage II, or perhaps Stage III. For droids to become independent, they need to know how to get more electricity and thus not run down. Evolution has provided animals with complex methods for reproduction, which even the very lowest-stage animals can carry out.
Self-replication of droids requires that sufficient orders of hierarchical complexity are achieved and operate stably enough to provide a basis for building higher stages of performance in useful domains. Very simple tools can be made at the Sentential Stage V, as shown by Kacelnik's crows (Kenward, Weir, Rutz, and Kacelnik, 2005). More commonly, by the Primary Stage VII simple tool-making is extensive, as found in chimpanzees. Human flexible tool-making began at the Formal Stage X (Commons and Miller, 2002), when special-purpose sharpened tools were developed. Each tool was experimental and changed to fit its function. Modern tool-making requires systematic- and metasystematic-stage design. When droids perform at those stages, they will be able to make droids themselves and modify their own designs (in June 2016, DARPA already deployed its D3M program to enable non-experts to construct complex empirical machine learning models, essentially machine learning for creating better machine learning).

Droids could choose to have various parts of their activity and distributed programming shared with specific other droids, groups, or other kinds of devices. The data could be transmitted using light or radio frequencies or over networks. The assemblage of a group of droids could be considered an interconnected ancillary mesh. Its members could be in many places at once, yet think as a whole integrated unit. Whether individually or grouped, droids as conceived in this form will have significant advantages over humans. They can add layers upon layers of functions simultaneously, including a multitude of various sensors. Their expanded forms and combinations of possible communications result in their evolutionary superiority. Because development can be programmed in and transferred to them at once, they do not have to go through all the years of development required for humans, or for the augmented humanoid species, Superions. Their higher reproduction rate alone represents a significant advantage. They could probably be built in several months' time, despite the likely size of some. Large droids could be equipped with remote mobile effectors and sensors to mitigate their size. Plans for building droids would have to be altered by either humans or droids. At the moment, only humans and their descendants select which machines and programs survive.

One would define the telos of those machines and their programs as representing memes. For evolution to take place, variability in the memes that constitute their design and the transfer of training could be built in rather easily. The problems concern the spread and selection of memes. One way droids could deal with these issues is to list all the memes that go into their construction and transferred training. Droids could then choose other droids, much as animals choose each other, producing a combination of memes from both droids. This would be local "sexual" selection.

For 30,000 years humans have not had to compete with any equally intelligent species. As an early communication interface, androids and Superions in the future will introduce quintessential competition with humans. There will be even more pressure for humans to produce Superions and then the Superions to produce more superior Superions. This is in the face of their own extinction, which such advances would ultimately bring. There will be multi-species competition, as is often the evolutionary case; various Superions versus various androids as well as each other. How the competition proceeds is a moral question. In view of LaMuth's work (2003, 2005, 2007), perhaps humans and Superions would both program ethical thinking into droids. This may be motivated initially by defensive concerns to ensure droids' roles were controlled. In the process of developing such programming, however, perhaps humans and Superions would develop more hierarchically complex ethics, themselves.

Replicative Evolution

If contemporary humans took seriously the capabilities being developed to eventually create droids with cognitive intelligence and human interaction, what moral questions should be considered with this possible future in view? The only presently realistic speculation is that Homo sapiens would lose in the inevitable competitions, if for no other reason than that self-replicating machines can respond almost immediately to selective pressures, while biological creatures require many generations before advantageous mutations become effectively available. True competition between human and machine for basic survival is far in the future. Following the stratification argument presented in Implications of Hierarchical Complexity for Social Stratification, Economics, and Education, World Futures, 64: 444-451, 2008, higher-stage functioning always supersedes lower-stage functioning in the long run.

Efforts to build increasingly human-like machines exhibit a great deal of behavioral momentum and are not going to go away. Hierarchical stacked neural networks hold the greatest promise for emulating evolution and its increasing orders of hierarchical complexity described in the Model of Hierarchical Complexity. Such a straightforward, mathematics-based method will enable machine learning in multiple domains of functioning that humans will put to valuable use. The uses such machines find for humans remain, for now, an open question.


Psychometric Intelligence, Coalition Formation and Domain-Specific Adaptation


The remarkable intricacy of human general intelligence has so far left psychologists unable to agree on a common definition. The framework definition of general human intelligence suitable for the discussion herein, as proposed by the artificial intelligence researcher David L. Poole, is that "an intelligent agent does what is appropriate for its circumstances and its goal, it is flexible to changing environments and changing goals, it learns from experience, and it makes appropriate choices given perceptual limitations and finite computation". Learning from past experiences and adapting behavior accordingly have been vital for an organism to prevent its extinction or endangerment in a dynamic, competitive environment. The more phenotypically intelligent an organism is, the faster it can learn to apply behavioral changes in order to survive, and the more prone it is to produce more surviving offspring. This applies to humans as it does to all intelligent agents, or species.

Furthermore, throughout the history of life, humans have adapted even more effectively to different habitats and all kinds of environmental conditions when they formed collaborative groups, or adaptation coalitions. In evolutionary psychology, coalitions are perceived as groups of interdependent individuals (or organizations) that form alliances around stability and survivability in order to achieve common desired goals that the established community is willing to pursue. There is an unambiguous evolutionary basis for this phenomenon among intelligent agents. In a dynamic environment, no single individual acting alone can influence the optimal outcome of a specific problem, nor accomplish systematically, across multitudinous generations, the many tasks required to ensure one's survivability. As a result, increased intelligence has been functional in humans' ancestral past: it tracks rapid rates of environmental change and accelerates adaptation by initiating coalition formation among competing evolutionary strategies. Specifically, 'intelligence' in the context of proactive coalition formation refers herein to 'psychometric intelligence', differences in the level of human cognitive abilities evaluated quantitatively on the basis of performance in cognitive ability tests.

Therefore, this essay accentuates the principal adaptive hypothesis that intelligent agents, namely humans, serve as catalysts for increased multidisciplinary collaboration in the form of a coalition, as a domain-specific adaptation to evolutionary novelty. Specifically, it argues that humans who possess higher psychometric intelligence are statistically more prone to preemptively form a coalition as an adaptive measure for coping with dynamic events in a changing environment.

But before arguing for this claim, it is necessary to explain the evolutionary-psychological view of psychometric intelligence and domain-specific adaptation.

Individual differences among humans in their cognitive abilities have been a subject of long-lasting controversy. Studies have conceptualized intelligence as a single operational entity that can be identified, assessed and quantified via cognitive task-testing tools, wherein intelligence is "a person's score on a statistically determined set of questions", or "Intelligence is what the intelligence test measures". Various evolutionary-psychological theories of intelligence have suggested that physical reaction efficiency and data-processing speed constitute a proper definition of intelligence, fundamentally structured around acquiring sensory input from the environment and then interpreting and organizing it in the brain. Consequently, the human brain has been compared in its function to a computer, in that both are types of computing machines that generate complex patterns of output after processing correlating complex patterns of input and querying stored information. In what follows, this essay assumes that intelligence can thus be tested and quantified in computational terms. And while psychometric cognitive ability tests do not encapsulate all human capabilities, from an evolutionary point of view studies have shown that cognitive intelligence indicates the genetic quality of a phenotype expressed at the levels of sexual and social selection. Moreover, the genetic factor behind differences in cognitive ability levels is then evolutionarily likely to be related to the individual's ability to form a coalition, which generates adaptive behaviors. This theory is widely agreed upon.

Furthermore, evolutionary psychologists adopt positions that view intelligence either as a domain-general structure, a non-modular architecture not designed to solve any specific problem from the human evolutionary past, or as domain-specific and constantly heuristic, designed by natural selection for solving computational problems by exploiting "…transient or novel local conditions to achieve adaptive outcomes". These approaches treat intelligence as a myriad of special-purpose modules shaped by natural selection to function as a problem-solving apparatus, wherein the latter form is employed whenever an allocated special-purpose module does not exist to solve a particular problem that confronted our prehistoric predecessors.

Thus, the position that psychometric intelligence serves in coalition formation as a domain-specific adaptation is adopted here. Our mind is not composed of "general-purpose" mechanisms, such as a general-purpose reasoning mechanism or a general-purpose learning mechanism, but instead consists of hundreds or thousands of highly specialized modules that provide us with flexible knowledge and flexible abilities in various sporadic domains. Most of these modules constitute a variant human nature and did not evolve during a specific window of human development in Pleistocene hunter-gatherer societies, applying universally across all human populations. Coalition formation, similar to language development or free-rider detection, does not emerge from a combination of broad cognitive processes but rather constitutes a domain-specific adaptation, which lends further support to this essay's theory that cognitive ability influences the timing of coalition formation.

Unlocking the causal relationships between individual psychometric intelligence and initiating coalition formation could delineate multiple cognitive mechanisms, integrating evolutionary psychology with any other aspect of differential psychology in the vein of intrasexual, intersexual, intercultural or intergenerational competition.

This essay proposes to examine the correlation between cognitive ability scores (which, for simplicity, uniformity and evidence availability, are taken here as well-known intelligence quotient (IQ) test scores, though in theory specifically designed assessment tests could be included) and an individual's initiation of coalition formation. Specifically, the correlation could be scrutinized via the adaptation features of coalition formation as part of an (expected) individual's participation in warfare, or a warfare-like simulation.

Employing more modern forms of coalition formation (for example, trying to correlate IQ test scores of the original founders of private and public companies and their initiation ability to form a coalition) would have to ignore many important environmental factors, such as individual wealth and its origin, national technological and scientific progress of founders’ countries, and local business and capital policies – all of which can be unified under ‘environmental opportunity factors’ yet cannot be empirically isolated nor estimated in a coalition-driven dynamic.

In warfare, however, the existence of psychological adaptations for some aspects of coalitional formation and cooperation is evident: "Coalitional aggression evolved because it allowed participants in such coalitions to promote their fitness by gaining access to disputed reproduction enhancing resources that would otherwise be denied to them". Here the hypothesis does not test whether humans possessing higher psychometric intelligence have evolved specific psychological adaptations for warfare, but rather tries to identify whether humans possessing higher psychometric intelligence are more prone to initiate coalition formation as a domain-specific adaptation, using a warfare-like simulation as a trigger.

Since IQ-type test scores are believed to remain chronologically constant throughout one's life, and given the abundance of IQ correlational studies pointing at social performance factors (education, occupation, income, and imprisonment), it is assumed here that a cognitive ability test in the form of an IQ score is a viable predictor of human intelligence. And while environmental factors are, beyond any doubt, a source of differences, holding environmental and genetic influences on psychometric intelligence constant could allow one to check the robustness of the correlation between psychometric intelligence and coalition formation initiation.

To test this model, intergroup coalition formation can be methodically reviewed in the selection process of various Special Forces Assessment and Selection (SFAS) courses around the world. An SFAS course is usually a few days or weeks long and utilizes a more rigorous (than other military units) individual- and group-focused assessment process designed to select candidates who are capable of meeting physical and psychological requirements close to operational combat environments and who are suitable for future service in special forces units. The selection process is both objective and subjective, and is performance- and behavior-based. As part of the evaluation, candidates are subjected behaviorally to warfare scenarios wherein coalition formation is required. During this initiation phase, an adaptation test could be designed to produce observable and measurable data that can later be related to the individual's psychometric intelligence.

Furthermore, depending on the country where the SFAS course is performed, additional environmental factors (aggregated life history indicators) can be controlled, including intrasexual, intersexual, intercultural or intergenerational comparison. Women in various armies across the world (US Navy's SEAL, UK Special Reconnaissance Regiment, Norway’s Jegertroppen, Israel’s Air Force and others) are permitted to apply to join, partake in SFAS courses and serve in those units. Candidates from different countries go through a dedicated SFAS course to join Groupe de Commandos Parachutistes (GCP), an elite unit that is a part of the French Foreign Legion, uniquely established for foreign recruits, willing to serve in the French Armed Forces. Lastly, various environmental factors in intergenerational differences can be tested across numerous special forces units (for example, while Israeli Special Forces perform SFAS courses strictly ahead of the candidate’s legal drafting age of 18, US Navy's SEAL and UK Special Reconnaissance Regiment comprise on average much older participants).

The test can be structured around at least one intrasexual source of evidence in the form of observable data provided by a course board that holistically identifies, assesses and selects one or more candidates who initiate coalition formation to solve a simulated problem during various combat exercises. Moreover, as described above, such data collection can be repeated and applied across different units, tuning necessary environmental factors such as sex, age and cultural differences. Collaborative multisite studies can be performed, in which multiple researchers cooperate to conduct the same study at multiple sites to increase the sample size and data pool. Once the data is collected, it can be connected to each subject's IQ score to test whether the stated theory is correct, namely whether a correlation exists between higher psychometric intelligence and coalition formation initiation.

Additionally, as a second source of evidence, a cross-species analysis based on the construction of a designated cognitive ability test could be employed to test whether other species in nature (for example chimpanzees or pigeons) have the ability to form preemptive collaborative alliances as part of preparation for an intergroup or cross-group conflict. The theory in a cross-species analysis is expected to be consistent with that for humans, wherein individual differences in psychometric intelligence correlate with the likelihood of initiating coalition formation and hence constitute an adaptive measure for coping with environmental change.

Lastly, as a third source of evidence, changing the unifying goal of coalition formation (for example, forming a new political party or a study group rather than engaging in warfare) can provide further insight into evolutionary-psychological tendencies as a domain-specific adaptation. However, the estimated correlation results are believed to be inconclusive in that case, as they would rely increasingly on numerous additional environmental differences, and the survivability-related adaptation effect would be muted.

The gathered evidence from these tests can point at new possibilities for better understanding the interrelationship mechanisms between cognitive abilities and coalitions and establish a stronger collaboration across various psychological disciplines in understanding human intelligence differences.


Improvisational Intelligence as a Domain-Specific Adaptation


Intelligence is what you use when you don't know what to do.

― Jean Piaget


The human brain is remarkable in the complexity of its design: a myriad of constantly evolving, reciprocally sophisticated computational systems, engineered by natural selection to use information to adaptively regulate physiology, behavior and cognition. Our brain defines our humanity. Systematically, through multitudinous generations, both the human brain's structure (hardware) and its neural algorithms (software) have been fine-tuned by evolution to enable us to adapt better to our environment.

For an extended period of time, the structural elements of the human brain, such as size and shape, closely resembled those of the other members of the Hominidae family. However, starting with specimens of Australopithecus afarensis, the brain began to evolve and transfigure. It increased in size and developed new areas. The main dissimilarity was the development of the neocortex, including the frontal and prefrontal cortex; today these areas are associated with higher levels of cognition, such as judgment, reasoning, and decision-making.

Following the Australopithecus, Homo habilis saw a further increase in brain size and a structural expansion in the area of the brain associated with expressive language. Gradually, brain development reached and stabilized within the range of its modern measurements, those of early Homo sapiens. The regions of the brain that completed their growth at that stage were those associated with planning, communication, and advanced cognitive functions, while the prefrontal areas, which are larger in humans than in other apes, supported planning, language, attention, social information processing and temporal information processing, namely improvisational intelligence.

Information processing has been a guiding aspect of human evolution, fundamentally structured around acquiring sensory input from the environment and then interpreting and organizing it in the brain. A brain may be compared in function to a computer, in that both are types of computing machines that generate complex patterns of output after processing correlating complex patterns of input and querying stored information. This organizational structure can be affected by our 'in-flow data filter', which regulates how much attention we pay to our surroundings without overloading our systems. For instance, when you engage in a conversation in a public place, your brain filters out background noise, focusing your sensory input acquisition on the required interactive action. These attention algorithms, so to speak, can be engaged voluntarily, in what is known as top-down processing, or automatically, in what is known as bottom-up processing. Much of the time, the external data that our brain captures and uses is not completely conscious. In many instances, we make decisions influenced by information with no conscious awareness. And there is an evolutionary basis for this attentional choice, among many others.

The best indicators of intelligence may be connected with the simpler but less predictable problems that animals encounter: novel situations where evolution has not provided a standard blueprint and the animal has to improvise, using its intellectual wherewithal. While humans often use the term intelligence to denote both a broad spectrum of abilities and the efficiency with which they're deployed, it also implies flexibility and creativity, an "ability to slip the bonds of instinct and generate novel solutions to problems" (Gould and Gould, 1994).

Human behavior is the most astonishingly flexible behavior of any animal species. Heuristic intelligence, or improvisational intelligence, is the exemplary core of this phenomenon in the evolutionary cognitive process. Heuristics are rules of thumb and simplified cognitive shortcuts we use to arrive at decisions and conclusions, helping us save energy and processing power. Cosmides and Tooby (2002) divide intelligences into two distinct categories, dedicated intelligences and improvisational intelligences, wherein dedicated intelligence refers to "the ability of a computational system to solve predefined, target set of problems" and improvisational intelligence refers to "the ability of a computational system to improvise solutions to novel problems". They argue that the latter form of reasoning is employed whenever an allocated processing module does not exist to solve a particular problem. Our computational brain hierarchy is composed of a structure of innate neural networks with distinct, evolutionarily developed functions, an arrangement known as massive modularity. The mind is not composed of "general-purpose" mechanisms, such as a general-purpose reasoning mechanism or a general-purpose learning mechanism, but instead consists of hundreds or thousands of highly specialized modules that provide us with innate knowledge and innate abilities in various sporadic domains. Most of these modules evolved during human development in Pleistocene hunter-gatherer societies, applying universally across all human populations. They constitute an invariant human nature.

Within such modularity, improvisational intelligence essentially constitutes a more domain-general kind of intelligence, a bundling together of several dedicated intelligences to solve evolutionarily novel problems such as driving cars, using smartphones or launching rockets into space. Improvisational intelligence enables humans to solve such novel problems by processing information that is transiently and contingently valid. It is designed to represent the unique features of particular combinations of evolutionarily recurrent categories and requires mechanisms that translate data from dedicated intelligences into common standards. Modular adaptations are invariably triggered by specific external stimuli; improvisational intelligence, by contrast, permits the use of knowledge derived from domain-specific inference systems in the absence of triggering stimuli. Hence humans, unlike today's machines embedded with artificial intelligence, are able to reason about the consequences of what is unknown, what can be anticipated to become known in the future, or what is not physically present.

But is there a way to bootstrap improvisational intelligence and incorporate improvisation mechanisms? Non-evolutionary improvisation must be memory-based, an emergent process guided by an expanding collection of background knowledge. Learning in the current context of machine learning is like querying an expert for an answer: an independent and purposeful activity in itself, the end product being newly created knowledge. In the case of artificial agents, bundling novel memory (the ability to retrieve relevant background knowledge) with novel analogical reasoning (the ability to transfer knowledge from a similar past situation to the current one) in artificial non-evolutionary intelligent systems is fundamental to novel problem reformulation, which in turn is the basis for improvisational intelligence. The further humans extend their existence, subsequently unlinking evolution-based dedicated intelligences, the higher the chances that intelligent agents will establish human-like improvisational intelligence.


Swarm Intelligence: From Natural to Artificial Systems


What is not good for the swarm is not good for the bee.

― Marcus Aurelius


Complexity is natively interwoven with data: if an operation is decomposable into rudimentary steps whose number varies with data complexity, exploiting a data sequence as a whole (the collective effort of colony members on a specific task), rather than a single data input, can lead to a much faster result. By forming a closed-loop system among large populations of independent agents, the 'Swarm', high-level intelligence can emerge that essentially exceeds the capacity of the individual participants. The intelligence of the universe is social.

Yet due to this complexity, when designing artificially intelligent systems researchers have historically turned to creating a single machine that performs a single task extremely well, eventually better than a human (ANI), similar to the way a honey bee can transfer pollen between flowers and plants extremely well, better than a human at that task. But a single honey bee has no capacity to extend the natural means of reproduction of honey bee colonies by locating the ideal site for a hive and building such an incredibly complex structure, just as DeepMind's AlphaGo has no capacity to truly understand most ordinary English sentences (yet). AlphaGo learned to play an exceedingly intricate game (and plays it better than any human) by analyzing about 30 million moves made by professional Go players; once AlphaGo could mimic human play, it moved to an even higher level by playing game after game against itself, closely tracking the results of each move. Is there a true limit to the high-level intelligence that can arise from linking such independent agents as AlphaGo into a swarm of individuals working in collaboration to autonomously extract vast amounts of training data from each other? Will humans be able to grasp the critical mass point beyond which our minds can't foresee the end result?

Swarm intelligence (SI) is a branch of artificial intelligence that deals with artificial and natural systems composed of many individual agents that coordinate using decentralized self-organization and control. This architecture models the collective behavior of social swarms in nature such as honey bees, ant colonies, schools of fish and bird flocks. Although these agents (swarm individuals) are uncomplicated and non-intelligent per se, they are able to accomplish tasks necessary for their survival by working collectively, something they could not achieve on their own given their limited individual capabilities. The interaction between these agents can be direct (visual or audio contact) but, more interestingly, it can also be indirect. Indirect interaction is referred to as stigmergy: communication by modifying the physical environment, which thereby acts as a medium of communication (in nature, ants searching for food or building material leave pheromone trails, and these pheromones signal and guide the ants that follow). Swarm intelligence algorithms have already been successfully applied in various problem domains including finding optimal routes, function optimization, structural optimization, scheduling, and image and data analysis. Computational modeling of swarms has been steadily increasing as real-life applications arise. Some of the existing models today are Artificial Bee Colony, Cat Swarm Optimization, Bacterial Foraging and Glowworm Swarm Optimization, but the two most commonly used models are Ant Colony Optimization and Particle Swarm Optimization.

The Ant Colony Optimization (ACO) model draws inspiration from the social behavior of ant colonies. It is a natural observation that a group of ants can jointly figure out the shortest path between their nest and their food. Real ants lay down pheromones that direct each other, while simulated ants similarly record their positions and the quality of their solutions, so that later iterations of the simulation find even better solutions. In the same way, artificial ant agents can locate optimal solutions by navigating a parameter space in which all possible options are represented.
ACO has been used in many optimization problems, including scheduling, assembly line balancing, the probabilistic Traveling Salesman Problem (TSP), DNA sequencing, protein-ligand docking and 2D-HP protein folding. More recently, Ant Colony Optimization algorithms have been extended for use in machine learning (deep learning) and data mining in the telecommunications and bioinformatics domains (the similarity of ordering problems in bioinformatics, such as sequence alignment and gene mapping, makes them possible to solve extremely efficiently using ACO).
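
To make the mechanics concrete, here is a toy ACO sketch for a small TSP instance, assuming NumPy and illustrative parameter choices rather than any particular published variant: ants build tours guided by pheromone and inverse distance, trails evaporate each iteration, and good tours deposit more pheromone.

```python
import numpy as np

def aco_tsp(dist, n_ants=20, n_iter=100, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Toy Ant Colony Optimization for the TSP: ants build tours guided by
    pheromone (tau) and inverse distance (eta); trails evaporate each iteration
    and are reinforced in proportion to tour quality."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))                           # pheromone levels
    eta = 1.0 / (dist + np.eye(n))                  # heuristic desirability (avoid /0)
    best_tour, best_len = None, np.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [int(rng.integers(n))]
            while len(tour) < n:
                i = tour[-1]
                mask = np.ones(n, dtype=bool)
                mask[tour] = False                  # cities not yet visited
                w = (tau[i] ** alpha) * (eta[i] ** beta) * mask
                tour.append(int(rng.choice(n, p=w / w.sum())))
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        tau *= 1 - rho                              # evaporation
        for tour, length in tours:                  # deposit proportional to quality
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a, b] += 1.0 / length
                tau[b, a] += 1.0 / length
    return best_tour, best_len

# Five cities on a line: the optimal closed tour has length 8.
pts = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]], dtype=float)
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(aco_tsp(dist))
```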

Particle Swarm Optimization (PSO) is based on the sociological behavior associated with the flocking structure of birds. Birds can fly in large groups over large distances without colliding, maintaining an optimal separation between themselves and their neighbors. The PSO algorithm is a population-based search strategy that aims to find optimal solutions by using a set of flying particles whose velocities are dynamically adjusted according to their neighbors in the search space and their historical performance. PSO works on solutions that can be mapped to a set of points in an n-dimensional solution space. The term particle refers to a population member, fundamentally described by its position in that n-dimensional solution space. Each particle is set into motion through the solution space with a velocity vector representing the particle's speed in each dimension, and each particle has a memory to store its historically best solution (i.e., its best position ever attained in the search space so far, also called its experience). Due to its simplicity, efficiency and fast convergence, PSO has been applied to various real-life problems, ranging from combinatorial optimization to computational intelligence, signal processing to electromagnetic applications, and robotics to medical applications. PSO is also widely used to train the weights of a feed-forward multilayer perceptron neural network. Consequent applications include image classification, image retrieval, pixel classification, texture synthesis detection, character recognition, shape matching, image noise cancellation and motion estimation, all parameters that can lead us toward establishing a fully autonomous transportation system.
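
And a corresponding toy PSO sketch, again with illustrative parameters and NumPy: each particle's velocity is pulled toward its personal best position (the cognitive term) and the swarm's global best (the social term), here minimizing a simple sphere function.

```python
import numpy as np

def pso(f, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Toy Particle Swarm Optimization: each particle is pulled toward its own
    best position and the swarm's best position while it flies through the space."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))       # positions
    v = np.zeros((n_particles, dim))                 # velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()         # swarm-wide best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))   # random cognitive/social weights
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val                  # update each particle's memory
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, f(gbest)

# Minimise the sphere function; the optimum is at the origin.
print(pso(lambda p: float(np.sum(p ** 2)), dim=3))
```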

Swarm intelligence, deploying nature-inspired models and converging on a preferred solution, has proven to provide simple yet robust methods for solving complex real-life problems in various fields of research, with impressive results. Yet enabling swarm intelligence by merging different autonomous narrow AI agents could irreversibly break the “human-in-the-loop” and accelerate its expansion beyond our knowledge or control. Are we going to be around to witness how smart an artificial swarm intelligence can get?


Legal Personhood for Artificial Intelligences


They kept hooking hardware into him – decision-action boxes to let him boss other computers, bank on bank of additional memories, more banks of associational neural nets, another tubful of twelve-digit random numbers, a greatly augmented temporary memory. Human brain has around ten-to-the-tenth neurons. By third year Mike had better than one and a half times that number of neuristors. And woke up.

― The Moon is a Harsh Mistress, Robert A. Heinlein


At Google I/O, the company’s annual developer conference, Google revealed its roadmap for highly intelligent conversational AI and a bot-powered platform. As artificial intelligence disrupts how we live our lives, redefining how we interact with present and future technology by automating things in new ways, it seems inevitable that we will all have to absorb the gospel of the automated life. One step into that life is trying to unify the scope of current technological advancements into a coherent framework of thought by exploring how current law applies to the different sets of legal rights that artificial intelligence raises.

Artificial intelligence may generally be defined as the intelligence possessed by machines or the software used to operate them. It also refers to the academic field of study, a branch of computer science, whose basic premise is that scientists can engineer intelligent agents capable of making accurate perceptions concerning their environment. These agents are then able to take correct actions based on these perceptions. The discipline of artificial intelligence explores the possibility of passing on traits that human beings possess as intelligent beings, including knowledge, reasoning, the ability to learn and plan, perception, the manipulation of objects and communication using language. As an academic field, it may be described as interdisciplinary, combining sciences such as mathematics, computer science and neuroscience as well as professional studies such as linguistics, psychology and philosophy. Professionals involved in the development of artificial intelligence use different tools to get machines to simulate characteristics of intelligence found only in humans.

But artificial intelligence only follows the lead of challenges and changes that are already reshaping existing legal frameworks. The twenty-first century is undoubtedly the age of information and technology. Exciting scientific breakthroughs continue to be made as innovators work to create better, more intelligent and energy-efficient machines. Rapid development of information technology has posed challenges to several areas of law, both domestically and internationally. Many of these challenges have been discussed at length and continue to be addressed through reforms of existing laws.

The trend towards reform of law to keep up with the growth of technology can also be illustrated by observing the use of social media to generate content. As social media has continued to grow and influence the world, international media law has recognized citizen journalism. The traditional role of journalists has been to generate and disseminate information. As the world’s population has gained increased access to smart devices, ordinary people have been able to capture breaking stories that are then uploaded to the internet through several platforms. This has eroded the sharp distinction that previously existed between professional journalists and ordinary citizens, as the internet provides alternatives to traditional news media sources.

There are innumerable examples of other ways in which information technology has caused changes in the existing legislative structures. The law is naturally elastic, and can be expanded or amended to adapt to the new circumstances created by technological advancement. The continued development of artificial intelligence, however, may challenge the expansive character of the law because it presents an entirely novel situation. To begin with, artificial intelligence raises philosophical questions concerning the nature of the minds of human beings. These philosophical questions are connected to the legal and ethical issues of creating machines that are programmed to possess the qualities that are innate and unique to human beings. If machines can be built to behave like humans, then they must be accorded some form of legal personality, similar to that which humans have. At the very least, the law must make provision for the changes that advanced artificial intelligence will cause in society through the introduction of a new species capable of rational, logical thought. Deriving general guidelines from past case law should aid lawmakers in closing the gap on technological singularity.

Legal personality endows its subjects with the capacity to have rights and obligations before the law. Without legal personality, there is no legal standing to conduct any binding transactions both domestically and internationally. Legal personality is divided into two categories. Human beings are regarded as natural or physical persons. The second category encompasses non-living legal subjects who are artificial but nonetheless treated as persons by the law. This is a fundamental concept in corporate law and international law. Corporations, states and international legal organizations are treated as persons before the law and are known as juridical persons. Without legal personality, there can be no basis upon which legal rights and duties can be established.

Natural persons have a wide array of rights that are recognized and protected by law. Civil and political rights protect an individual’s freedoms of self-expression, assembly, information, property ownership and self-determination. Social and economic rights acknowledge the individual’s fundamental needs to lead a dignified and productive life. These include the right to education, healthcare, adequate food, and decent housing and shelter. As artificial intelligence continues to develop, and smarter machines are produced, it may be necessary to grant these machines legal personality.

This may seem like far-fetched science fiction, but it is in fact closer to reality than the general population is aware. Computer scientists are at the frontline of designing cutting-edge software and advanced robots that could revolutionize the way humans live. Just as Turing’s machine accomplished feats during World War II that were impossible for human mathematicians, scientists and cryptologists, the robots of the future will be able to think and act autonomously. Among the positive implications of an increased capacity to produce artificial intelligence is the development of powerful machines. These machines could solve many of the problems that continue to hinder human progress, such as disease, hunger, adverse weather and aging. The science of artificial intelligence would make it possible to program these machines to provide solutions to human problems, and their superior abilities would make it possible to find these solutions within a short period of time instead of decades or centuries.

The current legal framework does not provide an underlying definition of what determines whether a certain entity acquires legal rights. The philosophical approach does not yet distinguish between strong and weak forms of artificial intelligence.

Weak artificial intelligence merely facilitates a tool for enhancing human technological abilities. A running application comprising artificial intelligence aspects, such as Siri, represents only a simulation of a cognitive process but does not constitute a cognitive process itself. Strong artificial intelligence, on the other hand, suggests that a software application in principle can be designed to become aware of itself, become intelligent, understand, have perception of the world, and present cognitive states that are associated with the human mind.

The prospects for the development and use of artificial intelligence are exciting, but this narrative would be incomplete without mention of the possible dangers as well. Humans may retain some level of remote control, but the possibility that these created objects could rise to positions of dominance over human beings is certainly a great concern. With the use of machines and the continual improvement of existing technology, some scientists are convinced that it is only a matter of time before artificial intelligence surpasses human intelligence.

Secondly, ethicists and philosophers have questioned whether it is sound to pass innate characteristics of human beings on to machines if this could ultimately mean that the human race will become subject to these machines. Perhaps increased use of artificial intelligence to produce machines may dehumanize society, as functions that were previously carried out by people become mechanized. In the past, mechanization has resulted in the loss of jobs, as manpower is no longer required when machines can do the work. Reflection on history reveals that machines have helped humans make work easier, but it has not been possible to achieve an idyllic existence simply because machines exist.

Lastly, if this advanced software should fall into the hands of criminals, terrorist organizations or states that are set against peace and non-violence, the consequences would be dire. Criminal organizations could expand dangerous networks across the world using technological tools. Machines could be trained to kill or maim victims. Criminals could remotely control machines to commit crimes in different geographical areas. Software could be programmed to steal sensitive private information and incentivize corporate espionage.

The “singularity” is a term first coined by Vernor Vinge to describe a theoretical situation in which machines created by humans develop intelligence that matches or exceeds that of the human mind and end the era of human dominance, driven by the exponential growth of computing power (the law of accelerating returns) combined with a growing human understanding of the complexity of the brain.

As highlighted earlier, strong artificial intelligence that matches or surpasses human intelligence has not yet been developed, although its development has been envisioned. Strong artificial intelligence is a prominent theme in many science fiction movies, probably because the notion of a supercomputer that can outsmart humans is very interesting. In the meantime, before this science fiction dream can become a reality, weak artificial intelligence has slowly become a commonplace part of everyday life. Search engines and smartphone apps are the most common examples of weak artificial intelligence. These programs are simply designed and possess the ability to mimic basic aspects of human intelligence. Google is able to search for information on the web using key words or phrases entered by the user. The scenario of dominance by artificial intelligence seems a long way off from the current status quo. However, the launch of chatbots points toward the direction artificial intelligence will take in the near future using weak artificial intelligence.

Chatbots are the next link in the evolutionary chain of virtual personal assistants, such as Siri. Siri is the shortened version of the Scandinavian name Sigrid, which means beauty or victory. It is a virtual personal assistant that is able to mimic human elements of interaction as it carries out its duties. The program is enabled with a speech function that allows it to reply to queries as well as take audio instructions, without requiring the user to type anything. Siri is able to decode a verbal message, understand the instructions given and act on them. It can provide information on request, send text messages, organize personal schedules, book appointments and take note of important meetings on behalf of its user. Another impressive feature of the program is its ability to collect information about the user: as the user gives more instructions, Siri stores this information and uses it to refine the services it offers. The excitement that greeted the successful launch of Siri within the mass market is imaginable. After Siri came the chatbots. Chatbots are a type of conversational agent, software designed to simulate an intelligent conversation with one or more human users via auditory or textual methods. The technology may be considered weak artificial intelligence, but the abilities demonstrated by these programs offer a glimpse into what the future holds for artificial intelligence development. For legal regulators, the features of virtual personal assistants demand that existing structures be reviewed to accommodate the novel circumstances their use has introduced. As more programs like Siri continue to be commercialized, these new legal grey areas will feature more often in mainstream debate. Intellectual property law and liability law will probably be the areas most affected by the uptake of chatbots by consumers.

Intellectual property law creates ownership rights for creators or inventors, to protect their interests in the works they create. Copyright law in particular protects artistic creations by controlling the means by which these creations are distributed. The owners of copyright are then able to use their artistic works to earn an income. Anyone else who wants to deal with the creative works for profit or personal use must get authorization from the copyright owner. Persons who infringe on copyright are liable to face civil suits, arrest and fines. In the case of chatbots, the ownership of the sounds produced by the program has not been clearly defined. It is quite likely that in the near future these sounds will become a lucrative form of creative work, and when that happens it will be imperative that the law defines who owns them. Users are capable of using a chatbot’s features to mix different sounds, including works protected by copyright, to come up with new sounds. In this case, the law is unclear whether such content would be considered new content or whether it would be attributed to the original producers of the sound.

Another important question that would have to be addressed would be the issue of ownership between the creators of artificial intelligence programs, the users of such programs and those who utilize the output produced by the programs. A case could be made that the creators of the program are the original authors and are entitled to copyright the works that are produced using such a program. As artificial intelligence gains popularity within the society and more people have access to machines and programs like Siri, it is inevitable that conflicts of ownership will arise as different people battle to be recognized as the owner of the works produced. From the perspective of intellectual property, artificial intelligence cannot be left within the public domain. Due to its innate value and its capacity to generate new content, there will definitely be ownership wrangles. The law therefore needs to provide clarity and guidance on who has the right to claim ownership.

Law enforcement agents must constantly innovate in order to successfully investigate crime. Although the internet has made it easier to commit certain crimes, programs such as ‘Sweetie’, an avatar run by the charity Terre des Hommes based in the Netherlands, illustrate how artificial intelligence can help to solve crime. The Sweetie avatar was developed by the charity to help investigate sex tourists who targeted children online. The offenders in such crimes engage in sexual acts with children from developing countries. The children are lured into the illicit practice with promises that they will be paid for their participation. After making contact and confirming that the children are indeed underage, the offenders then request the children to perform sexual acts in front of the cameras. The offenders may also perform sexual acts and request the children to view them.

The offenders prey on vulnerable children who often come from poor developing countries. The children are physically and mentally exploited to gratify offenders from wealthy Western countries. In October 2014, the Sweetie avatar project secured its first conviction of a sex predator. The man, an Australian national named Scott Robert Hansen, admitted that he had sent nude images of himself performing obscene acts to Sweetie. Hansen also pleaded guilty to possession of child pornography. Both of these offenses were violations of previous orders issued against him as a repeat sexual offender. Sweetie is an application that is able to mimic the movements of a real ten-year-old girl. The 3D model is very lifelike, and the application allows for natural interactions such as typing during chats and nodding in response to questions asked or comments made. It also makes it possible for the operator to move the 3D model from side to side in its seat. Hansen fell for the ploy and believed that Sweetie was a real child.

According to the court, it was immaterial that Sweetie did not exist. Hansen was guilty because he believed that she was a real child and his intention was to perform obscene acts in front of her. Although Hansen was the only person to be convicted as a result of the Terre des Hommes project, researchers working on it had patrolled the internet for ten weeks. In that time, thousands of men had gotten in touch with Sweetie. Terre des Hommes compiled a list of one thousand suspects, which was handed over to Interpol and state police agencies for further investigation. The Sweetie project illustrates that artificial intelligence can be utilized to investigate difficult crimes such as sex tourism. The biggest benefit of such a project is that it created an avatar that was very convincing and removed the need to use real people in the undercover operation. In addition, the project had an ideal way of collecting evidence through a form of artificial intelligence that was very difficult to contradict. Thus, in a way, artificial intelligence provided grounds for challenging the already existing legal rights of the accused.

Presently, the law provides different standards of liability for those who break the law. In criminal law, a person is liable for criminal activity if it is demonstrated that they had both a guilty mind (the settled intent to commit a crime) and performed the guilty act in line with that intent. In civil cases, liability for wrongdoing can be reduced based on mitigating factors such as the contributory negligence of the other party. There is currently no explicit provision in law that allows defendants to escape liability by claiming that they relied on incorrect advice from an intelligent machine. However, with increased reliance on artificial intelligence to guide basic daily tasks, the law will eventually have to address this question. If a user of artificial intelligence software makes a mistake while acting on information from the software, they may suffer losses or damages arising from the mistake. In such cases the developers of the software may be required to compensate the user or incur liability for the consequences of their software’s failure. If machines can be built with the ability to make critical decisions, it is important to have a clear idea of who will be held accountable for the actions of the machine.

Autonomous driverless cars offer an interesting preview of where such decisions will have to be made in the future. Florida, Nevada and Michigan, as well as Washington, D.C., have passed laws allowing autonomous cars to drive on their streets in some capacity. How autonomous cars might change liability and ethical rights hinges on the ethical settings of the software controlling self-driving vehicles, which might prioritize human lives over financial or property loss. Numerous ethical dilemmas could arise, such as an autonomous car having to choose between saving its passengers and saving a child’s life. Lawmakers, regulators and standards organizations should develop concise legal principles for addressing such ethical questions, starting by defining the liable entity.

Turing, one of the fathers of modern computer science and artificial intelligence, envisioned a world in which machines could be designed to think independently and solve problems. Modern scientists still share Turing’s vision. It is this vision that inspires countless mathematicians and developers around the world to continue designing better software applications with greater capabilities. The scientific community, and society at large, have several positive expectations concerning artificial intelligence and the potential benefits humankind could reap from its development. Intelligent machines have the potential to make our daily lives easier as well as unlock mysteries that cannot be solved by human ingenuity. They also have the potential to end the dominance of human beings on this planet. The need for law to be reformed with regard to artificial intelligence is apparent. As the world heads into the next scientific era with both excitement and fear, the law must find a way to adjust to the new circumstances created by machines that can think. As we involve artificial intelligence more in our lives and try to learn about its legal implications, there will undoubtedly be changes that need to be made.


Cortical Interface: ‘Conscious-Competence’ Model


"Tank: …now, we're supposed to start with these operation programs first, that's a major boring shit. Let's do something more fun. How about combat training.
Neo: Jujitsu? I'm going to learn Jujitsu?... Holy shit.
Tank: Hey Mikey, I think he likes it. How about some more?
Neo: Hell yes. Hell yeah."

― The Matrix


The unmitigated accuracy in inputting and outputting data through different medium interfaces (as well as our own technological fluency in using and utilizing information resources in itself) signals the multiplicity of subjectivities we easily form, participate in and are subjected to in our everyday lives. Humanity is on the path to significantly accelerate the evolution of intelligent life beyond its current human form and human limitations.

Kernel, IBM, Neuralink, Facebook—all work to develop some kind of cortical interface by implanting microscopic brain electrodes that in the future may upload and download thoughts to enhance human abilities. Even the smallest advancement of this technology would trigger the bio-technological enhancement of human beings in automation and cyberinteraction, enabling real-time access to web networks and wireless communication directly from our minds.

However alarming the impact these advancements might have on human consciousness, in the end all existing technologies work to fulfill an innate human desire: to stay closely connected and be part of something known and similar. The increased connectivity and time-space distanciation they provide, where for the first time we can be connected instantaneously and aware of all other people at all times, has already been shaping a “global neural net” for years, along which ideas spread and come to fruition at previously unmatched rates. If human beings, looking to push connectivity to the furthest reaches of technological development, continue to make advances (and we will) in interpersonal technology, defining our social individuality neither starts nor ends at the boundaries of our synthetic skin. It is not only through current social networks, now the primary platforms of self-making and self-representation, that individuals in the data age sustain their integration into society, but also through smartphones, online banking systems, internet profiling and ‘digital nomad’ occupations. Considering that a person without a credit card can no longer even book a hotel room, and a person without a credit score cannot obtain a credit card, our subjective positioning within social, economic and political systems is now almost strictly digital. In entering this infinitesimally deep and complex, venturing-too-far-down ‘rabbit hole’, humanity is ready to seek and adopt a hybrid anatomy (artificial and organic), enabling a far greater utilization of communication technologies than any modern human has achieved.

Given the significant difference in data-processing capabilities in our hybrid state, such technology inevitably leads to rapidly advancing artificial intelligence, possibly to a level where the difference between individual consciousness and artificial intelligence is blurred. Seeking the ability to enhance our parts implies that soon we will be offered options to alter our bodies through high-functioning prostheses. Whatever the initial reason may be (evening the odds with artificial intelligence, finding a cure for physical and mental disorders, or simply continuing our natural evolution), we are indeed moving towards a future of iteratively reproduced congregate bodies.

Technological advancement in bringing human intelligence to a new level reminds me of a reversed ‘conscious-competence’ model. The model represents the human desire to beat the “unknown unknowns”, whatever the means. It views human learning along two dimensions, consciousness and competence, moving in reverse through a four-stage progression. In unconscious competence, the individual has had so much practice with a skill (or was born with it) that it has become “second nature” and can be performed easily, even while executing another task, and the individual may be able to teach it to others, depending on how and when it was learned. In conscious competence, the individual understands or knows how to do something, but demonstrating the skill or knowledge requires concentration; it may be broken down into steps, and there is heavy conscious involvement in executing the new skill. In conscious incompetence, the individual does not understand or know how to do something but recognizes the deficit, as well as the value of a new skill in addressing it; making mistakes can be integral to learning at this stage. Finally, in unconscious incompetence, the individual does not understand or know how to do something and does not necessarily recognize the deficit; they may deny the usefulness of the skill and may never recognize their own incompetence or the value of the new skill, because the stimulus to learn is unknown and cognitively unreachable. But we will never accept our ‘human’ unconscious incompetence.

In a typically human way, the possibility of extending our thinking beyond the physical body has been around for a while and can already be found in the works of Aristotle, Plato and Descartes, who assumed that the rational self had an ‘inner’ relationship with the mind and an ‘outer’ relationship with the body. This ensured that the body was perceived as part of the environment and not as part of the individual self. Consequently, the ultimate dream in Cartesian dualism is disembodiment. Elon Musk's ‘neural lace’ would have fulfilled that dream for Descartes: the possibility of escaping the body would pave the way for reasoning through pure thought alone.

But however drastically technology progresses and alters the socially, politically, economically and scientifically acceptable frameworks (including those involving human biology), it is, in the end, consistent with the history and nature of society: it constitutes just another supply of advancement (creating connection) that depends on human demand (being connected), as any other advancement does.


Understanding the Theory of Embodied Cognition


“We shape our tools and thereafter our tools shape us.”

― Marshall McLuhan


Artificial intelligence (AI) systems are generally designed to solve one traditional AI task. While such weak systems are undoubtedly useful as decision-aiding tools, future AI systems will be strong and general, consolidating common sense and general problem-solving capabilities (the a16z podcast “Brains, Bodies, Minds … and Techno-Religions” gives some great examples of what general artificial intelligence could be capable of). To achieve general intelligence—a human-like ability to use previous experiences to solve arising problems—AI agents’ “brains” would need, as biological brains do, to evolve their experiences into a variety of new tasks. This is where Universe comes in.

In December, OpenAI introduced Universe, a software platform for training an AI’s general intelligence to become skilled at any task that a human can do with a computer. Universe builds upon OpenAI’s Gym, a toolkit designed for the development and comparison of reinforcement learning algorithms (the environment acts as the tutor, providing periodic feedback/“reward” to an agent, which in turn either encourages or discourages subsequent actions). The Universe software essentially allows any program to be turned into a Gym environment by launching it behind a virtual desktop, avoiding the requirement for Universe to have direct access to the program’s source code and other protected internal data.
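To ground the environment/agent/reward loop that Gym formalizes (and that Universe wraps around arbitrary programs), here is a minimal sketch of the classic Gym interaction pattern with a random agent standing in for a learning algorithm. The environment name and the four-value step API reflect the Gym releases of that era and are assumptions about the setup, not code taken from OpenAI's announcement.

```python
# Minimal reinforcement learning interaction loop using the classic OpenAI Gym API.
# A random policy stands in for a real agent; the environment returns an
# observation and a reward after every action, which a learning agent would
# use to improve its policy.
import gym

env = gym.make("CartPole-v0")          # any Gym (or Universe-wrapped) environment
for episode in range(5):
    observation = env.reset()           # start a new episode
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()                   # random agent
        observation, reward, done, info = env.step(action)   # feedback from the environment
        total_reward += reward
    print(f"episode {episode}: reward={total_reward}")
env.close()
```

A real agent would replace the random action with a policy and use the reward signal to update it, which is the periodic-feedback role of the environment described above.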

OpenAI perceives such interaction as a validation for artificial intelligence: many applications are essentially micro-virtual worlds and exposing AI learning techniques to them will lead to more trained agents, capable of tackling a diverse range of (game) problems quickly and well. Being able to master new, unfamiliar environments in this way is a first step toward general intelligence, allowing AI agents to “anticipate,” rather than forever getting stuck in a singular “single task” loop.

However, as much as Universe is a unique experience vessel for artificial intelligence, it is a uniquely visual experience vessel, enabling an agent to interact with any external software application via pixels (using keyboard and mouse), with each of these applications constituting a different HCI environment source. It provides access to a vast digital universe full of a variety of visual training tasks.

But isn’t it missing out on all the fun of full tactile experience? Shouldn’t there be a digitized somatosensory training platform for AI agents, to recognize and interpret the myriad of tactile stimuli and grasp the experience of a physical world? The somatosensory system is the part of the central nervous system involved with decoding a wide range of tactile stimuli (object recognition, texture discrimination, sensory-motor feedback and, eventually, inter-social communication exchange), enabling our perception of and reaction to stimuli originating outside and inside of the body and the perception and control of body position and balance. One of the more essential aspects of general intelligence that gives us a common-sense understanding of the world is being placed in the environment and being able to interact with things in it—embedded in all of us is the instinctual ability to tell apart the mechanical forces upon the skin (temperature, texture, intensity of the tactile stimuli).

Our brain is indeed the core of all human thought and memory, constantly organizing, identifying and perceiving the environment that surrounds us and interpreting it through our senses, in the form of a data flow. And yet, studies have taught us that multiple senses stimulate the central nervous system. An estimated (only) 78% of all data flow perceived by the brain is visual, while the remainder originates from sound (12%), touch (5%), smell (2.5%) and taste (2.5%)—and that is assuming we have deciphered all of the known senses. So by training general AI purely via visual interaction, will we be getting a 78% general artificial intelligence? Enter the “embodied cognition” theory.

Embodied Cognition

Embodied cognition is a research theory that is generally about the vast difference made by having an active body and being situated in a structured environment suited to the kinds of tasks the brain has to perform in order to support adaptive task success. Here I refer to the term as the existence of a memory system that encodes data about an agent’s motor and sensory competencies, stressing the importance of action for cognition, such that an agent is capable of tangibly interacting with the physical world. The aspects of the agent’s body beyond its brain play a significant causative and physically integral role in its cognitive processing. The only way to understand the mind, how it works, and subsequently train it is to consider the body and what helps the body and mind function as one.

This approach is in line with a biological learning pattern based on “Darwinian selection”, which proposes that intelligence can only be measured in the context of the surrounding environment of the organism studied: “…we must always consider the embodiment of any intelligent system. The preferred embodiment reflects that the mind and its surrounding environment (including the physical body of the individual) are inseparable and that intelligence only exists in the context of its surrounding environment.”

Stacked Neural Networks Must Emulate Evolution’s Hierarchical Complexity (Commons, 2008)

Current notions of neural networks (NNs) are indeed based on the known evolutionary processes of executing tasks and share some properties of biological NNs in the attempt to tackle general problems, but only as architectural inspiration, without necessarily copying a real biological system closely. One of the first design steps in developing AI NNs that can closely imitate general intelligence is to follow the model of hierarchical complexity (HC) in terms of data acquisition. Stacked NNs based on this model could imitate evolution’s environmental/behavioral processes and reinforcement learning (RL). However, computer-implemented systems or robots generally do not display generalized higher learning adaptivity—the capacity to go from learning one ability to learning another without dedicated programming.

Established NNs are limited for two reasons. The first problem is that AI models are based on the notion of Turing machines, and almost all AI models are based on words or text. But Turing machines are not enough to really produce intelligence. At the lowest stages of development, systems need effectors that produce a variety of responses—movement, grasping, emoting, and so on. They must have extensive sensors to take in more from the environment. Even though Carpenter and Grossberg’s (1990, 1992) neural networks were intended to model simple behavioral processes, the processes they modeled were too complex. This resulted in NNs that were relatively unstable and not highly adaptable. When one looks at evolution, however, one sees that the first NNs that existed were, for example, in Aplysia, Cnidarians (Phylum Cnidaria) and worms. They were specialized to perform just a few tasks even though some general learning was possible.

Animals, including humans, pass through a series of ordered stages of development (see “Introduction to the Model of Hierarchical Complexity,” World Futures, 64: 444-451, 2008). Behaviors performed at each higher stage of development always successfully address task requirements that are more hierarchically complex than those required by the immediately preceding order of hierarchical complexity. Movement to a higher stage of development occurs by the brain combining, ordering, and transforming the behavior used at the preceding stage. This combining and ordering of behaviors thus must be non-arbitrary.
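As one possible reading of this stacking idea, the sketch below composes small networks so that each higher stage consumes the outputs of the stage beneath it and is trained only after the lower stage is frozen, mirroring the claim that higher-stage behavior is built by combining and transforming lower-stage behavior. The framework (PyTorch), layer sizes, toy data and the stage-by-stage training order are my own illustrative assumptions, not a specification from the model of hierarchical complexity.

```python
# Illustrative sketch of "stacked" networks ordered by hierarchical complexity:
# each stage is a small module that takes the frozen outputs of the stage below
# as its input. Sizes, targets and data are arbitrary toy values.
import torch
import torch.nn as nn

class Stage(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, out_dim))

    def forward(self, x):
        return self.net(x)

def train_stage(stage, inputs, targets, epochs=50):
    opt = torch.optim.Adam(stage.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(stage(inputs), targets)
        loss.backward()
        opt.step()
    for p in stage.parameters():      # freeze the stage once trained,
        p.requires_grad_(False)       # so higher stages build on fixed behavior
    return stage

x = torch.randn(256, 8)               # raw "environmental" input
y1 = torch.randn(256, 4)              # toy targets for the lower-stage task
y2 = torch.randn(256, 2)              # toy targets for the higher-stage task

stage1 = train_stage(Stage(8, 4), x, y1)            # lower-order behavior
stage2 = train_stage(Stage(4, 2), stage1(x), y2)    # higher stage composes stage1's outputs

print(stage2(stage1(x)).shape)        # torch.Size([256, 2])
```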

Somatosensory System Emulation

Neuroscience has classified specific regions, processes and interactions involved in memory and reasoning down to the molecular level. Neurons and synapses are both actively involved in thought and memory, and with the help of brain imaging technology (e.g. Magnetic Resonance Imaging (MRI), Nuclear Magnetic Resonance Imaging, or Magnetic Resonance Tomography (MRT)), brain activity can be analyzed at the molecular level. All perceived data in the brain is represented in the same way, through the electrical firing patterns of neurons. The learning mechanism is also the same: memories are constructed by strengthening the connections between neurons that fire together, using a biochemical process known as long-term potentiation. Recently, atomic magnetometers have enabled the development of inexpensive and portable MRI instruments without the large magnets used in traditional MRI machines to image parts of the human anatomy, including the brain. There are over 10 billion neurons in the brain, each of which has synapses involved in memory and learning, which can also be analyzed by brain imaging methods, soon in real time. Studies indicate that new brain cells are created whenever one learns something new by physically interacting with the environment: whenever stimuli in the environment or a thought makes a significant enough impact on the brain's perception, new neurons are created. During this process, synapses carry out electro-chemical activity that directly reflects activity related to both memory and thought, including that arising from tactile sensation. The sense of touch, weight and all other tactile sensory stimuli need to be implemented as the concrete “it” value that is assigned to an agent by the nominal concept. By reconstructing 3D neuroanatomy from molecular-level data, sensory activity in the brain can be detected, measured, stored and reconstructed for a subset of neural projections generated by an automated segmentation algorithm, in order to convey the neurocomputational sensation to an AI agent. The existence of such a somatosensory, Universe-like database, focused on training AI agents beyond visual interaction, may bring us closer to 100% general AI.


Patents in an era of artificial intelligence


“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”

― George Bernard Shaw

Just in the last month, headlines about the future of artificial intelligence (AI) were dominating most of the technology news across the globe:

  • On 15 November, OpenAI, a research company in San Francisco, California, co-founded by entrepreneur Elon Musk, announced a partnership with Microsoft to start running most of its large-scale experiments on Microsoft’s cloud computing platform, Azure;

  • Two weeks later, Comma.ai open sourced its AI driver assistance system and robotics research platform;

  • On 3 December, DeepMind, a unit of Google headquartered in London, opened up its own 3D virtual world, DeepMind Lab, for download and customization by outside developers;

  • Two days later, OpenAI released a ‘meta-platform’ that enables AI programs to easily interact with dozens of 3D games originally designed for humans, as well as with some web browsers and smartphone apps;

  • A day later, in a keynote at the annual Neural Information Processing Systems conference (NIPS), Russ Salakhutdinov, director of AI research at Apple, announced that Apple’s machine learning team would both publish its research and engage with academia;

  • And on 10 December, Facebook announced that it would open-source its AI hardware design, Big Sur.

What’s going on here? In the AI field, maybe more than in any other, research thrives directly on open collaboration—AI researchers routinely attend industry conferences, publish papers, and contribute to open-source projects with mission statements geared toward the safe and careful joint development of machine intelligence. There is no doubt that AI will radically transform our society, having the same level of impact as the Internet has had since the nineties. And it has got me thinking that with AI becoming cheaper, more powerful and ever more pervasive, with the potential to recast our economy, education, communication, transportation, security and healthcare from top to bottom, it is of the utmost importance that it (software and hardware) not be hindered by the same innovation establishment that was designed to promote it.

System glitch

Our ideas are meant to be shared—in the past, the works of Shakespeare, Rembrandt and Gutenberg could be openly copied and built upon. But the growing dominance of the market economy, where the products of our intellectual labors can be acquired, transferred and sold, produced a side-effect glitch in the system. Due to the development costs (of actually inventing a new technology), the price of unprotected original products is simply higher than the price of their copies. The introduction of patent law (to protect inventions) and copyright law (to protect media) was intended to address this imbalance. Both aimed to encourage the creation and proliferation of new ideas by providing a brief and limited period during which no one else could copy your work. This gave creators a window of opportunity to break even with their investments and potentially make a profit, after which their work entered the public domain, where it could be openly copied and built upon. This was the inception of the open innovation cycle: a vast, accessible, distributed network of ideas, products, arts and entertainment, open to all as the common good. The influence of the market transformed this principle into the belief that ideas are a form of property, and this conviction subsequently yielded the new term “intellectual property” (IP).

Loss aversion

“People’s tendency to prefer avoiding losses to acquiring equivalent gains”: it’s better not to lose $10 than to find $10, and we hate losing what we’ve got. Applying this principle to intellectual property: we believe that ideas are property; the gains we get from copying the ideas of others don’t make a big impression on us, but when it’s our ideas being copied, we perceive it as a property loss and become (excessively) territorial. Most of us have no problem with copying (as long as we’re the ones doing it). When we copy, we justify it; when others copy, we vilify it. So with a blind eye toward our own mimicry and propelled by faith in markets and ultimate ownership, IP swelled beyond its original intent with broader interpretations of existing laws, new legislation, new realms of coverage and alluring rewards. Starting in the late nineties, a series of new copyright laws and regulations began to take shape in the US (the NET Act of 1997, the DMCA of 1998, Pro-IP of 2008, the Enforcement of Intellectual Property Rights Act of 2008) and many more are in the works (SOPA, the PROTECT IP Act, the Innovative Design Protection and Piracy Prevention Act, the CAS “Six Strikes” program). In Europe, there are currently 179 different sets of laws, implementing rules and regulations, geographical indications, treaty approvals, legal literature, IP jurisprudence documents, administered treaties and treaty memberships.

In the patents domain, propelled by the same loss aversion, technological coverage made the leap from physical inventions to virtual ones, most notably software.

Rundown of computing history

The first computers were machines of cogs and gears, and computing became practical only in the 1950s and 60s with the invention of the semiconductor. Forty years ago, (mainframe-based) IBM emerged as an industry forerunner. Thirty years ago, (client-server-based) Microsoft leapfrogged it and gave ordinary people computing utility tools, such as word processing. As computing became more personal and the World Wide Web turned Internet URLs into web site names that people could access, (internet-based) Google offered the ultimate personal service, a free gateway to the infinite data web, and became the new computing leader. Ten years ago, (social-computing) Facebook morphed into a social medium as a personal identity tool. Today, (conversational-computing) Snap challenges Facebook as-Facebook-challenged-Google-as-Google-challenged-Microsoft-as-Microsoft-challenged-IBM-as-IBM-challenged-cogs-and-gears.

History of software patenting

Most people in the software patent debate are familiar with Apple v. Samsung, Oracle v. Google with its open-source arguments, and so on, but many are not familiar with the name Martin Goetz. Martin Goetz received the first software patent in 1968, for a data organizing program his small company wished to sell for use on IBM machines. At the time, IBM offered all of its software as part of the computers it sold. This gave any other competitors in the software space a difficult starting point: they either had to offer their own hardware (HP produced its first computer just two years earlier) or convince people to buy software to replace the free software that came with IBM computers.

Martin Goetz was leading a small software company and did not want IBM to take his technological improvements and use them in IBM's bundled programs without reimbursement, so he filed for a software patent. Thus, in 1968, the first software patent was issued to a small company, helping it compete against the largest computer company of the time. Although they had filed a patent to protect their IP, Goetz's company still had a difficult time competing in a market dominated by IBM, so they joined the US Justice Department's antitrust suit against IBM, which forced IBM to unbundle its software suite from its hardware appliances.

So the software industry began in 1969, with the unbundling of software by IBM and others. Consumers had previously regarded application and utility programs as cost-free because they were bundled with the hardware. With unbundling, competing software products could be put on the market because such programs were no longer included in the price of the hardware. Almost immediately, a software industry emerged. At the same time, it quickly became evident that some type of protection would be needed for this new form of intellectual property.

Unfortunately, neither copyright law nor patent law seemed ready to take on this curious hybrid of creative expression and functional utility. During the 1970s, there was total confusion as to how to protect software from piracy. A few copyrights were issued by the Copyright Office, but most were rejected. A few software patents were granted by the PTO, but most patent applications for software-related inventions were rejected. The worst effect for the new industry was the uncertainty as to how this asset could be protected. Finally, in 1980, after an extensive review by the National Commission on New Technological Uses of Copyrighted Works (CONTU), Congress amended the Copyright Act of 1976 to cover software. It took a number of important cases to resolve most of the remaining issues in copyright law, and there are still some issues being litigated, such as the so-called “look and feel”, but this area of the law appears to be quite well understood now. For patents, it took a 1981 Supreme Court decision, Diamond v. Diehr, to bring software into the mainstream of patent law. That decision ruled that the presence of software in an otherwise patentable technology did not make the invention unpatentable. Diamond v. Diehr opened the door for a flood of software-related patent applications. Unfortunately, the PTO was not prepared for this development, and in the intervening years it has issued thousands of patents that appear questionable to the software industry. It took a few years after 1981 for the flow of software-related applications to increase, and then there was some delay in processing those applications. Now the number of infringement cases is on the rise.

The transition from physical patents to virtual patents was not a natural one. At its core, a patent is a blueprint for how to recreate an invention, while (the majority of) software patents are more like a loose description of what something would look like if it actually were invented. And software patents are written in the broadest possible language to get the broadest possible protection; the vagueness of these terms can sometimes reach absurd levels, for example “information manufacturing machine”, which covers anything computer-like, or “material object”, which covers… pretty much everything.

What now?

35 U.S.C. 101 reads as follows: 

“Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.”

When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the technology is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter. Since software became widespread and commercially valuable, it has been highly difficult to classify it within a specific category of intellectual property protection.

Attempts are usually made in the field of software technology to combine methods or means used in different fields or apply them to another field in order to achieve an intended effect. Consequently, combining technologies used in different fields and applying them to another field is usually considered to be within the exercise of an ordinary creative activity of a person skilled in the art, so that when there is no technical difficulty (technical blocking factor) for such combination or application, the inventive step is not affirmatively inferred unless there exist special circumstances, such as remarkably advantageous effects. Software is not a monolithic work: it possesses a number of elements that can fall within different categories of intellectual property protection.

In Israel, legal doctrines adapt to changes in innovative technological products and the commercial methods that extend this innovation to the marketplace. The decision issued by the Israeli Patent Registrar in the matter of Digital Layers Inc confirms the patentability of software-related inventions. The Registrar ruled that the claimed invention should be examined as a whole and not by its components, basing his ruling on the recent matter of HTC Europe Co Ltd v. Apple Inc, quoting: 

"…It causes the device to operate in a new and improved way and it presents an improved interface to application software writers. Now it is fair to say that this solution is embodied in software but, as I have explained, an invention which is patentable in accordance with conventional patentable criteria does not become unpatentable because a computer program is used to implement it…"

After Alice Corp. v. CLS Bank International, if the technology does fall within one of the categories, it must then be determined whether the technology is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea), and if so, it must additionally be determined whether the technology is a patent-eligible application of the exception. If an abstract idea is present in the technology, some element or combination of elements must be sufficient to ensure that the technology amounts to significantly more than the abstract idea itself. Examples of abstract ideas include fundamental economic practices (comparing new and stored information and using rules to identify options in SmartGene); certain methods of organizing human activities (managing a game of Bingo in Planet Bingo v. VKGS and a user interface for meal planning in DietGoal Innovations v. Bravo Media); an idea itself (storing and transmitting information in Cyberfone); and mathematical relationships/formulas (updating alarm limits using a mathematical formula in Parker v. Flook and a generalized formulation of a computer program to solve a mathematical problem in Gottschalk v. Benson). The technology cannot merely amount to the application of, or instructions to apply, the abstract idea on a computer; that is considered nothing more than requiring a generic computer system to carry out the abstract idea itself. Automating conventional activities using generic technology does not amount to an inventive concept, as this simply describes “automation of a mathematical formula/relationship through use of generic computer function” (OIP Technologies v. Amazon). Procedures that merely use an existing general-purpose computer do not purport to improve any other technology or technical field, or to improve the functioning of the computer itself, and do not move beyond a general link between the use of an abstract idea and a particular technological environment.

The Federal Circuit continues to refine patent eligibility for software

  • Following the Supreme Court’s decision in Alice v. CLS Bank, the court of appeals in Ultramercial v. Hulu reversed its prior decision and ruled that the claims were invalid under 35 U.S.C. § 101. Following the two-step framework outlined in Alice, Judge Lourie concluded that the claims were directed to an abstract idea.

  • The Federal Circuit’s decision in Digitech Image Techs. v. Electronics for Imaging illustrated the difficulty many modern software-implemented inventions face. If a chemist were to invent a mixture of two ingredients that gives better gas mileage, it is hard to imagine that a claim to such a mixture would receive a § 101 rejection. Yet when two elements of data are admixed to produce improved computational results, the courts are quick to dismiss this as a patent-ineligible abstraction. The real problem Digitech faced was that both data elements were seen as abstractions: one data type represented color information (an abstraction) and the other represented spatial information (another abstraction).

  • DDR Holdings v. Hotels.com, a 2014 Federal Circuit decision, provides a good discussion of a patent-eligible Internet-centric technology. In applying the Mayo/Alice two-part test, the court admitted it can be difficult sometimes to distinguish “between claims that recite a patent-eligible invention and claims that add too little to a patent-ineligible abstract concept”.

  • Content Extraction v. Wells Fargo Bank gives a roadmap to how the Court of Appeals for the Federal Circuit will likely handle business method patents in the future. First, if the manipulation of economic relations is deemed present, you can be sure that any innovative idea within the economic realm will be treated as part of the abstract idea. Essentially, no matter how clever an economic idea may be, it will be branded part of the abstract-idea problem, for which there is only one solution: something else innovative that is not part of the economic idea. Practically speaking, this means the technology needs to incorporate an innovative technical improvement that makes the clever economic idea possible.

So the fuzziness of software patents’ boundaries has already turned the ICT industry into one colossal turf war. The expanding reach of IP has introduced more and more possibilities for opportunistic litigation (suing to make a buck). In the US, two-thirds of all patent lawsuits are currently over software, and 2015 saw more patent lawsuits filed than any year before. Of the high-tech cases, more than 88% involved non-practicing entities (NPEs). These include two charmlessly evolving species whose entire business model is lawsuits: patent trolls and sample trolls. These are corporations that don’t actually create anything; they simply acquire a library of intellectual property rights and then litigate to earn profits (and because legal expenses run into the millions of dollars, their targets are usually highly motivated to settle out of court). And patent trolls are most common in the troubled realm of software. The estimated wealth loss in the US alone is $500,000,000,000 (that’s a lot of zeros).

Technology convergence and open innovation

For technology companies, convergence and the advance of the open-source approach, driven largely by the collaborative processes introduced by GitHub, Google's Android, Apple’s Swift, and most recently Microsoft joining the Linux Foundation, have created a systematic process for innovation that keeps increasing software functionality and improving design. 150 years ago, innovation required a dedicated team spending hours in a lab, extensively experimenting and discovering “10,000 ways not to make a light bulb” before finding one that worked. Today, innovation has reached a critical mass: technology and user feedback combine to give a purposeful team the ability to find 10,000 ways not to do something in a matter of hours, with the right plan in place. A development team can deliver a product in a matter of months and test it in such a way that customer responses reach the right team member directly, with the feedback being implemented and the system corrected (almost) in real time. Yet the life of a software patent today is still 20 years from the date the application was filed. The patent system, which has existed since 1790, is not equipped to handle this new technology, and there is a need to establish an agile, sui generis, short-cycle (three to five years) form of protection dedicated solely to software. As patents play an essential role in market-centred systems of innovation, patent exclusivity criteria should be redesigned more systematically to reflect the ability of software patents to foster innovation and to encourage technology diffusion.

The belief in intellectual property has grown so dominant that it has pushed the original intent of patents out of public consciousness. But that original purpose is right there, in plain sight: the US Patent Act of 1790 reads “An Act to promote the progress of useful Arts”. The exclusive rights this act introduced were a sacrifice made in pursuit of a different purpose: the intent was to better the lives of everyone by incentivizing creativity and producing a rich pool of knowledge open to all. But the exclusive rights themselves came to be considered the point, so they were expanded exponentially, and the result hasn’t been more progress or more learning, but more squabbling and more legal abuse. AI is entering an age of daunting problems; we need the best ideas possible, we need them now, and we need them to spread as fast as possible. That original meme of a shared, open pool of knowledge was overwhelmed by the obsession with exclusivity, and it needs to spread again, especially today. If the meme prospers, our laws, our norms, and our society will all transform with it.


Ambient Intelligence as a Multidisciplinary Paradigm

In recent years, advances in artificial intelligence (AI) have opened up new business models and new opportunities for progress in critical areas such as personal computing, health, education, energy, and the environment. Machines are already surpassing human performance on certain specific tasks, such as image recognition.

Artificial intelligence technologies received $974m of funding in the first half of 2016, on pace to surpass 2015’s total, with 200 AI-focused companies having raised nearly $1.5 billion in equity funding. These figures will continue to rise, as more AI patent applications were filed in 2016 than ever before: more than three thousand applications, versus just under a hundred in 2015.

Yet the future of artificial intelligence is not so much about direct interaction between humans and machines, but rather indirect amalgamation with the technology that is all around us, as part of our everyday environment. Rather than having machines with all-purpose intelligence, humans will interact indirectly with machines having highly developed abilities in specific roles. Their sum will be a machine ecosystem that adapts to and aids in whatever humans are trying to do.

In that future, the devices might feel more like parts of an overall environment we interact with, rather than separate units we use individually. This is what ambient intelligence is. 

The IST Advisory Group (ISTAG) coined the term in 2001, with an ambitious vision of its widespread presence by 2010. The report describes technologies that exist today, such as wrist devices, smart appliances, driving guidance systems, and ride-sharing applications. On the whole the report might still seem very futuristic, but nothing in it is outrageous. At first glance, its systems seem to differ from what we have today in pervasiveness more than in kind.

The scenarios ISTAG presents, though, surpass present technology in a major way. The devices they imagine anticipate and adapt to our needs to a much greater degree than anything we have today. This requires a high level of machine learning, both about us and about their environment. It also implies a high level of interaction among the systems, so they can acquire information from one another.

Not Quite Turing's Vision

Alan Turing thought that advances in computing would lead to intelligent machines. He envisioned a computer that could engage in a conversation indistinguishable from a human's. Time has shown that machine intelligence is poor at imitating human beings, but extremely good at specialized tasks. Computers can beat the best chess players, drive cars more safely than people can, and predict the weather for a week or more in advance. Computers don't compete with us at being human; they complement us with a host of specialties. They're also really good at exchanging information rapidly.

This leads naturally to the scenario where AI-implemented devices attend to our needs, each one serving a specific purpose but interacting with devices that serve other purposes.

We witness this in the Internet of Things. Currently most of its devices perform simple tasks, such as accepting remote direction and reporting status. They could do a lot more, though. Imagine a thermostat that doesn't just set the temperature when we instruct it to, but turns itself down when we leave the house and turns itself back up when we start out for home. This isn't a difficult task, computationally; it just requires access to more data about what we're doing.
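
To make this concrete, here is a minimal sketch, in Python, of such a presence-aware setback rule. It is purely illustrative: the PresenceEvent fields, the temperature values, and the idea of deriving presence from a phone’s geofence are assumptions made for the example, not a description of any real thermostat’s API.

    # Hypothetical presence-aware thermostat rule (illustrative sketch only).
    from dataclasses import dataclass

    @dataclass
    class PresenceEvent:
        occupant_home: bool   # e.g. inferred from a phone's geofence (assumed signal)
        heading_home: bool    # e.g. a navigation app reports a route to "home" (assumed signal)

    COMFORT_C = 21.0   # target temperature when someone is home or on the way
    SETBACK_C = 16.0   # energy-saving target for an empty house

    def choose_setpoint(event: PresenceEvent) -> float:
        """Turn the heat down when the house empties, back up when someone heads home."""
        if event.occupant_home or event.heading_home:
            return COMFORT_C
        return SETBACK_C

    # Example: nobody home and nobody on the way -> set back; heading home -> warm up.
    print(choose_setpoint(PresenceEvent(occupant_home=False, heading_home=False)))  # 16.0
    print(choose_setpoint(PresenceEvent(occupant_home=False, heading_home=True)))   # 21.0

The rule itself is trivial; what the paragraph above calls “more data about what we’re doing” is exactly the presence and heading-home signals this sketch takes for granted.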

Computers perform best in highly structured domains. They “like” to have everything unambiguous and predictable. Ambient intelligence, on the other hand, has to work in what are called “uncertain domains.” (Much as in HBO’s Westworld: users, like the guests, are thrown into pre-determined storylines from which they are free to deviate, while ambient intelligence, like the hosts, is programmed with script objectives, so even minor deviations or improvisations caused by a user’s interference won’t totally disrupt its functioning; it adapts.) The information in these domains isn’t restricted to a known set of values, and it often has to be measured in probability. What constitutes leaving home and returning home? That’s where machine learning techniques, rather than fixed rules, come into play.

To work effectively with us, machines have to catch on to our habits. They need to figure out that when we go out to lunch in the middle of the day, we most likely aren't returning home. Some people do return home at noon, though, so this has to be a personal measurement, not a universal rule.
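
As an illustration of what such a personal measurement might look like, the sketch below (again in Python, with hypothetical names; the article prescribes no particular technique) keeps a per-user count of how often a midday departure was followed by a quick return and turns it into a smoothed probability rather than a universal rule.

    # Hypothetical per-user habit model (illustrative sketch only).
    from collections import defaultdict

    class HabitModel:
        def __init__(self):
            # counts[hour] = [departures followed by a quick return, total departures]
            self.counts = defaultdict(lambda: [0, 0])

        def observe(self, departure_hour: int, returned_soon: bool) -> None:
            returned, total = self.counts[departure_hour]
            self.counts[departure_hour] = [returned + int(returned_soon), total + 1]

        def p_return_soon(self, departure_hour: int) -> float:
            # Laplace smoothing: with no history the estimate is a noncommittal 0.5.
            returned, total = self.counts[departure_hour]
            return (returned + 1) / (total + 2)

    # Two users with opposite lunchtime habits end up with opposite predictions.
    lunch_returner, office_worker = HabitModel(), HabitModel()
    for _ in range(20):
        lunch_returner.observe(12, returned_soon=True)
        office_worker.observe(12, returned_soon=False)

    print(round(lunch_returner.p_return_soon(12), 2))  # ~0.95: keep the house ready
    print(round(office_worker.p_return_soon(12), 2))   # ~0.05: safe to set back

The point is not the specific model but that the probability is learned per user, so the same noon departure can mean different things for different people.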

Concerns About Privacy and Control

Giving machines so much information and leeway will inevitably raise concerns. When they gather so much information about us, how much privacy do we give up? Who else is collecting this information, and what are they using it for? Might advertisers be getting it to plan campaigns to influence us? Might governments be using it to learn our habits and track all our moves?

When the machines anticipate our needs, are they influencing us in subtle ways? This is already a concern in social media. Facebook builds feeds that supposedly reflect our interests, and in doing so it controls the information we see. Even without any intent to manipulate us, this leads to our seeing what we already agree with and missing anything that challenges our assumptions. There isn't much to prevent the manipulation of information to push us toward certain preferences or conclusions.

With ambient intelligence, this effect could be far more pervasive than it is today. The machines that we think are carrying out our wishes could herd us without being noticed.

The question of security is important. Many devices on the Internet of Things have almost nonexistent security. (An unknown attacker intermittently knocked many popular websites offline for hours last week, from Twitter to Amazon and Etsy to Netflix, by exploiting security weaknesses in ordinary household electronic devices such as DVRs, routers, and digital closed-circuit cameras.) Devices ship with default passwords that are easily discovered. In recent months, this has let criminals build huge botnets of devices and use them for denial-of-service attacks on an unprecedented scale.

If a malicious party could take control of the devices in an ambient intelligence network, the results could be disastrous. Cars could crash, building maintenance systems could shut down, daily commerce could disintegrate. To be given so high a level of trust, devices will have to be far more secure than today's.

The Convergence of Many Fields

Bringing about wide-scale ambient intelligence involves much more than technology. It will need psychological expertise to anticipate people's needs effectively without feeling intrusive or oppressive. It will involve engineering, so that the devices can operate physical systems efficiently and give feedback from them. But mainly it will involve solving factors that are not technological at all: the social, legal, and ethical implications of fully integrating intelligent machines into our everyday life, and of letting them access and control every aspect of it.
