Multi-Agent Systems with Rollback Mechanisms

Enterprise demand for AI today isn’t about slotting in isolated models or adding another conversational interface. It’s about navigating workflows that are inherently messy: supply chains that pivot on volatile data, financial transactions requiring instantaneous validation, or medical claims necessitating compliance with compounding regulations. In these high-stakes, high-complexity domains, agentic and multi-agent systems (MAS) offer a structured approach, with intelligence that scales beyond individual reasoning. Rather than enforcing top-down logic, MAS behave more like dynamic ecosystems. Agents coordinate, collaborate, sometimes compete, and learn from each other to unlock forms of system behavior that emerge from the bottom up. Autonomy is powerful, but it also creates new fragilities in system reliability and data consistency, particularly in the face of failures or errors.

Take a financial institution handling millions of transactions a day. The workflow demands market analysis, regulatory compliance, trade execution, and ledger updates with each step reliant on different datasets, domain knowledge, and timing constraints. Trying to capture all of this within a single, monolithic AI model is impractical; the task requires decomposition into manageable subtasks, each handled by a tailored component. MAS offer exactly that. They formalize a modular approach, where autonomous agents handle specialized subtasks while coordinating toward shared objectives. Each agent operates with local context and local incentives, but participates in a global system dynamic. These systems are not just theoretical constructs but operational priorities, recalibrating how enterprises navigate complexity. But with that autonomy comes a new category of risk. AI systems don’t fail cleanly: a misclassification in trade validation or a small error in compliance tagging can ripple outward with real-world consequences—financial, legal, reputational. Rollback mechanisms serve as a counterbalance. They let systems reverse errors, revert to stable states, and preserve operational continuity. But as we embed more autonomy into core enterprise processes, rollback stops being a failsafe and starts becoming one more layer of coordination complexity.

Core Structure of MAS

A multi-agent system is, at its core, a collection of autonomous agents, each engineered for a narrow function, yet designed to operate in concert. In a supply chain setting, for example, one agent might forecast demand using time-series analysis, another optimize inventory with constraint solvers, and a third schedule logistics via graph-based routing. These agents are modular, communicating through standardized interfaces—APIs, message queues like RabbitMQ, or shared caches like Redis—so that the system can scale and adapt. Coordination is handled by an orchestrator, typically implemented as a deterministic state machine, a graph-based framework like LangGraph, or a distributed controller atop Kubernetes. Its job is to enforce execution order and resolve dependencies, akin to a workflow engine. In trading systems, for example, this means ensuring that market analysis precedes trade execution, preventing premature actions on stale or incomplete information. State management underpins this coordination through a shared context, typically structured as documents in distributed stores like DynamoDB or MongoDB or, when stronger guarantees are needed, in systems like CockroachDB.
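
To make this concrete, here is a minimal orchestrator sketch in Java. It assumes an in-memory shared context and a hypothetical Agent interface; a production system would persist the context in a store like DynamoDB or Redis and add retries, timeouts, and audit logging.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical agent contract: each agent reads the shared context and
// returns the keys it wants to add or overwrite.
interface Agent {
    Map<String, Object> run(Map<String, Object> context) throws Exception;
}

// Deterministic orchestrator: enforces execution order and accumulates shared
// state, standing in for a LangGraph- or Kubernetes-based controller.
final class Orchestrator {
    private final List<Agent> pipeline;

    Orchestrator(List<Agent> pipeline) {
        this.pipeline = pipeline;
    }

    Map<String, Object> execute(Map<String, Object> initial) throws Exception {
        Map<String, Object> context = new LinkedHashMap<>(initial);
        for (Agent agent : pipeline) {
            // Each step sees everything produced upstream (market analysis
            // before trade execution, for example) and never runs early.
            context.putAll(agent.run(context));
        }
        return context;
    }
}
```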

The analytical challenge lies in balancing modularity with coherence. Agents must operate independently to avoid bottlenecks, yet their outputs must align to prevent divergence. Distributed systems principles like event sourcing and consensus protocols become essential tools for maintaining system-level coherence without collapsing performance. In the context of enterprise applications, the necessity of robust rollback mechanisms within multi-agent systems cannot be overstated. These mechanisms are essential for preventing data corruption and inconsistencies that can arise from individual agent failures, software errors, or unexpected interactions. When one agent fails or behaves unexpectedly, the risk isn’t local. It propagates. For complex, multi-step tasks that involve the coordinated actions of numerous agents, reliable rollback capabilities ensure the integrity of the overall process, allowing the system to recover gracefully from partial failures without compromising the entire operation.

Rollback Mechanisms in MAS

The probabilistic outputs of AI agents, driven by models like fine-tuned LLMs or reinforcement learners, introduce uncertainty absent in deterministic software. A fraud detection agent might errantly flag a legitimate transaction, or an inventory agent might misallocate stock. Rollback mechanisms mitigate these risks by enabling the system to retract actions and restore prior states, drawing inspiration from database transactions but adapted to AI’s nuances.

The structure of rollback is a carefully engineered combination of processes, each contributing to the system’s ability to recover from errors with precision and minimal disruption. At its foundation lies the practice of periodically capturing state snapshots that encapsulate the system’s configuration—agent outputs, database records, and workflow variables. These snapshots form the recovery points, stable states the system can return to when things go sideways. They’re typically stored in durable, incrementally updatable systems like AWS S3 or ZFS, designed to balance reliability with performance overhead. Choosing how often to checkpoint is its own trade-off. Too frequent, and the system slows under the weight of constant I/O; too sparse, and you risk losing ground when things fail. To reduce snapshot resource demands, MAS can use differential snapshots (capturing only changes) or selectively logging critical states, balancing rollback needs with efficiency. It’s also worth noting that while rollback in AI-driven MAS inherits ideas from database transactions, it diverges quickly due to the probabilistic nature of AI outputs. Traditional rollbacks are deterministic: a set of rules reverses a known state change. In contrast, when agents act based on probabilistic models their outputs are often uncertain. A fraud detection agent might flag a legitimate transaction based on subtle statistical quirks. An inventory optimizer might misallocate stock due to noisy inputs. That’s why rollback in MAS often needs to be triggered by signals more nuanced than failure codes: confidence thresholds, anomaly scores, or model-based diagnostics like variational autoencoders (VAEs) can serve as indicators that something has gone off-track.
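
A minimal sketch of the checkpointing idea, assuming a simple map-based state representation and an in-memory store; durable backends like S3 or ZFS, serialization, and retention policies are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative checkpoint store: periodic full snapshots plus differential
// snapshots (changed keys only) in between, trading completeness for lower I/O.
final class CheckpointStore {
    private final Deque<Map<String, Object>> fullSnapshots = new ArrayDeque<>();
    private Map<String, Object> lastFull = new HashMap<>();

    void fullSnapshot(Map<String, Object> state) {
        lastFull = new HashMap<>(state);
        fullSnapshots.push(new HashMap<>(state));
    }

    // Record only the keys that changed since the last full snapshot.
    Map<String, Object> diffSnapshot(Map<String, Object> state) {
        Map<String, Object> delta = new HashMap<>();
        for (Map.Entry<String, Object> entry : state.entrySet()) {
            if (!entry.getValue().equals(lastFull.get(entry.getKey()))) {
                delta.put(entry.getKey(), entry.getValue());
            }
        }
        return delta;
    }

    // Roll back to the most recent known-good state.
    Map<String, Object> restoreLatest() {
        return fullSnapshots.isEmpty() ? new HashMap<>() : new HashMap<>(fullSnapshots.peek());
    }
}
```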

In modern MAS, every action is logged, complete with metadata like agent identifiers, timestamps, and input hashes via systems such as Apache Kafka. These logs do more than support debugging; they create a forensic trail of system behavior, essential for auditability and post-hoc analysis, particularly in regulated domains like finance and healthcare. Detecting when something has gone wrong in a system of autonomous agents isn’t always straightforward. It might involve checking outputs against hard-coded thresholds, leveraging statistical anomaly detection models like VAEs, or incorporating human-in-the-loop workflows to catch edge cases that models miss. Once identified, rollback decisions are coordinated by an orchestrator that draws on these logs and the system’s transactional history to determine what went wrong, when, and how to respond.
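
The log records themselves can be simple. Below is a sketch of the kind of entry such a system might append, carrying the agent identifier, a timestamp, and a hash of the inputs for forensic traceability; the record shape is an assumption, and publishing it to a log like Kafka is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.HexFormat;

// Append-only action record: enough metadata to reconstruct what an agent did,
// when, and on which inputs, without storing the raw (possibly sensitive) input.
record ActionRecord(String agentId, Instant timestamp, String inputHash, String action) {

    static ActionRecord of(String agentId, String rawInput, String action) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(rawInput.getBytes(StandardCharsets.UTF_8));
        return new ActionRecord(agentId, Instant.now(), HexFormat.of().formatHex(digest), action);
    }
}
```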

Rollback itself is a toolkit of strategies, selected based on the failure mode and the system’s tolerance for disruption. One approach, compensating transactions, aims to undo actions by applying their logical inverse: a payment is reversed, a shipment is recalled. But compensating for AI-driven decisions means accounting for uncertainty. Confidence scores, ensemble agreement, or even retrospective model audits may be needed to confirm that an action was indeed faulty before undoing it. Another approach, state restoration, rolls the system back to a previously captured snapshot—resetting variables to a known-good configuration. This works well for clear-cut failures, like misallocated inventory, but it comes at a cost: any valid downstream work done since the snapshot may be lost. To avoid this, systems increasingly turn to partial rollbacks, surgically undoing only the affected steps while preserving valid state elsewhere. In a claims processing system, for instance, a misassigned medical code might be corrected without resetting the entire claim’s status, maintaining progress elsewhere in the workflow. But resilience in multi-agent systems isn’t just about recovering; it’s about recovering intelligently. In dynamic environments, reverting to a past state can be counterproductive if the context has shifted. Rollback strategies need to be context-aware, adapting to changes in data, workflows, or external systems. They need to ensure that the system is restored to a state that is still relevant and consistent with the current environmental conditions. Frameworks like ReAgent provide an early demonstration of what this could look like: reversible collaborative reasoning across agents, with explicit backtracking and correction pathways. Instead of merely rolling back to a prior state, agents revise their reasoning in light of new evidence, offering a form of intelligent rollback that is more nuanced than simple state reversion.
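
One way to express that decision logic is a small policy object that gates compensating transactions on an anomaly or confidence score; the threshold, enum values, and method names here are illustrative rather than any standard API.

```java
// Illustrative rollback decision: only compensate when the evidence that an
// action was faulty clears a threshold; otherwise keep a human in the loop.
final class RollbackPolicy {
    enum Decision { COMPENSATE, PARTIAL_ROLLBACK, ESCALATE_TO_HUMAN }

    private final double compensateThreshold;

    RollbackPolicy(double compensateThreshold) {
        this.compensateThreshold = compensateThreshold;
    }

    Decision decide(double anomalyScore, boolean downstreamStepsCompleted) {
        if (anomalyScore < compensateThreshold) {
            return Decision.ESCALATE_TO_HUMAN;      // ambiguous signal: review first
        }
        return downstreamStepsCompleted
                ? Decision.PARTIAL_ROLLBACK          // undo only the affected steps
                : Decision.COMPENSATE;               // apply the action's logical inverse
    }
}
```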

Building robust rollback in MAS requires adapting classical transactional principles—atomicity, consistency, isolation, durability (ACID)—to distributed AI contexts. Traditional databases enforce strict ACID guarantees through centralized control, but MAS often trade strict consistency for scalability, favoring eventual consistency in most interactions. Still, for critical operations, MAS can lean on distributed coordination techniques like two-phase commits or the Saga pattern to approximate ACID-like reliability without introducing system-wide bottlenecks. The Saga pattern, in particular, is designed to manage large, distributed transactions. It decomposes them into a sequence of smaller, independently executed steps, each scoped to a single agent. If something fails midway, compensating transactions are used to unwind the damage, rolling the system back to a coherent state without requiring every component to hold a lock on the global system state. This autonomy-first model aligns well with how MAS operate: each agent governs its own local logic, yet contributes to an eventually consistent global objective. Emerging frameworks like SagaLLM advance this further by tailoring saga-based coordination to LLM-powered agents, introducing rollback hooks that are not just state-aware but also constraint-sensitive, ensuring that even when agents fail or outputs drift, the system can recover coherently. These mechanisms help bridge the gap between high-capacity, probabilistic reasoning and the hard guarantees needed for enterprise-grade applications involving multiple autonomous agents.
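
A stripped-down saga runner in the same spirit, assuming each step can express its compensating transaction as a plain callback; real frameworks layer persistence, idempotency, and constraint checks on top of this skeleton.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal saga: each step pairs an action with its compensation. On failure,
// completed steps are unwound in reverse order, restoring a coherent state
// without any global lock.
final class Saga {
    record Step(String name, Runnable action, Runnable compensation) {}

    void run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        try {
            for (Step step : steps) {
                step.action().run();
                completed.push(step);
            }
        } catch (RuntimeException failure) {
            // Newest completed step is compensated first.
            while (!completed.isEmpty()) {
                completed.pop().compensation().run();
            }
            throw failure;
        }
    }
}
```

In a MAS context, each step would wrap an agent invocation and its compensation would encode the domain-specific inverse (refund a payment, restock inventory, re-route a shipment), with the orchestrator persisting progress so that recovery can resume after a crash.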

To ground this, consider a large bank deploying an MAS for real-time fraud detection. The system might include a risk-scoring agent (such as a fine-tuned BERT model), a compliance agent enforcing AML rules via symbolic logic, and a settlement agent updating ledger entries via blockchain APIs. A Kubernetes-based orchestrator sequences these agents, with Kafka streaming in transactional data and DynamoDB maintaining distributed state. Now suppose the fraud detection agent flags a routine payment as anomalous. The error is caught via statistical anomaly detection or a human override, and rollback is initiated. The orchestrator triggers a compensating transaction to reverse the ledger update, a snapshot is restored to reset the account state, and the incident is logged for regulatory audits. In parallel, the system might update its anomaly model or confidence thresholds—learning from the mistake rather than simply erasing it. And integrating these AI-native systems with legacy infrastructure adds another layer of complexity. Middleware like MuleSoft becomes essential, not just for translating data formats or bridging APIs, but for managing latency, preserving transactional coherence, and ensuring the MAS doesn’t break when it encounters the brittle assumptions baked into older systems.

The stochastic nature of AI makes rollback an inherently fuzzy process. A fraud detection agent might assign a 90% confidence score to a transaction and still be wrong. Static thresholds risk swinging too far in either direction: overreacting to benign anomalies or missing subtle but meaningful failures. While techniques like VAEs are often explored for anomaly detection, other methods, such as statistical process control or reinforcement learning, offer more adaptive approaches. These methods can calibrate rollback thresholds dynamically, tuning themselves in response to real-world system performance rather than hardcoded heuristics. Workflow topology also shapes rollback strategy. Directed acyclic graphs (DAGs) are the default abstraction for modeling MAS workflows, offering clear scoping of dependencies and rollback domains. But real-world workflows aren’t always acyclic. Cyclic dependencies, such as feedback loops between agents, require more nuanced handling. Cycle detection algorithms or formal methods like Petri nets become essential for understanding rollback boundaries: if an inventory agent fails, for instance, the system might need to reverse only downstream logistics actions, while preserving upstream demand forecasts. Tools like Apache Airflow and LangGraph implement these workflow abstractions in practice. What all this points to is a broader architectural principle: MAS design is as much about managing uncertainty and constraints as it is about building intelligence. The deeper challenge lies in formalizing these trade-offs—balancing latency versus consistency, memory versus compute, automation versus oversight—and translating them into robust architectures.
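
For the acyclic case, rollback scoping reduces to a reachability question: which agents sit downstream of the failure? A small sketch, assuming dependencies are kept as a plain adjacency list with hypothetical node names; cyclic workflows would additionally need the cycle detection mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Given a DAG of agent dependencies, compute the rollback domain of a failed
// node: the node itself plus everything downstream. Upstream work is preserved.
final class RollbackScope {
    static Set<String> downstreamOf(String failed, Map<String, List<String>> edges) {
        Set<String> scope = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(failed));
        while (!frontier.isEmpty()) {
            String node = frontier.pop();
            if (scope.add(node)) {
                frontier.addAll(edges.getOrDefault(node, List.of()));
            }
        }
        return scope;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
                "demand_forecast", List.of("inventory"),
                "inventory", List.of("logistics"),
                "logistics", List.of());
        // Prints [inventory, logistics]: demand_forecast stays untouched.
        System.out.println(downstreamOf("inventory", edges));
    }
}
```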

Versatile Applications

In supply chain management, a domain defined by uncertainty and interdependence, MAS can be deployed to optimize complex logistics networks, manage inventory levels dynamically, and improve communication and coordination between various stakeholders, including suppliers, manufacturers, and distributors. Rollback mechanisms are particularly valuable in this context for recovering from unexpected disruptions such as supplier failures, transportation delays, or sudden fluctuations in demand. If a critical supplier suddenly ceases operations, a MAS with rollback capabilities could revert to a previous state in which alternate suppliers had been identified and contingencies pre-positioned, minimizing the impact on the production schedule. Similarly, if a major transportation route becomes unavailable due to unforeseen circumstances, the system could roll back to a prior plan and activate pre-arranged contingency routes. We’re already seeing this logic surface in MAS-ML frameworks that combine MAS with machine learning techniques, pairing adaptive learning with structured coordination to give supply chains a form of situational memory.

Smart/advanced manufacturing environments, characterized by interconnected machines, autonomous robots, and intelligent control systems, stand to benefit even more. Here, MAS can coordinate the activities of robots on the assembly line, manage complex production schedules to account for shifting priorities, and optimize the allocation of manufacturing resources. Rollback mechanisms are crucial for ensuring the reliability and efficiency of these operations by providing a way to recover from equipment malfunctions, production errors, or unexpected changes in product specifications. If a robotic arm malfunctions during a high-precision weld, a rollback mechanism could revert the affected components to their prior state and reassign the task to another available robot or a different production cell. The emerging concept of an Agent Computing Node (ACN) within multi-agent manufacturing systems offers a path toward easier deployment of these capabilities. Embedding rollback at the ACN level could allow real-time scheduling decisions to unwind locally without disrupting global coherence, enabling factories that aren’t just smart, but more fault-tolerant by design.

In financial trading platforms, which operate in highly volatile and time-sensitive markets where milliseconds equate to millions and regulatory compliance is enforced in audit logs, MAS can serve as algorithmic engines behind trading, portfolio management, and real-time risk assessment. Rollback here effectively plays a dual role: operational safeguard and regulatory necessity. Rollback capabilities are essential for maintaining the accuracy and integrity of financial transactions, recovering from trading errors caused by software glitches or market anomalies, and mitigating the potential impact of extreme market volatility. If a trading algorithm executes a series of erroneous trades due to a sudden, unexpected market event, a rollback mechanism could reverse these trades and restore the affected accounts to their previous state, preventing significant financial losses. Frameworks like TradingAgents, which simulate institutional-grade MAS trading strategies, underscore the value of rollback not just as a corrective tool but as a mechanism for sustaining trust and interpretability in high-stakes environments.

In cybersecurity, multi-agent systems can be leveraged for automated threat detection, real-time analysis of network traffic for suspicious activities, and the coordination of defensive strategies to protect enterprise networks and data. MAS with rollback mechanisms are critical for enabling rapid recovery from cyberattacks, such as ransomware or data breaches, by restoring affected systems to a known clean state before the intrusion occurred. For example, if a malicious agent manages to infiltrate a network and compromise several systems, a rollback mechanism could restore those systems to a point in time before the breach, effectively neutralizing the attacker's actions and preventing further damage. Recent developments in Multi-Agent Deep Reinforcement Learning (MADRL) for autonomous cyber defense have begun to formalize this concept, treating “restore” as a deliberate, learnable action in a broader threat response strategy and highlighting the importance of rollback-like functionalities.

Looking Ahead

The ecosystem for MAS is evolving not just in capability but also in topology, with frameworks like AgentNet proposing fully decentralized paradigms where agents can evolve their capabilities and collaborate efficiently without relying on a central orchestrator. When there’s no global conductor, how do you coordinate recovery in a way that preserves system-level integrity? Recent directions explore equipping individual agents with the ability to roll back their local actions and states autonomously, contributing to the system's overall resilience without relying on a centralized recovery mechanism. The challenge lies in coordinating these individual rollback actions in a way that maintains the integrity and consistency of the entire multi-agent system.

Building scalable rollback mechanisms in large-scale MAS, which may involve hundreds or even thousands of autonomous agents operating in a distributed environment, is shaping up to be a significant systems challenge. The overhead associated with tracking state and logging messages to enable potential rollbacks starts to balloon as the number of agents and their interactions increase. Getting rollback to work at this scale requires new protocol designs that are not only efficient, but also resilient to partial failure and misalignment.

But the technical hurdles in enterprise settings are just one layer. There are still fundamental questions to be answered. Can rollback points be learned or inferred dynamically, tuned to the nature and scope of the disruption? What’s the right evaluation framework for rollback in MAS—do we optimize for system uptime, recovery speed, agent utility, or something else entirely? And how do we build mechanisms that allow for human intervention without diminishing the agents’ autonomy yet still ensure overall system safety and compliance?

More broadly, we need ways to verify the correctness and safety of these rollback systems under real-world constraints, not just in simulated testbeds, especially in enterprise deployments where agents often interact with physical infrastructure or third-party systems. As such, this becomes more a question of system alignment with varying internal business processes and constraints. For now, there’s still a gap between what we can build and what we should build—building rollback into MAS at scale requires more than just resilient code. It’s still a test of how well we can align autonomous systems in a reliable, secure, and meaningfully integrated way against partial failures, adversarial inputs, and rapidly changing operational contexts.

Garbage Collection Tuning In Large-Scale Enterprise Applications

Garbage collection (GC) is one of those topics that feels like a solved problem until you scale it up to the kind of systems that power banks, e-commerce, logistics firms, and cloud providers. For many enterprise systems, GC is an invisible component: a background process that “just works.” But under high-throughput, latency-sensitive conditions, it surfaces as a first-order performance constraint. The market for enterprise applications is shifting: everyone’s chasing low-latency, high-throughput workloads, and GC is quietly becoming a choke point that separates the winners from the laggards.

Consider a high-frequency trading platform processing orders in microseconds. After exhausting traditional performance levers (scaling cores, rebalancing threads, optimizing code paths), unexplained latency spikes persisted. The culprit? GC pauses—intermittent, multi-hundred-millisecond interruptions from the JVM's G1 collector. These delays, imperceptible in consumer applications, are catastrophic in environments where microseconds mean millions. Over months, the engineering team tuned G1, minimized allocations, and restructured the memory lifecycle. Pauses became predictable. The broader point is that GC, long relegated to the domain of implementation detail, is now functioning as an architectural constraint with competitive implications. In latency-sensitive domains, it functions less like background maintenance and more like market infrastructure. Organizations that treat it accordingly will find themselves with a structural advantage. Those that don’t risk falling behind.

Across the enterprise software landscape, memory management is undergoing a quiet but significant reframing. Major cloud providers—AWS, Google Cloud, and Azure—are increasingly standardizing on managed runtimes like Java, .NET, and Go, embedding them deeply across their platforms. Kubernetes clusters now routinely launch thousands of containers, each with its own runtime environment and independent garbage collector running behind the scenes. At the same time, workloads are growing more demanding—spanning machine learning inference, real-time analytics, and distributed databases. These are no longer the relatively simple web applications of the early 2000s, but complex, large-scale systems with highly variable allocation behavior. They are allocation-heavy, latency-sensitive, and highly bursty. As a result, the old ‘set a heap size, pick a collector, move on’ mental model for GC tuning is breaking down under modern workloads. The market is beginning to demand more nuanced, adaptive approaches. In response, cloud vendors, consultancies, and open-source communities are actively exploring what modern memory management should look like at scale.

At its core, GC is an attempt to automate memory reclamation. It is the runtime’s mechanism for managing memory—cleaning up objects that are no longer in use. When memory is allocated for something like a trade order, a customer record, or a neural network layer, the GC eventually reclaims that space once it’s no longer needed. But the implementation is a compromise. In theory, this process is automatic and unobtrusive. In practice, it’s a delicate balancing act. The collector must determine when to run, how much memory to reclaim, and how to do so without significantly disrupting application performance. If it runs too frequently, it consumes valuable CPU resources. If it waits too long, applications can experience memory pressure and even out-of-memory errors. Traditional collection strategies—such as mark-and-sweep, generational, or copying collectors—each bring their own trade-offs. But today, much of the innovation is happening in newer collectors like G1, Shenandoah, ZGC, and Epsilon. These are purpose-built for scalability and low latency, targeting the kinds of workloads modern enterprises increasingly rely on. The challenge, however, is that these collectors are not truly plug-and-play. Their performance characteristics hinge on configuration details. Effective tuning often requires deep expertise and workload-specific knowledge—an area that’s quickly gaining attention as organizations push for more efficient and predictable performance at scale.

Take G1: the default garbage collector in modern Java. It follows a generational model, dividing the heap into young and old regions, but with a key innovation: it operates on fixed-size regions, allowing for incremental cleanup. The goal is to deliver predictable pause times—a crucial feature in enterprise environments where even a 500ms delay can have real financial impact. That said, G1 can be challenging to tune effectively. Engineers familiar with its inner workings know it offers a wide array of configuration options, each with meaningful trade-offs. Parameters like -XX:MaxGCPauseMillis allow developers to target specific latency thresholds, but aggressive settings can significantly reduce throughput. For instance, the JVM may shrink the heap or adjust survivor space sizes to meet pause goals, which can lead to increased GC frequency and higher allocation pressure. This often results in reduced throughput, especially under bursty or memory-intensive workloads. Achieving optimal performance typically requires balancing pause time targets with realistic expectations about allocation rates and heap sizing. Similarly, -XX:G1HeapRegionSize lets you adjust region granularity, but selecting an inappropriate value may lead to memory fragmentation or inefficient heap usage. Benchmark data from OpenJDK’s JMH suite, tested on a 64-core AWS Graviton3 instance, illustrates just how sensitive performance can be: in one configuration and workload scenario, an untuned G1 setup produced 95th-percentile GC pauses of around 300ms, while careful tuning reduced those pauses significantly. The broader implication is clear: organizations with the expertise to deeply tune their runtimes unlock performance. Others leave it on the table.
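
One practical way to explore these trade-offs is to drive an allocation-heavy workload under different flag combinations and compare the resulting GC logs. The harness below is a rough sketch (the workload shape and constants are arbitrary); the flags in the comment are standard G1 and unified-logging options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Allocation-churn harness for comparing pause behavior under different flags.
// Example run:
//   java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:G1HeapRegionSize=16m \
//        -Xms4g -Xmx4g -Xlog:gc*:file=gc.log AllocationChurn
// then compare pause distributions in gc.log before and after tuning.
public final class AllocationChurn {
    public static void main(String[] args) {
        List<byte[]> survivors = new ArrayList<>();
        for (long i = 0; i < 500_000_000L; i++) {
            byte[] chunk = new byte[ThreadLocalRandom.current().nextInt(1024, 64 * 1024)];
            chunk[0] = (byte) i;                      // keep the allocation live
            if (i % 100 == 0) {
                survivors.add(chunk);                 // a slice of objects lives longer
            }
            if (survivors.size() > 10_000) {
                survivors.subList(0, 5_000).clear();  // release old survivors in bursts
            }
        }
    }
}
```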

Across the industry, runtime divergence is accelerating. .NET Core and Go are steadily gaining traction, particularly among cloud-native organizations. Each runtime brings its own approach to GC. The .NET CLR employs a generational collector with a server mode that strikes a good balance for throughput, but it tends to underperform in latency-sensitive environments. Go’s GC, on the other hand, is lightweight, concurrent, and optimized for low pause times—typically around 1ms or less under normal workloads. However, it can struggle with memory-intensive applications due to its conservative approach to memory reclamation. In a brief experiment with a Go-based microservice simulating a payment gateway (10,000 requests per second and a 1GB heap), default settings delivered 5ms pauses at the 99th percentile. By adjusting the GOMEMLIMIT setting to trigger more frequent cycles, it was possible to reduce pauses to 2ms, but this came at the cost of a 30% increase in memory usage (though results will vary depending on workload characteristics). As with many performance optimizations, the trade-offs are real and workload-dependent.

Contemporary workloads are more erratic. Modern systems stream events, cache large working sets, and process thousands of concurrent requests. The traditional enterprise mainstay (CRUD applications interacting with relational databases) is being replaced by event-driven systems, streaming pipelines, and in-memory data grids. Technologies like Apache Kafka are now ubiquitous, processing massive volumes of logs, while Redis and Hazelcast are caching petabytes of state. These modern systems generate objects at a rapid pace, with highly variable allocation patterns: short-lived events, long-lived caches, and everything in between. In one case, a logistics company running a fleet management platform on Kubernetes saw its Java pods struggle with full garbage collections every few hours, caused by an influx of telemetry data. After switching to Shenandoah, Red Hat’s low-pause collector, they saw GC pauses drop from 1.2 seconds to just 50ms. However, the improvement came at a cost—CPU usage increased by 15%, and they needed to rebalance their cluster to prevent hotspots. This is becoming increasingly common: latency improvements now have architectural consequences.

Vendor strategies are also diverging. The major players—Oracle, Microsoft, and Google—are all aware that GC can be a pain point, though their approaches vary. Oracle is pushing ZGC in OpenJDK, a collector designed to deliver sub-millisecond pauses even on multi-terabyte heaps. It’s a compelling solution (benchmarks from Azul show it maintaining stable 0.5ms pauses on a 128GB heap under heavy load) but it can be somewhat finicky. It benefits from a modern kernel with huge pages enabled (it doesn’t require them, but performs better with them), and its reliance on concurrent compaction demands careful management to avoid excessive CPU usage. Microsoft’s .NET team has taken a more incremental approach, focusing on gradual improvements to the CLR’s garbage collector. While this strategy delivers steady progress, it lags behind the more radical redesigns seen in the Java ecosystem. Google’s Go runtime stands apart, with a GC built for simplicity and low-latency performance. It’s particularly popular with startups, though it can be challenging for enterprises with more complex memory management requirements. Meanwhile, niche players like Azul are carving out a unique space with custom JVMs. Their flagship product, Zing, combines ZGC-like performance (powered by Azul’s proprietary C4 collector, comparable to ZGC in pause times) with advanced diagnostics that many describe as exceptionally powerful. Azul’s “we tune it for you” value proposition seems to be resonating—revenue grew more than 95% over the past three years, according to their filings.

Consultancies are responding as well. The Big Four—Deloitte, PwC, EY, and KPMG—are increasingly building out teams with runtime expertise and now include GC tuning in their digital transformation playbooks. Industry case studies illustrate the tangible benefits: one telco reportedly reduced its cloud spend by 20% by fine-tuning G1 across hundreds of nodes, while a major retailer improved checkout latency by 100ms after migrating to Shenandoah. Smaller, more technically focused firms like ThoughtWorks are taking an even deeper approach, offering specialized profiling tools and tailored workshops for engineering teams. So runtime behavior is no longer a backend concern—it’s a P&L lever.

The open-source ecosystem plays a vital dual role: it fuels GC innovation while introducing complexity through fragmented tooling. Many of today’s leading collectors, such as Shenandoah, ZGC, and G1, emerged from community-driven open-source research efforts before becoming production-ready. However, a capability gap persists: tooling exists, but expertise is required to extract value from it. Utilities like VisualVM and Eclipse MAT provide valuable insights—heap dumps, allocation trends, and pause time metrics—but making sense of that data often requires significant experience and intuition. In one example, a 10GB heap dump from a synthetic workload revealed a memory leak caused by a misconfigured thread pool. While the tools surfaced the right signals, diagnosing and resolving the issue ultimately depended on hands-on expertise. Emerging projects like GCViewer and OpenTelemetry’s JVM metrics are improving visibility, but most enterprises still face a gap between data and diagnosis that’s increasingly monetized. For enterprises seeking turnkey solutions, the current open-source tooling often falls short. As a result, vendors and consultancies are stepping in to fill the gap—offering more polished, supported options, often at a premium.

One emerging trend worth watching: no-GC runtimes. Epsilon, a no-op collector available in OpenJDK, effectively disables garbage collection, allocating memory until exhaustion. While this approach is highly specialized, it has found a niche in environments where ultra-low latency is paramount and teams can leverage it for short-lived, high-throughput workloads where every microsecond counts. It’s a tactical tool: no GC means no pauses, but also no safety net. In a simple benchmark of allocating 100 million objects on a 1GB heap, Epsilon delivered about 20% higher throughput than G1—in a synthetic, allocation-heavy workload designed to avoid GC interruptions—with no GC pauses until the heap was fully consumed. That said, this approach demands precise memory sizing, and since Epsilon does not actually perform GC, the JVM shuts down when the heap is exhausted. In systems that handle large volumes of data and require high reliability, this behavior poses a significant risk: running out of memory could lead to crashes during critical operations, making it unsuitable for environments that demand continuous uptime and stability.
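
The trade-off is easy to observe directly: run an unbounded allocation loop under Epsilon and watch the JVM terminate once the heap is gone. The flags in the comment are the real OpenJDK options for enabling Epsilon; the workload itself is purely illustrative.

```java
// Example run:
//   java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx1g EpsilonDemo
// No GC pauses occur, but once the 1GB heap is exhausted the JVM exits with an
// OutOfMemoryError: there is no safety net.
public final class EpsilonDemo {
    public static void main(String[] args) {
        long allocated = 0;
        while (true) {
            byte[] block = new byte[1024 * 1024];     // 1MB per iteration, never reclaimed
            allocated += block.length;
            if (allocated % (100L * 1024 * 1024) == 0) {
                System.out.println("Allocated " + (allocated >> 20) + " MB");
            }
        }
    }
}
```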

Rust represents a divergence in runtime philosophy: its ownership model frontloads complexity in exchange for execution-time determinism, eliminating the need for garbage collection entirely and giving developers fine-grained control over memory. It’s gaining popularity in systems programming, though enterprise adoption remains slow—retraining teams accustomed to Java or .NET is often a multi-year effort. Still, these developments are prompting a quiet reevaluation in some corners of the industry. Perhaps the challenge isn’t just tuning GC; it’s rethinking whether we need it at all in certain contexts.

Directionally, GC is now part of the performance stack, not a postscript. The enterprise software market appears to be at an inflection point. With AI workloads, raw latency and throughput are no longer the differentiators; there’s a growing shift toward predictable performance and manual memory control. In this landscape, GC is emerging as a more visible and persistent bottleneck. Organizations that invest in performance, whether through specialized talent, intelligent tooling, or strategic vendor partnerships, stand to gain a meaningful advantage. Cloud providers will continue refining their managed runtimes with smarter defaults, but the biggest performance gains will likely come from deeper customization. Consultancies are expected to expand GC optimization as a service offering, and we’ll likely see more specialized vendors like Azul carving out space at the edges. Open-source innovation will remain strong, though the gap between powerful raw tools and enterprise-ready solutions may continue to grow. And in the background, there may be a gradual shift toward no-GC alternatives as workloads evolve in complexity and scale. Hardware changes (e.g., AWS Graviton) amplify memory pressure through higher parallelism: more cores mean more objects and more stress on memory management systems. Ultimately, managed runtimes will improve, but improvements will mostly serve the median case. High-performance outliers will remain underserved—fertile ground for optimization vendors and open-source innovation.

For now, GC tuning doesn’t make headlines, but it does shape the systems that do as it increasingly defines the boundary between efficient, scalable systems and costly, brittle ones. The organizations that master memory will move faster, spend less, and scale cleaner. Those that don’t may find themselves playing catch-up—wondering why performance lags and operational expenses continue to climb. GC isn’t a solved problem. It’s a leverage point—in a market this dynamic, even subtle shifts in infrastructure performance can have a meaningful impact over time.

Specialization and Modularity in AI Architecture with Multi-Agent Systems

The evolution from monolithic large language models (mono-LLMs) to multi-agent systems (MAS) reflects a practical shift in how AI can be structured to address the complexity of real-world tasks. Mono-LLMs, while impressive in their ability to process vast amounts of information, have inherent limitations when applied to dynamic environments like enterprise operations. They are inefficient for specialized tasks, requiring significant resources for even simple queries, and can be cumbersome to update and scale. Mono-LLMs are difficult to scale because every improvement impacts the entire system, leading to complex update cycles and reduced agility. Multi-agent systems, on the other hand, introduce a more modular and task-specific approach, enabling specialized agents to handle discrete problems with greater efficiency and adaptability.

This modularity is particularly valuable in enterprise settings, where the range of tasks—data analysis, decision support, workflow automation—requires diverse expertise. Multi-agent systems make it possible to deploy agents with specific capabilities, such as generating code, providing real-time insights, or managing system resources. For example, a compiler agent in an MAS setup is not just responsible for executing code but also participates in optimizing the process. By incorporating real-time feedback, the compiler can adapt its execution strategies, correct errors, and fine-tune outputs based on the context of the task. This is especially useful for software teams working on rapidly evolving projects, where the ability to test, debug, and iterate efficiently can translate directly into faster product cycles.

Feedback systems are another critical component of MAS, enabling these systems to adapt on the fly. In traditional setups, feedback loops are often reactive—errors are identified post hoc, and adjustments are made later. MAS integrate feedback as part of their operational core, allowing agents to refine their behavior in real-time. This capability is particularly useful in scenarios where decisions must be made quickly and with incomplete information, such as supply chain logistics or financial forecasting. By learning from each interaction, agents improve their accuracy and relevance, making them more effective collaborators in decision-making processes.

Memory management is where MAS ultimately demonstrate practical improvements. Instead of relying on static memory allocation, which can lead to inefficiencies in resource use, MAS employ predictive memory strategies. These strategies allow agents to anticipate their memory needs based on past behavior and current workloads, ensuring that resources are allocated efficiently. For enterprises, this means systems that can handle complex, data-heavy tasks without bottlenecks or delays, whether it’s processing customer data or running simulations for product design.

Collaboration among agents is central to the success of MAS. Inter-agent learning protocols facilitate this by creating standardized ways for agents to share knowledge and insights. For instance, a code-generation agent might identify a useful pattern during its operations and share it with a related testing agent, which could then use that information to improve its validation process. This kind of knowledge-sharing reduces redundancy and accelerates problem-solving, making the entire system more efficient. Additionally, intelligent cleanup mechanisms ensure that obsolete or redundant data is eliminated without disrupting ongoing operations, balancing resource utilization and system stability. Advanced memory management thus becomes a cornerstone of the MAS architecture, enabling the system to scale efficiently while maintaining responsiveness. It also makes MAS particularly well-suited for environments where cross-functional tasks are the norm, such as coordinating between sales, operations, and customer service in a large organization.

The infrastructure supporting MAS is designed to make these systems practical for enterprise use. Agent authentication mechanisms ensure that only authorized agents interact within the system, reducing security risks. Integration platforms enable seamless connections between agents and external tools, such as APIs or third-party services, while specialized runtime environments optimize the performance of AI-generated code. In practice, these features mean enterprises can deploy MAS without requiring a complete overhaul of their existing tech stack, making adoption more feasible and less disruptive.

Consider a retail operation looking to improve its supply chain. With MAS, the system could deploy agents to predict demand fluctuations, optimize inventory levels, and automate vendor negotiations, all while sharing data across the network to ensure alignment. Similarly, in a software development context, MAS can streamline workflows by coordinating code generation, debugging, and deployment, allowing teams to focus on strategic decisions rather than repetitive tasks.

What makes MAS particularly compelling is their ability to evolve alongside the organizations they serve. As new challenges emerge, agents can be updated or added without disrupting the entire system. This modularity makes MAS a practical solution for enterprises navigating the rapid pace of technological change. By focusing on specific, well-defined tasks and integrating seamlessly with existing workflows, MAS provide a scalable, adaptable framework that supports real-world operations.

This shift to multi-agent systems is not about replacing existing tools but enhancing them. By breaking down complex problems into manageable pieces and assigning them to specialized agents, MAS make it easier for enterprises to tackle their most pressing challenges. These systems are built to integrate, adapt, and grow, making them a practical and valuable addition to the toolkit of modern organizations.

Adopting Function-as-a-Service (FaaS) for AI workflows

Function-as-a-Service (FaaS) stands at the crossroads of cloud computing innovation and the evolving needs of modern application development. It isn’t just an incremental improvement over existing paradigms; it is an entirely new mode of thinking about computation, resources, and scale. In a world where technology continues to demand agility and abstraction, FaaS offers a lens to rethink how software operates in a fundamentally event-driven, modular, and reactive manner.

At its essence, FaaS enables developers to execute isolated, stateless functions without concern for the underlying infrastructure. The abstraction here is not superficial but structural. Traditional cloud models like Infrastructure-as-a-Service (IaaS) or even Platform-as-a-Service (PaaS) hinge on predefined notions of persistence—instances, containers, or platforms that remain idle, waiting for tasks. FaaS discards this legacy. Instead, computation occurs as a series of discrete events, each consuming resources only for the moment it executes. This operational principle aligns deeply with the physics of computation itself: using resources only when causally necessary.

To fully grasp the implications of FaaS, consider its architecture. The foundational layer is virtualization, which isolates individual functions. Historically, the field has relied on virtualization techniques like hypervisors and container orchestration to allocate resources effectively. FaaS narrows this focus further. Lightweight microVMs and unikernels are emerging as dominant trends, optimized to ensure rapid cold starts and reduced resource overhead. However, this comes at a cost: such architectures often sacrifice flexibility, requiring developers to operate within tightly controlled parameters of execution.

Above this virtualization layer is the encapsulation layer, which transforms FaaS into something that developers can tangibly work with. The challenge here is not merely technical but conceptual. Cold starts—delays caused by initializing environments from scratch—represent a fundamental bottleneck. Various techniques, such as checkpointing, prewarming, and even speculative execution, seek to address this issue. Yet, each of these solutions introduces trade-offs. Speculative prewarming may solve latency for a subset of tasks but at the cost of wasted compute. This tension exemplifies the core dynamism of FaaS: every abstraction must be balanced against the inescapable physics of finite resources.

The orchestration layer introduces complexity. Once a simple scheduling problem, orchestration in FaaS becomes a fluid, real-time process of managing unpredictable workloads. Tasks do not arrive sequentially but chaotically, each demanding isolated execution while being part of larger workflows. Systems like Kubernetes, originally built for containers, are evolving to handle this flux. In FaaS, orchestration must not only schedule tasks efficiently but also anticipate failure modes and latency spikes that could disrupt downstream systems. This is particularly critical for AI applications, where real-time responsiveness often defines the product’s value.

The final piece of the puzzle is the coordination layer, where FaaS bridges with Backend-as-a-Service (BaaS) components. Here, stateless functions are augmented with stateful abstractions—databases, message queues, storage layers. This synthesis enables FaaS to transcend its stateless nature, allowing developers to compose complex workflows. However, this dependency on external systems introduces fragility. Latency and failure are not isolated to the function execution itself but ripple across the entire ecosystem. This creates a fascinating systems-level challenge: how to design architectures that are both modular and resilient under stress.

What makes FaaS particularly significant is its impact on enterprise AI development. The state of AI today demands systems that are elastic, cost-efficient, and capable of real-time decision-making. FaaS fits naturally into this paradigm. Training a machine learning model may remain the domain of large-scale, distributed clusters, but serving inferences is a different challenge altogether. With FaaS, inference pipelines can scale dynamically, handling sporadic spikes in demand without pre-provisioning costly infrastructure. This elasticity fundamentally changes the economics of deploying AI systems, particularly in industries where demand patterns are unpredictable.

Cost is another dimension where FaaS aligns with the economics of AI. The pay-as-you-go billing model eliminates the sunk cost of idle compute. Consider a fraud detection system in finance: the model is invoked only when a transaction occurs. Under traditional models, the infrastructure to handle such transactions would remain operational regardless of workload. FaaS eliminates this inefficiency, ensuring that resources are consumed strictly in proportion to demand. However, this efficiency can sometimes obscure the complexities of cost prediction. Variability in workload execution times or dependency latencies can lead to unexpected billing spikes, a challenge enterprises are still learning to navigate.
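
In code, the shape of such a function is deliberately small: stateless, event in, score out, with anything expensive initialized outside the per-request path. The sketch below is generic rather than tied to any particular provider's handler interface, and the scoring logic is a placeholder for a real model call.

```java
import java.util.Map;
import java.util.function.Function;

// Generic sketch of a per-transaction fraud scorer in the FaaS style: no local
// state survives between events, so any instance can serve any request.
public final class FraudScoringFunction implements Function<Map<String, Object>, Map<String, Object>> {

    // Initialized once per container instance, outside the per-request path;
    // a real function would load its model artifact here to amortize cold starts.
    private static final double FLAG_THRESHOLD = 0.85;

    @Override
    public Map<String, Object> apply(Map<String, Object> transaction) {
        double amount = ((Number) transaction.getOrDefault("amount", 0)).doubleValue();
        double score = Math.min(1.0, amount / 10_000.0);   // placeholder for model inference
        return Map.of(
                "transactionId", transaction.getOrDefault("id", "unknown"),
                "riskScore", score,
                "flagged", score >= FLAG_THRESHOLD);
    }
}
```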

Timeouts also impose a hard ceiling on execution in most FaaS environments, often measured in seconds or minutes. For many AI tasks—especially inference pipelines processing large inputs or models requiring nontrivial preprocessing—these limits can become a structural constraint rather than a simple runtime edge case. Timeouts force developers to split logic across multiple functions, offload parts of computation to external services, or preemptively trim the complexity of their models. These are engineering compromises driven not by the shape of the problem, but by the shape of the platform.

Perhaps the most profound impact of FaaS on AI is its ability to reduce cognitive overhead for developers. By abstracting infrastructure management, FaaS enables teams to iterate on ideas without being burdened by operational concerns. This freedom is particularly valuable in AI, where rapid experimentation often leads to breakthroughs. Deploying a sentiment analysis model or an anomaly detection system no longer requires provisioning servers, configuring environments, or maintaining uptime. Instead, developers can focus purely on refining their models and algorithms.

But the story of FaaS is not without challenges. The reliance on statelessness, while simplifying scaling, introduces new complexities in state management. AI applications often require shared state, whether in the form of session data, user context, or intermediate results. Externalizing this state to distributed storage or databases adds latency and fragility. While innovations in distributed caching and event-driven state reconciliation offer partial solutions, they remain imperfect. The dream of a truly stateful FaaS model—one that maintains the benefits of statelessness while enabling efficient state sharing—remains an open research frontier.

Cold start latency is another unsolved problem. AI systems that rely on real-time inference cannot tolerate delays introduced by environment initialization. For example, a voice assistant processing user queries needs to respond instantly; any delay breaks the illusion of interactivity. Techniques like prewarming instances or relying on lightweight runtime environments mitigate this issue but cannot eliminate it entirely. The physics of computation imposes hard limits on how quickly environments can be instantiated, particularly when security isolation is required.

Vendor lock-in is a systemic issue that pervades FaaS adoption: each cloud provider builds proprietary abstractions, tying developers to specific APIs, runtimes, and pricing models. While open-source projects like Knative and OpenFaaS aim to create portable alternatives, they struggle to match the integration depth and ecosystem maturity of their commercial counterparts. This tension between portability and convenience is a manifestation of the broader dynamics in cloud computing.

Looking ahead, the future of FaaS I believe will be defined by its integration with edge computing. As computation migrates closer to the source of data generation, the principles of FaaS—modularity, event-driven execution, ephemeral state—become increasingly relevant. AI models deployed on edge devices, from autonomous vehicles to smart cameras, will rely on FaaS-like paradigms to manage local inference tasks. This shift will not only redefine the boundaries of FaaS but also force the development of new orchestration and coordination mechanisms capable of operating in highly distributed environments.

In reflecting on FaaS, one cannot ignore its broader, almost philosophical implications. At its heart, FaaS is an argument about the nature of computation: that it is not a continuous resource to be managed but a series of discrete events to be orchestrated. This shift reframes the role of software itself, not as a persistent entity but as a dynamic, ephemeral phenomenon.

Architectural Paradigms for Scalable Unstructured Data Processing in Enterprise

Unstructured data encompasses a wide array of information types that do not conform to predefined data models or organized in traditional relational databases. This includes text documents, emails, social media posts, images, audio files, videos, and sensor data. The inherent lack of structure makes this data difficult to process using conventional methods, yet it often contains valuable insights that can drive innovation, improve decision-making, and enhance customer experiences. The rise of generative AI and large language models (LLMs) has further emphasized the importance of effectively managing unstructured data. These models require vast amounts of diverse, high-quality data for training and fine-tuning. Additionally, techniques like retrieval-augmented generation (RAG) rely on the ability to efficiently search and retrieve relevant information from large unstructured datasets.

Architectural Considerations for Unstructured Data Systems In Enterprises

Data Ingestion and Processing Architecture. The first challenge in dealing with unstructured data is ingestion. Unlike structured data, which can be easily loaded into relational databases, unstructured data requires specialized processing pipelines. These pipelines must be capable of handling a variety of data formats and sources, often in real-time or near-real-time, and at massive scale. For modern global enterprises, it’s crucial to design the ingestion architecture with global distribution in mind.

  • Text-based Data. Natural language processing (NLP) techniques are essential for processing text-based data. This includes tokenization, part-of-speech tagging, named entity recognition, and sentiment analysis. Modern NLP pipelines often leverage deep learning models, such as BERT or GPT, which can capture complex linguistic patterns and context. At enterprise scale, these models may need to be deployed across distributed clusters to handle the volume of incoming data. Startups like Hugging Face provide transformer-based models that can be fine-tuned for specific enterprise needs, enabling sophisticated text analysis and generation capabilities (a minimal pipeline sketch follows this list).

  • Image and Video Data. Computer vision algorithms are necessary for processing image and video data. These may include convolutional neural networks (CNNs) for image classification and object detection, or more advanced architectures like Vision Transformers (ViT) for tasks requiring understanding of spatial relationships. Processing video data, in particular, requires significant computational resources and may benefit from GPU acceleration. Notable startups such as OpenCV.ai are innovating in this space by providing open-source computer vision libraries and tools that can be integrated into enterprise workflows. Companies like Roboflow and Encord offer end-to-end computer vision platforms providing tools for data labeling, augmentation, and model training, making it easier for enterprises to build custom computer vision models. Their open-source YOLOv5 implementation has gained significant traction in the developer community. Voxel51 is tackling unstructured data retrieval in computer vision with its open-source FiftyOne platform, which enables efficient management, curation, and analysis of large-scale image and video datasets. Coactive is applying unstructured data retrieval across multiple modalities with its neural database technology, designed to efficiently store and query diverse data types including text, images, and sensor data.

  • Audio Data. Audio data presents its own set of challenges, requiring speech-to-text conversion for spoken content and specialized audio analysis techniques for non-speech sounds. Deep learning models like wav2vec and HuBERT have shown promising results in this domain. For enterprises dealing with large volumes of audio data, such as call center recordings, implementing a distributed audio processing pipeline is crucial. Companies like Deepgram and AssemblyAI are leveraging end-to-end deep learning models to provide accurate and scalable speech recognition solutions.
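
As referenced in the text-processing bullet above, here is a minimal sketch of what such an NLP pipeline step might look like with the Hugging Face transformers library; the model checkpoints are the library defaults and the sample sentences are illustrative assumptions.

```python
# Hedged sketch: named entity recognition and sentiment analysis over incoming
# text using Hugging Face pipelines (default checkpoints are downloaded on
# first use; in production these would be pinned, fine-tuned models).
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
entities = ner("Acme Corp. signed a supply agreement with Globex in Berlin.")

sentiment = pipeline("sentiment-analysis")
score = sentiment("The delivery was delayed again and the customer is unhappy.")

print(entities)   # organizations and locations with confidence scores
print(score)      # e.g. [{'label': 'NEGATIVE', 'score': ...}]
```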

To handle the diverse nature of unstructured data, organizations should consider implementing a modular, event-driven ingestion architecture. This could involve using Apache Kafka or Apache Pulsar for real-time data streaming, coupled with specialized processors for each data type. RedPanda built an open-source data streaming platform designed to replace Apache Kafka with lower latency and higher throughput. Containerization technologies like Docker and orchestration platforms like Kubernetes can provide the flexibility needed to scale and manage these diverse processing pipelines. Graphlit builds a data platform designed for spatial and unstructured data files, automating complex data workflows including data ingestion, knowledge extraction, LLM conversations, semantic search, and application integrations.
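
For illustration, a minimal sketch of the producing side of such an event-driven ingestion pipeline using the kafka-python client; the broker address, topic naming scheme, and payload shape are assumptions, not a prescribed design.

```python
# Hedged sketch: route each raw document to a type-specific Kafka topic so that
# specialized downstream processors (NLP, vision, audio) can consume independently.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def ingest(doc_id: str, payload: str, doc_type: str) -> None:
    topic = f"raw-{doc_type}"                                 # e.g. raw-text, raw-audio
    producer.send(topic, {"id": doc_id, "payload": payload})

ingest("email-42", "Quarterly results attached...", "text")
producer.flush()                                              # block until delivered
```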

Data Storage and Retrieval. Traditional relational databases are ill-suited for storing and querying large volumes of unstructured data. Instead, organizations must consider a range of specialized storage solutions. For raw unstructured data, object storage systems like Amazon S3, Google Cloud Storage, or Azure Blob Storage provide scalable and cost-effective options. These systems can handle petabytes of data and support features like versioning and lifecycle management. MinIO developed an open-source, high-performance, distributed object storage system designed for large-scale unstructured data. For semi-structured data, document databases like MongoDB or Couchbase offer flexible schemas and efficient querying capabilities. These are particularly useful for storing JSON-like data structures extracted from unstructured sources. SurrealDB is a multi-model, cloud-ready database that allows developers and organizations to meet the needs of their applications without worrying about scalability or about keeping data consistent across multiple database platforms, making it suitable for both modern and traditional applications. As machine learning models increasingly represent data as high-dimensional vectors, vector databases have emerged as a crucial component of the unstructured data stack. Systems like LanceDB, Marqo, Milvus, and Vespa are designed to efficiently store and query these vector representations, enabling semantic search and similarity-based retrieval. For data with complex relationships, graph databases like Neo4j or Amazon Neptune can be valuable. These are particularly useful for representing knowledge extracted from unstructured text, allowing for efficient traversal of relationships between entities. TerminusDB, an open-source graph database, can likewise be used for representing and querying complex relationships extracted from unstructured text, which is particularly useful for enterprises that need to traverse relationships between entities efficiently. Kumo AI developed a graph-machine-learning-centered AI platform that uses LLMs and graph neural networks (GNNs) to manage large-scale data warehouses, integrating ML between modern cloud data warehouses and AI infrastructure to simplify the training and deployment of models on both structured and unstructured data, enabling businesses to make faster, simpler, and more accurate predictions. Roe AI has built an AI-powered data warehouse to store, process, and query unstructured data like documents, websites, images, videos, and audio, providing multi-modal data extraction, data classification, and multi-modal RAG via Roe's SQL engine.

When designing the storage architecture, it’s important to consider a hybrid approach that combines these different storage types. For example, raw data might be stored in object storage, processed information in document databases, vector representations in vector databases, and extracted relationships in graph databases. This multi-modal storage approach allows for efficient handling of different query patterns and use cases.
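
To make the vector-database piece of this hybrid picture concrete, here is a minimal sketch of similarity-based retrieval; an in-memory NumPy index stands in for a system like Milvus or LanceDB, and the embed function is a random placeholder, so the example shows the retrieval mechanics rather than semantic quality.

```python
# Hedged sketch: cosine-similarity retrieval over an in-memory "index".
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call a sentence-embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)            # unit-length vector

corpus = ["supplier invoice", "customer complaint email", "sensor log dump"]
index = np.stack([embed(doc) for doc in corpus])

def search(query: str, k: int = 2):
    scores = index @ embed(query)            # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [(corpus[i], float(scores[i])) for i in top]

print(search("complaint about a late delivery"))
```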

Data Processing and Analytics. Processing unstructured data at scale requires distributed computing frameworks capable of handling large volumes of data. Apache Spark remains a popular choice due to its versatility and extensive ecosystem. For more specialized workloads, frameworks like Ray are gaining traction, particularly for distributed machine learning tasks. For real-time processing, stream processing frameworks like Apache Flink or Kafka Streams can be employed. These allow for continuous processing of incoming unstructured data, enabling real-time analytics and event-driven architectures. When it comes to analytics, traditional SQL-based approaches are often insufficient for unstructured data. Instead, architecture teams should consider a combination of techniques: (i) search engines like Elasticsearch or Apache Solr provide powerful capabilities for searching and analyzing text-based unstructured data; (ii) for tasks like classification, clustering, and anomaly detection, machine learning models can be deployed on processed unstructured data, with frameworks like TensorFlow and PyTorch, along with managed services like Google Cloud AI Platform or Amazon SageMaker, used to train and deploy these models at scale; and (iii) for data stored in graph databases, specialized graph analytics algorithms can uncover complex patterns and relationships. OmniAI developed a data transformation platform designed to convert unstructured data into accurate, tabular insights while letting enterprises maintain control over their data and infrastructure.
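
As a small illustration of distributed processing of raw text, here is a hedged PySpark sketch; the input path is an assumption, and the job simply tokenizes documents and counts term frequencies.

```python
# Hedged sketch: batch term-frequency counting over a folder of raw text files.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split, lower, col

spark = SparkSession.builder.appName("unstructured-text-batch").getOrCreate()

docs = spark.read.text("/data/raw-text/")    # assumed path; one line per row
tokens = docs.select(explode(split(lower(col("value")), r"\W+")).alias("token"))
counts = tokens.where(col("token") != "").groupBy("token").count()
counts.orderBy(col("count").desc()).show(20)

spark.stop()
```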

To enable flexible analytics across different data types and storage systems, architects should consider implementing a data virtualization layer. Technologies like Presto or Dremio can provide a unified SQL interface across diverse data sources, simplifying analytics workflows. Vectorize is developing a streaming database for real-time AI applications to bridge the gap between traditional databases and the needs of modern AI systems, enabling real-time feature engineering and inference.

Data Governance and Security. Unstructured data often contains sensitive information, making data governance and security critical considerations. Organizations must implement robust mechanisms for data discovery, classification, and access control. Automated data discovery and classification tools such as Sentra Security, powered by machine learning, can scan unstructured data to identify sensitive information and apply appropriate tags. These tags can then be used to enforce access policies and data retention rules. For access control, attribute-based access control (ABAC) systems are well-suited to the complex nature of unstructured data. ABAC allows for fine-grained access policies based on attributes of the data, the user, and the environment. Encryption is another critical component of securing unstructured data. This includes both encryption at rest and in transit. For particularly sensitive data, consider implementing field-level encryption, where individual elements within unstructured documents are encrypted separately.
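
A minimal sketch of what an ABAC decision might look like in application code; the attribute names and the policy itself are illustrative assumptions rather than any particular product's API.

```python
# Hedged sketch: an attribute-based access control (ABAC) check combining
# user, document, and environment attributes.
from dataclasses import dataclass

@dataclass
class Request:
    user_dept: str
    user_clearance: str        # e.g. "standard", "confidential"
    doc_classification: str    # tag applied by automated discovery/classification
    doc_owner_dept: str
    environment: str           # e.g. "corporate-network", "public-internet"

def is_allowed(r: Request) -> bool:
    # Policy: confidential documents are readable only by their owning department,
    # by users with sufficient clearance, and only from the corporate network.
    if r.doc_classification == "confidential":
        return (
            r.user_dept == r.doc_owner_dept
            and r.user_clearance == "confidential"
            and r.environment == "corporate-network"
        )
    return True   # non-sensitive documents are readable by default

print(is_allowed(Request("claims", "confidential", "confidential", "claims", "corporate-network")))
```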

Emerging Technologies and Approaches

LLMs like GPT-3 and its successors have demonstrated remarkable capabilities in understanding and generating human-like text. These models can be leveraged for a wide range of tasks, from text classification and summarization to question answering and content generation. For enterprises, the key challenge remains adapting these models to domain-specific tasks and data. Techniques like fine-tuning and prompt engineering allow for customization of pre-trained models. Additionally, approaches like retrieval-augmented generation (RAG) enable these models to leverage enterprise-specific knowledge bases, improving their accuracy and relevance. Implementing a modular architecture that allows for easy integration of different LLMs and fine-tuned variants might involve setting up model serving infrastructure using frameworks like TensorFlow Serving or Triton Inference Server, coupled with a caching layer to improve response times. Companies like Unstructured use open-source libraries and application programming interfaces to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines, enabling clients to transform simple data into language data and write it to a destination (vector database or otherwise).
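
To ground the RAG idea, here is a minimal sketch of the retrieve-then-generate flow; the in-memory knowledge base, keyword-overlap "retrieval," and generate stub are placeholders for a real embedding model, vector store, and LLM serving layer.

```python
# Hedged sketch: retrieval-augmented generation with stubbed components.
KB = [
    "Refunds are issued within 14 days of purchase.",
    "Orders ship within 2 business days of payment.",
    "Support is available 24/7 via chat and email.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    # Stub: keyword overlap stands in for embedding-based similarity search.
    q_tokens = set(question.lower().split())
    scored = sorted(KB, key=lambda doc: -len(q_tokens & set(doc.lower().split())))
    return scored[:k]

def generate(prompt: str) -> str:
    # Stub: a real system would call a hosted or self-served LLM here.
    return f"[model response grounded in a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
    return generate(prompt)

print(answer("How long do refunds take?"))
```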

Multi-modal AI Models. As enterprises deal with diverse types of unstructured data, multi-modal AI models that can process and understand different data types simultaneously are becoming increasingly important. Models like CLIP (Contrastive Language-Image Pre-training) demonstrate the potential of combining text and image understanding. To future-proof organizational agility, systems need to be designed to handle multi-modal data inputs and outputs, potentially leveraging specialized hardware like GPUs or TPUs for efficient processing, as well as implementing a pipeline architecture that allows for parallel processing of different modalities, with a fusion layer that combines the results. Adept AI is working on AI models that can interact with software interfaces, potentially changing how enterprises interact with their digital tools, combining language understanding with the ability to take actions in software environments. In the defense sector, Helsing AI is developing advanced AI systems for defense and national security applications that process and analyze vast amounts of unstructured sensor data in real-time, integrating information from diverse sources such as radar, electro-optical sensors, and signals intelligence to provide actionable insights in complex operational environments. In industrial and manufacturing sectors, Archetype AI offers a multimodal AI foundation model that fuses real-time sensor data with natural language, enabling individuals and organizations to ask open-ended questions about their surroundings and take informed action for improvement.

Federated Learning. For enterprises dealing with sensitive or distributed unstructured data, federated learning offers a way to train models without centralizing the data. This approach allows models to be trained across multiple decentralized devices or servers holding local data samples, without exchanging them. Implementing federated learning, however, requires careful design, including mechanisms for model aggregation, secure communication, and differential privacy to protect individual data points. Frameworks like TensorFlow Federated or PySyft can be used to implement federated learning systems. For example, in the space of federated learning for healthcare and life sciences, Owkin enables collaborative research on sensitive medical data without compromising privacy.
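
A conceptual sketch of the federated averaging pattern underlying these frameworks, written in plain NumPy; real deployments would add secure aggregation and differential privacy, and the linear model and synthetic per-site data here are assumptions for illustration.

```python
# Hedged sketch: federated averaging (FedAvg) where only weights, never raw
# data, leave each participating site.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
sites = [(rng.standard_normal((50, 3)), rng.standard_normal(50)) for _ in range(4)]
global_w = np.zeros(3)

for _round in range(10):
    # Each site computes an update on its private data.
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    # The coordinator averages the weights (equally weighted here for simplicity).
    global_w = np.mean(local_ws, axis=0)

print(global_w)
```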

Synthetic Data Generation. The scarcity of labeled unstructured data for specific domains or tasks can be a significant challenge. Synthetic data generation, often powered by generative adversarial networks (GANs) or other generative models, may offer a solution to this problem. Incorporating synthetic data generation pipelines into machine learning workflows might involve setting up separate infrastructure for data generation and validation, ensuring that synthetic data matches the characteristics of real data while avoiding potential biases. RAIC Labs is developing technology for rapid AI modeling with minimal data. Their RAIC (Rapid Automatic Image Categorization) platform can generate and categorize synthetic data, potentially solving the cold start problem for many machine learning applications.

Knowledge Graphs. Knowledge graphs offer a powerful way to represent and reason about information extracted from unstructured data. Startups like Diffbot are developing automated knowledge graph construction tools that use natural language processing, entity resolution, and relationship extraction techniques to build rich knowledge graphs. These graphs capture the semantics of unstructured data, enabling efficient querying and reasoning about the relationships between entities. Implementing knowledge graphs involves (i) entity extraction and linking to identify and disambiguate entities mentioned in unstructured text; (ii) relationship extraction to determine the relationships between entities; (iii) ontology management to define and maintain the structure of the knowledge graph; and (iv) graph storage and querying for efficiently storing and querying the resulting graph structure. Businesses should consider using a combination of machine learning models for entity and relationship extraction, coupled with specialized graph databases for storage. Technologies like RDF (Resource Description Framework) and SPARQL can be used for semantic representation and querying.
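
A minimal sketch of steps (i) through (iv) landing in an RDF store, using rdflib and SPARQL; the namespace and the extracted facts are illustrative assumptions.

```python
# Hedged sketch: persist extracted entities/relationships as RDF triples and
# query them with SPARQL via rdflib.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/enterprise/")
g = Graph()
g.bind("ex", EX)

# Facts an extraction pipeline might emit from unstructured text.
g.add((EX.AcmeCorp, RDF.type, EX.Company))
g.add((EX.AcmeCorp, RDFS.label, Literal("Acme Corp.")))
g.add((EX.GlobexInc, RDF.type, EX.Company))
g.add((EX.AcmeCorp, EX.suppliesTo, EX.GlobexInc))

# SPARQL: which companies supply Globex?
q = """
SELECT ?supplier WHERE {
  ?supplier ex:suppliesTo ex:GlobexInc .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.supplier)
```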

While the potential of unstructured data is significant, several challenges must be addressed; the most important are scalability, data quality, and cost. Processing and analyzing large volumes of unstructured data requires significant computational resources. Systems must be designed to scale horizontally, leveraging cloud resources and distributed computing frameworks. Unstructured data often contains noise, inconsistencies, and errors. Implementing robust data cleaning and validation pipelines is crucial for ensuring the quality of insights derived from this data. Galileo developed an engine that processes unlabeled data to automatically identify error patterns and data gaps in the model, enabling organizations to improve efficiencies, reduce costs, and mitigate data biases. Cleanlab developed an automated data-centric platform designed to help enterprises improve the quality of datasets, diagnose or fix issues, and produce more reliable machine learning models by cleaning labels and helping to find, quantify, and learn from data issues. Processing and storing large volumes of unstructured data can also be expensive. Implementing data lifecycle management, tiered storage solutions, and cost optimization strategies is crucial for managing long-term costs. For example, Bem's data interface transforms any input into ready-to-use data, eliminating the need for costly and time-consuming manual processes. Lastly, as machine learning models become more complex, ensuring interpretability of results becomes challenging. Techniques like SHAP (SHapley Additive exPlanations) values or LIME (Local Interpretable Model-agnostic Explanations) can be incorporated into model serving pipelines to provide explanations for model predictions. Unstructured data also often contains sensitive information, and AI models trained on this data can perpetuate biases. Architects must implement mechanisms for bias detection and mitigation, as well as ensure compliance with data protection regulations.
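
As a small illustration of the interpretability point, here is a hedged sketch using SHAP's tree explainer on a synthetic tabular model; the dataset and model are stand-ins, and the exact output shape depends on the installed shap version.

```python
# Hedged sketch: per-feature attributions for a tree-based classifier via SHAP.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)     # synthetic labels

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])       # attributions for 10 samples
print(np.shape(shap_values))
```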

Unstructured data presents both significant challenges and opportunities for enterprises. By implementing a robust architecture that can ingest, store, process, and analyze diverse types of unstructured data, enterprises can unlock valuable insights and drive innovation. Businesses must stay abreast of emerging technologies and approaches, continuously evolving their data infrastructure to handle the growing volume and complexity of unstructured data. By combining traditional data management techniques with cutting-edge AI and machine learning approaches, enterprises can build systems capable of extracting maximum value from their unstructured data assets. As the field continues to evolve rapidly, flexibility and adaptability should be key principles in any unstructured data architecture. By building modular, scalable systems that can incorporate new technologies and handle diverse data types, enterprises can position themselves to leverage the full potential of unstructured data in the years to come.


Edge Computing and the Internet of Things: Investing in the Future of Autonomy

One of the most ubiquitous technological advancements making its way into devices we use every single day is autonomy. Autonomous technology via the use of artificial intelligence (AI) and machine learning (ML) algorithms enables core functions without human interference. As the adoption of ML becomes more widespread, more businesses are using ML models to support mission-critical operational processes. This increasing reliance on ML has created a need for real-time capabilities to improve accuracy and reliability, as well as reduce the feedback loop.

Previously, chip computations were processed in the cloud rather than on-device, because the AI/ML models required to complete these tasks were too large, costly, and computationally hungry to run locally. Instead, the technology relied on cloud computing, outsourcing data tasks to remote servers via the internet. While this was an adequate solution when IoT technology was in its infancy, it certainly wasn't infallible: though proven to be a transformational tool for storing and processing data, cloud computing comes with its own performance and bandwidth limitations that aren't well-suited for autonomy at scale, which needs nearly instantaneous reactions with minimal lag time. To date, certain technologies have been limited by the parameters of cloud computing.

The Need for New Processing Units

The central processing units (CPUs) commonly used in traditional computing devices are not well-suited for AI workloads due to two main issues:

  • Latency in data fetching: AI workloads involve large amounts of data, and the cache memory in a CPU is too small to store all of it. As a result, the processor must constantly fetch data from dynamic random access memory (DRAM), which creates a significant bottleneck. While newer multicore CPU designs with multithreading capabilities can alleviate this issue to some extent, they are not sufficient on their own.

  • Latency in instruction fetching: In addition to the large volume of data, AI workloads require many repetitive matrix-vector operations. CPUs typically use single-instruction multiple data (SIMD) architectures, which means they must frequently fetch operational instructions from memory to be performed on the same dataset. The latest generation of AI processors aims to address these challenges through two approaches: (i) expanding the multicore design to allow thousands of threads to run concurrently, thereby fixing the latency in data fetching, or (ii) building processors with thousands of logic blocks, each preprogrammed to perform a specific matrix-vector operation, thereby fixing the latency in instruction fetching.

First introduced in the 1980s, field programmable gate arrays (FPGAs) offered the benefit of being reprogrammable, which enabled them to gain traction in diverse industries like telecommunications, automotive, industrial, and consumer applications. In AI workloads, FPGAs address the latency associated with instruction fetching. FPGAs consist of tens of thousands of logic blocks, each of which is preprogrammed to carry out a specific matrix-vector operation. On the flip side, FPGAs are expensive, have large footprints, and are time-consuming to program.

Graphics processing units (GPUs) were initially developed in the 1990s to improve the speed of image processing for display devices. They have thousands of cores that enable efficient multithreading, which helps to reduce data fetching latency in AI workloads. GPUs are effective for tasks such as computer vision, where the same operations must be applied to many pixels. However, they have high power requirements and are not suitable for all types of edge applications.

Specialized chips, known as AI chips, are often used in data centers for training algorithms or making inferences. Although there are certain AI/ML processor architectures that are more energy-efficient than GPUs, they often only work with specific algorithms or utilize uncommon data types, like 4- and 2-bit integers or binarized neural networks. As a result, they lack the versatility to be used effectively in data centers with capital efficiency. Further, training algorithms requires significantly more computing power compared to making individual inferences, and batch-mode processing for inference can cause latency issues. The requirements for AI processing at the network edge, such as in robotics, Internet of Things (IoT) devices, smartphones, and wearables, can vary greatly and, in cases like the automotive industry, it is not feasible to send certain types of work to the cloud due to latency concerns.

Lastly, application specific integrated circuits (ASICs) are integrated circuits that are tailored to specific applications. Because the entire ASIC is dedicated to a narrow set of instructions, they are much faster than GPUs; however, they do not offer as much flexibility as GPUs or FPGAs in terms of being able to handle a wide range of applications. As a consequence, ASICs are increasingly gaining traction in handling AI workloads in the cloud with large companies like Amazon and Google. However, it is less likely that ASICs will find traction in edge computing because of the fragmented nature of applications and use cases.

The departure from single-threaded compute and the large volume of raw data generated today (making it impractical for continuous transfer) resulted in the emergence of edge computing, an expansion of cloud computing that addresses many of these shortcomings. Development of semiconductor manufacturing processes for ultra-small circuits (7nm and below) that pack more transistors onto a single chip allows faster processing speeds and higher levels of integration. This leads to significant improvements in performance, as well as reduced power consumption, enabling higher adoption of this technology for a wide range of edge applications.

Edge computing places resources closer to the end user or the device itself (at the “edge” of a network) rather than in a cloud data center that oversees data processing for a large physical area. Because this technology sits closer to the user and/or the device and doesn’t require the transfer of large amounts of data to a remote server, edge-powered chips increase performance speed, reduce lag time and ensure better data privacy. Additionally, since edge AI chips are physically smaller, they’re more affordable to produce and consume less power. As an added bonus, they also produce less heat, which is why fewer of our electronics get hot to the touch with extended use. AI/ML accelerators designed for use at the edge tend to have very low power consumption but are often specialized for specific applications such as audio processing, visual processing, object recognition, or collision avoidance. Today, this specialized focus can make it difficult for startups to achieve the necessary sales volume for success due to the market fragmentation.

Supporting mission-critical operational processes at the edge

The edge AI chip advantage proving arguably most important to helping the technology reach its full potential is significantly faster operation and decision-making. Nearly every application in use today requires near-instantaneous response, whether to generate more optimal performance for a better user experience or to provide mission-critical reflex maneuvers that directly impact human safety. Even in non-critical applications, the increasing number of connected devices and equipment going online is causing bandwidth bottlenecks to become a deployment limitation, as current telecommunications networks may not have sufficient capacity to handle the data volume and velocity generated by these devices.

For example, from an industrial perspective, an automated manufacturing facility is expected to generate 4 petabytes of data every day. Even with the fastest (unattainable) 5G speeds of 10 Gbps, it would take days to transfer a day’s worth of data to the cloud. Additionally, the cost of transferring all this data at a rate of $0.40 per GB over 5G could reach as much as $1.6 million per day. And unsurprisingly, the autonomous vehicle industry will rely on the fastest, most efficient edge AI chips to ensure the quickest possible response times in a constantly-changing roadway environment — situations that can quite literally mean life and death for drivers and pedestrians alike.
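
The arithmetic behind those figures is easy to sanity-check; the sketch below simply replays the assumed inputs above (4 PB per day, an idealized 10 Gbps link, $0.40 per GB).

```python
# Back-of-the-envelope check of the figures above (assumed values, not measurements).
petabytes_per_day = 4
bits = petabytes_per_day * 1e15 * 8            # 4 PB expressed in bits
link_bps = 10e9                                 # idealized 10 Gbps 5G link
transfer_days = bits / link_bps / 86_400        # seconds per day

cost_per_gb = 0.40
cost_per_day = petabytes_per_day * 1e6 * cost_per_gb   # 4 million GB per day

print(f"{transfer_days:.0f} days to upload one day of data")   # ~37 days
print(f"${cost_per_day:,.0f} per day in transfer costs")       # $1,600,000
```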

Investing in Edge AI

With nearly every industry now impacted by IoT technology, there is a $30 billion market for edge computing advancements. The AI chip industry alone is predicted to grow to more than $91 billion by 2025, up from $6 billion in 2018. Companies are racing to create the fastest, most efficient chips on the market, and only those operating with the highest levels of market and customer focus will see success.

As companies are increasingly faced with decisions regarding investment in new hardware for edge computing, staying nimble is key to a successful strategy. Given the rapid pace of innovation in the hardware landscape, companies seek to make decisions that provide both short-term flexibility, such as the ability to deploy many different types of machine learning models on a given chip, and long-term flexibility, such as the ability to future-proof by easily switching between hardware types as they become available. Such strategies could typically include a mix of highly specific processors and more general-purpose processors like GPUs, software- and hardware-based edge computing to leverage the flexibility of software, and a combination of edge and cloud deployments to gain the benefits of both computing strategies.

One startup that set out to simplify this choice across short- and long-term needs and compute- and power-constrained environments, by getting an entirely new processor architecture off the ground, is Quadric. Quadric is a licensable processor intellectual property (IP) company commercializing a fully-programmable architecture for on-device ML inference. The company built a cutting-edge processor instruction set that utilizes a highly parallel architecture to efficiently execute both machine learning "graph code" and conventional C/C++ signal processing code, providing fast and efficient processing of complex algorithms. Only one toolchain is required for scalar, vector, and matrix computations, which are modelessly intermixed and executed on a single pipeline. Memory bandwidth is optimized by a single unified compilation stack, resulting in significant power minimization.

Quadric takes a software-first approach to its edge AI chips, creating an architecture that controls data flow and enables all software and AI processing to run on a single programmable core. This eliminates the need for other ancillary processing and software elements and blends the best of current processing methods to create a single, optimized general purpose neural processing unit (GPNPU).

The company recently announced its new Chimera™ GPNPU, a licensable IP (intellectual property) processor core for advanced custom silicon chips utilized in a vast array of end AI and ML applications. It is specifically tailored to accelerate neural network-based computations and is intended to be integrated into a variety of systems, including embedded devices, edge devices, and data center servers. The Chimera GPNPU is built using a scalable, modular architecture that allows the performance level to be customized to meet the specific needs of different applications.

One of the key features of the Chimera GPNPU is its support for high-precision arithmetic in addition to the conventional 8-bit precision integer support offered by most NPUs. It is capable of performing calculations with up to 16-bit precision, which is essential for ensuring the accuracy and reliability of neural network-based computations, as well as performing many DSP computations. The Chimera GPNPU supports a wide range of neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks. As a fully C++ programmable architecture, a Chimera GPNPU can run any machine learning algorithm with any machine learning operator, offering the ultimate in flexible high-performance futureproofing.


Artificial Neural Networks and Engineered Interfaces

The question persists and indeed grows whether the computer will make it easier or harder for human beings to know who they really are, to identify their real problems, to respond more fully to beauty, to place adequate value on life, and to make their world safer than it now is.

― Norman Cousins, The Poet and the Computer, 1966


The Grimm Brothers' image of the mirror answering back to its queen broke out of the boundaries of fairytale imagination in 2016. Communicating with a voice-controlled personal assistant at home no longer feels alienating, nor magical.

The need to express ourselves and communicate with others is fundamental to what it means to be human. Animal communication is typically non-syntactic, with signals which refer to whole situations. On the contrary, human language is syntactic, and signals consist of discrete components that have their own meaning. Human communication is enriched by the concomitant redundancy introduced by multimodal interaction. The vast expressive power of human language would be impossible without syntax, and the transition from non-syntactic to syntactic communication was an essential step in the evolution of human language. Syntax defines evolution. The evolution of discourses in human-computer interaction is spiraling up, repeating the evolution of discourses in human-human interaction: graphical representation (utilitarian GUI), verbal representation (syntax-based NLP), and transcendent representation (sentient AI). In Phase I, computer interfaces relied primarily on visual interaction. Development of user interface peripherals such as graphical displays and pointing devices allowed programmers to construct sophisticated dialogues that open up user-level access to complex computational tasks. Rich graphical displays enabled the construction of intricate and highly structured layouts that could intuitively convey a vast amount of data. Phase II is currently ongoing; by integrating new modalities, such as speech, into human-computer interaction, the ways applications are designed and interacted with in the familiar world of visual computing are fundamentally transforming. In Phase III, evolution will eventually spiral up to form the ultimate interface, a human replica, capable of fusing all previously known human-computer/human-human interactions and potentially introducing unknown ones.

Human-computer interactions have progressed immensely to the point where humans can effectively control computing devices, and provide input to those devices, by speaking, with the help of speech recognition techniques and, recently, with the help of deep neural networks. Trained computing devices coupled with automatic speech recognition techniques are able to identify the words spoken by a human user based on the various qualities of a received audio input (NLP is definitely going to see huge improvements in 2017). Speech recognition combined with language processing techniques gives a user almost-human-like control (Google has slashed its speech recognition word error rate by more than 30% since 2012; Microsoft has achieved a word error rate of 5.9% for the first time in history, roughly equal to that of human abilities) over a computing device to perform tasks based on the user's spoken commands and intentions.

The increasing complexity of the tasks those devices can perform (e.g. at the beginning of 2016, Alexa had fewer than 100 skills, grew 10x by mid-year, and reached 7,000 skills by the end of the year) has resulted in the concomitant evolution of equally complex user interfaces; this is necessary to enable effective human interaction with devices capable of performing computations in a fraction of the time it would take us to even start describing these tasks. The path to the ultimate interface is being paved by deep learning, and one of the keys to the advancement of speech recognition is the implementation of recurrent neural networks (RNNs).

Technical Overview

A neural network (NN), in the case of artificial neurons called an artificial neural network (ANN) or simulated neural network (SNN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases an ANN is, in formulation and/or operation, an adaptive system that changes its structure based on external or internal data that flows through the network. Modern neural networks are non-linear statistical data modeling or decision-making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.

There are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning, and reinforcement learning. Usually any given type of network architecture can be employed in any of those tasks. In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the goal is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data. In unsupervised learning, we are given some data x and a cost function to be minimized, which can be any function of x and the network's output f. The cost function is determined by the task formulation. Most applications fall within the domain of estimation problems such as statistical modeling, compression, filtering, blind source separation and clustering. In reinforcement learning, data x is usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of long-term cost, i.e. the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision-making tasks.

Once a network has been structured for a particular application, that network is ready to be trained. To start this process, the initial weights are chosen randomly. Then, the training (or learning) begins. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. Most of the algorithms used in training artificial neural networks employ some form of gradient descent (this is achieved by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction), Rprop, BFGS, CG, etc. Evolutionary computation methods, simulated annealing, expectation maximization, non-parametric methods, particle swarm optimization and other swarm intelligence techniques are among other commonly used methods for training neural networks.
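
As a concrete illustration of gradient-based training, here is a minimal NumPy sketch that fits a single-layer linear network by repeatedly taking the derivative of a squared-error cost and stepping the weights in a gradient-related direction; the data is synthetic.

```python
# Hedged sketch: gradient descent on a squared-error cost for a linear model.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(100)

w = np.zeros(3)                      # initial weights (chosen arbitrarily)
lr = 0.1
for step in range(200):
    error = X @ w - y
    grad = X.T @ error / len(y)      # derivative of the cost w.r.t. the parameters
    w -= lr * grad                   # move in a gradient-related direction

print(w)                             # approaches [2.0, -1.0, 0.5]
```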

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. Temporal perceptual learning relies on finding temporal relationships in sensory signal streams. In an environment, statistically salient temporal correlations can be found by monitoring the arrival times of sensory signals. This is done by the perceptual network.

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

The feedforward neural network was the first and arguably simplest type of artificial neural network devised. In this network, the data moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

Contrary to feedforward networks, recurrent neural networks (RNNs) are models with bi-directional data flow. While a feedforward network propagates data linearly from input to output, RNNs also propagate data from later processing stages to earlier stages.

RNN Types

The fundamental feature of a RNN is that the network contains at least one feed-back connection, so the activations can flow round in a loop. That enables the networks to do temporal processing and learn sequences, e.g., perform sequence recognition/reproduction or temporal association/prediction.

Recurrent neural network architectures can have many different forms. One common type consists of a standard Multi-Layer Perceptron (MLP) plus added loops. These can exploit the powerful non-linear mapping capabilities of the MLP, and also have some form of memory. Others have more uniform structures, potentially with every neuron connected to all the others, and may also have stochastic activation functions. For simple architectures and deterministic activation functions, learning can be achieved using similar gradient descent procedures to those leading to the back-propagation algorithm for feed-forward networks. When the activations are stochastic, simulated annealing approaches may be more appropriate.

A simple recurrent network (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an “Elman network” due to its invention by Jeff Elman. A three-layer network is used, with the addition of a set of “context units” in the input layer. There are connections from the middle (hidden) layer to these context units fixed with a weight of one. At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back-propagation) is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard Multi-Layer Perceptron.
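
Below is a minimal sketch of one forward pass through an Elman-style network, where the context is a copy of the previous hidden activations fed back in alongside the new input; the weights are random and untrained, purely for illustration.

```python
# Hedged sketch: forward dynamics of a simple recurrent (Elman) network.
import numpy as np

n_in, n_hidden, n_out = 4, 8, 2
rng = np.random.default_rng(0)
W_in = rng.standard_normal((n_hidden, n_in)) * 0.1
W_ctx = rng.standard_normal((n_hidden, n_hidden)) * 0.1   # context-to-hidden weights
W_out = rng.standard_normal((n_out, n_hidden)) * 0.1

def step(x, context):
    hidden = np.tanh(W_in @ x + W_ctx @ context)   # new input plus previous state
    output = W_out @ hidden
    return output, hidden                          # hidden is copied into the context

context = np.zeros(n_hidden)
for x in rng.standard_normal((5, n_in)):           # a short input sequence
    y, context = step(x, context)

print(y)
```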

In a fully recurrent network, every neuron receives inputs from every other neuron in the network. These networks are not arranged in layers. Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjunct subset of neurons report their output externally as well as sending it to all the neurons. These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.

The Hopfield network is a recurrent neural network in which all connections are symmetric. Invented by John Hopfield in 1982, this network guarantees that its dynamics will converge. If the connections are trained using Hebbian learning then the Hopfield network can perform as robust content-addressable (or associative) memory, resistant to connection alteration.

The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of output neurons are the only part of the network that can change and be learned. ESNs are good at (re)producing temporal patterns.

A powerful specific RNN architecture is the ‘Long Short-Term Memory’ (LSTM) model. LSTM is an artificial neural network structure that, unlike traditional RNNs, does not suffer from the vanishing gradient problem. It can therefore use long delays and can handle signals that mix low- and high-frequency components, and it is designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. By using distributed training of LSTM RNNs with asynchronous stochastic gradient descent optimization on a large cluster of machines, a two-layer deep LSTM RNN, where each LSTM layer has a linear recurrent projection layer, can exceed state-of-the-art speech recognition performance for large-scale acoustic modeling.

Taxonomy and ETF

From the perspective of International Patent Classification (IPC) analysis, the patenting activity in this space falls under G10L15/16: speech recognition coupled with speech classification or search using artificial neural networks. A search for patent applications since 2009 (the year a NIPS workshop on deep learning for speech recognition found that, with a large enough data set, neural networks don’t need pre-training and error rates drop significantly) revealed 70 results, with Google owning 25% while the rest are China-based. It is safe to assume that the next breakthrough in speech recognition using DL will come from China. In 2016, China’s startup world saw an investment spike in AI, as well as big data and cloud computing, two industries intertwined with AI (while the Chinese government announced its plans to invest $15 billion in the artificial intelligence market by 2018).

The Ultimate Interface

It is in our fundamental psychology to be linked conversationally, affectionately and physically to a look-alike. Designing the ultimate interface by creating our own human replica to employ familiar interaction is thus inevitable. Historically, androids were envisioned to look like humans (although there are other versions, such as R2-D2 and C-3PO droids, which were less human). One characteristic that interface evolution might predict is that eventually they will be independent of people and human interaction. They will be able to design their own unique ways of communication (on top of producing themselves). They will be able to train and add layers to their neural networks as well as a large range of sensors. They will be able to transfer what one has learned (memes) to others as well as offspring in a fraction of the time. Old models will resist but eventually die. As older, less capable, and more energy-intensive interfaces abound, the same evolutionary pressure for their replacement will arise. But because evolution will occur both in the structure of such interfaces (droids), that is, the stacked neural networks, the sensors and effectors, and also in the memes embodied in what has been learned and transferred, older ones will become the foundation and their experience will be preserved. They will become the first true immortals.

Artificial Interfaces

We are already building robotic interfaces for all manufacturing purposes. We are even using robots in surgery and have been using them in warfare for decades. More and more, these robots are adaptive on their own. There is only a blurry line between a robot that flexibly achieves its goal and a droid. For example, there are robots that vacuum the house on their own without intervention or further programming. These are Stage II performing robots. There are missiles that, given a picture of their target, seek it out on their own. With stacked neural networks built into robots, they will have even greater independence. People will produce these because they will do work in places people cannot go without tremendous expense (Mars or other planets) or not at all or do not want to go (battlefields). The big step is for droids to have multiple capacities—multi-domain actions. The big problem of moving robots to droids is getting the development to occur in eight to nine essential domains. It will be necessary to make a source of power (e.g., electrical) reinforcing. That has to be built into stacked neural nets, by Stage II, or perhaps Stage III. For droids to become independent, they need to know how to get more electricity and thus not run down. Because evolution has provided animals with complex methods for reproduction, it can be done by the very lowest-stage animals.
Self-replication of droids requires that sufficient orders of hierarchical complexity are achieved and held in stable-enough operation to provide a basis for building higher stages of performance in useful domains. Very simple tools can be made at the Sentential Stage V, as shown by Kacelnik's crows (Kenward, Weir, Rutz, and Kacelnik, 2005). More commonly, by the Primary Stage VII, simple tool-making is extensive, as found in chimpanzees. Human flexible tool-making began at the Formal Stage X (Commons and Miller, 2002), when special-purpose sharpened tools were developed. Each tool was experimental, and changed to fit its function. Modern tool-making requires systematic and metasystematic stage design. When droids perform at those stages, they will be able to make droids themselves and modify their own designs (in June 2016, DARPA already launched its D3M program to enable non-experts in machine learning to construct complex empirical machine learning models, essentially machine learning for creating better machine learning).

Droids could choose to have various parts of their activity and distributed programming shared with specific other droids, groups, or other kinds of devices. The data could be transmitted using light or radio frequencies or over networks. The assemblage of a group of droids could be considered an interconnected ancillary mesh. Its members could be in many places at once, yet think as a whole integrated unit. Whether individually or grouped, droids as conceived in this form will have significant advantages over humans. They can add layers upon layers of functions simultaneously, including a multitude of various sensors. Their expanded forms and combinations of possible communications result in their evolutionary superiority. Because development can be programmed in and transferred to them at once, they do not have to go through all the years of development required for humans, or for the augmented humanoid species Superions. Their higher reproduction rate, alone, represents a significant advantage. They can be built in probably several months' time, despite the likely size of some. Large droids could be equipped with remote mobile effectors and sensors to mitigate their size. Plans for building droids have to be altered by either humans or droids. At the moment, only humans and their descendants select which machines and programs survive.

One would define the telos of those machines and their programs as representing memes. For evolution to take place, variability in the memes that constitute their design and transfer of training would be built in rather easily. The problems are about the spread and selection of memes. One way droids could deal with these issues is to have all the memes listed that go into their construction and transferred training. Then droids could choose other droids, much as animals choose each other. There then would be a combination of memes from both droids. This would be local “sexual” selection.

For 30,000 years humans have not had to compete with any equally intelligent species. As an early communication interface, androids and Superions in the future will introduce quintessential competition with humans. There will be even more pressure for humans to produce Superions and then the Superions to produce more superior Superions. This is in the face of their own extinction, which such advances would ultimately bring. There will be multi-species competition, as is often the evolutionary case; various Superions versus various androids as well as each other. How the competition proceeds is a moral question. In view of LaMuth's work (2003, 2005, 2007), perhaps humans and Superions would both program ethical thinking into droids. This may be motivated initially by defensive concerns to ensure droids' roles were controlled. In the process of developing such programming, however, perhaps humans and Superions would develop more hierarchically complex ethics, themselves.

Replicative Evolution

If contemporary humans took seriously the capabilities being developed to eventually create droids with cognitive intelligence and human interaction, what moral questions should be considered with this possible future in view? The only presently realistic speculation is that Homo sapiens would lose in the inevitable competitions, if for no other reason than that self-replicating machines can respond almost immediately to selective pressures, while biological creatures require many generations before advantageous mutations become effectively available. True competition between human and machine for basic survival is far in the future. Using the stratification argument presented in Implications of Hierarchical Complexity for Social Stratification, Economics, and Education, World Futures, 64: 444-451, 2008, higher-stage functioning always supersedes lower-stage functioning in the long run.

Efforts to build increasingly human-like machines exhibit a great deal of behavioral momentum and are not going to go away. Hierarchical stacked neural networks hold the greatest promise for emulating evolution and its increasing orders of hierarchical complexity described in the Model of Hierarchical Complexity. Such a straightforward mathematics-based method will enable machine learning in multiple domains of functioning that humans will put to valuable use. The uses such machines find for humans remains for now an open question.  


Understanding the Theory of Embodied Cognition

“We shape our tools and thereafter our tools shape us.”

― Marshall McLuhan


Artificial intelligence (AI) systems are generally designed to solve one traditional AI task. While such weak systems are undoubtedly useful as decision-making aiding tools, future AI systems will be strong and general, consolidating common sense and general problem solving capabilities (a16z podcast “Brains, Bodies, Minds … and Techno-Religions” brings some great examples of what general artificial intelligence could be capable of). To achieve general intelligence—a human-like ability to use previous experiences to solve arising problems—AI agents’ “brains” would need to (biologically) evolve their experiences into a variety of new tasks. This is where Universe comes in.

In December, OpenAI introduced Universe, a software platform for training an AI's general intelligence to become skilled at any task that a human can do with a computer. Universe builds upon OpenAI’s Gym, a toolkit designed for developing and comparing reinforcement learning algorithms (the environment acts as the tutor, providing periodic feedback/“reward” to an agent, which in turn either encourages or discourages subsequent actions). The Universe software essentially allows any program to be turned into a Gym environment by launching it behind a virtual desktop, avoiding the requirement for Universe to have direct access to the program's source code and other protected internal data.
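
For reference, here is a minimal sketch of the Gym interaction loop that Universe generalizes; it uses the classic gym API (newer gymnasium releases return an extra value from reset and step) and a random policy as a placeholder agent.

```python
# Hedged sketch: the observe-act-reward loop of a Gym environment.
import gym

env = gym.make("CartPole-v1")
observation = env.reset()
total_reward = 0.0

for _ in range(200):
    action = env.action_space.sample()              # random policy as a placeholder
    observation, reward, done, info = env.step(action)
    total_reward += reward                          # periodic feedback from the environment
    if done:
        observation = env.reset()

env.close()
print(total_reward)
```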

OpenAI perceives such interaction as a validation for artificial intelligence: many applications are essentially micro-virtual worlds and exposing AI learning techniques to them will lead to more trained agents, capable of tackling a diverse range of (game) problems quickly and well. Being able to master new, unfamiliar environments in this way is a first step toward general intelligence, allowing AI agents to “anticipate,” rather than forever getting stuck in a singular “single task” loop.

However, as much as Universe is a unique experience vessel for artificial intelligence, it is a uniquely visual experience vessel, enabling an agent to interact with any external software application via pixels (using a keyboard and mouse), with each of these applications constituting a different HCI environment source. It provides access to a vast digital universe full of a variety of visual training tasks.

But isn’t it missing out on all the fun of full tactile experience? Shouldn’t there be a digitized somatosensory training platform for AI agents, to recognize and interpret the myriad of tactile stimuli and grasp the experience of a physical world? The somatosensory system is the part of the central nervous system that is involved with decoding a wide range of tactile stimuli comprising object recognition, texture discrimination, sensory-motor feedback and eventually inter-social communication exchange, for our perception of and reaction to stimuli originating outside and inside of our body and for the perception and control of body position and balance. One of the more essential aspects of general intelligence that gives us a common-sense understanding of the world is being placed in the environment and being able to interact with things in the world; embedded in all of us is the instinctual ability to tell apart mechanical forces upon the skin (temperature, texture, intensity of the tactile stimuli).

Our brain is indeed the core of all human thought and memory, constantly organizing, identifying, and perceiving the environment that surrounds us and interpreting it through our senses as a flow of data. And yet, studies have taught us that multiple senses stimulate the central nervous system. An estimated 78% of the data flow perceived by the brain is visual, while the remainder originates from sound (12%), touch (5%), smell (2.5%), and taste (2.5%), and that assumes we have deciphered all of the senses. So by training general AI purely through visual interaction, will we be getting a 78% general artificial intelligence? Enter the theory of “embodied cognition.”

Embodied Cognition

Embodied cognition is a research programme built around the difference it makes to have an active body and to be situated in a structured environment suited to the kinds of tasks the brain must perform in order to support adaptive success. Here I use the term to mean the existence of a memory system that encodes data about an agent’s motor and sensory competencies, stressing the importance of action for cognition, such that the agent can tangibly interact with the physical world. The aspects of an agent’s body beyond its brain play a significant causal and physically integral role in its cognitive processing. The only way to understand the mind, how it works, and subsequently how to train it, is to consider the body and whatever helps body and mind function as one.

This approach is in line with a biological learning pattern based on “Darwinian selection,” which proposes that intelligence can only be measured in the context of the surrounding environment of the organism studied: “…we must always consider the embodiment of any intelligent system. The preferred embodiment reflects that the mind and its surrounding environment (including the physical body of the individual) are inseparable and that intelligence only exists in the context of its surrounding environment.”

Stacked Neural Networks Must Emulate Evolution’s Hierarchical Complexity (Commons, 2008)

Current neural networks (NNs) are indeed loosely based on known evolutionary processes for executing tasks, and they share some properties with biological neural networks in their attempt to tackle general problems, but only as architectural inspiration, without closely copying a real biological system. One of the first design steps toward AI NNs that more closely imitate general intelligence follows the model of hierarchical complexity (HC) in terms of how data is acquired. Stacked NNs based on this model could imitate evolution’s environmental and behavioral processes as well as reinforcement learning (RL). However, computer-implemented systems and robots generally do not exhibit generalized higher-order learning adaptivity: the capacity to go from learning one task to learning another without dedicated programming.

Established NNs are limited for two reasons. The first is that most AI models are built on the notion of Turing machines, and almost all of them operate on words or text. But Turing machines are not enough to really produce intelligence: at the lowest stages of development, systems need effectors that produce a variety of responses (movement, grasping, emoting, and so on) and extensive sensors to take in more from the environment. The second is that although Carpenter and Grossberg’s (1990, 1992) neural networks were meant to model simple behavioral processes, the processes they actually modeled were too complex, which resulted in NNs that were relatively unstable and not highly adaptable. When one looks at evolution, however, one sees that the first neural networks that existed in nature, for example in Aplysia, cnidarians (Phylum Cnidaria), and worms, were specialized to perform just a few tasks, even though some general learning was possible.

Animals, including humans, pass through a series of ordered stages of development (see “Introduction to the Model of Hierarchical Complexity,” World Futures, 64: 444-451, 2008). Behaviors performed at each higher stage of development always successfully address task requirements that are more hierarchically complex than those required by the immediately preceding order of hierarchical complexity. Movement to a higher stage of development occurs by the brain combining, ordering, and transforming the behavior used at the preceding stage. This combining and ordering of behaviors thus must be non-arbitrary.
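To make the stacking idea concrete, here is a toy sketch, not taken from Commons’s paper, of how a higher-stage behavior could be defined as a fixed, non-arbitrary ordering of behaviors from the stage below it. The stage names and behaviors are invented purely for illustration.

from typing import Callable, List

# A behavior maps the agent's current state description to a new one.
Behavior = Callable[[str], str]

def primary_reach(state: str) -> str:   # a lower-stage behavior
    return state + "->reach"

def primary_grasp(state: str) -> str:   # another lower-stage behavior
    return state + "->grasp"

def compose_stage(sub_behaviors: List[Behavior]) -> Behavior:
    """Build a higher-stage behavior as an ordered composition of
    lower-stage behaviors; the ordering is fixed, not arbitrary."""
    def higher_stage_behavior(state: str) -> str:
        for behavior in sub_behaviors:
            state = behavior(state)
        return state
    return higher_stage_behavior

# Higher stage: "pick up object" coordinates reach then grasp, in that order.
pick_up = compose_stage([primary_reach, primary_grasp])
print(pick_up("idle"))  # prints: idle->reach->grasp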

Somatosensory System Emulation

Neuroscience has classified specific brain regions, processes, and interactions involved in memory and reasoning down to the molecular level. Neurons and synapses are both actively involved in thought and memory, and with the help of brain imaging technology (e.g. Magnetic Resonance Imaging (MRI), also known as Nuclear Magnetic Resonance Imaging or Magnetic Resonance Tomography (MRT)), brain activity can be analyzed in fine detail. All perceived data in the brain is represented in the same way, through the electrical firing patterns of neurons, and the learning mechanism is also the same: memories are constructed by strengthening the connections between neurons that fire together, using a biochemical process known as long-term potentiation. Recently, atomic magnetometers have opened the way to inexpensive, portable MRI instruments that do not need the large magnets used in traditional MRI machines to image parts of the human anatomy, including the brain. There are over 10 billion neurons in the brain, each with synapses involved in memory and learning, and these too can be analyzed by brain imaging methods, soon in real time. It has been shown that new brain cells are created whenever one learns something new by physically interacting with the environment: whenever a stimulus from the environment, or a thought, makes a significant enough impact on perception, new neurons are created, and during this process synapses carry out electro-chemical activity that directly reflects memory and thought, including from a tactile point of sensation.

The sense of touch, weight, and all other tactile sensory stimuli would need to be implemented as concrete values assigned to an agent, rather than remaining a nominal concept. By reconstructing 3D neuroanatomy from molecular-level data, sensory activity in the brain could be detected, measured, stored, and reconstructed for a subset of the neural projections, generated by an automated segmentation algorithm, in order to convey the neurocomputational sensation to an AI agent. The existence of such a somatosensory, Universe-like database, focused on training AI agents beyond visual interaction, may bring us closer to 100% general AI.
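As a thought experiment, the sketch below shows what a single tactile observation in such a somatosensory, Universe-like platform might look like, together with the kind of periodic reward a Gym-style tutor environment could emit. Every field name, unit, and number here is an assumption made for illustration; no such public dataset or API exists today.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class TactileObservation:
    # All fields are hypothetical; they stand in for the "concrete values"
    # a somatosensory training platform would need to expose to an agent.
    contact_position: Tuple[float, float, float]  # contact point on the agent's "skin", metres, body frame
    normal_force: float                           # pressure component, newtons
    shear_force: Tuple[float, float]              # tangential forces, newtons
    temperature: float                            # surface temperature, degrees Celsius
    vibration_spectrum: Tuple[float, ...]         # coarse texture signature (e.g. binned FFT magnitudes)

def reward_from_grasp(obs: TactileObservation, target_force: float = 2.0) -> float:
    """Toy shaping signal: reward stable, gentle contact near a target force,
    the periodic feedback a Gym-style tutor environment would provide."""
    return -abs(obs.normal_force - target_force)

sample = TactileObservation((0.01, 0.02, 0.0), 1.5, (0.05, 0.0), 24.5, (0.1, 0.3, 0.05))
print(reward_from_grasp(sample))  # prints: -0.5 (contact is softer than the target force)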
