Multi-Agent Systems with Rollback Mechanisms

Enterprise demand for AI today isn’t about slotting in isolated models or adding another conversational interface. It’s about navigating workflows that are inherently messy: supply chains that pivot on volatile data, financial transactions requiring instantaneous validation, or medical claims necessitating compliance with compounding regulations. In these high-stakes, high-complexity domains, agentic and multi-agent systems (MAS) offer a structured approach to these challenges with intelligence that scales beyond individual reasoning. Rather than enforcing top-down logic, MAS behave more like dynamic ecosystems. Agents coordinate, collaborate, sometimes compete, and learn from each other to unlock forms of system behavior that emerge from the bottom up. Autonomy is powerful, but it also creates new unique fragilities concerning system reliability and data consistency, particularly in the face of failures or errors.

Take a financial institution handling millions of transactions a day. The workflow demands market analysis, regulatory compliance, trade execution, and ledger updates with each step reliant on different datasets, domain knowledge, and timing constraints. Trying to capture all of this within a single, monolithic AI model is impractical; the task requires decomposition into manageable subtasks, each handled by a tailored component. MAS offer exactly that. They formalize a modular approach, where autonomous agents handle specialized subtasks while coordinating toward shared objectives. Each agent operates with local context and local incentives, but participates in a global system dynamic. These systems are not just theoretical constructs but operational priorities, recalibrating how enterprises navigate complexity. But with that autonomy comes a new category of risk. AI systems don’t fail cleanly: a misclassification in trade validation or a small error in compliance tagging can ripple outward with real-world consequences—financial, legal, reputational. Rollback mechanisms serve as a counterbalance. They let systems reverse errors, revert to stable states, and preserve operational continuity. But as we embed more autonomy into core enterprise processes, rollback stops being a failsafe and starts becoming one more layer of coordination complexity.

Core Structure of MAS

A multi-agent system is, at its core, a combination of autonomous agents, each engineered for a narrow function, yet designed to operate in concert. In a supply chain setting, for example, one agent might forecast demand using time-series analysis, another optimize inventory with constraint solvers, and a third schedule logistics via graph-based routing. These agents are modular, communicating through standardized interfaces—APIs, message queues like RabbitMQ, or shared caches like Redis—so that the system can scale and adapt. Coordination is handled by an orchestrator, typically implemented as a deterministic state machine, a graph-based framework like LangGraph, or a distributed controller atop Kubernetes. Its job is to enforce execution order and resolve dependencies, akin to a workflow engine. In trading systems, for example, this means ensuring that market analysis precedes trade execution, preventing premature actions on stale or incomplete information. State management underpins this coordination through a shared context, typically structured as documents in distributed stores like DynamoDB or MongoDB or, when stronger guarantees are needed, in systems like CockroachDB.
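
To make the shape of this architecture concrete, here is a minimal sketch in Python, assuming a deterministic orchestrator and three illustrative supply chain agents; the agent names, interfaces, and logic are hypothetical stand-ins rather than any particular framework's API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Shared context the orchestrator passes between agents; in production this
# would live in a distributed store (e.g., DynamoDB, Redis) rather than memory.
Context = Dict[str, Any]

@dataclass
class Agent:
    name: str
    run: Callable[[Context], Context]   # each agent reads and extends the shared context

def forecast_demand(ctx: Context) -> Context:
    ctx["forecast"] = [100, 120, 95]            # stand-in for a time-series model
    return ctx

def optimize_inventory(ctx: Context) -> Context:
    ctx["reorder_qty"] = max(ctx["forecast"]) - ctx.get("on_hand", 80)
    return ctx

def schedule_logistics(ctx: Context) -> Context:
    ctx["route"] = "DC-7 -> Store-12"           # stand-in for graph-based routing
    return ctx

class Orchestrator:
    """Deterministic state machine: runs agents in dependency order."""
    def __init__(self, agents: List[Agent]):
        self.agents = agents

    def execute(self, ctx: Context) -> Context:
        for agent in self.agents:               # enforce execution order
            ctx = agent.run(ctx)
        return ctx

pipeline = Orchestrator([
    Agent("demand", forecast_demand),
    Agent("inventory", optimize_inventory),
    Agent("logistics", schedule_logistics),
])
print(pipeline.execute({"on_hand": 80}))
```

The point of the sketch is the separation of concerns: each agent owns its local logic, while the orchestrator owns ordering and the shared context.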

The analytical challenge lies in balancing modularity with coherence. Agents must operate independently to avoid bottlenecks, yet their outputs must align to prevent divergence. Distributed systems principles like event sourcing and consensus protocols become essential tools for maintaining system-level coherence without collapsing performance. In the context of enterprise applications, the necessity of robust rollback mechanisms within multi-agent systems cannot be overstated. These mechanisms are essential for preventing data corruption and inconsistencies that can arise from individual agent failures, software errors, or unexpected interactions. When one agent fails or behaves unexpectedly, the risk isn’t local. It propagates. For complex, multi-step tasks that involve the coordinated actions of numerous agents, reliable rollback capabilities ensure the integrity of the overall process, allowing the system to recover gracefully from partial failures without compromising the entire operation.

Rollback Mechanisms in MAS

The probabilistic outputs of AI agents, driven by models like fine-tuned LLMs or reinforcement learners, introduce uncertainty absent in deterministic software. A fraud detection agent might errantly flag a legitimate transaction, or an inventory agent might misallocate stock. Rollback mechanisms mitigate these risks by enabling the system to retract actions and restore prior states, drawing inspiration from database transactions but adapted to AI’s nuances.

The structure of rollback is a carefully engineered combination of processes, each contributing to the system’s ability to recover from errors with precision and minimal disruption. At its foundation lies the practice of periodically capturing state snapshots that encapsulate the system’s configuration—agent outputs, database records, and workflow variables. These snapshots form the recovery points, stable states the system can return to when things go sideways. They’re typically stored in durable, incrementally updatable systems like AWS S3 or ZFS, designed to balance reliability with performance overhead. Choosing how often to checkpoint is its own trade-off. Too frequent, and the system slows under the weight of constant I/O; too sparse, and you risk losing ground when things fail. To reduce snapshot resource demands, MAS can use differential snapshots (capturing only changes) or selectively logging critical states, balancing rollback needs with efficiency. It’s also worth noting that while rollback in AI-driven MAS inherits ideas from database transactions, it diverges quickly due to the probabilistic nature of AI outputs. Traditional rollbacks are deterministic: a set of rules reverses a known state change. In contrast, when agents act based on probabilistic models their outputs are often uncertain. A fraud detection agent might flag a legitimate transaction based on subtle statistical quirks. An inventory optimizer might misallocate stock due to noisy inputs. That’s why rollback in MAS often needs to be triggered by signals more nuanced than failure codes: confidence thresholds, anomaly scores, or model-based diagnostics like variational autoencoders (VAEs) can serve as indicators that something has gone off-track.
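
A minimal sketch of this checkpointing pattern, assuming an in-memory store and a made-up confidence signal; a production system would persist snapshots to durable storage such as S3 or ZFS and derive the trigger from real model diagnostics:

```python
import copy

class SnapshotStore:
    """Illustrative checkpointing: a full snapshot plus differential snapshots
    that record only the keys changed since the previous capture."""
    def __init__(self):
        self.base = None      # last full snapshot
        self.diffs = []       # ordered list of {key: new_value} deltas

    def _latest(self) -> dict:
        state = copy.deepcopy(self.base) if self.base else {}
        for delta in self.diffs:
            state.update(delta)
        return state

    def capture(self, state: dict, full: bool = False):
        if full or self.base is None:
            self.base, self.diffs = copy.deepcopy(state), []
        else:
            prev = self._latest()
            delta = {k: v for k, v in state.items() if prev.get(k) != v}
            if delta:
                self.diffs.append(copy.deepcopy(delta))

    def restore(self, n_diffs: int | None = None) -> dict:
        """Rebuild a recovery point; n_diffs=0 returns the last full snapshot."""
        state = copy.deepcopy(self.base) if self.base else {}
        for delta in self.diffs[: (len(self.diffs) if n_diffs is None else n_diffs)]:
            state.update(delta)
        return state

store = SnapshotStore()
store.capture({"inventory": 500, "orders": 12}, full=True)
store.capture({"inventory": 480, "orders": 14})          # only the delta is stored

# Rollback is triggered by an uncertainty signal rather than a failure code.
CONFIDENCE_THRESHOLD = 0.75
agent_confidence = 0.62                                   # hypothetical model output
if agent_confidence < CONFIDENCE_THRESHOLD:
    recovered = store.restore(n_diffs=0)                  # fall back to the full snapshot
    print("rolled back to", recovered)
```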

In modern MAS, every action is logged, complete with metadata like agent identifiers, timestamps, and input hashes via systems such as Apache Kafka. These logs do more than support debugging; they create a forensic trail of system behavior, essential for auditability and post-hoc analysis, particularly in regulated domains like finance and healthcare. Detecting when something has gone wrong in a system of autonomous agents isn’t always straightforward. It might involve checking outputs against hard-coded thresholds, leveraging statistical anomaly detection models like VAEs, or incorporating human-in-the-loop workflows to catch edge cases that models miss. Once identified, rollback decisions are coordinated by an orchestrator that draws on these logs and the system’s transactional history to determine what went wrong, when, and how to respond.
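
A sketch of what such a log record might contain, using only the standard library; the field names are illustrative, and in practice the record would be published to a durable log like Apache Kafka rather than printed:

```python
import hashlib, json, time, uuid

def log_action(agent_id: str, action: str, payload: dict) -> dict:
    """Build a structured, append-only log record for one agent action."""
    record = {
        "event_id": str(uuid.uuid4()),
        "agent_id": agent_id,
        "action": action,
        "timestamp": time.time(),
        # Hashing the inputs lets auditors later verify what the agent saw
        # without storing sensitive payloads in the log itself.
        "input_hash": hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
    }
    print(json.dumps(record))   # stand-in for producing to a Kafka topic
    return record

log_action("fraud-detector-01", "flag_transaction", {"txn_id": "T-1001", "amount": 249.99})
```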

Rollback itself is a toolkit of strategies, selected based on the failure mode and the system’s tolerance for disruption. One approach, compensating transactions, aims to undo actions by applying their logical inverse: a payment is reversed, a shipment is recalled. But compensating for AI-driven decisions means accounting for uncertainty. Confidence scores, ensemble agreement, or even retrospective model audits may be needed to confirm that an action was indeed faulty before undoing it. Another approach, state restoration, rolls the system back to a previously captured snapshot—resetting variables to a known-good configuration. This works well for clear-cut failures, like misallocated inventory, but it comes at a cost: any valid downstream work done since the snapshot may be lost. To avoid this, systems increasingly turn to partial rollbacks, surgically undoing only the affected steps while preserving valid state elsewhere. In a claims processing system, for instance, a misassigned medical code might be corrected without resetting the entire claim’s status, maintaining progress elsewhere in the workflow. But resilience in multi-agent systems isn’t just about recovering; it’s about recovering intelligently. In dynamic environments, reverting to a past state can be counterproductive if the context has shifted. Rollback strategies need to be context-aware, adapting to changes in data, workflows, or external systems, and ensuring that the system is restored to a state that is still relevant and consistent with current environmental conditions. Frameworks like ReAgent offer an early demonstration of what this could look like: reversible collaborative reasoning across agents, with explicit backtracking and correction pathways. Instead of merely reverting to a prior state, agents backtrack and revise their reasoning in light of new evidence, a form of rollback more nuanced than simple state restoration.
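
The following sketch illustrates the partial-rollback idea with a hypothetical claims-style workflow: each completed step registers a compensating action, and only the steps judged faulty are undone, newest first:

```python
from typing import Callable, Dict, List

# Hypothetical workflow: each completed step registers how to undo itself.
# A compensating transaction is the logical inverse of the original action.
compensations: Dict[str, Callable[[dict], None]] = {
    "charge_payment": lambda ctx: ctx.update(balance=ctx["balance"] + ctx["amount"]),
    "assign_code":    lambda ctx: ctx.update(medical_code=None),
    "update_ledger":  lambda ctx: ctx["ledger"].pop(),
}

def partial_rollback(completed: List[str], affected: List[str], ctx: dict) -> None:
    """Undo only the affected steps, newest first, leaving valid work intact."""
    for step in reversed(completed):
        if step in affected:
            compensations[step](ctx)

ctx = {"balance": 750.0, "amount": 250.0, "medical_code": "J45.909", "ledger": ["credit:250"]}
completed = ["charge_payment", "assign_code", "update_ledger"]

# Suppose only the code assignment was judged faulty (e.g., low model confidence):
partial_rollback(completed, affected=["assign_code"], ctx=ctx)
print(ctx)   # payment and ledger entries survive; the misassigned code is cleared
```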

Building robust rollback in MAS requires adapting classical transactional principles—atomicity, consistency, isolation, durability (ACID)—to distributed AI contexts. Traditional databases enforce strict ACID guarantees through centralized control, but MAS often trade strict consistency for scalability, favoring eventual consistency in most interactions. Still, for critical operations, MAS can lean on distributed coordination techniques like two-phase commits or the Saga pattern to approximate ACID-like reliability without introducing system-wide bottlenecks. The Saga pattern, in particular, is designed to manage large, distributed transactions. It decomposes them into a sequence of smaller, independently executed steps, each scoped to a single agent. If something fails midway, compensating transactions are used to unwind the damage, rolling the system back to a coherent state without requiring every component to hold a lock on the global system state. This autonomy-first model aligns well with how MAS operate: each agent governs its own local logic, yet contributes to an eventually consistent global objective. Emerging frameworks like SagaLLM advance this further by tailoring saga-based coordination to LLM-powered agents, introducing rollback hooks that are not just state-aware but also constraint-sensitive, ensuring that even when agents fail or outputs drift, the system can recover coherently. These mechanisms help bridge the gap between high-capacity, probabilistic reasoning and the hard guarantees needed for enterprise-grade applications involving multiple autonomous agents.
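
A stripped-down saga executor makes the pattern concrete; the step names and the simulated failure are invented for illustration, and a real implementation (or a framework like SagaLLM) would add persistence, retries, and constraint checks:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]        # local transaction owned by one agent
    compensate: Callable[[dict], None]    # logical inverse, run on later failure

def run_saga(steps: list[SagaStep], ctx: dict) -> bool:
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception as exc:
            print(f"step '{step.name}' failed: {exc}; compensating")
            for done in reversed(completed):      # unwind completed steps in reverse order
                done.compensate(ctx)
            return False
    return True

def execute_trade(ctx):   ctx["position"] = ctx.get("position", 0) + 100
def undo_trade(ctx):      ctx["position"] -= 100
def update_ledger(ctx):   raise RuntimeError("ledger service unavailable")  # simulated fault
def undo_ledger(ctx):     pass

ok = run_saga(
    [SagaStep("execute_trade", execute_trade, undo_trade),
     SagaStep("update_ledger", update_ledger, undo_ledger)],
    ctx := {},
)
print(ok, ctx)   # False, {'position': 0}: the trade was compensated, no global lock needed
```

No step ever holds a lock on global state; consistency is recovered after the fact through compensation, which is exactly the trade the Saga pattern makes.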

To ground this, consider a large bank deploying an MAS for real-time fraud detection. The system might include a risk-scoring agent (such as a fine-tuned BERT model scoring transactions for risk), a compliance agent enforcing AML rules via symbolic logic, and a settlement agent updating ledger entries via blockchain APIs. A Kubernetes-based orchestrator sequences these agents, with Kafka streaming in transactional data and DynamoDB maintaining distributed state. Now suppose the fraud detection agent flags a routine payment as anomalous. The error is caught either via statistical anomaly detection or a human override and rollback is initiated. The orchestrator triggers a compensating transaction to reverse the ledger update, a snapshot is restored to reset the account state, and the incident is logged for regulatory audits. In parallel, the system might update its anomaly model or confidence thresholds—learning from the mistake rather than simply erasing it. And integrating these AI-native systems with legacy infrastructure adds another layer of complexity. Middleware like MuleSoft becomes essential, not just for translating data formats or bridging APIs, but for managing latency, preserving transactional coherence, and ensuring the MAS doesn’t break when it encounters the brittle assumptions baked into older systems.

The stochastic nature of AI makes rollback an inherently fuzzy process. A fraud detection agent might assign a 90% confidence score to a transaction and still be wrong. Static thresholds risk swinging too far in either direction: overreacting to benign anomalies or missing subtle but meaningful failures. While techniques like VAEs are often explored for anomaly detection, other methods, such as statistical process control or reinforcement learning, offer more adaptive approaches. These methods can calibrate rollback thresholds dynamically, tuning themselves in response to real-world system performance rather than hardcoded heuristics. Workflow topology also shapes rollback strategy. Directed acyclic graphs (DAGs) are the default abstraction for modeling MAS workflows, offering clear scoping of dependencies and rollback domains. But real-world workflows aren’t always acyclic. Cyclic dependencies, such as feedback loops between agents, require more nuanced handling. Cycle detection algorithms or formal methods like Petri nets become essential for understanding rollback boundaries: if an inventory agent fails, for instance, the system might need to reverse only downstream logistics actions, while preserving upstream demand forecasts. Tools like Apache Airflow or LangGraph implement this. What all this points to is a broader architectural principle: MAS design is as much about managing uncertainty and constraints as it is about building intelligence. The deeper challenge lies in formalizing these trade-offs—balancing latency versus consistency, memory versus compute, automation versus oversight—and translating them into robust architectures.
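
As a small illustration of DAG-scoped rollback, the sketch below computes the rollback domain for a failed step: the step itself plus everything downstream of it, leaving upstream work untouched (the workflow graph is hypothetical):

```python
from collections import deque

# Hypothetical workflow DAG: edges point from a step to the steps that depend on it.
edges = {
    "demand_forecast":    ["inventory_plan"],
    "inventory_plan":     ["logistics_schedule", "supplier_orders"],
    "logistics_schedule": [],
    "supplier_orders":    [],
}

def rollback_domain(failed: str, dag: dict[str, list[str]]) -> set[str]:
    """Return the failed step plus everything downstream of it (BFS over the DAG)."""
    seen, queue = {failed}, deque([failed])
    while queue:
        node = queue.popleft()
        for child in dag.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(rollback_domain("inventory_plan", edges))
# {'inventory_plan', 'logistics_schedule', 'supplier_orders'}: the demand forecast is preserved
```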

Versatile Applications

In supply chain management defined by uncertainty and interdependence, MAS can be deployed to optimize complex logistics networks, manage inventory levels dynamically, and improve communication and coordination between various stakeholders, including suppliers, manufacturers, and distributors. Rollback mechanisms are particularly valuable in this context for recovering from unexpected disruptions such as supplier failures, transportation delays, or sudden fluctuations in demand. If a critical supplier suddenly ceases operations, a MAS with rollback capabilities could revert to a previous state where perhaps alternate suppliers had been identified and contingencies pre-positioned, minimizing the impact on the production schedule. Similarly, if a major transportation route becomes unavailable due to unforeseen circumstances, the system could roll back to a prior plan and activate pre-arranged contingency routes. We’re already seeing this logic surface in MAS-ML frameworks that combine MAS with machine learning techniques to enable adaptive learning with structured coordination to give supply chains a form of situational memory.

Smart/advanced manufacturing environments, characterized by interconnected machines, autonomous robots, and intelligent control systems, stand to benefit even more. Here, MAS can coordinate the activities of robots on the assembly line, manage complex production schedules to account for shifting priorities, and optimize the allocation of manufacturing resources. Rollback mechanisms are crucial for ensuring the reliability and efficiency of these operations by providing a way to recover from equipment malfunctions, production errors, or unexpected changes in product specifications. If a robotic arm malfunctions during a high-precision weld, a rollback mechanism could revert the affected components to their prior state and reassign the task to another available robot or a different production cell. The emerging concept of an Agent Computing Node (ACN) within multi-agent manufacturing systems offers a path for easy(ier) deployment of these capabilities. Embedding rollback at the ACN level could allow real-time scheduling decisions to unwind locally without disrupting global coherence, enabling factories that aren’t just smart, but more fault-tolerant by design.

In financial trading platforms, which operate in highly volatile and time-sensitive markets where milliseconds equate to millions and regulatory compliance is enforced in audit logs, MAS can serve as algorithmic engines behind trading, portfolio management, and real-time risk assessment. Rollback here effectively plays a dual role: operational safeguard and regulatory necessity. Rollback capabilities are essential for maintaining the accuracy and integrity of financial transactions, recovering from trading errors caused by software glitches or market anomalies, and mitigating the potential impact of extreme market volatility. If a trading algorithm executes a series of erroneous trades due to a sudden, unexpected market event, a rollback mechanism could reverse these trades and restore the affected accounts to their previous state, preventing significant financial losses. Frameworks like TradingAgents, which simulate institutional-grade MAS trading strategies, underscore the value of rollback not just as a corrective tool but as a mechanism for sustaining trust and interpretability in high-stakes environments.

In cybersecurity, multi-agent systems can be leveraged for automated threat detection, real-time analysis of network traffic for suspicious activities, and the coordination of defensive strategies to protect enterprise networks and data. MAS with rollback mechanisms are critical for enabling rapid recovery from cyberattacks, such as ransomware or data breaches, by restoring affected systems to a known clean state before the intrusion occurred. For example, if a malicious agent manages to infiltrate a network and compromise several systems, a rollback mechanism could restore those systems to a point in time before the breach, effectively neutralizing the attacker's actions and preventing further damage. Recent developments in Multi-Agent Deep Reinforcement Learning (MADRL) for autonomous cyber defense have begun to formalize this concept: “restore” as a deliberate, learnable action in a broader threat response strategy, highlighting the importance of rollback-like functionalities.

Looking Ahead

The ecosystem for MAS is evolving not just in capability but also in topology, with frameworks like AgentNet proposing fully decentralized paradigms where agents can evolve their capabilities and collaborate efficiently without relying on a central orchestrator. When there is no global conductor, how do you coordinate recovery in a way that preserves system-level integrity? Recent work explores how to equip individual agents with the ability to roll back their actions and states locally and autonomously, contributing to the system's overall resilience without relying on a centralized recovery mechanism. The challenge lies in coordinating these individual rollback actions in a way that maintains the integrity and consistency of the entire multi-agent system.

Building scalable rollback mechanisms in large-scale MAS, which may involve hundreds or even thousands of autonomous agents operating in a distributed environment, is shaping up to be a significant systems challenge. The overhead associated with tracking state and logging messages to enable potential rollbacks starts to balloon as the number of agents and their interactions increase. Getting rollback to work at this scale requires new protocol designs that are not only efficient, but also resilient to partial failure and misalignment.

But the technical hurdles in enterprise settings are just one layer. There are still fundamental questions to be answered. Can rollback points be learned or inferred dynamically, tuned to the nature and scope of the disruption? What’s the right evaluation framework for rollback in MAS—do we optimize for system uptime, recovery speed, agent utility, or something else entirely? And how do we build mechanisms that allow for human intervention without diminishing the agents’ autonomy yet still ensure overall system safety and compliance?

More broadly, we need ways to verify the correctness and safety of these rollback systems under real-world constraints, not just in simulated testbeds, especially in enterprise deployments where agents often interact with physical infrastructure or third-party systems. As such, this becomes more a question of system alignment with varying internal business processes and constraints. For now, there’s still a gap between what we can build and what we should build—building rollback into MAS at scale requires more than just resilient code. It’s a test of how well we can align autonomous systems in a reliable, secure, and meaningfully integrated way against partial failures, adversarial inputs, and rapidly changing operational contexts.

Garbage Collection Tuning In Large-Scale Enterprise Applications

Garbage collection (GC) is one of those topics that feels like a solved problem until you scale it up to the kind of systems that power banks, e-commerce, logistics firms, and cloud providers. For many enterprise systems, GC is an invisible component: a background process that “just works.” But under high-throughput, latency-sensitive conditions, it surfaces as a first-order performance constraint. The market for enterprise applications is shifting: everyone’s chasing low-latency, high-throughput workloads, and GC is quietly becoming a choke point that separates the winners from the laggards.

Consider a high-frequency trading platform processing orders in microseconds. After exhausting traditional performance levers (scaling cores, rebalancing threads, optimizing code paths), unexplained latency spikes persisted. The culprit? GC pauses—intermittent, multi-hundred-millisecond interruptions from the JVM's G1 collector. These delays, imperceptible in consumer applications, are catastrophic in environments where microseconds mean millions. Over months, the engineering team tuned G1, minimized allocations, and restructured the memory lifecycle. Pauses became predictable. The broader point is that GC, long relegated to the domain of implementation detail, is now functioning as an architectural constraint with competitive implications. In latency-sensitive domains, it functions less like background maintenance and more like market infrastructure. Organizations that treat it accordingly will find themselves with a structural advantage. Those that don’t risk falling behind.

Across the enterprise software landscape, memory management is undergoing a quiet but significant reframing. Major cloud providers—AWS, Google Cloud, and Azure—are increasingly standardizing on managed runtimes like Java, .NET, and Go, embedding them deeply across their platforms. Kubernetes clusters now routinely launch thousands of containers, each with its own runtime environment and independent garbage collector running behind the scenes. At the same time, workloads are growing more demanding—spanning machine learning inference, real-time analytics, and distributed databases. These are no longer the relatively simple web applications of the early 2000s, but complex, large-scale systems that are allocation-heavy, latency-sensitive, and highly bursty. As a result, the old ‘set a heap size, pick a collector, move on’ mental model for GC tuning is breaking down under modern workloads. The market is beginning to demand more nuanced, adaptive approaches. In response, cloud vendors, consultancies, and open-source communities are actively exploring what modern memory management should look like at scale.

At its core, GC is an attempt to automate memory reclamation. It is the runtime’s mechanism for managing memory—cleaning up objects that are no longer in use. When memory is allocated for something like a trade order, a customer record, or a neural network layer, the GC eventually reclaims that space once it’s no longer needed. But the implementation is a compromise. In theory, this process is automatic and unobtrusive. In practice, it’s a delicate balancing act. The collector must determine when to run, how much memory to reclaim, and how to do so without significantly disrupting application performance. If it runs too frequently, it consumes valuable CPU resources. If it waits too long, applications can experience memory pressure and even out-of-memory errors. Traditional collection strategies—such as mark-and-sweep, generational, or copying collectors—each bring their own trade-offs. But today, much of the innovation is happening in newer collectors like G1, Shenandoah, ZGC, and Epsilon. These are purpose-built for scalability and low latency, targeting the kinds of workloads modern enterprises increasingly rely on. The challenge, however, is that these collectors are not truly plug-and-play. Their performance characteristics hinge on configuration details. Effective tuning often requires deep expertise and workload-specific knowledge—an area that’s quickly gaining attention as organizations push for more efficient and predictable performance at scale.

Take G1: the default garbage collector in modern Java. It follows a generational model, dividing the heap into young and old regions, but with a key innovation: it operates on fixed-size regions, allowing for incremental cleanup. The goal is to deliver predictable pause times—a crucial feature in enterprise environments where even a 500ms delay can have real financial impact. That said, G1 can be challenging to tune effectively. Engineers familiar with its inner workings know it offers a wide array of configuration options, each with meaningful trade-offs. Parameters like -XX:MaxGCPauseMillis allow developers to target specific latency thresholds, but aggressive settings can significantly reduce throughput. For instance, the JVM may shrink the heap or adjust survivor space sizes to meet pause goals, which can lead to increased GC frequency and higher allocation pressure. This often results in reduced throughput, especially under bursty or memory-intensive workloads. Achieving optimal performance typically requires balancing pause time targets with realistic expectations about allocation rates and heap sizing. Similarly, -XX:G1HeapRegionSize lets you adjust region granularity, but selecting an inappropriate value may lead to memory fragmentation or inefficient heap usage. Benchmark data from OpenJDK’s JMH suite, tested on a 64-core AWS Graviton3 instance, illustrates just how sensitive performance can be: in one workload, an untuned G1 configuration produced 95th-percentile GC pauses of around 300ms, while careful tuning of the same configuration reduced pauses significantly. The broader implication is clear: organizations with the expertise to deeply tune their runtimes unlock performance. Others leave it on the table.
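
One way teams explore this space is with a simple parameter sweep. The sketch below launches a benchmark jar (the name benchmark.jar is a placeholder) under different combinations of the G1 flags discussed above and keeps a GC log per run, which makes tuning decisions reproducible rather than anecdotal; it is a sketch of one possible workflow, not a prescription:

```python
import itertools
import subprocess

# -XX:+UseG1GC, -XX:MaxGCPauseMillis, -XX:G1HeapRegionSize, and -Xlog:gc* are
# standard HotSpot options; the workload jar and chosen values are illustrative.
pause_targets = [50, 200]          # target max pause, in milliseconds
region_sizes = ["8m", "16m"]       # G1 region size (must be a power of two)

for pause, region in itertools.product(pause_targets, region_sizes):
    gc_log = f"gc_{pause}ms_{region}.log"
    cmd = [
        "java",
        "-XX:+UseG1GC",
        f"-XX:MaxGCPauseMillis={pause}",
        f"-XX:G1HeapRegionSize={region}",
        f"-Xlog:gc*:file={gc_log}",    # unified GC logging (JDK 9+)
        "-jar", "benchmark.jar",
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)    # compare the resulting GC logs per setting
```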

Across the industry, runtime divergence is accelerating. .NET Core and Go are steadily gaining traction, particularly among cloud-native organizations. Each runtime brings its own approach to GC. The .NET CLR employs a generational collector with a server mode that strikes a good balance for throughput, but it tends to underperform in latency-sensitive environments. Go’s GC, on the other hand, is lightweight, concurrent, and optimized for low pause times—typically around 1ms or less under typical workloads. However, it can struggle with memory-intensive applications due to its conservative approach to memory reclamation. A brief experiment with a Go-based microservice simulating a payment gateway (10,000 requests per second and a 1GB heap) delivered 5ms pauses at the 99th percentile with default settings. Adjusting the GOMEMLIMIT setting to trigger more frequent cycles reduced pauses to 2ms, but at the cost of a 30% increase in memory usage (though results will vary depending on workload characteristics). As with many performance optimizations, the trade-offs are workload-dependent.

Contemporary workloads are more erratic. Modern systems stream events, cache large working sets, and process thousands of concurrent requests. The traditional enterprise mainstay (CRUD applications interacting with relational databases) is being replaced by event-driven systems, streaming pipelines, and in-memory data grids. Technologies like Apache Kafka are now ubiquitous, processing massive volumes of logs, while Redis and Hazelcast are caching petabytes of state. These modern systems generate objects at a rapid pace, with highly variable allocation patterns: short-lived events, long-lived caches, and everything in between. In one case, a logistics company running a fleet management platform on Kubernetes saw its Java pods struggling with full garbage collections every few hours, triggered by an influx of telemetry data. After switching to Shenandoah, Red Hat’s low-pause collector, they saw GC pauses drop from 1.2 seconds to just 50ms. However, the improvement came at a cost—CPU usage increased by 15%, and they needed to rebalance their cluster to prevent hotspots. This is becoming increasingly common: latency improvements now have architectural consequences.

Vendor strategies are also diverging. The major players—Oracle, Microsoft, and Google—are all aware that GC can be a pain point, though their approaches vary. Oracle is pushing ZGC in OpenJDK, a collector designed to deliver sub-millisecond pauses even on multi-terabyte heaps. It’s a compelling solution (benchmarks from Azul show it maintaining stable 0.5ms pauses on a 128GB heap under heavy load) but it can be somewhat finicky. It benefits from a modern kernel with huge pages enabled (it doesn’t require them, but performs better with them), and its reliance on concurrent compaction demands careful management to avoid excessive CPU usage. Microsoft’s .NET team has taken a more incremental approach, focusing on gradual improvements to the CLR’s garbage collector. While this strategy delivers steady progress, it lags behind the more radical redesigns seen in the Java ecosystem. Google’s Go runtime stands apart, with a GC built for simplicity and low-latency performance. It’s particularly popular with startups, though it can be challenging for enterprises with more complex memory management requirements. Meanwhile, niche players like Azul are carving out a unique space with custom JVMs. Their flagship product, Zing, combines ZGC-like performance (powered by Azul’s proprietary C4 collector, comparable to ZGC in pause times) with advanced diagnostics that many describe as exceptionally powerful. Azul’s “we tune it for you” value proposition seems to be resonating—their revenue grew over 95% over the past three years, according to their filings.

Consultancies are responding as well. The Big Four—Deloitte, PwC, EY, and KPMG—are increasingly building out teams with runtime expertise and now include GC tuning in digital transformation playbooks. Industry case studies illustrate the tangible benefits: one telco reportedly reduced its cloud spend by 20% by fine-tuning G1 across hundreds of nodes, while a major retailer improved checkout latency by 100ms after migrating to Shenandoah. Smaller, more technically focused firms like ThoughtWorks are taking an even deeper approach, offering specialized profiling tools and tailored workshops for engineering teams. So runtime behavior is no longer a backend concern—it’s a P&L lever.

The open-source ecosystem plays a vital dual role, fueling GC innovation while introducing complexity through fragmented tooling. Many of today’s leading collectors, such as Shenandoah, ZGC, and G1, emerged from community-driven open-source research efforts before becoming production-ready. However, a capability gap persists: tooling exists, but expertise is required to extract value from it. Utilities like VisualVM and Eclipse MAT provide valuable insights—heap dumps, allocation trends, and pause time metrics—but making sense of that data often requires significant experience and intuition. In one example, a 10GB heap dump from a synthetic workload revealed a memory leak caused by a misconfigured thread pool. While the tools surfaced the right signals, diagnosing and resolving the issue ultimately depended on hands-on expertise. Emerging projects like GCViewer and OpenTelemetry’s JVM metrics are improving visibility, but most enterprises still face a gap between data and diagnosis that’s increasingly monetized. For enterprises seeking turnkey solutions, the current open-source tooling often falls short. As a result, vendors and consultancies are stepping in to fill the gap—offering more polished, supported options, often at a premium.

One emerging trend worth watching: no-GC runtimes. Epsilon, a no-op collector available in OpenJDK, effectively disables garbage collection, allocating memory until exhaustion. While this approach is highly specialized, it has found a niche in environments where ultra-low latency is paramount, where teams leverage it for short-lived, high-throughput workloads in which every microsecond counts. It’s a tactical tool: no GC means no pauses, but also no safety net. In a simple benchmark of allocating 100 million objects on a 1GB heap, Epsilon delivered about 20% higher throughput than G1—in a synthetic, allocation-heavy workload designed to avoid GC interruptions—with no GC pauses until the heap was fully consumed. That said, this approach demands precise memory sizing, and since Epsilon does not actually perform GC, the JVM shuts down when the heap is exhausted. In systems that handle large volumes of data and require high reliability, this behavior poses a significant risk: running out of memory could lead to crashes during critical operations, making it unsuitable for environments that demand continuous uptime and stability.

Rust represents a divergence in runtime philosophy: its ownership model frontloads complexity in exchange for execution-time determinism. Its ownership model eliminates the need for garbage collection entirely, giving developers fine-grained control over memory. It’s gaining popularity in systems programming, though enterprise adoption remains slow—retraining teams accustomed to Java or .NET is often a multi-year effort. Still, these developments are prompting a quiet reevaluation in some corners of the industry. Perhaps the challenge isn’t just tuning GC, it’s rethinking whether we need it at all in certain contexts.

Directionally, GC is now part of the performance stack, not a postscript. The enterprise software market appears to be at an inflection point. Driven by AI workloads, latency and throughput are no longer differentiators; there’s a growing shift toward predictable performance and manual memory control. In this landscape, GC is emerging as a more visible and persistent bottleneck. Organizations that invest in performance, whether through specialized talent, intelligent tooling, or strategic vendor partnerships, stand to gain a meaningful advantage. Cloud providers will continue refining their managed runtimes with smarter defaults, but the biggest performance gains will likely come from deeper customization. Consultancies are expected to expand GC optimization as a service offering, and we’ll likely see more specialized vendors like Azul carving out space at the edges. Open-source innovation will remain strong, though the gap between powerful raw tools and enterprise-ready solutions may continue to grow. And in the background, there may be a gradual shift toward no-GC alternatives as workloads evolve in complexity and scale. Hardware changes (e.g., AWS Graviton) amplify the pressure: higher parallelism means more cores, more live objects, and more stress on memory management. Ultimately, managed runtimes will improve, but improvements will mostly serve the median case. High-performance outliers will remain underserved—fertile ground for optimization vendors and open-source innovation.

For now, GC tuning doesn’t make headlines, but it does shape the systems that do as it increasingly defines the boundary between efficient, scalable systems and costly, brittle ones. The organizations that master memory will move faster, spend less, and scale cleaner. Those that don’t may find themselves playing catch-up—wondering why performance lags and operational expenses continue to climb. GC isn’t a solved problem. It’s a leverage point—in a market this dynamic, even subtle shifts in infrastructure performance can have a meaningful impact over time.

Specialization and Modularity in AI Architecture with Multi-Agent Systems

The evolution from monolithic large language models (mono-LLMs) to multi-agent systems (MAS) reflects a practical shift in how AI can be structured to address the complexity of real-world tasks. Mono-LLMs, while impressive in their ability to process vast amounts of information, have inherent limitations when applied to dynamic environments like enterprise operations. They are inefficient for specialized tasks, requiring significant resources for even simple queries, and can be cumbersome to update and scale. Mono-LLMs are difficult to scale because every improvement impacts the entire system, leading to complex update cycles and reduced agility. Multi-agent systems, on the other hand, introduce a more modular and task-specific approach, enabling specialized agents to handle discrete problems with greater efficiency and adaptability.

This modularity is particularly valuable in enterprise settings, where the range of tasks—data analysis, decision support, workflow automation—requires diverse expertise. Multi-agent systems make it possible to deploy agents with specific capabilities, such as generating code, providing real-time insights, or managing system resources. For example, a compiler agent in an MAS setup is not just responsible for executing code but also participates in optimizing the process. By incorporating real-time feedback, the compiler can adapt its execution strategies, correct errors, and fine-tune outputs based on the context of the task. This is especially useful for software teams working on rapidly evolving projects, where the ability to test, debug, and iterate efficiently can translate directly into faster product cycles.

Feedback systems are another critical component of MAS, enabling these systems to adapt on the fly. In traditional setups, feedback loops are often reactive—errors are identified post hoc, and adjustments are made later. MAS integrate feedback as part of their operational core, allowing agents to refine their behavior in real-time. This capability is particularly useful in scenarios where decisions must be made quickly and with incomplete information, such as supply chain logistics or financial forecasting. By learning from each interaction, agents improve their accuracy and relevance, making them more effective collaborators in decision-making processes.

Memory management is where MAS ultimately demonstrate practical improvements. Instead of relying on static memory allocation, which can lead to inefficiencies in resource use, MAS employ predictive memory strategies. These strategies allow agents to anticipate their memory needs based on past behavior and current workloads, ensuring that resources are allocated efficiently. For enterprises, this means systems that can handle complex, data-heavy tasks without bottlenecks or delays, whether it’s processing customer data or running simulations for product design.

Collaboration among agents is central to the success of MAS. Inter-agent learning protocols facilitate this by creating standardized ways for agents to share knowledge and insights. For instance, a code-generation agent might identify a useful pattern during its operations and share it with a related testing agent, which could then use that information to improve its validation process. This kind of knowledge-sharing reduces redundancy and accelerates problem-solving, making the entire system more efficient. Additionally, intelligent cleanup mechanisms ensure that obsolete or redundant data is eliminated without disrupting ongoing operations, balancing resource utilization and system stability. Advanced memory management thus becomes a cornerstone of the MAS architecture, enabling the system to scale efficiently while maintaining responsiveness. It also makes MAS particularly well-suited for environments where cross-functional tasks are the norm, such as coordinating between sales, operations, and customer service in a large organization.
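
A toy version of such a knowledge-sharing protocol might look like the following, where the topic name, agents, and transport are illustrative stand-ins for whatever message queue or shared store a real MAS would use:

```python
from collections import defaultdict
from typing import Callable

class KnowledgeBus:
    """Minimal publish/subscribe bus: agents publish reusable findings to a
    topic, and subscribed agents incorporate them into their own behavior."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]):
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, finding: dict):
        for handler in self.subscribers[topic]:
            handler(finding)

bus = KnowledgeBus()

# The testing agent learns patterns the code-generation agent discovers.
test_heuristics = []
bus.subscribe("codegen.patterns", lambda finding: test_heuristics.append(finding))

# The code-generation agent shares a pattern it found useful during its run.
bus.publish("codegen.patterns", {"pattern": "retry-with-backoff", "context": "network calls"})
print(test_heuristics)
```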

The infrastructure supporting MAS is designed to make these systems practical for enterprise use. Agent authentication mechanisms ensure that only authorized agents interact within the system, reducing security risks. Integration platforms enable seamless connections between agents and external tools, such as APIs or third-party services, while specialized runtime environments optimize the performance of AI-generated code. In practice, these features mean enterprises can deploy MAS without requiring a complete overhaul of their existing tech stack, making adoption more feasible and less disruptive.

Consider a retail operation looking to improve its supply chain. With MAS, the system could deploy agents to predict demand fluctuations, optimize inventory levels, and automate vendor negotiations, all while sharing data across the network to ensure alignment. Similarly, in a software development context, MAS can streamline workflows by coordinating code generation, debugging, and deployment, allowing teams to focus on strategic decisions rather than repetitive tasks.

What makes MAS particularly compelling is their ability to evolve alongside the organizations they serve. As new challenges emerge, agents can be updated or added without disrupting the entire system. This modularity makes MAS a practical solution for enterprises navigating the rapid pace of technological change. By focusing on specific, well-defined tasks and integrating seamlessly with existing workflows, MAS provide a scalable, adaptable framework that supports real-world operations.

This shift to multi-agent systems is not about replacing existing tools but enhancing them. By breaking down complex problems into manageable pieces and assigning them to specialized agents, MAS make it easier for enterprises to tackle their most pressing challenges. These systems are built to integrate, adapt, and grow, making them a practical and valuable addition to the toolkit of modern organizations.

Adopting Function-as-a-Service (FaaS) for AI workflows

Function-as-a-Service (FaaS) stands at the crossroads of cloud computing innovation and the evolving needs of modern application development. It isn’t just an incremental improvement over existing paradigms; it is an entirely new mode of thinking about computation, resources, and scale. In a world where technology continues to demand agility and abstraction, FaaS offers a lens to rethink how software operates in a fundamentally event-driven, modular, and reactive manner.

At its essence, FaaS enables developers to execute isolated, stateless functions without concern for the underlying infrastructure. The abstraction here is not superficial but structural. Traditional cloud models like Infrastructure-as-a-Service (IaaS) or even Platform-as-a-Service (PaaS) hinge on predefined notions of persistence—instances, containers, or platforms that remain idle, waiting for tasks. FaaS discards this legacy. Instead, computation occurs as a series of discrete events, each consuming resources only for the moment it executes. This operational principle aligns deeply with the physics of computation itself: using resources only when causally necessary.
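
In code, that discreteness looks like a stateless handler invoked per event; the (event, context) signature mirrors common platforms such as AWS Lambda, though the event shape here is invented for illustration:

```python
import json

def handler(event: dict, context: object) -> dict:
    """A stateless, event-driven function in the FaaS style: it keeps no state
    between calls and consumes resources only while it runs."""
    order = event.get("order", {})
    total = sum(item["qty"] * item["price"] for item in order.get("items", []))
    return {"statusCode": 200, "body": json.dumps({"order_total": round(total, 2)})}

# Local invocation for testing; in production the platform delivers the event.
print(handler({"order": {"items": [{"qty": 2, "price": 9.99}]}}, None))
```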

To fully grasp the implications of FaaS, consider its architecture. The foundational layer is virtualization, which isolates individual functions. Historically, the field has relied on virtualization techniques like hypervisors and container orchestration to allocate resources effectively. FaaS narrows this focus further. Lightweight microVMs and unikernels are emerging as dominant trends, optimized to ensure rapid cold starts and reduced resource overhead. However, this comes at a cost: such architectures often sacrifice flexibility, requiring developers to operate within tightly controlled parameters of execution.

Above this virtualization layer is the encapsulation layer, which transforms FaaS into something that developers can tangibly work with. The challenge here is not merely technical but conceptual. Cold starts—delays caused by initializing environments from scratch—represent a fundamental bottleneck. Various techniques, such as checkpointing, prewarming, and even speculative execution, seek to address this issue. Yet, each of these solutions introduces trade-offs. Speculative prewarming may solve latency for a subset of tasks but at the cost of wasted compute. This tension exemplifies the core dynamism of FaaS: every abstraction must be balanced against the inescapable physics of finite resources.
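
One widely used mitigation is to pay initialization costs once per execution environment rather than once per request, by doing heavy setup at module scope; the sketch below fakes the expensive step with a sleep:

```python
import time

# Module-level initialization runs once, during the cold start of a container.
# Warm invocations reuse _MODEL, so the expensive setup cost is amortized.
def _load_model():
    time.sleep(1.0)                    # stand-in for loading weights or opening connections
    return {"threshold": 0.8}

_MODEL = _load_model()                 # paid once per execution environment

def handler(event: dict, context: object) -> dict:
    # Only cheap, per-request work happens inside the handler itself.
    score = event.get("score", 0.0)
    return {"anomalous": score > _MODEL["threshold"]}

print(handler({"score": 0.93}, None))
```

This does not eliminate cold starts, of course; it only ensures their cost is not repeated on every warm invocation.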

The orchestration layer introduces complexity. Once a simple scheduling problem, orchestration in FaaS becomes a fluid, real-time process of managing unpredictable workloads. Tasks do not arrive sequentially but chaotically, each demanding isolated execution while being part of larger workflows. Systems like Kubernetes, originally built for containers, are evolving to handle this flux. In FaaS, orchestration must not only schedule tasks efficiently but also anticipate failure modes and latency spikes that could disrupt downstream systems. This is particularly critical for AI applications, where real-time responsiveness often defines the product’s value.

The final piece of the puzzle is the coordination layer, where FaaS bridges with Backend-as-a-Service (BaaS) components. Here, stateless functions are augmented with stateful abstractions—databases, message queues, storage layers. This synthesis enables FaaS to transcend its stateless nature, allowing developers to compose complex workflows. However, this dependency on external systems introduces fragility. Latency and failure are not isolated to the function execution itself but ripple across the entire ecosystem. This creates a fascinating systems-level challenge: how to design architectures that are both modular and resilient under stress.

What makes FaaS particularly significant is its impact on enterprise AI development. The state of AI today demands systems that are elastic, cost-efficient, and capable of real-time decision-making. FaaS fits naturally into this paradigm. Training a machine learning model may remain the domain of large-scale, distributed clusters, but serving inferences is a different challenge altogether. With FaaS, inference pipelines can scale dynamically, handling sporadic spikes in demand without pre-provisioning costly infrastructure. This elasticity fundamentally changes the economics of deploying AI systems, particularly in industries where demand patterns are unpredictable.

Cost is another dimension where FaaS aligns with the economics of AI. The pay-as-you-go billing model eliminates the sunk cost of idle compute. Consider a fraud detection system in finance: the model is invoked only when a transaction occurs. Under traditional models, the infrastructure to handle such transactions would remain operational regardless of workload. FaaS eliminates this inefficiency, ensuring that resources are consumed strictly in proportion to demand. However, this efficiency can sometimes obscure the complexities of cost prediction. Variability in workload execution times or dependency latencies can lead to unexpected billing spikes, a challenge enterprises are still learning to navigate.

Timeouts also impose a hard ceiling on execution in most FaaS environments, often measured in seconds or minutes. For many AI tasks—especially inference pipelines processing large inputs or models requiring nontrivial preprocessing—these limits can become a structural constraint rather than a simple runtime edge case. Timeouts force developers to split logic across multiple functions, offload parts of computation to external services, or preemptively trim the complexity of their models. These are engineering compromises driven not by the shape of the problem, but by the shape of the platform.
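
A common workaround is to shard the work across invocations and checkpoint progress externally. In the sketch below the checkpoint lives in a local temp file purely for illustration; a real deployment would use S3, DynamoDB, or a step-function-style orchestrator to carry the cursor between calls:

```python
import json
import pathlib

STATE_FILE = pathlib.Path("/tmp/cursor.json")   # stand-in for external checkpoint storage
BATCH = 1000                                    # sized so one batch fits inside the timeout

def handler(event: dict, context: object) -> dict:
    """Process one batch per invocation, persist the cursor, and report whether
    another invocation is still needed (e.g., triggered via a queue)."""
    records = event["records"]
    cursor = json.loads(STATE_FILE.read_text())["cursor"] if STATE_FILE.exists() else 0

    chunk = records[cursor: cursor + BATCH]
    processed = [r * 2 for r in chunk]          # placeholder for real per-record work

    cursor += len(chunk)
    STATE_FILE.write_text(json.dumps({"cursor": cursor}))
    return {"processed": len(processed), "done": cursor >= len(records)}

print(handler({"records": list(range(2500))}, None))   # invoke repeatedly until done
```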

Perhaps the most profound impact of FaaS on AI is its ability to reduce cognitive overhead for developers. By abstracting infrastructure management, FaaS enables teams to iterate on ideas without being burdened by operational concerns. This freedom is particularly valuable in AI, where rapid experimentation often leads to breakthroughs. Deploying a sentiment analysis model or an anomaly detection system no longer requires provisioning servers, configuring environments, or maintaining uptime. Instead, developers can focus purely on refining their models and algorithms.

But the story of FaaS is not without challenges. The reliance on statelessness, while simplifying scaling, introduces new complexities in state management. AI applications often require shared state, whether in the form of session data, user context, or intermediate results. Externalizing this state to distributed storage or databases adds latency and fragility. While innovations in distributed caching and event-driven state reconciliation offer partial solutions, they remain imperfect. The dream of a truly stateful FaaS model—one that maintains the benefits of statelessness while enabling efficient state sharing—remains an open research frontier.

Cold start latency is another unsolved problem. AI systems that rely on real-time inference cannot tolerate delays introduced by environment initialization. For example, a voice assistant processing user queries needs to respond instantly; any delay breaks the illusion of interactivity. Techniques like prewarming instances or relying on lightweight runtime environments mitigate this issue but cannot eliminate it entirely. The physics of computation imposes hard limits on how quickly environments can be instantiated, particularly when security isolation is required.

Vendor lock-in is a systemic issue that pervades FaaS adoption: each cloud provider currently builds proprietary abstractions, tying developers to specific APIs, runtimes, and pricing models. While open-source projects like Knative and OpenFaaS aim to create portable alternatives, they struggle to match the integration depth and ecosystem maturity of their commercial counterparts. This tension between portability and convenience is a manifestation of the broader dynamics in cloud computing.

Looking ahead, the future of FaaS, I believe, will be defined by its integration with edge computing. As computation migrates closer to the source of data generation, the principles of FaaS—modularity, event-driven execution, ephemeral state—become increasingly relevant. AI models deployed on edge devices, from autonomous vehicles to smart cameras, will rely on FaaS-like paradigms to manage local inference tasks. This shift will not only redefine the boundaries of FaaS but also force the development of new orchestration and coordination mechanisms capable of operating in highly distributed environments.

In reflecting on FaaS, one cannot ignore its broader almost philosophical implications. At its heart, FaaS is an argument about the nature of computation: that it is not a continuous resource to be managed but a series of discrete events to be orchestrated. This shift reframes the role of software itself, not as a persistent entity but as a dynamic, ephemeral phenomenon.

Legal Personhood for Artificial Intelligences

They kept hooking hardware into him – decision-action boxes to let him boss other computers, bank on bank of additional memories, more banks of associational neural nets, another tubful of twelve-digit random numbers, a greatly augmented temporary memory. Human brain has around ten-to-the-tenth neurons. By third year Mike had better than one and a half times that number of neuristors. And woke up.

― The Moon is a Harsh Mistress, Robert A. Heinlein


Following Google I/O, Google's annual developer conference, where the company revealed its roadmap for highly intelligent conversational AI and a bot-powered platform, it seems inevitable that we will all have to imbibe the gospel of the automated life, as artificial intelligence redefines how we interact with present and future technology tools by automating things in a new way. One of the steps into that life is trying to unify the scope of current technological advancements into a coherent framework of thought by exploring how current law applies to different sets of legal rights regarding artificial intelligence.

Artificial intelligence may generally be defined as the intelligence possessed by machines or software used to operate machines. It also encompasses the corresponding academic field of study, a branch of computer science. The basic premise of this field is that scientists can engineer intelligent agents that are capable of making accurate perceptions concerning their environment. These agents are then able to make correct actions based on these perceptions. The discipline of artificial intelligence explores the possibility of passing on traits that human beings possess as intelligent beings. These include knowledge, reasoning, the ability to learn and plan, perception, movement of objects and communication using language. As an academic field, it may be described as interdisciplinary, as it combines sciences such as mathematics, computer science, and neuroscience as well as professional studies such as linguistics, psychology and philosophy. Professionals involved in the development of artificial intelligence use different tools to get machines to simulate characteristics of intelligence only found in humans.

But artificial intelligence only follows the lead of the already omnipresent challenges and changes to existing legal frameworks. The twenty-first century is undoubtedly the age of information and technology. Exciting scientific breakthroughs continue to be made as innovators work to create better, more intelligent and more energy-efficient machines. Rapid development in information technology has posed challenges to several areas of law, both domestically and internationally. Many of these challenges have been discussed at length and continue to be addressed through reforms of existing laws.

The trend towards reform of law to keep up with the growth of technology can also be illustrated by observing the use of social media to generate content. As social media has continued to grow and influence the world, international media law has recognized citizen journalism. The traditional role of journalists has been to generate and disseminate information. As the world’s population has gained increased access to smart devices, ordinary people have been able to capture breaking stories that are then uploaded to the internet through several platforms. This has eroded the sharp distinction that previously existed between professional journalists and ordinary citizens, as the internet provides alternatives to traditional news media sources.

There are innumerable examples of other ways in which information technology has caused changes in the existing legislative structures. The law is naturally elastic, and can be expanded or amended to adapt to the new circumstances created by technological advancement. The continued development of artificial intelligence, however, may challenge the expansive character of the law because it presents an entirely novel situation. To begin with, artificial intelligence raises philosophical questions concerning the nature of the human mind. These philosophical questions are connected to the legal and ethical issues of creating machines programmed to possess qualities that are innate and unique to human beings. If machines can be built to behave like humans, then they must be accorded some form of legal personality, similar to that which humans have. At the very least, the law must make provision for the changes that advanced artificial intelligence will cause in society through the introduction of a new species capable of rational, logical thought. Deriving general guidelines from past case law should help lawmakers close the gap before a technological singularity arrives.

Legal personality endows its subjects with the capacity to have rights and obligations before the law. Without legal personality, there is no legal standing to conduct any binding transactions both domestically and internationally. Legal personality is divided into two categories. Human beings are regarded as natural or physical persons. The second category encompasses non-living legal subjects who are artificial but nonetheless treated as persons by the law. This is a fundamental concept in corporate law and international law. Corporations, states and international legal organizations are treated as persons before the law and are known as juridical persons. Without legal personality, there can be no basis upon which legal rights and duties can be established.

Natural persons have a wide array of rights that are recognized and protected by law. Civil and political rights protect an individual's freedoms of self-expression, assembly, access to information, property ownership and self-determination. Social and economic rights acknowledge the individual's fundamental needs to lead a dignified and productive life, including the right to education, healthcare, adequate food, decent housing and shelter. As artificial intelligence continues to develop, and smarter machines are produced, it may become necessary to grant these machines legal personality.

This may seem like far-fetched science fiction, but it is in fact closer to reality than the general population is aware. Computer scientists are at the front line of designing cutting-edge software and advanced robots that could revolutionize the way humans live. Just as Turing's machine accomplished feats during World War II that were impossible for human mathematicians, scientists and cryptologists, the robots of the future will be able to think and act autonomously. A positive implication of this increased capacity to produce artificial intelligence is the development of powerful machines that could solve many of the problems that continue to hinder human progress, such as disease, hunger, adverse weather and aging. The science of artificial intelligence would make it possible to program these machines to provide solutions to human problems, and their superior abilities would allow them to find those solutions within a short period of time instead of decades or centuries.

The current legal framework provides no underlying definition of what determines whether a given entity acquires legal rights, nor does its philosophical approach yet distinguish between strong and weak forms of artificial intelligence.

Weak artificial intelligence merely provides a tool for enhancing human technological abilities. A running application with artificial intelligence aspects, such as Siri, represents only a simulation of a cognitive process but does not constitute a cognitive process itself. Strong artificial intelligence, on the other hand, suggests that a software application could in principle be designed to become aware of itself, become intelligent, understand, perceive the world, and present cognitive states associated with the human mind.

The prospects for the development and use of artificial intelligence are exciting, but this narrative would be incomplete without mention of the possible dangers as well. Humans may retain some level of remote control, but the possibility that these created objects could rise to positions of dominance over human beings is a great concern. With the use of machines and the continual improvement of existing technology, some scientists are convinced that it is only a matter of time before artificial intelligence surpasses human intelligence.

Secondly, ethicists and philosophers have questioned whether it is sound to pass innate characteristics of human beings on to machines if this could ultimately mean that the human race becomes subject to those machines. Increased use of artificial intelligence may also dehumanize society, as functions previously carried out by people become mechanized. In the past, mechanization has resulted in the loss of jobs, since manpower is no longer required when machines can do the work. History shows that machines have helped humans make work easier, but their mere existence has never produced an idyllic existence.

Lastly, if this advanced software were to fall into the hands of criminals, terrorist organizations or states opposed to peace and non-violence, the consequences would be dire. Criminal organizations could expand dangerous networks across the world using technological tools. Machines could be trained to kill or maim victims. Criminals could remotely control machines to commit crimes in different geographical areas. Software could be programmed to steal sensitive private information and facilitate corporate espionage.

The "singularity” is a term that was first coined by Vernor Vinge to describe a theoretical situation where machines created by humans develop superior intelligence and end the era of human dominance that would be as intelligent or more intelligent that human mind, using the exponential growth of computing power, based on the law of accelerating returns, combined with human understanding of the complexity of the brain.

As highlighted earlier, strong artificial intelligence that matches or surpasses human intelligence has not yet been developed, although its development has been envisioned. Strong artificial intelligence is a prominent theme in many science fiction movies, probably because the notion of a supercomputer with the ability to outsmart humans is very interesting. In the meantime, before this science fiction dream can become a reality, weak artificial intelligence has slowly become a commonplace part of everyday life. Search engines and smartphone apps are the most common examples of weak artificial intelligence. These programs have simple designs and mimic limited aspects of human intelligence; Google, for instance, searches the web using key words or phrases entered by the user. The scenario of dominance by artificial intelligence seems a long way off from the current status quo. However, the launch of chatbots points toward the direction artificial intelligence will take in the near future, building on weak artificial intelligence.

Chatbots are the next link in the evolutionary chain of virtual personal assistants such as Siri. Siri is a shortened version of the Scandinavian name Sigrid, which means beauty or victory. It is a virtual personal assistant able to mimic human elements of interaction as it carries out its duties. The program is enabled with a speech function that allows it to reply to queries and take audio instructions, so the user does not need to type commands. Siri can decode a verbal message, understand the instructions given and act on them. It can provide information on request, send text messages, organize personal schedules, book appointments and take note of important meetings on behalf of its user. Another impressive feature is its ability to collect information about the user: as the user gives more instructions, Siri stores this information and uses it to refine the services it offers. The excitement that greeted Siri's successful launch in the mass market is easy to imagine. After Siri came the chatbots. Chatbots are a type of conversational agent, software designed to simulate an intelligent conversation with one or more human users via auditory or textual methods. The technology may be considered weak artificial intelligence, but the abilities demonstrated by these programs offer a glimpse into what the future holds for artificial intelligence development. For legal regulators, the features of virtual personal assistants demand that existing structures be reviewed to accommodate the novel circumstances their use has introduced. As more programs like Siri continue to be commercialized, these new legal grey areas will feature more often in mainstream debate. Intellectual property law and liability law will probably be the areas most affected by consumer uptake of chatbots.

Intellectual property law creates ownership rights for creators or inventors to protect their interests in the works they create. Copyright law in particular protects artistic creations by controlling the means by which they are distributed. Copyright owners are then able to use their artistic works to earn an income; anyone else who wants to exploit those works for profit or personal use must get authorization from the copyright owner. Persons who infringe copyright are liable to face civil suits, arrest and fines. In the case of chatbots, the ownership of the sounds produced by the program has not been clearly defined. It is quite likely that in the near future these sounds will become a lucrative form of creative work, and when that happens it will be imperative that the law defines who owns them. Users can employ a chatbot's features to mix different sounds, including works protected by copyright, to come up with new sounds. Here the law is unclear on whether such content would be considered new content or attributed to the original producers of the sound.

Another important question that will have to be addressed is the issue of ownership among the creators of artificial intelligence programs, the users of such programs and those who utilize the output the programs produce. A case could be made that the creators of the program are the original authors and are entitled to copyright the works produced using it. As artificial intelligence gains popularity in society and more people have access to machines and programs like Siri, it is inevitable that conflicts of ownership will arise as different people battle to be recognized as the owner of the works produced. From the perspective of intellectual property, artificial intelligence cannot be left in the public domain. Given its innate value and its capacity to generate new content, ownership wrangles are certain, and the law therefore needs to provide clarity and guidance on who has the right to claim ownership.

Law enforcement agents must constantly innovate in order to successfully investigate crime. Although the internet has made it easier to commit certain crimes, programs such as 'Sweetie', an avatar run by the charity Terre des Hommes in the Netherlands, illustrate how artificial intelligence can help to solve crime. The Sweetie avatar was developed by the charity to help investigate sex tourists who target children online. The offenders in such crimes engage in sexual acts with children from developing countries. The children are lured into the illicit practice with promises that they will be paid for their participation. After making contact and confirming that the children are indeed underage, the offenders request the children to perform sexual acts in front of cameras. The offenders may also perform sexual acts and request that the children view them.

The offenders prey on vulnerable children who often come from poor developing countries. The children are physically and mentally exploited to gratify offenders from wealthy Western countries. In October 2014, the Sweetie avatar project secured its first successful conviction of a sex predator. The man, an Australian national named Scott Robert Hansen, admitted that he had sent nude images of himself performing obscene acts to Sweetie. Hansen also pleaded guilty to possession of child pornography. Both of these offenses violated previous orders issued against him as a repeat sexual offender. Sweetie is an application able to mimic the movements of a real ten-year-old girl. The 3D model is very lifelike, and the application allows for natural interactions such as typing during chats and nodding in response to questions or comments. It also makes it possible for the operator to move the 3D model from side to side in its seat. Hansen fell for the ploy and believed that Sweetie was a real child.

According to the court, it was immaterial that Sweetie did not exist. Hansen was guilty because he believed she was a real child and his intention was to perform obscene acts in front of her. Although Hansen was the only person convicted as a result of the Terre des Hommes project, researchers working on it patrolled the internet for ten weeks. In that time, thousands of men got in touch with Sweetie. Terre des Hommes compiled a list of one thousand suspects, which was handed over to Interpol and state police agencies for further investigation. The Sweetie project illustrates that artificial intelligence can be utilized to investigate difficult crimes such as sex tourism. The biggest benefit of the project was that it created a very convincing avatar and removed the need to use real people in the undercover operation. In addition, the project collected evidence through a form of artificial intelligence that was very difficult to contradict. Thus, in a way, artificial intelligence provided grounds for challenging the already existing legal rights of the accused.

Presently the law provides different standards of liability for those who break it. In criminal law, a person is liable for criminal activity if it is shown that they had both a guilty mind (the settled intent to commit a crime) and performed the guilty act in line with that intent. In civil cases, liability for wrongdoing can be reduced based on mitigating factors such as the contributory negligence of the other party. There is currently no explicit provision in law that allows defendants to escape liability by claiming that they relied on incorrect advice from an intelligent machine. However, with increased reliance on artificial intelligence to guide basic daily tasks, the law will eventually have to address this question. If a user of artificial intelligence software makes a mistake while acting on information from the software, they may suffer losses or damages arising from that mistake. In such cases the developers of the software may be required to compensate the user or incur liability for the consequences of their software's failure. If machines can be built with the ability to make critical decisions, it is important to have a clear idea of who will be held accountable for the actions of the machine.

Autonomous driverless cars are an interesting early example of where such decisions will have to be made. Florida, Nevada, Michigan and the District of Columbia have passed laws allowing autonomous cars to drive on their streets in some capacity. How autonomous cars might change liability and ethical rights turns on the ethical settings of the software controlling them, for instance whether a self-driving vehicle should prioritize human lives over financial or property loss. Numerous ethical dilemmas could arise, such as an autonomous car having to choose between saving its passengers and saving a child's life. Lawmakers, regulators and standards organizations should develop concise legal principles for addressing such ethical questions, beginning by defining the liable entity.

Turing, one of the fathers of modern computer science and artificial intelligence, envisioned a world in which machines could be designed to think independently and solve problems. Modern scientists still share Turing's vision, and it inspires countless mathematicians and developers around the world to continue designing software applications with ever greater capabilities. The scientific community, and society at large, hold several positive expectations concerning artificial intelligence and the potential benefits humankind could reap from its development. Intelligent machines have the potential to make our daily lives easier and to unlock mysteries that cannot be solved by human ingenuity alone. They also have the potential to end the dominance of human beings on this planet. The need to reform the law with regard to artificial intelligence is apparent. As the world heads into the next scientific era with both excitement and fear, the law must find a way to adjust to the new circumstances created by machines that can think. As we involve artificial intelligence more in our lives and learn about its legal implications, changes will undoubtedly need to be made.

Read More

Patents in an era of artificial intelligence

The fuzziness of software patents’ boundaries has already turned the ICT industry into one colossal turf war. The expanding reach of IP has introduced more and more possibilities for opportunistic litigation (suing to make a buck). In the US, two-thirds of all patent lawsuits are currently over software, with 2015 seeing more patent lawsuits filed than any year before.

“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”

― George Bernard Shaw

In just the last month, headlines about the future of artificial intelligence (AI) dominated technology news across the globe:

  • On 15 November, OpenAI, a research company in San Francisco, California, co-founded by entrepreneur Elon Musk, announced a partnership with Microsoft to start running most of its large-scale experiments on Microsoft’s cloud computing platform, Azure;

  • Two weeks later, Comma.ai open sourced its AI driver assistance system and robotics research platform;

  • On 3 December, DeepMind, a unit of Google headquartered in London, opened up its own 3D virtual world, DeepMind Lab, for download and customization by outside developers;

  • Two days later, OpenAI released a ‘meta-platform’ that enables AI programs to easily interact with dozens of 3D games originally designed for humans, as well as with some web browsers and smartphone apps;

  • A day later, in a keynote at the annual Neural Information Processing Systems conference (NIPS), Russ Salakhutdinov, director of AI research at Apple, announced that Apple’s machine learning team would both publish its research and engage with academia;

  • And on 10 December, Facebook announced that it would open-source its AI hardware design, Big Sur.

What’s going on here? In the AI field, maybe more than in any other, research thrives on open collaboration—AI researchers routinely attend industry conferences, publish papers, and contribute to open-source projects with mission statements geared toward the safe and careful joint development of machine intelligence. There is no doubt that AI will radically transform our society, having the same level of impact as the Internet has had since the nineties. And it has got me thinking: with AI becoming cheaper, more powerful and ever more pervasive, with the potential to recast our economy, education, communication, transportation, security and healthcare from top to bottom, it is of the utmost importance that it (both software and hardware) not be hindered by the very innovation establishment that was designed to promote it.

System glitch

Our ideas are meant to be shared—in the past, the works of Shakespeare, Rembrandt and Gutenberg could be openly copied and built upon. But the growing dominance of the market economy, where the products of our intellectual labors can be acquired, transferred and sold, produced a side-effect: a system glitch. Because of development costs (the cost of actually inventing a new technology), the price of unprotected original products is simply higher than the price of their copies. The introduction of patent law (to protect inventions) and copyright law (to protect media) was intended to address this imbalance. Both aimed to encourage the creation and proliferation of new ideas by providing a brief and limited period during which no one else could copy your work. This gave creators a window of opportunity to break even on their investments and potentially make a profit, after which their work entered the public domain, where it could be openly copied and built upon. This was the inception of the open innovation cycle—a vast, accessible, distributed network of ideas, products, arts and entertainment, open to all as the common good. The influence of the market transformed this principle into the belief that ideas are a form of property, and that conviction subsequently yielded a new term: “intellectual property” (IP).

Loss aversion

“People’s tendency to prefer avoiding losses to acquiring equivalent gains”: it’s better not to lose $10 than to find $10, and we hate losing what we’ve got. Applying this principle to intellectual property: we believe that ideas are property; the gains we make from copying the ideas of others leave little impression on us, but when it’s our ideas being copied, we perceive it as a loss of property and get (excessively) territorial. Most of us have no problem with copying (as long as we’re the ones doing it). When we copy, we justify it; when others copy, we vilify it. So, with a blind eye toward our own mimicry and propelled by faith in markets and ultimate ownership, IP swelled beyond its original intent through broader interpretations of existing laws, new legislation, new realms of coverage and alluring rewards. Starting in the late nineties, a series of new copyright laws and regulations took shape in the US (the NET Act of 1997, the DMCA of 1998, the PRO-IP Act of 2008, the Enforcement of Intellectual Property Rights Act of 2008), and many more are in the works (SOPA, the PROTECT IP Act, the Innovative Design Protection and Piracy Prevention Act, the CAS “Six Strikes” program). In Europe, there are currently 179 different sets of laws, implementing rules and regulations, geographical indications, treaty approvals, legal literature, IP jurisprudence documents, administered treaties and treaty memberships.

In the patent domain, coverage driven by this loss aversion made the leap from physical inventions to virtual ones, most notably software.

Rundown of computing history

The first computers were machines of cogs and gears, and computing became practical only in the 1950s and 60s with the invention of semiconductors. Forty years ago, (mainframe-based) IBM emerged as the industry forerunner. Thirty years ago, (client-server-based) Microsoft leapfrogged it and gave ordinary people computing utility tools, such as word processing. As computing became more personal and the World Wide Web turned Internet URLs into website names that people could access, (internet-based) Google offered the ultimate personal service, a free gateway to the infinite data web, and became the new computing leader. Ten years ago, (social-computing) Facebook morphed computing into a social medium and a personal identity tool. Today, (conversational-computing) Snap challenges Facebook as-Facebook-challenged-Google-as-Google-challenged-Microsoft-as-Microsoft-challenged-IBM-as-IBM-challenged-cogs-and-gears.

History of software patenting

Most people in the software patent debate are familiar with Apple v. Samsung, Oracle v. Google and its open-source arguments, and so on, but many are not familiar with the name Martin Goetz. Martin Goetz received the first software patent in 1968, for a data organizing program his small company wished to sell for use on IBM machines. At the time, IBM offered all of its software as part of the computers it sold. This gave any competitor in the software space a difficult starting point: they either had to offer their own hardware (HP had produced its first computer just two years earlier) or convince people to buy software to replace the free software that came with IBM computers.

Martin Goetz led a small software company and did not want IBM to take his technological improvements and use them in IBM's bundled programs without reimbursement, so he sought a software patent. Thus, in 1968, the first software patent was issued to a small company, to help it compete against the largest computer company of the time. Although they had a patent to protect their IP, Goetz's company still had a difficult time competing in a market dominated by IBM, so they joined the US Justice Department's antitrust suit against IBM, which forced IBM to unbundle its software suite from its hardware.

So the software industry began in 1969, with the unbundling of software by IBM and others. Consumers had previously regarded application and utility programs as cost-free because they were bundled with the hardware. With unbundling, competing software products could be put on the market because such programs were no longer included in the price of the hardware. Almost immediately, an independent software industry emerged. At the same time, it quickly became evident that some type of protection would be needed for this new form of intellectual property.

Unfortunately, neither copyright law nor patent law seemed ready to take on this curious hybrid of creative expression and functional utility. During the 1970s, there was total confusion as to how to protect software from piracy. A few copyrights were issued by the Copyright Office, but most were rejected. A few software patents were granted by the PTO, but most patent applications for software-related inventions were rejected. The worst effect for the new industry was the uncertainty as to how this asset could be protected. Finally, in 1980, after an extensive review by the National Commission on New Technological Uses of Copyrighted Works (CONTU), Congress amended the Copyright Act of 1976 to cover software. It took a number of important cases to resolve most of the remaining issues in copyright law, and some issues, such as the so-called “look and feel”, are still being litigated, but this area of the law now appears to be quite well understood. For patents, it took a 1981 Supreme Court decision, Diamond v. Diehr, to bring software into the mainstream of patent law. That decision ruled that the presence of software in an otherwise patentable technology did not make the invention unpatentable. Diamond v. Diehr opened the door for a flood of software-related patent applications. Unfortunately, the PTO was not prepared for this development, and in the intervening years it has issued thousands of patents that the software industry regards as questionable. It took a few years after 1981 for the flow of software-related applications to increase, and there was then some delay in processing them. Now the number of infringement cases is on the rise.

The transition from physical patents to virtual patents was not a natural one. At its core, a patent is a blueprint for how to recreate an invention, while (the majority of) software patents are more like a loose description of what something would look like if it were actually invented. And software patents are written in the broadest possible language to get the broadest possible protection; the vagueness of these terms can sometimes reach absurd levels, for example “information manufacturing machine”, which covers anything computer-like, or “material object”, which covers… pretty much everything.

What now?

35 U.S.C. 101 reads as follows: 

“Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.”

When considering subject matter eligibility under 35 U.S.C. 101, it must first be determined whether the technology is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter. Ever since software became widespread and commercially valuable, it has been highly difficult to classify it within a specific category of intellectual property protection.

In the field of software technology, attempts are usually made to combine methods or means used in different fields, or to apply them to another field, in order to achieve an intended effect. Consequently, combining technologies used in different fields and applying them to another field is usually considered within the ordinary creative activity of a person skilled in the art, so that when there is no technical difficulty (no technical blocking factor) to such a combination or application, an inventive step is not affirmatively inferred unless special circumstances exist, such as remarkably advantageous effects. Software is not a monolithic work: it possesses a number of elements that can fall within different categories of intellectual property protection.

In Israel, legal doctrines adapt to changes in innovative technological products and the commercial methods that extend this innovation to the marketplace. The decision issued by the Israeli Patent Registrar in the matter of Digital Layers Inc confirms the patentability of software-related inventions. The Registrar ruled that the claimed invention should be examined as a whole and not by its components, basing his ruling on the recent matter of HTC Europe Co Ltd v. Apple Inc, quoting: 

"…It causes the device to operate in a new and improved way and it presents an improved interface to application software writers. Now it is fair to say that this solution is embodied in software but, as I have explained, an invention which is patentable in accordance with conventional patentable criteria does not become unpatentable because a computer program is used to implement it…"

After Alice Corp. v. CLS Bank International, if the technology does fall within one of the categories, it must then be determined whether it is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea), and if so, it must additionally be determined whether the technology is a patent-eligible application of the exception. If an abstract idea is present in the technology, some element or combination of elements must be sufficient to ensure that the technology amounts to significantly more than the abstract idea itself. Examples of abstract ideas include fundamental economic practices (comparing new and stored information and using rules to identify options in SmartGene); certain methods of organizing human activities (managing a game of Bingo in Planet Bingo v. VKGS and a user interface for meal planning in DietGoal Innovations v. Bravo Media); an idea itself (storing and transmitting information in Cyberfone); and mathematical relationships or formulas (updating alarm limits using a mathematical formula in Parker v. Flook and a generalized formulation of a computer program to solve a mathematical problem in Gottschalk v. Benson). The technology cannot merely amount to applying, or instructing someone to apply, the abstract idea on a computer; such a claim is considered to amount to nothing more than requiring a generic computer system to carry out the abstract idea itself. Automating conventional activities using generic technology does not amount to an inventive concept, as this simply describes “automation of a mathematical formula/relationship through use of generic computer function” (OIP Technologies v. Amazon). Procedures that merely use an existing general-purpose computer do not purport to improve any other technology or technical field, or to improve the functioning of the computer itself, and do not move beyond a general link between the use of an abstract idea and a particular technological environment.

The Federal Circuit continues to refine patent eligibility for software

  • Following the Supreme Court’s decision in Alice v. CLS Bank, the court of appeals in Ultramercial v. Hulu reversed its prior decision and ruled that the claims were invalid under 35 U.S.C. § 101. Applying the two-step framework outlined in Alice, Judge Lourie concluded that the claims were directed to an abstract idea.

  • The Federal Circuit’s decision in Digitech Image Techs. v. Electronics for Imaging illustrated the difficulty many modern software-implemented inventions face. If a chemist were to invent a mixture of two ingredients that gives better gas mileage, it is hard to imagine that a claim to such a mixture would receive a § 101 rejection. Yet when two elements of data are admixed to produce improved computational results, the courts are quick to dismiss this as a patent-ineligible abstraction. The real problem Digitech faced was that both data elements were seen as abstractions: one data type represented color information (an abstraction) and the other represented spatial information (another abstraction).

  • DDR Holdings v. Hotels.com, a 2014 Federal Circuit decision, provides a good discussion of a patent-eligible Internet-centric technology. In applying the Mayo/Alice two-part test, the court admitted it can be difficult sometimes to distinguish “between claims that recite a patent-eligible invention and claims that add too little to a patent-ineligible abstract concept”.

  • Content Extraction v. Wells Fargo Bank gives a roadmap for how the Court of Appeals for the Federal Circuit will likely handle business method patents in the future. First, if manipulation of economic relations is deemed present, you can be sure that any innovative idea within the economic realm will be treated as part of the abstract idea. Essentially, no matter how clever an economic idea may be, it will be branded part of the abstract-idea problem, for which there is only one solution: having something else innovative that is not part of the economic idea. Practically speaking, this means the technology needs to incorporate an innovative technical improvement that makes the clever economic idea possible.

So the fuzziness of software patents’ boundaries has already turned the ICT industry into one colossal turf war. The expanding reach of IP has introduced more and more possibilities for opportunistic litigation (suing to make a buck). In the US, two-thirds of all patent lawsuits are currently over software, with 2015 seeing more patent lawsuits filed than any year before. Of the high-tech cases, more than 88% involved non-practicing entities (NPEs). These include two charmlessly evolving species whose entire business model is lawsuits—patent trolls and sample trolls. These are corporations that don’t actually create anything; they simply acquire a library of intellectual property rights and then litigate to earn profits (and because legal expenses run into the millions of dollars, their targets are usually highly motivated to settle out of court). Patent trolls are most common in the troubled realm of software. The estimated wealth loss in the US alone is $500,000,000,000 (that’s a lot of zeros).

Technology conversion and open innovation

For technology companies, conversion and the advance of the open-source approach, driven largely by collaborative processes introduced by GitHub, Google's Android, Apple’s Swift and, most recently, by Microsoft joining the Linux Foundation, has created a systematic process for innovation that is improving software functionality and design. 150 years ago, innovation required a dedicated team spending hours in a lab, extensively experimenting and discovering “10,000 ways not to make a light bulb” before finding one that worked. Today, innovation has gained critical mass: technology and user feedback combine to give a purposeful team the ability to find 10,000 ways not to do something in a matter of hours, with the right plan in place. A development team can now deliver a product in a matter of months and test it in such a way that customer responses reach the right team member directly, with feedback implemented and the system corrected (almost) in real time. Yet the life of a software patent is still 20 years from the date the application was filed. The patent system, which has existed since 1790, is not equipped to handle this new technology, and there is a need to establish an agile, sui generis, short-cycle (three to five years) form of protection dedicated solely to software. As patents play an essential role in market-centred systems of innovation, patent exclusivity criteria should be redesigned more systematically to reflect the ability of software patents to foster innovation and to encourage technology diffusion.

The belief in intellectual property has grown so dominant that it has pushed the original intent of patents out of public consciousness. But that original purpose is right there, in plain sight—the US Patent Act of 1790 reads “An Act to promote the progress of useful Arts”. However, the exclusive rights this act introduced were offered in sacrifice for a different purpose: the intent was to better the lives of everyone by incentivizing creativity and producing a rich pool of knowledge open to all. But the exclusive rights themselves came to be considered the point, so they were expanded exponentially, and the result hasn’t been more progress or more learning, but more squabbling and more legal abuse. AI is entering an age of daunting problems—we need the best ideas possible, we need them now, and we need them to spread as fast as possible. The meme of the common good was overwhelmed by the obsession with exclusivity, and it needs to spread again, especially today. If that meme prospers, our laws, our norms and our society will all transform as well.

Read More