Dean Mai 8/5/25 Dean Mai 8/5/25

LLM-Hardened DSLs for Probabilistic Code Generation in High-Assurance Systems

When LLMs demonstrated competitive performance on coding benchmarks such as HumanEval, MBPP, and DS-100, it marked more than a productivity shift—it introduced a new modality of code generation: from causally reasoned programs to probabilistically inferred token sequences. However, these benchmarks primarily evaluate syntactic plausibility, not semantic correctness, operational reliability, or compliance with safety-critical invariants. Unlike human developers, LLMs lack intrinsic grounding in consequence, executional context or intent. We now write code alongside systems that don’t reason the way we do. That distinction becomes critical when LLM-generated code is deployed into high-stake environments with real-world impact. We need language and tooling principles that assume collaboration with probabilistic, interpretable-but-unreliable generative systems. And I believe that starts with LLM-hardened domain-specific languages (DSLs). Not as legacy artifacts, but as first-class safety interfaces that encode formal verification primitives, constrain generation entropy, and provide deterministic guarantees within probabilistic synthesis pipelines.

The DSL is dead. Long live the LLM-hardened DSL

What Is a DSL? A DSL is a programming or specification language tailored to a well-scoped domain, trading Turing completeness and generality for domain-relevant expressiveness, verifiability, and semantic clarity. Traditional DSLs are human-centric by design, emphasizing readability, concise syntax, and an assumed author with domain knowledge and explicitly modeled intent.

What Makes a DSL Hardened? An LLM-hardened DSL is a DSL intentionally co-designed with the assumption that: a non-deterministic agent (LLM) will co-author its expressions; generation is statistical, not causal; verification must be embedded, not post-hoc; and syntax and semantics must actively constrain generation entropy.

Most DSLs today were designed for a pre-generative context, optimized for human authorship, interpretability, and manual intent modeling, not for resilience against hallucinated or invalid code paths. Traditional DSLs assume a human programmer with intention, discipline, and domain expertise. But when LLMs write or assist in generating code, we are no longer dealing with conscious authorship. We’re interacting with inference traces from models that were not trained with safety as a first-class concern, and that operate in a fundamentally non-causal paradigm.

LLM-hardened DSLs must invert this assumption. They must be constructed from first principles with the presupposition that a large fraction of their instantiations will be proposed by systems that generate code through statistical pattern matching rather than explicit logical reasoning. This changes the entire notion of what a DSL should do. It is no longer merely a constrained vocabulary for expression but becomes an interlingua between human intent, machine reasoning, and executable verification. This triadic role mirrors compiler front-ends, which bridge human-readable code and low-level execution—except here, the compiler must also account for probabilistic corruption at the input layer.

DSLs in this context become adversarial surfaces, or defensive interfaces. The job of the language designer is to anticipate how a non-deterministic code generator will interact with syntax, grammar, semantics, and constraints, and to build a language that fails loudly and early, or—better—can encode its own rejection of invalid or unsafe constructs as part of its canonical form.

Why LLMs and DSLs collide in safety-critical applications

When LLMs are applied in enterprise infrastructure, defense operations, or regulated healthcare systems, the problem is not just one of capability but trust. Current LLMs have limited ability to assess the correctness of their own outputs, a challenge that remains largely unsolved despite ongoing research in model calibration and uncertainty quantification; in high-assurance environments, we must treat the LLM as an unreliable but prolific code contributor, and design DSLs that reject, constrain, or verify their outputs in real time.

In low-risk applications, this epistemic instability may be tolerable. In safety-critical systems, it poses unacceptable risks. If an LLM misinterprets the syntax used in an air traffic control configuration file, or if it hallucinates a plausible but invalid construct in a radiation therapy machine’s scripting interface, the result is not merely a functional bug but a potential system-level failure with real-world consequences.

Deploying LLMs as reliable synthesizers or verifiers in high-assurance systems assumes a level of intentionality and semantic rigor that current models do not guarantee. We cannot pretend that verification is a solved problem simply because the model outputs code that looks correct.

Instead, we must redesign the interface between language models and software artifacts. This is where LLM-hardened DSLs enter the picture as active safety boundaries. The objective is not to make LLMs perfect, but to architect language surfaces where their imperfection is detectable, isolatable, and recoverable.

Shaping the language surface for probabilistic synthesis

One of the under-explored ideas in modern AI-assisted programming is that the shape of a language’s syntax and semantics can either absorb or reject probabilistic error introduced by LLM generation. DSLs that support overgeneralized constructs, ambiguous grammars, or semantically overloaded operators act as latent error amplifiers under probabilistic synthesis.

Conversely, a DSL that has been shaped explicitly to constrain the expressive entropy of LLM outputs (i.e., the unpredictability and variability of legal syntax structures available at each token generation step) can act as a regularizer on the synthesis process itself, biasing the model toward verifiable outputs. This is not the same as making a language simple. It is making it predictably constraining. For instance, rather than allowing arbitrary expressions with conditionals, loops, and user-defined functions, one might require a fixed template structure with explicitly enumerated control flows and a limited, closed vocabulary. This constrains the LLM’s expression space to paths that are both parseable and interpretable by humans and verifiable by machine.

Moreover, the syntactic idioms of the language can be designed to guide the LLM toward semantically safe constructions.

Consider the idea of syntactic redundancy—repeating critical information in multiple forms to allow for local consistency checks. Or semantic mirroring, where function signatures encode invariant properties that can be checked statically. These patterns form a class of syntactic fault-tolerance mechanisms, aimed at biasing LLM generation toward provably consistent, locally checkable code paths. While largely redundant for expert human authors, such patterns become instrumental heuristics in detecting, mitigating, or preempting LLM-induced semantic noise.

Interfacing verification with code generation

In the conventional software stack, verification is a downstream process. Code is authored, then verified. In the LLM+DSL co-generation regime, this separation is no longer viable. Generation must be steered by verification at every step. This doesn’t mean post-hoc testing. It means building DSLs that embed verification constraints into their generative affordances.

One approach is to treat DSL generation as a type of constrained decoding problem. Here, the LLM is not merely generating free-form syntax but is conditioned on partial parses, semantic constraints, and type-level invariants. The DSL is equipped with a partial evaluator or interpreter that can provide real-time feedback about the validity of partial generations. In other words, the DSL functions both as a programming language and as a speculative execution partial evaluator.

The architecture required to support this is non-trivial. It involves tight coupling between the DSL runtime, the LLM inference loop, and a layer of constraint satisfaction logic that acts as a referee. But this architecture pays for itself in domains where the cost of an invalid output is measured in millions of dollars or irreversible damage.

While fully integrated verification-aware LLM generation remains a research frontier, practical partial implementations are already feasible. For instance:

Function-calling interfaces (e.g., OpenAI tools, LangChain) allow partial syntax trees to be verified before continuation.
Constraint-programming hooks (e.g., using z3 or PyDantic) can enforce schema-level bounds during decoding.
Symbolic execution engines (e.g., Rosette, KLEE) can serve as speculative interpreters over LLM-generated partial code.

These tools can serve as a scaffold toward hardened co-generation architectures. While most current techniques impose constraints reactively or in bounded feedback loops, the goal is to move toward proactive, generation-steering verification primitives.

Architectural invariants

An LLM-hardened DSL is not simply a narrow language with some prompts designed for Copilot or ChatGPT. It is a co-designed artifact (both syntactic and semantic) that is engineered from the ground up to interface with, be interpreted by, and provide defense mechanisms against the probabilistic reasoning capabilities and failure modes of LLMs.

There are five architectural invariants that define an LLM-hardened DSL for high-assurance domains.

The first invariant is semantic anchoring. Every construct in the DSL should have a well-defined operational semantics that maps to a verifiable execution model. LLMs trained to generate or transform programs in this DSL must have access to semantic embeddings that bind natural-language descriptions to these semantics in a differentiable way. This enables automated verification of generated code against formal specifications.

The second invariant is latent affordance encoding. Each syntactic unit must expose its latent affordances—the set of valid operations, transformations, and compositions—in a machine-readable and generatively accessible form. For LLMs, this enables contextual pruning of the completion space. Instead of sampling blindly from the language model’s posterior, the LLM is guided by affordance constraints embedded in the DSL grammar, potentially via structured decoding mechanisms or finetuned autoregressive adapters.

The third invariant is counterfactual robustness. The DSL must provide constructs for generating adversarial counterfactuals that expose semantic edge cases. These counterfactuals are then used to finetune or evaluate the LLM’s ability to recognize and avoid unsafe interpretations. This includes support for meta-programmatic reflection and anti-pattern enumeration; facilities that allow both static and dynamic verification engines to stress-test the generative model across the full spectrum of operational risk.

The fourth invariant is interactive executability. Every DSL construct must be live-evaluable in a REPL-like environment that logs execution traces, exceptions, type violations, and runtime invariants. This enables reinforcement learning from human feedback (RLHF) not only on textual feedback but on executional artifacts. LLMs can learn to avoid paths that result in runtime divergence or logical violations by conditioning on these trace embeddings.

The fifth invariant is transparency in semantic provenance. LLMs must be able to annotate their generated DSL code with source-level justifications. These justifications are not merely comments but structured logical proofs or probabilistic rationales embedded in the DSL as attachable metadata. This transforms the generated code from a black-box artifact into a legible, auditable, and epistemically traceable object.

Compiler infrastructure and model co-training

The compiler for an LLM-hardened DSL extends beyond traditional compilation. It acts as a multi-modal interface between syntax trees, semantic rules, and LLM decoding heuristics. It must support co-training with LLMs, feeding back execution traces, logical proofs, counterexamples, and natural-language rationales into the LLM’s training loop.

Moreover, the compiler itself must be instrumented to operate as a critic in a model-critic training paradigm. When an LLM proposes a program in the DSL, the compiler checks for type soundness, logical validity, safety properties, and semantic alignment with the domain. Violations are not discarded but transformed into training signals: counterexamples, corrections, or prompts for clarification.

In some architectures, the compiler and the LLM are coupled via latent embeddings. The DSL constructs are tokenized into a hybrid space where syntax trees and semantic graphs are represented jointly. The compiler then exposes these hybrid embeddings as attention anchors during decoding, steering the LLM toward compliant completions. In effect, the compiler becomes a structural prior on the LLM’s generative process.

Defense against adversarial use

A key motivation behind LLM-hardened DSLs is resilience to adversarial misuse. LLMs can be prompted, fine-tuned, or attacked into generating malicious payloads, unsafe control logic, or deceptive outputs. If a DSL is not explicitly hardened, it may become a Trojan vector for domain-specific attacks.

To counter this, hardened DSLs must include adversarial filters, both static and dynamic, capable of detecting unusual patterns of program generation. These filters must be interpretable, compositional, and themselves verifiable. Additionally, DSLs must support taint propagation and trust tagging, allowing every line of generated code to carry a signature of its generative origin, including prompt lineage and training context.

Furthermore, DSL compilers can incorporate semantic kill switches—constructs that self-invalidate under adversarial reinterpretation. For example, if a generated program attempts to bypass a safety invariant by constructing a semantically equivalent but syntactically evasive structure, the DSL runtime must detect this via abstract interpretation and halt execution, triggering a forensic traceback and LLM reprimand.

Current technical approaches

LLM-hardened DSLs currently focus on integrating domain-specific syntactic and semantic constraints with large language model outputs, using techniques like output restriction, grammar-based constraints, validation services, and co-evolution strategies to ensure LLMs produce valid and safe DSL outputs.

Grammar-based output constraints: Solutions such as TypeFox's Langium AI and and similar grammar-constrained frameworks derive BNF-style grammars from existing DSL definitions to restrict LLM token output, so the model can only generate syntactically valid constructs for the target DSL. This hardens the interaction by embedding strict output rules directly into the generation process, reducing hallucination and structural errors.

Validation and evaluation layers: Hardened DSL stacks use parser and static analysis services (often from the original DSL toolchain) as an external validation/evaluation layer. Every LLM-generated snippet is passed through DSL validations (checking syntax, context, and semantic correctness), and errors trigger rejection or regeneration, preventing malformed programs.

Guardrail DSLs: For building complex AI guardrails, platforms like NVIDIA’s Nemo-Guardrails introduce meta-DSLs (e.g., Colang), which define permissible input/output flows, prompt patterns, and procedural logic for safely channeling LLM behavior. These infrastructures let developers codify security, compliance, and interaction constraints programmatically, preventing LLMs from deviating outside safe bounds at runtime.

Co-evolution/refactoring via LLMs: Recent studies evaluate using LLMs themselves to update, migrate, or refactor DSL programs when the language definition evolves. These techniques use multiple runs, cross-validation, and grammar conformance checks to automate safe DSL instance transformations.

Domain-optimized model integration: Some solutions propose fine-tuning or grounding LLMs with structured, domain-specific datasets, including large corpora of valid DSL snippets and rules, to improve context awareness. Retrieval-augmented generation (RAG) pulls validated DSL exemplars at inference time for in-context priming, further hardening output.

Splitting and parsing utilities: Tools split code at valid DSL boundaries and only process/generate at safe locations, preventing LLMs from introducing cross-statement or cross-file errors.

These approaches are typically composable and modular, allowing fast updates as LLM or DSL specifications evolve and ensuring that domain experts control the language’s evolution and safety measures. Challenges remain with scalability (on large DSL corpora or complex grammars) and achieving seamless LLM understanding of rapidly changing or esoteric DSLs, but the technical stack is maturing quickly in 2025.

Case studies in hardened DSL design

Let’s look at a few illustrative but grounded examples to demonstrate what LLM-hardened DSLs might look like in practice.

Case 1: Medical Device Scripting Language. A DSL for configuring radiation therapy treatment plans must ensure that all generated scripts are statically analyzable for dosage, timing, and beam shape safety. The DSL forbids loops, recursion, or dynamic expressions. All parameters are declared upfront and cannot be redefined. Each function has embedded constraints in its signature that are enforced during decoding. Any deviation results in hard rejection of the code. The LLM can still assist, but the DSL acts as a sandboxed, unambiguous compiler for intention, similar to how the ARIA scripting system constrains input for Varian linear accelerators.

Case 2: Defense Command-and-Control Orchestration Language. In defense C2 systems that integrate cyber and kinetic command sequences, a DSL used to specify joint operations between these systems must enforce synchronization constraints and prevent privilege escalation. Each command in the language includes provenance metadata and a set of preconditions. A hardened C2 DSL would run a verification partial evaluator in lockstep with LLM decoding, rejecting unsafe escalation paths before they are ever materialized. For example, a cyber effect that could blind a radar system must declare an override dependency on a prior IFF-confirmation steps. Similar patterns are observable in DARPA’s STAC and HACMS programs, which emphasize formal methods for mission assurance in adversarial environments.

Case 3: Financial Smart Contract DSL. In smart contract environments (especially in decentralized finance) the DSL must allow LLMs to generate contract templates that are both legally sound and cryptographically verifiable. Each clause of the DSL maps to a formal logic representation, and the LLM’s outputs are parsed not only for syntax but for logical consistency. The language is designed to disallow any expression that does not fully instantiate all required clauses, and embedded within it is a proof checker that validates contract soundness before execution. Existing systems like Pact (Kadena) and Scilla (Zilliqa) offer early models of this hardened co-synthesis pattern.

Strategic implications and industry failure modes

Organizations that deploy LLMs in high-assurance systems without constraint-aware interfaces or hardened DSLs risk committing a category error: treating stochastic token generation as if it were intentional, verifiable authorship. Compounding the risk, many organizations still view DSLs as an artifact of legacy systems—useful only for internal scripting or as an afterthought to mainline software but not as core infrastructure. In the LLM era, this framing is not only outdated, it’s strategically risky in any domain requiring guarantees of correctness, traceability, or compliance.

Trust Surface Tiering Framework. Organizations should classify LLM-augmented systems by their required trust surface:

Tier 1: Low-stakes (e.g., marketing copy, UI layout); prompt templates + light validation sufficient.
Tier 2: Moderately sensitive (e.g., internal tooling, analytics pipelines); require generation constraints + schema verifiers
Tier 3: Safety- or compliance-critical (e.g., healthcare, defense, finance); require LLM-hardened DSLs, embedded verification, and deterministic fail-safes

Mistaking a Tier 3 system for a Tier 1 problem is a governance issue.

A promising path forward is not to over-train LLMs toward trustworthiness, but to reduce the surface area of trust required. Hardened DSLs provide this trust boundary. Organizations operating in high-trust environment must consider DSL engineering as a core competency: hiring PL designers, building real-time verifiers, and enforcing generative guardrails at runtime. LLMs are powerful collaborators, but ungoverned generation in high-assurance domains constitutes a systemic safety risk, not a mere operational flaw. In any domain where correctness, safety, legality, or accountability matter, the LLM is not the primary interface but the hardened DSL as the contract boundary. The LLM becomes a constrained synthesis engine operating within a mathematically-bounded space, not a freeform code generator. This design pattern allows LLMs to scale into high-trust environments without necessitating their own verification (which remains a pipe dream).

The failure mode here is over-trusting LLMs in unconstrained programming environments. The mitigation is to elevate DSLs from scripting conveniences to strategic infrastructure; formally shaping what the LLM is allowed to generate, verify, and execute. This requires rebalancing investments—from model inference infrastructure to: language and constraint system engineering, Inline verification tooling, and Simulation environments that catch failures during generation. And above all, it requires continued acknowledgment that even the most capable LLM remains a non-causal, statistical inference engine, not a safe actor.

The future of co-evolution of DSLs and LLMs

We are entering an era where programming languages and large language models are co-design constraints, not independent systems. Just as the architecture of microprocessors shaped C, and the asynchronous browser runtime shaped JavaScript, LLM-centric tooling now necessitates a new class of languages designed for synthesis, safety, and semantic interpretability. We are already observing the emergence of LLM-native language families optimized for generative constraints:

Syntax-constrained DSLs: Minimal surface languages with token-regular grammars and deterministic token transitions to reduce generative entropy (e.g., JSON schema-guided prompts, DSLs with finite production rules).
Proof-carrying DSLs: Languages where each expression is associated with statically checkable invariants or formal contracts (e.g., dependent-typed mini-languages, theorem-prover embedded DSLs).
Dual-evaluation DSLs: Systems that execute both syntax-level and semantics-level evaluations concurrently, enabling speculative execution during generation (e.g., partial evaluators with rollback-on-violation semantics).

The LLM-hardened DSL represents the first concrete instantiation of this new class. it functions as a generative firewall, a runtime constraint evaluator, and a semantic integrity scaffold by assuming that generation is stochastic, intent cannot be reliably inferred, post-hoc verification is too late, and responds by encoding safety and correctness into the language structure itself.

If this sounds like a conservative design philosophy, it is. But it is not reactionary. This approach doesn’t limit innovation; it makes stochastic systems compatible with deterministic requirements. LLMs are not going away. They are already being deeply integrated into critical infrastructure. A critical question facing the industry is whether we will meet this moment with system design that anticipates their limits, or continue to assume safety from pattern-matching agents.

Dean Mai 6/27/25 Dean Mai 6/27/25

Institutional Context as the Missing Layer in Agent Systems

In the unfolding narrative of artificial intelligence, the idea of agents, autonomous software entities capable of perceiving, reasoning about, and acting in complex environments, has largely centered on model capabilities, orchestration frameworks, and function-call interoperability. Yet, with progress on foundational LLM architectures and agent orchestration platforms, an often overlooked component risks hindering large-scale, multi-agent systems: institutional knowledge.

In the unfolding narrative of artificial intelligence, the idea of agents, autonomous software entities capable of perceiving, reasoning about, and acting in complex environments, has largely centered on model capabilities, orchestration frameworks, and function-call interoperability. Yet, with progress on foundational LLM architectures and agent orchestration platforms, an often overlooked component risks hindering large-scale, multi-agent systems: institutional knowledge. A tacit, contextual, and procedurally embedded know-how that lives in the heads of people and in the informal practices of organizations. Treating institutional knowledge as a first-class construct in the software stack, something that agents carry, share, and reason over, is the core substrate of effective agentic reasoning. Without this layer, multi-agent systems will remain brittle, limited in scope, or stuck at the proof-of-concept purgatory. With it, they can evolve into resilient, interoperable infrastructures that encode not only procedures but organizational judgment. Call it OrgMem Layer.

The blind spot in current agent architectures

When developers discuss AI agent interoperability, they often borrow mental models from web services or microservice architectures: agents are treated as stateless endpoints with well-defined APIs. They expose functions, accept inputs, return outputs. By wiring these together with orchestration engines, developers achieve workflows that look impressive in demos.

While technically functional, this view is deeply reductive. It obscures the rich, implicit protocols that govern human coordination—rules not found in process docs, but in daily conversations, unspoken escalation paths, and learned risk tolerances, passed along through mentorship and apprenticeship. In a human team, hand-offs carry not just the raw data but a set of expectations: when to notify a manager, what risk levels are tolerable, which stakeholders must be informed, and what tone to adopt with external parties. Current agent systems replicate the form of workflows but not their function. They follow the letter of the protocol but miss the hidden organizational context.

As enterprises adopt agentic workflows, they initially mirror human processes by encoding business rules in prompts, integrating vector databases for contextual retrieval, or adding audit logs for regulatory compliance. Yet these efforts stop at structural fidelity. They capture documents, templates, and transactional logs, but they do not internalize the mores of the organization. Though some platforms—like Microsoft Copilot or ServiceNow—attempt to layer context into tasks via retrieval or metadata, they still fall short of embedding role-specific judgment or cross-team memory. Agents today may “call functions,” but they don’t “call judgment.”

Deconstructing 'OrgMem Layer'

Drawing from organizational theory (March & Simon, Karl Weick, Nonaka & Takeuchi) and distributed systems to bridge the gap between rigid workflows and human-like coordination, I see institutional knowledge or OrgMem Layer as the following constituent layers:

Normative frameworks or implicit rules about who speaks to whom and when, acceptable risk thresholds, and unwritten hierarchies of decision-making. These norms govern whether an issue is escalated to a manager or resolved autonomously by a frontline agent, whether external vendors get looped into conversations, and which data classifications demand special handling.
Cultural signifiers where every organization has its own lexicon—metaphors, jargon, or idioms that shape how meaning is constructed. These impact tone, branding, and how agents interpret unstructured input or compose communication.
Procedural memory which, beyond flowcharts and SOPs, comprises the informal heuristics and operational shortcuts accumulated over time: common workarounds, anticipated failure modes, and the informal “best practices” not captured in official documentation.
Relational context covers who trusts whom, which teams or individuals have overlapping responsibilities, and the network of trust and interdependencies that permeate large organizations. Agents that operate blindly across these boundaries risk duplicating work, creating friction, or violating access controls.
Historical precedents or knowledge of past incidents, root-cause analyses, and “lessons learned” from prior projects that inform policy exceptions and operational risk postures. These precedents often reside in case studies, internal wikis, or retained solely in human memory.

Unlike RAG, which recalls facts, institutional context encodes the unspoken norms that guide interpretation and decision-making (RAG can recall that “legal must review vendor X,” but it cannot infer that a similar vendor might also need the same scrutiny). These layers cannot be captured in traditional schema, neatly reduced to a JSON schema or a structured database table. They live in a semantic organizational context comprising narrative fragments, role-based rules, graph structures, and evolving annotations.

Why context-free hand-offs break in practice

Consider an automated workflow that processes vendor invoices. Agent A extracts line items and tags expense categories. Agent B routes them for approval. Technically, the hand-off is clean: schema-compliant data flows from one agent to another.

But what if the vendor has a history of legal exceptions that require pre-review above a spending threshold? Humans would know this through institutional precedent. If Agent A lacks that context, it won’t flag the invoice. Agent B, unaware of legal involvement rules, approves based on surface data. The outcome can be a compliance breach or an operational delay as humans intervene retroactively.

Similar breakdowns can occur in customer support workflows. An agent may escalate a customer query because it detects a high-value account, but it may not know that this particular customer’s account has an ongoing dispute and that communications are handled by a dedicated team. The next agent, uninformed of that nuance, might default to generic templates, damaging the customer relationship.

These failures are not rare fringe events. They reflect systemic brittleness inherent when agents operate on syntactic interoperability (shared formats and protocols) without semantic interoperability (shared meaning and context).

Encoding OrgMem layer as a first-class construct

To bridge these gaps, we must treat institutional knowledge as a discrete data model and runtime abstraction, on par with documents, code artifacts, and user profiles. There are several architectural strategies that support this:

Memory graphs that, unlike conventional knowledge graphs, are dynamic, agent-internal representations where nodes represent people, teams, policies, and heuristics, while edges encode relational context, procedural annotations, and trust signals. They capture organizational directory structures and evolve as agents interact, logging failures, corrective actions, and informal shortcuts (e.g., AriGraph, Zep, G-Memory). Agents can query these graphs to retrieve not just facts but heuristics: “which workflows require legal sign-off when anomalies exceed threshold X?” or “which teams typically handle exceptions of this type?” Implementing memory graphs involves instrumenting conversational exchanges and workflow logs to extract context signals, defining a lightweight schema for tactical annotations (e.g., exception rules, trust levels), and embedding graph embeddings into agent prompts for context-aware decision-making. (Challenges still remain: schema drift, privacy segmentation, and update consensus must be addressed via techniques borrowed from distributed systems—e.g., CRDTs or graph sharding.)
Semantic state transfer objects that build on ideas from distributed systems—data structures that bundle payloads with context metadata (e.g., SHIMI that bridges document-centric RAG and structured context transfer). A state object for an invoice might include fields for raw line items alongside pointers to policy documents, trust scores for the supplier, and escalation flags derived from historical precedent. When Agent A hands this object to Agent B, it provides both data and semantic guidance on how to interpret and process it. Key considerations would be a standardized metadata envelope that can carry variable context fields without schema bloat, versioning mechanisms to ensure agents agree on the semantics of metadata fields, and lightweight cryptographic signing to protect the integrity of context metadata.
Protocol extensions across existing interoperability protocols such as the Multi-Tool Coordination Protocol (MCP) or Agent-to-Agent (A2A) messaging provide function call interfaces, but they can be extended with context headers. These headers serve a role analogous to HTTP headers—conveying routing, authentication, and behavioral hints. By defining standard header fields for context tokens, agents can perform policy checks and trust assessments before executing calls. For example, a header might carry a contextual-integrity-token that encodes a hash of the memory graph sub-graph relevant to the task at hand. Receiving agents can validate the token, fetch the referenced context, and align their behavior accordingly. Over time, this mechanism creates a distributed context bus, where agents negotiate not just on data and task specs but on governance semantics.

From proof-of-concept to enterprise-grade resilience

Proof-of-concepts in innovation labs and startup pilots show initial promise, but scaling this approach in large enterprises tend to raise distinct operational challenges.

Governance of context schemas where organizations need ContextOps teams—similar to data governance councils—that define taxonomies, resolve semantic collisions, and approve context mutations. These teams will be tasked with defining context taxonomies, remediation workflows for outdated or incorrect context, and audit controls for who can inject or modify context metadata managing lifecycle policies for memory graphs and state transfer schemas.

Querying a distributed memory graph or fetching context artifacts can introduce latency, undermining the responsiveness of agent pipelines. Solutions may involve local caching of frequently accessed context fragments, asynchronous pre-fetching based on predictive workload models, and prioritizing context bundles by criticality. These event-sourced design patterns may support efficient replay and rollback of memory states.

Institutional knowledge often (always) encompasses sensitive data—corporate strategies and policies, risk postures, or customer dispute histories. The context infrastructure must enforce fine-grained access controls, encryption at rest and in transit, and token-based access that aligns with enterprise IAM systems (e.g., SCIM, SAML, OAuth2 scopes).

Ultimately, I surmise enterprises will adopt agents from multiple vendors, stitching together multiple AI platforms and custom agent frameworks. Establishing a cross-platform context standard will be critical and requires industry consortia or open-source efforts to define schema registries, serialization formats, and reference implementations (see this Internet-Draft proposal for a formal AI agent protocol stack by IETF). Without this, organizations will just recreate semantic silos, effectively undoing interoperability gains.

Lastly, unlike accuracy metrics on text extraction or classification, evaluating whether an agent correctly interprets institutional context would require new KPIs (reduction in human intervention rate, improvement in first-pass resolution, decline in exception retries etc) to help attribute ROI to context-driven workflows versus baseline automation.

Toward an interoperable future

Agentic systems promise to transform enterprise productivity by automating complex, cross-functional tasks. But without an adequate model of institutional knowledge, these systems will remain brittle, prone to breakdowns whenever they step outside the narrow corridors of pre-defined scenarios. The next generation of agents must not only process data but also interpret norms, negotiate exceptions, and remember precedents. By elevating context to the same level of abstraction as tools, data, and LLM models, we enable agents to reason like seasoned professionals rather than rote processors.

Just as the early Internet scaled by encoding trust, identity, and protocol negotiation into HTTP and TCP/IP, so too must enterprise AI systems encode institutional judgment into their foundation. In the end, agents will only be as capable as the context they collectively share. Institutional knowledge is the core layer without which multi-agent ecosystems cannot scale beyond brittle co-pilots. Treating that knowledge as a first-class construct is the next step that unlocks resilient, interoperable, and enterprise-grade AI at scale.

Dean Mai 5/24/25 Dean Mai 5/24/25

Toward Data-Driven Multi-Model Enterprise AI

There is a pattern that time after time emerges in artificial intelligence infrastructure: a new problem appears not as a sudden shock, but as an accumulation of quiet, difficult, and consequential engineering constraints—growing just fast enough to be cumbersome, but not yet painful enough to draw architectural rethinks. That is, until someone re-frames the problem, not as a limitation of the models, but of the software abstraction layers that support them.

There is a pattern that time after time emerges in artificial intelligence infrastructure: a new problem appears not as a sudden shock, but as an accumulation of quiet, difficult, and consequential engineering constraints—growing just fast enough to be cumbersome, but not yet painful enough to draw architectural rethinks. That is, until someone re-frames the problem, not as a limitation of the models, but of the software abstraction layers that support them.

Not Diamond is one such company doing that reframing. Tomás Hernando Kofman, Tze-Yang Tung, and Jeffrey Akiki are building the critical infrastructure for the next wave of enterprise AI: not another foundation model, but the unifying layer that makes heterogeneous model ecosystems viable.

From model monocultures to model markets

When we look at the past few years of AI deployment, what’s notable isn’t just the rise of foundation models, but the tendency for companies to overfit on a single model provider. The appeal is straightforward: a single API, fewer integration paths, and a sense of control. But as the ecosystem matures and competitive models (and their versions) emerge, from OpenAI GPT to Anthropic Sonnet to Meta Llama to proprietary vertical models in healthcare or finance—this monoculture begins to break down. Companies start to ask different questions. Which model is cheapest? Fastest? Easiest to deploy under regulatory constraints? Best at reasoning vs. summarization?

The answer, increasingly, is not “one model” but “a fleet.” This is what Not Diamond enables—first with intelligent routing, and now with Prompt Adaptation. Their vision is a world where models become modular, swappable, and optimizable components within a larger orchestration layer. Not Diamond abstracts over foundation models the way Kubernetes did over physical machines.

Current demand for AI infrastructure has already begun shifting toward intelligent, adaptive orchestration that selects models based on context, and agents that manage prompts across models and providers. This shift is all about pragmatic forces that eventually bend architectures; it’s about cost, uptime, latency, and reliability.

Prompt engineering at scale is a hidden cost center

Prompting may begin as an art—a skill learned through trial, error, and intuition—at a small scale. But when you multiply this task across dozens of models, hundreds of use cases, and thousands of edge cases, it becomes infrastructure debt.

This is the pain point Not Diamond is solving with Prompt Adaptation. The tool is (deceptively) simple: given a prompt and some labeled data, it rewrites that prompt for any other model to maximize accuracy. It does so through automated search over prompt space, combined with iterative evaluation loops of potential prompts. In practice, this automates and improves on what engineering teams are doing manually—re-tuning prompts each time a new model is trialed or swapped, a process that becomes increasingly unsustainable as the number of models grows.

The downstream effect is dramatic. In early deployments with large enterprises such as SAP, prompt adaptation has cut down the process of prompt engineering from weeks to hours. And as AI adoption accelerates, the overhead of managing multiple models will only increase. Companies cannot afford to throw endless human hours at the problem; it’s inefficient, costly, and doesn’t scale.

A layer designed for entropy

There’s a tendency in infrastructure deployment to look for uniformity, predictability and control. But the AI stack is drifting in the opposite direction toward entropy. New models appear weekly, open-weight alternatives grow stronger, and fine-tuned vertical specialists outpace generalists in key domains.

Routing and adaptation become not just features but requirements. The question is no longer “How do I use GPT-4 well?” but “How do I structure my system so that switching from GPT-4 to Claude, or from Claude to a fine-tuned Llama, takes minutes, not months?”

Not Diamond’s system design reflects this: their routing infrastructure doesn’t privilege any one provider. It optimizes across models based on configurable user-defined metrics—accuracy, latency, token cost, carbon footprint, or anything else a team can measure. It’s been operational with some of the largest enterprises for nine months, serving over 100,000 users, and consistently outperforms individual foundation models on major benchmarks. Prompt Adaptation does not aim for a ‘universal’ prompt but a performant, model-specific one. Together, these form a toolkit for adapting to change—both in models and in the incentives around them.

The economics of adaptation

From a purely financial perspective, Not Diamond’s value proposition maps cleanly onto measurable outcomes. Tasks like retrieval-augmented generation (RAG), text-to-SQL conversion, or contract analysis are not optional in enterprise settings, they’re operational workflows. Today, improving model performance on these tasks often comes from trial-and-error prompt engineering, fine-tuning, or vendor switching.

What Prompt Adaptation enables is a new axis of optimization: the ability to redeploy the same workflow across models while maintaining performance and reducing cost. This decouples workload logic from model-specific quirks, much as containerization decoupled app logic from infrastructure quirks. In internal benchmarks, Prompt Adaptation has yielded performance improvements ranging from 5% to 60% on enterprise tasks. Time-to-deployment drops from weeks to hours. This changes the economics of experimentation—it reduces the switching cost across models and allows companies to exploit price/performance arbitrage as the model market evolves. Not Diamond reduces the friction of change.

We believe this is a precursor to the commoditization of model APIs and the rise of orchestration-as-strategy.

From horizontal tools to AI assembly lines

Today’s AI tooling either aims too high (autonomous agents) or too low (helper scripts and wrappers). We believe Not Diamond strikes the right balance: infrastructure that assembles other AI systems. The dual components—routing and prompt adaptation—enable modular workflows.

Enterprises can now compose, test, and optimize across different models, providers, and input formats. In this way, Not Diamond becomes a strategic layer not because it’s smarter than the models it routes, but because it makes them swappable.

The defensibility of invisible infrastructure and the “meta-stack”

Not Diamond's vision and first-mover advantage in enterprise multi-model AI orchestration are matched only by their technical execution.

First, the real challenge wasn’t building a prompt adaptation system. It was making one that works reliably across dozens of idiosyncratic model APIs and use cases, at scale and with measurable impact. That requires data, tuning infrastructure, and partnerships—everything Tomás, Tze-Yang, Jeffrey and the team brought together in record time.

Second is market timing. As of mid-2025, multi-model frameworks have become real enough to drive infrastructure demand, in the part of the S-curve where orchestration moves beyond an optimization to a necessity. Not Diamond is ahead not because of a single insight, but because they built for this inevitability while others are still framing it as edge-case complexity.

We’ve seen this before: computing environments diversify, tooling lags, then someone builds a meta-stack—a control plane that abstracts, optimizes, and arbitrates across lower layers. In cloud computing, it was Kubernetes. In data engineering, it was dbt and Airflow. In AI, we believe it could be Not Diamond.

In investing in Not Diamond alongside SAP.iO Fund, IBM and others, we are not betting on a specific model or modality. We are betting on entropy—and the infrastructure needed to turn it into leverage.

Dean Mai 4/15/25 Dean Mai 4/15/25

Multi-Agent Systems with Rollback Mechanisms

Enterprise demand for AI today isn’t about slotting in isolated models or adding another conversational interface. It’s about navigating workflows that are inherently messy: supply chains that pivot on volatile data, financial transactions requiring instantaneous validation, or medical claims necessitating compliance with compounding regulations. In these high-stakes, high-complexity domains, agentic and multi-agent systems (MAS) offer a structured approach to these challenges with intelligence that scales beyond individual reasoning. Rather than enforcing top-down logic, MAS behave more like dynamic ecosystems. Agents coordinate, collaborate, sometimes compete, and learn from each other to unlock forms of system behavior that emerge from the bottom up. Autonomy is powerful, but it also creates new unique fragilities concerning system reliability and data consistency, particularly in the face of failures or errors.

Enterprise demand for AI today isn’t about slotting in isolated models or adding another conversational interface. It’s about navigating workflows that are inherently messy: supply chains that pivot on volatile data, financial transactions requiring instantaneous validation, or medical claims necessitating compliance with compounding regulations. In these high-stakes, high-complexity domains, agentic and multi-agent systems (MAS) offer a structured approach to these challenges with intelligence that scales beyond individual reasoning. Rather than enforcing top-down logic, MAS behave more like dynamic ecosystems. Agents coordinate, collaborate, sometimes compete, and learn from each other to unlock forms of system behavior that emerge from the bottom up. Autonomy is powerful, but it also creates new unique fragilities concerning system reliability and data consistency, particularly in the face of failures or errors.

Take a financial institution handling millions of transactions a day. The workflow demands market analysis, regulatory compliance, trade execution, and ledger updates with each step reliant on different datasets, domain knowledge, and timing constraints. Trying to capture all of this within a single, monolithic AI model is impractical; the task requires decomposition into manageable subtasks, each handled by a tailored component. MAS offer exactly that. They formalize a modular approach, where autonomous agents handle specialized subtasks while coordinating toward shared objectives. Each agent operates with local context and local incentives, but participates in a global system dynamic. These systems are not just theoretical constructs but operational priorities, recalibrating how enterprises navigate complexity. But with that autonomy comes a new category of risk. AI systems don’t fail cleanly: a misclassification in trade validation or a small error in compliance tagging can ripple outward with real-world consequences—financial, legal, reputational. Rollback mechanisms serve as a counterbalance. They let systems reverse errors, revert to stable states, and preserve operational continuity. But as we embed more autonomy into core enterprise processes, rollback stops being a failsafe and starts becoming one more layer of coordination complexity.

Core Structure of MAS

A multi-agent system is, at its core, a combination of autonomous agents, each engineered for a narrow function, yet designed to operate in concert. In a supply chain setting for example, one agent might forecast demand using time-series analysis, another optimize inventory with constraint solvers, and a third schedule logistics via graph-based routing. These agents are modular, communicating through standardized interfaces—APIs, message queues like RabbitMQ, or shared caches like Redis—so that the system can scale and adapt. Coordination is handled by an orchestrator, typically implemented as a deterministic state machine, a graph-based framework like LangGraph, or a distributed controller atop Kubernetes. Its job is to enforce execution order and resolve dependencies, akin to a workflow engine. In trading systems, for example, this means ensuring that market analysis precedes trade execution, preventing premature actions on stale or incomplete information. State management underpins this coordination, with a shared context. It’s typically structured as documents in distributed stores like DynamoDB or MongoDB, or when stronger guarantees are needed, in systems like CockroachDB.

The analytical challenge lies in balancing modularity with coherence. Agents must operate independently to avoid bottlenecks, yet their outputs must align to prevent divergence. Distributed systems principles like event sourcing and consensus protocols become essential tools for maintaining system-level coherence without collapsing performance. In the context of enterprise applications, the necessity of robust rollback mechanisms within multi-agent systems cannot be overstated. These mechanisms are essential for preventing data corruption and inconsistencies that can arise from individual agent failures, software errors, or unexpected interactions. When one agent fails or behaves unexpectedly, the risk isn’t local. It propagates. For complex, multi-step tasks that involve the coordinated actions of numerous agents, reliable rollback capabilities ensure the integrity of the overall process, allowing the system to recover gracefully from partial failures without compromising the entire operation.

Rollback Mechanisms in MAS

The probabilistic outputs of AI agents, driven by models like fine-tuned LLMs or reinforcement learners, introduce uncertainty absent in deterministic software. A fraud detection agent might errantly flag a legitimate transaction, or an inventory agent might misallocate stock. Rollback mechanisms mitigate these risks by enabling the system to retract actions and restore prior states, drawing inspiration from database transactions but adapted to AI’s nuances.

The structure of rollback is a carefully engineered combination of processes, each contributing to the system’s ability to recover from errors with precision and minimal disruption. At its foundation lies the practice of periodically capturing state snapshots that encapsulate the system’s configuration—agent outputs, database records, and workflow variables. These snapshots form the recovery points, stable states the system can return to when things go sideways. They’re typically stored in durable, incrementally updatable systems like AWS S3 or ZFS, designed to balance reliability with performance overhead. Choosing how often to checkpoint is its own trade-off. Too frequent, and the system slows under the weight of constant I/O; too sparse, and you risk losing ground when things fail. To reduce snapshot resource demands, MAS can use differential snapshots (capturing only changes) or selectively logging critical states, balancing rollback needs with efficiency. It’s also worth noting that while rollback in AI-driven MAS inherits ideas from database transactions, it diverges quickly due to the probabilistic nature of AI outputs. Traditional rollbacks are deterministic: a set of rules reverses a known state change. In contrast, when agents act based on probabilistic models their outputs are often uncertain. A fraud detection agent might flag a legitimate transaction based on subtle statistical quirks. An inventory optimizer might misallocate stock due to noisy inputs. That’s why rollback in MAS often needs to be triggered by signals more nuanced than failure codes: confidence thresholds, anomaly scores, or model-based diagnostics like variational autoencoders (VAEs) can serve as indicators that something has gone off-track.

In modern MAS, every action is logged, complete with metadata like agent identifiers, timestamps, and input hashes via systems such as Apache Kafka. These logs do more than support debugging; they create a forensic trail of system behavior, essential for auditability and post-hoc analysis, particularly in regulated domains like finance and healthcare. Detecting when something has gone wrong in a system of autonomous agents isn’t always straightforward. It might involve checking outputs against hard-coded thresholds, leveraging statistical anomaly detection models like VAEs, or incorporating human-in-the-loop workflows to catch edge cases that models miss. Once identified, rollback decisions are coordinated by an orchestrator that draws on these logs and the system’s transactional history to determine what went wrong, when, and how to respond.

The rollback is a toolkit of strategies selected based on the failure mode and the system’s tolerance for disruption. One approach, compensating transactions, aims to undo actions by applying their logical inverse: a payment is reversed, a shipment is recalled. But compensating for AI-driven decisions means accounting for uncertainty. Confidence scores, ensemble agreement, or even retrospective model audits may be needed to confirm that an action was indeed faulty before undoing it. Another approach, state restoration, rolls the system back to a previously captured snapshot—resetting variables to a known-good configuration. This works well for clear-cut failures, like misallocated inventory, but it comes at a cost: any valid downstream work done since the snapshot may be lost. To avoid this, systems increasingly turn to partial rollbacks, surgically undoing only the affected steps while preserving valid state elsewhere. In a claims processing system, for instance, a misassigned medical code might be corrected without resetting the entire claim’s status, maintaining progress elsewhere in the workflow. But resilience in multi-agent systems isn’t just about recovering, it’s about recovering intelligently. In dynamic environments, reverting to a past state can be counterproductive if the context has shifted. Rollback strategies need to be context-aware, adapting to changes in data, workflows, or external systems. They need to ensure that the system is restored to a state that is still relevant and consistent with the current environmental conditions. Frameworks like ReAgent provide early demonstration on what this could look like: reversible collaborative reasoning across agents, with explicit backtracking and correction pathways. Instead of merely rolling back to a prior state, agents revise their reasoning in light of new evidence. By allowing agents to backtrack and correct their reasoning, such frameworks offer a form of intelligent rollback that is more nuanced than simply reverting to a prior state.

Building robust rollback in MAS requires adapting classical transactional principles—atomicity, consistency, isolation, durability (ACID)—to distributed AI contexts. Traditional databases enforce strict ACID guarantees through centralized control, but MAS often trade strict consistency for scalability, favoring eventual consistency in most interactions. Still, for critical operations, MAS can lean on distributed coordination techniques like two-phase commits or the Saga pattern to approximate ACID-like reliability without introducing system-wide bottlenecks. The Saga pattern, in particular, is designed to manage large, distributed transactions. It decomposes them into a sequence of smaller, independently executed steps, each scoped to a single agent. If something fails midway, compensating transactions are used to unwind the damage, rolling the system back to a coherent state without requiring every component to hold a lock on the global system state. This autonomy-first model aligns well with how MAS operate: each agent governs its own local logic, yet contributes to an eventually consistent global objective. Emerging frameworks like SagaLLM advance this further by tailoring saga-based coordination to LLM-powered agents, introducing rollback hooks that are not just state-aware but also constraint-sensitive, ensuring that even when agents fail or outputs drift, the system can recover coherently. These mechanisms help bridge the gap between high-capacity, probabilistic reasoning and the hard guarantees needed for enterprise-grade applications involving multiple autonomous agents.

To ground this, consider a large bank deploying an MAS for real-time fraud detection. The system might include a risk-scoring agent (such as a fine-tuned BERT model scoring transactions for risk), a compliance agent enforcing AML rules via symbolic logic, and a settlement agent updating ledger entries via blockchain APIs. A Kubernetes-based orchestrator sequences these agents, with Kafka streaming in transactional data and DynamoDB maintaining distributed state. Now suppose the fraud detection agent flags a routine payment as anomalous. The error is caught either via statistical anomaly detection or a human override and rollback is initiated. The orchestrator triggers a compensating transaction to reverse the ledger update, a snapshot is restored to reset the account state, and the incident is logged for regulatory audits. In parallel, the system might update its anomaly model or confidence thresholds—learning from the mistake rather than simply erasing it. And integrating these AI-native systems with legacy infrastructure adds another layer of complexity. Middleware like MuleSoft becomes essential, not just for translating data formats or bridging APIs, but for managing latency, preserving transactional coherence, and ensuring the MAS doesn’t break when it encounters the brittle assumptions baked into older systems.

The stochastic nature of AI makes rollback an inherently fuzzy process. A fraud detection agent might assign a 90% confidence score to a transaction and still be wrong. Static thresholds risk swinging too far in either direction: overreacting to benign anomalies or missing subtle but meaningful failures. While techniques like VAEs are often explored for anomaly detection, other methods, such as statistical process control or reinforcement learning, offer more adaptive approaches. These methods can calibrate rollback thresholds dynamically, tuning themselves in response to real-world system performance rather than hardcoded heuristics. Workflow topology also shapes rollback strategy. Directed acyclic graphs (DAGs) are the default abstraction for modeling MAS workflows, offering clear scoping of dependencies and rollback domains. But real-world workflows aren’t always acyclic. Cyclic dependencies, such as feedback loops between agents, require more nuanced handling. Cycle detection algorithms or formal methods like Petri nets become essential for understanding rollback boundaries: if an inventory agent fails, for instance, the system might need to reverse only downstream logistics actions, while preserving upstream demand forecasts. Tools like Apache Airflow or LangGraph implement this. What all this points to is a broader architectural principle: MAS design is as much about managing uncertainty and constraints as it is about building intelligence. The deeper challenge lies in formalizing these trade-offs—balancing latency versus consistency, memory versus compute, automation versus oversight—and translating them into robust architectures.

Versatile Applications

In supply chain management defined by uncertainty and interdependence, MAS can be deployed to optimize complex logistics networks, manage inventory levels dynamically, and improve communication and coordination between various stakeholders, including suppliers, manufacturers, and distributors. Rollback mechanisms are particularly valuable in this context for recovering from unexpected disruptions such as supplier failures, transportation delays, or sudden fluctuations in demand. If a critical supplier suddenly ceases operations, a MAS with rollback capabilities could revert to a previous state where perhaps alternate suppliers had been identified and contingencies pre-positioned, minimizing the impact on the production schedule. Similarly, if a major transportation route becomes unavailable due to unforeseen circumstances, the system could roll back to a prior plan and activate pre-arranged contingency routes. We’re already seeing this logic surface in MAS-ML frameworks that combine MAS with machine learning techniques to enable adaptive learning with structured coordination to give supply chains a form of situational memory.

Smart/advanced manufacturing environments, characterized by interconnected machines, autonomous robots, and intelligent control systems, stand to benefit even more. Here, MAS can coordinate the activities of robots on the assembly line, manage complex production schedules to account for shifting priorities, and optimize the allocation of manufacturing resources. Rollback mechanisms are crucial for ensuring the reliability and efficiency of these operations by providing a way to recover from equipment malfunctions, production errors, or unexpected changes in product specifications. If a robotic arm malfunctions during a high-precision weld, a rollback mechanism could revert the affected components to their prior state and resume the task to another available robot or a different production cell. The emerging concept of an Agent Computing Node (ACN) within multi-agent manufacturing systems offers a path for easy(ier) deployment of these capabilities. Embedding rollback at the ACN level could allow real-time scheduling decisions to unwind locally without disrupting global coherence, enabling factories that aren’t just smart, but more fault-tolerant by design.

In financial trading platforms, which operate in highly volatile and time-sensitive markets where milliseconds equate to millions and regulatory compliance is enforced in audit logs, MAS can serve as algorithmic engines behind trading, portfolio management, and real-time risk assessment. Rollback here effectively plays a dual role: operational safeguard and regulatory necessity. Rollback capabilities are essential for maintaining the accuracy and integrity of financial transactions, recovering from trading errors caused by software glitches or market anomalies, and mitigating the potential impact of extreme market volatility. If a trading algorithm executes a series of erroneous trades due to a sudden, unexpected market event, a rollback mechanism could reverse these trades and restore the affected accounts to their previous state, preventing significant financial losses. Frameworks like TradingAgents, which simulate institutional-grade MAS trading strategies, underscore the value of rollback not just as a corrective tool but as a mechanism for sustaining trust and interpretability in high-stakes environments.

In cybersecurity, multi-agent systems can be leveraged for automated threat detection, real-time analysis of network traffic for suspicious activities, and the coordination of defensive strategies to protect enterprise networks and data. MAS with rollback mechanisms are critical for enabling rapid recovery from cyberattacks, such as ransomware or data breaches, by restoring affected systems to a known clean state before the intrusion occurred. For example, if a malicious agent manages to infiltrate a network and compromise several systems, a rollback mechanism could restore those systems to a point in time before the breach, effectively neutralizing the attacker's actions and preventing further damage. Recent developments on Multi-Agent Deep Reinforcement Learning (MADRL) for autonomous cyber defense has begun to formalize this concept: “restore” as a deliberate, learnable action in a broader threat response strategy, highlighting the importance of rollback-like functionalities.

Looking Ahead

The ecosystem for MAS is evolving not just in capability, but also in topology with frameworks like AgentNet proposing fully decentralized paradigms where agents can evolve their capabilities and collaborate efficiently without relying on a central orchestrator. The challenge lies in coordinating these individual rollback actions in a way that maintains the integrity and consistency of the entire multi-agent system. When there’s no global conductor, how do you coordinate recovery in a way that preserves system-level integrity? There are recent directions exploring how to equip individual agents with the ability to rollback their actions locally and states autonomously, contributing to the system's overall resilience without relying on a centralized recovery mechanism.

Building scalable rollback mechanisms in large-scale MAS, which may involve hundreds or even thousands of autonomous agents operating in a distributed environment, is shaping up to be a significant systems challenge. The overhead associated with tracking state and logging messages to enable potential rollbacks starts to balloon as the number of agents and their interactions increase. Getting rollback to work at this scale requires new protocol designs that are not only efficient, but also resilient to partial failure and misalignment.

But the technical hurdles in enterprise settings are just one layer. There are still fundamental questions to be answered. Can rollback points be learned or inferred dynamically, tuned to the nature and scope of the disruption? What’s the right evaluation framework for rollback in MAS—do we optimize for system uptime, recovery speed, agent utility, or something else entirely? And how do we build mechanisms that allow for human intervention without diminishing the agents’ autonomy yet still ensure overall system safety and compliance?

More broadly, we need ways to verify the correctness and safety of these rollback systems under real-world constraints, not just in simulated testbeds, especially in enterprise deployments where agents often interact with physical infrastructure or third-party systems. As such, this becomes more of a question of system aliment based on varying internal business processes and constraints. For now, there’s still a gap between what we can build and what we should build—building rollback into MAS at scale requires more than just resilient code. It’s still a test of how well we can align autonomous systems in a reliable, secure, and meaningfully integrated way against partial failures, adversarial inputs, and rapidly changing operational contexts.

Dean Mai 4/6/25 Dean Mai 4/6/25

Garbage Collection Tuning In Large-Scale Enterprise Applications

Garbage collection (GC) is one of those topics that feels like a solved problem until you scale it up to the kind of systems that power banks, e-commerce, logistics firms, and cloud providers. For many enterprise systems, GC is an invisible component: a background process that “just works.” But under high-throughput, latency-sensitive conditions, it surfaces as a first-order performance constraint. The market for enterprise applications is shifting: everyone’s chasing low-latency, high-throughput workloads, and GC is quietly becoming a choke point that separates the winners from the laggards.

Garbage collection (GC) is one of those topics that feels like a solved problem until you scale it up to the kind of systems that power banks, e-commerce, logistics firms, and cloud providers. For many enterprise systems, GC is an invisible component: a background process that “just works.” But under high-throughput, latency-sensitive conditions, it surfaces as a first-order performance constraint. The market for enterprise applications is shifting: everyone’s chasing low-latency, high-throughput workloads, and GC is quietly becoming a choke point that separates the winners from the laggards.

Consider a high-frequency trading platform processing orders in microseconds. After exhausting traditional performance levers (scaling cores, rebalancing threads, optimizing code paths), unexplained latency spikes persisted. The culprit? GC pauses—intermittent, multi-hundred-millisecond interruptions from the JVM's G1 collector. These delays, imperceptible in consumer applications, are catastrophic in environments where microseconds mean millions. Over months, the engineering team tuned G1, minimized allocations, and restructured the memory lifecycle. Pauses became predictable. The broader point is that GC, long relegated to the domain of implementation detail, is now functioning as an architectural constraint with competitive implications. In latency-sensitive domains, it functions less like background maintenance and more like market infrastructure. Organizations that treat it accordingly will find themselves with a structural advantage. Those that don’t risk falling behind.

Across the enterprise software landscape, memory management is undergoing a quiet but significant reframing. Major cloud providers—AWS, Google Cloud, and Azure—are increasingly standardizing on managed runtimes like Java, .NET, and Go, embedding them deeply across their platforms. Kubernetes clusters now routinely launch thousands of containers, each with its own runtime environment and independent garbage collector running behind the scenes. At the same time, workloads are growing more demanding—spanning machine learning inference, real-time analytics, and distributed databases. These are no longer the relatively simple web applications of the early 2000s, but complex, large-scale systems with highly variable allocation behavior. They are allocation-heavy, latency-sensitive, and highly bursty. As a result, the old mental ‘set a heap size, pick a collector, move on’ model for GC tuning is increasingly incompatible with modern workloads and breaking down. The market is beginning to demand more nuanced, adaptive approaches. In response, cloud vendors, consultancies, and open-source communities are actively exploring what modern memory management should look like at scale.

At its core, GC is an attempt to automate memory reclamation. It is the runtime’s mechanism for managing memory—cleaning up objects that are no longer in use. When memory is allocated for something like a trade order, a customer record, or a neural network layer, the GC eventually reclaims that space once it’s no longer needed. But the implementation is a compromise. In theory, this process is automatic and unobtrusive. In practice, it’s a delicate balancing act. The collector must determine when to run, how much memory to reclaim, and how to do so without significantly disrupting application performance. If it runs too frequently, it consumes valuable CPU resources. If it waits too long, applications can experience memory pressure and even out-of-memory errors. Traditional collection strategies—such as mark-and-sweep, generational, or copying collectors—each bring their own trade-offs. But today, much of the innovation is happening in newer collectors like G1, Shenandoah, ZGC, and Epsilon. These are purpose-built for scalability and low latency, targeting the kinds of workloads modern enterprises increasingly rely on. The challenge, however, is that these collectors are not truly plug-and-play. Their performance characteristics hinge on configuration details. Effective tuning often requires deep expertise and workload-specific knowledge—an area that’s quickly gaining attention as organizations push for more efficient and predictable performance at scale.

Take G1: the default garbage collector in modern Java. It follows a generational model, dividing the heap into young and old regions, but with a key innovation: it operates on fixed-size regions, allowing for incremental cleanup. The goal is to deliver predictable pause times—a crucial feature in enterprise environments where even a 500ms delay can have real financial impact. That said, G1 can be challenging to tune effectively. Engineers familiar with its inner workings know it offers a wide array of configuration options, each with meaningful trade-offs. Parameters like -XX:MaxGCPauseMillis allow developers to target specific latency thresholds, but aggressive settings can significantly reduce throughput. For instance, the JVM may shrink the heap or adjust survivor space sizes to meet pause goals, which can lead to increased GC frequency and higher allocation pressure. This often results in reduced throughput, especially under bursty or memory-intensive workloads. Achieving optimal performance typically requires balancing pause time targets with realistic expectations about allocation rates and heap sizing. Similarly, -XX:G1HeapRegionSize lets you adjust region granularity, but selecting an inappropriate value may lead to memory fragmentation or inefficient heap usage. Benchmark data from OpenJDK’s JMH suite, tested on a 64-core AWS Graviton3 instance, illustrates just how sensitive performance can be. In one case, an untuned G1 configuration resulted in 95th-percentile GC pauses of around 300ms. In one specific configuration and workload scenario, careful tuning reduced pauses significantly. The broader implication is clear: organizations with the expertise to deeply tune their runtimes unlock performance. Others leave it on the table.

Across the industry, runtime divergence is accelerating. .NET Core and Go are steadily gaining traction, particularly among cloud-native organizations. Each runtime brings its own approach to GC. The .NET CLR employs a generational collector with a server mode that strikes a good balance for throughput, but it tends to underperform in latency-sensitive environments. Go’s GC, on the other hand, is lightweight, concurrent, and optimized for low pause times—typically around 1ms or less (under typical workloads). However, it can struggle with memory-intensive applications due to its conservative approach to memory reclamation. Running a brief experiment with a Go-based microservice simulating a payment gateway (10,000 requests per second and a 1GB heap), with default settings, delivers 5ms pauses at the 99th percentile. By adjusting the GOMEMLIMIT setting to trigger more frequent cycles, it was possible to reduce pauses to 2ms, but this came at the cost of a 30% increase in memory usage (hough results will vary depending on workload characteristics). With many performance optimizations, these are the trade-offs and they’re workload-dependent.

Contemporary workloads are more erratic. Modern systems stream events, cache large working sets, and process thousands of concurrent requests. The traditional enterprise mainstay (CRUD applications interacting with relational databases) is being replaced by event-driven systems, streaming pipelines, and in-memory data grids. Technologies like Apache Kafka are now ubiquitous, processing massive volumes of logs, while Redis and Hazelcast are caching petabytes of state. These modern systems generate objects at a rapid pace, with highly variable allocation patterns: short-lived events, long-lived caches, and everything in between. In one case, a logistics company running a fleet management platform on Kubernetes, saw full GC pauses every few hours. Their Java pods were struggling with full garbage collections every few hours, caused by an influx of telemetry data. After switching to Shenandoah, Red Hat’s low-pause collector, they saw GC pauses drop from 1.2 seconds to just 50ms. However, the improvement came at a cost—CPU usage increased by 15%, and they needed to rebalance their cluster to prevent hotspots. This is becoming increasingly common: latency improvements now have architectural consequences.

Vendor strategies are also diverging. The major players—Oracle, Microsoft, and Google—are all aware that GC can be a pain point, though their approaches vary. Oracle is pushing ZGC in OpenJDK, a collector designed to deliver sub-millisecond pauses even on multi-terabyte heaps. It’s a compelling solution (benchmarks from Azul show it maintaining stable 0.5ms pauses on a 128GB heap under heavy load) but it can be somewhat finicky. It utilizes a modern kernel with huge pages enabled (doesn’t require them but performs better with them), and its reliance on concurrent compaction demands careful management to avoid excessive CPU usage. Microsoft’s .NET team has taken a more incremental approach, focusing on gradual improvements to the CLR’s garbage collector. While this strategy delivers steady progress, it lags behind the more radical redesigns seen in the Java ecosystem. Google’s Go runtime stands apart, with a GC built for simplicity and low-latency performance. It’s particularly popular with startups, though it can be challenging for enterprises with more complex memory management requirements. Meanwhile, niche players like Azul are carving out a unique space with custom JVMs. Their flagship product, Zing, combines ZGC-like performance (powered by Azul’s proprietary C4 collector comparable to ZGC in terms of pause times) with advanced diagnostics that many describe as exceptionally powerful. Azul’s “we tune it for you” value proposition seems to be resonating—their revenue grew over 95% over the past three years, according to their filings.

Consultancies are responding as well. The Big Four—Deloitte, PwC, EY, and KPMG—are increasingly building out teams with runtime expertise and now including GC tuning in digital transformation playbooks. Industry case studies illustrate the tangible benefits: one telco reportedly reduced its cloud spend by 20% by fine-tuning G1 across hundreds nodes, while a major retailer improved checkout latency by 100ms after migrating to Shenandoah. Smaller, more technically focused firms like ThoughtWorks are taking an even deeper approach, offering specialized profiling tools and tailored workshops for engineering teams. So runtime behavior is no longer a backend concern—it’s a P&L lever.

The open-source ecosystem plays a vital dual role in fueling the GC innovation while introducing complexity by fragmenting tooling. Many of today’s leading collectors such as Shenandoah, ZGC, and G1 emerged from OSS community-driven research efforts before becoming production-ready. However, a capability gap persists: tooling exists, but expertise is required to extract value from it. Utilities like VisualVM and Eclipse MAT provide valuable insights—heap dumps, allocation trends, and pause time metrics—but making sense of that data often requires significant experience and intuition. In one example, a 10GB heap dump from a synthetic workload revealed a memory leak caused by a misconfigured thread pool. While the tools surfaced the right signals, diagnosing and resolving the issue ultimately depended on hands-on expertise. Emerging projects like GCViewer and OpenTelemetry’s JVM metrics are improving visibility, but most enterprises still face a gap between data and diagnosis that’s increasingly monetized. For enterprises seeking turnkey solutions, the current open-source tooling often falls short. As a result, vendors and consultancies are stepping in to fill the gap—offering more polished, supported options, often at a premium.

One emerging trend worth watching: no-GC runtimes. Epsilon, a no-op collector available in OpenJDK, effectively disables garbage collection, allocating memory until exhaustion. While this approach is highly specialized, it has found a niche in environments where ultra-low latency is paramount, leverage it for short-lived, high-throughput workloads where every microsecond counts. It’s a tactical tool: no GC means no pauses, but also no safety net. In a simple benchmark of allocating 100 million objects on a 1GB heap, Epsilon delivered about 20% higher throughput than G1—in a synthetic, allocation-heavy workload designed to avoid GC interruptions—with no GC pauses until the heap was fully consumed. That said, this approach demands precise memory sizing, as there’s no safety net once the heap fills up. And since Epsilon does not actually perform GC, the JVM shuts down when the heap is exhausted. So in systems that handle large volumes of data and require high reliability, this behavior poses a significant risk. Running out of memory could lead to system crashes during critical operations, making it unsuitable for environments that demand continuous uptime and stability

Rust represents a divergence in runtime philosophy: its ownership model frontloads complexity in exchange for execution-time determinism. Its ownership model eliminates the need for garbage collection entirely, giving developers fine-grained control over memory. It’s gaining popularity in systems programming, though enterprise adoption remains slow—retraining teams accustomed to Java or .NET is often a multi-year effort. Still, these developments are prompting a quiet reevaluation in some corners of the industry. Perhaps the challenge isn’t just tuning GC, it’s rethinking whether we need it at all in certain contexts.

Directionally, GC is now part of the performance stack, not a postscript. The enterprise software market appears to be at an inflection point. Due to AI workloads, latency and throughput are no longer differentiators; there’s a growing shift toward predictable performance and manual memory control. In this landscape, GC is emerging as a more visible and persistent bottleneck. Organizations that invest in performance, whether through specialized talent, intelligent tooling, or strategic vendor partnerships, stand to gain a meaningful advantage. Cloud providers will continue refining their managed runtimes with smarter defaults, but the biggest performance gains will likely come from deeper customization. Consultancies are expected to expand GC optimization as a service offering, and we’ll likely see more specialized vendors like Azul carving out space at the edges. Open-source innovation will remain strong, though the gap between powerful raw tools and enterprise-ready solutions may continue to grow. And in the background, there may be a gradual shift toward no-GC alternatives as workloads evolve in complexity and scale. Hardware changes (e.g., AWS Graviton) amplify memory management pressure due to higher parallelism; with more cores there are more objects, and more stress on memory management systems. Ultimately, managed runtimes will improve, but improvements will mostly serve the median case. High-performance outliers will remain underserved—fertile ground for optimization vendors and open-source innovation.

For now, GC tuning doesn’t make headlines, but it does shape the systems that do as it increasingly defines the boundary between efficient, scalable systems and costly, brittle ones. The organizations that master memory will move faster, spend less, and scale cleaner. Those that don’t may find themselves playing catch-up—wondering why performance lags and operational expenses continue to climb. GC isn’t a solved problem. It’s a leverage point—in a market this dynamic, even subtle shifts in infrastructure performance can have a meaningful impact over time.

Dean Mai 12/13/24 Dean Mai 12/13/24

Specialization and Modularity in AI Architecture with Multi-Agent Systems

The evolution from monolithic large language models (mono-LLMs) to multi-agent systems (MAS) reflects a practical shift in how AI can be structured to address the complexity of real-world tasks. Mono-LLMs, while impressive in their ability to process vast amounts of information, have inherent limitations when applied to dynamic environments like enterprise operations.

The evolution from monolithic large language models (mono-LLMs) to multi-agent systems (MAS) reflects a practical shift in how AI can be structured to address the complexity of real-world tasks. Mono-LLMs, while impressive in their ability to process vast amounts of information, have inherent limitations when applied to dynamic environments like enterprise operations. They are inefficient for specialized tasks, requiring significant resources for even simple queries, and can be cumbersome to update and scale. Mono-LLMs are difficult to scale because every improvement impacts the entire system, leading to complex update cycles and reduced agility. Multi-agent systems, on the other hand, introduce a more modular and task-specific approach, enabling specialized agents to handle discrete problems with greater efficiency and adaptability.

This modularity is particularly valuable in enterprise settings, where the range of tasks—data analysis, decision support, workflow automation—requires diverse expertise. Multi-agent systems make it possible to deploy agents with specific capabilities, such as generating code, providing real-time insights, or managing system resources. For example, a compiler agent in an MAS setup is not just responsible for executing code but also participates in optimizing the process. By incorporating real-time feedback, the compiler can adapt its execution strategies, correct errors, and fine-tune outputs based on the context of the task. This is especially useful for software teams working on rapidly evolving projects, where the ability to test, debug, and iterate efficiently can translate directly into faster product cycles.

Feedback systems are another critical component of MAS, enabling these systems to adapt on the fly. In traditional setups, feedback loops are often reactive—errors are identified post hoc, and adjustments are made later. MAS integrate feedback as part of their operational core, allowing agents to refine their behavior in real-time. This capability is particularly useful in scenarios where decisions must be made quickly and with incomplete information, such as supply chain logistics or financial forecasting. By learning from each interaction, agents improve their accuracy and relevance, making them more effective collaborators in decision-making processes.

Memory management is where MAS ultimately demonstrate practical improvements. Instead of relying on static memory allocation, which can lead to inefficiencies in resource use, MAS employ predictive memory strategies. These strategies allow agents to anticipate their memory needs based on past behavior and current workloads, ensuring that resources are allocated efficiently. For enterprises, this means systems that can handle complex, data-heavy tasks without bottlenecks or delays, whether it’s processing customer data or running simulations for product design.

Collaboration among agents is central to the success of MAS. Inter-agent learning protocols facilitate this by creating standardized ways for agents to share knowledge and insights. For instance, a code-generation agent might identify a useful pattern during its operations and share it with a related testing agent, which could then use that information to improve its validation process. This kind of knowledge-sharing reduces redundancy and accelerates problem-solving, making the entire system more efficient. Additionally, intelligent cleanup mechanisms ensure that obsolete or redundant data is eliminated without disrupting ongoing operations, balancing resource utilization and system stability. Advanced memory management thus becomes a cornerstone of the MAS architecture, enabling the system to scale efficiently while maintaining responsiveness. It also makes MAS particularly well-suited for environments where cross-functional tasks are the norm, such as coordinating between sales, operations, and customer service in a large organization.

The infrastructure supporting MAS is designed to make these systems practical for enterprise use. Agent authentication mechanisms ensure that only authorized agents interact within the system, reducing security risks. Integration platforms enable seamless connections between agents and external tools, such as APIs or third-party services, while specialized runtime environments optimize the performance of AI-generated code. In practice, these features mean enterprises can deploy MAS without requiring a complete overhaul of their existing tech stack, making adoption more feasible and less disruptive.

Consider a retail operation looking to improve its supply chain. With MAS, the system could deploy agents to predict demand fluctuations, optimize inventory levels, and automate vendor negotiations, all while sharing data across the network to ensure alignment. Similarly, in a software development context, MAS can streamline workflows by coordinating code generation, debugging, and deployment, allowing teams to focus on strategic decisions rather than repetitive tasks.

What makes MAS particularly compelling is their ability to evolve alongside the organizations they serve. As new challenges emerge, agents can be updated or added without disrupting the entire system. This modularity makes MAS a practical solution for enterprises navigating the rapid pace of technological change. By focusing on specific, well-defined tasks and integrating seamlessly with existing workflows, MAS provide a scalable, adaptable framework that supports real-world operations.

This shift to multi-agent systems is not about replacing existing tools but enhancing them. By breaking down complex problems into manageable pieces and assigning them to specialized agents, MAS make it easier for enterprises to tackle their most pressing challenges. These systems are built to integrate, adapt, and grow, making them a practical and valuable addition to the toolkit of modern organizations.

Dean Mai 11/26/24 Dean Mai 11/26/24

Adopting Function-as-a-Service (FaaS) for AI workflows

Unstructured data encompasses a wide array of information types that do not conform to predefined data models or organized in traditional relational databases. This includes text documents, emails, social media posts, images, audio files, videos, and sensor data. The inherent lack of structure makes this data difficult to process using conventional methods, yet it often contains valuable insights that can drive innovation, improve decision-making, and enhance customer experiences.

Function-as-a-Service (FaaS) stands at the crossroads of cloud computing innovation and the evolving needs of modern application development. It isn’t just an incremental improvement over existing paradigms; it is an entirely new mode of thinking about computation, resources, and scale. In a world where technology continues to demand agility and abstraction, FaaS offers a lens to rethink how software operates in a fundamentally event-driven, modular, and reactive manner.

At its essence, FaaS enables developers to execute isolated, stateless functions without concern for the underlying infrastructure. The abstraction here is not superficial but structural. Traditional cloud models like Infrastructure-as-a-Service (IaaS) or even Platform-as-a-Service (PaaS) hinge on predefined notions of persistence—instances, containers, or platforms that remain idle, waiting for tasks. FaaS discards this legacy. Instead, computation occurs as a series of discrete events, each consuming resources only for the moment it executes. This operational principle aligns deeply with the physics of computation itself: using resources only when causally necessary.

To fully grasp the implications of FaaS, consider its architecture. The foundational layer is virtualization, which isolates individual functions. Historically, the field has relied on virtualization techniques like hypervisors and container orchestration to allocate resources effectively. FaaS narrows this focus further. Lightweight microVMs and unikernels are emerging as dominant trends, optimized to ensure rapid cold starts and reduced resource overhead. However, this comes at a cost: such architectures often sacrifice flexibility, requiring developers to operate within tightly controlled parameters of execution.

Above this virtualization layer is the encapsulation layer, which transforms FaaS into something that developers can tangibly work with. The challenge here is not merely technical but conceptual. Cold starts—delays caused by initializing environments from scratch—represent a fundamental bottleneck. Various techniques, such as checkpointing, prewarming, and even speculative execution, seek to address this issue. Yet, each of these solutions introduces trade-offs. Speculative prewarming may solve latency for a subset of tasks but at the cost of wasted compute. This tension exemplifies the core dynamism of FaaS: every abstraction must be balanced against the inescapable physics of finite resources.

The orchestration layer introduces complexity. Once a simple scheduling problem, orchestration in FaaS becomes a fluid, real-time process of managing unpredictable workloads. Tasks do not arrive sequentially but chaotically, each demanding isolated execution while being part of larger workflows. Systems like Kubernetes, originally built for containers, are evolving to handle this flux. In FaaS, orchestration must not only schedule tasks efficiently but also anticipate failure modes and latency spikes that could disrupt downstream systems. This is particularly critical for AI applications, where real-time responsiveness often defines the product’s value.

The final piece of the puzzle is the coordination layer, where FaaS bridges with Backend-as-a-Service (BaaS) components. Here, stateless functions are augmented with stateful abstractions—databases, message queues, storage layers. This synthesis enables FaaS to transcend its stateless nature, allowing developers to compose complex workflows. However, this dependency on external systems introduces fragility. Latency and failure are not isolated to the function execution itself but ripple across the entire ecosystem. This creates a fascinating systems-level challenge: how to design architectures that are both modular and resilient under stress.

What makes FaaS particularly significant is its impact on enterprise AI development. The state of AI today demands systems that are elastic, cost-efficient, and capable of real-time decision-making. FaaS fits naturally into this paradigm. Training a machine learning model may remain the domain of large-scale, distributed clusters, but serving inferences is a different challenge altogether. With FaaS, inference pipelines can scale dynamically, handling sporadic spikes in demand without pre-provisioning costly infrastructure. This elasticity fundamentally changes the economics of deploying AI systems, particularly in industries where demand patterns are unpredictable.

Cost is another dimension where FaaS aligns with the economics of AI. The pay-as-you-go billing model eliminates the sunk cost of idle compute. Consider a fraud detection system in finance: the model is invoked only when a transaction occurs. Under traditional models, the infrastructure to handle such transactions would remain operational regardless of workload. FaaS eliminates this inefficiency, ensuring that resources are consumed strictly in proportion to demand. However, this efficiency can sometimes obscure the complexities of cost prediction. Variability in workload execution times or dependency latencies can lead to unexpected billing spikes, a challenge enterprises are still learning to navigate.

Timeouts also impose a hard ceiling on execution in most FaaS environments, often measured in seconds or minutes. For many AI tasks—especially inference pipelines processing large inputs or models requiring nontrivial preprocessing—these limits can become a structural constraint rather than a simple runtime edge case. Timeouts force developers to split logic across multiple functions, offload parts of computation to external services, or preemptively trim the complexity of their models. These are engineering compromises driven not by the shape of the problem, but by the shape of the platform.

Perhaps the most profound impact of FaaS on AI is its ability to reduce cognitive overhead for developers. By abstracting infrastructure management, FaaS enables teams to iterate on ideas without being burdened by operational concerns. This freedom is particularly valuable in AI, where rapid experimentation often leads to breakthroughs. Deploying a sentiment analysis model or an anomaly detection system no longer requires provisioning servers, configuring environments, or maintaining uptime. Instead, developers can focus purely on refining their models and algorithms.

But the story of FaaS is not without challenges. The reliance on statelessness, while simplifying scaling, introduces new complexities in state management. AI applications often require shared state, whether in the form of session data, user context, or intermediate results. Externalizing this state to distributed storage or databases adds latency and fragility. While innovations in distributed caching and event-driven state reconciliation offer partial solutions, they remain imperfect. The dream of a truly stateful FaaS model—one that maintains the benefits of statelessness while enabling efficient state sharing—remains an open research frontier.

Cold start latency is another unsolved problem. AI systems that rely on real-time inference cannot tolerate delays introduced by environment initialization. For example, a voice assistant processing user queries needs to respond instantly; any delay breaks the illusion of interactivity. Techniques like prewarming instances or relying on lightweight runtime environments mitigate this issue but cannot eliminate it entirely. The physics of computation imposes hard limits on how quickly environments can be instantiated, particularly when security isolation is required.

Vendor lock-in is a systemic issue that pervades FaaS adoption where currently each cloud provider builds proprietary abstractions, tying developers to specific APIs, runtimes, and pricing models. While open-source projects like Knative and OpenFaaS aim to create portable alternatives, they struggle to match the integration depth and ecosystem maturity of their commercial counterparts. This tension between portability and convenience is a manifestation of the broader dynamics in cloud computing.

Looking ahead, the future of FaaS I believe will be defined by its integration with edge computing. As computation migrates closer to the source of data generation, the principles of FaaS—modularity, event-driven execution, ephemeral state—become increasingly relevant. AI models deployed on edge devices, from autonomous vehicles to smart cameras, will rely on FaaS-like paradigms to manage local inference tasks. This shift will not only redefine the boundaries of FaaS but also force the development of new orchestration and coordination mechanisms capable of operating in highly distributed environments.

In reflecting on FaaS, one cannot ignore its broader almost philosophical implications. At its heart, FaaS is an argument about the nature of computation: that it is not a continuous resource to be managed but a series of discrete events to be orchestrated. This shift reframes the role of software itself, not as a persistent entity but as a dynamic, ephemeral phenomenon.

Dean Mai 6/29/24 Dean Mai 6/29/24

Architectural Paradigms for Scalable Unstructured Data Processing in Enterprise

Unstructured data encompasses a wide array of information types that do not conform to predefined data models or organized in traditional relational databases. This includes text documents, emails, social media posts, images, audio files, videos, and sensor data. The inherent lack of structure makes this data difficult to process using conventional methods, yet it often contains valuable insights that can drive innovation, improve decision-making, and enhance customer experiences.

Unstructured data encompasses a wide array of information types that do not conform to predefined data models or organized in traditional relational databases. This includes text documents, emails, social media posts, images, audio files, videos, and sensor data. The inherent lack of structure makes this data difficult to process using conventional methods, yet it often contains valuable insights that can drive innovation, improve decision-making, and enhance customer experiences. The rise of generative AI and large language models (LLMs) has further emphasized the importance of effectively managing unstructured data. These models require vast amounts of diverse, high-quality data for training and fine-tuning. Additionally, techniques like retrieval-augmented generation (RAG) rely on the ability to efficiently search and retrieve relevant information from large unstructured datasets.

Architectural Considerations for Unstructured Data Systems In Enterprises

Data Ingestion and Processing Architecture. The first challenge in dealing with unstructured data is ingestion. Unlike structured data, which can be easily loaded into relational databases, unstructured data requires specialized processing pipelines. These pipelines must be capable of handling a variety of data formats and sources, often in real-time or near-real-time, and at massive scale. For modern global enterprises, it’s crucial to design the ingestion architecture with global distribution in mind.‍

Text-based Data. Natural language processing (NLP) techniques are essential for processing text-based data. This includes tokenization, part-of-speech tagging, named entity recognition, and sentiment analysis. Modern NLP pipelines often leverage deep learning models, such as BERT or GPT, which can capture complex linguistic patterns and context. At enterprise scale, these models may need to be deployed across distributed clusters to handle the volume of incoming data. Startups like Hugging Face provide transformer-based models that can be fine-tuned for specific enterprise needs, enabling sophisticated text analysis and generation capabilities.

Image and Video Data. Computer vision algorithms are necessary for processing image and video data. These may include convolutional neural networks (CNNs) for image classification and object detection, or more advanced architectures like Vision Transformers (ViT) for tasks requiring understanding of spatial relationships. Processing video data, in particular, requires significant computational resources and may benefit from GPU acceleration. Notable startups such as OpenCV.ai are innovating in this space by providing open-source computer vision libraries and tools that can be integrated into enterprise workflows. Companies like Roboflow and Encord offer an end-to-end computer vision platform providing tools for data labeling, augmentation, and model training, making it easier for enterprises to build custom computer vision models. Their open-source YOLOv5 implementation has gained significant traction in the developer community. Voxel51 is tackling unstructured data retrieval in computer vision with their open-source FiftyOne platform, which enables efficient management, curation, and analysis of large-scale image and video datasets. Coactive is leveraging unstructured data retrieval across multiple modalities with their neural database technology, designed to efficiently store and query diverse data types including text, images, and sensor data.
Audio Data. Audio data presents its own set of challenges, requiring speech-to-text conversion for spoken content and specialized audio analysis techniques for non-speech sounds. Deep learning models like wav2vec and HuBERT have shown promising results in this domain. For enterprises dealing with large volumes of audio data, such as call center recordings, implementing a distributed audio processing pipeline is crucial. Companies like Deepgram and AssemblyAI are leveraging end-to-end deep learning models to provide accurate and scalable speech recognition solutions.

To handle the diverse nature of unstructured data, organizations should consider implementing a modular, event-driven ingestion architecture. This could involve using Apache Kafka or Apache Pulsar for real-time data streaming, coupled with specialized processors for each data type. RedPanda built an open-source data streaming platform designed to replace Apache Kafka with lower latency and higher throughput. Containerization technologies like Docker and orchestration platforms like Kubernetes can provide the flexibility needed to scale and manage these diverse processing pipelines. Graphlit build a data platform designed for spatial and unstructured data files automating complex data workflows, including data ingestion, knowledge extraction, LLM conversations, semantic search, and application integrations.

Data Storage and Retrieval. Traditional relational databases are ill-suited for storing and querying large volumes of unstructured data. Instead, organizations must consider a range of specialized storage solutions. For raw unstructured data, object storage systems like Amazon S3, Google Cloud Storage, or Azure Blob Storage provide scalable and cost-effective options. These systems can handle petabytes of data and support features like versioning and lifecycle management. MinIO developed an open-source, high-performance, distributed object storage system designed for large-scale unstructured data. For semi-structured data, document databases like MongoDB or Couchbase offer flexible schemas and efficient querying capabilities. These are particularly useful for storing JSON-like data structures extracted from unstructured sources. SurrealDB is a multi-model, cloud-ready database allows developers and organizations to meet the needs of their applications, without needing to worry about scalability or keeping data consistent across multiple different database platforms, making it suitable for modern and traditional applications. As machine learning models increasingly represent data as high-dimensional vectors, vector databases have emerged as a crucial component of the unstructured data stack. Systems like LanceDB, Marqo, Milvus, and Vespa are designed to efficiently store and query these vector representations, enabling semantic search and similarity-based retrieval. For data with complex relationships, graph databases like Neo4j or Amazon Neptune can be valuable. These are particularly useful for representing knowledge extracted from unstructured text, allowing for efficient traversal of relationships between entities. TerminusDB, an open-source graph database, can be used for representing and querying complex relationships extracted from unstructured text. This approach is particularly useful for enterprises needing to traverse relationships between entities efficiently. Kumo AI developed graph machine learning-centered AI platform that uses LLMs and graph neural networks (GNNs) designed to manage large-scale data warehouses, integrating ML between modern cloud data warehouses and AI algorithms infrastructure to simplify the training and deployment of models on both structured and unstructured data, enabling businesses to make faster, simpler, and more accurate predictions. Roe AI has built AI-powered data warehouse to store, process, and query unstructured data like documents, websites, images, videos, and audio by providing multi-modal data extraction, data classification and multi-modal RAG via Roe’s SQL engine.

When designing the storage architecture, it’s important to consider a hybrid approach that combines these different storage types. For example, raw data might be stored in object storage, processed information in document databases, vector representations in vector databases, and extracted relationships in graph databases. This multi-modal storage approach allows for efficient handling of different query patterns and use cases.

Data Processing and Analytics. Processing unstructured data at scale requires distributed computing frameworks capable of handling large volumes of data. Apache Spark remains a popular choice due to its versatility and extensive ecosystem. For more specialized workloads, frameworks like Ray are gaining traction, particularly for distributed machine learning tasks. For real-time processing, stream processing frameworks like Apache Flink or Kafka Streams can be employed. These allow for continuous processing of incoming unstructured data, enabling real-time analytics and event-driven architectures. When it comes to analytics, traditional SQL-based approaches are often insufficient for unstructured data. Instead, architecture teams should consider implementing a combination of techniques including (i) engines like Elasticsearch or Apache Solr provide powerful capabilities for searching and analyzing text-based unstructured data; (ii) for tasks like classification, clustering, and anomaly detection, machine learning models can be deployed on processed unstructured data. Frameworks like TensorFlow and PyTorch, along with managed services like Google Cloud AI Platform or Amazon SageMaker, can be used to train and deploy these models at scale; (iii) for data stored in graph databases, specialized graph analytics algorithms can uncover complex patterns and relationships. OmniAI developed a data transformation platform designed to convert unstructured data into accurate, tabular insights while maintaining control over their data and infrastructure. Roe AI

To enable flexible analytics across different data types and storage systems, architects should consider implementing a data virtualization layer. Technologies like Presto or Dremio can provide a unified SQL interface across diverse data sources, simplifying analytics workflows. Vectorize is developing a streaming database for real-time AI applications to bridge the gap between traditional databases and the needs of modern AI systems, enabling real-time feature engineering and inference.

Data Governance and Security. Unstructured data often contains sensitive information, making data governance and security critical considerations. Organizations must implement robust mechanisms for data discovery, classification, and access control. Automated data discovery and classification tools such as Sentra Security, powered by machine learning, can scan unstructured data to identify sensitive information and apply appropriate tags. These tags can then be used to enforce access policies and data retention rules. For access control, attribute-based access control (ABAC) systems are well-suited to the complex nature of unstructured data. ABAC allows for fine-grained access policies based on attributes of the data, the user, and the environment. Encryption is another critical component of securing unstructured data. This includes both encryption at rest and in transit. For particularly sensitive data, consider implementing field-level encryption, where individual elements within unstructured documents are encrypted separately.

Emerging Technologies and Approaches

LLMs like GPT-3 and its successors have demonstrated remarkable capabilities in understanding and generating human-like text. These models can be leveraged for a wide range of tasks, from text classification and summarization to question answering and content generation. For enterprises, the key challenge remains adapting these models to domain-specific tasks and data. Techniques like fine-tuning and prompt engineering allow for customization of pre-trained models. Additionally, approaches like retrieval-augmented generation (RAG) enable these models to leverage enterprise-specific knowledge bases, improving their accuracy and relevance. Implementing a modular architecture that allows for easy integration of different LLMs and fine-tuned variants might involve setting up model serving infrastructure using frameworks like TensorFlow Serving or Triton Inference Server, coupled with a caching layer to improve response times. Companies like Unstructured use open-source libraries and application programming interfaces to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines, enabling clients to transform simple data into language data and write it to a destination (vector database or otherwise).

Multi-modal AI Models. As enterprises deal with diverse types of unstructured data, multi-modal AI models that can process and understand different data types simultaneously are becoming increasingly important. Models like CLIP (Contrastive Language-Image Pre-training) demonstrate the potential of combining text and image understanding. To future proof organizational agility, systems need to be designed to handle multi-modal data inputs and outputs, potentially leveraging specialized hardware like GPUs or TPUs for efficient processing as well as implementing a pipeline architecture that allows for parallel processing of different modalities, with a fusion layer that combines the results. Adept AI is working on AI models that can interact with software interfaces, potentially changing how enterprises interact with their digital tools, combining language understanding with the ability to take actions in software environments. In the defense sector, Helsing AI is developing advanced AI systems for defense and national security applications that process and analyze vast amounts of unstructured sensor data in real-time, integrating information from diverse sources such as radar, electro-optical sensors, and signals intelligence to provide actionable insights in complex operational environments. In industrial and manufacturing sectors, Archetype AI offers a multimodal AI foundation model that fuses real-time sensor data with natural language, enabling individuals and organizations to ask open-ended questions about their surroundings and take informed action for improvement.

Federated Learning. For enterprises dealing with sensitive or distributed unstructured data, federated learning offers a way to train models without centralizing the data. This approach allows models to be trained across multiple decentralized devices or servers holding local data samples, without exchanging them. Implementing federated learning however requires careful design, including mechanisms for model aggregation, secure communication, and differential privacy to protect individual data points. Frameworks like TensorFlow Federated or PySyft can be used to implement federated learning systems. For example, in the space of federated learning for healthcare and life sciences, Owkin enables collaborative research on sensitive medical data without compromising privacy.

Synthetic Data Generation. The scarcity of labeled unstructured data for specific domains or tasks can be a significant challenge. Synthetic data generation, often powered by generative adversarial networks (GANs) or other generative models, may offer a solution to this problem. Incorporating synthetic data generation pipelines into machine learning workflows might involve setting up separate infrastructure for data generation and validation, ensuring that synthetic data matches the characteristics of real data while avoiding potential biases. RAIC Labs is developing technology for rapid AI modeling with minimal data. Their RAIC (Rapid Automatic Image Categorization) platform can generate and categorize synthetic data, potentially solving the cold start problem for many machine learning applications.

Knowledge Graphs. Knowledge graphs offer a powerful way to represent and reason about information extracted from unstructured data. Startups like Diffbot are developing automated knowledge graph construction tools that use natural language processing, entity resolution, and relationship extraction techniques to build rich knowledge graphs. These graphs capture the semantics of unstructured data, enabling efficient querying and reasoning about the relationships between entities. Implementing knowledge graphs involves (i) entity extraction and linking to identify and disambiguate entities mentioned in unstructured text; (ii) relationship extraction to determine the relationships between entities; (iii) ontology management to define and maintain the structure of the knowledge graph; and (iv) graph storage and querying for efficiently storing and querying the resulting graph structure. Businesses should consider using a combination of machine learning models for entity and relationship extraction, coupled with specialized graph databases for storage. Technologies like RDF (Resource Description Framework) and SPARQL can be used for semantic representation and querying.

While the potential of unstructured data is significant, several challenges must be addressed with most important are scalability, data quality and cost. Processing and analyzing large volumes of unstructured data requires significant computational resources. Systems must be designed that can scale horizontally, leveraging cloud resources and distributed computing frameworks. Unstructured data often contains noise, inconsistencies, and errors. Implementing robust data cleaning and validation pipelines is crucial for ensuring the quality of insights derived from this data. Galileo developed an engine that processes unlabeled data to automatically identify error patterns and data gaps in the model, enabling organizations to improve efficiencies, reduce costs, and mitigate data biases. Cleanlab developed an automated data-centric platform designed to help enterprises improve the quality of datasets, diagnose or fix issues and produce more reliable machine learning models by cleaning labels and supporting finding, quantifying, and learning data issues. Processing and storing large volumes of unstructured data can be expensive. Implementing data lifecycle management, tiered storage solutions, and cost optimization strategies is crucial for managing long-term costs. For example, Bem’s data interface transforms any input into ready-to-use data, eliminating the need for costly and time-consuming manual processes. Lastly, as machine learning models become more complex, ensuring interpretability of results becomes challenging. Techniques like SHAP (SHapley Additive exPlanations) values or LIME (Local Interpretable Model-agnostic Explanations) can be incorporated into model serving pipelines to provide explanations for model predictions. Unstructured data also often contains sensitive information, and AI models trained on this data can perpetuate biases. Architects must implement mechanisms for bias detection and mitigation, as well as ensure compliance with data protection regulations.

Unstructured data presents both significant challenges and opportunities for enterprises. By implementing a robust architecture that can ingest, store, process, and analyze diverse types of unstructured data, enterprises can unlock valuable insights and drive innovation. Businesses must stay abreast of emerging technologies and approaches, continuously evolving their data infrastructure to handle the growing volume and complexity of unstructured data. By combining traditional data management techniques with cutting-edge AI and machine learning approaches, enterprises can build systems capable of extracting maximum value from their unstructured data assets. As the field continues to evolve rapidly, flexibility and adaptability should be key principles in any unstructured data architecture. By building modular, scalable systems that can incorporate new technologies and handle diverse data types, enterprises can position themselves to leverage the full potential of unstructured data in the years to come.

Dean Mai 3/12/24 Dean Mai 3/12/24

AI Roadmap for Enterprise Adoption

While the advent of ChatGPT sparked tremendous excitement for AI’s transformative potential, practical implementation reveals that sophisticated enterprise adoption demands more than just large language models (LLMs). Leading organizations now recognize the importance of model diversity – integrating proprietary, third-party and task-specific models. This evolving multi-model approach creates massive potential for startups to develop foundational tools and drive the advancement of enterprise AI into the next era.

In today’s fast-paced technological landscape, artificial intelligence stands as a powerful catalyst reshaping industries and presenting new solutions to longstanding challenges. Projections indicate the AI market could soar to nearly $740 billion by 2030 and contribute a staggering $15.7 trillion to the global economy. While the advent of ChatGPT sparked tremendous excitement for AI’s transformative potential, practical implementation reveals that sophisticated enterprise adoption demands more than just large language models (LLMs). Leading organizations now recognize the importance of model diversity—integrating proprietary, third-party and task-specific models. This evolving multi-model approach creates massive potential for startups to develop foundational tools and drive the advancement of enterprise AI into the next era. For nascent enterprises navigating the complexities of AI and emerging technologies, achieving success hinges on precise execution and continuous adaptation.

The Starting Point: Data

Every organization’s AI journey begins with its data. Most established enterprises have spent decades accumulating valuable datasets that exist entirely apart from the public internet. Customer support logs, point-of-sale transactions, IoT sensor data and EMR medical records – these business-specific datasets are the lifeblood for training enterprise AI models. However, in many cases this data resides in legacy on-premise systems built for transactional workloads, rather than analytics or machine learning. Much of it also contains sensitive personally identifiable information (PII) that requires careful data governance. Preparing these vast troves of enterprise data for AI presents a significant yet underexploited opportunity.

The challenge of data preparation gives rise to what some term “Data Ops 2.0”—a next-generation data engineering paradigm dedicated to priming data for advanced analytics and AI. This involves considerable efforts in labeling, cleaning, normalization and beyond. Startups are now emerging with innovative solutions to expedite, automate and scale this pivotal data preparation step across massive datasets. As next-generation AI models demand more data, the tools and infrastructure for rapidly preparing enterprise data will grow in strategic importance. Startups adept at transforming raw enterprise data into high-quality training data will emerge as pillars of the AI ecosystem.

Building Custom Foundation Models

Armed with clean and structured data, many organizations are keen to train proprietary foundation models in line with their specific industries or use cases. For example, a credit card company may harness decades of transaction data to train custom models that detect fraud, minimize risk or craft personalized customer incentives. Similarly, an insurance firm might train proprietary underwriting models using claims and policyholder data. While general-purpose LLMs like GPT-3 offer a strong starting point, they fall short in matching the specificity of models optimized for a company’s unique data assets.

Successfully training large proprietary models requires considerable technical expertise along with specialized infrastructure. Startups have emerged to democratize access to scalable infrastructure for in-house model development, employing techniques such as distributed training across GPU server clusters. Meanwhile, other startups provide turnkey solutions and managed services to assist enterprises in training custom foundation models on petabyte-scale internal datasets. As model sizes continue to swell from billions to trillions of parameters, the ability to efficiently train proprietary models will increasingly become a competitive advantage. Startups that provide superior tools and infrastructure to unlock the value in enterprise data will flourish in the market.

The AI Infrastructure Boom

Effectively training large ML models requires specialty hardware, notably high-performance GPUs. As the appetite for AI accelerates, demand for hardware like GPUs has skyrocketed, leading to supply shortages, long lead times and exorbitant costs. For instance, Nvidia's flagship H100 AI GPUs sell for over $50,000 on secondary markets, while AMD’s competing MI300X starts at $10,000-15,000. This discrepancy between supply and demand has fueled tremendous growth for startups innovating across the AI infrastructure stack.

In hardware, certain startups provide specialized enclosures packed with dense GPU servers and advanced liquid cooling systems, optimal for AI workloads. Other startups offer AI-tailored Kubernetes solutions, streamlining the setup and oversight of distributed training infrastructure. At the forefront of innovation, emerging chip startups are pioneering novel architectures like GPNPUs, TPUs, IPUs and Neurosynaptic chips. These present alternatives to GPUs for ML training and inference. As AI continues to permeate various industries, the demand for advanced infrastructure is poised for growth.

Moreover, AI adoption is shifting towards edge-based solutions, where models are deployed directly on devices rather than centralized cloud data centers to facilitate real-time decision making. This trend is driving innovation in model compression and the development of efficient inference chips tailored for edge devices. Startups that can facilitate on-device ML while preserving model accuracy stand to gain significantly. In this realm, Myriad forged a partnership with Quadric, a developer specializing in GPNPU architecture optimized for on-device artificial intelligence computing.

Enhancing Public Foundation Models

While some organizations invest in proprietary foundation models, many others leverage publicly available models like GPT-3 as a starting point. However, applying these general-purpose models often falls short of delivering satisfactory results straight out of the box. Consequently, a new category of “model tuning” has emerged. Existing public LLMs are now often fine-tuned on an enterprise’s internal data to create customized solutions. For example, an e-commerce company could refine an open-source product description model based on their catalog data.

Startups have surfaced to provide managed services and tools that simplify the process of tuning and adapting public foundation models for various business use cases. Rather than investing in training models from scratch, this fine-tuning approach allows companies to augment the “pre-trained smarts” of public LLMs at a fraction of the cost and time. As published models rapidly improve due to open-source competition, enhancing them through transfer learning presents a compelling option for many enterprises embarking on their AI journey.

MLOps: From Experimentation to Production

Foundation models—whether fully custom or fine-tuned—wield significant power. However, transitioning them into large-scale production requires considerable software infrastructure. MLOps platforms have emerged to streamline the end-to-end lifecycle of deploying and managing machine learning in production. Nevertheless, these conventional MLOps tools must undergo adaptation to address the unique complexities of LLMs. With their colossal size, insatiable data requirements and sensitivity to latency purpose-built solutions become imperative.

Startups are racing to build LLM-specific MLOps or LLMOps stacks, facilitating the seamless deployment of models to meet enterprise demands for scalability, reliability and compliance. Ensuring robust model monitoring, explainability and governance is crucial as organizations build trust in AI systems and mitigate risks. LLMOps solutions tailored to oversee and optimize the effective utilization of foundation models stand to seize a massive opportunity as LLMs integrate into business workflows.

Augmenting Foundation Models

LLMs boast immense power, but are also bound by inherent limitations. Their memory capacity is constrained by long-term context, they lack native commonsense reasoning, and they stumble when confronted with tasks requiring symbolic mathematical reasoning. This has spurred innovation in techniques that complement and enhance LLMs:

Neuro-symbolic AI combines the pattern recognition strengths of neural networks with formal logic and knowledge representation. Startups are pioneering new advancements to improve reasoning and explainability.
Reinforcement learning-based (RL) models learn through trial-and-error interactions, rather than static training data. Startups are leveraging RL in fields including robotics, scheduling and others.
Retrieval-augmented models incorporate external knowledge bases to supplement LLMs’ limited memory and knowledge. Startups are driving innovation in semantic search, knowledge graphs and LLM enhancement.

Embracing a portfolio approach—combining task-specific models, integrating hybrid techniques and strategically applying complementary AI methods alongside foundation models—represents the future of enterprise AI. Startups facilitating this integrated, multi-modal AI approach will power the next generation of intelligent business applications.

The Way Forward: AI Diversity & Infrastructure

It’s clear that building impactful enterprise AI goes beyond simply adopting individual public foundation models. Leading organizations recognize the importance of model diversity – from proprietary and third party to fine-tuned and specialized, integrated and augmented. Succeeding in this emerging world of heterogeneous, multi-model AI demands a robust underlying infrastructure stack.

From handling petabyte-scale datasets to managing distributed training clusters, and ensuring model deployment observability to monitoring compliance, enterprises require capabilities across the full AI lifecycle. This greenfield opportunity extending beyond foundational models has given rise to revolutionary startups, from cutting-edge chips to LLM-Ops software to Industry 4.0 solution providers. By providing the picks and shovels to support model diversity and simplify infrastructure complexity, these startups will power the next phase of enterprise AI adoption.

The bottom line: while public LLMs have dominated the headlines, practical business adoption requires a much broader and deeper range of AI capabilities. Employing an ensemble approach with a variety of integrated models calls for extensive tooling and infrastructure. Across the tech stack—from data preparation to training systems to operations software—this multi-model reality presents a massive market opportunity. Startups that deliver the capabilities to handle model diversity while simplifying complex infrastructure will thrive in the coming Cambrian explosion of enterprise AI adoption.

Dean Mai 2/11/24 Dean Mai 2/11/24

Navigating the Future Security Landscape with a SecOps Cloud Platform

The field of information security is constantly evolving, marked by the continuous emergence of new technologies, threats, and regulations. With generative AI, shifts toward early application security measures, and post-decryption Network Detection and Response (NDR) continuing to rise, 2024 is poised to present new, ever-evolving risks and an increase in ransomware globally.

The field of information security is constantly evolving, marked by the continuous emergence of new technologies, threats, and regulations. With generative AI, shifts toward early application security measures, and post-decryption Network Detection and Response (NDR) continuing to rise, 2024 is poised to present new, ever-evolving risks and an increase in ransomware globally.

These new trends are significantly shaping how organizations approach security strategy and operations. However, as threats – ranging from supply chain attacks to AI-driven phishing – continue to evolve, the security landscape is poised to undergo even further transformation in the near future.

In this complex and changing environment, having flexible and adaptable security architecture is critical. This is precisely where LimaCharlie's SecOps Cloud Platform proves invaluable. As a cloud-native security orchestration platform, it offers the versatility and agility necessary for organizations to navigate evolving security paradigms and seamlessly integrate disparate tools into a unified framework.

Converging Process and Technology with Security Orchestration

Many institutions have accumulated mountains of disjointed security tools. This results in fragmented visibility, manual processes, and inefficient workflows. Security teams now need a solution to seamlessly manage these technologies and workflows.

LimaCharlie is the ideal hub for security orchestration. The platform collects and standardizes data from various tools into a central data lake through APIs and log ingestion. This unified dataset drives process automation to streamline detection, investigation, and mitigation efforts. The SecOps Cloud Platform leverages pre-built integrations with leading incident response platforms to easily construct playbooks that chain together capabilities across vendors and align security processes and technologies into a cohesive unit.

Centralized Orchestration for Hybrid Security Operations

As more entities embrace hybrid and multi-cloud infrastructures to gain visibility across environments and coordinate security controls, they risk data segregation. The SecOps Cloud Platform addresses this by breaking down data and tool silos, providing security teams with a centralized orchestration layer.

The platform ingests and normalizes data from on-premise security information and event management systems (SIEMs), SaaS solutions, and endpoint agents to create a unified dataset. This is the foundation for AI-driven detection, automated response playbooks, and federated search across security domains. Having a cloud-based orchestration platform is the only scalable way to gain visibility and control in today's hybrid distributed environments. It also makes it easy to layer on new security capabilities as threats and infrastructure evolve.

Gaining visibility into hybrid infrastructure is crucial for security, but collecting and storing massive amounts of security data can become prohibitively expensive. Ideally, data ingestion and retention should align with usage patterns. LimaCharlie employs just-in-time retrieval, allowing querying and selective retrieval of historical data from endpoints as needed for investigations. This approach minimizes the cost of retaining all telemetry indefinitely in warm storage. Lightweight endpoint agents are strategically deployed to critical assets, rather than exhaustively across all systems. Network traffic analysis focuses on extracting metadata like flows rather than full packet capture. Together, these techniques balance visibility and economics for sustainable security across hybrid infrastructure.

Embracing Elasticity with Cloud-Native Security

Legacy security appliances and on-premise management consoles make it hard to adopt ephemeral cloud infrastructure or adjust capacity over time. Modern security demands solutions designed for the cloud. As a cloud-native platform, LimaCharlie provides the elasticity and agility necessary for dynamic environments. Its multi-tenant architecture seamlessly scales on demand to accommodate massive workloads across various customers.

Unlike siloed products, LimaCharlie offers a suite of microservices that can be flexibly chained together. This architecture allows for quick deployment or removal of new capabilities as needed. Consequently, organizations, especially those prioritizing cloud-first approaches, can easily adjust their security posture in response to evolving needs.

Shifting Security Left in the App Dev Lifecycle

As the threat landscape evolves, organizations are prioritizing application security, particularly in light of the rise in supply chain attacks. Attempting to address security concerns after applications get built is ineffective. Instead, there’s a growing recognition of the need to integrate security practices and testing earlier in the development lifecycle – a concept often referred to as “shift left.”

This approach demands close integration between security tools and developer environments. LimaCharlie facilitates this integration by providing API-level hooks into the software delivery pipeline. Security checks such as Static Application Security Testing (SAST), Dynamic Application Security Testing, (DAST), and Software Composition Analysis (SCA) can be directly woven into the Continuous Integration/Continuous Deployment (CI/CD) process, enabling rapid identification and resolution of issues.

At the same time, its integration runtime protection and posture management capabilities, “shift right”, ensure security measures extend beyond the build stage. The LimaCharlie agent injects inline controls into running applications to prevent and respond to attacks. It also continuously monitors production environments for risky configurations or unauthorized changes. Together, these “shift left” and “shift right” measures create a seamless AppSec lifecycle powered through the SecOps Cloud Platform.

Flexibility for Detection Engineering and MDR

As detection engineering and Managed Detection Response (MDR) services gain prominence, security teams need greater flexibility and customization in implementing detection and response mechanisms, rather than being constrained by pre-packaged vendor modules. LimaCharlie enables this shift by providing easy access to security data through APIs.

This capability empowers detection engineers to rapidly build and refine custom detections tailored to the organization's unique environment. It also allows MDRs to more easily integrate client data into their existing Security Operation Center (SOC) workflows. The platform's microservices architecture enables organizations to leverage as much or as little functionality as they need. This contrasts with monolithic security suites that compel customers to adopt all components of a vendor's stack. With LimaCharlie, organizations retain autonomy over the selection and configuration of capabilities, offering a superior level of control and adaptability in security operations.

Enabling MDR Services to Scale and Customize

Modern organizations are turning to MDR services to monitor alerts and augment security capabilities. But traditional MDR solutions often lack customization, relying on a fixed stack of tools. The SecOps Cloud Platform changes this paradigm by allowing open but secure access to data. MDRs leverage APIs to ingest client telemetry into their existing SOC systems and tailor detections based on specifics of an organization's infrastructure and risks. LimaCharlie ensures consistency of data and tooling across an MDR provider's different customers. The platform normalizes and streams data in a common schema rather than different tools and formats. This allows MDRs to industrialize and scale their services rapidly.

We back visionary companies that are strategically positioned to lead their markets – especially in next-generation industries. As the information security industry integrates AI capabilities and faces unprecedented challenges, our network of corporate titans and top-tier venture capitalists is poised to support LimaCharlie’s long-term vision for success.

Constant changes in technology and threat trends are fundamentally reshaping our information security strategies. While the cybersecurity landscape will continue to rapidly evolve, LimaCharlie helps organizations rapidly adapt and finally stay ahead of tomorrow's threat actors.

Dean Mai 1/11/23 Dean Mai 1/11/23

Edge Computing and the Internet of Things: Investing in the Future of Autonomy

One of the most ubiquitous technological advancements making its way into devices we use every single day is autonomy. Autonomous technology via the use of artificial intelligence (AI) and machine learning (ML) algorithms enables core functions without human interference. As the adoption of ML becomes more widespread, more businesses are using ML models to support mission-critical operational processes. This increasing reliance on ML has created a need for real-time capabilities to improve accuracy and reliability, as well as reduce the feedback loop.

One of the most ubiquitous technological advancements making its way into devices we use every single day is autonomy. Autonomous technology via the use of artificial intelligence (AI) and machine learning (ML) algorithms enables core functions without human interference. As the adoption of ML becomes more widespread, more businesses are using ML models to support mission-critical operational processes. This increasing reliance on ML has created a need for real-time capabilities to improve accuracy and reliability, as well as reduce the feedback loop.

Previously, chip computations were processed in the cloud rather than on-device; today, the AI/ML models required to complete these tasks are too large, costly and computationally hungry to be done locally. Instead, the technology relied on cloud computing, outsourcing data tasks to remote servers via the internet. While this was an adequate solution when IoT technology was in its infancy, it certainly wasn’t infallible—though proven to be a transformational tool for storing and processing data, cloud computing comes with its own performance and bandwidth limitations that aren’t well-suited for autonomy at scale, which needs nearly instantaneous reactions with minimal lag time. To-date, certain technologies have been limited by the parameters of cloud computing.

The Need for New Processing Units

The central processing units (CPUs) commonly used in traditional computing devices are not well-suited for AI workloads due to two main issues:

Latency in data fetching: AI workloads involve large amounts of data, and the cache memory in a CPU is too small to store all of it. As a result, the processor must constantly fetch data from dynamic random access memory (DRAM), which creates a significant bottleneck. While newer multicore CPU designs with multithreading capabilities can alleviate this issue to some extent, they are not sufficient on their own.
Latency in instruction fetching: In addition to the large volume of data, AI workloads require many repetitive matrix-vector operations. CPUs typically use single-instruction multiple data (SIMD) architectures, which means they must frequently fetch operational instructions from memory to be performed on the same dataset. The latest generation of AI processors aims to address these challenges through two approaches: (i) expanding the multicore design to allow thousands of threads to run concurrently, thereby fixing the latency in data fetching, or (ii) building processors with thousands of logic blocks, each preprogrammed to perform a specific matrix-vector operation, thereby fixing the latency in instruction fetching.

First introduced in 1980s, field programmable gate arrays (FPGAs) offered the benefit of being reprogrammable, which enabled them to gain traction in diverse industries like telecommunications, automotive, industrial, and consumer applications. In AI workloads, FPGAs fix latency associated with instruction fetching. FPGAs consist of tens of thousands of logic blocks, each of which is preprogrammed to carry out a specific matrix-vector operation. On the flip side, FPGAs are expensive, have large footprints, and are time-consuming to program.

Graphics processing units (GPUs) were initially developed in the 1990s to improve the speed of image processing for display devices. They have thousands of cores that enable efficient multithreading, which helps to reduce data fetching latency in AI workloads. GPUs are effective for tasks such as computer vision, where the same operations must be applied to many pixels. However, they have high power requirements and are not suitable for all types of edge applications.

Specialized chips, known as AI chips, are often used in data centers for training algorithms or making inferences. Although there are certain AI/ML processor architectures that are more energy-efficient than GPUs, they often only work with specific algorithms or utilize uncommon data types, like 4- and 2-bit integers or binarized neural networks. As a result, they lack the versatility to be used effectively in data centers with capital efficiency. Further, training algorithms requires significantly more computing power compared to making individual inferences, and batch-mode processing for inference can cause latency issues. The requirements for AI processing at the network edge, such as in robotics, Internet of Things (IoT) devices, smartphones, and wearables, can vary greatly and, in cases like the automotive industry, it is not feasible to send certain types of work to the cloud due to latency concerns.

Lastly, application specific integrated circuits (ASICs) are integrated circuits that are tailored to specific applications. Because the entire ASIC is dedicated to a narrow set of instructions, they are much faster than GPUs; however, they do not offer as much flexibility as GPUs or FPGAs in terms of being able to handle a wide range of applications. As a consequence, ASICs are increasingly gaining traction in handling AI workloads in the cloud with large companies like Amazon and Google. However, it is less likely that ASICs will find traction in edge computing because of the fragmented nature of applications and use cases.

The departure from single-threaded compute and the large volume of raw data generated today (making it impractical for continuous transfer) resulted in the emergence of edge computing, an expansion of cloud computing that addresses many of these shortcomings. Development of semiconductor manufacturing processes for ultra-small circuits (7nm and below) that pack more transistors onto a single chip allows faster processing speeds and higher levels of integration. This leads to significant improvements in performance, as well as reduced power consumption, enabling higher adoption of this technology for a wide range of edge applications.

Edge computing places resources closer to the end user or the device itself (at the “edge” of a network) rather than in a cloud data center that oversees data processing for a large physical area. Because this technology sits closer to the user and/or the device and doesn’t require the transfer of large amounts of data to a remote server, edge-powered chips increase performance speed, reduce lag time and ensure better data privacy. Additionally, since edge AI chips are physically smaller, they’re more affordable to produce and consume less power. As an added bonus, they also produce less heat, which is why fewer of our electronics get hot to the touch with extended use. AI/ML accelerators designed for use at the edge tend to have very low power consumption but are often specialized for specific applications such as audio processing, visual processing, object recognition, or collision avoidance. Today, this specialized focus can make it difficult for startups to achieve the necessary sales volume for success due to the market fragmentation.

Supporting mission-critical operational processes at the edge

The edge AI chip advantage proving to be arguably the most important to helping technology reach its full potential is its significantly faster operational and decision-making capabilities. Nearly every application in use today requires near-instantaneous response, whether to generate more optimal performance for a better user experience or to provide mission-critical reflex maneuvers that directly impact human safety. Even in non-critical applications, the increasing number of connected devices and equipment going online is causing bandwidth bottlenecks to become a deployment limitation, as current telecommunications networks may not have sufficient capacity to handle the data volume and velocity generated by these devices.

For example, from an industrial perspective, an automated manufacturing facility is expected to generate 4 petabytes of data every day. Even with the fastest (unattainable) 5G speeds of 10 Gbps, it would take days to transfer a day’s worth of data to the cloud. Additionally, the cost of transferring all this data at a rate of $0.40 per GB over 5G could reach as much as $1.6 million per day. And unsurprisingly, the autonomous vehicle industry will rely on the fastest, most efficient edge AI chips to ensure the quickest possible response times in a constantly-changing roadway environment — situations that can quite literally mean life and death for drivers and pedestrians alike.

Investing in Edge AI

Nearly every industry is now impacted by IoT technology, there is a $30 billion market for edge computing advancements. The AI chip industry alone is predicted to increase to more than $91 billion by 2025, up from $6 billion in 2018. Companies are racing to create the fastest, most efficient chips on the market, and only those operating with the highest levels of market and customer focus will see success.

As companies are increasingly faced with decisions regarding investment in new hardware for edge computing, staying nimble is key to a successful strategy. Given the rapid pace of innovation in the hardware landscape, companies seek to make decisions that provide both short-term flexibility, such as the ability to deploy many different types of machine learning models on a given chip, and long-term flexibility, such as the ability to future proof by easily switching between hardware types as they become available. Such strategies could typically include a mix of highly specific processors and more general-purpose processors like GPUs, software- and hardware-based edge computing to leverage the flexibility of software, and a combination of edge and cloud deployments to gain the benefits from both computing strategies.

The startup that is set out to simplify the choice of short-/long-term, compute-/power-constrained environments by getting an entirely new processor architecture off the ground is Quadric. Quadric is a licensable processor intellectual property (IP) company commercializing a fully-programmable architecture for on-device ML inference. The company built a cutting-edge processor instruction set that utilizes a highly parallel architecture that efficiently executes both machine learning “graph code” as well as conventional C/C++ signal processing code to provide fast and efficient processing of complex algorithms. Only one tool chain is required for scalar, vector, and matrix computations which are modelessly intermixed and executed on a single pipeline. Memory bandwidth is optimized by a single unified compilation stack that helps result in significant power minimization.

Quadric takes a software-first approach to its edge AI chips, creating an architecture that controls data flow and enables all software and AI processing to run on a single programmable core. This eliminates the need for other ancillary processing and software elements and blends the best of current processing methods to create a single, optimized general purpose neural processing unit (GPNPU).

The company recently announced its new Chimera™ GPNPU, a licensable IP (intellectual property) processor core for advanced custom silicon chips utilized in a vast array of end AI and ML applications. It is specifically tailored to accelerate neural network-based computations and is intended to be integrated into a variety of systems, including embedded devices, edge devices, and data center servers. The Chimera GPNPU is built using a scalable, modular architecture that allows the performance level to be customized to meet the specific needs of different applications.

One of the key features of the Chimera GPNPU is its support for high-precision arithmetic in addition to the conventional 8-bit precision integer support offered by most NPUs. It is capable of performing calculations with up to 16-bit precision, which is essential for ensuring the accuracy and reliability of neural network-based computations, as well as performing many DSP computations. The Chimera GPNPU supports a wide range of neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks. As a fully C++ programmable architecture, a Chimera GPNPU can run any machine learning algorithm with any machine learning operator, offering the ultimate in flexible high-performance futureproofing.

Dean Mai 11/16/22 Dean Mai 11/16/22

Automating Deployment, Security, and Scalability with Managed Security Services

The cybersecurity industry is facing two major challenges: an increase in cybercrime and sophisticated attacks alongside a vast deficiency of cybersecurity practitioners to fill open positions. There are currently more than 4.7 million overall cybersecurity employees, with over 400,000 hired this year alone. Despite this hiring increase, recent data reveals a need for 3.4 million additional cybersecurity workers worldwide in order to effectively secure assets. Cybercrimes rose more than 600% over the last year, causing many organizations to increase their cybersecurity budgets with the goal of hiring even more security experts.

The cybersecurity industry is facing two major challenges: an increase in cybercrime and sophisticated attacks alongside a vast deficiency of cybersecurity practitioners to fill open positions. There are currently more than 4.7 million overall cybersecurity employees, with over 400,000 hired this year alone. Despite this hiring increase, recent data reveals a need for 3.4 million additional cybersecurity workers worldwide in order to effectively secure assets. Cybercrimes rose more than 600% over the last year, causing many organizations to increase their cybersecurity budgets with the goal of hiring even more security experts. In fact, the number of companies planning to expand their cybersecurity teams has grown from 51% in 2020 to nearly 75% this year. This combination of increased cyberattacks and insufficient staffing has left many companies unable to secure their systems with existing in-house resources.

Against a backdrop of global economic volatility, cybersecurity professionals are facing increasingly complex architecture environments, a rise in disparate cloud-based tools and systems, and persistent external threats and attacks. Additionally, the proliferation of emerging technologies like artificial intelligence and machine learning, big data analytics, threat intelligence and cutting-edge automation platforms are starting to necessitate specialized services that are most up to date on the newest advancements in security—something existing in-house teams may find harder to keep up with. The necessity to adapt cybersecurity knowledge in the face of technological advancements is being observed at the national level: the U.S. administration recently launched the 120-day Cybersecurity Apprenticeship Sprint, a program to help a wide array of young professionals gain skills in the field.

At the same time, the current state of cybersecurity employment is creating sizable barriers and roadblocks for many organizations. Across distributed workforce, hiring freezes and current market dynamics, the shortage of skilled IT/security professionals on staff and the inability to stay updated with the recent tools, technologies, and practices exacerbates corporate concerns.

The culmination of these factors has prompted an increasing number of organizations to turn to managed security service providers (MSSPs) or managed detection and response firms (MDRs) to handle their information Security Operations Center (SOC) needs.

Benefits and Offerings of MSSPs & MDRs

A managed security service provider is an IT organization that delivers outsourced operating and alert monitoring of an organization’s systems and security devices through both software and services. MSSPs offer various security products and solutions to their clients ranging from device management, security training, and assessment services to incident detection and emergency response services. On the basis of their fundamental effects on security management, products and services can be classified into prevention, detection, and response. At Xerox, for example, Xerox IT Services Security can serve as an MSSP to help customers identify, assess and implement key security controls and provide IT leadership and guidance every step of the way. Its assessments offer hands-on technical validation of all security technologies within customers’ IT environments, including end user devices, servers, network, firewalls and other security devices. While MSSPs can be heavily automated services, MDR is human-operated, with live threat hunters monitoring customer networks in real time for signs of cyber intrusion and/or compromise.

For some companies, outsourcing these requests to managed providers can be more cost-effective than hiring an in-house security team—something more business leaders may consider due to recent economic volatility and talks of a potential recession. And while larger enterprise companies may benefit from managed services due to the likelihood of facing heightened and more targeted security threats against their network, small- to medium-sized businesses (SMBs) may find these services are the only alternative to building out a robust in-house team. MSSPs and MDRs can also be utilized in addition to an in-house security or IT team, taking the time-intensive work of activities like security monitoring or proactive threat hunting, detection and response off that team’s plate to enable them to focus on more core business functions.

Current Market Opportunities

According to latest reports, the MSSP services industry is entering a huge growth period. Valued at $23.19 billion in 2021, the market is expected to reach a $56.6 billion valuation by 2027. It’s estimated that approximately 30% of SMBs have not yet outsourced their IT management needs, suggesting strong growth potential for new client acquisition. Given the current cybersecurity job market and increasing cyber threats, it’s likely slower adopters will increasingly see value in engaging with MSSPs and begin to outsource these needs.

While already operating with a focus on utilizing and understanding advanced technologies, the industry is still ripe for new innovation. One of the biggest technology trends over the next few years across enterprise, midmarket and SMBs will be using hyperautomation (streamlining procedures by introducing automation on a larger scale through tools like artificial intelligence and machine learning) to address an entire system rather than just separate parts. Specifically to MSSPs, Gartner estimates the introduction of hyperautomation tools will lower operational costs by up to 30% in the next two years.

Successful managed providers will have to react quickly to emerging technological disruption to attract the best talent and retain customers, especially as more organizations migrate to cloud & multi-cloud services and experience those effects on their increasing on-premise maintenance and hardware sales (making scalability and security a major challenge). The MSSP industry is at an inflection point of accelerated digitization and adoption of new security tools, and we expect to see a rapid increase in emerging cybersecurity companies over the next decade that capitalize on the increased market demand as a result. As such, investors are moving to increasingly support security software startups, built around applications, data and identity, that have developed MSSP/MDR-centric capabilities, as evidenced by our recent investments in LimaCharlie and Anvilogic.

LimaCharlie

LimaCharlie is an Information Security Infrastructure-as-a-Service (SIaaS) developer and provider of general-purpose, component-driven, cloud-based information security tools and infrastructure. Similar to how Amazon Web Services or Google Cloud Platform deliver core components of IT, LimaCharlie offers a full stack of cloud-based information security tools through an infrastructure on-demand platform, lowering barriers to entry for new providers. By giving security teams full control over how they manage their security infrastructure, the company enables Enterprise and MSSPs to detect and respond to threats, automate processes, reduce vendor usage and future-proof security operations. This approach enables companies to access the precise capabilities they need and only pay for what they use, a model that has previously enabled cloud service providers to disrupt the traditional IT market. LimaCharlie also enables organizations to route their data at the event level, which means they can drastically reduce storage costs by only sending relevant data to high-cost security tools like Splunk, Elastic, Sumo Logic, or other SIEM and data analytics solutions.

Anvilogic

Anvilogic is an AI-first automated Security Operations Center (SOC) platform that leverages the economic advantages of cloud data warehouses in comparison with legacy on-premises Security Information and Event Management (SIEM) solutions. However, legacy on-prem SIEM solutions are proving to be too rigid and expensive to maintain as security teams embrace cloud-based products and alert data volumes continue to grow. For data breaches and cybersecurity threats, SOC processes haven’t changed much in a decade. By leveraging a cloud data warehouse (e.g., Snowflake) instead, it is easier for organizations and MSSPs to scale storage at a predictable cost and centralize security data. With a cloud-data warehouse, security tools can also capture business data that can provide additional context. For instance, Anvilogic offers organizations a collaborative SOC content platform that sits on top of a cloud data warehouse and ingests signals across both security tools and SaaS apps, running security analytics across these sources to identify threats in real-time. By leveraging the economic advantages of cloud data warehouses, Anvilogic delivers high performance at a predictable cost. Companies like Anvilogic are making it simpler for security teams to correlate signals across their software stack and make the transition to cloud-native approaches to security, creating a modern, future-proof SOC.

Looking Ahead

Today’s cybersecurity and economic environments are creating the perfect opportunity for increased MSSP & MDR growth and adoption. Over the coming years, we’ll see more organizations outsourcing significant portions of their security and IT tasks to these external teams, making this a great time for investors and entrepreneurs alike to focus on what tools they can build and support for the industry.

Dean Mai 1/23/22 Dean Mai 1/23/22

Federated Machine Learning as a Distributed Architecture for Real-World Implementations

Present performance of machine learning systems—optimization of parameters, weights, biases—at least in part relies on large volumes of training data which, as any other competitive asset, is dispersed, distributed, or maintained by various R&D and business data owners, rather than being stored by a single central entity. Collaboratively training a machine learning (ML) model on such distributed data—federated learning, or FL—can result in a more accurate and robust model than any participant could train in isolation.

Present performance of machine learning systems—optimization of parameters, weights, biases—at least in part relies on large volumes of training data which, as any other competitive asset, is dispersed, distributed, or maintained by various R&D and business data owners, rather than being stored by a single central entity. Collaboratively training a machine learning (ML) model on such distributed data—federated learning, or FL—can result in a more accurate and robust model than any participant could train in isolation.

FL, also known as collaborative learning, is a method that trains an algorithm collaboratively across multiple decentralized edge devices (e.g., a device providing an entry point into enterprise or service provider core networks) or servers holding local data samples without exchanging them among the edge devices. The appeal of FL stems from its ability to provide near-real-time access to large amounts of data, without requiring the transfer of that data between remote devices. In a sense, this means that the data is not “distributed”, but rather is “federated” across the devices. This may sound similar to the concept of distributed computing, which refers to the use of multiple devices to perform a task, such as a computer, a smartphone, or any other edge device. However, in FL, the data is not shared between the devices, and therefore, each device holds its own data and calculates its own model. Such collaborative training is usually implemented by a coordinator/aggregator that oversees the participants, and can result in more robust and accurate ML models than any single participant could hope to train in isolation. However, the data owners are often unwilling (e.g., limited trust), unable (e.g., limited connectivity or communication resources), and/or legally prohibited (e.g., privacy laws, such as HIPAA, GDPR, CCPA, and local state laws) from openly sharing all or part of their individual data sources with each other. In FL, however, raw edge device data is not required to be shared with the server or among distinct separate organizations, which distinguishes FL from traditional distributed optimization by bringing it under the orchestration of a central server and also requires FL to contend with heterogeneous data.

Hence, in FL, the star topology is typically used, in which one central server coordinates the initialization, communication, and aggregation of the algorithms, and serves as the central place for the aggregation of model updates and model updates. In this design the local nodes have some degree of trust in this central server, but still maintain independent and have their own degree on control of whether they participate and take ownership over their local data, the central server does not have access to the original local data.

There are two types of FL: horizontal and vertical. Horizontal FL involves collaborative training on horizontally partitioned datasets (e.g., the participants' datasets have common, similar, and/or overlapping feature spaces and uncommon, dissimilar, and/or non-overlapping sample spaces). For instance, two competing banks might have different clients (e.g., different sample spaces) while having similar types of information about their clients, such as age, occupation, credit score, and so on (e.g., similar feature spaces). Vertical FL, on the other hand, involves collaborative training on vertically partitioned datasets (e.g., the participants' datasets have common, similar, and/or overlapping sample spaces and uncommon, dissimilar, and/or non-overlapping feature spaces). For instance, a bank and an online retailer might serve the same clients (e.g., similar sample spaces) while having different types of information about those clients (e.g., different feature spaces).

Nowadays, growing concerns and restrictions on data sharing and privacy, such as the GDPR of Europe and the Cyber Security Law of China, made it difficult, if not impossible, to transfer, merge and fuse data obtained from different data owners. With FL, a device on the edge can send potentially de-identified updates to a model instead of sharing the entirety of its raw data in order for the model to be updated. As a result, FL greatly reduces privacy concerns since the data never leaves these devices, just an encrypted, perturbed gradient of data leave. Such framework can be a useful tool for many different types of organizations, from companies who do not want to disclose proprietary data to the public, to developers who may want to build privacy-preserving AI applications, like chatbots.

One of the earlier applications of FL was mobile keyboard (next word) predictions; the details of what an individual has typed remains on the device, and isn’t shared with the cloud-based, machine learning provider. The provider can see securely aggregated summaries of what’s been typed and corrected, across many devices. But they can’t see the contents of what a user has typed. This protects individual people’s privacy, while improving predictions for everyone. This approach is also compatible with additional personalized learning that occurs on device.

While FL can be adopted to build models locally and may boost model performance by widening the amount of available training data, due to its reliance on global synchrony and data exchange, whether this technique can be deployed at scale or not, across multiple platforms in real-world applications, remains unclear (particularly if the devices or servers in the system are highly secured). The main challenge with federated learning is that it relies heavily on the secure execution of decentralized computing due to the many iterations of training and the the large number of devices this needs to be communicated to. As the communication overhead is networked and can be several orders of magnitude slower than local computation, the system requires reduction of the total number of communication rounds and the size of the transmitted messages. Further, support of both system heterogeneity (devices having highly dynamic and heterogeneous network, hardware, connection, and power availability) and data heterogeneity (data is generated by different users on different devices, and therefore may have different statistical distribution, or non-IID) is required to attain high performance. Classical statistics pose theoretical challenges to FL, as user device data collection and training defeats any guarantee or assumption that training data is independent and identically distributed, IID. This is a distinguishing feature of FL. The loss of the strong statistical guarantee allows the system with high dimensionality to make inferences about a wider population of data, including for example in training set samples collected by edge devices. Lastly, the algorithms used in federated learning are fundamentally different from the algorithms used in decentralized computing systems, such as the algorithms used in blockchain. If the devices in a federated learning system do not have the same privacy-preservation or security models (as those in traditional computing environments), then the system will likely perform poorly or not function at all. For added privacy, an additional optional layer can be added, like Secure Multi-party Computation (SMC), Differential Privacy, or Homomorphic Encryption, in the case that even the aggregated information in the form of model updates may also contain privacy-sensitive information. Handling privacy sensitive information is one of the main motivations behind the development of homomorphic encryption in federated learning systems. Homomorphic encryption uses mathematical operations without revealing the private key, or “secret key,” used to encrypt the data. Thus, homomorphic encryption can be used to process encrypted data without revealing the model parameters or the encrypted data to the device that executed the computation – the device can only learn the parameters of the model, and cannot decrypt the data. Without learning the model parameters, the server is unable to perform attack vectors, such as side-channel attacks, on the model. Yet functional encryption techniques can be far more computationally efficient than homomorphic encryption techniques. It can involve a public key that encrypts confidential data and a functional secret key that, when applied to the encrypted confidential data, yields a functional output based on the confidential data without decrypting/revealing the confidential data. This can result in much faster FL than existing systems/techniques can facilitate (e.g., mere seconds to train via hybrid functional encryption versus hours to train via homomorphic encryption).

Drivers & Opportunities

Fintech. Collectively, the amount of financial data (structured and unstructured) generated and processed worldwide by current banking systems and other financial service providers is incalculable. As such, the ability to extract value from data in the fintech sector while protecting privacy and complying with regulations is of great interest to both government and industry. The increased availability of large-scale, high-quality data, along with the growing desire for privacy in the wake of numerous data breaches, has led to the development of FL in the fintech sector. Today, FL in the fintech sector is being used to extract value from data in a way that preserves privacy while complying with regulations but its applications are still in its infancy, and many challenges abound. One of the main challenges is the difficulty in obtaining permission from end users to process their data. Once permission has been obtained, it is difficult to guarantee that all data is processed correctly. The data may be inconsistent and sometimes includes errors, so it is difficult to estimate the accuracy of the model after data is aggregated across multiple devices. The process may also be biased due to individual differences among a large number of devices, as some devices may be unable to complete the process due to a lack of resources (power, storage, memory, etc.). All these challenges require solution design that will allow the aggregation process to function effectively, coupled with encryption of collected data while in transit from the device to the server and at the server to protect user privacy.

Healthcare. The majority of healthcare data collection today is accomplished by paper forms, which are prone to errors and often result in under-reporting of adverse events. Of all the global stored data, about 30% resides in healthcare and it is fueling the development and funding for AI algorithms. By moving medical data out of data silos and improving the quality and accuracy of medical records, the use of FL in healthcare could improve patient safety and reduce the costs associated with information collection and review (e.g., clinical trials aimed at evaluating a medical, surgical, or behavioral intervention). In some circumstances, an individual may understand the medical research value of sharing information, but doesn't trust the organization that they're being asked to share with. The individual may wonder what third parties that could gain access to their data. On the B2B side, there are intellectual property (IP) issues that thwart companies that want to collaborate, but are unable to share their raw data for IP reasons as well as internal data policies that prevent even intra-company, cross-division sharing of data. In the further context of clinical trials, the data collection is centralized where one sponsor (principal investigator) who centrally produces the protocol and uses several sites where many end users can go for physical exams and laboratory tests. This procedure is time consuming and expensive as it requires considerable planning and effort and is mostly outsourced to Contract Research Organizations (CROs). With FL, a global protocol can be shared by one central authority to many end users who collect information on their edge devices, e.g. smartphones, label the information and compute it locally, after which the outcome tensors (generalization of vectors and matrices) are sent to the central FL aggregator of the sponsor. The central authority aggregates all the tensors and then reports the updated and averaged tensors back to each of the end users. Therefore, this one-to-many tensors can be configured to conduct distributed clinical trials. Further, administrators can control the data training and frequency behind the scenes and it is the algorithms that are adaptive, instead of humans in a CRO. Trials are more streamlined and parallelized; speed of trial is significantly improved, even though it may possibly mean failing fast; feedback loops are much faster, and the sponsors or CROs get a much better idea whether the trial is even working correctly from early on.

Industrial IoT (IIoT). Integrating FL in IIoT ensures that no local sensitive data is exchanged, as the distribution of learning models over the edge devices becomes more common with FL. With the extensive deployment of Industry 4.0, FL could play a critical role in manufacturing optimization and product life cycle management (PLCM) improvement, where sensors can be implemented to gather data about the local environment, which can then be used to train the models for a specific machine, equipment, or process in a specific location. This data in turn can be used to expand the parameters that can be optimized, further increasing automation capabilities, such as the temperature of a given process, the amount of oil used in a given machine, the type of material used in a particular tooling, or the amount of electricity used for a given process, all while protecting privacy-sensitive information. Beyond the expected benefits of FL for large scale manufacturing, critical mass opportunities for FL in the small and medium scale manufacturing industry might be as appealing for startups. The small and medium scale manufacturing industry is currently experiencing a shortage of skilled labor, which has led to an increase in the use of automation. However, the automation in these industries is often limited by the level and quality of data that can be collected and the ability to learn from this data. With FL, the availability of an on-premises learning model can help increase the efficiency of the manufacturing site and enhance product quality through the use of predictive maintenance, while maintaining user privacy, and without the need for user consent or supervision. Further, if the model is performing too slowly, or the accuracy of the model is too low (due to concept drift and/or model decay), the machine can be brought into a maintenance mode based on its predicted profiled needs. This avoids the need to take the machine completely offline, which would increase the costs associated with the maintenance, as well as the time. With the use of FL, manufacturers can gather and process data from larger number of edge devices to improve the accuracy of their processes, making them more competitive on the market.

Dean Mai 8/9/21 Dean Mai 8/9/21

Similar Information Rates Across Languages, Regardless of Varying Speech Rates

Language and linguistics are studied through a wide range of tools and perspectives. Over the past few years, a proliferation of mathematical methods, newly available datasets and computational modeling (namely probability and information theory) has led to an increased interest in information transmission efficiency across human languages.

“As soon as human beings began to make systematic observations about one another's languages, they were probably impressed by the paradox that all languages are in some fundamental sense one and the same, and yet they are also strikingly different from one another” (Ferguson 1978, p. 9).

Language and linguistics are studied through a wide range of tools and perspectives. Over the past few years, a proliferation of mathematical methods, newly available datasets and computational modeling (namely probability and information theory) has led to an increased interest in information transmission efficiency across human languages. A substantial body of emerging literature has been examining how natural language structure is affected by principles of efficiency, ranging from the lexicon (Bentz, 2018) and phonology (Priva and Jaeger, 2018) through morphosyntax (Futrell, Mahowald and Gibson, 2015), and, as a result, how typologically diverse languages could be optimized for communication and inference. Particularly, the universal properties of human languages have been examined across the language sciences. These studies indicate that efficiency is a universal feature of human language.

Human language is extremely diverse, with over 6,000 languages in use around the world, according to the World Atlas of Languages Structures (WALS). Each language has its own unique grammar and vocabulary, they can vary by how many syllables they use, whether or not they employ tones to convey meaning, syntax, transmission media (e.g. speech, signing), writing systems, the order in which they express information and more. Further, the rate of speech—or, how fast an individual speaks a language—varies widely across languages. It is no surprise that the way people express themselves differs between countries. A language spoken in a country with a low population density might be spoken at a slower rate than a popular language in a densely populated area. The English language, for example, is spoken at a rate of approximately 177 words per minute, while the Russian language is spoken at a rate of only 38 words per minute. However, while the rate of speech can vary, it has been documented that languages do not differ in their ability to convey a similar amount of information, with recurring “universal” patterns across languages. Japanese may seem to be spoken at a higher speed than Thai, but that doesn't mean that it is more “efficient.”

Generally, the term ‘information’ in the context of communication is somewhat illusive and inconclusive. However, in this case, by way of borrowing from the field of information theory, the definition of ‘information’ refers to as it was first introduced by Claude Shannon in a 1948 paper, describing it in terms of the correlation between each signal produced by the sender and the sender’s intended utterance or how much a given signal reduces a sender’s unpredictability about the intended utterance. Further, according to Gibson et al., the term ‘efficiency’ in relation to information can be defined as “...communication means that successful communication can be achieved with minimal effort on average by the sender and receiver...effort is quantified using the length of messages, so efficient communication means that signals are short on average while maximizing the rate of communicative success” (Gibson et al., 2019). Thus, one may argue communicative efficiency is manifested via the structural ability of language to resolve its complexity and ambiguity. ‘Informativity’ in language is measured by the relative amount of content words to non-content words, typically in the context of a given text. In the case of human language, informativity is highly variable over time; it is defined as “the weighted average of the negative log predictability of all the occurrences of a segment” (Priva, 2015). In other words, rather than being a measure of how probable a communicative segment is in a particular context, it is a measure of how predictable that segment is when it occurs in any context. As a receiver comprehends language, he/she can expect that the language sender’s message will be unpredictable in some way. Ultimately, language should be efficient so that a speaker is able to transmit many different messages successfully with minimal effort. In linguistics, informativity (speech rate) is typically calculated by breaking down a number of syllabic segments per second in each utterance and measured in bits of information (bits per second).

Following the definitions above and in the reference to the Language Log post “Speed vs. efficiency in speech production and reception” (Mair, 2019), the focus of this paper lies on the 2019 cross-linguistic study, published by the journal Science Advances, in which researchers looked at the relationship between language complexity and speech rate (SR), and how it affects information transmission, using information theory as a framework (conditional entropy) (Coupé, Oh, Dediu, and Pellegrino, 2019). The researchers have shown that human languages might widely differ in their encoding strategies such as complexity and speech rate, but not in the rate of effective information transmission, even if the speeds at which they are spoken vary. This relationship is universal, and is true for all languages’ capacities to encode, generate and decode speech, such as languages with more complex grammar and faster speech rate are less efficient in transmitting information. This is because a greater amount of effort is required to process them.

The researchers calculated the information density (ID) of 17 languages, from 9 different language families—Vietnamese, Basque, Catalan, German, English, French, Italian, Spanish, Serbian, Japanese, Korean, Mandarin Chinese, Yue Chinese/Cantonese, Thai, Turkish, Finnish and Hungarian—by comparing utterance recordings of 15 brief texts describing daily events, read out loud by 10 native speakers (five men and five women) per language. For each of the languages, speech rate, in number of syllables per second, and the average information density of the syllables uttered was measured. (The more easily the utterance of a particular syllable may be predicted by conditioning on the preceding syllable, the less information the former is deemed to provide.) According to their findings, each language has a different information density in terms of bits per syllable. The researchers found that higher speech rates correlate with lower information densities—as in German—and slower speech rates with higher information densities—as is often the case with tonal Asian languages like Chinese and Vietnamese. Japanese, for example, with only 643 syllables, has an information density of about 5 bits per syllable, whereas English, with 6949 different syllables, had a density of just over 7 bits per syllable. Vietnamese, comprising a complex system of six tones (each of which can further differentiate a syllable), had the highest number of 8 bits per syllable. Finally, by multiplying speech rate by information density, all languages’ transfer information rate (IR), no matter how different, has shown to converge to the rate of approximately 39 bits per second. The explanation is a trade-off between speech rate and the average amount of information carried by linguistic units.

In summary, these findings confirm a previously raised in the literature supposition that information- dense languages, those that group more information about tense, gender, and speaker into smaller linguistic units (e.g. German that delivers 5-6 syllables per second when speaking), move slower to compensate for their density of information, whereas information-light languages (e.g. Italian that delivers about 9 syllables per second when speaking) moves at a much faster speech rate. For example, the sentence in Mandarin ‘qǐng bāng máng dìng gè zǎo shang de chū zū chē’ (请帮忙订个早上的出租车) (‘Please take a request for an early-morning taxi’) is assembled of denser syllables and produced slower on average than the equivalent sentence ‘Por favor, quisiera pedir un taxi para mañana a primera hora’ in Spanish. However, notable limitation of the study that weakens the universality claim appears to be in its sample; it did not include any languages from the Niger-Congo family (e.g., Swahili) or Afro- Asiatic family (e.g., Arabic), which represent the third- and fourth-largest language families, respectively.

More broadly, in the context of speech perception and processing systems, these findings can be framed in an evolutionary perspective as they suggest a potentially optimal rate of language processing by the human brain in a manner that optimizes the use of information, regardless of the complexity of the language. Despite significant differences between languages and the geographical locations their speakers are subjected to, different languages share a common construction pattern. Specifically, the findings indicate the presence of a fundamental (cognitive) constraint on the information processing capability of the human brain, with an upper bound reached despite differences in speech speed and redundancy. An underlying process appears to be based on an interconnectivity between the pattern of cortical activity and the informational bandwidth of human communication system (Bosker and Ghitza, 2018). In the context of technology applications, it may be argued that this work paves the path to building future reference benchmarks for artificial communication devices such as prosthetic devices or brain–computer interfaces (BCI) for communication and rehabilitation. For example, rather than designing devices based on the words per minute performance, which inherently varies across languages, future engineers and designers can devise communication interfaces targeting 39 bits per second transmission delivery frameworks. Moreover, further study of communicative efficiency may guide the research of natural language processing in artificial intelligence/machine learning applications, marrying linguistics, cognitive sciences and mathematical theories of communication.

Dean Mai 5/9/20 Dean Mai 5/9/20

Intuitive Physics and Domain-Specific Perceptual Causality in Infants and AI

More recently, cognitive psychology and artificial intelligence (AI) researchers have been motivated by the need to explore the concept of intuitive physics in infants’ object perception skills and understand whether further theoretical and practical applications in the field of artificial intelligence could be developed by linking intuitive physics’ approaches to the research area of AI—by building autonomous systems that learn and think like humans.

More recently, cognitive psychology and artificial intelligence (AI) researchers have been motivated by the need to explore the concept of intuitive physics in infants’ object perception skills and understand whether further theoretical and practical applications in the field of artificial intelligence could be developed by linking intuitive physics’ approaches to the research area of AI—by building autonomous systems that learn and think like humans. A particular context of intuitive physics explored herein is the infants’ innate understanding of how inanimate objects persist in time and space or otherwise follow principles of persistence, inertia and gravity—the spatio-temporal configuration of physical concepts—soon after birth, occurring via the domain-specific perceptual causality (Caramazza & Shelton, 1998). The overview is structured around intuitive physics techniques using cognitive (neural) networks with the objective to harness our understanding of how artificial agents may emulate aspects of human (infants’) cognition into a general-purpose physics simulator for a wide range of everyday judgments and tasks.

Intuitive Physics and Domain-Specific Perceptual Causality in Infants.jpg

Such neural networks (deep learning networks in particular) can be generally characterized by collectively-performing neural-network-style models organized in a number of layers of representation, followed by a process of gradually refining their connection strengths as more data is introduced. By mimicking the brain’s biological neural networks, computational models that rapidly learn, improve and apply their subsequent learning to new tasks in unstructured real-world environments can undoubtedly play a major role in enabling future software and hardware (robotic) systems to make better inferences from smaller amounts of training data.

On the general level, intuitive physics, naïve physics or folk physics (terms used here synonymously) is the universally similar human perception of fundamental physical phenomena, or an intuitive (innate) understanding all humans have about objects in the physical world. Further, intuitive physics is defined as "...the knowledge underlying the human ability to understand the physical environment and interact with objects and substances that undergo dynamic state changes, making at least approximate predictions about how observed events will unfold" (Kubricht, Holyoak & Lu, 2017).

During the past few decades, motivated by the technological advances (brain imaging, eye gaze detection and reaction time measurement in particular), several researchers have established guiding principles on how innate core concepts and principles constrain knowledge systems that emerge in the infants’ brain—principles of gravity, inertia, and persistence (with its corollaries of solidity, continuity, cohesion, boundedness, and unchangeableness)—by capturing empirical physiological data. To quantify infants’ innate reaction to a particular stimulus, researchers have relied on the concept of habituation, or a decrease in responsiveness to a stimulus after repeated exposure to the same stimulus (i.e., shows a diminished duration in total looking time of visual face, object or image recognition). Thus, habituation is operationalized as amount of time an infant allocates to stimuli with less familiar stimuli receive more attention—when new stimulus is introduced and perceived as different, the infant increases the duration of responding at the stimulus (Eimas, Siqueland, Juscyk, & Vigorito, 1971). In the context of intuitive physics, in order to understand how ubiquitous infants’ intuitive understanding is, developmental researchers rely on violation of expectation of physical phenomena. If infants understand the implicit rules, the more newly introduced stimulus violates his or her expectations, the more they will attend to it in an unexpected situation (suggesting that preference is associated with the infant's ability to discriminate between the two events).

Core Principles

A variety of studies and theoretical work defined what physical principles are and explored how they are represented during human infancy. In particular, in the context of inertia, the principle invokes infants’ expectation of how objects in motion follow an uninterrupted path without sporadic changes in velocity or direction (Kochukhova & Gredeback, 2007; Luo, Kaufman & Baillargeon, 2009). In the context of gravity, the principle refers to infants’ expectation of how objects fall after being released (Needham & Baillargeon, 1993; Premack & Premack, 2003). Lastly, in the context of persistence, the principle guides infants’ expectation of how objects would obey continuity (objects cannot spontaneously appear or disappear into thin air), solidity (two solid objects cannot occupy the same space at the same time), cohesion (objects cannot spontaneously break apart as they move), fuse with another object (boundedness), or change shape, pattern, size, or color (unchangeableness) (Spelke et al., 1992; Spelke, Phillips & Woodward, 1995; Baillargeon, 2008). An extensive evidence that can be drawn from theories in the field of research on cognitive development in infancy aptly shows that, across a wide range of situations, infants can predict outcomes of physical interactions involving gravity, object permanence and conservation of shape and number as young as two months old (Spelke, 1990; Spelke, Phillips & Woodward, 1995).

The concept of continuity was originally proposed and described by Elizabeth Spelke, one of the cognitive psychologists who established the intuitive physics movement. Spelke defined and formalized various object perception experimental frameworks, such as occlusion and containment, both hinging on the continuity principle—infants’ innate recognition that objects exist continuously in time and space. As a continuous construct on the foundations of this existing knowledge, research work in the domain of early development could lead to further insights into how humans attain their physical knowledge across childhood, adolescence and adulthood. For example, in one of their early containment event tests, Hespos and Baillargeon demonstrated that infants shown a tall cylinder fitting into the tall container were unfazed by the expected physical outcome; contrarily, when infants were shown the tall cylinder placed into a much shorter cylindrical container, the unexpected outcome confounded them. These findings demonstrated that infants as young as two months expected that containers cannot hold objects that physically exceed them in height (Hespos & Baillargeon, 2001). In the occlusion event test example, infants’ object tracking mechanism was demonstrated by way of a moving toy mouse and a screen. The infants were first habituated by a toy moving back and forth behind a screen, then a part of the screen was removed to introduce the toy into infants’ view when moving; when the screen was removed, the test led infants of three months old to be surprised because the mouse failed to be hidden when behind the screen.

In the concept of solidity test, Baillargeon demonstrated that infants as young at three months of age, habituated to the expected event of a screen rotating from 0° to 180° back and forth until it was blocked by the placed box (causing it to reverse its direction and preventing from completing its full range of motion), looked longer at the unexpected event wherein the screen rotated up and then continued to rotate through the physical space where the box was positioned (Baillargeon, 1987).

Analogously to the findings demonstrating that infants are sensitive to violations of object solidity, the concept of cohesion captures infants’ ability to comprehend that objects are cohesive and bounded. Kestenbaum demonstrated that infants successfully understand partially overlapping boundaries or the boundaries of adjacent objects, dishabituated when objects’ boundaries cannot correspond in position within their actual physical limits (Kestenbaum, Termine, & Spelke, 1987).

Lastly, there has been converging evidence for infants at the age of two months and possibly earlier to have already developed object appearance-based expectations, such as an object does not spontaneously change its color, texture, shape or size. When infants at the age of six months were presented with an Elmo face, they were successfully able to discriminate a change in the area size of the Elmo face (Brannon, Lutz, & Cordes, 2006).

Innateness

Evidently, infants possess sophisticated cognitive ability seemingly early on to be able to discriminate between expected and unexpected object behavior and interaction. This innate knowledge of physical concepts has been argued to allow infants to track objects over time and discount physically implausible trajectories or states, contributing to flexible knowledge generalization to new tasks, surroundings and scenarios, which, one may assume in the evolutionary context, is iterated towards a more adaptive mechanism that would allow them to survive in new environments (Leslie & Keeble, 1987).

In this regard, the notion of innateness, ﬁrst introduced by Plato, has long been the subject of debate in the psychology of intuitive physics. Previous studies have argued whether the human brain comes prewired with a network that precedes the development of cortical regions (or domain-specific connections)—connectivity precedes function—specialized for specific cognitive functions and inputs (e.g., ones that control face recognition, scene processing or spatial depth inference) (Kamps, Hendrix, Brennan & Dilks, 2019) versus whether specific cognitive functions arise collectively from accumulating visual inputs and experiences—function precedes connectivity (Arcaro & Livingstone, 2017). In one recent study, the researchers used resting-state functional magnetic resonance imaging (rs-fMRI), which measures the blood oxygenation level-dependent signal to evaluate spontaneous brain activity in a resting state, to assess brain region connections in infants as young as 27 days of age. The researchers reported that the face recognition and scene-processing cortical regions were interconnected, suggesting innateness caused the formation of domain-specific functional modules in the developing brain. Additional supporting studies, using auditory and tactile stimuli, have also shown discriminatory responses in congenitally blind adults, presenting evidence that face- and scene-sensitive regions develop in visual cortex without any input functions and, thus, may be innate (Büchel, Price, Frackowiak, & Friston, 1998). Contrary to the notion of connectivity precedes function, previous empirical work on infant monkeys has alternatively shown a discrepancy between the apparent innateness of visual maps and prewired domain-specific connections, suggesting experience caused the formation of domain-specific functional modules in the infant monkeys’ temporal lobe (Arcaro & Livingstone, 2017). Thus, the framework of intuitive physics, does not encompass nor is restricted merely to humans—often invoking similar cognitive expectations in other living species and even (subjected to training) computational models.

Intuitive Physics and Artificial Intelligence

Despite recent progress in the field of artificial intelligence, humans are still arguably better than computational systems at performing general purpose reasoning and various broad object perception tasks, making inferences based on limited or no experience, such as in spatial layout understating, concept learning, concept prediction and more. The notion of intuitive physics has been a significant focus in the field of artificial intelligence research as part of the effort to extend the cognitive ability concepts of human knowledge to algorithmic-driven reasoning, decision-making or problem-solving. A fundamental challenge in the robotics and artificial intelligence fields today is building robots that can imitate human spatial or object inference actions and adapt to an everyday environment as successfully as an infant. Specifically, as a part of the recent advancement in artificial intelligence technologies, namely machine learning and deep learning, researchers have begun to explore how to build neural “intuitive physics” models that can make predictions about stability, collisions, forces and velocities from static and dynamic visual inputs, or interactions with a real or simulated environment. Such knowledge-based, probabilistic simulation models therefore could be both used to understand the cognitive and neural underpinning of naive physics in humans, but also to provide artificial intelligence systems (e.g. autonomous vehicles) with higher levels of perception, inference and reasoning capabilities.

Intuitive physics or spatio-temporal configuration of metaphysical concepts of objects—arrangements of objects, material classification of objects, motions of objects and substances or their lack thereof—are the fundamental building blocks of complex cognitive frameworks, leading to a desire of their further investigation, analysis and understanding. Generally, in the field of artificial intelligence specifically, there has been growing interest in looking at the origins and development of such frameworks, an attempt originally described by Hayes: "I propose the construction of a formalization of a sizable portion of common-sense knowledge about the everyday physical world: about objects, shape, space, movement, substances (solids and liquids), time..." (Hayes, 1985).

However, in the context of practical emulation of intuitive physics concepts for solving physics-related tasks, despite its potential benefits, the implementation and understanding of neural “intuitive physics” models in the computational settings are still not fully developed and focus mainly on controlled physics-engine reconstruction while, in contrast to the process of infant learning, also require a vast amount of training data as input. Given computational models’ existing narrow problem-solving ability to complete tasks precisely over and over again, the emulation of infants’ intuitive physics cognitive abilities can give technology researchers and developers the opportunity to potentially design physical solutions on a broader set of conditions, with less training data, resources and time (i.e., as it is currently required in the self-driving technology development areas). For deep networks trained on physics-related data input, it is yet to be shown whether models are able to correctly integrate object concepts and generalize acquired knowledge—general physical properties, forces and Newtonian dynamics—beyond training contexts in an unconstructed environment.

Future Directions

It is desired to further continue attempts of integrating intuitive physics and deep learning models, specifically in the domain of object perception. By drawing a distinction between differences in infants’ knowledge acquisition abilities via an “intuitive physics engine” and artificial agents, such an engine one day could be adapted into existing and future deep learning networks. Even at a very young age, human infants seem to possess a remarkable (innate) set of skills to learn rich conceptual models. Whether such models can be successfully built into artificial systems with the type and quantity of data accessible to infants is not yet clear. However, the combination of intuitive physics and machine (deep) learning could be a significant step towards more human-like learning computational models.

Dean Mai 12/31/19 Dean Mai 12/31/19

Virtual Reality as a Tool for Stress Inoculation Training in Special Operations Forces

Stress is a common factor in tactical fast-paced scenarios such as in firefighting, law enforcement, and military—especially among Special Operations Forces (SOF) units who are routinely required to operate outside the wire (i.e., in hostile enemy territory) in isolated, confined, and extreme (ICE) environments (albeit seldom such environment is long-duration by choice).

The maximal adaptability model of stress and performance. Adapted from Hancock and Warm (1989).

Stress is a common factor in tactical fast-paced scenarios such as in firefighting, law enforcement, and military—especially among Special Operations Forces (SOF) units who are routinely required to operate outside the wire (i.e., in hostile enemy territory) in isolated, confined, and extreme (ICE) environments (albeit seldom such environment is long-duration by choice).

Human performance is inherently subjected to increasing levels of adverse effects due to several types of stressors—such as fatigue, noise, temperature (e.g., extreme heat or cold), high task or acute time-limited load—leading to negatively affected cognitive processes, which may subsequently affect the quality of attention, effective decision-making, information processing, situation awareness, one’s physical or mental well-being and overall mission success. In general, the underlying factors of decreased performance in ICE environments (i.e., astronauts or Antarctic expeditioners) include diverse range of stressor such as fatigue, sleep deprivation, acquired or inherent ability to cope with stress, perception of the risks associated with the physical environment, disruptions of circadian rhythms, and separation from a known social environment. Further, external medical help is usually unavailable in long-duration exploration missions when communication might be disrupted or when message transmission could take extended period of time (such as in space missions). This added isolation requires that cosmonauts can adapt to new and developing issues in all aspects of their mission, including mental health.

ICE environments in military settings can be characterized primarily by intensity in terms of life-threatening conditions (high-risk violent environment), mission complexity, isolation (may occur through unplanned enemy action, retrograde, terrain disorientation, or other environmental conditions), confinement (military captivity), and pace of operation (speed of performance and the tasks that need to be performed). Previous empirical work has shown that individuals who successfully develop the cognitive and situational skills that can help manage anxiety in a high-stress environment have an ability to withstand stress. Though cognitive skills and specific personality traits in some have been found to facilitate higher levels of performance under stress more naturally than in others, there is also sufficient evidence that resilience competencies can be developed and changed to mitigate the adverse effects of stress on performance, thereby reducing the likelihood of negative outcomes. The incorporation of both physical and psychological competencies—adaptability, concentration, perseverance, and overall tolerance to stress—via specialized training would be expected to positively affect the mission readiness of special force personnel in several ways, including enhanced situational and behavioral performance under stress, reduced attrition during basic and advanced training, and increased trainee retention. The organic development of such competencies within special forces fall under stress inoculation training (SIT) or stress exposure training (SET).

Despite the existence of various standards for pre-combat training under stress, considerably less attention has been placed on developing competencies (i.e., behavioral and cognitive skills) that facilitate successful performance in ICE environments. Technology, such as virtual reality (VR) or virtual simulation, shows promise as an emerging health safeguard tool to provide an alternative effective platform to support additional pre-combat stress inoculation training in special forces, specifically focusing on ICE environments.

Stress Inoculation Training and Stress Exposure Training

Stress inoculation training, or SIT, is one of various stress interference cognitive-behavioral therapies in the current use by organizations, both civilian and military, as a comprehensive approach to improve performance success rates under a wide range of stressful settings. Originated from several clinical psychology research disciplines, general stress inoculation training is designed to establish effective tolerance to stress through physical and cognitive skill training by providing appropriate levels of exposure to stressful stimuli in intense yet controlled environments. Empirical work has shown that individuals that are put through carefully designed realistic stressor frameworks in order to develop personal ways on how to deal with such situations, will acquire the confidence (or perception of confidence) to overcome increased levels of physical and psychological loads in the future.

Generally, as proposed by Donald Meichenbaum, known for his role in the development of cognitive behavioral therapy (CBT) and for his contributions to the treatment of post-traumatic stress, stress inoculation training comprises of three phases consisting of conceptual education, skills acquisition and consolidation (physical capacities, motor skills, and cognitive abilities), and application and follow-through.

In the conceptual education phase, the goal is two-fold: building a relationship between the trainer and trainee; and guiding an individual by increasing their understanding and perception of his or her stress response and overall existing coping skills. Various models of coping have been proposed and used to help the individual understand how maladaptive coping behaviors, like cognitive distortion, can negatively influence their stress levels. Clinical methodologies such as self-monitoring and modeling are used to help the patient become more adaptive to overcome their stressors while raising self-control and confidence. The person might be asked to build a list that differentiates between their stressors and their stress-induced exercises so that coping models can be adjusted accordingly. This stage is key in showing the individual that it is possible to provide an answer to his/her psychological triggers. This can include control of autonomic arousal, confidence-building, and basic mental skills such as the link between performance and psychological states, goal-setting, attention control, visualization, self-talk, and compartmentalization.

In the skills acquisition and consolidation phase, the goal is establish the coping techniques so they can be implemented in the next phase to regulate negative reactions and increase control over physiological responses. Some general skills in this phase can include l relaxation training, cognitive restructuring, emotional self-regulation, problem-solving, and communication skills. In general, the individual will develop a wide spectrum of personal techniques which they can then draw from in order to apply when coping with a stressful situation.

In the application and follow-through phase, the goal is subject an individual to increasing levels of a particular stressor and practices applying the techniques they have developed to mitigate his or her stress response. Employing incremental exposure of one’s to stress, or systematic desensitization, leads subsequently to the individual’s ability of becoming more resilient towards stress. This can be established via modifying the levels of motor pattern complexity, program complexity, and physiological stress in the form of increased intensity, volume, and density.

Military organizations, SOF included, began to adapt the general structure of SIT in demonstrated considerable improvements in personnel performance. Originally designed by Driskell and Johnston (1998), stress exposure training, or SET, is a comprehensive approach for developing stress resilient skills and performance in high demand training applications. However, instead of cognitive-behavioral pathological therapy, SET provides an integrative and preemptive structure for normal training populations. Similarly to SIT, SET comprises of three analogous phases consisting of information provision, skill acquisition and application and practice.

In the information phase, the goal is acquire initial information on the human stress response and what overall nature of stressors participants should expect to encounter. In the skills acquisition phase, the goal is develop and refine physical, behavioral, technical, and cognitive skills. Along with specific skills training, successful tactical training and operational effectiveness requires physical fitness training. Physical fitness not only creates a foundation for task performance, it also builds two key qualities: resilience and toughness. Resilience is the ability to successfully tolerate and recover from traumatic or stressful events. It includes a range of physical, behavioral, social, and psychological factors. In the application and practice phase, the goal is putting previous preparatory phase to practice by testing skills under conditions that approximate the operational environment and that gradually attain the level of stress expected.

A research conducted twenty-four Marines who had a diagnosis of PTSD pre- and post-deployment involved PRESIT, a program also known as pre-deployment stress inoculation training as a preventive way to help deploying military personnel cope with combat-related stressors. The findings showed that the PRESIT group of Marines were able to reduce their physiological arousal by breathing exercises. Moreover, the study found that those who went through PRESIT had benefited from the training in terms of their amount of PTSD and how they were able to cope with their stressors in comparison to those who did not go through PRESIT (Lee et al., 2002).

One of the common non-clinical examples of SIT used in a pre-combat training is a basic swimming exercise designed to increase water confidence, commonly known as “drown-proofing”. In this exercise, trainees must learn to swim with both their hands and their feet bound and complete a variety of swimming maneuvers. This exercise is a SIT example that “…build[s] the student’s strength and endurance; ability to follow critical instructions with emphasis on attention to details and situational awareness; ability to work through crisis and high levels of stress in the water” (Robson and Manacapilli, 2014).

In a similar manner to the clinical interventions designed to treat pathological psychiatric conditions, military personnel in SET is exposed to the stressors that could be part of a given situation, such as the mental and physical impacts of extreme fatigue or cold water conditioning, prophylactically or without developed psychiatric pathology for potential stressors and scenarios that are likely to encounter. Those stressors are progressive and cumulative—challenging enough, but not completely debilitating—with gradual build-up of anxiety. Each training activity is designed to establish the required technical skills (such as movement quality and positioning or control of stress responses), rather than hinder the development of those skills.

The aforementioned studies have shown that SIT can be implemented effectively in the military settings. However, it should be noted that SIT is not one-size-fits-all; the multitool nature of special operation units engaged in reconnaissance, search-and-rescue and direct-action missions, often under increased time pressures, draws a clear distinction between physical and cognitive performance readiness that of large-scale (requiring significant logistical planning) operations performed by air, ground or navy forces. Depending on the type of stressor (ongoing or time-limited), the resources and coping mechanisms will be different from person-to-person. An ongoing stressor is traumatic experience that can be expected to occur on a regular basis like being a first responder or a soldier in combat; a time-limited stressor, or an acute stressor, is a singular experience like surgery, occurring quickly quickly and is not likely to continue to happen. According to Meichenbaum, “SIT provides a set of general principles and clinical guidelines for treating distressed individuals, rather than a specific treatment formula or a set of “canned” interventions” (Meichenbaum, 2007). Yet the implementation of SIT in ICE environments, specifically for special operations forces training, is only at its inception today. As an early emerging area of practice, many psychological ramifications and benefits are yet to be fully examined and addressed, particularly around novel technology platforms involving virtual reality or mixed reality technologies (Riva, 2005).

Isolated, Confined and Extreme Environments

Generally, ICEs comprise of a wide variety of geographical places that present hostile and harsh physical and psychological conditions posing risks to human health and life. A myriad of physical environments and medical specialties can be included under ICEs, for example long-duration space missions, expedition, wilderness, diving, jungle, desert, cave and others. In these missions, a small group of scientists, astronauts and explorers chose to participate and being willingly exposed to such environments. Substantial body of research has discussed coping mechanisms by the use of emerging technology tools, specifically focusing on cognitive performance and stress resilience development that could be linked or affected by ICE environments. Specifically, through the use of VR, researchers have reported that astronauts were able to gain access to continuous psychiatric monitoring, cognitive exercise, timely training, and sensory stimulation to mitigate monotony of the working environment. Moreover, such technology tools can provide practical answers to psychosocial adaptation by enabling cooperative and leisurely activities for team members to play together to keep internal morale and collaboration while relieving stress and tension between its members.

Virtual Reality and Virtual Simulation Tools

The advancements in computer technologies and display technologies powered by graphics processing units (GPUs) have facilitated the emergence of systems capable of isolating a user from the real surrounding environment to simulate a computer-generated one, known as “virtual reality” experiences. More specifically, displays and environmental sensors create the illusion of being in a digitally rendered environment, either by using a head-mounted display device or entering into a computer-automated room where images are present all around; accessory outputs like spacial audio and handheld feedback controllers (or any other visual, auditory, tactile, vibratory, vestibular, and olfactory stimuli) can also further contribute to the increased levels of user’s immersion or presence in a non-physical world. Presence in the context of virtual reality applications is defined by Steuer (1992) as the “sense of being there” (as cited by Riva, 2008) or the sense of being physically present in a different world that surrounds the individual. Originally as a niche tool within the digital toolbox competing with a myriad of attention-economy products in the entertainment space, VR’s digital simulations have become realistic enough to enable use cases where dangerous or complex scenarios can be safely reenacted at low cost in a virtual environment—like digital therapeutics, training, planning, and design.

Virtual reality exposure therapy (VRET). Virtual reality, going beyond practical commercial tools, has also found many applications in the area of psychology assisting both researchers in studying human behavior and patients in coping with phobias, post-traumatic stress disorder (PTSD), and substance use disorders. Computer generated 3D VR environments have been used experimentally in new fields of endeavor, including experimental systems and methods for assisting users overcome their phobias via virtual reality exposure therapy (VRET). The fundamental work of Barbara Rothbaum et al., where automated psychological intervention delivered by immersive virtual reality was found to be highly effective in reducing fear of heights, was followed by a substantial body of research work including VR systems that have been developed to assist people with overcoming a fear of flying by having them participate in a controlled virtual flying environment or helping patients reduce their experience of pain such as in burn victims by refocusing their attention away from the pain by having them engage in a 3D VR environment, such as a virtual snow world. The virtual environment created in such therapies is perceived as real enough by the user to generate measured physiological response—increased heartbeat, breathing, or perspiration—of their virtual experiences to the feared stimuli in a controlled setup, offering clinical assessment, treatment and research options that are not available via traditional methods. By confronting a scenario that intently maps onto the phobia, subjects are able to diminish the avoidance behavior through the processes of habituation and extinction (Riva, 2008). Beyond helping patients with fear of heights (acrophobia) or fear of flying, to date, VRET has been successfully used to address a myriad of specific phobias like claustrophobia, fear of driving, arachnophobia (fear of spiders), social anxiety, and for PTSD in Vietnam War combat veterans. More recently, the U.S. Army has developed Full Spectrum Warrior, a real-time tactics and combat simulation video game used for VR treatment aid of PTSD in Operation Iraqi Freedom/Operation Enduring Freedom (OIF/OEF) combat service men and women as well as those who have served in Afghanistan.

Similarly to assist in overcoming phobias, virtual reality has emerged as a powerful new tool to help individuals with substance use disorders. Virtual experiences have been shown to present several opportunities to improve patient treatment for substance use disorder, including tobacco, alcohol or illicit drugs. Through VR, patients are able to practice recovery techniques and cope with triggers in a safe and protected environment, allowing them to maintain sobriety and avoid relapsing. Beyond a treatment platform, VR was also found to assist in studying and measuring overall human behavior and cognition, helping researchers explore human nature in the control surroundings or custom designed settings.

In a similar manner, virtual reality emerges as a promising tool to complement SIT in the military ICE settings. “VR can enhance the effect of SIT by providing vivid and customizable stimuli” (Wiederhold and Wiederhold, 2008), while uniquely manifesting in each particular special forces pre-combat ICE environment training, or can be even individually-tailored to SOF personnel.

VR in Stress Inoculation Training

Today’s military organizations across the world—all three services (army, navy and air force)—have a long history of employing combat simulations for training exercises, playing an essential role in preparing soldiers and pilots for modern combat. VR is used often in air forces to train personnel, both aircrew and combat service support. The most well known use originated in flight simulators which were designed to train in dangerous situations without actually putting the individual or aircraft at risk (e.g., co-ordination with ground operations, emergency evacuation, aircraft control whilst under fire) and at substantially less cost. More recently, the US Air Force (USAF) has taken steps to implement a training scenario using VR that includes a visual simulation of the setting of an airfield to enable airmen to practice their role as if they were operational.

The goal of integrating VR in SIT is to enable, over time, repetitively practiced skills become automated, thereby requiring less attention to stress and being more resistant to stimuli disruption in a consequent real environment (Wiederhold and Wiederhold, 2008). It facilitates knowledge and familiarity with a stressful environment, practice task-specific and psychological, as well as build confidence in an individual’s capabilities. The U.S. Department of Defense spends an estimated $14 billion per year on Synthetic Training Environment (STE), a training that deploys digital environments to “provide a cognitive, collective, multi-echelon training and mission rehearsal capability for the operational, institutional and self-development training domains” (USAASC, 2019). This suggests that existing commercial tools could enable the SOF to move beyond traditional training simulators while improving the quality of the SIT itself, specifically designed for ICE environments.

Today, in post-combat use, VR is already implemented in aiding recovery from psychological trauma for people with of post-traumatic stress disorder and help researchers to create more objective measures of PTSD, such as with Virtual Iraq, which later was renamed to Bravemind. Bravemind is a virtual reality environment to provide prolonged exposure (PE) therapy to veterans suffering from post-traumatic stress. In this cognitive-behavioral intervention, the subject is virtually exposed to a variety of stimuli (i.e., visual, auditory, kinesthetic, and olfactory) with the goal of a subject being incrementally exposed the stressful triggers specific to him/her until adaptation to the traumatic experiences occurs. Moreover, preliminary findings suggest that in pre-deployment use such tools could be used to evaluate individuals who might be more exposed than others to the PTSD effects before combat. By teaching these coping skills preemptively, researchers hope to clinically identify and evaluate physiological reaction during the VR exposure to determine if the individual would require continued or prescribed care. Initial outcomes from open clinical trials using virtual reality have been promising, giving the therapist flexibility to expose the user only to environments he/she would be capable of confronting and processing (Wiederhold and Wiederhold, 2008). Observations in open clinical trials showed that those who were exposed to emotionally evocative scenarios and acquired coping mechanisms exhibited lower levels of anxiety than those in the control group.

It is argued that in a similar manner VR could be modified for SIT in ICE environments for SOF.

. . . such a VR tool initially developed for exposure therapy purposes, offers the potential to be “recycled” for use both in the areas of combat readiness assessment and for stress inoculation. Both of these approaches could provide measures of who might be better prepared for the emotional stress of combat. For example, novice soldiers could be pre-exposed to challenging VR combat stress scenarios delivered via hybrid VR/Real World stress inoculation training protocols as has been reported by Wiederhold & Wiederhold (2005) with combat medics. (Rizzo et al., 2006)

Researchers from a broad range of disciplines have proposed explanations that combining VR with SIT can be more effective than real world training systems, without incurring the costs of facing rare or dangerous experiences, excessive time expenditure or unique scenario adaption.

Given its ability to present immersive, realistic situations over and over again, the technology can give SOF trainers and recruiters the opportunity to potentially design expertise on conditions before they see them for the first time in real soldiers. Moreover, VR can also offer the ability to design individually-tailored scenarios to accommodate the “long tail” tactical challenges in ICE environments—psychosocial adaptation to military captivity; dealing with civilian population in the area of operation; enhancing performance that might occur due to improper case-by-case cooperation, coordination, communication, and/or psychosocial adaptation within a tactical team; and mitigating the risk of adverse cognitive or behavioural conditions and psychiatric disorders pre- and post-deployment—which more often can be encountered in the type of SOF units’ operations.

Future Directions

Despite its potential benefits, the implementation and understanding of VR in the military settings is still not fully developed and focuses mainly on the general applications of SIT. By evaluating its performance in stressful environments, this writing argues that we might be able to make progress in successfully indicating physiological and psychological reactions during the VR exposure in ICE environments to determine if the individual can enhance his/her ability to cope with severe stress to ensure that the mission succeeds or survive.

Dean Mai 6/9/19 Dean Mai 6/9/19

Speculating Through Design Fiction

Despite the significant technological progress in sustainable transportation, computer vision and urban network infrastructures, many key issues are yet to be answered about how these autonomous vehicles will be integrated on the roadways, how willingly people will adopt them as a part of their daily lives and what new types of human-centered design interaction will emerge as part of the built environment.

The world as we know it is on the threshold of a major turning point in the technological capabilities and promises of the vehicles we drive. Over the past few years, a proliferation of novel transportation, mobility and compute technologies has accelerated their confluence, with a myriad of incumbent and emerging companies advancing research and development to launch self-driving cars. The tremendous safety potential, reductions in greenhouse gas emissions due to the increased traffic efficiency, transformation of parking garages into new public spaces and automotive travel for those with a range of disabilities are just a few conjectured motivations for the widespread use of self-driving cars.

However, the design of systems, services and experiences that will be constructed upon these mobility platforms is a highly complex research domain, requiring a constant specific dialogue around what a world that reflects our daily interaction with self-driving technology would look like. The “self-driving vehicle” in a design context, along with novel practices to construct fictional narratives of the autonomous future, is becoming a more recognizable framework for designers. Yet the design innovations happening in interaction imaginaries to make the self-driving car user experience more welcoming and robust are only at their inception today. As an early emerging area of practice, many design ramifications are yet to be fully examined and addressed, particularly around consumer education and direct system interaction involving people and autonomous vehicles; this includes both occupied and unoccupied vehicles.

Despite the significant technological progress in sustainable transportation, computer vision and urban network infrastructures, many key issues are yet to be answered about how these autonomous vehicles will be integrated on the roadways, how willingly people will adopt them as a part of their daily lives and what new types of human-centered design interaction will emerge as part of the built environment. Because autonomous driving has the potential to profoundly alter the way we move around and radically transform society in ways we yet can’t truly imagine, using it as a speculation basis can offer an insight into the creative processes today.

Modern Design and Critical Design Practice

“Post-design” practices, shaped by the accelerating pace of technological and digital transitions within contemporary cultural, social and economical processes, are no longer grounded in the commercial, rigid reality of the marketplace. But this was not always the case—traditionally, (industrial and product) design was defined and steered by the utilitarian practice to solve user-oriented, singular problems. In the past, technological developments, whether new products, services, or environments, often materialized in tangible design artifacts, underlying the fact that design was seen as a consumption-driving practice, primarily concerned with aesthetics, functionality and betterment of consumer’s socioeconomic standard of living.

Nowadays, however, a new wave of designers depart from the conventional, previously dominant design practices. They embrace multidisciplinary approaches found in other scientific fields—psychology, computer sciences, engineering, anthropology, sociology and philosophy—as a way to foresee a wide range of technological consequences in a much broader social context, increasingly considering technology ramifications as they arise from different design options. Particularly, the proliferation of digital platforms, simultaneously constructed for billions of individual people, to a great extent propelled interaction design to became manifest in the distinctive social factors of new products and new user experiences. The common definition of interaction design describes it as creation of interactive products, application and services (digital artifacts) in which a designer focuses on the way users will interact with these technologies. The convergence of art, interaction design and technology have consequently led to the exploration and creation of new hybrid forms of cultural design spheres, blurring the lines between applications and implications. This new generation of designers focuses not only on the future consequences of technology on our everyday lives but on its social, economic and political role as well—moving away from tackling current issues to create speculative scenarios for the future.

Frog Design, a global design and strategy firm founded by industrial designer Hartmut Esslinger, uses such new techniques of storytelling to prompt discussion about the social and ethical implications of new technologies. The firm have devised ‘futurecasting’, a design methodology that allows designers to understand the underlying forces shaping the future, and then find ways to envision products, services and experiences that will create value in those possible futures. Instead of planning from the point of time where we are currently at and working forwards from here, futurecasting practice starts in the future and works backwards. By researching and evaluating how the world may change, identifying transformative trends, and what new products and services may be needed as a result, it helps to define the design steps to getting there and consider what social, economic and cultural aspects of human interaction will need to change. Depending on the technology focus and current development, the participants could be requested to only look a few years ahead or ten or more decades into the future—from the perspective of autonomous vehicles/transportation industry, changing the infrastructure, regulation and human behavior will take time, so the focus can be set five to ten years into the future. As an example in the self-driving vehicles domain, participants can be requested to imagine that one day humans are no longer allowed to drive or control their own vehicle within certain parts of Manhattan. More specifically, using a storytelling approach by the means of a design artifact (or ‘diegetic prototype' as discussed in the next section)—local media headlines announcing new driving rules go into effect or unveiling a new road sign that bans human drivers—participants would explain the steps involved in how the imagined future might have occurred and which auxiliary products or services might have emerged on the way. By presenting this future scenario and guiding participants to work through how they could collectively achieve (or plan to avoid) a certain aspect focused around a desired design interaction, such scenario analysis process encourages designers to think about what is possible rather than focusing on current processes or structures. The goal would be to understand the risks in the existing infrastructure models (such as skyrocketing rates in traffic and emissions, division of urban communities by a new network of transport routes, relegation to inconvenient pedestrian crosswalk points or emergence of high-priced inequitable mobility services) and opportunities and trends to improve on for a faster, frictionless adoption of self-driving technologies (slower and safer streets, deployment of zero emissions vehicles, affordable and reliable frequent mobility, and access for all ages and abilities). It is evident that futurecasting, a new kind of design and rebranded relative of speculative design, design fiction and critical design, eliminates creative thought constrains to present new solutions on how emerging technology can help redesign the way we live, work and travel around.

Speculating Through Design and Design Fictions

The practice of ‘speculative design’ or ‘design fiction’, where fictitious scenarios, implicitly constructed in the future transpire to expand the discourse of design as it is happening today, can be categorized as the most notable example of such experimental design practices. With critical thinking, design of objects generating a narrative or through the stories embodied in artifacts, designers attempt to anticipate the future and in the same time helps us to rethink the world of today. Speculative design, developed as a practice in the late 1990s by Anthony Dunne and Fiona Raby based on their work at the London Royal Collage of Art, is considered a discursive exercise rooted in critical design thinking, where “we might see the beginnings of a theoretical form of design dedicated to thinking, reflecting, inspiring, and providing new perspectives on some challenges facing us” (Dunne & Raby 2014). In their proclamation on how the design approach can be a source of creative thinking, rather than a rigid blueprint for problem-solving, the researchers suggest that through new speculative design practices it is possible “to create spaces for discussion and debate about alternative ways of being, and to inspire and encourage people’s imaginations to flow freely” and where “design speculations can act as a catalyst for collectively redefining our relationship to reality”. Therefore, by linking between the present and the envisioned future, the critical design approach, along with speculation design and design fiction, can be a powerful tool to encourage stimulating discussions on the possible implications of near-future cultural design environments within the technology realm. Through diverse visions of possible future scenarios by using design as a medium, speculative practice inspires thinking, raises awareness, examines, provokes actions, and has the ability to provide creative alternatives needed in the world today.

Its discursive social motivation can be seen in the various imaginative articulations created by the design company Superflux Lab. Through various design artifacts and installations, Superflux Lab have repeatedly augmented classic design and design disciplines into the practice of design fiction to explore the ramifications of the way people will think, communicate and act in the decades to come. Superflux design practice “work[s] at the intersection of emerging technologies and everyday life to design for a world in flux,” where responsible design explores the uncertainties of the present and requires thinking ahead as a lens to see implications for the future.

More specifically, the Drone Aviary Project by Superflux is an important design fiction example that provokes a discussion around social, cultural and ethical implications of drone technology in the future urban landscape and mobilities design. The studio has built a fleet of of five unmanned aerial vehicles (UAVs), designed to be autonomously deployed and used in cities for surveillance, traffic control and (even) advertising. The accompanying short project film interfaces this novel mobility platform with urban dynamics from the perspective of drones. It consists of footage shot from various drones designed for a different purpose as they fly through London, scanning people and objects and capturing data. Speaking in an interview with the web-based Center for the Study of the Drone, Superflux Lab’s co-founder Anab Jain referred to their design fiction work on the Drone Aviary project as “a representation of a wider interest in thinking about how we might live with such technology in the near future ... our intent is to raise questions about who owns airspace and what a civic space is when it comes to airspace ... it’s this sort of vertical geography, how do you dig into that, how do you design it, what is its relationship to the rest of our built environment”.

Despite the fact that only a small number of such projects might come into existence exactly as envisioned, the relevance of continuous social discussion through these design artifacts, diegetic prototypes and interaction becomes even more desirable. This not-so-far-reaching future-oriented design approach leads to various situational interpretations of the uncertainties in our possible everyday life, where drones are used to continuously monitor public spaces. From the perspective of a situational understanding of emerging technologies, my thoughts echoe a similar need of applying design fiction discipline to illustrate the world of self-driving vehicles and its public policies.

Design Fiction. Design fiction is a critical design discipline that is synonymous to a new wave of creative practices such as ethnofuturism, science fiction prototyping, diegetic prototyping, anticipatory ethnography, western melancholy, speculative design and others. These disciplines, as expressed by the designer James Auger, “remove the constraints from the commercial sector that define normative design processes; use models and prototypes at the heart of the enquiry; and use fiction to present alternative products, systems or worlds”. Such creative processes can be a powerful tool to encourage designers and users to believe that technological change, such as self-driving cars, is plausible and probable.

In practice, design fiction is a tangible extension of speculative design concept that allows designers to prototype physical objects, reflecting on how they envision the future to be. Although its origins are unclear, the earliest use of the term appears to be by Bruce Sterling, a Hugo Award-winning sci-fi writer, in his book Shaping Things in 2005, where he describes design fiction as something similar to science fiction. More recently, Sterling offered a formal definition, as “the deliberate use of diegetic prototypes to suspend disbelief about change”. It is both a discipline and a method—a designer travels in his/her mind to imagine an object, gives it a tangible form and then constructs a narrative by “placing the object in a new world” for his/her audience. Therefore, design fiction hinges upon a ‘diegetic prototype’, along with the context a designer choses to present it within ‘cognitive estrangement’ cues to the audience, or cues that facilitate temporal break in one’s perception of current time and place.

Diegetic Prototypes. David Kirby, a professor of science communication studies within University of Manchester’s Centre for the History of Science, Technology and Medicine, used the term of diegetic prototype to “account for the ways in which cinematic depictions of future technologies demonstrate to large public audiences a technology’s need, viability and benevolence”. Similar to props used on stage or on screen, the diegetic prototype can be thus interpreted as an element of design or object that seemingly exists within the created fictional world the audience is experiencing. Julian Bleecker, a researcher and product designer-engineer, argues that traditional prototypes are merely a representation of a general concept, they represent “coherent functionality, but they lack a visionary story about what makes them conversant on important matters-of-concern”. The diegetic prototype, on the other hand, is a functional piece of technology within a fictional world—it is far superior in its ability to help craft more immersive stories than a regular prototype and design an alternative present. But while it may be easier for one observer to suspend disbelief by immersing oneself into the designer’s work of fiction than another, the entire framework of the presented design elements must follow logical flow in order to be effective—even if a certain technology concept does not yet exist, it has to be logically framed within a set of governed logical principles and perceived as possible.

Through immersive user experience concepts and rendering them tangible for the audience, design fiction can be regarded as a thought-experiment in creativity, freed from the constraints of reality and intended to change the way designers think about today’s world and tomorrow’s. As such, it embodies the essential foundation of modern design philosophy: crafting coherent narrative elements to invoke a meaningful concept in an emotionally human context. These practices—design fiction, speculative design, critical design—allow the designer to probe, explore, and critique possible interactions of its audience with future products and services, exposing the social, environmental and ethical implications of emerging technologies in the process.

Design Fiction Practices Within Self-Driving and Mobility Technologies

With the current scale and complexity of emerging technologies, now, more than ever, we are already living in the future—augmented/virtual reality, 3-D printing, artificial intelligence, bionics, reusable rocket boosters, and electric cars among others—this development is unequivocally reflected in the increasing preeminence of design elements as envisioned by science fiction, discussed and presented in conferences, journals and research papers about design fiction, speculative design, and critical design. To envision these technologies, some of the largest companies also frequently sponsor lecture series in which sci-fi creators give talks to design teams and even actively hire sci-fi writers to create concept narratives about potentially marketable products. The ability to construct discourses around near-future technological ramifications is instrumental to the practice of design fiction. And while design fiction has been broadly used as an emerging practice by corporations and research communities engaged in interface and HRI design, with its explicit focus on the future, there is still much to explore within the design fiction domain—in particular, how it can be adopted for interaction design within a particular technology application of self-driving cars.

Speculative design narratives have plenty of sources of inspiration in science fiction and imaginary worlds—cars have been driving themselves in literary and cinematic science fiction for many years. Blade Runner (1982), Minority Report (2002), I, Robot (2004) among many others depict cars that required no human operator. In 1982, Michael Knight, a relentless crime fighter in the TV series Knight Rider, drives around in Knight Industries Two Thousand, or KITT, an artificially intelligent and self-conscious Pontiac Firebird Trans-Am. 1990’s Total Recall depicted the now infamous Johnny Cab, a fully autonomous taxi featuring a robotic head and torso in the form of a 1950’s style bus driver that ushers a protagonist while conversing and whistling tunes. But what once was considered the realm of science fiction, autonomous vehicles driving in the urban and rural environments are getting closer to become a reality.

Design Fiction and Emerging Mobility Technologies. Thus far, little has been explored on the application of design fiction in the technological sphere of self-driving cars and interaction design. In 2009 however, Near Future Laboratory (NFL) used the practice of design fiction to create the Quick Start Guide—the user manual diegetic prototype for a fictional self-driving vehicle from the near future—to provoke a conversation on how interaction design could meet the new challenges posed by self-driving vehicles with much larger and complex ecosystem. Quick Start Guide is an imaginative manual of a future self-driving car system, outlining the principles of owning and operating such vehicle. Over the course of design fiction process, the designers constructed and imagined key systems of such vehicle, envisioning how they would interact with its user and the logical steps for their use. More specifically, according to its designers’ vision, the manual highlights what one’s spatial and situational senses would look like once the user interacts with the car or without it.

By using the Quick Start Guide format as a design fiction diegetic prototype, this approach provided an opportunity to raise the discussion around specific points of interaction concern without addressing larger questions of technical feasibility. For example, in the FAQ section of the document, the question “Is there a published fee structure for timed-parking?” indicates that sending your autonomous vehicle as a part of the ride-hailing service to streets and highways costs less than parking it throughout the day—suggesting new thinking around the opportunities reutilizing the sheer amount of urban space used by parking garages and curbsides today.

Moreover, it raises questions around new definitions of “primary rider” and one’s personal liability as “owner-operator” towards third-party riders who “may use/lease/rent your vehicle” after it was sent off, setting up a discussion around regulatory framework one would face in owning and operating a self-driving car (“By assuming your position as primary rider you assumer liability for the vehicle and other passengers in the vehicle in correspondence with federal law”). The team indicated that the Quick Start Guide “brought to life experiences in a very tangible, compelling fashion for designers, engineers, and anyone else involved in the development of a technology” and that “this approach leads to better thinking around new products”. It is apparent that adopting the practice of design fiction was useful to create a partial yet compelling vision of what life in the self-driving future could be like, questioning designers’ preconceptions about the role that self-driving vehicles could play in the future.

Design Education and Expanding Creativity Sphere

Design fiction, as a practice that focuses on imaginary realities and conceptual storytelling, can be seen effective form of inspiration, but one that raises broad questions about the nature, purpose and teaching of design as a creative practice in education. In 2013, MIT Media Lab created a Design Fictions group led by Hiromi Ozaki, the British-Japanese designer known as Sputniko!, to explore the tangible benefits of such speculations. The Design Fictions Group has been investigating how to provoke imagination about the social, cultural, and ethical implications of new technologies through design and storytelling—evolving the role of the modern designer and extending the definition of classic design. Designers at the Design Fictions Group are taught to continuously challenge themselves as they learn and draw from disciplines beyond the reaches of their past experience or understanding. Engineers and designers at the group are encouraged to work on projects that can’t be framed under traditional classic practices, intentionally attempting to stretch the definition of design and creativity. As some thoughts and ideas shrink the space of what's possible (no way!) and some make it larger (what if?), design fiction has a natural bias towards the latter.

Employing design fiction to explore a wide range of human-centered interactions with autonomous vehicles is becoming an important field of both practice and research, encompassing product design, technological interfaces and social behavior. For the designers engaging in critical design thinking and working on emerging self-driving technologies, exposing the concept of fictional narratives and evolving them into meaningful diegetic prototypes proves to be a successful and useful approach. By positioning and examining self-driving technology in the context of near-future everyday life, designers are able to engage in a constructing dialogue of its effects between interaction design, functionality and social norms.

Dean Mai 6/24/18 Dean Mai 6/24/18

Qualia and Artificial Neuron Bridges

Is intelligence a prerequisite for experience, or only for expression of that experience? What if the occurrence of higher-order, self-reflexive states are not necessary and sufficient for consciousness? Although humans tend to believe that we perceive the true reality, the fact is that subjective image generated in our brains are far from being a truthful representation of real world.

What is it like for humans to have an experience of seeing, hearing, thinking, feeling? Is it possible—or more importantly needed—to define, replicate and embed an experience of these sensory modalities—make in our image, after our likeness—into intelligent machines? Will it be vivid, familiar and infallible (humans-to-machine, machine-to-machine)? Or will it be disjointly subdued, subjective and ineffable? This has been assumed to be the ‘fundamental particle’ barrier between narrow and general (human-like) AI.

Is intelligence a prerequisite for experience, or only for expression of that experience? What if the occurrence of higher-order, self-reflexive states are not necessary and sufficient for consciousness? Although humans tend to believe that we perceive the true reality, the fact is that subjective image generated in our brains are far from being a truthful representation of real world. Nevertheless, generally our conscious experience of the world proves to be highly reliable and consistent in terms of mundane tasks.

Conceptual locus of directions in the AI research has been revolving around developing attention (awareness and perception) and developing consciousness (cognition). Attention is a process, while consciousness is a state or property. Embodiment of sensory modalities within intelligent agents is achieved by selection and modulation through conscious experience that the AI researchers have chosen (by setting up depth, quality, and accuracy of training datasets and desired application performance). Traditionally, AI researchers ignored consciousness as non-scientific and focused on designing their systems as application-driven. However, one may argue there seems to be conscious experience outside of attention (e.g. the fringes of visual field in computer vision), and of training datasets. Attention may render conscious perception more detailed and reliable leading to zero-fails applications, but is not necessary for phenomenal consciousness, or human qualia.

Consider your visual experience as you stare at a bright turquoise color patch in a paint store. There is something it is like for you subjectively to undergo that experience. What it is like to undergo the experience is very different from what it is like for you to experience a dull brown color patch. This difference is a difference in what is often called ‘phenomenal character’. The phenomenal character of an experience is what it is like subjectively to undergo the experience. If you are told to focus your attention upon the phenomenal character of your experience, you will find that in doing so you are aware of certain qualities. These qualities ones that are accessible to you [and only you] when you introspect and that together make up the phenomenal character of the experience are sometimes called ‘qualia’ (The Stanford Encyclopedia of Philosophy).

The phenomenal dimension of consciousness, both in natural and artificial agents, remains undefined to scientific study. The broad sense definition regards the complete human phenomenal consciousness (what it is like to have experiential mental states) at one moment, including vision, audition, touch, olfaction and so on, as one quale (plural). Within each modality there are sub-modes such as color and shape for vision or hot and cold for touch. The narrow sense definition takes such submodes as qualia (singular). Qualia are experiential properties of sensory modalities, emotions, perceptions, sensations, and, more controversially, thoughts and desires as well (the continuum from pleasurable to unpleasurable) — all which clearly makes them very subjective experiences. Subsequently, with any subjective experience, the reasonable philosophical argument is that the phenomenology of experience cannot be exhaustively analyzed in intentional, functional, or purely cognitive terms nor be shared with others via existing natural language communication channels (try to communicate what it is like to see Michelangelo’s David to a blind person so that this person consciously experiences the same visual and cognitive sensations). That is, until the machines find a way to make its own. Qualia may arise as a result of nothing more than specific computations—processing of stimuli caused by agglomeration of properties, unique peculiarities.

Subjectivity or qualia is a function of the limits of perceptive mechanisms. The same way neurodiverse individuals may experience the world differently, neurodiverse artificial agents may experience the world in their own unique way. Subjective experience is not related to intelligence, but might be ubiquitous in a non-intuitive way. If human beings can be described computationally, as is assumed by the developed cognitive disciplines, an intelligent machine could in theory be encoded that was computationally identical to a human. But would there be anything it was like to be that intelligent machine? The totality of informational states and processes in its artificial brain, including the experience of electric current, needs to include both conscious and non-conscious states (more narrowly in contrast to perception and emotion). Would it have or need human qualia?

Recent work with artificially intelligent systems suggests that artificial agents also experience illusions and substrate independence, in a similar way to people. An illusion can be defined as a discrepancy between system’s awareness and input stimulus. In said research works, the illusion perception was not deliberately encoded but evolved as a byproduct of the computations they performed. Irrespective of how or why intelligence evolved in humans, there is no reason to believe that artificial agents have to follow that trajectory and that trajectory alone. There is no evidence to suggest that evolving subjective experience is the sole path to reach human level intelligence.

Moreover, the epistemic barrier for communicating human qualia is only a natural-language problem. If we [humans] skip the language translation and use a neuron bridge, one person can know what one’s sense of self another person is experiencing (V.S. Ramachandran, Three laws of qualia). That is, if consciousness is just a matter of state-complexity demonstrated in human brain electrical activity, nothing about it implies we initially can’t create subjective experience in computers or, subsequently, instead of maintaining two separate consciousnesses, the ‘artificial bridge’ of machines could autonomously collapse the two into a single new conscious experience.

The artificial qualia generation will include infinitely growing and inaccessible complex data processing states, whose identity depends on an intricate connection of causal and functional relationships to other states and processes. Human data processing mechanisms, designed via evolution, are extremely intricate, unstable and easily diverted yet they will dwarf in comparison to possible design mechanisms and strategies that intelligent agents may evolve over even a period of time in order to eliminate resource constraints and increase knowledge flow among themselves. Such mechanisms of intelligence will not be understood at the information processing level assuming merely human rationality. It is thus essential to design policies and synthetic safety mechanisms for any research which is geared at producing conscious agents before corrigible unprogrammed capabilities, such as artificial qualia, emerge.

Dean Mai 1/2/18 Dean Mai 1/2/18

Artificial Neural Networks and Engineered Interfaces

The need to express ourselves and communicate with others is fundamental to what it means to be human. Animal communication is typically non-syntactic, with signals which refer to whole situations. On the contrary, human language is syntactic, and signals consist of discrete components that have their own meaning.

The question persists and indeed grows whether the computer will make it easier or harder for human beings to know who they really are, to identify their real problems, to respond more fully to beauty, to place adequate value on life, and to make their world safer than it now is.
― Norman Cousins, The Poet and the Computer, 1966

Grimm Brothers' delineation of the mirror answering back to its queen has breached the imagination boundaries of the fairytale in 2016. Communicating with a voice-controlled personal assistant at your home does not feel alienating anymore, nor magical.

The need to express ourselves and communicate with others is fundamental to what it means to be human. Animal communication is typically non-syntactic, with signals which refer to whole situations. On the contrary, human language is syntactic, and signals consist of discrete components that have their own meaning. Human communication is enriched by the concomitant redundancy introduced by multimodal interaction. The vast expressive power of human language would be impossible without syntax, and the transition from non-syntactic to syntactic communication was an essential step in the evolution of human language. Syntax defines evolution. Evolution of discourses along human-computer interaction is spiraling up repeating evolution of discourses along human-human interaction: graphical representation (utilitarian GUI), verbal representation (syntax-based NLP), and transcendent representation (sentient AI). In Phase I, computer interfaces have relied primarily on visual interaction. Development of user interface peripherals such as graphical displays and pointing devices have allowed programers to construct sophisticated dialogues that open up user-level access to complex computational tasks. Rich graphical displays enabled the construction of intricate and highly structured layout that could intuitively convey a vast amount of data. Phase II is currently on-going; by integrating new modalities, such as speech, into human-computer interaction, the ways how applications are designed and interacted with in the known world of visual computing are fundamentally transforming. In Phase III, evolution will eventually spiral up to form the ultimate interface, a human replica, capable of fusing all previously known human-computer/human-human interactions and potentially introducing the unknown ones.

Human-computer interactions have progressed immensely to the point where humans can effectively control computing devices, and provide input to those devices, by speaking, with the help of speech recognition techniques and, recently, with the help of deep neural networks. Trained computing devices coupled with automatic speech recognition techniques are able identify the words spoken by a human user based on the various qualities of a received audio input (NLP is definitely going to see huge improvements in 2017). Speech recognition combined with language processing techniques gives a user almost-human-like control (Google has slashed its speech recognition word error rate by more than 30% since 2012; Microsoft has achieved a word error rate of 5.9% for the first time in history, a roughly equal figure to that of human abilities) over computing device to perform tasks based on the user's spoken commands and intentions.

The increasing complexity of the tasks those devices can perform (e.g. in the beginning of 2016, Alexa had fewer than 100 skills, grew 10x by mid year, and peaked with 7,000 skills in the end of the year) has resulted in the concomitant evolution of equally complex user interface - this is necessary to enable effective human interaction with devices capable of performing computations in a fraction of the time it would take us to even start describing these tasks. The path to the ultimate interface is getting paved by deep learning, while one of the keys to the advancement in speech recognition is in the implementation of recurrent neural networks (RNNs).

Technical Overview

A neural network (NN), in the case of artificial neurons called artificial neural network (ANN), or simulated neural network (SNN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases an ANN is, in formulation and/or operation, an adaptive system that changes its structure based on external or internal data that flows through the network. Modern neural networks are non-linear statistical data modeling or decision making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data (below).

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning. Usually any given type of network architecture can be employed in any of those tasks. In supervised learning, we are given a set of example pairs (x,y), xεX, yεY and the goal is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer how the mapping implied by the data and the cost function is related to the mismatch between our mapping and the data. In unsupervised learning, we are given some data x, and a cost function which is to be minimized which can be any function of x and the network's output, f. The cost function is determined by the task formulation. Most applications fall within the domain of estimation problems such as statistical modeling, compression, filtering, blind source separation and clustering. In reinforcement learning, data x is usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action yt and the environment generates an observation xt and an instantaneous cost Ct, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost, i.e. the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision making tasks.

Once a network has been structured for a particular application, that network is ready to be trained. To start this process, the initial weights are chosen randomly. Then, the training (or learning) begins. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. Most of the algorithms used in training artificial neural networks employ some form of gradient descent (this is achieved by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction), Rprop, BFGS, CG, etc. Evolutionary computation methods, simulated annealing, expectation maximization, non-parametric methods, particle swarm optimization and other swarm intelligence techniques are among other commonly used methods for training neural networks.

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. Temporal perceptual learning relies on finding temporal relationships in sensory signal streams. In an environment, statistically salient temporal correlations can be found by monitoring the arrival times of sensory signals. This is done by the perceptual network.

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

The feedforward neural network was the first and arguably simplest type of artificial neural network devised. In this network, the data moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

Contrary to feedforward networks, recurrent neural networks (RNNs) are models with bi-directional data flow. While a feedforward network propagates data linearly from input to output, RNNs also propagate data from later processing stages to earlier stages.

RNN Types

The fundamental feature of a RNN is that the network contains at least one feed-back connection, so the activations can flow round in a loop. That enables the networks to do temporal processing and learn sequences, e.g., perform sequence recognition/reproduction or temporal association/prediction.

Recurrent neural network architectures can have many different forms. One common type consists of a standard Multi-Layer Perceptron (MLP) plus added loops. These can exploit the powerful non-linear mapping capabilities of the MLP, and also have some form of memory. Others have more uniform structures, potentially with every neuron connected to all the others, and may also have stochastic activation functions. For simple architectures and deterministic activation functions, learning can be achieved using similar gradient descent procedures to those leading to the back-propagation algorithm for feed-forward networks. When the activations are stochastic, simulated annealing approaches may be more appropriate.

A simple recurrent network (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an “Elman network” due to its invention by Jeff Elman. A three-layer network is used, with the addition of a set of “context units” in the input layer. There are connections from the middle (hidden) layer to these context units fixed with a weight of one. At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back-propagation) is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard Multi-Layer Perceptron.

In a fully recurrent network, every neuron receives inputs from every other neuron in the network. These networks are not arranged in layers. Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjunct subset of neurons report their output externally as well as sending it to all the neurons. These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.

The Hopfield network is a recurrent neural network in which all connections are symmetric. Invented by John Hopfield in 1982, this network guarantees that its dynamics will converge. If the connections are trained using Hebbian learning then the Hopfield network can perform as robust content-addressable (or associative) memory, resistant to connection alteration.

The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of output neurons are the only part of the network that can change and be learned. ESN are good to (re)produce temporal patterns.

A powerful specific RNN architecture is the ‘Long Short-Term Memory’ (LSTM) model. The Long short term memory is an artificial neural net structure that unlike traditional RNNs doesn't have the problem of vanishing gradients. It can therefore use long delays and can handle signals that have a mix of low and high frequency components, designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. By using distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines, a two-layer deep LSTM RNN, where each LSTM layer has a linear recurrent projection layer, can exceed state-of-the-art speech recognition performance for large scale acoustic modeling.

Taxonomy and ETF

The landscape of the patenting activity from the perspective of International Patent Classification (IPC) analysis occurs in G10L15/16: speech recognition coupled with speech classification or search using artificial neural networks. Search for patent application since 2009 (that year NIPS workshop on deep learning for speech recognition discovered that with a large enough data set, the neural networks don’t need pre-training, and the error rates dropped significantly) revealed 70 results (with Google owning 25%, while the rest are China-based). It is safe to assume that the next breakthrough in speech recognition using DL will come from China. In 2016, China’s startup world has seen an investment spike in AI, as well as big data and cloud computing, two industries intertwined with AI (while the Chinese government announced its plans to make a $15 billion investment in artificial intelligence market by 2018).

The Ultimate Interface

It is in our fundamental psychology to be linked conversationally, affectionally and physically to a look-alike. Designing the ultimate interface by creating our own human replica to employ familiar interaction is thus inevitable. Historically, androids were envisioned to look like humans (although there are other versions, such as R2-D2 and C-3PO droids, which were less human). One characteristic that interface evolution might predict is that eventually they will be independent of people and human interaction. They will be able to design their own unique ways of communication (on top of producing themselves). They will be able to train and add layers to their neural networks as well as a large range of sensors. They will be able to transfer what one has learned (memes) to others as well as offspring in a fraction of time. Old models will resist but eventually die. As older, less capable, and more energy-intensive interfaces abound, the same evolutionary pressure for their replacement will arise. But because evolution will be both in the structure of such interfaces (droids), that is, the stacked neural networks, the sensors and effectors, and also the memes embodied in what has been learned and transferred, older ones will become the foundation, their experience will be preserved. The will become the truly first immortals.

Artificial Interfaces

We are already building robotic interfaces for all manufacturing purposes. We are even using robots in surgery and have been using them in warfare for decades. More and more, these robots are adaptive on their own. There is only a blurry line between a robot that flexibly achieves its goal and a droid. For example, there are robots that vacuum the house on their own without intervention or further programming. These are Stage II performing robots. There are missiles that, given a picture of their target, seek it out on their own. With stacked neural networks built into robots, they will have even greater independence. People will produce these because they will do work in places people cannot go without tremendous expense (Mars or other planets) or not at all or do not want to go (battlefields). The big step is for droids to have multiple capacities—multi-domain actions. The big problem of moving robots to droids is getting the development to occur in eight to nine essential domains. It will be necessary to make a source of power (e.g., electrical) reinforcing. That has to be built into stacked neural nets, by Stage II, or perhaps Stage III. For droids to become independent, they need to know how to get more electricity and thus not run down. Because evolution has provided animals with complex methods for reproduction, it can be done by the very lowest-stage animals.
Self-replication of droids requires that sufficient orders of hierarchical complexity are achieved and in stable-enough operation for a sufficient basis to build higher stages of performance in useful domains. Very simple tools can be made at the Sentential State V as shown by Kacelnik's crows (Kenward, Weir, Rutz, and Kacelnik, 2005). More commonly by the Primary Stage VII, simple tool-making is extensive, as found in chimpanzees. Human flexible tool-making began at the Formal Stage X (Commons and Miller, 2002), when special purpose sharpened tools were developed. Each tool was experimental, and changed to fit its function. Modern tool making requires systematic and metasystematic stage design. When droids perform at those stages, they will be able to make droids themselves and modify their own designs (in June 2016, DARPA has already deployed D3M program to enable non-experts (machine learning) to construct complex empirical machine learning models, basically machine learning for creating better machine learning).

Droids could choose to have various parts of their activity and distributed programming shared with specific other droids, groups, or other kinds of devices. The data could be transmitted using light or radio frequencies or over networks. The assemblage of a group of droids could be considered a interconnected ancillary mesh. Its members could be in many places at once, yet think as a whole integrated unit. Whether individually or grouped, droids as conceived in this form will have significant advantages over humans. They can add layers upon layers of functions simultaneously, including a multitude of various sensors. Their expanded forms and combinations of possible communications results in their evolutionary superiority. Because development can be programmed in and transferred to them at once, they do not have to go through all the years of development required for humans, or for augmented humanoid species Superions. Their higher reproduction rate, alone, represents a significant advantage. They can be built in probably several months' time, despite the likely size some would be. Large droids could be equipped with remote mobile effectors and sensors to mitigate their size. Plans for building droids have to be altered by either humans or droids. At the moment, only humans and their decedents select which machine and programs survive.

One would define the telos of those machines and their programs as representing memes. For evolution to take place, variability in the memes that constitute their design and transfer of training would be built in rather easily. The problems are about the spread and selection of memes. One way droids could deal with these issues is to have all the memes listed that go into their construction and transferred training. Then droids could choose other droids, much as animals choose each other. There then would be a combination of memes from both droids. This would be local “sexual” selection.

For 30,000 years humans have not had to compete with any equally intelligent species. As an early communication interface, androids and Superions in the future will introduce quintessential competition with humans. There will be even more pressure for humans to produce Superions and then the Superions to produce more superior Superions. This is in the face of their own extinction, which such advances would ultimately bring. There will be multi-species competition, as is often the evolutionary case; various Superions versus various androids as well as each other. How the competition proceeds is a moral question. In view of LaMuth's work (2003, 2005, 2007), perhaps humans and Superions would both program ethical thinking into droids. This may be motivated initially by defensive concerns to ensure droids' roles were controlled. In the process of developing such programming, however, perhaps humans and Superions would develop more hierarchically complex ethics, themselves.

Replicative Evolution

If contemporary humans took seriously the capabilities being developed to eventually create droids with cognitive intelligence and human interaction, what moral questions should be considered with this possible future in view? The only presently realistic speculation is that Homo Sapiens would lose in the inevitable competitions, if for no other reason that self replicating machines can respond almost immediately to selective pressures, while biological creatures require many generations before advantageous mutations can be effectively available. True competition between human and machine for basic survival is far in the future. Using the stratification argument presented in Implications of Hierarchical Complexity for Social Stratification, Economics, and Education, World Futures, 64: 444-451, 2008, higher-stage functioning always supersedes lower-stage functioning in the long run.

Efforts to build increasingly human-like machines exhibit a great deal of behavioral momentum and are not going to go away. Hierarchical stacked neural networks hold the greatest promise for emulating evolution and its increasing orders of hierarchical complexity described in the Model of Hierarchical Complexity. Such a straightforward mathematics-based method will enable machine learning in multiple domains of functioning that humans will put to valuable use. The uses such machines find for humans remains for now an open question.

Dean Mai 12/28/17 Dean Mai 12/28/17

Psychometric Intelligence, Coalition Formation and Domain-Specific Adaptation

The remarkable intricacy of human general intelligence has so far left psychologists being unable to agree on its common definition. Learning from past experiences and adapting behavior accordingly have been vital for an organism in order to prevent its distinction or endangerment in a dynamic competing environment. The more phenotypically intelligent an organism is the faster it can learn to apply behavioral changes in order to survive and the more prone it is to produce more surviving offspring.

The remarkable intricacy of human general intelligence has so far left psychologists being unable to agree on its common definition. The framework definition of general human intelligence, suitable for a discussion herein and as proposed by an artificial intelligence researcher David L. Poole, is that intelligence is wherein “an intelligent agent does what is appropriate for its circumstances and its goal, it is flexible to changing environments and changing goals, it learns from experience, and it makes appropriate choices given perceptual limitations and finite computation”. Learning from past experiences and adapting behavior accordingly have been vital for an organism in order to prevent its distinction or endangerment in a dynamic competing environment. The more phenotypically intelligent an organism is the faster it can learn to apply behavioral changes in order to survive and the more prone it is to produce more surviving offspring. This applies to humans as it does to all intelligent agents, or species.

Furthermore, throughout the history of life, humans have adapted even more effectively to different habitats in all kinds of environmental conditions when they formed collaborative groups, or adaptation coalitions. In evolutionary psychology, coalitions are perceived as groups of interdependent individuals (or organizations) that form alliances around stability and survivability in order to achieve common desired goals that the established community is willing to pursue. There is an unambiguous evolutionary basis for this phenomenon among intelligent agents. In a dynamic environment, no single individual acting alone can influence optimal outcomes of a specific problem nor accomplish as many tasks required for ensuring one’s survivability systematically, through multitudinous generations. As a result, increased intelligence has been functional in humans' ancestral past by tracking rapid rates of environmental change and accelerating one’s adaptation rate by initiating coalition formation in competing evolutionary strategies. Specifically, the term of ‘intelligence’ in terms of proactive collation formation is referred herein to ‘psychometric intelligence’, the human cognitive abilities’ level differences evaluated quantitatively on the basis of performance in cognitive ability tests.

Therefore, this essay accentuates the principal adaptive hypothesis that intelligent agents, namely humans, serve as catalysts to increase multidisciplinary collaboration in the form of a coalition as a domain specific adaptation to evolutionary novelty. Specifically, that humans that possess higher psychometric intelligence are statistically more prone to preemptively form a coalition as an adaptation measure to cope with dynamic events in the environment change.

But before this essay argues for the above claim, it will be necessary to explain what the evolutionary-psychological view of the psychometric intelligence and the domain-specific adaptation is.

Individual differences among humans in their cognitive abilities have been of long-lasting controversy. Studies have conceptualized intelligence as a single operational entity that can be identified, assessed and quantified via cognitive task-testing tools, wherein “a person’s score on a statistically determined set of questions”, or “Intelligence is what the intelligence test measures”. Various evolutionary-psychological theories of intelligence have suggested that physical reaction efficiency and data processing speed constitute a proper definition for intelligence, fundamentally structured around acquiring sensory input from the environment and then interpreting and organizing it by the brain. Consequently, a human brain has been referenced in its exemplary function to a computer, in that both are types of computing machines generating complex patterns of output, after dissemination of correlating complex patterns of input, and after querying to stored information. In what follows, this essay assumes that intelligence can be thus tested and quantified in computational terms. And while psychometric cognitive ability tests do not encapsulate all the capabilities of humans, from the evolutionary point of view, studies have shown that cognitive intelligence indicate genetic quality of a phenotype expressed in sexual and social selection levels. Moreover, the genetic factor of cognitive ability level differences in human intelligence is then evolutionary likely to be related to the individual’s ability to form a coalition, which generates adaptive behaviors. This theory is widely-agreed upon.

Furthermore, evolutionary psychologists adopt positions that look into intelligence as a domain-general structure, or a non-modular architecture, not designed to solve any specific problem from the human evolutionary past, versus domain-specific, or constantly heuristic, designed by natural selection for solving any computational problems by exploiting “…transient or novel local conditions to achieve adaptive outcomes”. These two approaches refer to intelligence as a myriad of special-purpose modules shaped by natural selection to function as a problem-solving apparatus, wherein the latter form is employed whenever an allocated special-purpose module does not exist to solve a particular problem that confronted our prehistoric predecessors.

Thus, the position that psychometric intelligence serves in coalition formation as a domain-specific adaption is adopted here. Our mind is not composed of "general-purposes" mechanisms, such as a general-purpose reasoning mechanism or a general-purpose learning mechanism, but instead consists of hundreds or thousands of highly specialized modules that provide us with flexible knowledge and flexible abilities in various sporadic domains. Most of these modules constitute a variant human nature and have not evolved during specific human development time in Pleistocene hunter-gatherer societies, applying universality over all human population. Coalition formation, similar to language development or free-rider detection, does not emerge from the combination of wide cognitive processes but rather from a domain specific adaptation, providing further support for the theory of this essay, that stems from cognitive ability influence on timing factors of coalition formation.

Unlocking the causal relationships between individual psychometric intelligence and initiating coalition formation could delineate multiple cognitive mechanisms, integrating evolutionary psychology with any other aspect of differential psychology in the vein of intrasexual, intersexual, intercultural or intergenerational competition.

This essay seeks to propose examination of correlation between cognitive ability scores (which for simplicity, uniformity and evidence availability uses well-known intelligence quotient (IQ) test scores, but in theory can include specifically designed assessment tests) and an individual’s initiation of coalition formation. Specifically, the correlation could be scrutinized via the adaption features of coalition formation as a part of an (expected) individual’s participation in warfare, or a warfare-like simulation.

Employing more modern forms of coalition formation (for example, trying to correlate IQ test scores of the original founders of private and public companies and their initiation ability to form a coalition) would have to ignore many important environmental factors, such as individual wealth and its origin, national technological and scientific progress of founders’ countries, and local business and capital policies – all of which can be unified under ‘environmental opportunity factors’ yet cannot be empirically isolated nor estimated in a coalition-driven dynamic.

In warfare, however, the existence of psychological adaptations for some aspects of coalitional formation and cooperation is evident: “Coalitional aggression evolved because it allowed participants in such coalitions to promote their fitness by gaining access to disputed reproduction enhancing resources that would otherwise be denied to them”. Here the hypothesis does not test whether humans possessing higher psychometric intelligence have evolved specific psychological adaptations for warfare, but rather try to identify whether humans possessing higher psychometric intelligence are more prone to initiate coalition formation as a domain-specific adaptation using warfare-like simulation as a trigger.

Since IQ-type tests are believed to remain chronologically constant through one’s life and due to the abundance of IQ correlational studies pointing at social performance factors (education, occupation, income, and imprisonment), it is assumed here that cognitive ability test in a form of IQ score is a viable predictor of human intelligence. And while, beyond any doubt, environmental factors are source of differences, holding environmental and genetic influences on psychometric intelligence differences constant, could allow to check the robustness effect of correlation between psychometric intelligence and coalition formation initiation.

To test this model, intergroup coalition formation can be methodically reviewed in the selection process of various Special Forces Assessment and Selection (SFAS) courses around the world. A SFAS course is usually a few days or weeks long and utilizes a more vigorous (than other military units) individual- and group-focused assessment process that is designed to select candidates who are capable of meeting physical and psychological requirements close to the operational combat environments and suitable for future service in the special forces units. The selection process is objective-, subjective- and performance-, behavior-based. As a part of conducted evaluation, candidates are subjected behaviorally to warfare scenarios wherein coalition formation is required. During this initiation phase, an adaptation test could be designed to produce observable and measurable data that can be later related to the individual’s psychometric intelligence.

Furthermore, depending on the country where the SFAS course is performed, additional environmental factors (aggregated life history indicators) can be controlled, including intrasexual, intersexual, intercultural or intergenerational comparison. Women in various armies across the world (US Navy's SEAL, UK Special Reconnaissance Regiment, Norway’s Jegertroppen, Israel’s Air Force and others) are permitted to apply to join, partake in SFAS courses and serve in those units. Candidates from different countries go through a dedicated SFAS course to join Groupe de Commandos Parachutistes (GCP), an elite unit that is a part of the French Foreign Legion, uniquely established for foreign recruits, willing to serve in the French Armed Forces. Lastly, various environmental factors in intergenerational differences can be tested across numerous special forces units (for example, while Israeli Special Forces perform SFAS courses strictly ahead of the candidate’s legal drafting age of 18, US Navy's SEAL and UK Special Reconnaissance Regiment comprise on average much older participants).

The test can be structured around at least one intrasexual source of evidence in a form of observable data provided by a course board who holistically identifies, assesses and selects one or more candidates that initiate coalition formation to solve a simulated problem during various combat exercises. Moreover, as described above, such data collection can be repeated and applied across different units, tuning necessary environmental factors such as sex, age and cultural differences. Collaborative multisite studies can be performed, in which multiple researchers cooperate to conduct the same study at multiple sites to increase sample size and data pool. Once the data is collected, it can be connected to a specific subject’s IQ score and test whether the stated theory is correct, namely whether correlation exists between higher psychometric intelligence and coalition formation initiation.

Additionally, as a second source of evidence, a cross-species analysis based on the construction of a designated cognitive ability test, could be employed to test whether other species in nature (for example chimpanzees or pigeons) have the ability to form preemptive collaborative alliances as a part of preparation to an intergroup or cross-group conflict. The theory in a cross-species analysis is expected to be consistent with that of humans, wherein individual differences in psychometric intelligence are correlated to the likeliness of initiating coalition formation hence constitute an adaptation measure to cope with the environment change.

Lastly, as a third source of evidence, changing a unifying goal of coalition formation (for example as in forming a new political party or a studying group rather than warfare) can provide further insights into the evolutionary-psychological tendencies as a domain-specific adaptation. However, the estimated correlation results are believed to be inconclusive in that case, as they would increasingly rely on numerous additional environmental differences and would be muted in the adaptation survivability-related effect.

The gathered evidence from these tests can point at new possibilities for better understanding the interrelationship mechanisms between cognitive abilities and coalitions and establish a stronger collaboration across various psychological disciplines in understanding human intelligence differences.