I specialize in identifying disruptive core technologies and strategic technology trends in early-stage startups, research universities, government-sponsored laboratories, and commercial companies.

In my current role, I lead the sourcing of strategic technology investment opportunities and manage Dyson's diligence and outreach processes, specifically in the U.S., Israel, and China.

I write here (sporadically) on the convergence of science and engineering, with broader interests in novel disruptive technologies, cognitive psychology, human-computer interaction (HCI), philosophy, linguistics, and artificial intelligence (AI).

Adopting Function-as-a-Service (FaaS) for AI workflows

Function-as-a-Service (FaaS) stands at the crossroads of cloud computing innovation and the evolving needs of modern application development. It isn’t just an incremental improvement over existing paradigms; it is an entirely new mode of thinking about computation, resources, and scale. In a world where technology continues to demand agility and abstraction, FaaS offers a lens to rethink how software operates in a fundamentally event-driven, modular, and reactive manner.

At its essence, FaaS enables developers to execute isolated, stateless functions without concern for the underlying infrastructure. The abstraction here is not superficial but structural. Traditional cloud models like Infrastructure-as-a-Service (IaaS) or even Platform-as-a-Service (PaaS) hinge on predefined notions of persistence: instances, containers, or platforms that remain idle, waiting for tasks. FaaS discards this legacy. Instead, computation occurs as a series of discrete events, each consuming resources only for the moment it executes. This operational principle aligns deeply with the physics of computation itself: using resources only when causally necessary.
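
To make this concrete, here is a minimal sketch of what such a function looks like, written as an AWS Lambda-style Python handler. The event shape and greeting logic are illustrative, not drawn from any particular production system:

```python
import json

def handler(event, context):
    """A stateless, event-driven function: it receives an event,
    computes a result, and returns it. No process or state
    persists between invocations."""
    # 'event' carries the trigger payload (an HTTP request body,
    # a queue message, a storage notification, etc.)
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```

The entire deployable unit is the function; provisioning, scaling, and teardown are the platform's problem, not the developer's.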

To fully grasp the implications of FaaS, consider its architecture. The foundational layer is virtualization, which isolates individual functions. Historically, the field has relied on techniques like hypervisors and containers to isolate workloads and allocate resources effectively. FaaS narrows this focus further. Lightweight microVMs and unikernels are emerging as dominant trends, optimized for rapid cold starts and reduced resource overhead. However, this comes at a cost: such architectures often sacrifice flexibility, requiring developers to operate within tightly controlled execution parameters.

Above this virtualization layer is the encapsulation layer, which transforms FaaS into something developers can tangibly work with. The challenge here is not merely technical but conceptual. Cold starts, the delays caused by initializing environments from scratch, represent a fundamental bottleneck. Various techniques, such as checkpointing, prewarming, and even speculative execution, seek to address this issue. Yet each of these solutions introduces trade-offs. Speculative prewarming may solve latency for a subset of tasks, but at the cost of wasted compute. This tension exemplifies the central trade-off of FaaS: every abstraction must be balanced against the inescapable physics of finite resources.
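
One common mitigation, sketched below, exploits the fact that a function instance's module scope survives across warm invocations: expensive initialization is paid once per cold start and amortized thereafter. The two-second load here is a stand-in for real model deserialization:

```python
import time

# Module scope runs once per cold start. Anything expensive placed
# here (loading a model, opening connections) is reused by every
# subsequent "warm" invocation of the same instance.
_START = time.time()
_MODEL = None  # lazily initialized on first use

def _load_model():
    # Placeholder for the expensive step: deserializing weights,
    # compiling kernels, warming caches, etc.
    time.sleep(2.0)  # simulate a slow cold start
    return {"loaded_at": time.time()}

def handler(event, context):
    global _MODEL
    if _MODEL is None:           # pay the cost once per instance
        _MODEL = _load_model()
    # Warm invocations skip the load entirely.
    return {"instance_age_s": round(time.time() - _START, 3)}
```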

The orchestration layer introduces complexity. Once a simple scheduling problem, orchestration in FaaS becomes a fluid, real-time process of managing unpredictable workloads. Tasks do not arrive sequentially but chaotically, each demanding isolated execution while being part of larger workflows. Systems like Kubernetes, originally built for containers, are evolving to handle this flux. In FaaS, orchestration must not only schedule tasks efficiently but also anticipate failure modes and latency spikes that could disrupt downstream systems. This is particularly critical for AI applications, where real-time responsiveness often defines the product’s value.
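
To illustrate the kind of coordination logic this implies, here is a hedged sketch of fan-out with bounded concurrency, timeouts, and retries. Note that invoke_function is a stand-in for a provider SDK call, not a real API:

```python
import asyncio
import random

async def invoke_function(task_id: str) -> str:
    """Stand-in for a provider SDK call that triggers one function."""
    await asyncio.sleep(random.uniform(0.05, 0.3))  # simulated latency
    if random.random() < 0.2:                        # simulated failure
        raise RuntimeError(f"invocation {task_id} failed")
    return f"result-{task_id}"

async def invoke_with_retry(task_id: str, retries: int = 3,
                            timeout: float = 1.0) -> str:
    for attempt in range(retries):
        try:
            return await asyncio.wait_for(invoke_function(task_id), timeout)
        except (RuntimeError, asyncio.TimeoutError):
            await asyncio.sleep(2 ** attempt * 0.1)  # exponential backoff
    raise RuntimeError(f"{task_id} exhausted retries")

async def fan_out(task_ids):
    # Bounded concurrency: avoid overwhelming downstream systems.
    sem = asyncio.Semaphore(10)
    async def run(tid):
        async with sem:
            return await invoke_with_retry(tid)
    return await asyncio.gather(*(run(t) for t in task_ids),
                                return_exceptions=True)

print(asyncio.run(fan_out([f"t{i}" for i in range(20)])))
```

The interesting design choice is that failure handling lives in the orchestration layer, not in the functions themselves, which stay oblivious to retries happening around them.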

The final piece of the puzzle is the coordination layer, where FaaS bridges with Backend-as-a-Service (BaaS) components. Here, stateless functions are augmented with stateful abstractions: databases, message queues, storage layers. This synthesis enables FaaS to transcend its stateless nature, allowing developers to compose complex workflows. However, this dependency on external systems introduces fragility. Latency and failure are not isolated to the function execution itself but ripple across the entire ecosystem. This creates a fascinating systems-level challenge: how to design architectures that are both modular and resilient under stress.
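
As a sketch of this pattern, the following handler externalizes session state to DynamoDB via boto3; the table name and key schema are illustrative assumptions:

```python
import boto3

# External, managed state: the function itself stays stateless,
# while a BaaS component (here, DynamoDB) holds session context.
table = boto3.resource("dynamodb").Table("user-sessions")

def handler(event, context):
    user_id = event["user_id"]
    # Read prior context written by earlier invocations.
    item = table.get_item(Key={"user_id": user_id}).get("Item", {})
    count = int(item.get("invocations", 0)) + 1
    # Persist updated state; a failure here ripples into the
    # workflow, which is exactly the fragility described above.
    table.put_item(Item={"user_id": user_id, "invocations": count})
    return {"user_id": user_id, "invocations": count}
```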

What makes FaaS particularly significant is its impact on enterprise AI development. The state of AI today demands systems that are elastic, cost-efficient, and capable of real-time decision-making. FaaS fits naturally into this paradigm. Training a machine learning model may remain the domain of large-scale, distributed clusters, but serving inferences is a different challenge altogether. With FaaS, inference pipelines can scale dynamically, handling sporadic spikes in demand without pre-provisioning costly infrastructure. This elasticity fundamentally changes the economics of deploying AI systems, particularly in industries where demand patterns are unpredictable.

Cost is another dimension where FaaS aligns with the economics of AI. The pay-as-you-go billing model eliminates the sunk cost of idle compute. Consider a fraud detection system in finance: the model is invoked only when a transaction occurs. Under traditional models, the infrastructure to handle such transactions would remain operational regardless of workload. FaaS eliminates this inefficiency, ensuring that resources are consumed strictly in proportion to demand. However, this efficiency can sometimes obscure the complexities of cost prediction. Variability in workload execution times or dependency latencies can lead to unexpected billing spikes, a challenge enterprises are still learning to navigate.
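
A back-of-the-envelope calculation makes the point. The prices below are illustrative, roughly in line with published serverless rates at the time of writing; check current pricing before relying on them:

```python
# Illustrative monthly cost comparison for the fraud-detection example.
GB_SECOND = 0.0000166667   # $ per GB-second of function time
PER_MILLION_REQS = 0.20    # $ per 1M invocations
ALWAYS_ON_INSTANCE = 70.0  # $ per month for a small dedicated VM

invocations = 1_000_000    # monthly fraud checks
duration_s = 0.2           # 200 ms per inference
memory_gb = 1.0

faas_cost = (invocations * duration_s * memory_gb * GB_SECOND
             + invocations / 1e6 * PER_MILLION_REQS)
print(f"FaaS:      ${faas_cost:.2f}/month")        # ~ $3.53
print(f"Always-on: ${ALWAYS_ON_INSTANCE:.2f}/month")
# At this volume FaaS is roughly 20x cheaper; at sustained high
# volume the comparison inverts, which is why cost prediction matters.
```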

Perhaps the most profound impact of FaaS on AI is its ability to reduce cognitive overhead for developers. By abstracting infrastructure management, FaaS enables teams to iterate on ideas without being burdened by operational concerns. This freedom is particularly valuable in AI, where rapid experimentation often leads to breakthroughs. Deploying a sentiment analysis model or an anomaly detection system no longer requires provisioning servers, configuring environments, or maintaining uptime. Instead, developers can focus purely on refining their models and algorithms.

But the story of FaaS is not without challenges. The reliance on statelessness, while simplifying scaling, introduces new complexities in state management. AI applications often require shared state, whether in the form of session data, user context, or intermediate results. Externalizing this state to distributed storage or databases adds latency and fragility. While innovations in distributed caching and event-driven state reconciliation offer partial solutions, they remain imperfect. The dream of a truly stateful FaaS model, one that maintains the benefits of statelessness while enabling efficient state sharing, remains an open research frontier.
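
A typical workaround, sketched below with the redis-py client, pushes shared state into a distributed cache; the host and key layout are illustrative assumptions:

```python
import redis

# Shared state externalized to a distributed cache. Every network
# hop adds latency and a failure mode that in-process state would
# not have.
r = redis.Redis(host="cache.internal", port=6379, decode_responses=True)

def handler(event, context):
    session_key = f"session:{event['user_id']}"
    # Rebuild conversational context from prior invocations.
    context_blob = r.get(session_key) or ""
    context_blob += event["utterance"] + "\n"
    r.set(session_key, context_blob, ex=3600)  # expire after 1 hour
    return {"context_length": len(context_blob)}
```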

Cold start latency is another unsolved problem. AI systems that rely on real-time inference cannot tolerate delays introduced by environment initialization. For example, a voice assistant processing user queries needs to respond instantly; any delay breaks the illusion of interactivity. Techniques like prewarming instances or relying on lightweight runtime environments mitigate this issue but cannot eliminate it entirely. The physics of computation imposes hard limits on how quickly environments can be instantiated, particularly when security isolation is required.
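
A common mitigation pairs module-scope initialization with scheduled "keep-warm" pings, as in this sketch (the warmup event shape and load time are illustrative):

```python
import time

_MODEL = None  # expensive resource, loaded once per instance

def _load_model():
    time.sleep(1.5)  # stand-in for slow environment/model init
    return object()

def handler(event, context):
    global _MODEL
    # A cron-style trigger can send periodic {"warmup": true} events
    # so instances stay resident with the model already loaded. This
    # narrows the cold-start window but cannot close it: scale-out
    # beyond the warmed pool still instantiates fresh environments.
    if _MODEL is None:
        _MODEL = _load_model()
    if event.get("warmup"):
        return {"warmed": True}   # no user-facing work performed
    return {"status": "served with warm model"}
```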

Vendor lock-in is a systemic issue that pervades FaaS adoption: each cloud provider currently builds proprietary abstractions, tying developers to specific APIs, runtimes, and pricing models. While open-source projects like Knative and OpenFaaS aim to create portable alternatives, they struggle to match the integration depth and ecosystem maturity of their commercial counterparts. This tension between portability and convenience is a manifestation of the broader dynamics in cloud computing.
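
One defensive pattern is to keep core logic free of provider types and confine provider-specific glue to thin adapters, as in this sketch (event shapes simplified):

```python
def score_transaction(amount: float, merchant: str) -> dict:
    """Pure core logic: no provider types, trivially portable."""
    risky = amount > 10_000 or merchant == "unknown"
    return {"risk": "high" if risky else "low"}

def aws_handler(event, context):
    # Adapter for a Lambda-style event payload.
    body = event.get("body", {})
    return score_transaction(body["amount"], body["merchant"])

def gcf_handler(request):
    # Adapter for a Cloud Functions-style HTTP request object.
    body = request.get_json()
    return score_transaction(body["amount"], body["merchant"])
```

Porting then means rewriting a few lines of adapter code rather than the application itself, though the deeper dependencies (queues, databases, IAM) remain harder to abstract away.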

Looking ahead, I believe the future of FaaS will be defined by its integration with edge computing. As computation migrates closer to the source of data generation, the principles of FaaS (modularity, event-driven execution, ephemeral state) become increasingly relevant. AI models deployed on edge devices, from autonomous vehicles to smart cameras, will rely on FaaS-like paradigms to manage local inference tasks. This shift will not only redefine the boundaries of FaaS but also force the development of new orchestration and coordination mechanisms capable of operating in highly distributed environments.

In reflecting on FaaS, one cannot ignore its broader, almost philosophical implications. At its heart, FaaS is an argument about the nature of computation: that it is not a continuous resource to be managed but a series of discrete events to be orchestrated. This shift reframes the role of software itself, not as a persistent entity but as a dynamic, ephemeral phenomenon.

Specialization and Modularity in AI Architecture with Multi-Agent Systems

Architectural Paradigms for Scalable Unstructured Data Processing in Enterprise