There is just so much happening in AI right now! Applications that used to take years to develop are now being built over weekend hackathons. It's all a testament to the power of foundation models (we have not found their outer limits yet) and rapid innovation at the infrastructure layer that puts that power in the hands of more developers. While this progress opens up exciting possibilities, it also poses challenges for developers, who may find themselves overwhelmed by the myriad options available. Fortunately, as the ecosystem matures, we are beginning to see the components of a new generative AI stack take shape.

Application Frameworks

Application frameworks have emerged to quickly absorb the storm of new innovations and rationalize them into a coherent programming model. They simplify the development process and allow developers to iterate quickly. Several frameworks have emerged, each building its own interchangeable and complementary ecosystem of tools. LangChain has become the developer community's open-source focal point for building with foundation models. Fixie is building an enterprise-grade platform for creating, deploying, and managing AI agents. Cloud providers are also building application frameworks, such as Microsoft's Semantic Kernel and Google Cloud's Vertex AI platform. Developers are using these frameworks to create applications that generate new content, semantic systems that let users search content using natural language, and agents that perform tasks autonomously. These applications are already fundamentally changing the way we create, the way we synthesize information, and the way we work. The tooling ecosystem makes it possible for application developers to bring their visions to life by leveraging their domain expertise and understanding of their customers, without necessarily needing the technical depth required at the infrastructure level.
Today's ecosystem can be broken into four parts: Models, Data, Evaluation Platform, and Deployment.

Models

Let's start with the foundation model (FM) itself. FMs are capable of human-like reasoning; they are the "brain" behind it all. Developers have several FMs to choose from, varying in output quality, modalities, context window size, cost, and latency. An optimal design often requires developers to combine multiple FMs in their application. Developers can select proprietary FMs created by vendors like OpenAI, Anthropic, or Cohere, host one of a growing number of open-source FMs, or train their own model.
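To make the "combination of multiple FMs" concrete, here is a minimal routing sketch. The model names, quality scores, costs, and latencies below are hypothetical placeholders, not vendor quotes; the point is only that an application can pick the cheapest model that is good enough for each task.

```python
from dataclasses import dataclass

# Hypothetical model catalog; all numbers are illustrative assumptions.
@dataclass
class Model:
    name: str
    quality: int        # relative output quality, higher is better
    cost_per_1k: float  # dollars per 1K tokens
    latency_ms: int     # typical time to first token

CATALOG = [
    Model("large-proprietary", quality=10, cost_per_1k=0.06,   latency_ms=900),
    Model("mid-proprietary",   quality=7,  cost_per_1k=0.002,  latency_ms=300),
    Model("small-open-source", quality=5,  cost_per_1k=0.0005, latency_ms=120),
]

def route(task_complexity: int, max_cost_per_1k: float) -> Model:
    """Pick the cheapest model whose quality covers the task and fits the budget."""
    candidates = [
        m for m in CATALOG
        if m.quality >= task_complexity and m.cost_per_1k <= max_cost_per_1k
    ]
    if not candidates:
        # Nothing satisfies both constraints: fall back to the highest-quality model.
        return max(CATALOG, key=lambda m: m.quality)
    return min(candidates, key=lambda m: m.cost_per_1k)

print(route(task_complexity=6, max_cost_per_1k=0.01).name)   # mid-proprietary
print(route(task_complexity=9, max_cost_per_1k=0.001).name)  # large-proprietary (fallback)
```

In a real application the routing decision would feed into actual API calls to the chosen provider; the same cost/quality/latency trade-off drives those choices.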
Data

LLMs are a powerful technology, but they are limited to reasoning about the facts on which they were trained. That's constraining for developers looking to make decisions on the data that matters to them. Fortunately, there are mechanisms developers can use to connect and operationalize their data:
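One widely used mechanism is retrieval: fetch the most relevant private documents for a query and place them in the prompt. The sketch below uses toy bag-of-words similarity and an invented three-document corpus purely for illustration; a production system would use an embedding model and a vector database instead.

```python
import math
from collections import Counter

# Toy corpus standing in for a developer's private data (illustrative only).
DOCS = [
    "Q3 revenue grew 12 percent driven by enterprise subscriptions",
    "The support backlog fell after the new triage workflow shipped",
    "Hiring plan targets four ML engineers by end of year",
]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Stuff retrieved context into the prompt so the FM can reason over it.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How did revenue do in Q3?"))
```

The resulting prompt would then be sent to an FM, which answers grounded in the retrieved data rather than only its training facts.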
Evaluation Platform

LLM developers face a tradeoff between model performance, inference cost, and latency. Developers can improve performance across all three vectors by iterating on prompts, fine-tuning the model, or switching between model providers. However, measuring performance is complex due to the probabilistic nature of LLMs and the non-determinism of tasks. Fortunately, there are several evaluation tools that help developers determine the best prompts, provide offline and online experimentation tracking, and monitor model performance in production:
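The core loop behind these tools is simple to sketch: run each candidate prompt over a fixed set of test cases, score the outputs, and compare. In the example below, `call_model` is a deterministic stub standing in for a real FM API, and the exact-match scoring rule and test cases are deliberately simplistic placeholders.

```python
# Offline prompt-evaluation sketch; everything here is illustrative.
CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

PROMPTS = {
    "terse":    "Answer briefly: {input}",
    "stepwise": "Think step by step, then answer: {input}",
}

def call_model(prompt: str) -> str:
    # Stub for an FM call. We fake a model that only gets the arithmetic
    # question right when the prompt asks it to reason step by step.
    if "2 + 2" in prompt:
        return "4" if "step by step" in prompt else "five"
    if "capital of France" in prompt:
        return "Paris"
    return ""

def score(prompt_template: str) -> float:
    # Fraction of test cases where the model output matches exactly.
    hits = sum(
        call_model(prompt_template.format(input=c["input"])) == c["expected"]
        for c in CASES
    )
    return hits / len(CASES)

results = {name: score(tpl) for name, tpl in PROMPTS.items()}
print(results)                              # {'terse': 0.5, 'stepwise': 1.0}
print(max(results, key=results.get))        # stepwise
```

Real evaluation platforms add what the stub omits: non-deterministic outputs averaged over repeated runs, semantic rather than exact-match scoring, experiment tracking, and production monitoring.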
Deployment

Finally, developers want to deploy their applications into production. Developers can self-host LLM applications and deploy them using popular frameworks like Gradio, or use third-party services: Fixie, for example, can be used to build, share, and deploy Agents in production.

The Future is still being built

Last month we hosted an AI meetup at the OctoML headquarters in Seattle, and like the meetups in other cities, we were overwhelmed by the number of attendees and the quality of their demos. We are astonished at how quickly the science and tooling ecosystems are developing and are excited by the new possibilities that will be unlocked. Some areas we're particularly excited about are no-code interfaces that bring the power of foundation models to more builders, the latest advancements in security for LLMs, better mechanisms to control and monitor the quality of model outputs, and new ways to distill models to make them cheaper to run in production. For developers building in or learning more about the rapidly evolving ecosystem, please reach out at palak@.