Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qcon.ai with any comments or concerns.
The presentation addresses the evolution of AI agents from stateless prompt-response tools to stateful, long-running systems, emphasizing the importance of context over computation.
Key Points:
- Context Engineering: A superset of prompt engineering; it covers everything supplied to the model to optimize its performance.
- Memory Management Layers:
  - Short-term memory: immediate summarization needs during conversations.
  - Long-term memory: durable storage solutions such as vector databases.
- State Management: Essential for managing the multi-step processes of AI applications, facilitating learning and adaptation through feedback loops.
- Stream-native Context Engineering: With technologies like Apache Kafka and Apache Flink, context can be treated as data in motion, enabling efficient memory orchestration and latency management (a minimal sketch follows this list).
- Challenges: Limits on how much context current models can hold, the need for dynamic context compression, and open latency and transparency issues.
- Architectural Approaches:
  - Separation of ephemeral context: agents retain only the context they need, reducing memory load.
  - Streaming agents: Flink's APIs are used for real-time data processing and state management.
  - Data streaming with Kafka: event-driven applications manage context efficiently, supporting real-time decision-making.
- Practical Application: Integrating AI agents into large-scale systems centers on stronger context management, yielding a more scalable and responsive AI infrastructure.
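As referenced from the Stream-native Context Engineering point above, here is a minimal sketch of treating context as data in motion: each agent's context snapshot is published to a Kafka topic that is assumed to be log-compacted, so the latest snapshot per agent key persists as durable knowledge while consumers see updates in real time. The topic name, key scheme, and payload are illustrative assumptions, not details from the talk.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ContextEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // "agent-context" is assumed to be created with cleanup.policy=compact,
        // so Kafka retains the latest context snapshot per agent key as the
        // durable-knowledge layer, while downstream consumers see each update
        // in motion as it is produced.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String agentId = "agent-42"; // key: one agent's memory stream
            String snapshot = "{\"summary\":\"user prefers concise answers\"}";
            producer.send(new ProducerRecord<>("agent-context", agentId, snapshot));
        } // close() flushes any pending sends
    }
}
```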
Overall, the presentation outlines a transformative approach to AI system development, focusing on creating context-aware systems that offer robust memory management and real-time data processing capabilities.
This is the end of the AI-generated content.
As AI agents evolve from stateless prompt-response tools into stateful, long-running systems, context - not just compute - becomes the true bottleneck. Yet most architectures today treat context retrieval as an afterthought, bolting vector stores onto LLMs and hoping for the best. The result: brittle pipelines, runaway costs, and hallucinations born from memory mismanagement.
In this talk, we’ll explore a new approach: stream-native context engineering, powered by Apache Kafka and Apache Flink. By treating context as data in motion - continuously enriched, windowed, compacted, and served with low latency - we can build memory layers that scale with our agents and evolve with their understanding. We’ll dive into how stream processing primitives (state backends, RocksDB tuning, checkpoint strategies) can be repurposed for AI memory orchestration, and how to design architectures that separate ephemeral context from durable knowledge.
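To make that repurposing concrete, here is a minimal, hedged sketch against the Flink 1.x DataStream API - not code from the talk, and names such as MemoryFn, "agent-memory", and the hard-coded agent key are illustrative assumptions. Per-agent conversational context lives in keyed state backed by RocksDB, and checkpoints make it recoverable after failure.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class AgentMemoryJob {

    // Per-agent ephemeral context: a rolling summary keyed by agent id.
    static class MemoryFn extends KeyedProcessFunction<String, String, String> {
        private transient ValueState<String> runningSummary;

        @Override
        public void open(Configuration parameters) {
            runningSummary = getRuntimeContext().getState(
                new ValueStateDescriptor<>("agent-memory", String.class));
        }

        @Override
        public void processElement(String utterance, Context ctx, Collector<String> out)
                throws Exception {
            String prior = runningSummary.value();
            // Placeholder compaction step; in practice this could call a
            // summarization model to compress older turns.
            String updated = (prior == null ? "" : prior + " | ") + utterance;
            runningSummary.update(updated);
            out.collect(updated);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // RocksDB keeps keyed state on local disk, so per-agent memory can
        // exceed the JVM heap; checkpoints make that state recoverable.
        env.setStateBackend(new EmbeddedRocksDBStateBackend());
        env.enableCheckpointing(30_000); // checkpoint every 30 seconds

        env.fromElements("hello", "I prefer short answers")
           .keyBy(s -> "agent-42") // in practice, key by a real agent/session id
           .process(new MemoryFn())
           .print();

        env.execute("agent-memory-sketch");
    }
}
```

The design point worth noting: because RocksDB spills keyed state to local disk, per-agent memory is not bounded by heap size, while the checkpoint interval bounds how much context is lost on failure - exactly the kind of stream-processing primitive the talk proposes reusing for AI memory orchestration.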
You’ll walk away with a practical blueprint for building context-aware AI systems - from ingestion to retrieval - and see why the next frontier of agentic intelligence won’t be decided in the model weights, but in the context pipeline that feeds them.
Speaker
Adi Polak
Director, Advocacy and Developer Experience Engineering @Confluent, Author of "Scaling Machine Learning with Spark" and "High Performance Spark 2nd Edition"
Adi is an experienced software engineer and people manager who has worked with data and machine learning for operations and analytics for over a decade. As a data practitioner, she developed machine learning algorithms to solve real-world problems, drawing on her expertise in large-scale distributed systems to build machine learning and data streaming pipelines. As a manager, she builds high-performing teams focused on trust, excellence, and ownership.
Adi has taught thousands of students how to scale machine learning systems and is the author of the successful books Scaling Machine Learning with Spark and High Performance Spark, 2nd Edition.