Retrieval-Augmented Generation (RAG) has emerged as a powerful pattern for building LLM-powered applications that can leverage private or domain-specific data. However, the performance of a RAG system is heavily dependent on the quality of the context retrieved, which in turn depends on the effectiveness of the embedding model. Simply using a generic, off-the-shelf embedding model can lead to suboptimal results when dealing with specialized domains.
This talk provides a practical, step-by-step guide for developers to take control of their RAG system's performance. We will start with an open-source embedding model and demonstrate how to set up a robust evaluation framework to understand its behavior on your own data. You will learn how to identify the model's limitations and then fine-tune it to better capture the nuances of your specific domain. We will cover the entire lifecycle, from preparing data for fine-tuning to evaluating the improved model, and showcase the tangible improvements in the end-to-end RAG system.
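As a taste of the kind of evaluation framework the talk describes, retrieval quality can be measured with simple metrics such as recall@k and mean reciprocal rank (MRR). The sketch below uses hypothetical toy data (the query and document ids are illustrative, not from the talk):

```python
def recall_at_k(relevant, ranked, k):
    """Fraction of queries whose relevant doc appears in the top-k results."""
    hits = sum(1 for q in relevant if relevant[q] in ranked[q][:k])
    return hits / len(relevant)

def mean_reciprocal_rank(relevant, ranked):
    """Average of 1/rank of the relevant doc per query (0 if not retrieved)."""
    total = 0.0
    for q, rel in relevant.items():
        results = ranked[q]
        total += 1.0 / (results.index(rel) + 1) if rel in results else 0.0
    return total / len(relevant)

# Toy example: gold relevant doc per query, and the doc ids the
# embedding model retrieved, best first (hypothetical data).
relevant = {"q1": "d3", "q2": "d7", "q3": "d1"}
ranked = {
    "q1": ["d3", "d5", "d2"],   # relevant doc at rank 1
    "q2": ["d4", "d7", "d9"],   # relevant doc at rank 2
    "q3": ["d8", "d6", "d2"],   # relevant doc missing from top-3
}

print(recall_at_k(relevant, ranked, 3))        # 2 of 3 queries hit -> 0.666...
print(mean_reciprocal_rank(relevant, ranked))  # (1 + 0.5 + 0) / 3 = 0.5
```

Running metrics like these over a held-out set of query–document pairs is what surfaces the generic model's blind spots before any fine-tuning begins.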
Join this session to learn how to move from a generic RAG implementation to a genius one, with a fine-tuned embedding model that delivers more relevant context and, ultimately, more accurate and useful responses from your generative model.
Key Takeaways:
- The performance of your RAG system is critically dependent on the quality of your embedding model; don't treat it as a black box.
- A systematic evaluation of your retrieval system is the first and most important step to understanding its limitations and identifying opportunities for improvement.
- Fine-tuning an open-source embedding model on your domain-specific data is a powerful and accessible technique for significantly improving the quality of retrieved context.
- The process of fine-tuning is not just about running a script; it requires a data-centric approach: preparing the right dataset to teach the model what matters in your domain.
- By moving from a generic to a fine-tuned embedding model, you can achieve a step-change in the performance of your RAG system, leading to more accurate and relevant responses from the generative model.
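Fine-tuning embedding models for retrieval commonly uses a contrastive objective with in-batch negatives (the idea behind losses such as sentence-transformers' MultipleNegativesRankingLoss). The pure-Python sketch below shows only the loss computation on hypothetical toy embeddings, not a full training loop:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def in_batch_negatives_loss(query_embs, doc_embs, scale=20.0):
    """Cross-entropy where query i's positive is doc i; every other
    doc in the batch serves as a negative."""
    loss = 0.0
    for i, q in enumerate(query_embs):
        logits = [scale * cosine(q, d) for d in doc_embs]
        log_z = math.log(sum(math.exp(x) for x in logits))
        loss += log_z - logits[i]  # -log softmax prob of the positive
    return loss / len(query_embs)

# Toy batch: query i is paired with doc i (hypothetical embeddings).
queries = [[1.0, 0.0], [0.0, 1.0]]
docs    = [[0.9, 0.1], [0.1, 0.9]]
print(in_batch_negatives_loss(queries, docs))  # near zero: pairs align
```

Minimizing this loss pulls each query toward its paired document and pushes it away from the rest of the batch, which is why the quality of the (query, relevant-document) pairs you prepare dominates the outcome.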
Speaker
Sahil Dua
Senior Software Engineer, Machine Learning @Google, Stanford AI, Co-Author of “The Kubernetes Workshop”, Open-Source Enthusiast
Sahil Dua is a Tech Lead focused on developing and adapting large language models (LLMs), with expertise in representation learning. He oversees the full LLM lifecycle, from designing data pipelines and model architectures to optimizing models for highly efficient serving. Before Google, Sahil worked on the ML platform at Booking.com to scale machine learning model development and deployment.
A co-author of “The Kubernetes Workshop” book and an open-source enthusiast, Sahil has contributed to projects like Git, Pandas, and Linguist. As a frequent speaker at global conferences, he shares insights on AI, machine learning, and tech innovation, inspiring professionals across the industry.