Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qcon.ai with any comments or concerns.
In the presentation titled Fine Tuning the Enterprise: Reinforcement Learning in Practice, Wenjie Zi and Will Hang discuss the concept and application of Reinforcement Fine Tuning (RFT) for improving the performance of AI models in various business contexts.
Key Points Covered:
- Introduction of Agent RFT: The presentation opens with an explanation of agent reinforcement learning and its importance in improving task-completion efficiency, letting AI models interact with the outside world and learn from those interactions.
- Benefits of Agent RFT: Agent RFT is highlighted for enhancing model performance, particularly in reasoning, tool-usage efficiency, and sample efficiency in data-scarce domains.
- Challenges Addressed: Issues such as long tool-call latency and the need for better model reasoning are addressed through RFT, resulting in more predictable and efficient model behavior.
- Success Stories: Examples from different domains, such as improving billing-code accuracy and presentation generation, demonstrate the quantitative and qualitative improvements achievable with Agent RFT.
- Implementation Aspects: The talk covers practical details of setting up RFT tasks, including task configuration, choosing appropriate reward signals, and addressing domain-specific challenges through fine-tuning.
- Recommendations and Future Directions: Suggestions include curating high-quality training datasets, establishing baselines, optimizing tasks, and designing reward signals carefully to prevent reward hacking.
The session sheds light on the potential of RFT to produce AI models capable of learning more effectively from their own experiences, emphasizing its role in evolving AI capabilities to meet complex enterprise needs.
This is the end of the AI-generated content.
In this talk, we’ll discuss the Reinforcement Fine Tuning (RFT) platform, which lets you train OpenAI models to reason better on your specific tasks and arrive at better answers. We’ll go over why RFT matters in the new era of agents, how our customers have succeeded with RFT in their verticals, and how you can achieve success too. We’ll also cover the more theoretical aspects of reinforcement learning and why OpenAI models work uniquely well with RFT, as well as the more practical aspects of how to define effective evals, environments, and graders so that you can deploy better-performing models with confidence.
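For readers who want a concrete sense of what "defining a grader" looks like, below is a minimal sketch of launching an RFT job with the OpenAI Python SDK, using a simple string-check grader as the reward signal. The model snapshot, file IDs, and the item field name (reference_answer) are illustrative placeholders, not values from the talk, and the exact schema may evolve; consult the current OpenAI fine-tuning documentation before relying on it.

```python
# Minimal sketch of a Reinforcement Fine Tuning (RFT) job via the OpenAI
# Python SDK. File IDs, the model snapshot, and dataset field names are
# illustrative placeholders.
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",     # an RFT-capable reasoning model snapshot
    training_file="file-TRAIN_ID",  # JSONL of prompts plus reference answers
    validation_file="file-VALID_ID",
    method={
        "type": "reinforcement",
        "reinforcement": {
            # The grader is the reward signal: here, an exact string match
            # between the model's output and each item's reference answer.
            "grader": {
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.reference_answer}}",
                "operation": "eq",
            },
            "hyperparameters": {"reasoning_effort": "medium"},
        },
    },
)
print(job.id, job.status)
```

Richer graders (for example, model-based or Python graders) slot into the same grader field; the rest of the job definition is unchanged.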
Speaker
Wenjie Zi
Member of Technical Staff @OpenAI, Community Leader Building TAPNET, Previously @Grammarly, 10+ Years of Industrial Experience in Artificial Intelligence Applications
Wenjie Zi holds a Master’s degree in Computer Science from the University of Toronto, specializing in Natural Language Processing (NLP). Currently, she serves as a Member of Technical Staff at OpenAI, bringing over ten years of industrial experience in artificial intelligence applications. Wenjie has successfully implemented and deployed various projects, including Retrieval Augmented Generation (RAG), recommendation systems, semantic parsing (natural language to SQL), and quantitative trading.
Her research has been published in leading conferences and workshops such as ACL, NeurIPS, and KDD. Additionally, Wenjie is a course instructor for the certificate program at the University of Toronto, where she teaches deep learning-related subjects.
In her spare time, Wenjie actively participates in the Canadian AI community, serving as a committee member of MLOps World and as the lead of the Women in AI Canada Sponsorship team. She co-founded the Toronto AI Practitioners Network (TAPNET) in early 2024 and has organized multiple meetups with over 100 participants, aiming to connect technology practitioners across North America.
Speaker
Will Hang
Member of Technical Staff @OpenAI, Former Founder, Previously @DeepMind, @Google Brain, @Snorkel AI
Will is a Member of Technical Staff at OpenAI and tech lead on efforts such as Agent Reinforcement Fine Tuning and Vision Fine Tuning. Prior to OpenAI, Will led interactive ML infrastructure at Snorkel AI, spoke at venues like the Ray Summit, and co-founded include.ai, a Sequoia-backed company. Previously, Will spent time at DeepMind and Google Brain, where he was a co-author on AlphaChip, with work appearing in venues like Nature and NeurIPS. Will studied computer science at Stanford.