Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qcon.ai with any comments or concerns.
The purpose of this talk was to demonstrate how OpenAI uses AI agents to enable teams to navigate and analyze vast repositories of data more effectively.
Key Highlights:
- Data Challenges: Managing and querying extensive data platforms can be complex and time-consuming.
- AI Deployment: OpenAI has developed an AI data analyst named Kepler to streamline data exploration and answer complex data queries.
- Implementation: Kepler utilizes MCP, RAG, and vector search technologies for efficient data handling.
Core Components:
- Kepler automates query generation and data interpretation, integrating with tools such as Slack and IDEs.
- Kepler is evaluated with techniques designed to reduce query errors and improve response accuracy.
- It adapts over time through memory and correction features, providing scalable solutions for diverse data tasks.
Systems and Processes:
- Kepler's services include search and task execution via Codex tasks to understand data origin, usage, and context.
- Key systems such as memory management and context retrieval help in making informed decisions based on historical data interactions.
- Evals are employed to ensure model accuracy and continuous improvement.
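The summary above mentions evals that catch query errors and verify accuracy. One common pattern for evaluating a text-to-SQL agent is to run each generated query against a small fixture database and compare its result set to a reference ("golden") query. The sketch below illustrates that pattern only; the table names, cases, and scoring are hypothetical and not drawn from OpenAI's actual eval setup.

```python
import sqlite3

def run(conn, sql):
    """Execute a query and return its rows in a canonical (sorted) order."""
    return sorted(conn.execute(sql).fetchall())

def eval_case(conn, candidate_sql, golden_sql):
    """A case passes if the candidate's result set matches the golden query's."""
    try:
        return run(conn, candidate_sql) == run(conn, golden_sql)
    except sqlite3.Error:
        return False  # a query that fails to execute counts as a failure

# Tiny in-memory fixture database (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

golden = "SELECT SUM(amount) FROM orders"
cases = {
    "correct": "SELECT SUM(amount) FROM orders",
    "wrong":   "SELECT COUNT(*) FROM orders",
    "broken":  "SELECT amount FROM missing_table",
}
scores = {name: eval_case(conn, sql, golden) for name, sql in cases.items()}
print(scores)
```

Scoring by result-set equivalence rather than string-matching the SQL lets the eval accept any query formulation that produces the right answer, while still flagging queries that error out.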
This approach shows how AI can significantly mitigate the challenges of handling large-scale data systems by automating and optimizing the data querying process, thus enhancing productivity and decision-making at OpenAI.
This is the end of the AI-generated content.
OpenAI's internal data platform spans tens of thousands of tables and hundreds of petabytes of data. It’s powerful, but navigating it without deep institutional knowledge is hard. In this talk, we’ll share how we built an AI agent that uses MCP, RAG, and vector search over platform metadata to intelligently explore this ecosystem: discovering relevant datasets, generating safe and correct queries, interpreting results, and delivering insights through natural language.
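To make the "vector search over platform metadata" idea concrete: a minimal retrieval step embeds each table's name and description, then ranks tables by similarity to the user's question. The sketch below uses a toy bag-of-words embedding and cosine similarity so it runs standalone; the `TableMeta` class, table names, and `embed` function are illustrative assumptions, not the talk's actual implementation (which would use learned embeddings over a much larger index).

```python
import math
from dataclasses import dataclass

@dataclass
class TableMeta:
    name: str
    description: str

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical slice of a data-lake metadata catalog.
tables = [
    TableMeta("fct_payments", "daily payment transactions by customer"),
    TableMeta("dim_customers", "customer accounts and signup metadata"),
    TableMeta("fct_api_usage", "api request counts per model per day"),
]

vocab = sorted({w for t in tables for w in t.description.lower().split()})
index = [(t, embed(t.description, vocab)) for t in tables]

def top_tables(question: str, k: int = 2) -> list[str]:
    """Rank catalog entries by similarity to the question; return top-k names."""
    q = embed(question, vocab)
    ranked = sorted(index, key=lambda p: cosine(q, p[1]), reverse=True)
    return [t.name for t, _ in ranked[:k]]

print(top_tables("how many api requests per model yesterday"))
```

In a production agent the retrieved table metadata would then be placed in the model's context so it can generate a query against the right datasets, rather than guessing table names across tens of thousands of candidates.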
We’ll walk through the core architecture that enables this, including an index of all the tables in our data lake that our agent understands and integration with existing platform tools. You’ll also hear what we’ve learned from real adoption across teams like Data Science, Go-to-Market, and Finance, who now rely on the agent for debugging and analysis.
Finally, we’ll show how these capabilities extend into other data products such as dashboards, where conversational intelligence and on-the-fly recommendations bring an entirely new level of interactivity to analytical workflows.
Attendees will walk away with practical patterns for building data-aware AI agents, deploying retrieval-augmented systems in complex data environments, and driving sustained adoption of AI-assisted analytics.
Speaker
Bonnie Xu
Software Engineer @OpenAI, Previously @Stripe
Bonnie Xu is a software engineer and the tech lead of the Data Productivity team at OpenAI, where she built an AI-powered data tool from the ground up to help teams explore and understand data more intelligently. Before joining OpenAI, she spent four years at Stripe working on Data Platform and previously held engineering roles at Meta and Google. Her work focuses on building scalable systems that bring AI and data together to make analysis faster and more accessible.