In the quiet early hours of December 7, Beijing time, OpenAI held the second event in its ambitious 12-day series of live broadcasts. This session introduced “reinforcement fine-tuning” (RFT), a next-generation model refinement technique that promises to reshape the landscape of artificial intelligence customization.
Officially slated for release in 2025, RFT aims to give AI models domain-specific expertise, unlocking new possibilities for academia, industry, and beyond.
What is reinforcement fine-tuning?
RFT refines pretrained general-purpose models by further training them on targeted, specialized datasets. Think of it as taking a seasoned generalist and coaching them to master a specific craft.
The method can be highly data-efficient, delivering expert-level capabilities from only a small amount of additional data, sometimes just a few dozen samples. The process relies on iterative cycles of reasoning, validation, and testing to boost performance in specific domains such as law and medicine.
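To make that cycle concrete, here is a minimal, deliberately toy sketch of a grade-and-reinforce loop in Python. Nothing here reflects OpenAI's actual implementation, which has not been published; the prompt, the grader, and the bandit-style weight update are all illustrative assumptions.

```python
import random

# Toy illustration of the iterative RFT cycle described above: sample an
# answer, grade it against an expert reference, and reinforce whatever
# scored well. All names and data are hypothetical; OpenAI has not
# published the internals of its RFT training loop.

# A tiny "policy": weighted preferences over candidate answers per prompt.
policy = {
    "Which gene variant causes disease X?": {
        "GENE_A": 1.0,  # weights start uniform
        "GENE_B": 1.0,
        "GENE_C": 1.0,
    }
}

# The small specialized dataset: expert-labeled reference answers.
references = {"Which gene variant causes disease X?": "GENE_B"}

def grade(answer: str, reference: str) -> float:
    """Grader: maps an answer to a score in [0, 1]. Real graders can
    award partial credit for partially correct answers."""
    return 1.0 if answer == reference else 0.0

def sample(weights: dict) -> str:
    """Sample one candidate answer in proportion to its current weight."""
    answers, w = zip(*weights.items())
    return random.choices(answers, weights=w, k=1)[0]

LEARNING_RATE = 0.5
for step in range(50):  # the iterative train-and-test cycle
    for prompt, reference in references.items():
        answer = sample(policy[prompt])
        reward = grade(answer, reference)
        # Reinforce: upweight the sampled answer in proportion to reward.
        policy[prompt][answer] *= 1.0 + LEARNING_RATE * reward

print(policy)  # GENE_B's weight should dominate after training
```

The point of the toy is the shape of the loop: a small set of expert-labeled examples, a grader that turns each attempt into a score, and an update that makes high-scoring behavior more likely on the next pass.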
Notably, OpenAI is already collaborating with Thomson Reuters, applying RFT to build a model specialized for the company’s legal work.
“Reinforcement fine-tuning makes it easier to develop expert-level large models in specialized fields,” said the founder of an AI applications company in an interview with 36Kr. “This isn’t something that everyday users would typically need, but for professionals, it’s a game changer.”
The statement echoes OpenAI CEO Sam Altman’s sentiment on social media, where he called RFT one of his “biggest surprises of 2024” and expressed enthusiasm about its future applications.
A highlight of the live broadcast was OpenAI’s demonstration of a real-world application in rare genetic disease research. Collaborating with researchers from Lawrence Berkeley National Laboratory and Germany’s Charité university hospital in Berlin, OpenAI trained its o1-mini model using RFT to identify the genetic causes of rare diseases. The fine-tuned o1-mini outperformed its larger counterpart, the full o1 model, highlighting RFT’s potential to address complex challenges with precision and speed.
Unlike conventional fine-tuning methods, which often prioritize “memorization,” RFT focuses on enhancing reasoning and problem-solving in specific domains. This fundamental shift allows models to achieve superior performance, even when trained on limited datasets. By combining training and testing phases in iterative cycles, RFT drives the model toward greater accuracy and adaptability.
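The difference between the two training signals can be illustrated with a small, hypothetical comparison. The snippet below contrasts a token-overlap stand-in for supervised imitation with an answer-level grader of the kind RFT is described as using; the sample texts and both scoring functions are invented for illustration.

```python
# Hypothetical contrast between the two training signals. Supervised
# fine-tuning rewards reproducing the reference text token by token;
# an RFT-style grader scores only whether the conclusion is correct,
# leaving the model free to phrase its reasoning however it likes.
# The sample texts and both scoring functions are invented.

reference = "The splice-site variant in GENE_B causes disease X."
model_output = (
    "Pedigree analysis suggests an autosomal recessive pattern; "
    "the variant in GENE_B that disrupts splicing is the likely "
    "cause of disease X."
)

def sft_signal(output: str, reference: str) -> float:
    """Token-overlap stand-in for an imitation loss: any wording that
    departs from the reference lowers the score, even if correct."""
    out = set(output.lower().split())
    ref = set(reference.lower().split())
    return len(out & ref) / len(ref)

def rft_grade(output: str, correct_answer: str) -> float:
    """Answer-level grader: full credit if the conclusion is right,
    regardless of how the reasoning was phrased."""
    return 1.0 if correct_answer.lower() in output.lower() else 0.0

print(f"Imitation signal: {sft_signal(model_output, reference):.2f}")
print(f"Grader score:     {rft_grade(model_output, 'GENE_B'):.2f}")
```

Under an imitation signal, the correct but differently worded output above is penalized; under the answer-level grader it earns full credit, which is what lets RFT reward sound reasoning rather than memorized phrasing.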
While RFT is still in its research preview stage, OpenAI has extended an open invitation to universities, research institutions, and businesses to join its RFT research program. The company is also seeking collaborative partners willing to contribute datasets to refine and expand the method’s capabilities ahead of its full release in 2025.
KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Wang Fangyu for 36Kr.