QbitAI Spotlights TIGER Lab’s One-Shot CFT — 24× Faster AI Training to Top Accuracy, Backed by NetMind & other collaborators

Large Language Models (LLMs) already carry surprising reasoning skills, but tapping into them has been either expensive (reinforcement learning with verifiable rewards, RLVR) or fragile (large-data supervised fine-tuning, SFT). A new study led by the University of Waterloo’s TIGER Lab—with NetMind as an industry collaborator and our CEO Kai Zou as one of the co-authors—introduces One-Shot Critique Fine-Tuning (CFT) and shows there’s a third way that is both cheap and robust. Though still a preprint available only on arXiv, the paper has already been featured by QbitAI, a leading Chinese tech media outlet with over 3.5 million subscribers.

Why Critique Instead of Direct Answer?

CFT is still a form of supervised fine-tuning, but instead of asking the model to imitate a reference answer as standard SFT does, CFT trains the model to critique the quality of a candidate answer. This aligns with how humans learn: before mastering a problem, we often learn by evaluating and reflecting on existing attempts. Critiquing exposes the model to diverse reasoning paths, both correct and flawed, building a deeper understanding of logical patterns and pitfalls.
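The distinction is easiest to see in the shape of a single training example: both methods minimize the same next-token supervised loss, but the target text differs. A minimal sketch, with illustrative field names and prompt templates (not the paper's exact schema):

```python
def make_sft_example(problem: str, reference_answer: str) -> dict:
    # Standard SFT: the model imitates a reference solution.
    return {
        "prompt": f"Question: {problem}\nAnswer:",
        "target": reference_answer,
    }

def make_cft_example(problem: str, candidate_answer: str, critique: str) -> dict:
    # CFT: the model is trained to judge a candidate solution instead.
    return {
        "prompt": (
            f"Question: {problem}\n"
            f"Candidate solution: {candidate_answer}\n"
            "Critique the candidate solution:"
        ),
        "target": critique,
    }

ex = make_cft_example(
    "What is 2 + 2?",
    "2 + 2 = 5",
    "Incorrect: 2 + 2 equals 4, not 5.",
)
print(ex["target"])  # the supervision signal is the critique text
```

The prompt now contains an attempt to evaluate, and the supervision target is the critique rather than the answer itself.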

One Problem, Many Answers, Many Critiques: How One-Shot CFT Works

*Figure: Overview of the one-shot CFT dataset construction and the key difference between SFT and CFT training.*

The One-Shot CFT framework is refreshingly simple, yet conceptually powerful. Here's how it works:

  1. Choose a Single Seed Problem: For example, a challenging math or logical reasoning question.
  2. Generate Diverse Answers: In this research, we use various open-source models (e.g., Qwen2.5-Math-7B-Instruct, MiMo-7B-SFT, Phi-4-reasoning) to produce multiple distinct answers to the problem.
  3. Critique with Stronger Models: We feed these answers into powerful, larger models like Claude-3-7-Sonnet or GPT-4.1 to generate in-depth critiques, highlighting the strengths and weaknesses of each answer.
  4. Train a Target Model: Using these critiques as supervision, we train a target model (e.g., Qwen2.5-Math-1.5B, Llama-3.2-3B-Instruct). The model learns not by imitating answers, but by understanding why an answer is good or bad.

The biggest advantage of SFT-style training, including CFT, over RL is sample efficiency: our experiments show that One-Shot CFT can be completed in under 5 GPU-hours while still drastically boosting performance on several benchmarks.

We benchmarked One-Shot CFT against state-of-the-art methods across both math and logical reasoning domains, using benchmarks such as MATH-500, Olympiad, AMC24, and AMC23.

*Figure: Average accuracy (%) on different benchmarks for Qwen and Llama models, comparing base, SFT, RLVR, and CFT with only one training example.*

Low-Cost, High-Impact: A New Path for LLM Training

Compared to the massive computational demands of reinforcement learning, One-Shot CFT is a game-changer. Training can be completed on a single A100 GPU (depending on your base model size), with no need for complex reward models or specialized RL infrastructure. Moreover, the project is fully open-sourced, providing the training scripts, fine-tuned model weights, and datasets. This makes One-Shot CFT a practical and scalable option for individual researchers, small labs, and startups with limited resources who want to enhance the reasoning capabilities of large language models.

Learn More & Get Started

In a world where massive models are often synonymous with massive costs, our new algorithm One-Shot CFT proves there’s a third way: efficient, effective, and elegantly human-like!