SFT

Supervised fine-tuning methods, data recipes, token weighting, and reasoning generalization studies.

3 papers | 5 resource links | Latest month: 2026.05

SFT Methods

2026.05 SFT Methods

Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning

This paper systematically studies difficulty-based data selection for supervised fine-tuning and shows that no single difficulty level is universally optimal. It explains the data-size-dependent optimum through a tradeoff between in-distribution generalization and extrapolation, with the best difficulty shifting toward harder examples as the data budget grows.

Paper
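
As a rough illustration of the difficulty-based selection summarized above, the sketch below picks a fixed budget of examples whose difficulty score lies closest to a chosen target. The difficulty score, the `select_by_difficulty` helper, and the toy pool are all hypothetical; the summary does not say how the paper scores difficulty or selects data, so this only shows the general shape of such a selection step.

```python
from typing import List, Tuple

# (example id, difficulty score in [0, 1]); how difficulty is scored is an
# assumption here, e.g. one minus a base model's pass rate.
Example = Tuple[str, float]


def select_by_difficulty(pool: List[Example], target: float, budget: int) -> List[Example]:
    """Return the `budget` examples whose difficulty is closest to `target`."""
    ranked = sorted(pool, key=lambda ex: abs(ex[1] - target))
    return ranked[:budget]


# Toy pool with hypothetical difficulty scores.
pool = [("a", 0.1), ("b", 0.3), ("c", 0.5), ("d", 0.7), ("e", 0.9)]

# Sweeping the target difficulty at a fixed budget yields different subsets,
# which could then be compared by fine-tuning on each one.
easy_subset = select_by_difficulty(pool, target=0.2, budget=2)
hard_subset = select_by_difficulty(pool, target=0.8, budget=2)
```

Repeating such a sweep at several budgets is one way to check whether the best-performing target moves toward harder examples as the budget grows, which is the trend the summary reports.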

2026.04 SFT Methods

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

This paper challenges the common claim that supervised fine-tuning (SFT) only memorizes while RL generalizes. It finds that cross-domain generalization from reasoning SFT with long chain-of-thought supervision depends jointly on optimization dynamics, training data, and base model capability.

Paper | Code | Hugging Face

2026.01 SFT Methods

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

This paper presents ProFit, a supervised fine-tuning method that mitigates single-reference overfitting by using token probability as a proxy for semantic importance and masking low-probability tokens. The approach focuses learning on core logical content and improves reasoning and math performance over standard SFT baselines.

Paper
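
To make the masking idea in the ProFit summary concrete, here is a minimal sketch of a token-level loss that skips low-probability reference tokens. The fixed `threshold`, the helper names, and the toy probabilities are assumptions for illustration; the summary does not give ProFit's actual selection rule, so this should not be read as the paper's implementation.

```python
import math
from typing import List, Sequence


def probability_mask(token_probs: Sequence[float], threshold: float = 0.1) -> List[int]:
    """Return 1 for tokens kept in the loss and 0 for masked low-probability tokens.

    `threshold` is a hypothetical fixed cutoff chosen for this sketch.
    """
    return [1 if p >= threshold else 0 for p in token_probs]


def masked_nll(token_probs: Sequence[float], mask: Sequence[int]) -> float:
    """Mean negative log-likelihood computed over the kept tokens only."""
    kept = [-math.log(p) for p, m in zip(token_probs, mask) if m]
    if not kept:
        return 0.0  # nothing selected, so no loss contribution
    return sum(kept) / len(kept)


# Toy per-token probabilities the model assigns to the reference tokens.
probs = [0.9, 0.05, 0.6, 0.02, 0.8]
mask = probability_mask(probs, threshold=0.1)
loss = masked_nll(probs, mask)
```

In a real training loop the same mask would multiply the per-token cross-entropy before averaging, so masked tokens contribute no gradient.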