2 stories tagged with #post-training, in publish-time order across the WeSearch catalog. Tag pages update as new stories ingest.
ARXIV CS.AI
Complementing reinforcement learning with SFT through logit averaging in the post training of LLMs
We introduce a novel method that averages the logits of a frozen reference policy (e.g., SFT) and a trainable policy, and incorporate the method into Group Relative Policy Optimiza…
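As a rough illustration of the logit averaging the snippet describes, here is a minimal sketch, not the paper's implementation. The function names, the `weight` parameter, and the equal-weight default are assumptions made for this example; the truncated abstract does not say how the average is weighted or how it enters the training objective.

```python
import math

def average_logits(ref_logits, policy_logits, weight=0.5):
    """Blend two logit vectors over the same vocabulary.

    `weight` is the share given to the frozen reference policy; it is a
    hypothetical knob for this sketch (0.5 gives a plain average).
    """
    if len(ref_logits) != len(policy_logits):
        raise ValueError("logit vectors must have the same length")
    return [weight * r + (1.0 - weight) * p
            for r, p in zip(ref_logits, policy_logits)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token logits over a three-token vocabulary (made-up numbers).
ref = [2.0, 0.0, -1.0]   # frozen reference policy, e.g. an SFT model
pol = [0.0, 2.0, -1.0]   # trainable policy
mixed = average_logits(ref, pol)
probs = softmax(mixed)   # sampling distribution from the blended logits
```

In this toy case the two policies disagree on the top token, so the blended distribution gives the first two tokens equal probability. In an actual training loop only the trainable policy's logits would receive gradients, since the reference is described as frozen.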
X (FORMERLY TWITTER)
Distribution Fine Tuning (DFT): A post training step that fixes LLM writing
I fixed why LLMs write so poorly, and I have a demo to prove it. Announcing Distribution Fine Tuning (DFT): A post training step that fixes LLM writing. Model outputs fooled pangra…