LLM Training
Browse 19 LLM Training modes for AI coding agents — production-grounded, cited, installable. Part of the VIBE library.
axolotl-expert-mode
Axolotl — YAML-driven LLM fine-tuning with LoRA/QLoRA, DPO/GRPO, DeepSpeed, FSDP
distillation-expert-mode
Teacher-student LLM distillation — logits, on-policy distillation, context distillation
dora-expert-mode
Weight-Decomposed Low-Rank Adaptation — magnitude + direction split for better LoRA quality
dpo-expert-mode
Direct Preference Optimization — preference alignment without an explicit reward model
fine-tune-eval-expert-mode
Evaluate fine-tuned LLMs — domain benchmarks, regression checks, catastrophic forgetting detection
grpo-expert-mode
Group Relative Policy Optimization — DeepSeek-R1 style reasoning RL with verifiable rewards
kto-expert-mode
Kahneman-Tversky Optimization — preference alignment from binary feedback instead of paired comparisons
lora-expert-mode
Low-Rank Adaptation for parameter-efficient fine-tuning of LLMs
merge-experts-mode
MergeKit recipes — SLERP, TIES, DARE, model soups, task arithmetic, MoE merging
orpo-expert-mode
Odds-Ratio Preference Optimization — single-stage SFT + preference alignment without a reference model
peft-expert-mode
Hugging Face PEFT library survey — LoRA, IA3, prompt tuning, prefix tuning, AdaLoRA, OFT/BOFT, VeRA
qlora-expert-mode
4-bit quantized LoRA fine-tuning with NF4, double quantization, and paged optimizers
rlaif-expert-mode
RL from AI Feedback — principle-driven critique, AI judges, scaling preference labeling without humans
rlhf-expert-mode
Reward-model + PPO RLHF pipeline — when it still beats DPO and how to run it correctly
sft-expert-mode
Supervised fine-tuning fundamentals — chat templates, packing, completion-only loss, NEFTune
simpo-expert-mode
Simple Preference Optimization — reference-free, length-normalized preference alignment
synthetic-data-expert-mode
Generate fine-tuning datasets — distilabel, Magpie, Self-Instruct, Evol-Instruct, augmentoolkit
trl-expert-mode
Hugging Face TRL — SFTTrainer, DPOTrainer, PPOTrainer, GRPOTrainer, RewardTrainer
unsloth-expert-mode
Unsloth — 2x faster LLM fine-tuning with 70% less VRAM via fused Triton kernels