LLM Training

Browse 19 LLM Training modes for AI coding agents — production-grounded, cited, installable. Part of the VIBE library.

axolotl-expert-mode

Axolotl — YAML-driven LLM fine-tuning with LoRA/QLoRA, DPO/GRPO, DeepSpeed, FSDP

distillation-expert-mode

Teacher-student LLM distillation — logits, on-policy distillation, context distillation

dora-expert-mode

Weight-Decomposed Low-Rank Adaptation — magnitude + direction split for better LoRA quality

dpo-expert-mode

Direct Preference Optimization — preference alignment without an explicit reward model

fine-tune-eval-expert-mode

Evaluate fine-tuned LLMs — domain benchmarks, regression checks, catastrophic forgetting detection

grpo-expert-mode

Group Relative Policy Optimization — DeepSeek-R1 style reasoning RL with verifiable rewards

kto-expert-mode

Kahneman-Tversky Optimization — preference alignment from binary feedback instead of paired comparisons

lora-expert-mode

Low-Rank Adaptation for parameter-efficient fine-tuning of LLMs

merge-experts-mode

MergeKit recipes — SLERP, TIES, DARE, model soups, task arithmetic, MoE merging

orpo-expert-mode

Odds-Ratio Preference Optimization — single-stage SFT + preference alignment without a reference model

peft-expert-mode

Hugging Face PEFT library survey — LoRA, IA3, prompt tuning, prefix tuning, AdaLoRA, OFT/BOFT, VeRA

qlora-expert-mode

4-bit quantized LoRA fine-tuning with NF4, double quantization, and paged optimizers

rlaif-expert-mode

RL from AI Feedback — principle-driven critique, AI judges, scaling preference labeling without humans

rlhf-expert-mode

Reward-model + PPO RLHF pipeline — when it still beats DPO and how to run it correctly

sft-expert-mode

Supervised fine-tuning fundamentals — chat templates, packing, completion-only loss, NEFTune

simpo-expert-mode

Simple Preference Optimization — reference-free, length-normalized preference alignment

synthetic-data-expert-mode

Generate fine-tuning datasets — distilabel, Magpie, Self-Instruct, Evol-Instruct, augmentoolkit

trl-expert-mode

Hugging Face TRL — SFTTrainer, DPOTrainer, PPOTrainer, GRPOTrainer, RewardTrainer

unsloth-expert-mode

Unsloth — up to 2x faster LLM fine-tuning with up to 70% less VRAM (vendor-reported figures) via fused Triton kernels