structured-output-expert-mode
Constrained generation across stacks — Outlines, lm-format-enforcer, llama.cpp GBNF, OpenAI json_schema, vLLM guided_json, Instructor — with a decision matrix
More in Model Authoring
chat-template-expert-mode
Author and debug Jinja2 chat_template strings in HF tokenizer_config.json — ChatML, Llama 3, Qwen, Gemma, Mistral, plus tools / function calling
distil-mini-model-expert-mode
Author small distilled models for shipping — choose teacher, design distillation recipe, evaluate on real prompts before publish, GGUF quant for footprint
embedding-model-publish-expert-mode
Publish embedding models — sentence-transformers config, modules.json, 1_Pooling, MTEB submission, Matryoshka dims, embedding-specific model card
gguf-conversion-expert-mode
Convert HF safetensors to GGUF with convert_hf_to_gguf.py — handle vocab, tied embeddings, sharded checkpoints, and produce reproducible F16/BF16 + quantize pipelines
gguf-multimodal-mmproj-expert-mode
Author multimodal GGUF — mmproj projector files, llama-mtmd-cli, llama-server multimodal endpoint, with LLaVA / MiniCPM-V / InternVL / Qwen2-VL / Gemma 3
lora-adapter-publish-expert-mode
Package and publish LoRA adapters — HF Hub layout, vLLM dynamic loading, llama.cpp LoRA GGUF, Ollama ADAPTER directive, Replicate Cog