structured-output-expert-mode

Constrained generation across stacks — Outlines, lm-format-enforcer, llama.cpp GBNF, OpenAI json_schema, vLLM guided_json, Instructor — with a decision matrix
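As a rough illustration, the sketch below shows how one JSON Schema can be handed to two of these stacks. It builds request fragments for OpenAI's json_schema response_format and for vLLM's guided_json parameter. The schema, the helper names, and the "person" schema name are hypothetical. The field layout follows OpenAI Structured Outputs and vLLM's OpenAI-compatible server as commonly documented. Newer vLLM releases may expose guided decoding under a different parameter, so check the version you actually run.

```python
import json

# Hypothetical example schema. OpenAI strict mode expects every property
# to be listed in "required" and additionalProperties to be false.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}


def openai_response_format(schema: dict, name: str) -> dict:
    """Hypothetical helper: response_format value for OpenAI json_schema mode."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


def vllm_extra_body(schema: dict) -> dict:
    """Hypothetical helper: extra_body for vLLM's guided_json (version-dependent)."""
    return {"guided_json": schema}


if __name__ == "__main__":
    # Print both request fragments so they can be compared side by side.
    print(json.dumps(openai_response_format(PERSON_SCHEMA, "person"), indent=2))
    print(json.dumps(vllm_extra_body(PERSON_SCHEMA), indent=2))
```

With the official openai Python client, the first dict would typically be passed as response_format=. The second would be passed as extra_body= when the client points at a vLLM server.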

Kind: Mode

Category: Model Authoring

Install: npx -y github:anubhavg-icpl/vibe add structured-output-expert-mode

License: CC BY-NC-SA 4.0

More in Model Authoring

chat-template-expert-mode

Author and debug Jinja2 chat_template strings in HF tokenizer_config.json: ChatML, Llama 3, Qwen, Gemma, Mistral, plus tools and function calling

distil-mini-model-expert-mode

Author small distilled models for shipping: choose a teacher, design the distillation recipe, evaluate on real prompts before publishing, and quantize to GGUF for a smaller footprint

embedding-model-publish-expert-mode

Publish embedding models: sentence-transformers config, modules.json, 1_Pooling, MTEB submission, Matryoshka dimensions, and an embedding-specific model card

gguf-conversion-expert-mode

Convert HF safetensors to GGUF with convert_hf_to_gguf.py: handle vocab, tied embeddings, and sharded checkpoints, and produce reproducible F16/BF16 conversion and quantization pipelines

gguf-multimodal-mmproj-expert-mode

Author multimodal GGUF: mmproj projector files, llama-mtmd-cli, and the llama-server multimodal endpoint, with LLaVA / MiniCPM-V / InternVL / Qwen2-VL / Gemma 3

lora-adapter-publish-expert-mode

Package and publish LoRA adapters: HF Hub layout, vLLM dynamic loading, llama.cpp LoRA GGUF, the Ollama ADAPTER directive, and Replicate Cog