◆ Category · 17 assets

Model Authoring

Browse 17 Model Authoring modes for AI coding agents — production-grounded, cited, installable. Part of the VIBE library.

mode

chat-template-expert-mode

Author and debug Jinja2 chat_template strings in HF tokenizer_config.json — ChatML, Llama 3, Qwen, Gemma, Mistral, plus tools / function calling

mode

distil-mini-model-expert-mode

Author small distilled models for shipping — choose teacher, design distillation recipe, evaluate on real prompts before publishing, GGUF quant for footprint

mode

embedding-model-publish-expert-mode

Publish embedding models — sentence-transformers config, modules.json, 1_Pooling, MTEB submission, Matryoshka dims, embedding-specific model card

mode

gguf-conversion-expert-mode

Convert HF safetensors to GGUF with convert_hf_to_gguf.py — handle vocab, tied embeddings, sharded checkpoints, and produce reproducible F16/BF16 + quantize pipelines

mode

gguf-multimodal-mmproj-expert-mode

Author multimodal GGUF — mmproj projector files, llama-mtmd-cli, llama-server multimodal endpoint, with LLaVA / MiniCPM-V / InternVL / Qwen2-VL / Gemma 3

mode

lora-adapter-publish-expert-mode

Package and publish LoRA adapters — HF Hub layout, vLLM dynamic loading, llama.cpp LoRA GGUF, Ollama ADAPTER directive, Replicate Cog

mode

mlx-converter-expert-mode

Convert HF safetensors models to MLX format, quantize to 4-bit / 8-bit, publish to mlx-community on HF Hub for Apple Silicon serving

mode

model-card-publish-expert-mode

Author HF model cards — README.md frontmatter (license, library_name, base_model, datasets, language, pipeline_tag, tags), eval results, intended use, training attribution

mode

ollama-library-publisher-expert-mode

Publish models to ollama.com/library — namespace setup, ollama push, signing keys, quant tags, parameter-size tags, model card README authoring

mode

ollama-modelfile-expert-mode

Author production Modelfiles with FROM, PARAMETER, TEMPLATE, SYSTEM, ADAPTER, MESSAGE, and LICENSE directives for Llama 3, Qwen, Phi, and Gemma

mode

ollama-multimodal-modelfile-expert-mode

Author Ollama Modelfiles for vision models — llava, llama3.2-vision, MiniCPM-V — with mmproj projector handling and image-token templates

mode

prompt-template-marketplace-expert-mode

Share and version prompt templates — LangChain Hub, Langfuse, dotprompt, OpenAI Playground exports, promptfoo configs — with deprecation patterns

mode

quantization-format-expert-mode

Pick between GGUF K/IQ quants, AWQ, GPTQ, bitsandbytes NF4, EXL2, MLX 4-bit, NVFP4 — decision matrix by hardware and serving stack

mode

safetensors-expert-mode

Author and inspect safetensors files — header layout, sharding via model.safetensors.index.json, mmap loading, and PEFT adapter format

mode

structured-output-expert-mode

Constrained generation across stacks — Outlines, lm-format-enforcer, llama.cpp GBNF, OpenAI json_schema, vLLM guided_json, Instructor — with a decision matrix

mode

system-prompt-engineering-expert-mode

Author durable system prompts — persona, capability scoping, refusal patterns, output format directives, jailbreak hardening, prompt caching, dynamic injection

mode

tokenizer-engineering-expert-mode

Train tokenizers from scratch with HF tokenizers — BPE / SentencePiece / WordPiece — extend vocab for new languages or code, and add chat / special tokens
