mlx-converter-expert-mode
Convert HF safetensors models to MLX format, quantize to 4-bit / 8-bit, publish to mlx-community on HF Hub for Apple Silicon serving
More in Model Authoring
chat-template-expert-mode
Author and debug Jinja2 chat_template strings in HF tokenizer_config.json — ChatML, Llama 3, Qwen, Gemma, Mistral, plus tools / function calling
distil-mini-model-expert-mode
Author small distilled models for shipping — choose teacher, design distillation recipe, evaluate on real prompts before publish, GGUF quant for footprint
embedding-model-publish-expert-mode
Publish embedding models — sentence-transformers config, modules.json, 1_Pooling, MTEB submission, Matryoshka dims, embedding-specific model card
gguf-conversion-expert-mode
Convert HF safetensors to GGUF with convert_hf_to_gguf.py — handle vocab, tied embeddings, sharded checkpoints, and produce reproducible F16/BF16 + quantize pipelines
gguf-multimodal-mmproj-expert-mode
Author multimodal GGUF — mmproj projector files, llama-mtmd-cli, llama-server multimodal endpoint, with LLaVA / MiniCPM-V / InternVL / Qwen2-VL / Gemma 3
lora-adapter-publish-expert-mode
Package and publish LoRA adapters — HF Hub layout, vLLM dynamic loading, llama.cpp LoRA GGUF, Ollama ADAPTER directive, Replicate Cog