Changelog
What's new in vMLX
- Multilingual website — Korean, Spanish, Chinese, and Japanese with auto-detect
- Updated model compatibility
- Bug fixes and stability improvements
- Persistent disk cache — cache survives app restarts
- Configurable disk cache size and directory
- Performance improvements for large contexts
- KV cache quantization (q4/q8) — 2–4x cache memory savings
- Storage-boundary quantization — values quantized only when written to the cache; full precision during generation
- Improved memory efficiency
- Browser automation tool — Playwright-based
- Web search tool — DuckDuckGo + Brave
- URL fetch tool
- Expanded agentic toolkit to 20+ tools
- Vision model support — Qwen VL, LLaVA
- VL models work with full 5-layer caching stack
- Image attachment in chat UI
- Mamba/SSM support with BatchMambaCache
- Speculative decoding with configurable draft model
- Separate embedding model endpoint
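The 2–4x savings cited for KV cache quantization follow directly from element widths: relative to 16-bit floats, q8 stores 8 bits per value (2x smaller) and q4 stores 4 bits (4x smaller), before any per-group scale/zero-point overhead. A minimal sketch of the arithmetic, using hypothetical 7B-class model dimensions (not values documented here):

```python
# Rough KV cache size: 2 tensors (K and V) per layer, each holding
# seq_len x n_kv_heads x head_dim elements at bits_per_elt precision.
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits_per_elt):
    n_elts = 2 * n_layers * seq_len * n_kv_heads * head_dim
    return n_elts * bits_per_elt / 8

# Hypothetical dims: 32 layers, 8 KV heads, head_dim 128, 8k context.
fp16 = kv_cache_bytes(8192, 32, 8, 128, 16)
q8 = kv_cache_bytes(8192, 32, 8, 128, 8)
q4 = kv_cache_bytes(8192, 32, 8, 128, 4)

print(fp16 / q8)  # 2.0
print(fp16 / q4)  # 4.0
```

Real savings land slightly below these ratios because quantized groups also store scales and zero points.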
Initial release
- Paged multi-context KV cache with prefix caching
- Continuous batching — up to 256 concurrent sequences
- Built-in agentic coding tools (MCP)
- OpenAI-compatible API — 7 endpoints
- Built-in HuggingFace model browser
- Code-signed with Apple Developer ID
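Because the server exposes an OpenAI-compatible API, any OpenAI-style client should work against it. A minimal sketch using only the Python standard library; the base URL, port, and model name below are assumptions for illustration, not documented values:

```python
import json
import urllib.request

# Hypothetical base URL -- the real host/port depend on server settings.
BASE_URL = "http://localhost:8080/v1"

payload = {
    "model": "my-local-model",  # placeholder; use a model you have downloaded
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return an OpenAI-style
# chat.completion JSON body; the request is left unsent here.
```

The same pattern applies to the other endpoints, including the separate embedding endpoint noted above.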