Changelog
What's new in vMLX
- Multilingual website — Korean, Spanish, Chinese, and Japanese with auto-detect
- Updated model compatibility
- Bug fixes and stability improvements
- Persistent disk cache — cache survives app restarts
- Configurable disk cache size and directory
- Performance improvements for large contexts
- KV cache quantization (q4/q8) — 2–4x cache memory savings
- Storage-boundary quantization — values quantized only when written to the cache; full precision during generation
- Improved memory efficiency
- Browser automation tool — Playwright-based
- Web search tool — DuckDuckGo + Brave
- URL fetch tool
- Expanded agentic toolkit to 20+ tools
- Vision model support — Qwen VL, LLaVA
- VL models work with full 5-layer caching stack
- Image attachment in chat UI
- Mamba/SSM support with BatchMambaCache
- Speculative decoding with configurable draft model
- Separate embedding model endpoint
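The 2–4x savings cited for KV cache quantization follow directly from element widths: relative to 16-bit floats, q8 stores 8 bits per value (2x smaller) and q4 stores 4 bits (4x smaller), before any per-group scale/zero-point overhead. A minimal sketch of the arithmetic, using hypothetical 7B-class model dimensions (not values documented here):

```python
# Rough KV cache size: 2 tensors (K and V) per layer, each holding
# seq_len x n_kv_heads x head_dim elements at bits_per_elt precision.
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits_per_elt):
    n_elts = 2 * n_layers * seq_len * n_kv_heads * head_dim
    return n_elts * bits_per_elt / 8

# Hypothetical dims: 32 layers, 8 KV heads, head_dim 128, 8k context.
fp16 = kv_cache_bytes(8192, 32, 8, 128, 16)
q8 = kv_cache_bytes(8192, 32, 8, 128, 8)
q4 = kv_cache_bytes(8192, 32, 8, 128, 4)

print(fp16 / q8)  # 2.0
print(fp16 / q4)  # 4.0
```

Real savings land slightly below these ratios because quantized groups also store scales and zero points.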
Initial release
- Paged multi-context KV cache with prefix caching
- Continuous batching — up to 256 concurrent sequences
- Built-in agentic coding tools (MCP)
- OpenAI-compatible API — 7 endpoints
- Built-in HuggingFace model browser
- Code-signed with Apple Developer ID
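Because the server exposes an OpenAI-compatible API, any OpenAI-style client should work against it. A minimal sketch using only the Python standard library; the base URL, port, and model name below are assumptions for illustration, not documented values:

```python
import json
import urllib.request

# Hypothetical base URL -- the real host/port depend on server settings.
BASE_URL = "http://localhost:8080/v1"

payload = {
    "model": "my-local-model",  # placeholder; use a model you have downloaded
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return an OpenAI-style
# chat.completion JSON body; the request is left unsent here.
```

The same pattern applies to the other endpoints, including the separate embedding endpoint noted above.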