Token optimization goes Kubernetes-native

MCP Optimizer capabilities are now embedded directly in vMCP, bringing the same per-request token reduction and improved tool selection from desktop setups to Kubernetes-native deployments. Instead of every developer configuring a local optimizer instance, platform teams deploy it once and every connected client benefits automatically.

  • On-demand tool discovery means agents no longer receive hundreds of tool descriptions in context. Instead, tools are discovered at request time and only the relevant ones are surfaced (up to 8 by default, configurable) via hybrid search (semantic + keyword), cutting token usage by 60-85% per request while improving tool selection accuracy.
  • Team-wide token savings from a single deployment means the 60-85% per-request reduction the Optimizer delivers on desktop now applies to every developer connected to vMCP. Deploy it once, and the savings multiply across the entire team without anyone managing a local optimizer instance.
  • No per-developer configuration required. Developers point their MCP client at the vMCP endpoint and get optimized routing automatically, with no local embedding models, no search parameter tuning, and no setup drift across the team.
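
To make the "point your client at the endpoint" step concrete, a client configuration might look like the following sketch. The config file shape follows the common MCP client convention of an `mcpServers` map, and the endpoint URL is purely illustrative, not taken from this release:

```json
{
  "mcpServers": {
    "vmcp": {
      "type": "http",
      "url": "https://vmcp.internal.example.com/mcp"
    }
  }
}
```

Because routing and tool filtering happen server-side in vMCP, this one entry is all a developer configures locally; no embedding model or search parameters appear on the client.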

The setup is GitOps-friendly: EmbeddingServer and VirtualMCPServer CRDs deploy through your existing CI/CD pipeline. For the full configuration reference and quickstart examples, see the vMCP optimizer guide. For desktop users, MCP Optimizer remains available through the ToolHive UI and CLI as before.
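
As a rough sketch of what such manifests might look like in a GitOps repo: the `kind` values below come from the CRD names mentioned above, but the `apiVersion`, field names, and values are assumptions for illustration; consult the vMCP optimizer guide for the actual schema:

```yaml
# Illustrative only -- field names and apiVersion are assumptions,
# not the authoritative CRD schema.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer
metadata:
  name: team-embeddings
spec: {}              # embedding backend settings go here
---
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
metadata:
  name: team-vmcp
spec:
  optimizer:
    maxTools: 8       # mirrors the configurable per-request tool limit
```

Committed alongside your other manifests, these resources flow through the same CI/CD pipeline as the rest of your cluster configuration, which is what makes the single-deployment, team-wide rollout possible.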

Getting started

For detailed release notes, check the project repositories:

You can find all ToolHive documentation on the Stacklok documentation site.