Privacy-first • Self-hosted • Built for production
Infrastructure products for
LLM applications and agentic workflows
Start with AI Cost Firewall, a free OpenAI-compatible gateway
for reducing repeated LLM calls, improving cost visibility, and adding safeguards to AI applications,
RAG systems, and agentic workflows. Add VCAL Semantic Cache
when semantic reuse becomes critical infrastructure for low-latency, private, and scalable AI systems.
Free open-source entry point
AI Cost Firewall
An OpenAI-compatible gateway that sits in front of your application to reduce duplicate
and semantically similar requests, improve cost visibility, and add production safeguards.
- • Drop-in gateway for
/v1/chat/completions
- • Exact and semantic reuse to reduce token spend
- • Fits agentic steps such as routing, classification, summarization, validation, and repeated reasoning
- • Prometheus and Grafana visibility out of the box
- • Tested at 500 RPS as a full gateway layer
- • Best starting point for teams adopting cost control quickly
Production semantic cache layer
VCAL Semantic Cache
A production-ready semantic cache service for teams that need caching to be a durable,
observable, and deployable part of their AI infrastructure.
- • Dedicated semantic cache service for LLM workloads
- • Semantic reuse layer for LLM applications, RAG systems, and agentic workflows
- • Built for private deployment inside your own environment
- • Snapshots, observability, licensing, and enterprise support
- • Tested at 30K RPS as a focused cache lookup layer
- • Natural next step when semantic reuse becomes strategic
Benchmarks measure different workloads: AI Cost Firewall is tested as a full OpenAI-compatible gateway,
while VCAL Semantic Cache is tested as a focused semantic cache lookup layer. Real-world throughput
depends on hardware, configuration, request size, cache behavior, and deployment topology.