OpenAI-compatible • Self-hosted • Fully free

AI Cost Firewall

Reduce repeated LLM spend with a fully free OpenAI-compatible gateway that combines exact caching and semantic reuse in front of your upstream provider, application, or agent runtime.

Read Docs Quickstart GitHub Open Source

Best for support bots, internal assistants, documentation search, RAG systems, and agentic workflows with repeated LLM calls.

How AI Cost Firewall works

A drop-in OpenAI-compatible gateway that reduces duplicate and semantically similar requests across chatbots, RAG systems, internal tools, and agentic workflows while preserving full API compatibility.

Exact cache (Redis) handles repeated requests, while semantic cache (Qdrant) enables reuse of similar queries. Cache misses are forwarded to the LLM provider.

Why AI Cost Firewall

Many LLM applications and agentic workflows repeatedly send the same or slightly rephrased prompts for routing, classification, summarization, validation, and user-facing responses. AI Cost Firewall sits in front of the provider API and intercepts those requests before they become unnecessary token spend.

Exact cache

Instant reuse for identical prompts and payloads.

Semantic layer

Reuse answers for meaningfully similar prompts, not only exact matches.

Operational visibility

Track hits, misses, and estimated savings with Prometheus and Grafana.

Quickstart

Run the gateway in front of your existing OpenAI-compatible client, application, or agent framework and point it to the firewall endpoint.

git clone https://github.com/vcal-project/ai-firewall.git
cd ai-firewall
cp configs/ai-firewall.conf.example configs/ai-firewall.conf
nano configs/ai-firewall.conf # Replace the placeholders with your API keys
docker compose pull
docker compose up -d

Open source and self-deployable

AI Cost Firewall is a fully free project that you can deploy in your own environment. It is designed for teams that want a simple OpenAI-compatible gateway with exact and semantic caching, without committing to a managed service or commercial package.

Fully free

Use, test, modify, and deploy it yourself as part of your own stack.

Self-hosted

Keep traffic, cache data, and observability inside your own infrastructure perimeter.

OpenAI-compatible

Put it in front of existing clients, AI applications, and agentic workflows with minimal application changes.

Enterprise add-on ecosystem

Extend AI Cost Firewall with VCAL Guards

AI Cost Firewall remains a free, self-hosted gateway. VCAL Guards are optional enterprise add-ons for teams that need privacy protection, security controls, compliance support, and audit workflows around LLM traffic.

Primary add-on

VCAL Privacy Guard

Detect, redact, or anonymize sensitive data before prompts reach upstream LLM providers, then restore safe placeholder mappings where needed.

• PII and secrets detection
• Redaction or anonymization mode
• Metadata-only audit and metrics

Discuss Privacy Guard

Planned

VCAL Security Guard

Prompt-risk checks, unsafe instruction detection, and security-oriented policy signals around AI traffic.

• Prompt and response risk signals
• Security rule packs
• Operator metrics

Planned

VCAL Compliance

Compliance-oriented controls, policy templates, and deployment evidence for regulated AI workflows.

• Compliance policy profiles
• Configurable control evidence
• Enterprise reporting hooks

Planned

VCAL Audit

Structured audit export, integration with SIEM pipelines, and long-term operational evidence retention.

• Metadata-first audit events
• Export integrations
• Retention-oriented workflows

Get started with AI Cost Firewall

Explore the repository, deploy it in your own environment, and test how much repeated LLM traffic your applications, RAG systems, or agentic workflows can eliminate with exact and semantic caching.

Need setup assistance, architecture guidance, production hardening, or enterprise support? Contact VCAL .

View on GitHub See Quickstart See VCAL Semantic Cache