AI Cost Firewall
Cut spend on repeated LLM requests with a fully free, OpenAI-compatible gateway that combines exact caching and semantic reuse in front of your upstream provider.
Best for support bots, internal assistants, documentation search, and agent workloads with repeated prompts.
Why AI Cost Firewall
Many LLM applications repeatedly send the same or slightly rephrased prompts. AI Cost Firewall sits in front of the provider API and intercepts those requests before they become unnecessary token spend.
Exact cache
Instant reuse for identical prompts and payloads.
Semantic layer
Reuse answers for meaningfully similar prompts, not only exact matches.
Operational visibility
Track hits, misses, and estimated savings with Prometheus and Grafana.
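To make the two caching layers concrete, here is a minimal sketch of how an exact-plus-semantic lookup can work. This is an illustration, not the project's actual implementation: the toy bag-of-words "embedding", the 0.8 similarity threshold, and the function names are all assumptions for the example (a real gateway would use a sentence-embedding model).

```python
import hashlib
import json
import math
from collections import Counter

def exact_key(payload: dict) -> str:
    # Canonicalize the request body so key order does not change the hash.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only; a real semantic
    # layer would use a proper embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

cache = {}           # exact_key -> cached response
semantic_index = []  # (embedding, cached response)

def lookup(payload: dict, threshold: float = 0.8):
    key = exact_key(payload)
    if key in cache:
        return cache[key]  # exact hit: byte-identical payload seen before
    vec = embed(payload["messages"][-1]["content"])
    for stored_vec, response in semantic_index:
        if cosine(vec, stored_vec) >= threshold:
            return response  # semantic hit: meaningfully similar prompt
    return None  # miss: forward the request to the upstream provider
```

A rephrased prompt ("please how do I reset my password" vs. "how do I reset my password") misses the exact cache but can still be served by the semantic layer.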
Quickstart
Run the gateway between your existing OpenAI-compatible client and the upstream provider, then point your application at the firewall endpoint.
git clone https://github.com/vcal-project/ai-firewall.git
cd ai-firewall
cp configs/ai-firewall.conf.example configs/ai-firewall.conf
nano configs/ai-firewall.conf # Replace the placeholders with your API keys
docker compose pull
docker compose up -d
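Once the containers are up, existing clients only need their base URL changed to talk to the firewall. The sketch below builds a standard OpenAI-style chat completion request with the stdlib; the `localhost:8080` address, `/v1` path, and model name are placeholder assumptions, so substitute the host and port from your own configuration.

```python
import json
import urllib.request

# Assumed gateway address; replace with where your firewall actually
# listens (port and path here are illustrative, not fixed by the project).
FIREWALL_BASE = "http://localhost:8080/v1"

def chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    # Standard OpenAI-style chat completion body, aimed at the firewall
    # instead of the provider; the request shape itself is unchanged.
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{FIREWALL_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("How do I reset my password?", "sk-example")
# urllib.request.urlopen(req) would send it once the gateway is running.
```

SDKs that accept a custom base URL (most OpenAI-compatible clients do) need only that one setting changed.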
Open source and self-deployable
AI Cost Firewall is a fully free project that you can deploy in your own environment. It is designed for teams that want a simple OpenAI-compatible gateway with exact and semantic caching, without committing to a managed service or commercial package.
Fully free
Use, test, modify, and deploy it yourself as part of your own stack.
Self-hosted
Keep traffic, cache data, and observability inside your own infrastructure perimeter.
OpenAI-compatible
Put it in front of existing clients and workflows with minimal application changes.
Get started with AI Cost Firewall
Explore the repository, deploy it in your own environment, and test how much repeated LLM traffic you can eliminate with exact and semantic caching.