
LLM Proxy: An Open-Source Gateway to Secure and Control Your AI Usage


At Lab34, we build AI tools that help organizations adopt AI safely and at scale. Today we are open-sourcing LLM Proxy, a lightweight, OpenAI-compatible reverse proxy that sits between your applications and your LLM providers, giving you full control over access, costs, and data security.

We originally built LLM Proxy for ReArch, our background AI agent platform that lets enterprises ship code at scale. That work gave rise to the Guard Rails feature, and ultimately to LLM Proxy as a standalone project.

The Problem

Companies adopting LLMs face a common set of challenges: provider API keys get shared widely and are hard to rotate or revoke; there is no per-team visibility into token usage or spend; prompts can carry secrets and sensitive data out of the network unchecked; and every new provider or local model means more client-side configuration.

LLM Proxy solves all of these with a single, self-hosted binary.

What Is LLM Proxy?

LLM Proxy is a drop-in reverse proxy written in Go. Any application that speaks the OpenAI API can point at LLM Proxy instead, with zero code changes. Under the hood, the proxy:

  1. Authenticates the request using a proxy-issued API key (prefixed llmp-).
  2. Enforces rate limits per key (configurable requests per minute).
  3. Applies Guard Rails: regex-based rules that reject or redact sensitive content before it ever leaves your network.
  4. Forwards the request to the real upstream provider (OpenAI, Azure OpenAI, local models, or any OpenAI-compatible backend).
  5. Tracks token usage per key, provider, and time range.

Your upstream API keys never leave the proxy. Clients only ever see their llmp- key.
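As an illustration of the rate-limiting step, a per-key token bucket can be sketched in a few lines of Python. This is a conceptual model only (the class and function names are ours, not the proxy's actual Go implementation):

```python
import time

class TokenBucket:
    """Minimal token bucket allowing `rpm` requests per minute."""
    def __init__(self, rpm: int):
        self.capacity = rpm
        self.tokens = float(rpm)
        self.refill_rate = rpm / 60.0  # tokens regained per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond 429 Too Many Requests

buckets: dict[str, TokenBucket] = {}  # one bucket per proxy key

def check_rate_limit(key: str, rpm: int) -> bool:
    bucket = buckets.setdefault(key, TokenBucket(rpm))
    return bucket.allow()
```

A key with `rate_limit_rpm` of 2 would pass its first two back-to-back requests and be throttled on the third until tokens refill.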

Guard Rails: Stop Secrets Before They Leak

Guard Rails is the feature that prompted us to build LLM Proxy in the first place. It works in two modes:

Reject Mode

Define a regex pattern. If any message in the request matches, the entire request is blocked with a 400 error and an audit event is recorded. The prompt never reaches the upstream provider.

Example: Block any request containing what looks like an AWS secret key:

{
  "pattern": "AKIA[0-9A-Z]{16}",
  "mode": "reject"
}
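Conceptually, reject mode scans every message in the request and blocks the whole request if any message matches. This Python sketch is illustrative (the proxy itself is written in Go, and the audit-event shape here is an assumption):

```python
import re

AWS_KEY_RULE = {"pattern": r"AKIA[0-9A-Z]{16}", "mode": "reject"}

def apply_reject_rule(rule: dict, messages: list[dict]):
    """Return (blocked, audit_event). Blocked requests never reach upstream."""
    regex = re.compile(rule["pattern"])
    for msg in messages:
        match = regex.search(msg.get("content", ""))
        if match:
            # In the real proxy this becomes a 400 response plus an audit event.
            return True, {"rule": rule["pattern"], "matched": match.group(0)}
    return False, None

blocked, event = apply_reject_rule(
    AWS_KEY_RULE,
    [{"role": "user", "content": "key is AKIAABCDEFGHIJKLMNOP"}],
)
```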

Replace Mode

Same regex matching, but instead of blocking, the matched content is replaced with a safe placeholder before forwarding. The upstream provider sees the sanitized version.

Example: Redact anything that looks like a database connection string:

{
  "pattern": "postgres://[^\\s]+",
  "mode": "replace",
  "replace_by": "[REDACTED_DB_URL]"
}
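Replace mode amounts to a regex substitution applied before the request is forwarded. A minimal Python sketch of the behavior (illustrative, not the proxy's internals):

```python
import re

DB_URL_RULE = {
    "pattern": r"postgres://[^\s]+",
    "mode": "replace",
    "replace_by": "[REDACTED_DB_URL]",
}

def apply_replace_rule(rule: dict, text: str) -> str:
    """Substitute every match with the placeholder before forwarding upstream."""
    return re.sub(rule["pattern"], rule["replace_by"], text)

sanitized = apply_replace_rule(
    DB_URL_RULE, "connect to postgres://user:pw@db:5432/app please"
)
# sanitized == "connect to [REDACTED_DB_URL] please"
```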

Full Audit Trail

Every time a Guard Rail triggers, whether it rejects or redacts, an event is logged with the rule that matched, the API key that sent the request, and the original input text. You can review these events through the dashboard or the admin API, giving your security team full visibility into attempted leaks.

Guard Rails are cached in memory with a 30-second TTL, so adding or removing rules takes effect almost instantly without restarting the proxy.
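The caching behavior can be pictured as a timestamped in-memory cache that reloads its rules once the TTL expires. This Python sketch is illustrative (the proxy does this in Go, backed by its SQLite store):

```python
import time

TTL_SECONDS = 30

class RuleCache:
    def __init__(self, load_rules):
        self.load_rules = load_rules  # e.g. a function that queries storage
        self.rules = None
        self.loaded_at = 0.0

    def get(self):
        now = time.monotonic()
        if self.rules is None or now - self.loaded_at > TTL_SECONDS:
            self.rules = self.load_rules()  # refresh from storage
            self.loaded_at = now
        return self.rules

cache = RuleCache(lambda: [{"pattern": "AKIA[0-9A-Z]{16}", "mode": "reject"}])
rules = cache.get()  # first call loads; calls within 30 s reuse the cached list
```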

Key Features at a Glance

| Feature | Description |
| --- | --- |
| OpenAI-compatible API | Works with any OpenAI client library. Just change the base URL. |
| Streaming support | Full SSE streaming for chat completions. |
| Multi-provider | Register multiple upstream backends: OpenAI, Azure, local LLMs, or any compatible endpoint. |
| Proxy API keys | Issue scoped keys per team, project, or environment. Upstream credentials stay hidden. |
| Per-key rate limiting | Token-bucket rate limiter with configurable RPM per key. |
| Token usage tracking | Query usage by key, provider, and date range. Know exactly where your budget is going. |
| Guard Rails | Regex-based reject or redact rules with a full audit log. |
| Web dashboard | Manage providers, keys, guard rails, and usage from a browser. Includes a chat playground. |
| Swagger UI | Interactive API documentation at /docs. |

Getting Started

The fastest way to get started:

Option 1: Docker Compose

services:
  llm-proxy:
    image: ghcr.io/lab34-es/llm-proxy:latest
    ports:
      - "8080:8080"
    environment:
      - ADMIN_TOKEN=change-me-to-a-secure-token
      - ADDR=:8080
      - DSN=/data/llm-proxy.db
    volumes:
      - llm-proxy-data:/data
    restart: unless-stopped

volumes:
  llm-proxy-data:

Then bring it up:

docker compose up -d

Option 2: Build From Source

go build -o llm-proxy .
ADMIN_TOKEN=my-secret-admin-token ./llm-proxy

The proxy starts on port 8080. Open http://localhost:8080/dashboard to access the web UI.

Step-by-Step Setup

1. Register an upstream provider. Go to the dashboard or use the admin API:

curl -X POST http://localhost:8080/admin/providers \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai",
    "base_url": "https://api.openai.com",
    "api_key": "sk-proj-..."
  }'

2. Create a proxy API key. Assign it to a provider and set a rate limit:

curl -X POST http://localhost:8080/admin/keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "engineering-team",
    "provider_id": "<provider-id>",
    "rate_limit_rpm": 120
  }'

Save the returned llmp- key; it is shown only once.

3. Configure Guard Rails. Protect against secret leakage:

curl -X POST http://localhost:8080/admin/guardrails \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"pattern": "(?i)password\\s*[:=]\\s*\\S+", "mode": "reject"}'
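It is worth sanity-checking a pattern before deploying it. The rule above, for example, catches common password assignments but leaves ordinary mentions of the word alone:

```python
import re

# Same pattern as the Guard Rail above: case-insensitive, `:` or `=` assignment.
pattern = re.compile(r"(?i)password\s*[:=]\s*\S+")

assert pattern.search("Password: hunter2")
assert pattern.search("db_password=s3cret")
assert pattern.search("please reset my password soon") is None  # no value assigned
```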

4. Point your applications at the proxy. Any OpenAI-compatible client works:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="llmp-..."  # your proxy key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

That is it. Your application talks to the proxy, the proxy talks to OpenAI (or whichever provider you configured), and Guard Rails ensure nothing sensitive gets through.

Architecture

LLM Proxy is a single Go binary backed by SQLite. There are no external dependencies to manage: no Redis, no Postgres, no message queues. It runs anywhere: a VM, a container, a Kubernetes pod, or even a Raspberry Pi.

Your App (OpenAI SDK)
       |
       v
  [LLM Proxy]
       |
       +-- Auth (SHA-256 key lookup)
       +-- Rate Limiter (per-key token bucket)
       +-- Guard Rails (reject / redact + audit)
       |
       v
  [Upstream Provider]
  (OpenAI, Azure, local LLM, etc.)
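The auth step in the diagram stores only a SHA-256 hash of each llmp- key, so a leaked database does not expose usable credentials. A minimal sketch of that lookup in Python (the storage shape here is an assumption; the proxy uses its SQLite store):

```python
import hashlib

key_hashes: dict[str, dict] = {}  # digest -> key metadata; stands in for SQLite

def register_key(llmp_key: str, metadata: dict) -> None:
    digest = hashlib.sha256(llmp_key.encode()).hexdigest()
    key_hashes[digest] = metadata  # plaintext key is never stored

def authenticate(llmp_key: str):
    """Hash the presented key and look it up; None means 401 Unauthorized."""
    digest = hashlib.sha256(llmp_key.encode()).hexdigest()
    return key_hashes.get(digest)

register_key("llmp-example", {"name": "engineering-team", "rate_limit_rpm": 120})
assert authenticate("llmp-example")["name"] == "engineering-team"
assert authenticate("llmp-wrong") is None
```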

The proxy supports both standard JSON responses and streaming (SSE), so tools like chat interfaces and coding assistants work without modification.
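On the wire, an SSE stream is a sequence of `data:` lines that the client reassembles into the final message. A minimal parser for the OpenAI-style chunk format (illustrative; real SDKs handle this for you):

```python
import json

def parse_sse_lines(lines):
    """Yield content deltas from OpenAI-style SSE `data:` lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments and keep-alive blanks
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
# "".join(parse_sse_lines(sample)) == "Hello"
```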

Why We Built This

When we built ReArch, we needed a way to let customers bring their own LLM providers while ensuring their API keys and sensitive data stayed safe. Some of our customers operate in regulated industries where sending an unredacted database password to an external API is not just bad practice; it is a compliance violation.

We looked at existing solutions and found them either too heavy (full API management platforms) or too limited (simple reverse proxies without content inspection). So we built exactly what we needed: a focused, lightweight proxy with Guard Rails at its core.

Now we are releasing it as open source under the MIT license, because we believe every company using LLMs should have access to this level of control without vendor lock-in.

Use Cases

  - Give each team, project, or environment its own scoped key with rate limits and usage tracking, while upstream credentials stay hidden.
  - Enforce redaction of secrets and connection strings before prompts leave your network, for teams in regulated industries.
  - Front a mix of providers (OpenAI, Azure, local models) behind one stable, OpenAI-compatible endpoint with zero client code changes.

Open Source

LLM Proxy is MIT-licensed and available on GitHub. We welcome contributions, bug reports, and feature requests.