
llm-redact: Find Sensitive Data in Your Repos Before It Reaches an LLM


At Lab34, we build AI tools that help organizations adopt AI safely and at scale. Today we are open-sourcing llm-redact, a command-line tool that scans cloned repositories for secrets, credentials, and personally identifiable information, so you can build guardrail lists and redaction rules before any content reaches an LLM.

We originally built llm-redact while working on the Guard Rails feature for LLM Proxy. We needed a way to audit existing codebases across entire organizations, spanning hundreds of repositories, their files, and their git history, to understand what sensitive data existed and where. That audit tool became llm-redact.

The Problem

Companies adopting LLM-assisted development face a common set of challenges:

- They do not know what sensitive data exists in their codebases, or where it lives
- Secrets that were committed and later removed still survive in git history
- With hundreds of repositories and years of history, manual auditing does not scale

llm-redact addresses these with two complementary scanning approaches.

Two Scanners

Regex Scanner

A fast, pattern-based scanner with 45 compiled regex patterns across 8 categories. It scans both file contents and git history (commit messages and diffs), catching secrets that were committed and later removed.

The patterns cover common secret and credential formats as well as personally identifiable information.

# Scan a directory of repos with regex patterns
./run-scan.sh scan --path /path/to/repos --depth 2
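
The core idea of a pattern-based scanner can be sketched in a few lines of Python. The pattern names and regexes below are illustrative stand-ins, not the tool's actual 45 patterns:

```python
import re
from pathlib import Path

# Hypothetical subset of patterns; the real tool ships 45 across 8 categories.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}

def scan_text(text, source):
    """Return (source, line_no, category, match) tuples for every hit."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((source, line_no, category, match.group()))
    return findings

def scan_file(path):
    """Scan one file's contents; unreadable files yield no findings."""
    try:
        text = Path(path).read_text(errors="ignore")
    except OSError:
        return []
    return scan_text(text, str(path))
```

The same `scan_text` function can be fed the output of `git log -p` to cover commit messages and diffs, which is how history scanning catches secrets that were later removed.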

Ollama Scanner

For things regex cannot catch, such as hardcoded internal hostnames, encoded secrets, and sensitive comments, the Ollama scanner sends each file to a local LLM for context-aware analysis. Because it uses Ollama, everything stays on your machine. No data leaves your network.

# Scan with a local Ollama model
./run-scan.sh ollama --path /path/to/repos --model gemma3:27b --depth 3
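
Under the hood, talking to a local Ollama instance is a single HTTP call to its `/api/generate` endpoint. This is a minimal sketch of that interaction, not llm-redact's actual implementation; the prompt wording is our own illustration:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Illustrative prompt; the tool's real prompt may differ.
PROMPT_TEMPLATE = (
    "You are a security auditor. List any secrets, credentials, internal "
    "hostnames, or personally identifiable information in this file. "
    "Reply NONE if there are none.\n\n{content}"
)

def build_payload(file_content, model="gemma3:27b"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(content=file_content),
        "stream": False,  # ask for one complete response, not a token stream
    }

def ask_ollama(file_content, model="gemma3:27b"):
    """POST one file to a locally running Ollama instance, return its analysis."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(file_content, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the endpoint is localhost, the file content never crosses the network boundary, which is the whole point of using Ollama here.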

The two scanners are complementary. Run the regex scanner first for speed and coverage, then run the Ollama scanner on the same repos to catch what patterns miss.

Multi-Repo Discovery

Both scanners support scanning a single repository or an entire directory tree of cloned repos. Point llm-redact at a parent directory and set the --depth flag:

/repos/
  org-a/
    api-service/      <- git repo
    frontend/         <- git repo
  org-b/
    backend/          <- git repo
./run-scan.sh scan --path /repos --depth 2

All three repositories are discovered and scanned automatically. This is how we use it internally: pointing it at a directory containing hundreds of cloned repos from across the organization.

Output Formats

Results can be rendered as a terminal table or exported for further processing:

Format   Behaviour
table    Colored terminal table grouped by repository
json     One .json file per repository
txt      One .txt file per repository, one line per finding
csv      Single findings.csv with a repository column

# Export findings as CSV
./run-scan.sh scan --path /path/to/repos --output csv --output-dir ./results

Key Options at a Glance

Flag                   Description                               Default
--path                 Path to a repo or directory of repos      required
--depth                Max depth to search for git repos         2
--output               Output format: table, json, txt, csv      table
--output-dir           Directory for file-based output           -
--include / --exclude  Glob patterns to filter files             all files
--no-git-history       Skip commit and diff scanning             false
--history-depth        Max commits to scan per repo              all
--workers              Parallel workers for file scanning        CPU count
--processes            Repos to scan in parallel                 1
--model                Ollama model name (ollama scanner only)   -

Getting Started

# Clone the repo
git clone https://github.com/lab34/llm-redact.git
cd llm-redact

# Run setup (creates venv, installs dependencies)
./setup.sh

# Scan with regex patterns
./run-scan.sh scan --path /path/to/repos

# Scan with a local Ollama model
./run-scan.sh ollama --path /path/to/repos --model llama3

The setup.sh script detects your Python installation, creates a virtual environment, and installs all dependencies. The run-scan.sh wrapper activates the venv automatically, so there is nothing else to manage.

How It Works

llm-redact is a Python CLI backed by two independent scanning engines. There are no external services to manage; everything runs locally.

Your Repositories
       |
       v
  [llm-redact]
       |
       +-- Repo Discovery (walks for .git/ dirs up to --depth)
       +-- Regex Scanner (45 patterns across files + git history)
       +-- Ollama Scanner (local LLM analysis per file)
       |
       v
  [Findings Report]
  (table, json, txt, csv)

The regex scanner uses compiled patterns and parallel workers for speed. The Ollama scanner respects file size limits and can target specific file types with --include globs, so you can focus the LLM on configuration and environment files where secrets are most likely to appear.
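
The combination of parallel workers and include/exclude globs can be sketched as follows. The function names here are illustrative, not the tool's internal API:

```python
import fnmatch
import os
from concurrent.futures import ThreadPoolExecutor

def matches(path, include=("*",), exclude=()):
    """Apply --include / --exclude style glob filters to a file name."""
    name = os.path.basename(path)
    if any(fnmatch.fnmatch(name, pat) for pat in exclude):
        return False
    return any(fnmatch.fnmatch(name, pat) for pat in include)

def scan_parallel(paths, scan_one, workers=None, include=("*",), exclude=()):
    """Scan files concurrently; `scan_one(path)` returns a list of findings."""
    targets = [p for p in paths if matches(p, include, exclude)]
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        results = pool.map(scan_one, targets)
    return [finding for per_file in results for finding in per_file]
```

Passing `include=("*.env", "*.yaml")` to a scheme like this is how you would focus an expensive scanner, such as the LLM pass, on the files where secrets are most likely to appear.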

Why We Built This

When we built the Guard Rails feature for LLM Proxy, we realized that writing redaction rules is only half the problem. You first need to know what to redact and where it lives. For a single repository that is manageable. For an organization with hundreds of repos, years of git history, and dozens of teams, it is not.

We looked at existing secret scanning tools and found them focused on CI/CD gatekeeping (blocking commits) rather than auditing existing codebases at scale. We needed something that could scan an entire organization’s worth of cloned repos in one pass and produce a report we could use to build guardrail configurations.

Now we are releasing it as open source under the MIT license, because we believe every company using LLM tooling should be able to audit their codebases for sensitive data before adopting AI-assisted development.

Use Cases

We see llm-redact fitting a few recurring situations:

- Auditing an organization's cloned repositories before rolling out LLM-assisted development
- Building guardrail lists and redaction rules, such as Guard Rails configurations for LLM Proxy
- Finding secrets that were committed and later removed but still live in git history

Open Source

llm-redact is MIT-licensed and available on GitHub. We welcome contributions, bug reports, and feature requests.