Reducing context

before calling an expensive model

INDEX

  1. Introduction
  2. Requirements
  3. The problem
  4. Solution 1: clean noise before prompting
  5. Solution 2: operational summary
  6. Solution 3: compress memory with Caveman
  7. Solution 4: task-specific templates
  8. Solution 5: measure and decide
  9. Extras
  10. Full workflow

1. Introduction

Saving tokens is not only about shorter prompts. The biggest saving often happens before the prompt: not sending junk to the model.

In this lab we clean repeated text, create operational summaries, compress repeated project memory and only send the expensive model what it really needs.

2. Requirements

We need: a long input text, a clear target, a local cleanup step, a paid model for the final answer, and a way to measure tokens before and after.

3. The problem

We will separate useful information from noise, create a smaller input, compress repeated memory when relevant and measure the result.

[Figure: context reduction, before and after]

4. Solution 1: clean noise before prompting

Simple rules are often enough to strip repeated headers, signatures, duplicated lines, and useless debug traces.

def clean_lines(text):
    """Drop blank lines, debug traces and exact duplicates, keeping first occurrences."""
    lines = []
    seen = set()
    for line in text.splitlines():
        normalized = line.strip()
        # Skip empty lines and debug noise
        if not normalized or "DEBUG" in normalized:
            continue
        # Skip lines we have already kept (repeated headers, signatures, ...)
        if normalized in seen:
            continue
        seen.add(normalized)
        lines.append(line)
    return "\n".join(lines)

5. Solution 2: operational summary

Use a small, stable structure:

Goal:
Relevant data:
Constraints:
Recent changes:
Concrete question:

This keeps reasoning material and removes filler.
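A minimal sketch of rendering that summary programmatically. The field names mirror the template above; the helper name and the example values are ours, not a prescribed API:

```python
def operational_summary(goal, data, constraints, changes, question):
    """Render the five-field operational summary as prompt-ready text."""
    return "\n".join([
        f"Goal: {goal}",
        f"Relevant data: {data}",
        f"Constraints: {constraints}",
        f"Recent changes: {changes}",
        f"Concrete question: {question}",
    ])

summary = operational_summary(
    goal="Fix checkout timeouts",
    data="p95 latency 8.2s on /checkout since 2024-05-01",
    constraints="No schema changes this sprint",
    changes="Payment provider SDK upgraded to v3",
    question="Which service should we profile first?",
)
```

Because the structure is fixed, the expensive model always receives the same five fields in the same order, which also makes before/after token comparisons fair.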

6. Solution 3: compress memory with Caveman

Caveman includes caveman-compress, a tool for compressing memory files such as CLAUDE.md, todos or preferences so every session loads fewer tokens. Its README explains that it keeps a human-readable .original.md backup and preserves code blocks, URLs, paths, commands, technical terms, headings, tables, dates and numeric values.

This is useful when the same project memory is loaded repeatedly.

7. Solution 4: task-specific templates

Use different templates for tickets, meetings and code review. Each workflow has different fields that matter.
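One way to sketch this is a small template registry keyed by task type. The field lists below are illustrative, not prescriptive; adapt them to your workflows:

```python
# Hypothetical field lists per task type; only the fields that matter survive.
TEMPLATES = {
    "ticket": ["Goal", "Steps to reproduce", "Expected", "Actual", "Concrete question"],
    "meeting": ["Goal", "Decisions", "Open items", "Owners", "Concrete question"],
    "code_review": ["Goal", "Diff summary", "Constraints", "Risk areas", "Concrete question"],
}

def render_template(task, values):
    """Fill only the fields defined for this task type; missing values stay blank."""
    fields = TEMPLATES[task]
    return "\n".join(f"{field}: {values.get(field, '')}" for field in fields)
```

Anything not in the task's field list is never sent, which is exactly the point: a meeting summary has no business carrying a stack trace.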

8. Solution 5: measure and decide

Track tokens before, tokens after, answer quality and human review time. Saving tokens is only useful if the review cost does not explode.
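A rough sketch of the measurement step. The characters-per-token heuristic below is a crude approximation for English text; swap in your provider's tokenizer (e.g. tiktoken) for real counts:

```python
def approx_tokens(text):
    # Crude heuristic: roughly 4 characters per token for English prose
    return max(1, len(text) // 4)

def reduction_report(before, after):
    """Compare approximate token counts before and after context reduction."""
    t_before = approx_tokens(before)
    t_after = approx_tokens(after)
    saved = 1 - t_after / t_before
    return {"before": t_before, "after": t_after, "saved_pct": round(saved * 100, 1)}
```

Log this report alongside answer quality and human review time; a 70% token saving that doubles review time is a loss.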

9. Extras

Use local AI as a filter, keep cleaned context, and avoid summarizing away dates, numbers or constraints.
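The last point can be enforced mechanically: check that every number in the original still appears in the summary. A regex is a coarse net, but it catches silent losses of dates and metrics:

```python
import re

def lost_numbers(original, summary):
    """Return numeric values present in the original but missing from the summary."""
    pattern = r"\d+(?:\.\d+)?"
    kept = set(re.findall(pattern, summary))
    return sorted(n for n in set(re.findall(pattern, original)) if n not in kept)
```

If this returns a non-empty list, the summary dropped a concrete value and should be rejected or regenerated.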

10. Full workflow

Clean, summarize, compress recurring memory, call the paid model only with relevant context, and measure.
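The workflow above can be sketched end to end. clean_lines is Solution 1's function and approx_tokens is the heuristic from Solution 5, both reproduced here so the example is self-contained; the paid-model call itself is left out, since it depends on your provider:

```python
def clean_lines(text):
    """Solution 1: drop blanks, debug noise and exact duplicates."""
    lines, seen = [], set()
    for line in text.splitlines():
        normalized = line.strip()
        if not normalized or "DEBUG" in normalized or normalized in seen:
            continue
        seen.add(normalized)
        lines.append(line)
    return "\n".join(lines)

def approx_tokens(text):
    # Crude ~4 chars/token heuristic; replace with a real tokenizer in practice
    return max(1, len(text) // 4)

def prepare_context(raw_text):
    """Clean the input and report the approximate token saving before the paid call."""
    cleaned = clean_lines(raw_text)
    stats = {"before": approx_tokens(raw_text), "after": approx_tokens(cleaned)}
    return cleaned, stats
```

Only the cleaned text goes to the expensive model; the stats dictionary feeds the measurement step so every run records what the cleanup actually saved.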