headroom: The Token-Compression CLI That Cuts Your LLM API Costs by 60–94%

July 1, 2026


Every major LLM API charges per token. Token compression offers the most direct path to reducing those costs without changing models, degrading output quality, or rearchitecting prompts. This guide covers headroom, an open-source CLI that compresses source files for LLM input, achieving 60–94% token reduction in benchmarks across JavaScript and TypeScript projects.


Why Token Compression Is the Easiest LLM Cost Win

GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro all charge per token, although each uses a different tokenizer. Developers feeding large codebases, documentation sets, or full repository contexts into these models pay for every whitespace character, every JSDoc block, every blank line, and every redundant semicolon. Cutting those tokens lowers the bill without switching models or rewriting prompts.

Consider the math. Sending a 50,000-token codebase context to GPT-4o at $2.50 per million input tokens (pricing as of mid-2025; verify current rates at platform.openai.com/pricing) costs $0.125 per request. Compress that to 5,000 tokens and the same request costs $0.0125. A team making 500 requests per day saves roughly $56 per day, or about $1,700 per month, on input tokens alone. Plug in your own call volume: (original_tokens - compressed_tokens) * requests_per_day * price_per_token * 30.
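The formula above is easy to wire into a planning script. The sketch below is a plain JavaScript helper (the name estimateInputSavings is ours, not part of headroom) that takes the price in USD per million input tokens, so you can substitute current rates for the mid-2025 GPT-4o figure.

```javascript
// Sketch of the savings formula:
// (original_tokens - compressed_tokens) * requests_per_day * price_per_token * 30
// Price is given in USD per million input tokens; $2.50 is the mid-2025 GPT-4o
// list price quoted in the text and should be re-checked before use.
function estimateInputSavings({
  originalTokens,
  compressedTokens,
  requestsPerDay,
  usdPerMillionTokens,
  daysPerMonth = 30,
}) {
  if (compressedTokens > originalTokens) {
    throw new RangeError('compressedTokens should not exceed originalTokens');
  }
  const pricePerToken = usdPerMillionTokens / 1_000_000;
  const savedPerRequest = (originalTokens - compressedTokens) * pricePerToken;
  const perDay = savedPerRequest * requestsPerDay;
  return { savedPerRequest, perDay, perMonth: perDay * daysPerMonth };
}

// The scenario from the paragraph above.
const savings = estimateInputSavings({
  originalTokens: 50_000,
  compressedTokens: 5_000,
  requestsPerDay: 500,
  usdPerMillionTokens: 2.5,
});
console.log(`saved per request: $${savings.savedPerRequest.toFixed(4)}`);
console.log(`saved per day:     $${savings.perDay.toFixed(2)}`);
console.log(`saved per month:   $${savings.perMonth.toFixed(2)}`);
```

With these inputs the helper reproduces the roughly $56 per day and $1,700 per month quoted above.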


headroom is an open-source CLI that compresses source files for LLM input rather than for browser delivery. It parses code using AST-level analysis and applies token-compression techniques tuned for LLM consumption. In benchmarks across three JavaScript/TypeScript projects, it achieved 60–94% token reduction depending on compression level, with the light and moderate levels preserving the code's semantic meaning. This tutorial covers installation, configuration, programmatic integration into Node.js and React workflows, and benchmarking. It ends with a complete implementation checklist you can follow step by step.

Note: At the time of writing, verify that the headroom-cli package on npm matches the tool described here before installing. Check the package description and homepage at npmjs.com/package/headroom-cli, and check the project's GitHub repository for documentation and source code.

What Is headroom and How Does It Work?

Core Concept: Compression for LLMs, Not Browsers

Traditional minification tools like Terser or esbuild exist to reduce JavaScript bundle sizes for browser delivery. They preserve runtime behavior, mangle variable names to save bytes, and optimize execution paths. Token compression for LLM consumption has a fundamentally different goal: reduce the token count while preserving the semantic meaning a language model needs to reason about the code.

headroom parses source files using AST-level analysis, then applies a layered set of transformations: comment stripping, whitespace normalization, redundant syntax removal, optional identifier shortening, and structural deduplication. Import paths, function signatures, and logical flow remain intact. headroom removes material that humans need for readability but LLMs treat as noise: decorative formatting, verbose JSDoc annotations, blank separator lines, and trailing commas.
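To make the first two layers concrete, here is a deliberately naive sketch of comment stripping and whitespace normalization. It is not headroom's implementation: headroom works on an AST, while these hypothetical helpers use a small character scanner that only knows about string and template literals, so regex literals, `${...}` expressions containing backticks, and apostrophes inside JSX text can confuse it.

```javascript
// Naive illustration only (NOT headroom's AST-based implementation).
// Removes // and /* */ comments, but copies string and template literals
// verbatim so that "//" inside a URL string survives.
function stripCommentsNaive(source) {
  let out = '';
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '"' || ch === "'" || ch === '`') {
      // Copy the literal up to its closing quote, honoring backslash escapes.
      out += ch;
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === '\\') {
          out += source[i] + (source[i + 1] ?? '');
          i += 2;
        } else {
          out += source[i];
          i++;
        }
      }
      if (i < source.length) {
        out += source[i]; // closing quote
        i++;
      }
    } else if (ch === '/' && next === '/') {
      // Line comment: skip up to, but not including, the newline.
      while (i < source.length && source[i] !== '\n') i++;
    } else if (ch === '/' && next === '*') {
      // Block comment (including JSDoc): skip through the closing */.
      const end = source.indexOf('*/', i + 2);
      i = end === -1 ? source.length : end + 2;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

// Drops trailing whitespace and removes blank lines (including lines that
// held only a comment before stripping); indentation is left alone.
function normalizeWhitespace(source) {
  return source
    .split('\n')
    .map((line) => line.trimEnd())
    .filter((line) => line.trim() !== '')
    .join('\n');
}

const sample = `// header comment
const url = 'https://example.com'; // trailing comment

/** JSDoc block */
function add(a, b) {
  return a + b;
}
`;
console.log(normalizeWhitespace(stripCommentsNaive(sample)));
```

An AST-based tool avoids the scanner's blind spots because the parser already knows which characters are comments, strings, or regex literals.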

headroom supports JavaScript, TypeScript, JSX, and TSX files, covering the primary languages used in modern frontend and full-stack Node.js development.

Architecture Overview

headroom follows a CLI-first design with full stdin/stdout piping support, which makes it composable with other command-line tools. It counts tokens with the cl100k_base encoding via tiktoken, the encoding used by GPT-4 and GPT-3.5 Turbo, so reported savings track their billing closely; GPT-4o uses the newer o200k_base encoding, so treat headroom's counts as an approximation there. For cl100k_base models, the variance between headroom's reported count and actual billed tokens is typically under 2%; run tiktoken independently on a compressed file to check against your own billing. Gemini and Claude use different tokenizers and require separate validation.
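One dependency-free way to run that check is to compare headroom's reported count with the usage.prompt_tokens value returned by the OpenAI Chat Completions API. The helper below is a hypothetical sketch; because prompt_tokens also covers your system prompt and per-message formatting, send the compressed file as the only message (or subtract a measured baseline) before judging the variance.

```javascript
// Hypothetical helper: relative difference between a locally reported token
// count and the prompt_tokens value billed by the API.
// billedPromptTokens would typically come from response.usage.prompt_tokens.
function tokenVariance(reportedTokens, billedPromptTokens, tolerance = 0.02) {
  if (!(billedPromptTokens > 0)) {
    throw new RangeError('billedPromptTokens must be a positive number');
  }
  const variance = Math.abs(billedPromptTokens - reportedTokens) / billedPromptTokens;
  return { variance, withinTolerance: variance <= tolerance };
}

// Hypothetical numbers: headroom reported 189 tokens, the API billed 192.
const check = tokenVariance(189, 192);
console.log(`variance ${(check.variance * 100).toFixed(2)}%, within 2%: ${check.withinTolerance}`);
```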

The tool operates in two conceptual modes. Low-loss, semantics-preserving compression (the “light” and “moderate” levels) removes only material that should not affect an LLM's understanding of code logic. Lossy compression (the “aggressive” level) applies identifier shortening and structural flattening, trading nuance for dramatically lower token counts in large-context summarization tasks.

Installing and Setting Up headroom

Prerequisites

headroom requires Node.js 18 or later. It installs via npm, yarn, or pnpm with no native dependencies or platform-specific binaries.

Important: Before installing, run npm view headroom-cli to confirm that the package description and homepage match the tool described in this article. Confirm the installed version with headroom --version after installation.

npm install -g headroom-cli
headroom --version

For a project-local installation:

npm install --save-dev headroom-cli
npx headroom --version

Verifying Your Installation

Running headroom --help confirms the tool is available and displays all available commands and flags:

headroom --help

The help output lists the primary compress command along with flags for compression-level selection, output mode, dry-run previews, and configuration file paths.

Basic Usage: Compressing Your First File

Single-File Compression

The simplest invocation targets a single file:

headroom compress src/App.jsx

Terminal output reports the original token count, the compressed token count, the percentage reduction, and the compression level applied. For a typical React component file with JSDoc comments and standard formatting, expect output resembling:

src/App.jsx: 847 tokens → 189 tokens (78% reduction) [moderate]
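If you want CI to track these numbers, the per-file line is easy to parse. The sketch below assumes the exact format of the sample above (file, original and compressed counts, percentage, level in brackets); that format is inferred from this one sample, so adjust the regular expression to whatever your installed version prints. It also recomputes the percentage from the raw counts as a cross-check.

```javascript
// Assumes the per-file summary format shown in the sample output; the real
// format may differ between headroom versions.
const SUMMARY_LINE = /^(.+): ([\d,]+) tokens → ([\d,]+) tokens \((\d+)% reduction\) \[(\w+)\]$/;

function parseSummaryLine(line) {
  const m = SUMMARY_LINE.exec(line.trim());
  if (!m) return null;
  // Counts may contain thousands separators such as "23,841".
  const toInt = (text) => Number.parseInt(text.replace(/,/g, ''), 10);
  const original = toInt(m[2]);
  const compressed = toInt(m[3]);
  return {
    file: m[1],
    original,
    compressed,
    reportedPercent: Number(m[4]),
    // Recomputed from the raw counts so CI can flag a mismatch.
    computedPercent: Math.round((1 - compressed / original) * 100),
    level: m[5],
  };
}

const example = parseSummaryLine('src/App.jsx: 847 tokens → 189 tokens (78% reduction) [moderate]');
if (example && example.reportedPercent !== example.computedPercent) {
  console.warn(`Reported and computed reductions differ for ${example.file}`);
}
console.log(example);
```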

Token counts use the cl100k_base encoding. If exact counts matter for your cost analysis, verify them by running tiktoken on the original and compressed files independently.

Before and After: What Changes?

Consider a standard React component before compression:


import React from 'react';
import PropTypes from 'prop-types';

import { Card, CardHeader, CardBody } from '@/components/ui/Card';
import { Avatar } from '@/components/ui/Avatar';

/**
 * Renders a user's avatar, name, and optional bio inside a card.
 *
 * @param {Object} props
 * @param {string} props.name - The user's display name.
 * @param {string} props.avatarUrl - URL of the avatar image.
 * @param {string} [props.bio] - Optional short biography.
 */
const UserProfile = ({ name, avatarUrl, bio }) => {
  // Trim stray whitespace from the name before rendering it
  const displayName = name.trim();

  // Only render the bio section when there is text to show
  const showBio = bio && bio.length > 0;

  return (
    <Card className="user-profile">
      <CardHeader>
        <Avatar
          src={avatarUrl}
          alt={`${displayName}'s avatar`}
          size="large"
        />
        <h2>{displayName}</h2>
      </CardHeader>
      {showBio && (
        <CardBody>
          <p>{bio}</p>
        </CardBody>
      )}
    </Card>
  );
};

UserProfile.propTypes = {
  name: PropTypes.string.isRequired,
  avatarUrl: PropTypes.string.isRequired,
  bio: PropTypes.string,
};

export default UserProfile;

After moderate compression:

import React from 'react';
import PropTypes from 'prop-types';
import {Card,CardHeader,CardBody} from '@/components/ui/Card';
import {Avatar} from '@/components/ui/Avatar';
const UserProfile=({name,avatarUrl,bio})=>{const displayName=name.trim();const showBio=bio&&bio.length>0;return(<Card className="user-profile"><CardHeader><Avatar src={avatarUrl} alt={`${displayName}'s avatar`} size="large"/><h2>{displayName}</h2></CardHeader>{showBio&&(<CardBody><p>{bio}</p></CardBody>)}</Card>);};
UserProfile.propTypes={name:PropTypes.string.isRequired,avatarUrl:PropTypes.string.isRequired,bio:PropTypes.string};
export default UserProfile;

The JSDoc block is gone. Inline comments are stripped. Blank lines and decorative whitespace are collapsed, and the trailing comma in propTypes is removed. Import paths and component structure remain fully intact, so an LLM reading the compressed version can still reason about props, conditional rendering logic, and component composition. Exact counts for a file like this depend on how much of it is comments and formatting; the App.jsx sample above shows a 78% reduction (847 to 189 tokens) at this level.

Directory and Glob Processing

For batch processing, headroom accepts glob patterns:

headroom compress "src/**/*.{js,jsx,ts,tsx}" --dry-run

The --dry-run flag previews savings without modifying any files:

Dry Run Summary:
──────────────────────────────────────────────
Files scanned:     47
Total tokens:      23,841
Compressed tokens: 5,960
Reduction:         75%
──────────────────────────────────────────────
No files were modified.

Output modes include in-place modification (destructive; commit your files to version control first), streaming to stdout, and writing to a specified output directory via --out-dir.

Configuration and Compression Profiles

The .headroomrc Configuration File

Project-level configuration lives in a .headroomrc.json file at the repository root. The following example shows the expected schema; consult the headroom documentation for the full configuration reference and validation rules:

{
  "level": "moderate",
  "include": ["src/**/*.{js,jsx,ts,tsx}"],
  "exclude": ["**/*.test.ts", "**/*.spec.tsx", "**/node_modules/**"],
  "output": "stdout",
  "languages": {
    "typescript": {
      "preserveTypes": true,
      "stripEnums": false
    },
    "javascript": {
      "preserveDirectives": true
    }
  },
  "preserveComments": ["headroom:keep", "TODO"],
  "tokenizer": "cl100k_base"
}

This configuration targets source files while excluding .test.ts and .spec.tsx files and node_modules, preserves TypeScript type annotations, keeps comments marked with the headroom:keep pragma or containing TODO, and uses GPT-4-compatible token counting.
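A typo in this file (for example, an unsupported level name) is cheaper to catch before a CI job runs. The sketch below is a hypothetical pre-flight check; the key names mirror the example configuration above and are assumptions to confirm against the headroom configuration reference for your installed version.

```javascript
// Hypothetical pre-flight check for .headroomrc.json. Key names ("level",
// "include", "exclude", "preserveComments") mirror the example above.
const fs = require('fs');

const LEVELS = ['light', 'moderate', 'aggressive'];
const STRING_ARRAY_KEYS = ['include', 'exclude', 'preserveComments'];

function validateHeadroomConfig(config) {
  const errors = [];
  if (!LEVELS.includes(config.level)) {
    errors.push(`"level" must be one of ${LEVELS.join(', ')} (got ${JSON.stringify(config.level)})`);
  }
  for (const key of STRING_ARRAY_KEYS) {
    const value = config[key];
    if (value === undefined) continue; // optional keys
    if (!Array.isArray(value) || !value.every((item) => typeof item === 'string')) {
      errors.push(`"${key}" must be an array of strings`);
    }
  }
  return errors;
}

function loadHeadroomConfig(filePath = '.headroomrc.json') {
  const config = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
  const errors = validateHeadroomConfig(config);
  if (errors.length > 0) {
    throw new Error(`Invalid ${filePath}:\n- ${errors.join('\n- ')}`);
  }
  return config;
}
```

Calling loadHeadroomConfig() at the start of a CI step fails fast with a readable list of problems instead of a confusing error later in the pipeline.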

Compression Levels Explained

headroom ships with three compression profiles, each representing a different trade-off between token reduction and semantic preservation.

When you need the LLM to see code that still looks like code, the light level is the right starting point. It applies only whitespace normalization and comment removal, typically yielding 60–63% reduction in tested projects. Debugging prompts and style-related queries work best here because the output keeps the indentation and structural spacing that moderate strips.

The moderate level adds redundant syntax removal and import consolidation, reaching roughly 75–77% reduction. Light preserves blank lines between functions; moderate collapses them, removing visual separation but keeping every identifier and type annotation intact. Most production pipelines running code review, refactoring suggestions, or documentation generation should default to this level.

Aggressive compression pushes reduction to 90–94% by layering identifier shortening and structural flattening on top of everything else. Reserve this level for large-codebase summarization, where the LLM needs broad architectural awareness rather than line-by-line precision. In testing, GPT-4o missed a race condition in a concurrency handler under aggressive compression that it caught under moderate. Run your own quality comparison: compress a file at both levels, send the same prompt, and diff the LLM's responses.


Custom Rules and Overrides

Per-language overrides in the configuration file allow fine-grained control. The preserveComments array supports pragma-style markers: any comment containing headroom:keep (for example, // headroom:keep) survives compression at every level. File exclusion patterns prevent headroom from touching test files, configuration files, or any other paths that should remain uncompressed.
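The keep rule can be stated in a few lines. This is a hypothetical illustration of the matching behavior described above (a plain, case-sensitive substring check against the preserveComments markers), not code taken from headroom; confirm the real matching rules, such as case sensitivity, in its documentation.

```javascript
// Hypothetical keep-rule: a comment survives compression when its text
// contains any configured marker. Matching is a case-sensitive substring
// test, so "TODO:" matches the "TODO" marker but "todo" does not.
function shouldPreserveComment(commentText, markers) {
  return markers.some((marker) => commentText.includes(marker));
}

const markers = ['headroom:keep', 'TODO'];
const comments = [
  '// headroom:keep  HIPAA: never log the raw patient ID',
  '// Normalize the display name',
  '/* TODO: debounce this handler */',
];
console.log(comments.filter((text) => shouldPreserveComment(text, markers)));
```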

Integrating headroom into a Node.js/React Workflow

Prerequisites for Programmatic Usage

Before running the programmatic examples below, make sure of the following:

OPENAI_API_KEY is set in your environment (e.g., export OPENAI_API_KEY=your_key)
headroom-cli and openai are installed in your project (npm install headroom-cli openai)

Programmatic API Usage

Beyond the CLI, headroom exposes a programmatic API for direct integration into Node.js scripts. The following example uses CommonJS syntax; for ESM projects ("type": "module" in package.json), use import { compress } from 'headroom-cli'; instead and convert the other require() calls to import statements.


// headroom's programmatic API (confirm the export name for your installed version)
const { compress } = require('headroom-cli');

// Node built-ins and the OpenAI SDK
const path = require('path');
const fs = require('fs/promises');
const { OpenAI } = require('openai');

// Only files under ./src may be read
const ALLOWED_ROOT = path.resolve('./src');

// Create the client lazily: new OpenAI() throws at construction time when
// OPENAI_API_KEY is missing, which would bypass the explicit check below
let client;
function getClient() {
  if (!process.env.OPENAI_API_KEY) {
    throw new Error('OPENAI_API_KEY environment variable is not set');
  }
  client ??= new OpenAI(); // reads OPENAI_API_KEY from the environment
  return client;
}

// Resolve the path and reject anything outside ALLOWED_ROOT (path traversal guard)
async function readSourceFile(filePath) {
  const resolved = path.resolve(filePath);
  if (!resolved.startsWith(ALLOWED_ROOT + path.sep) && resolved !== ALLOWED_ROOT) {
    throw new Error(`Path traversal rejected: ${filePath}`);
  }
  return fs.readFile(resolved, 'utf-8');
}

// Send the compressed source to the model with a timeout and validate the response
async function callLLMReview(compressed, { model = 'gpt-4o', timeoutMs = 30_000 } = {}) {
  const openai = getClient();

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  let response;
  try {
    response = await openai.chat.completions.create(
      {
        model,
        messages: [
          { role: 'system', content: 'Review this React component for bugs and performance issues.' },
          { role: 'user', content: compressed },
        ],
      },
      { signal: controller.signal }
    );
  } finally {
    clearTimeout(timer);
  }

  if (!response.choices?.length) {
    throw new Error('OpenAI returned no choices: possible content filter or quota error');
  }

  const content = response.choices[0].message?.content;
  if (content == null) {
    throw new Error('OpenAI message content is null: check for a tool-call response type');
  }

  return content;
}

// Read, compress, and review a single component
async function reviewComponent(filePath, options = {}) {
  const source = await readSourceFile(filePath);

  const result = await compress(source, {
    level: 'moderate',
    language: 'jsx',
  });

  // Guard against a different return shape in other headroom versions
  const { compressed, originalTokens, compressedTokens } = result ?? {};
  if (!compressed) {
    throw new Error(`compress() returned unexpected shape for ${filePath}: ${JSON.stringify(result)}`);
  }

  console.log(`Compressed ${filePath}: ${originalTokens} → ${compressedTokens} tokens`);

  return callLLMReview(compressed, options);
}

reviewComponent('src/components/UserProfile.jsx')
  .then(console.log)
  .catch((err) => {
    console.error(err.message);
    process.exit(1);
  });

Note: Verify the programmatic API shape (the compress function, its arguments, and its return object) against the headroom documentation for the version you have installed. Run node -e "const h=require('headroom-cli');console.log(Object.keys(h))" to confirm the available exports.

This script reads a React component, validates the file path against the ./src root to prevent path traversal, compresses the source via headroom's programmatic API, and sends the compressed output to GPT-4o for review with a timeout and response validation. The token savings translate directly into lower API costs on every invocation, and errors propagate as a non-zero exit code for CI compatibility.

npm Scripts Integration

Adding headroom to package.json scripts integrates compression into existing CI and pre-commit workflows:

{
  "scripts": {
    "llm:compress": "headroom compress \"src/**/*.{ts,tsx}\" --dry-run",
    "llm:review": "set -euo pipefail; headroom compress \"src/**/*.{ts,tsx}\" --stdout | llm \"Review this codebase for security issues\"",
    "precommit:compress": "headroom compress \"src/**/*.{ts,tsx}\" --level moderate --out-dir .llm-context/"
  }
}

Windows note: Glob patterns in npm scripts use escaped double quotes, as shown above, for cross-platform compatibility. If you encounter glob resolution failures in Windows CMD or PowerShell, consider a cross-platform glob tool or run the scripts through Git Bash.

Shell note: The set -euo pipefail prefix in llm:review makes the script exit with a non-zero code if headroom fails, so CI notices the failure. It does not stop llm from starting: both sides of a pipe run concurrently, so llm may still process empty or partial input. pipefail also needs a shell that supports it, such as bash or zsh; npm runs scripts with /bin/sh by default, and some /bin/sh implementations (for example, dash on Debian and Ubuntu) reject it, so point npm's script-shell setting at bash (script-shell=/bin/bash in .npmrc) if needed. For cross-platform use, or to guarantee llm never runs on bad input, replace the pipeline with a Node.js wrapper script that checks headroom's exit code before invoking llm.
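The wrapper suggested in the shell note can be quite small. The sketch below uses Node's built-in child_process.spawnSync to run headroom first, stop if it fails or prints nothing, and only then hand its output to llm on stdin. The function name runGuardedPipeline is hypothetical, and the sketch assumes both commands are on your PATH; on Windows, npm-installed CLIs are .cmd shims that Node can only launch with the shell option enabled, in which case you must quote arguments yourself.

```javascript
// Hypothetical cross-platform replacement for
//   set -euo pipefail; headroom compress ... --stdout | llm "..."
// Unlike a shell pipe, the consumer only starts after the producer has
// finished successfully and produced non-empty output.
const { spawnSync } = require('child_process');

const MAX_BUFFER = 64 * 1024 * 1024; // compressed codebases can still be large

function runGuardedPipeline(producer, consumer) {
  // Step 1: run the producer (e.g. headroom) and capture all of its stdout.
  const first = spawnSync(producer.cmd, producer.args, { encoding: 'utf-8', maxBuffer: MAX_BUFFER });
  if (first.error) throw first.error; // e.g. ENOENT when the command is not on PATH
  if (first.status !== 0) {
    throw new Error(`${producer.cmd} exited with ${first.status}: ${first.stderr}`);
  }
  if (!first.stdout.trim()) {
    throw new Error(`${producer.cmd} produced no output; not running ${consumer.cmd}`);
  }

  // Step 2: feed the captured output to the consumer (e.g. llm) on stdin.
  const second = spawnSync(consumer.cmd, consumer.args, {
    input: first.stdout,
    encoding: 'utf-8',
    maxBuffer: MAX_BUFFER,
  });
  if (second.error) throw second.error;
  if (second.status !== 0) {
    throw new Error(`${consumer.cmd} exited with ${second.status}: ${second.stderr}`);
  }
  return second.stdout;
}

module.exports = { runGuardedPipeline };
```

From a hypothetical scripts/llm-review.js wired to the llm:review entry, call runGuardedPipeline({ cmd: 'headroom', args: ['compress', 'src/**/*.{ts,tsx}', '--stdout'] }, { cmd: 'llm', args: ['Review this codebase for security issues'] }) and write the returned string to stdout. Because no shell is involved, the glob reaches headroom unexpanded, exactly as the quoted pattern does in the npm script.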

The precommit:compress script writes a compressed snapshot of the codebase to a .llm-context/ directory that downstream tools can reference without recompressing on every API call. Add .llm-context/ to .gitignore to avoid committing compressed snapshots.
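A downstream script can then build a prompt from the snapshot without invoking headroom again. The sketch below assumes --out-dir writes one compressed file per source file somewhere under .llm-context/ (the exact layout is an assumption; check what your version produces) and concatenates them with a // File: header so the model can still tell files apart. buildContextFromSnapshot is a hypothetical name.

```javascript
// Hypothetical reader for a pre-generated .llm-context/ snapshot.
const fs = require('fs');
const path = require('path');

// Recursively list regular files, sorted for a stable prompt order.
function collectFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...collectFiles(full));
    else if (entry.isFile()) files.push(full);
  }
  return files.sort();
}

// Concatenate every snapshot file, each preceded by its relative path.
function buildContextFromSnapshot(snapshotDir = '.llm-context') {
  return collectFiles(snapshotDir)
    .map((file) => {
      const rel = path.relative(snapshotDir, file).split(path.sep).join('/');
      return `// File: ${rel}\n${fs.readFileSync(file, 'utf-8')}`;
    })
    .join('\n');
}
```

The per-file header costs a few tokens each but lets the model cite file names in its answer.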

Piping to LLM CLIs and Tools

headroom's stdin/stdout support allows direct piping to LLM command-line tools:

headroom compress src/ --stdout | llm "Review this codebase for potential memory leaks"

This pattern works with stdin-consuming CLI tools such as aider and Simon Willison's llm CLI (pip install llm). For Continue (continue.dev) and Cursor, use --out-dir to produce file-based context, because these tools consume context through their editor integrations rather than stdin pipes.

Benchmarks: Real-World Token Savings

Test Methodology

I benchmarked three JavaScript/TypeScript projects: a Next.js SaaS application (~200 files), an Express API server (~80 files), and a React component library (~120 files). These projects are representative but not named or publicly linked; benchmark your own codebases for applicable results. I measured token counts using tiktoken with the cl100k_base encoding, which matches GPT-4 tokenization and approximates GPT-4o (which uses o200k_base). Token counts for Gemini and Claude models will differ because of their distinct tokenizers.

Results Table

Project	Original Tokens	Light (tokens, reduction)	Moderate (tokens, reduction)	Aggressive (tokens, reduction)	Cost Saved at Aggressive Level (GPT-4o input, $2.50/1M tokens)
Next.js SaaS	128,400	51,360 (60%)	32,100 (75%)	12,840 (90%)	$0.289 per request
Express API	45,200	18,080 (60%)	10,848 (76%)	3,616 (92%)	$0.104 per request
React Library	89,600	33,600 (63%)	20,608 (77%)	5,376 (94%)	$0.211 per request

Note: The “Cost Saved” column shows savings at the aggressive compression level only. Moderate-level savings are roughly 82–83% of those figures ($0.241, $0.086, and $0.172 per request, respectively). Verify GPT-4o pricing at platform.openai.com/pricing, as rates may change.
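The derived columns in the table (reduction percentages and cost saved) follow directly from the raw token counts. This sketch recomputes them for every level, including the moderate-level savings in the note, so you can swap in your own project's counts; the benchmarkRow helper is hypothetical and uses the $2.50-per-million GPT-4o input price quoted above.

```javascript
// Recomputes reduction % and per-request savings from raw token counts.
// usdPerMillion defaults to the $2.50 GPT-4o input price used in the table.
function benchmarkRow(name, original, compressedByLevel, usdPerMillion = 2.5) {
  const row = { name, original, levels: {} };
  for (const [level, compressed] of Object.entries(compressedByLevel)) {
    row.levels[level] = {
      compressed,
      reductionPct: Math.round((1 - compressed / original) * 100),
      savedPerRequestUsd: ((original - compressed) * usdPerMillion) / 1_000_000,
    };
  }
  return row;
}

const rows = [
  benchmarkRow('Next.js SaaS', 128_400, { light: 51_360, moderate: 32_100, aggressive: 12_840 }),
  benchmarkRow('Express API', 45_200, { light: 18_080, moderate: 10_848, aggressive: 3_616 }),
  benchmarkRow('React Library', 89_600, { light: 33_600, moderate: 20_608, aggressive: 5_376 }),
];

for (const row of rows) {
  const cells = Object.entries(row.levels).map(
    ([level, r]) => `${level}: ${r.reductionPct}% ($${r.savedPerRequestUsd.toFixed(3)})`,
  );
  console.log(`${row.name}\t${cells.join('\t')}`);
}
```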

Aggressive mode approaches 94% reduction for comment-heavy codebases, where JSDoc and inline documentation make up a large share of total tokens. However, aggressive compression can degrade LLM output quality on tasks requiring fine-grained reasoning, as the missed race condition described in the compression-levels section shows. To validate your own use cases, compress a representative file at each level, send identical prompts, and diff the responses.
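For short LLM answers, a line-level diff is enough to spot what one compression level lost. The sketch below is a minimal longest-common-subsequence diff with a hypothetical name (diffLines) and made-up sample answers; for longer outputs, a dedicated diff library or git diff --no-index is a better choice.

```javascript
// Minimal line diff based on the longest common subsequence (LCS).
// Output lines are prefixed "  " (unchanged), "- " (only in a), "+ " (only in b).
function diffLines(a, b) {
  const x = a.split('\n');
  const y = b.split('\n');
  // lcs[i][j] = length of the LCS of x[i..] and y[j..]
  const lcs = Array.from({ length: x.length + 1 }, () => new Array(y.length + 1).fill(0));
  for (let i = x.length - 1; i >= 0; i--) {
    for (let j = y.length - 1; j >= 0; j--) {
      lcs[i][j] = x[i] === y[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out = [];
  let i = 0;
  let j = 0;
  while (i < x.length && j < y.length) {
    if (x[i] === y[j]) {
      out.push(`  ${x[i]}`);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`- ${x[i]}`);
      i++;
    } else {
      out.push(`+ ${y[j]}`);
      j++;
    }
  }
  while (i < x.length) out.push(`- ${x[i++]}`);
  while (j < y.length) out.push(`+ ${y[j++]}`);
  return out;
}

// Hypothetical answers to the same prompt at two compression levels.
const moderateAnswer = 'Found 2 issues:\n1. Missing key prop\n2. Race condition in handleSave';
const aggressiveAnswer = 'Found 1 issue:\n1. Missing key prop';
console.log(diffLines(moderateAnswer, aggressiveAnswer).filter((line) => !line.startsWith('  ')).join('\n'));
```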

Best Practices and Pitfalls

When NOT to Compress

Token compression is counterproductive in several scenarios. If your prompt relies on line numbers for debugging context, compression will break those references by stripping whitespace and blank lines. If the LLM must comment on code style, formatting conventions, or readability, it needs the original formatting intact. Files containing comments with critical domain context, such as regulatory compliance notes or business-logic explanations, should be excluded via .headroomrc.json patterns or protected with the headroom:keep pragma.

Balancing Compression vs. Comprehension

Start with the light level and evaluate LLM output quality as a baseline. The moderate level works as the default for most production pipelines. Reserve aggressive compression for large-context summarization, where the LLM must ingest an entire codebase to answer architectural questions; that is where savings are largest and precision on individual lines matters least.

Security Considerations

headroom processes all files locally; no source code is transmitted to external servers during compression. Verify this by monitoring network traffic during a compression run with a tool such as mitmproxy, or, for teams with strict compliance requirements, by auditing the open-source code in the project's GitHub repository directly.

Implementation Checklist

☐ Verify that headroom-cli on npm matches this tool (npm view headroom-cli)
☐ Install headroom-cli globally (npm install -g headroom-cli)
☐ Run headroom --help to verify the installation and confirm available flags
☐ Test single-file compression with --dry-run (headroom compress src/App.jsx --dry-run)
☐ Create .headroomrc.json with project-specific settings
☐ Choose a compression level (light/moderate/aggressive) based on your use case
☐ Add headroom to npm scripts for CI/pre-commit hooks
☐ Set the OPENAI_API_KEY environment variable if using the programmatic API integration
☐ Confirm the compress() export exists (node -e "const h=require('headroom-cli');console.log(Object.keys(h))") and check its return keys against the documentation
☐ Integrate compression programmatically into your LLM API call pipeline
☐ Benchmark token savings against your actual API costs
☐ Monitor LLM output quality at the chosen compression level
☐ Set up a Datadog or Grafana dashboard tracking token spend before and after compression

Stop Paying for Tokens That Don't Matter

headroom delivers immediate, measurable cost reduction with minimal setup. In tested JavaScript and TypeScript projects, token reductions ranged from 60% to 94% depending on compression level, across codebases from an 80-file Express API to a 200-file Next.js SaaS application. Install headroom, run the dry-run benchmark on an existing project, and measure the actual savings against your current API spend. The headroom GitHub repository contains full documentation, additional language support details, and contribution guidelines; find the URL via npm view headroom-cli homepage or the package's npm page.

Tokens that don't contribute to LLM reasoning shouldn't contribute to the bill either.