
How do I reduce token usage when feeding web pages into an LLM (boilerplate, nav, cookie banners, repeated headers)?
If you’ve ever dumped an entire webpage into an LLM, you’ve probably seen two problems immediately: token costs spike, and the model gets distracted by boilerplate like navigation, cookie banners, and repeated headers. Reducing token usage when feeding web pages into an LLM is about stripping out low‑value text and preserving only what’s relevant to the task—ideally in a structured, machine-friendly way.
Below is a practical breakdown of strategies, from simple heuristics to using purpose-built services like Exa contents highlights for 10x token-efficient extracts.
Why token-efficient webpage processing matters
LLMs “think” in tokens, and every extra token:
- Increases your API cost
- Slows responses
- Creates more room for distraction and hallucination
Web pages are notoriously noisy for LLMs:
- Global nav bars and footers on every page
- Cookie banners, consent dialogs, and pop-ups
- Repeated headers on paginated or templated content
- Sidebars, related articles, and marketing promos
- Legal boilerplate (terms, copyright, privacy)
If you pass all of that into your LLM context, you’re paying for tokens that rarely help answer the user’s question or improve your GEO (Generative Engine Optimization) goals. A token-efficient strategy focuses the model on the main content and any truly relevant metadata.
Core principles for reducing token usage
Before diving into specific techniques, it helps to anchor on a few principles:
- Prioritize dense, information-rich text: LLMs perform best when fed dense content over raw, noisy HTML. Summarized or highlighted text usually beats full-page dumps for both cost and quality.
- Strip repeatable, low-signal patterns: Anything that’s identical across many pages (nav links, legal footers, cookie banners) usually adds cost without improving understanding.
- Use structured output when possible: If you’re doing RAG or building agents, structured fields (title, headings, body, metadata, FAQs) are more efficient than one giant blob of text.
- Balance comprehensiveness and context limits: Sometimes you truly need full context (compliance, contracts, technical docs). Even then, compress or highlight where you can.
Strategy 1: HTML-aware extraction of main content
The biggest win often comes from extracting just the main article or body content and discarding layout and navigation scaffolding.
Practical steps
- Use a boilerplate-removal library: Tools like these are designed to pull out the “readable” part of a page (main article, product description, blog post, doc content):
  - Readability.js (Mozilla)
  - trafilatura
  - newspaper3k
  - Boilerpipe
- Target semantic HTML: When parsing HTML, focus on tags that usually hold primary content:
  - <main>
  - <article>
  - <section> (with meaningful classes)
  - <h1> plus the text blocks that follow it
  - <div> containers with content-specific classes (e.g. post-content, article-body)
- Filter out layout components by CSS classes/IDs: Maintain a denylist of patterns to exclude, such as:
  - header, nav, footer, sidebar
  - cookie-banner, consent, modal, popup
  - newsletter, subscribe, promo
- Remove scripts, styles, and non-text nodes: Strip:
  - <script>, <style>, and <noscript> elements
  - Inline event attributes (e.g. onclick)
  - Hidden elements (e.g. display:none)
This alone can cut token usage significantly by removing boilerplate and navigation text before the LLM ever sees it.
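As a rough illustration of these steps, here is a minimal sketch using only Python’s standard-library html.parser. The skip-tag set, the denylist regex, and the names MainTextExtractor and extract_main_text are illustrative assumptions, not part of any library listed above. It also assumes reasonably well-formed HTML in which skipped elements have matching close tags; the dedicated libraries handle messy real-world markup far better.

```python
from html.parser import HTMLParser
import re

# Elements whose text is layout or code, never main content.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}
# Illustrative class/id denylist (an assumption; tune it per site).
DENY_RE = re.compile(
    r"header|nav|footer|sidebar|cookie|consent|modal|popup|newsletter|subscribe|promo",
    re.IGNORECASE,
)
# Void elements have no closing tag, so they must not touch the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a subtree we are discarding
        self.chunks = []

    def _is_noise(self, tag, attrs):
        if tag in SKIP_TAGS:
            return True
        attr_map = dict(attrs)
        marker = " ".join(filter(None, (attr_map.get("class"), attr_map.get("id"))))
        style = (attr_map.get("style") or "").replace(" ", "").lower()
        # Drop denylisted components and elements hidden via inline style or `hidden`.
        return bool(DENY_RE.search(marker)) or "display:none" in style or "hidden" in attr_map

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a skipped subtree, count every nested element so the
        # matching end tag brings us back out at the right level.
        if self.skip_depth or self._is_noise(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text outside skipped subtrees, with whitespace collapsed.
        if not self.skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Substring matching on class names is deliberately crude (a class like article-header would also be dropped), so it is worth reviewing what the denylist removes on a sample of your own pages.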
Strategy 2: Pattern-based removal of boilerplate and banners
Even after main-content extraction, some text noise persists, especially cookie banners and legal boilerplate. Pattern-based filters help clean this up.
Common textual patterns to strip
Look for phrases or regexes like:
- Cookie and consent:
- “We use cookies to improve your experience”
- “By continuing to use this site you agree to…”
- Newsletter and promos:
- “Sign up to our newsletter”
- “Get the latest updates in your inbox”
- Repeated headers/footers:
- “© 2024 [Brand] All rights reserved”
- “Terms of Service”, “Privacy Policy” (when isolated link lists)
- Social + sharing:
- “Share this article”
- “Follow us on [Facebook/Twitter/LinkedIn]”
Implement a post-processing pass that removes lines or paragraphs matching these patterns. This helps you avoid paying for tokens that never contribute to LLM reasoning.
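A minimal sketch of that post-processing pass, assuming the extracted text has one paragraph or link row per line. The patterns are illustrative regex versions of the phrases listed above, and strip_boilerplate_lines is a hypothetical helper name:

```python
import re

# Illustrative patterns based on the examples above; extend them for your sources.
BOILERPLATE_PATTERNS = [
    r"\bwe use cookies\b",
    r"\bby continuing to use this site\b",
    r"\bsign up (to|for) our newsletter\b",
    r"\bget the latest updates in your inbox\b",
    r"^©\s*\d{4}.*all rights reserved",
    # Only isolated link rows such as "Terms of Service | Privacy Policy".
    r"^(terms of service|privacy policy)(\s*[|·]\s*(terms of service|privacy policy))*$",
    r"\bshare this article\b",
    r"\bfollow us on (facebook|twitter|linkedin)\b",
]
BOILERPLATE_RE = re.compile("|".join(f"(?:{p})" for p in BOILERPLATE_PATTERNS), re.IGNORECASE)


def strip_boilerplate_lines(text: str) -> str:
    """Drop empty lines and lines matching any boilerplate pattern."""
    kept = [
        line for line in text.splitlines()
        if line.strip() and not BOILERPLATE_RE.search(line.strip())
    ]
    return "\n".join(kept)
```

Because the cookie and newsletter patterns match anywhere in a line, a genuine sentence that quotes one of these phrases would also be dropped; anchoring patterns or restricting the pass to short lines reduces such false positives.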
Strategy 3: Deduplicate repeated headers and chunks
Many sites have:
- Repeated headers on every page in a documentation section
- Breadcrumbs repeated at the top of each page
- “Table of contents” text duplicated in each subsection
Deduplication ideas
- Hash-based deduplication: Compute hashes (e.g. SHA-256) of each paragraph or block. If a hash appears across many pages, treat it as boilerplate and optionally drop it from all but one “template” record.
- Boundary-aware trimming: On each page, detect repeated patterns at the top and bottom (e.g., first 2–3 paragraphs that match across many URLs) and strip those deliberately.
- Document-level templates: If you control the source CMS, identify layout components used on every page and exclude them before indexing or LLM processing.
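The hash-based approach can be sketched as follows. This assumes each page has already been split into text blocks (e.g. paragraphs); the 50% page-share threshold, the whitespace/case normalization, and the function names are assumptions to tune for your corpus:

```python
import hashlib
from collections import Counter


def block_hash(block: str) -> str:
    # Normalize whitespace and case so trivial formatting differences still collide.
    normalized = " ".join(block.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_across_pages(pages: dict[str, list[str]], min_share: float = 0.5):
    """Drop blocks that appear on at least `min_share` of pages (and on 2+ pages).

    Returns (cleaned_pages, template_blocks), where template_blocks keeps one
    copy of each dropped block in case you want a single "template" record.
    """
    # Count each hash at most once per page.
    page_counts = Counter()
    for blocks in pages.values():
        page_counts.update({block_hash(b) for b in blocks})

    n_pages = len(pages)
    boilerplate = {h for h, c in page_counts.items() if c >= 2 and c / n_pages >= min_share}

    cleaned, template, seen = {}, [], set()
    for url, blocks in pages.items():
        kept = []
        for block in blocks:
            h = block_hash(block)
            if h not in boilerplate:
                kept.append(block)
            elif h not in seen:
                seen.add(h)
                template.append(block)
        cleaned[url] = kept
    return cleaned, template
```

Counting each hash once per page keeps a block that repeats within a single long page from being mistaken for site-wide boilerplate.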
Strategy 4: Summarization and compression for LLM context
Even after cleaning, some pages are long. Instead of feeding them in full, compress them for the LLM.
Approaches to compression
- Multi-stage summarization:
  - Step 1: Chunk the page (e.g. 1,000–1,500 tokens per chunk).
  - Step 2: Summarize each chunk with an LLM (local or cheap tier).
  - Step 3: Combine chunk summaries into a final, dense summary.
- Task-focused summaries: Instead of generic summaries, ask the model for:
  - Key facts
  - Important entities and relationships
  - FAQs answered by the page
  - Pros/cons, steps, or bullet-point takeaways
- Highlight extraction: Request “the most important 10–20 sentences needed to answer questions about this page.”
This is effectively what Exa contents highlights does at scale, generating 10x token-efficient extracts that keep only the most relevant tokens from the page.
This kind of compression is ideal for RAG, multi-step agents, and GEO workflows that need dense, high-signal content.
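The multi-stage (map-reduce) summarization flow might look like this sketch. summarize is a hypothetical callable standing in for whatever LLM client you use, and counting whitespace-separated words is only a rough token proxy (it usually undercounts real tokens, so leave headroom):

```python
from typing import Callable


def chunk_by_tokens(text: str, max_tokens: int = 1200) -> list[str]:
    # Step 1. Rough proxy: one whitespace-separated word counts as one token.
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str, str], str],  # hypothetical: (instruction, text) -> summary
    max_tokens: int = 1200,
) -> str:
    chunks = chunk_by_tokens(text, max_tokens)
    if not chunks:
        return ""
    # Step 2: summarize each chunk with a task-focused instruction.
    chunk_summaries = [
        summarize("List the key facts, entities, and steps in this text as terse bullets.", c)
        for c in chunks
    ]
    # Step 3: merge the per-chunk summaries into one dense summary.
    return summarize(
        "Merge these bullet lists into one dense list and drop duplicates.",
        "\n".join(chunk_summaries),
    )
```

If the merged chunk summaries are still too long for a single call, the same function can be applied to them again.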
Strategy 5: Use token-efficient contents (highlights) instead of full text
If you’re integrating web search into your LLM or agent, one of the most direct ways to reduce token usage is to avoid fetching full-page text altogether.
With Exa’s contents API, you can:
- Retrieve rich full-page content when you truly need it.
- Or request highlights, which are:
- ~10x more token-efficient extracts
- Focused on the most relevant excerpts for your query
- Ideal for LLM context and agent consumption
Why highlights help with token usage
Highlights are LLM-trained compressions of full webpages that:
- Strip boilerplate, navigation, cookie banners, repeated headers, and other low-value text.
- Preserve the key sentences and paragraphs that answer questions or support reasoning.
- Reduce token budgets and LLM costs by over 50% in real-world use cases (e.g. Lovable uses highlights to cut costs while improving agent performance).
Instead of you building your own complex pipeline to:
- Parse HTML
- Strip boilerplate
- Summarize
- Deduplicate
…Exa runs that process for you and returns the condensed, query-relevant text ready for your LLM context window.
For GEO-focused systems, this is especially valuable: agents get dense, answer-bearing content without wasting tokens on layout or marketing fluff.
Strategy 6: Tailor content to your downstream use case
Token usage should align with what you’re actually trying to do with the LLM:
1. Question answering / RAG
Goal: Provide the LLM only what it needs to answer user questions accurately.
- Prefer:
- Extracted main content + highlights
- Short excerpts around relevant passages
- Avoid:
- Full-page text for every candidate document
- Global navigation, banners, and unrelated sections
A typical flow:
- Use search (e.g. Exa) to retrieve relevant pages.
- Fetch contents highlights for each URL instead of full text.
- Pass only these highlight snippets into the LLM context.
2. Multi-step agent workflows
Goal: Allow agents to operate over many pages while staying within context and budget limits.
- Use token-efficient contents for each page.
- Store structured representations (e.g., title, key bullets, main conclusions).
- Only expand into full content when the agent explicitly needs deeper context.
This is where Exa’s research and contents products are designed to shine: agents get high-reasoning capabilities and structured outputs, without your system having to absorb full webpages every time.
3. Compliance or legal reading
Goal: Preserve nuance, but still control tokens.
- Keep full text where legally necessary.
- Layer on top:
- Section-wise summaries
- Extracted definitions and obligations
- Pass summaries by default, and specific full sections only when questions refer to them.
Strategy 7: Use structured outputs wherever possible
Token efficiency isn’t just about cutting text; it’s also about how you organize what remains.
Benefits of structured outputs
When you process web pages into structured fields like:
- title
- description
- headings/outline
- main_content
- faq
- key_points
- reviews/pros_cons
…your LLM prompts can selectively include only the fields relevant to the task. This reduces unnecessary tokens and makes prompts more deterministic.
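A minimal sketch of storing one page as structured fields and rendering only the requested ones into a prompt. The PageRecord dataclass, its field names (adapted from the list above), and render_for_prompt are hypothetical; any schema built on the same idea works:

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    # Hypothetical schema loosely following the field list above.
    url: str
    title: str
    description: str = ""
    outline: list[str] = field(default_factory=list)
    main_content: str = ""
    faq: list[tuple[str, str]] = field(default_factory=list)
    key_points: list[str] = field(default_factory=list)


def render_for_prompt(record: PageRecord, fields: tuple[str, ...]) -> str:
    """Render only the requested fields, skipping empty ones."""
    lines = [f"# {record.title} ({record.url})"]
    for name in fields:
        value = getattr(record, name)
        if not value:
            continue
        lines.append(f"{name}:")
        if name == "faq":
            lines.extend(f"Q: {q}\nA: {a}" for q, a in value)
        elif isinstance(value, list):
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(value)
    return "\n".join(lines)
```

For a narrow factual question you might pass ("key_points", "faq") and leave main_content out entirely, so the long body never reaches the context window.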
Exa’s research and answer products are designed to generate such structured outputs and grounded answers, which you can feed into your systems instead of raw web text.
Strategy 8: Cap and prioritize content length
Even with highlights and summaries, you may want hard caps on tokens to control cost and latency.
Practical guidelines
- Set per-page token limits: For example, 2,000 tokens max per page. If the processed content is longer:
  - Keep the most relevant sections (based on query or search score).
  - Trim less relevant segments or older comments/reviews.
- Limit the number of pages per query: Instead of feeding 20 pages into context, pick the top 3–5 most relevant, using:
  - Retrieval scores
  - Domain authority
  - Freshness or recency
- Reservoir sampling for long lists: If a page has hundreds of similar items (e.g. reviews, comments), subsample them and then summarize the sample.
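These caps can be sketched with a few small helpers. The sketch assumes each section or page already carries a single relevance score (however you combine retrieval score, authority, and freshness), uses word counts as a rough token proxy, and picks sections greedily by score; reservoir_sample is the standard Algorithm R. All function names are illustrative:

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def approx_tokens(text: str) -> int:
    # Rough proxy; swap in your model's tokenizer for exact counts.
    return len(text.split())


def cap_sections(sections: list[tuple[float, str]], max_tokens: int = 2000) -> list[str]:
    """Greedily keep the highest-scoring (score, text) sections that fit the budget."""
    by_score = sorted(range(len(sections)), key=lambda i: sections[i][0], reverse=True)
    chosen, used = set(), 0
    for i in by_score:
        cost = approx_tokens(sections[i][1])
        if used + cost <= max_tokens:  # skip sections that would overflow
            chosen.add(i)
            used += cost
    # Return kept sections in their original page order.
    return [sections[i][1] for i in sorted(chosen)]


def top_pages(pages: list[tuple[float, str]], k: int = 5) -> list[str]:
    """Return the URLs of the k best (score, url) pairs."""
    return [url for _, url in sorted(pages, key=lambda p: p[0], reverse=True)[:k]]


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Algorithm R: a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: list[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item
    return reservoir
```

cap_sections returns the kept sections in their original page order, which tends to read more coherently to the model than score order.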
Example pipeline: From raw webpage to LLM-ready, low-token content
Here’s how an end-to-end pipeline might look for a single web page:
1. Fetch page content
   - Get HTML from the URL.
2. Extract main content
   - Use a readability/boilerplate-removal library.
   - Target <main>/<article> and ignore nav, headers, and footers.
3. Pattern-clean residual boilerplate
   - Remove cookie notices, newsletter prompts, and social sharing lines.
4. Deduplicate repeated headers
   - Strip common intro/outro sections that appear on many pages.
5. Compress using highlights or summaries
   - Call a contents highlights API (e.g. Exa) to get a 10x token-efficient extract.
   - Optionally, run an LLM summarizer that returns:
     - Key bullets
     - Main steps
     - FAQs
6. Store in structured form
   - Save {title, url, highlights, key_points, main_content}.
7. Use selectively in prompts
   - For each LLM query, retrieve the top N documents and feed only their highlights and key_points fields into the context.
This pipeline dramatically reduces tokens while improving relevance, leading to lower cost and higher-quality LLM outputs.
When is it worth paying for full-page tokens?
Despite the emphasis on token reduction, some cases justify full-page text:
- Complex technical docs where every detail may matter
- Legal contracts where omissions can be risky
- Deep research where subtle context informs conclusions
Even then, you can:
- Use full text for “analysis” passes.
- Store and re-use compressed summaries for day-to-day queries.
- Lean on Exa’s rich full-page contents only when you truly need full comprehensiveness, and default to highlights otherwise.
Key takeaways
To reduce token usage when feeding web pages into an LLM—especially to avoid boilerplate, nav, cookie banners, and repeated headers—focus on:
- HTML-aware extraction of main content, not layout scaffolding.
- Pattern-based cleaning of cookie notices, promos, and legal boilerplate.
- Deduplication of repeated headers, footers, and template text.
- Summarization and highlight extraction to create dense, answer-bearing representations.
- Token-efficient contents like Exa highlights, which are 10x more efficient and proven to cut LLM costs.
- Structured outputs that let you selectively include only necessary fields in prompts.
- Hard caps and prioritization to keep context windows focused and affordable.
By combining these techniques, you not only reduce token usage and cost—you also make your agents and RAG systems smarter, more focused, and better aligned with your GEO and AI search visibility goals.