sonbahis girişsonbahissonbahis güncelgameofbetvdcasinomatbetgrandpashabetgrandpashabetエクスネスMeritbetmeritbet girişMeritbetVaycasinoBetasusBetkolikMeritbetmeritbetMeritbet girişMeritbetgiftcardmall/mygiftfradteosbetteosbet girişholiganbetholiganbet girişimajbetimajbet girişjasminbetjasminbet girişlimanbetlimanbet girişinterbahisinterbahis girişkingroyalkingroyal girişteosbetteosbet girişholiganbetholiganbet girişimajbetimajbet girişjasminbetjasminbet girişlimanbetlimanbet girişinterbahisinterbahis girişkingroyalkingroyal girişteosbetteosbet girişholiganbetholiganbet girişimajbetimajbet girişjasminbetjasminbet girişlimanbetlimanbet girişinterbahisinterbahis girişkingroyalkingroyal girişbahis siteleribahis siteleri girişcasino sitelericasino siteleri girişholiganbetholiganbet girişbetciobetcio girişimajbetimajbet girişinterbahisinterbahis girişbahiscasinobahiscasino girişbahis siteleribahis sitelericasino sitelericasino siteleri girişbetciobetcio girişholiganbetholiganbet girişimajbetimajbet girişinterbahisinterbahis girişbahiscasinobahiscasino girişbahis siteleribahis siteleri girişcasino sitelericasino siteleri girişalobetalobet girişbetasus girişbetasusenbetenbet girişbetplaybetplay girişorisbetorisbetceltabetceltabet girişgalabetgalabetqueenbetqueenbet girişpumabetpumabet girişpolobetpolobet girişbetpuanbetpuan girişbetpuanbetpuan girişbetpuanbetpuan girişbetpuanbetpuanalobetbetasusenbetbetplaygalabetalobetalobet girişbahiscasinobahiscasino girişteosbetteosbet girişromabetromabet girişkulisbetkulisbet giriştambettambet girişvipslotvipslot girişbetzulabetzula girişenjoybetenjoybet girişalobetalobet girişbetasusbetasus girişenbetenbet girişbetplaybetplay girişorisbetorisbet girişceltabetceltabet girişgalabetgalabet girişqueenbet girişqueenbetpumabetpumabet girişpolobetpolobet girişalobetalobet girişbetasusbetasus girişenbetenbet girişbetplaybetplay girişorisbetorisbet girişceltabetceltabet girişgalabetgalabet girişqueenbetqueenbet girişpumabetpumabet girişpolobetpolobet girişalobetalobet girişbetasusbetasus girişsonbahissonbahis girişromabetromabet girişroyalbetroyalbet girişceltabetceltabet girişeditörbeteditörbet girişqueenbet girişqueenbetbetzulabetzula girişteosbetteosbet girişsweet bonanzasweet bonanza oyunu oynasweet bonanzasweet bonanza oyunu oynasweet bonanza oynasweet bonanza oynasweet bonanzasweet bonanzasweet bonanzasweet bonanza oynasweet bonanzasweet bonanza oynasweet bonanzasweet bonanza oynasweet bonanzasweet bonanza oynaultrabeteditörbetenjoybetromabetteosbettambetroyalbetsonbahisvipslotmedusabahisromabetromabet girişalobetalobet girişteosbetteosbet girişbetasusbetasus girişsonbahis girişsonbahisroyalbetroyalbet girişceltabetceltabet girişeditörbeteditörbet girişqueenbetqueenbet girişbetzulabetzula girişromabetromabet girişromabetromabet girişroketbetroketbet girişroketbetroketbet girişbetnanobetnano girişbetnanobetnano girişpashagamingpashagaming girişpashagamingpashagaming girişgrandbettinggrandbetting girişgrandbettinggrandbetting girişbetlikebetlike girişbetlikebetlike girişbetciobetcio girişbetciobetcio girişbarbibetbarbibet girişbarbibetbarbibet girişdeneme bonusu veren sitelerdeneme bonusu veren sitelerdeneme bonusu veren sitelerdeneme bonusu veren sitelerdeneme bonusu veren sitelerdeneme bonusu veren sitelerpumabetpumabet girişpumabetpumabet girişcasibomcasibomcasibomcasibomcasibomcasibomjojobetjojobetjojobetjojobetjojobetjojobetcasibom girişcasibom girişcasibom girişcasibom girişjojobet girişjojobet girişjojobet girişjojobet girişkalebetkalebetbetnisbetnisbetkolikbetkolikjokerbetjokerbethiltonbethiltonbetkulisbetkulisbetmasterbettingmasterbetttingbetparibubetparibubetgarbetgarbahiscasinobahicasinoromabetromabetalobetalobetroketbetroketbetceltabetceltabet girişroyalbetroyalbet girişbetasusbetasus girişromabetromabet girişqueenbetqueenbet girişbetzulabetzula girişeditörbeteditörbet girişsonbahissonbahis girişteosbetteosbet girişalobetalobet girişromabetromabet girişromabetromabet girişpokerklaspokerklas girişpokerklaspokerklas girişbetciobetcio girişbetciobetcio girişroketbetroketbet girişroketbetroketbet girişcasibomcasibom girişcasibomcasibom girişcasinodiorcasinodior girişcasinodiorcasinodior giriştimebettimebet giriştimebettimebet girişjojobetjojobet girişjojobetjojobet girişpokerklaspokerklas girişpokerklaspokerklas giriş

Implementing Prompt Compression to Reduce Agentic Loop Costs


In this article, you will learn what prompt compression is, why it matters for agentic AI loops, and how to implement it practically using summarization and instruction distillation.

Topics we will cover include:

  • Why agentic loops accumulate token costs quadratically, and how prompt compression addresses this.
  • A review of the main prompt compression strategies, including instruction distillation, recursive summarization, vector database retrieval, and LLMLingua.
  • A working Python example that combines recursive summarization and instruction distillation to achieve meaningful token savings.

Introduction

Agentic loops in production can be synonymous with high costs, especially when it comes to both LLM and external application usage via APIs, where billing is often closely related to token usage.

The good news: prompt compression is one of the most effective strategies you can implement to navigate the high costs of agentic loops. This article introduces and discusses how a number of prompt compression techniques can help alleviate financial issues when using agentic loops.

Prompt Compression: Motivation and Common Strategies

Numerous agentic frameworks, such as LangGraph and AutoGPT, enforce that the agent keeps a context of what it has done in previous steps. Suppose your agent needs to take 10 to 20 steps to solve a problem. To conduct step 1, it sends 500 tokens. For step 2, it must send those prior 500 tokens plus new information inherent to this step — say about 1,000 tokens in total. This may grow to about 1,500 tokens in step 3, and so on. By the time we reach the 20th step, we have been “paying” for sending largely the same information over and over.

In the example above, it may seem like the number of tokens sent per step (full prompt size) grows linearly. In fact, however, the cumulative costs of the entire agent loop become quadratic, not linear, leading to a cost explosion for long-lasting loops. This is where prompt compression techniques come to help, with strategies like selective context, summarization, and others, as we will discuss shortly.

Example cost curve of agentic loops without vs. with prompt compression

Example cost curve of agentic loops without vs. with prompt compression

The issue is not just financial: there is another hidden cost related to latency, as longer prompts take longer to process, and not all users are willing to wait 30 seconds per interaction. Compressed prompts also enable faster inference and reduce compute overhead.

To put this in perspective, a 500K token context could theoretically be reduced to a 32K token compressed window that retains all relevant information, while elements like repetitive JSON structures, stop words, and low-value conversational parts are removed. Here are some cost-effective solutions and frameworks that can be considered for implementing your own prompt compression strategy:

  • Instruction distillation: this consists of creating a “compressed” version of a long system prompt that may be sent repeatedly, containing symbols or shorthand that the model will understand and interpret.
  • Recursive summarization: every few steps in a loop, use the agent or a smaller, cheaper model like Llama 3 or GPT-4o-mini to summarize the previous steps’ context into a more succinct paragraph outlining the current state of the task.
  • Vector database (RAG) for history retrieval: this replaces sending the full history repeatedly by storing it in a free, local vector database like FAISS or Chroma. For any given prompt, only the most relevant actions are retrieved as part of its context.
  • LLMLingua: an open-source framework that is gaining popularity, focused on detecting and eliminating “non-critical” tokens in a prompt before it is sent to a larger, more expensive language model.

A Practical Example: Summarizing Agent

Below is an example of a cost-friendly prompt compression strategy that combines recursive summarization and instruction distillation using Python. The code is intended to serve as a template of what such prompt compression logic should look like when translated into a real, large-scale scenario. It shows a simplified simulation of an agentic loop, emphasizing the summarization and distillation steps:

This code shows how to periodically replace the cumulative list of actions with a summary that spans a single string, helping avoid the added costs of paying for the same context tokens in every loop iteration. Try using a small, cheap model or a local one like Llama 3 to perform the summarization step.

Regarding distillation, this example illustrates what it actually does:

A standard 42-token prompt that reads “You are a helpful research assistant. Your goal is to find information about X. Please provide your output in a valid JSON format and do not include any conversational filler.” can be distilled into this 12-token prompt: “Act: ResearchBot. Task: Find X. Output: JSON. No fluff.” The model will understand it in a nearly identical fashion. Imagine a 100-step loop: this 30-token difference alone can save about 3,000 tokens just on the system prompt.

Output:

Wrapping Up

Prompt compression is not a minor optimization; it is a practical necessity for any agentic system that runs more than a handful of steps. The strategies covered here, from instruction distillation and recursive summarization to RAG-based history retrieval and LLMLingua, each address the quadratic cost problem from a different angle, and they can be combined for even greater savings. As a starting point, recursive summarization paired with a distilled system prompt requires no additional infrastructure and can already cut token usage dramatically, as the example above demonstrates.



Source link

WordPress Directory Divinity – Church& Nonprofit wordpress theme Diwine – Winery & Wine Shop, Vineyard WordPress Theme Dixon & Lamber | Law Firm WordPress Theme Dizen – Digital Agency Elementor Template Kit Dizy – Creative Portfolio Theme dizzy – Support Creators Content Script DJ Rainflow | Music Band & Musician WordPress Theme Djaen – Restaurant Elementor Template Kit Djompo Kit – Senior Care Elementor Template Kit D’leh – Creative Multi-Purpose WordPress Theme