How an AI Agent Chooses What to Do Under Token, Latency, and Tool-Call Budget Constraints?


In this tutorial, we build a cost-aware planning agent that deliberately balances output quality against real-world constraints such as token usage, latency, and tool-call budgets. We design the agent to generate multiple candidate actions, estimate their expected costs and benefits, and then select an execution plan that maximizes value while staying within strict budgets. With this, we demonstrate how agentic systems can move beyond "always use the LLM" behavior and instead reason explicitly about trade-offs, efficiency, and resource awareness, which is critical for deploying agents reliably in constrained environments. Check out the FULL CODES here.

import os, time, math, json, random
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Tuple, Any
from getpass import getpass


USE_OPENAI = True


if USE_OPENAI:
    if not os.getenv("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY (hidden): ").strip()
    try:
        from openai import OpenAI
        client = OpenAI()
    except Exception as e:
        print("OpenAI SDK import failed. Falling back to offline mode.\nError:", e)
        USE_OPENAI = False

We set up the execution environment and securely load the OpenAI API key at runtime without hardcoding it. We also initialize the client so the agent gracefully falls back to offline mode if the API is unavailable. Check out the FULL CODES here.

def approx_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token.
    return max(1, math.ceil(len(text) / 4))


@dataclass
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int


@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def within(self, b: Budget) -> bool:
        return (self.tokens <= b.max_tokens and
                self.latency_ms <= b.max_latency_ms and
                self.tool_calls <= b.max_tool_calls)

    def add(self, other: "Spend") -> "Spend":
        return Spend(
            tokens=self.tokens + other.tokens,
            latency_ms=self.latency_ms + other.latency_ms,
            tool_calls=self.tool_calls + other.tool_calls
        )

We define the core budgeting abstractions that allow the agent to reason explicitly about costs. We model token usage, latency, and tool calls as first-class quantities and provide utility methods to accumulate and validate spend. This gives us a clean foundation for enforcing constraints throughout planning and execution. Check out the FULL CODES here.
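
As a quick sanity check (a minimal sketch with illustrative numbers, not part of the tutorial script), we can accumulate spend and test it against a budget:

b = Budget(max_tokens=1000, max_latency_ms=2000, max_tool_calls=1)
s = Spend(tokens=400, latency_ms=500, tool_calls=0)
s = s.add(Spend(tokens=700, latency_ms=300, tool_calls=1))  # cumulative: 1100 tokens, 800 ms, 1 call
print(s.within(b))  # False -- the 1000-token budget is exceeded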

@dataclass
class StepOption:
    name: str
    description: str
    est_spend: Spend
    est_value: float
    executor: str
    payload: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PlanCandidate:
    steps: List[StepOption]
    spend: Spend
    value: float
    rationale: str = ""


def llm_text(prompt: str, *, model: str = "gpt-5", effort: str = "low") -> str:
    if not USE_OPENAI:
        return ""
    t0 = time.time()
    resp = client.responses.create(
        model=model,
        reasoning={"effort": effort},
        input=prompt,
    )
    _ = (time.time() - t0)
    return resp.output_text or ""

We introduce the data structures that represent individual action choices and complete plan candidates. We also define a lightweight LLM wrapper that standardizes how text is generated and measured. This separation allows the planner to reason about actions abstractly without being tightly coupled to execution details. Check out the FULL CODES here.

def generate_step_options(task: str) -> List[StepOption]:
   base = [
       StepOption(
           name="Clarify deliverables (local)",
           description="Extract deliverable checklist + acceptance criteria from the task.",
           est_spend=Spend(tokens=60, latency_ms=20, tool_calls=0),
           est_value=6.0,
           executor="local",
       ),
       StepOption(
           name="Outline plan (LLM)",
           description="Create a structured outline with sections, constraints, and assumptions.",
           est_spend=Spend(tokens=600, latency_ms=1200, tool_calls=1),
           est_value=10.0,
           executor="llm",
           payload={"prompt_kind":"outline"}
       ),
       StepOption(
           name="Outline plan (local)",
           description="Create a rough outline using templates (no LLM).",
           est_spend=Spend(tokens=120, latency_ms=40, tool_calls=0),
           est_value=5.5,
           executor="local",
       ),
       StepOption(
           name="Risk register (LLM)",
           description="Generate risks, mitigations, owners, and severity.",
           est_spend=Spend(tokens=700, latency_ms=1400, tool_calls=1),
           est_value=9.0,
           executor="llm",
           payload={"prompt_kind":"risks"}
       ),
       StepOption(
           name="Risk register (local)",
           description="Generate a standard risk register from a reusable template.",
           est_spend=Spend(tokens=160, latency_ms=60, tool_calls=0),
           est_value=5.0,
           executor="local",
       ),
       StepOption(
           name="Timeline (LLM)",
           description="Draft a realistic milestone timeline with dependencies.",
           est_spend=Spend(tokens=650, latency_ms=1300, tool_calls=1),
           est_value=8.5,
           executor="llm",
           payload={"prompt_kind":"timeline"}
       ),
       StepOption(
           name="Timeline (local)",
           description="Draft a simple timeline from a generic milestone template.",
           est_spend=Spend(tokens=150, latency_ms=60, tool_calls=0),
           est_value=4.8,
           executor="local",
       ),
       StepOption(
           name="Quality pass (LLM)",
           description="Rewrite for clarity, consistency, and formatting.",
           est_spend=Spend(tokens=900, latency_ms=1600, tool_calls=1),
           est_value=8.0,
           executor="llm",
           payload={"prompt_kind":"polish"}
       ),
       StepOption(
           name="Quality pass (local)",
           description="Light formatting + consistency checks without LLM.",
           est_spend=Spend(tokens=120, latency_ms=50, tool_calls=0),
           est_value=3.5,
           executor="local",
       ),
   ]


    if USE_OPENAI:
        meta_prompt = f"""
You are a planning assistant. For the task below, propose 3-5 OPTIONAL extra steps that improve quality,
like checks, validations, or stakeholder tailoring. Keep each step short.


TASK:
{task}


Return a JSON list with fields: name, description, est_value(1-10).
"""
        txt = llm_text(meta_prompt, model="gpt-5", effort="low")
        try:
            items = json.loads(txt.strip())
            for it in items[:5]:
                base.append(
                    StepOption(
                        name=str(it.get("name", "Extra step (local)"))[:60],
                        description=str(it.get("description", ""))[:200],
                        est_spend=Spend(tokens=120, latency_ms=60, tool_calls=0),
                        est_value=float(it.get("est_value", 5.0)),
                        executor="local",
                    )
                )
        except Exception:
            pass


    return base

Here we focus on generating a diverse set of candidate steps, including both LLM-based and local alternatives with different cost–quality trade-offs. We optionally use the model itself to suggest additional low-cost improvements while still controlling their impact on the budget. By doing so, we enrich the action space without losing efficiency. Check out the FULL CODES here.
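
As an optional sketch (illustrative only, using a hypothetical task string and not part of the original code), we can eyeball the cost–quality trade-offs in the pool by printing a rough value-per-token ratio for each option:

opts = generate_step_options("Draft a short project proposal.")  # may issue one LLM call if USE_OPENAI is True
for o in opts:
    density = o.est_value / max(1, o.est_spend.tokens)
    print(f"{o.name:<35} value={o.est_value:4.1f} tokens={o.est_spend.tokens:4d} value/token={density:.3f}")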

def plan_under_budget(
    options: List[StepOption],
    budget: Budget,
    *,
    max_steps: int = 6,
    beam_width: int = 12,
    diversity_penalty: float = 0.2
) -> PlanCandidate:
    def redundancy_cost(chosen: List[StepOption], new: StepOption) -> float:
        # Penalize picking multiple variants of the same step (e.g. both LLM and local outline).
        key_new = new.name.split("(")[0].strip().lower()
        overlap = 0
        for s in chosen:
            key_s = s.name.split("(")[0].strip().lower()
            if key_s == key_new:
                overlap += 1
        return overlap * diversity_penalty


    beams: List[PlanCandidate] = [PlanCandidate(steps=[], spend=Spend(), value=0.0, rationale="")]


    for _ in range(max_steps):
        expanded: List[PlanCandidate] = []
        for cand in beams:
            for opt in options:
                if opt in cand.steps:
                    continue
                new_spend = cand.spend.add(opt.est_spend)
                if not new_spend.within(budget):
                    continue
                new_value = cand.value + opt.est_value - redundancy_cost(cand.steps, opt)
                expanded.append(
                    PlanCandidate(
                        steps=cand.steps + [opt],
                        spend=new_spend,
                        value=new_value,
                        rationale=cand.rationale
                    )
                )
        if not expanded:
            break
        expanded.sort(key=lambda c: c.value, reverse=True)
        beams = expanded[:beam_width]


    best = max(beams, key=lambda c: c.value)
    return best

We implement the budget-constrained planning logic that searches for the highest-value combination of steps under strict limits. We apply a beam-style search with redundancy penalties to avoid wasteful action overlap. This is where the agent truly becomes cost-aware by optimizing value subject to constraints. Check out the FULL CODES here.
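
To see the cost-awareness in action (a small sketch under assumed budget numbers, not part of the original script), we can plan the same option pool under a tight and a loose budget and compare which steps survive:

opts = generate_step_options("Draft a short project proposal.")  # may issue one LLM call if USE_OPENAI is True
tight = plan_under_budget(opts, Budget(max_tokens=500, max_latency_ms=500, max_tool_calls=0))
loose = plan_under_budget(opts, Budget(max_tokens=4000, max_latency_ms=6000, max_tool_calls=3))
print("Tight budget picks:", [s.name for s in tight.steps])  # typically local-only steps
print("Loose budget picks:", [s.name for s in loose.steps])  # LLM steps become affordable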

def run_local_step(task: str, step: StepOption, working: Dict[str, Any]) -> str:
    name = step.name.lower()
    if "clarify deliverables" in name:
        return (
            "Deliverables checklist:\n"
            "- Executive summary\n- Scope & assumptions\n- Workplan + milestones\n"
            "- Risk register (risk, impact, likelihood, mitigation, owner)\n"
            "- Next steps + data needed\n"
        )
    if "outline plan" in name:
        return (
            "Outline:\n1) Context & objective\n2) Scope\n3) Approach\n4) Timeline\n5) Risks\n6) Next steps\n"
        )
    if "risk register" in name:
        return (
            "Risk register (template):\n"
            "1) Data access delays | High | Mitigation: agree data list + owners\n"
            "2) Stakeholder alignment | Med | Mitigation: weekly review\n"
            "3) Tooling constraints | Med | Mitigation: phased rollout\n"
        )
    if "timeline" in name:
        return (
            "Timeline (template):\n"
            "Week 1: discovery + requirements\nWeek 2: prototype + feedback\n"
            "Week 3: pilot + metrics\nWeek 4: rollout + handover\n"
        )
    if "quality pass" in name:
        draft = working.get("draft", "")
        return "Light quality pass completed (headings normalized, bullets aligned).\n" + draft
    return f"Completed: {step.name}\n"


def run_llm_step(task: str, step: StepOption, working: Dict[str, Any]) -> str:
    kind = step.payload.get("prompt_kind", "generic")
    context = working.get("draft", "")
    prompts = {
        "outline": f"Create a crisp, structured outline for the task below.\nTASK:\n{task}\nReturn a numbered outline.",
        "risks": f"Create a risk register for the task below. Include: Risk | Impact | Likelihood | Mitigation | Owner.\nTASK:\n{task}",
        "timeline": f"Create a realistic milestone timeline with dependencies for the task below.\nTASK:\n{task}",
        "polish": f"Rewrite and polish the following draft for clarity and consistency.\nDRAFT:\n{context}",
        "generic": f"Help with this step: {step.description}\nTASK:\n{task}\nCURRENT:\n{context}",
    }
    return llm_text(prompts.get(kind, prompts["generic"]), model="gpt-5", effort="low")


def execute_plan(task: str, plan: PlanCandidate) -> Tuple[str, Spend]:
    working = {"draft": ""}
    actual = Spend()


    for i, step in enumerate(plan.steps, 1):
        t0 = time.time()
        if step.executor == "llm" and USE_OPENAI:
            out = run_llm_step(task, step, working)
            tool_calls = 1
        else:
            out = run_local_step(task, step, working)
            tool_calls = 0


        dt_ms = int((time.time() - t0) * 1000)
        tok = approx_tokens(out)


        actual = actual.add(Spend(tokens=tok, latency_ms=dt_ms, tool_calls=tool_calls))
        working["draft"] += f"\n\n### Step {i}: {step.name}\n{out}\n"


    return working["draft"].strip(), actual


TASK = "Draft a 1-page project proposal for a logistics dashboard + fleet optimization pilot, including scope, timeline, and risks."
BUDGET = Budget(
    max_tokens=2200,
    max_latency_ms=3500,
    max_tool_calls=2
)


options = generate_step_options(TASK)
best_plan = plan_under_budget(options, BUDGET, max_steps=6, beam_width=14)


print("=== SELECTED PLAN (budget-aware) ===")
for s in best_plan.steps:
    print(f"- {s.name} | est_spend={s.est_spend} | est_value={s.est_value}")
print("\nEstimated spend:", best_plan.spend)
print("Budget:", BUDGET)


print("\n=== EXECUTING PLAN ===")
draft, actual = execute_plan(TASK, best_plan)


print("\n=== OUTPUT DRAFT ===\n")
print(draft[:6000])


print("\n=== ACTUAL SPEND (approx) ===")
print(actual)
print("\nWithin budget?", actual.within(BUDGET))

We execute the selected plan and track actual resource usage step by step. We dynamically choose between local and LLM execution paths and aggregate the final output into a coherent draft. By comparing estimated and actual spend, we demonstrate how planning assumptions can be validated and refined in practice.
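
As a small extension (a sketch that assumes the best_plan and actual variables from the run above), we can compare estimated and actual spend dimension by dimension to see where the planning estimates drift:

for dim in ("tokens", "latency_ms", "tool_calls"):
    est, act = getattr(best_plan.spend, dim), getattr(actual, dim)
    print(f"{dim:<12} estimated={est:6d} actual={act:6d} delta={act - est:+d}")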

In conclusion, we demonstrated how a cost-aware planning agent can reason about its resource consumption and adapt its behavior in real time. We executed only the steps that fit within predefined budgets and tracked actual spend to validate the planning assumptions, closing the loop between estimation and execution. We also highlighted how agentic AI systems can become more practical, controllable, and scalable by treating cost, latency, and tool usage as first-class decision variables rather than afterthoughts.


Check out the FULL CODES here. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! Are you on Telegram? Now you can join us on Telegram as well.

