Context length configuration¶
Aurora Swarm supports both config-based and dynamic context length management when using VLLMPool.
Config-based¶
Set default and aggregation token limits via environment variables or constructor parameters:
Environment variables:
AURORA_SWARM_MAX_TOKENS— default max tokens (e.g. 512)AURORA_SWARM_MAX_TOKENS_AGGREGATION— for aggregation/reduce steps (e.g. 2048)AURORA_SWARM_MODEL_MAX_CONTEXT— model’s max context length (optional; if unset, fetched from vLLM/v1/models)
Constructor (priority over env):
from aurora_swarm import VLLMPool, AgentEndpoint
pool = VLLMPool(
endpoints=[AgentEndpoint("host1", 8000)],
model="openai/gpt-oss-120b",
max_tokens=1024,
max_tokens_aggregation=2048,
model_max_context=131072, # optional; skip API query
buffer=512,
)
Per-request override: pass max_tokens to post() for a single call.
Dynamic sizing¶
When max_tokens is not passed to post(), the pool computes a safe value:
Fetches the model’s max context from vLLM
/v1/models(cached)Estimates prompt tokens (e.g.
len(prompt) // 4)Uses
min(configured_cap, model_max - prompt_est - buffer)so responses are not truncated
Aggregation steps in patterns (e.g. broadcast_and_reduce() reduce step, pipeline stages 2+) use max_tokens_aggregation when the pool provides it.
Full details, examples, and reasoning-model notes are in the repo: CONTEXT_LENGTH.md.