Context length configuration
=============================

Aurora Swarm supports both **config-based** and **dynamic** context length
management when using :class:`~aurora_swarm.vllm_pool.VLLMPool`.

Config-based
------------

Set default and aggregation token limits via environment variables or
constructor parameters.

**Environment variables:**

- ``AURORA_SWARM_MAX_TOKENS`` — default max tokens (e.g. 512)
- ``AURORA_SWARM_MAX_TOKENS_AGGREGATION`` — max tokens for aggregation/reduce
  steps (e.g. 2048)
- ``AURORA_SWARM_MODEL_MAX_CONTEXT`` — the model's max context length
  (optional; if unset, it is fetched from vLLM ``/v1/models``)

**Constructor (takes priority over env):**

.. code-block:: python

   from aurora_swarm import VLLMPool, AgentEndpoint

   pool = VLLMPool(
       endpoints=[AgentEndpoint("host1", 8000)],
       model="openai/gpt-oss-120b",
       max_tokens=1024,
       max_tokens_aggregation=2048,
       model_max_context=131072,  # optional; skips the API query
       buffer=512,
   )

**Per-request override:** pass ``max_tokens`` to
:meth:`~aurora_swarm.vllm_pool.VLLMPool.post` to change the limit for a
single call.

Dynamic sizing
--------------

When ``max_tokens`` is not passed to
:meth:`~aurora_swarm.vllm_pool.VLLMPool.post`, the pool computes a safe value:

- it fetches the model's max context from vLLM ``/v1/models`` (cached);
- it estimates prompt tokens (e.g. ``len(prompt) // 4``);
- it uses ``min(configured_cap, model_max - prompt_est - buffer)`` so that
  responses are not truncated at the model's context limit.

Aggregation steps in patterns (e.g. the reduce step of
:func:`~aurora_swarm.patterns.broadcast.broadcast_and_reduce`, or stages 2+
of a pipeline) use ``max_tokens_aggregation`` when the pool provides it.

Full details, examples, and reasoning-model notes are in the repo:
``CONTEXT_LENGTH.md``. The sketches below illustrate the env-var setup,
per-request overrides, dynamic sizing, and aggregation limits.
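A minimal sketch of the env-var route. It assumes, consistent with the
constructor-priority note above, that the variables are read when the pool
is constructed:

.. code-block:: python

   import os

   # Sketch only: set limits before constructing the pool, assuming the
   # variables are read at construction time.
   os.environ["AURORA_SWARM_MAX_TOKENS"] = "512"               # default cap
   os.environ["AURORA_SWARM_MAX_TOKENS_AGGREGATION"] = "2048"  # reduce steps
   os.environ["AURORA_SWARM_MODEL_MAX_CONTEXT"] = "131072"     # skips /v1/models

   from aurora_swarm import VLLMPool, AgentEndpoint

   # With the model max set via env, no /v1/models query is needed.
   pool = VLLMPool(
       endpoints=[AgentEndpoint("host1", 8000)],
       model="openai/gpt-oss-120b",
   )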
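A per-request override might look like the following. Only the
``max_tokens`` keyword is documented above; the positional prompt argument
is assumed here for illustration:

.. code-block:: python

   # One call with a tighter cap; other requests keep the pool's defaults.
   reply = pool.post(
       "Summarize the findings in one paragraph.",  # prompt arg is assumed
       max_tokens=256,
   )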
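The dynamic sizing rule, as a standalone sketch. ``safe_max_tokens`` is a
hypothetical helper that mirrors the three steps listed above, not the
pool's actual internals:

.. code-block:: python

   def safe_max_tokens(prompt: str, configured_cap: int,
                       model_max: int, buffer: int = 512) -> int:
       """Hypothetical sketch of the documented sizing computation."""
       # Rough character-based estimate (~4 characters per token).
       prompt_est = len(prompt) // 4
       # Cap at the configured limit, keeping `buffer` tokens of headroom
       # so the response is not truncated at the context edge.
       return min(configured_cap, model_max - prompt_est - buffer)

   # With a 131072-token context, a 2000-character prompt, and a 1024 cap:
   # min(1024, 131072 - 500 - 512) == 1024
   assert safe_max_tokens("x" * 2000, 1024, 131072) == 1024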
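How the aggregation limit comes into play in a pattern. The module path
comes from the reference above, but the call signature is assumed for
illustration; see ``CONTEXT_LENGTH.md`` for the real pattern APIs:

.. code-block:: python

   from aurora_swarm.patterns.broadcast import broadcast_and_reduce

   # Signature assumed for illustration. The broadcast step would use the
   # pool's max_tokens (1024 above); the reduce step would use
   # max_tokens_aggregation (2048), since the pool provides it.
   result = broadcast_and_reduce(pool, "Compare the three proposed designs.")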