API Reference

Core

Package (aurora_swarm)

Aurora Swarm — communication patterns for large-scale LLM agent orchestration.

class aurora_swarm.AgentEndpoint(host, port, tags=<factory>)[source]

Bases: object

A single agent’s network address plus optional metadata tags.

host: str
port: int
tags: dict[str, str]
property url: str
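A minimal, self-contained sketch of the endpoint shape. This mirrors the documented fields; the exact `url` format is an assumption (the real property may differ):

```python
from dataclasses import dataclass, field


@dataclass
class AgentEndpointSketch:
    # Mirrors the documented fields of aurora_swarm.AgentEndpoint.
    host: str
    port: int
    tags: dict[str, str] = field(default_factory=dict)

    @property
    def url(self) -> str:
        # Assumed format; the real property may differ (scheme, path, etc.).
        return f"http://{self.host}:{self.port}"


ep = AgentEndpointSketch("10.0.0.5", 8000, tags={"role": "worker"})
print(ep.url)  # with the assumed format: http://10.0.0.5:8000
```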
class aurora_swarm.AgentPool(endpoints, concurrency=512, connector_limit=1024, timeout=120.0)[source]

Bases: object

Async pool of agent HTTP endpoints with concurrency control.

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Agent endpoints — either AgentEndpoint objects or plain (host, port) tuples (tags will be empty).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • connector_limit (int) – Maximum number of TCP connections in the aiohttp pool.

  • timeout (float) – Per-request timeout in seconds.

async close()[source]
Return type:

None

property size: int

Number of agents in the pool.

property timeout: float

Base per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of agents whose tag key equals value.

Return type:

AgentPool

sample(n)[source]

Return a sub-pool of n randomly chosen agents.

Return type:

AgentPool

select(indices)[source]

Return a sub-pool with agents at the given indices.

Return type:

AgentPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

AgentPool

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt to the agent at agent_index and return its response.

The call is throttled by the pool-wide semaphore so that at most concurrency requests are in flight at once.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional maximum tokens to generate. Ignored by base AgentPool (only used by VLLMPool and subclasses).

Return type:

Response

async send_all(prompts)[source]

Send prompts[i] to agent[i % size] concurrently.

Returns responses in input order (i.e. results[i] corresponds to prompts[i]).

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts with batching if supported, otherwise use send_all.

Default implementation for base AgentPool — just delegates to send_all. VLLMPool overrides this to use batch API for efficiency.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override (ignored in base implementation).

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]

async broadcast_prompt(prompt)[source]

Send the same prompt to every agent in the pool.

Return type:

list[Response]
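The round-robin mapping used by send_all (prompts[i] goes to agent[i % size], results kept in input order) can be sketched without the pool itself; the function name here is illustrative:

```python
def round_robin_assignment(n_prompts: int, pool_size: int) -> list[int]:
    # prompts[i] is sent to agent[i % pool_size]; results preserve input order,
    # so results[i] always corresponds to prompts[i].
    return [i % pool_size for i in range(n_prompts)]


print(round_robin_assignment(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```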

class aurora_swarm.EmbeddingPool(endpoints, model, concurrency=512, timeout=60.0)[source]

Bases: object

Async pool of embedding endpoints (OpenAI-compatible /v1/embeddings).

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Embedding endpoints — either AgentEndpoint objects or (host, port) tuples (tags will be empty).

  • model (str) – Embedding model id (e.g. sentence-transformers/all-MiniLM-L6-v2).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • timeout (float) – Per-request timeout in seconds.

property size: int

Number of endpoints in the pool.

property timeout: float

Per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of endpoints whose tag key equals value.

Return type:

EmbeddingPool

sample(n)[source]

Return a sub-pool of n randomly chosen endpoints.

Return type:

EmbeddingPool

select(indices)[source]

Return a sub-pool with endpoints at the given indices.

Return type:

EmbeddingPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

EmbeddingPool

async embed_one(agent_index, text)[source]

Request embedding for text from the endpoint at agent_index.

Return type:

EmbeddingResponse

async embed_all(texts)[source]

Scatter texts across endpoints round-robin; return responses in input order.

Return type:

list[EmbeddingResponse]

async close()[source]

Release resources. AsyncOpenAI clients do not require explicit close.

Return type:

None

class aurora_swarm.EmbeddingResponse(success, embedding, error=None, agent_index=-1)[source]

Bases: object

Result of a single embedding request.

success: bool
embedding: list[float] | None
error: str | None = None
agent_index: int = -1
class aurora_swarm.Response(success, text, error=None, agent_index=-1)[source]

Bases: object

Result of a single agent call.

success: bool
text: str
error: str | None = None
agent_index: int = -1
class aurora_swarm.VLLMPool(endpoints, model='openai/gpt-oss-120b', max_tokens=None, max_tokens_aggregation=None, model_max_context=None, buffer=512, use_batch=True, concurrency=512, connector_limit=1024, timeout=300.0, batch_concurrency=256, timeout_per_sequence=None, batch_timeout_cap=None)[source]

Bases: AgentPool

Agent pool that communicates via vLLM’s OpenAI-compatible API.

Parameters:
  • endpoints (list[AgentEndpoint]) – Agent endpoints (host + port where vLLM is listening).

  • model (str) – Model identifier passed in the "model" field of every request (e.g. "openai/gpt-oss-120b").

  • max_tokens (int | None) – Maximum tokens to generate per request (default context). Can be overridden via AURORA_SWARM_MAX_TOKENS env var.

  • max_tokens_aggregation (int | None) – Maximum tokens for aggregation/reduce steps (larger prompts). Can be overridden via AURORA_SWARM_MAX_TOKENS_AGGREGATION env var. Defaults to 2 * max_tokens if not specified.

  • model_max_context (int | None) – Model’s maximum context length. If None, will be fetched from vLLM’s /v1/models endpoint on first request. Can be overridden via AURORA_SWARM_MODEL_MAX_CONTEXT env var.

  • buffer (int) – Safety margin (in tokens) for dynamic sizing to account for reasoning overhead. Defaults to 512.

  • use_batch (bool) – If True, use batch prompting via the completions API for send_all_batched. If False, fall back to individual requests. Defaults to True.

  • concurrency (int) – Maximum number of in-flight requests.

  • connector_limit (int) – Maximum TCP connections in the aiohttp pool.

  • timeout (float) – Base per-request timeout in seconds. Single requests use this; batch requests use max(timeout, scaled) where scaled depends on batch size.

  • batch_concurrency (int) – vLLM’s max concurrent sequences (waves). Used to scale batch timeout; default 256.

  • timeout_per_sequence (float | None) – Estimated seconds per sequence for batch timeout scaling. Can be set via AURORA_SWARM_TIMEOUT_PER_SEQUENCE. Default 60.0.

  • batch_timeout_cap (float | None) – If set, cap the computed batch timeout so one huge batch does not get an extreme value. Optional.

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt via the OpenAI chat-completions API on the agent.

The prompt is wrapped as a single user message.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on prompt length and model context limit.

Return type:

Response
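The reference does not spell out the dynamic-sizing formula. One plausible sketch, assuming a rough characters-per-token estimate and the documented buffer margin (both the estimate and the exact formula are assumptions, not the actual implementation):

```python
def dynamic_max_tokens(prompt: str, model_max_context: int, buffer: int = 512,
                       chars_per_token: int = 4) -> int:
    # Hypothetical sizing: reserve room for the (estimated) prompt tokens
    # plus a safety buffer for reasoning overhead, as the docs describe.
    est_prompt_tokens = len(prompt) // chars_per_token + 1
    return max(model_max_context - est_prompt_tokens - buffer, 1)


print(dynamic_max_tokens("x" * 4000, model_max_context=8192))  # 6679
```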

async post_batch(agent_index, prompts, max_tokens=None)[source]

Send multiple prompts to one agent via the completions API.

Uses the OpenAI completions endpoint which supports batch prompts (a list of strings). This reduces N HTTP requests to 1.

Parameters:
  • agent_index (int) – Index of the agent to send prompts to.

  • prompts (list[str]) – List of prompts to send in one batch.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on average prompt length.

Returns:

One Response per prompt, in the same order as the input.

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts using batch API, grouping by target agent.

Groups prompts by their target agent (round-robin based on index), then sends one batched request per agent. Reconstructs results in input order.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override.

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]
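The group-and-reassemble step described above can be sketched in plain Python. This mirrors the documented behaviour (round-robin grouping, one batch per agent, results restored to input order), not the actual source:

```python
def group_by_agent(prompts: list[str], pool_size: int) -> dict[int, list[tuple[int, str]]]:
    # Round-robin: prompt i targets agent i % pool_size; keep the original index.
    groups: dict[int, list[tuple[int, str]]] = {}
    for i, p in enumerate(prompts):
        groups.setdefault(i % pool_size, []).append((i, p))
    return groups


def reassemble(groups: dict[int, list[tuple[int, str]]],
               batch_results: dict[int, list[str]]) -> list[str]:
    # batch_results[agent] holds one result per prompt, in batch order;
    # scatter them back to the original input positions.
    n = sum(len(g) for g in groups.values())
    out: list[str] = [""] * n
    for agent, entries in groups.items():
        for (orig_i, _), result in zip(entries, batch_results[agent]):
            out[orig_i] = result
    return out


groups = group_by_agent(["a", "b", "c", "d", "e"], pool_size=2)
# Pretend each agent "answered" by upper-casing its batch.
results = {agent: [p.upper() for _, p in entries] for agent, entries in groups.items()}
print(reassemble(groups, results))  # ['A', 'B', 'C', 'D', 'E']
```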

async close()[source]
Return type:

None

aurora_swarm.parse_hostfile(path)[source]

Parse a hostfile and return a list of AgentEndpoint objects.

Parameters:

path (str | Path) – Path to the hostfile.

Returns:

Parsed endpoints in file order.

Return type:

list[AgentEndpoint]

Hostfile

class aurora_swarm.hostfile.AgentEndpoint(host, port, tags=<factory>)[source]

Bases: object

A single agent’s network address plus optional metadata tags.

host: str
port: int
tags: dict[str, str]
property url: str
aurora_swarm.hostfile.parse_hostfile(path)[source]

Parse a hostfile and return a list of AgentEndpoint objects.

Parameters:

path (str | Path) – Path to the hostfile.

Returns:

Parsed endpoints in file order.

Return type:

list[AgentEndpoint]

Pool

class aurora_swarm.pool.Response(success, text, error=None, agent_index=-1)[source]

Bases: object

Result of a single agent call.

success: bool
text: str
error: str | None = None
agent_index: int = -1
class aurora_swarm.pool.AgentPool(endpoints, concurrency=512, connector_limit=1024, timeout=120.0)[source]

Bases: object

Async pool of agent HTTP endpoints with concurrency control.

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Agent endpoints — either AgentEndpoint objects or plain (host, port) tuples (tags will be empty).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • connector_limit (int) – Maximum number of TCP connections in the aiohttp pool.

  • timeout (float) – Per-request timeout in seconds.

async close()[source]
Return type:

None

property size: int

Number of agents in the pool.

property timeout: float

Base per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of agents whose tag key equals value.

Return type:

AgentPool

sample(n)[source]

Return a sub-pool of n randomly chosen agents.

Return type:

AgentPool

select(indices)[source]

Return a sub-pool with agents at the given indices.

Return type:

AgentPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

AgentPool

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt to the agent at agent_index and return its response.

The call is throttled by the pool-wide semaphore so that at most concurrency requests are in flight at once.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional maximum tokens to generate. Ignored by base AgentPool (only used by VLLMPool and subclasses).

Return type:

Response

async send_all(prompts)[source]

Send prompts[i] to agent[i % size] concurrently.

Returns responses in input order (i.e. results[i] corresponds to prompts[i]).

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts with batching if supported, otherwise use send_all.

Default implementation for base AgentPool — just delegates to send_all. VLLMPool overrides this to use batch API for efficiency.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override (ignored in base implementation).

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]

async broadcast_prompt(prompt)[source]

Send the same prompt to every agent in the pool.

Return type:

list[Response]

VLLM pool

VLLMPool is an AgentPool subclass for vLLM OpenAI-compatible endpoints. It supports:

  • Config-based and dynamic context length (see Context length configuration)

  • Batch prompting for high-throughput inference (see Batch Prompting)

  • Both /v1/completions (batch mode) and /v1/chat/completions (non-batch) endpoints

class aurora_swarm.vllm_pool.VLLMPool(endpoints, model='openai/gpt-oss-120b', max_tokens=None, max_tokens_aggregation=None, model_max_context=None, buffer=512, use_batch=True, concurrency=512, connector_limit=1024, timeout=300.0, batch_concurrency=256, timeout_per_sequence=None, batch_timeout_cap=None)[source]

Bases: AgentPool

Agent pool that communicates via vLLM’s OpenAI-compatible API.

Parameters:
  • endpoints (list[AgentEndpoint]) – Agent endpoints (host + port where vLLM is listening).

  • model (str) – Model identifier passed in the "model" field of every request (e.g. "openai/gpt-oss-120b").

  • max_tokens (int | None) – Maximum tokens to generate per request (default context). Can be overridden via AURORA_SWARM_MAX_TOKENS env var.

  • max_tokens_aggregation (int | None) – Maximum tokens for aggregation/reduce steps (larger prompts). Can be overridden via AURORA_SWARM_MAX_TOKENS_AGGREGATION env var. Defaults to 2 * max_tokens if not specified.

  • model_max_context (int | None) – Model’s maximum context length. If None, will be fetched from vLLM’s /v1/models endpoint on first request. Can be overridden via AURORA_SWARM_MODEL_MAX_CONTEXT env var.

  • buffer (int) – Safety margin (in tokens) for dynamic sizing to account for reasoning overhead. Defaults to 512.

  • use_batch (bool) – If True, use batch prompting via the completions API for send_all_batched. If False, fall back to individual requests. Defaults to True.

  • concurrency (int) – Maximum number of in-flight requests.

  • connector_limit (int) – Maximum TCP connections in the aiohttp pool.

  • timeout (float) – Base per-request timeout in seconds. Single requests use this; batch requests use max(timeout, scaled) where scaled depends on batch size.

  • batch_concurrency (int) – vLLM’s max concurrent sequences (waves). Used to scale batch timeout; default 256.

  • timeout_per_sequence (float | None) – Estimated seconds per sequence for batch timeout scaling. Can be set via AURORA_SWARM_TIMEOUT_PER_SEQUENCE. Default 60.0.

  • batch_timeout_cap (float | None) – If set, cap the computed batch timeout so one huge batch does not get an extreme value. Optional.

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt via the OpenAI chat-completions API on the agent.

The prompt is wrapped as a single user message.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on prompt length and model context limit.

Return type:

Response

async post_batch(agent_index, prompts, max_tokens=None)[source]

Send multiple prompts to one agent via the completions API.

Uses the OpenAI completions endpoint which supports batch prompts (a list of strings). This reduces N HTTP requests to 1.

Parameters:
  • agent_index (int) – Index of the agent to send prompts to.

  • prompts (list[str]) – List of prompts to send in one batch.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on average prompt length.

Returns:

One Response per prompt, in the same order as the input.

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts using batch API, grouping by target agent.

Groups prompts by their target agent (round-robin based on index), then sends one batched request per agent. Reconstructs results in input order.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override.

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]

async close()[source]
Return type:

None

Embedding pool

EmbeddingPool provides scatter-gather over OpenAI-compatible /v1/embeddings endpoints. It uses the same hostfile/endpoint model as AgentPool (e.g. parse_hostfile() and by_tag() for role-based filtering). Use it with scatter_gather_embeddings() for the same “pool + pattern” style as LLM scatter-gather.

class aurora_swarm.embedding_pool.EmbeddingResponse(success, embedding, error=None, agent_index=-1)[source]

Bases: object

Result of a single embedding request.

success: bool
embedding: list[float] | None
error: str | None = None
agent_index: int = -1
class aurora_swarm.embedding_pool.EmbeddingPool(endpoints, model, concurrency=512, timeout=60.0)[source]

Bases: object

Async pool of embedding endpoints (OpenAI-compatible /v1/embeddings).

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Embedding endpoints — either AgentEndpoint objects or (host, port) tuples (tags will be empty).

  • model (str) – Embedding model id (e.g. sentence-transformers/all-MiniLM-L6-v2).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • timeout (float) – Per-request timeout in seconds.

property size: int

Number of endpoints in the pool.

property timeout: float

Per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of endpoints whose tag key equals value.

Return type:

EmbeddingPool

sample(n)[source]

Return a sub-pool of n randomly chosen endpoints.

Return type:

EmbeddingPool

select(indices)[source]

Return a sub-pool with endpoints at the given indices.

Return type:

EmbeddingPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

EmbeddingPool

async embed_one(agent_index, text)[source]

Request embedding for text from the endpoint at agent_index.

Return type:

EmbeddingResponse

async embed_all(texts)[source]

Scatter texts across endpoints round-robin; return responses in input order.

Return type:

list[EmbeddingResponse]

async close()[source]

Release resources. AsyncOpenAI clients do not require explicit close.

Return type:

None

Aggregators

See Aggregators for usage guide and examples.

Aggregation strategies for agent responses.

Every aggregator silently skips responses with success=False unless include_failures=True is passed.

aurora_swarm.aggregators.majority_vote(responses, include_failures=False)[source]

Return (winner, confidence) where confidence is the vote fraction.

Responses are stripped and compared case-insensitively.

Return type:

tuple[str, float]
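The normalisation described above (strip, then case-insensitive comparison) can be sketched as follows. This mirrors the documented semantics on plain strings rather than the actual source, which operates on Response objects:

```python
from collections import Counter


def majority_vote_sketch(texts: list[str]) -> tuple[str, float]:
    # Strip whitespace and compare case-insensitively, as documented.
    normalised = [t.strip().lower() for t in texts]
    counts = Counter(normalised)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(normalised)


print(majority_vote_sketch(["Yes", " yes ", "no"]))  # winner 'yes', confidence 2/3
```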

aurora_swarm.aggregators.concat(responses, separator='\\n', include_failures=False)[source]

Join all response texts with separator.

Return type:

str

aurora_swarm.aggregators.best_of(responses, score_fn, include_failures=False)[source]

Return the single highest-scoring response.

Return type:

Response

aurora_swarm.aggregators.top_k(responses, k, score_fn, include_failures=False)[source]

Return the k highest-scoring responses (descending).

Return type:

list[Response]

aurora_swarm.aggregators.structured_merge(responses, include_failures=False)[source]

Parse each response as JSON and merge into a flat list.

Returns (merged_list, errors) where errors captures parse failures with the agent index and error message.

Return type:

tuple[list[Any], list[dict[str, Any]]]
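A sketch of the documented merge-and-collect-errors behaviour, on plain strings. The flattening rule for non-list JSON values is an assumption; the real aggregator may handle scalars differently:

```python
import json
from typing import Any


def structured_merge_sketch(texts: list[str]) -> tuple[list[Any], list[dict[str, Any]]]:
    merged: list[Any] = []
    errors: list[dict[str, Any]] = []
    for i, text in enumerate(texts):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError as exc:
            # Record the failure with the agent index, as documented.
            errors.append({"agent_index": i, "error": str(exc)})
            continue
        # Assumption: JSON lists are flattened, other values appended as-is.
        if isinstance(parsed, list):
            merged.extend(parsed)
        else:
            merged.append(parsed)
    return merged, errors


merged, errors = structured_merge_sketch(['[1, 2]', '3', 'not json'])
print(merged)        # [1, 2, 3]
print(len(errors))   # 1
```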

aurora_swarm.aggregators.statistics(responses, extract_fn=None, include_failures=False)[source]

Compute summary statistics over numeric response values.

If extract_fn is None, response text is converted to float directly.

Returns dict with keys mean, std, median, min, max.

Return type:

dict[str, float]

aurora_swarm.aggregators.failure_report(responses)[source]

Return a diagnostic summary of successes and failures.

Keys: total, success_count, failure_count, failures (list of {agent_index, error} dicts).

Return type:

dict[str, Any]

Communication patterns

Broadcast

async aurora_swarm.patterns.broadcast.broadcast(pool, prompt)[source]

Send prompt to every agent in pool, return all responses in order.

Return type:

list[Response]

async aurora_swarm.patterns.broadcast.broadcast_and_reduce(pool, prompt, reduce_prompt, reducer_agent_index=0)[source]

Two-phase broadcast: gather all responses, then reduce with one agent.

Parameters:
  • pool (AgentPool) – The agent pool to broadcast to.

  • prompt (str) – The prompt sent to every agent.

  • reduce_prompt (str) – A template string containing {responses} which will be replaced with the concatenated agent outputs.

  • reducer_agent_index (int) – Index of the agent (within pool) used for the reduction step.

Return type:

Response
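The reduce_prompt is an ordinary str.format template; a minimal sketch of how phase two fills the {responses} placeholder (the newline separator used for joining is an assumption):

```python
gathered = ["Paris is the capital.", "It is Paris.", "Paris."]
reduce_prompt = "Combine these answers into one:\n{responses}"

# Assumed: gathered outputs are newline-joined before substitution.
final_prompt = reduce_prompt.format(responses="\n".join(gathered))
print(final_prompt)
```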

Scatter-Gather

async aurora_swarm.patterns.scatter_gather.scatter_gather(pool, prompts)[source]

Send prompts[i] to agent[i % pool.size], gather in input order.

If there are more prompts than agents, the work wraps around in round-robin order. Uses the batch API when available (VLLMPool) for improved throughput.

Return type:

list[Response]

async aurora_swarm.patterns.scatter_gather.map_gather(pool, items, prompt_template)[source]

Higher-level scatter: format prompt_template with each item.

The template must contain an {item} placeholder.

Parameters:
  • pool (AgentPool) – Agent pool.

  • items (list[Any]) – Work items — each is str()-ified and inserted into the template.

  • prompt_template (str) – Prompt with an {item} placeholder.

Return type:

list[Response]
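The {item} contract means each work item is str()-ified into the template before the underlying scatter. The prompt construction step can be sketched as:

```python
items = [3, 7, 11]
prompt_template = "Is {item} prime? Answer yes or no."

# Each item is str()-ified and substituted into the {item} placeholder.
prompts = [prompt_template.format(item=str(it)) for it in items]
print(prompts[0])  # Is 3 prime? Answer yes or no.
```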

Scatter-Gather (embeddings)

async aurora_swarm.patterns.embedding.scatter_gather_embeddings(embed_pool, texts)[source]

Send texts[i] to endpoint[i % pool.size], gather in input order.

Parameters:
  • embed_pool (EmbeddingPool) – Embedding pool (e.g. from parse_hostfile + by_tag).

  • texts (list[str]) – Texts to embed.

Returns:

One response per text, in the same order as texts.

Return type:

list[EmbeddingResponse]

Tree-Reduce

async aurora_swarm.patterns.tree_reduce.tree_reduce(pool, prompt, reduce_prompt, fanin=50, items=None)[source]

Run a hierarchical tree-reduce over pool.

Parameters:
  • pool (AgentPool) – The agent pool (used for both leaf work and supervisors).

  • prompt (str) – Leaf-level task. If items is provided the template should contain an {item} placeholder.

  • reduce_prompt (str) – Supervisor summarisation task. Must contain {responses} and may contain {level}.

  • fanin (int) – Number of responses each supervisor handles per group.

  • items (list[Any] | None) – If given, scatter items across leaf agents (one per agent, round-robin). Otherwise the same prompt is broadcast.

Return type:

Response
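The hierarchical shape of tree-reduce (group fanin responses per supervisor, repeat per level until one remains) can be sketched with a plain string-combining function standing in for the supervisor agents; the grouping logic mirrors the documented fan-in, not the actual implementation:

```python
def tree_reduce_sketch(leaves: list[str], fanin: int) -> str:
    # Each level hands `fanin` responses to one "supervisor" (here a string
    # wrapper labelled with the level) until a single result remains.
    level = 0
    current = leaves
    while len(current) > 1:
        level += 1
        current = [
            f"L{level}({' + '.join(current[i:i + fanin])})"
            for i in range(0, len(current), fanin)
        ]
    return current[0]


print(tree_reduce_sketch(["a", "b", "c", "d"], fanin=2))
# L2(L1(a + b) + L1(c + d))
```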

Blackboard

class aurora_swarm.patterns.blackboard.Blackboard(sections, prompt_fn)[source]

Bases: object

Shared-state workspace for multi-round agent collaboration.

Parameters:
  • sections (list[str]) – Names of the board sections (e.g. ["hypotheses", "critiques"]).

  • prompt_fn (Callable[[str, dict[str, list[str]]], str]) – prompt_fn(role, board_state) -> str — generates the prompt that an agent with the given role should receive, given the current board contents.

property board: dict[str, list[str]]

Current board state (mutable reference).

property round: int

Number of completed rounds.

snapshot()[source]

Return a serialisable deep copy of the board state.

Return type:

dict[str, Any]

async run(pool, max_rounds=10, convergence_fn=None)[source]

Execute rounds until max_rounds or convergence.

Agent roles are determined by the role tag on each endpoint. Agents whose role matches a board section contribute to that section. Agents with no role tag or a role not in the board sections are skipped.

Parameters:
  • pool (AgentPool) – Agent pool with role-tagged endpoints.

  • max_rounds (int) – Upper bound on the number of rounds.

  • convergence_fn (Optional[Callable[[dict[str, list[str]]], bool]]) – Optional convergence_fn(board_state) -> bool. If it returns True after a round the session stops early.

Returns:

The final board.

Return type:

BoardState
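A minimal illustrative prompt_fn matching the documented signature. The section names, rendering format, and wording are hypothetical; only the (role, board_state) -> str contract comes from the reference:

```python
def prompt_fn(role: str, board: dict[str, list[str]]) -> str:
    # Render the whole board, then ask the agent to extend its own section.
    rendered = "\n".join(
        f"## {section}\n" + "\n".join(entries) for section, entries in board.items()
    )
    return f"{rendered}\n\nAs the '{role}' agent, add one new entry to the '{role}' section."


board = {"hypotheses": ["H1: caching helps"], "critiques": []}
print(prompt_fn("critiques", board))
```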

Pipeline

class aurora_swarm.patterns.pipeline.Stage(name, prompt_template, n_agents, output_transform=None, output_filter=None)[source]

Bases: object

One step of a pipeline.

name

Human-readable label for the stage.

prompt_template

Must contain {input}, which is replaced with the previous stage’s output (or the initial input for the first stage).

n_agents

How many agents this stage should use.

output_transform

f(responses) -> Any — reshapes the list of responses into a single value to feed the next stage. If None, responses are concatenated with newlines.

output_filter

f(response) -> bool — drops responses that return False before the transform step.

async aurora_swarm.patterns.pipeline.run_pipeline(pool, stages, initial_input, reuse_agents=True)[source]

Execute stages sequentially; the output of each flows to the next.

Parameters:
  • pool (AgentPool) – The full agent pool.

  • stages (list[Stage]) – Ordered list of pipeline stages.

  • initial_input (Any) – Value substituted into {input} for the first stage.

  • reuse_agents (bool) – If True all stages draw agents from the same pool (up to n_agents). If False the pool is partitioned so each stage receives a dedicated, non-overlapping subset.

Returns:

The transformed output of the final stage.

Return type:

Any
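The per-stage post-processing order (filter first, then transform, with newline concatenation as the default) can be sketched as follows. This mirrors the documented defaults on plain strings for brevity; the real callbacks receive Response objects:

```python
from typing import Any, Callable, Optional


def stage_output(texts: list[str],
                 output_filter: Optional[Callable[[str], bool]] = None,
                 output_transform: Optional[Callable[[list[str]], Any]] = None) -> Any:
    # Documented order: drop responses failing the filter, then transform;
    # with no transform, concatenate the survivors with newlines.
    kept = [t for t in texts if output_filter is None or output_filter(t)]
    return output_transform(kept) if output_transform else "\n".join(kept)


print(stage_output(["idea A", "", "idea B"], output_filter=bool))
# idea A
# idea B
```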