API Reference

Core

Package (aurora_swarm)

Aurora Swarm — communication patterns for large-scale LLM agent orchestration.

class aurora_swarm.AgentEndpoint(host, port, tags=<factory>)[source]

Bases: object

A single agent’s network address plus optional metadata tags.

host: str
port: int
tags: dict[str, str]
property url: str
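A minimal, self-contained sketch of the endpoint shape. This mirrors the documented fields; the exact `url` format is an assumption (the real property may differ):

```python
from dataclasses import dataclass, field


@dataclass
class AgentEndpointSketch:
    # Mirrors the documented fields of aurora_swarm.AgentEndpoint.
    host: str
    port: int
    tags: dict[str, str] = field(default_factory=dict)

    @property
    def url(self) -> str:
        # Assumed format; the real property may differ (scheme, path, etc.).
        return f"http://{self.host}:{self.port}"


ep = AgentEndpointSketch("10.0.0.5", 8000, tags={"role": "worker"})
print(ep.url)  # with the assumed format: http://10.0.0.5:8000
```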
class aurora_swarm.AgentPool(endpoints, concurrency=512, connector_limit=1024, timeout=120.0)[source]

Bases: object

Async pool of agent HTTP endpoints with concurrency control.

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Agent endpoints — either AgentEndpoint objects or plain (host, port) tuples (tags will be empty).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • connector_limit (int) – Maximum number of TCP connections in the aiohttp pool.

  • timeout (float) – Per-request timeout in seconds.

async close()[source]
Return type:

None

property size: int

Number of agents in the pool.

property timeout: float

Base per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of agents whose tag key equals value.

Return type:

AgentPool

sample(n)[source]

Return a sub-pool of n randomly chosen agents.

Return type:

AgentPool

select(indices)[source]

Return a sub-pool with agents at the given indices.

Return type:

AgentPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

AgentPool

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt to the agent at agent_index and return its response.

The call is throttled by the pool-wide semaphore so that at most concurrency requests are in flight at once.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional maximum tokens to generate. Ignored by base AgentPool (only used by VLLMPool and subclasses).

Return type:

Response

async send_all(prompts)[source]

Send prompts[i] to agent[i % size] concurrently.

Returns responses in input order (i.e. results[i] corresponds to prompts[i]).

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts with batching if supported, otherwise use send_all.

Default implementation for base AgentPool — just delegates to send_all. VLLMPool overrides this to use batch API for efficiency.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override (ignored in base implementation).

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]

async broadcast_prompt(prompt)[source]

Send the same prompt to every agent in the pool.

Return type:

list[Response]
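The round-robin mapping used by send_all (prompts[i] goes to agent[i % size], results kept in input order) can be sketched without the pool itself; the function name here is illustrative:

```python
def round_robin_assignment(n_prompts: int, pool_size: int) -> list[int]:
    # prompts[i] is sent to agent[i % pool_size]; results preserve input order,
    # so results[i] always corresponds to prompts[i].
    return [i % pool_size for i in range(n_prompts)]


print(round_robin_assignment(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```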

class aurora_swarm.EmbeddingPool(endpoints, model, concurrency=512, timeout=60.0)[source]

Bases: object

Async pool of embedding endpoints (OpenAI-compatible /v1/embeddings).

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Embedding endpoints — either AgentEndpoint objects or (host, port) tuples (tags will be empty).

  • model (str) – Embedding model id (e.g. sentence-transformers/all-MiniLM-L6-v2).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • timeout (float) – Per-request timeout in seconds.

property size: int

Number of endpoints in the pool.

property timeout: float

Per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of endpoints whose tag key equals value.

Return type:

EmbeddingPool

sample(n)[source]

Return a sub-pool of n randomly chosen endpoints.

Return type:

EmbeddingPool

select(indices)[source]

Return a sub-pool with endpoints at the given indices.

Return type:

EmbeddingPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

EmbeddingPool

async embed_one(agent_index, text)[source]

Request embedding for text from the endpoint at agent_index.

Return type:

EmbeddingResponse

async embed_all(texts)[source]

Scatter texts across endpoints round-robin; return responses in input order.

Return type:

list[EmbeddingResponse]

async close()[source]

Release resources. AsyncOpenAI clients do not require explicit close.

Return type:

None

class aurora_swarm.EmbeddingResponse(success, embedding, error=None, agent_index=-1)[source]

Bases: object

Result of a single embedding request.

success: bool
embedding: list[float] | None
error: str | None = None
agent_index: int = -1
class aurora_swarm.Response(success, text, error=None, agent_index=-1)[source]

Bases: object

Result of a single agent call.

success: bool
text: str
error: str | None = None
agent_index: int = -1
class aurora_swarm.VLLMPool(endpoints, model='openai/gpt-oss-120b', max_tokens=None, max_tokens_aggregation=None, model_max_context=None, buffer=512, use_batch=True, concurrency=512, connector_limit=1024, timeout=300.0, batch_concurrency=256, timeout_per_sequence=None, batch_timeout_cap=None)[source]

Bases: AgentPool

Agent pool that communicates via vLLM’s OpenAI-compatible API.

Parameters:
  • endpoints (list[AgentEndpoint]) – Agent endpoints (host + port where vLLM is listening).

  • model (str) – Model identifier passed in the "model" field of every request (e.g. "openai/gpt-oss-120b").

  • max_tokens (int | None) – Maximum tokens to generate per request (default context). Can be overridden via AURORA_SWARM_MAX_TOKENS env var.

  • max_tokens_aggregation (int | None) – Maximum tokens for aggregation/reduce steps (larger prompts). Can be overridden via AURORA_SWARM_MAX_TOKENS_AGGREGATION env var. Defaults to 2 * max_tokens if not specified.

  • model_max_context (int | None) – Model’s maximum context length. If None, will be fetched from vLLM’s /v1/models endpoint on first request. Can be overridden via AURORA_SWARM_MODEL_MAX_CONTEXT env var.

  • buffer (int) – Safety margin (in tokens) for dynamic sizing to account for reasoning overhead. Defaults to 512.

  • use_batch (bool) – If True, use batch prompting via the completions API for send_all_batched. If False, fall back to individual requests. Defaults to True.

  • concurrency (int) – Maximum number of in-flight requests.

  • connector_limit (int) – Maximum TCP connections in the aiohttp pool.

  • timeout (float) – Base per-request timeout in seconds. Single requests use this; batch requests use max(timeout, scaled) where scaled depends on batch size.

  • batch_concurrency (int) – vLLM’s max concurrent sequences (waves). Used to scale batch timeout; default 256.

  • timeout_per_sequence (float | None) – Estimated seconds per sequence for batch timeout scaling. Can be set via AURORA_SWARM_TIMEOUT_PER_SEQUENCE. Default 60.0.

  • batch_timeout_cap (float | None) – If set, cap the computed batch timeout so one huge batch does not get an extreme value. Optional.

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt via the OpenAI chat-completions API on the agent.

The prompt is wrapped as a single user message.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on prompt length and model context limit.

Return type:

Response
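The reference does not spell out the dynamic-sizing formula. One plausible sketch, assuming a rough characters-per-token estimate and the documented buffer margin (both the estimate and the exact formula are assumptions, not the actual implementation):

```python
def dynamic_max_tokens(prompt: str, model_max_context: int, buffer: int = 512,
                       chars_per_token: int = 4) -> int:
    # Hypothetical sizing: reserve room for the (estimated) prompt tokens
    # plus a safety buffer for reasoning overhead, as the docs describe.
    est_prompt_tokens = len(prompt) // chars_per_token + 1
    return max(model_max_context - est_prompt_tokens - buffer, 1)


print(dynamic_max_tokens("x" * 4000, model_max_context=8192))  # 6679
```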

async post_batch(agent_index, prompts, max_tokens=None)[source]

Send multiple prompts to one agent via the completions API.

Uses the OpenAI completions endpoint which supports batch prompts (a list of strings). This reduces N HTTP requests to 1.

Parameters:
  • agent_index (int) – Index of the agent to send prompts to.

  • prompts (list[str]) – List of prompts to send in one batch.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on average prompt length.

Returns:

One Response per prompt, in the same order as the input.

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts using batch API, grouping by target agent.

Groups prompts by their target agent (round-robin based on index), then sends one batched request per agent. Reconstructs results in input order.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override.

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]
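The group-and-reassemble step described above can be sketched in plain Python. This mirrors the documented behaviour (round-robin grouping, one batch per agent, results restored to input order), not the actual source:

```python
def group_by_agent(prompts: list[str], pool_size: int) -> dict[int, list[tuple[int, str]]]:
    # Round-robin: prompt i targets agent i % pool_size; keep the original index.
    groups: dict[int, list[tuple[int, str]]] = {}
    for i, p in enumerate(prompts):
        groups.setdefault(i % pool_size, []).append((i, p))
    return groups


def reassemble(groups: dict[int, list[tuple[int, str]]],
               batch_results: dict[int, list[str]]) -> list[str]:
    # batch_results[agent] holds one result per prompt, in batch order;
    # scatter them back to the original input positions.
    n = sum(len(g) for g in groups.values())
    out: list[str] = [""] * n
    for agent, entries in groups.items():
        for (orig_i, _), result in zip(entries, batch_results[agent]):
            out[orig_i] = result
    return out


groups = group_by_agent(["a", "b", "c", "d", "e"], pool_size=2)
# Pretend each agent "answered" by upper-casing its batch.
results = {agent: [p.upper() for _, p in entries] for agent, entries in groups.items()}
print(reassemble(groups, results))  # ['A', 'B', 'C', 'D', 'E']
```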

async close()[source]
Return type:

None

aurora_swarm.parse_hostfile(path)[source]

Parse a hostfile and return a list of AgentEndpoint objects.

Parameters:

path (str | Path) – Path to the hostfile.

Returns:

Parsed endpoints in file order.

Return type:

list[AgentEndpoint]

Hostfile

class aurora_swarm.hostfile.AgentEndpoint(host, port, tags=<factory>)[source]

Bases: object

A single agent’s network address plus optional metadata tags.

host: str
port: int
tags: dict[str, str]
property url: str
aurora_swarm.hostfile.parse_hostfile(path)[source]

Parse a hostfile and return a list of AgentEndpoint objects.

Parameters:

path (str | Path) – Path to the hostfile.

Returns:

Parsed endpoints in file order.

Return type:

list[AgentEndpoint]

Pool

class aurora_swarm.pool.Response(success, text, error=None, agent_index=-1)[source]

Bases: object

Result of a single agent call.

success: bool
text: str
error: str | None = None
agent_index: int = -1
class aurora_swarm.pool.AgentPool(endpoints, concurrency=512, connector_limit=1024, timeout=120.0)[source]

Bases: object

Async pool of agent HTTP endpoints with concurrency control.

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Agent endpoints — either AgentEndpoint objects or plain (host, port) tuples (tags will be empty).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • connector_limit (int) – Maximum number of TCP connections in the aiohttp pool.

  • timeout (float) – Per-request timeout in seconds.

async close()[source]
Return type:

None

property size: int

Number of agents in the pool.

property timeout: float

Base per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of agents whose tag key equals value.

Return type:

AgentPool

sample(n)[source]

Return a sub-pool of n randomly chosen agents.

Return type:

AgentPool

select(indices)[source]

Return a sub-pool with agents at the given indices.

Return type:

AgentPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

AgentPool

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt to the agent at agent_index and return its response.

The call is throttled by the pool-wide semaphore so that at most concurrency requests are in flight at once.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional maximum tokens to generate. Ignored by base AgentPool (only used by VLLMPool and subclasses).

Return type:

Response

async send_all(prompts)[source]

Send prompts[i] to agent[i % size] concurrently.

Returns responses in input order (i.e. results[i] corresponds to prompts[i]).

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts with batching if supported, otherwise use send_all.

Default implementation for base AgentPool — just delegates to send_all. VLLMPool overrides this to use batch API for efficiency.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override (ignored in base implementation).

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]

async broadcast_prompt(prompt)[source]

Send the same prompt to every agent in the pool.

Return type:

list[Response]

VLLM pool

VLLMPool is an AgentPool subclass for vLLM OpenAI-compatible endpoints. It supports:

  • Config-based and dynamic context length (see Context length configuration)

  • Batch prompting for high-throughput inference (see Batch Prompting)

  • Both /v1/completions (batch mode) and /v1/chat/completions (non-batch) endpoints

class aurora_swarm.vllm_pool.VLLMPool(endpoints, model='openai/gpt-oss-120b', max_tokens=None, max_tokens_aggregation=None, model_max_context=None, buffer=512, use_batch=True, concurrency=512, connector_limit=1024, timeout=300.0, batch_concurrency=256, timeout_per_sequence=None, batch_timeout_cap=None)[source]

Bases: AgentPool

Agent pool that communicates via vLLM’s OpenAI-compatible API.

Parameters:
  • endpoints (list[AgentEndpoint]) – Agent endpoints (host + port where vLLM is listening).

  • model (str) – Model identifier passed in the "model" field of every request (e.g. "openai/gpt-oss-120b").

  • max_tokens (int | None) – Maximum tokens to generate per request (default context). Can be overridden via AURORA_SWARM_MAX_TOKENS env var.

  • max_tokens_aggregation (int | None) – Maximum tokens for aggregation/reduce steps (larger prompts). Can be overridden via AURORA_SWARM_MAX_TOKENS_AGGREGATION env var. Defaults to 2 * max_tokens if not specified.

  • model_max_context (int | None) – Model’s maximum context length. If None, will be fetched from vLLM’s /v1/models endpoint on first request. Can be overridden via AURORA_SWARM_MODEL_MAX_CONTEXT env var.

  • buffer (int) – Safety margin (in tokens) for dynamic sizing to account for reasoning overhead. Defaults to 512.

  • use_batch (bool) – If True, use batch prompting via the completions API for send_all_batched. If False, fall back to individual requests. Defaults to True.

  • concurrency (int) – Maximum number of in-flight requests.

  • connector_limit (int) – Maximum TCP connections in the aiohttp pool.

  • timeout (float) – Base per-request timeout in seconds. Single requests use this; batch requests use max(timeout, scaled) where scaled depends on batch size.

  • batch_concurrency (int) – vLLM’s max concurrent sequences (waves). Used to scale batch timeout; default 256.

  • timeout_per_sequence (float | None) – Estimated seconds per sequence for batch timeout scaling. Can be set via AURORA_SWARM_TIMEOUT_PER_SEQUENCE. Default 60.0.

  • batch_timeout_cap (float | None) – If set, cap the computed batch timeout so one huge batch does not get an extreme value. Optional.

async post(agent_index, prompt, max_tokens=None)[source]

Send prompt via the OpenAI chat-completions API on the agent.

The prompt is wrapped as a single user message.

Parameters:
  • agent_index (int) – Index of the agent to send the prompt to.

  • prompt (str) – The prompt text.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on prompt length and model context limit.

Return type:

Response

async post_batch(agent_index, prompts, max_tokens=None)[source]

Send multiple prompts to one agent via the completions API.

Uses the OpenAI completions endpoint which supports batch prompts (a list of strings). This reduces N HTTP requests to 1.

Parameters:
  • agent_index (int) – Index of the agent to send prompts to.

  • prompts (list[str]) – List of prompts to send in one batch.

  • max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on average prompt length.

Returns:

One Response per prompt, in the same order as the input.

Return type:

list[Response]

async send_all_batched(prompts, max_tokens=None)[source]

Send prompts using batch API, grouping by target agent.

Groups prompts by their target agent (round-robin based on index), then sends one batched request per agent. Reconstructs results in input order.

Parameters:
  • prompts (list[str]) – List of prompts to send.

  • max_tokens (int | None) – Optional max tokens override.

Returns:

Responses in the same order as input prompts.

Return type:

list[Response]

async close()[source]
Return type:

None

Embedding pool

EmbeddingPool provides scatter-gather over OpenAI-compatible /v1/embeddings endpoints. It uses the same hostfile/endpoint model as AgentPool (e.g. parse_hostfile() and by_tag() for role-based filtering). Use it with scatter_gather_embeddings() for the same “pool + pattern” style as LLM scatter-gather.

class aurora_swarm.embedding_pool.EmbeddingResponse(success, embedding, error=None, agent_index=-1)[source]

Bases: object

Result of a single embedding request.

success: bool
embedding: list[float] | None
error: str | None = None
agent_index: int = -1
class aurora_swarm.embedding_pool.EmbeddingPool(endpoints, model, concurrency=512, timeout=60.0)[source]

Bases: object

Async pool of embedding endpoints (OpenAI-compatible /v1/embeddings).

Parameters:
  • endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Embedding endpoints — either AgentEndpoint objects or (host, port) tuples (tags will be empty).

  • model (str) – Embedding model id (e.g. sentence-transformers/all-MiniLM-L6-v2).

  • concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).

  • timeout (float) – Per-request timeout in seconds.

property size: int

Number of endpoints in the pool.

property timeout: float

Per-request timeout in seconds.

property endpoints: list[AgentEndpoint]
by_tag(key, value)[source]

Return a sub-pool of endpoints whose tag key equals value.

Return type:

EmbeddingPool

sample(n)[source]

Return a sub-pool of n randomly chosen endpoints.

Return type:

EmbeddingPool

select(indices)[source]

Return a sub-pool with endpoints at the given indices.

Return type:

EmbeddingPool

slice(start, stop)[source]

Return a sub-pool from index start to stop.

Return type:

EmbeddingPool

async embed_one(agent_index, text)[source]

Request embedding for text from the endpoint at agent_index.

Return type:

EmbeddingResponse

async embed_all(texts)[source]

Scatter texts across endpoints round-robin; return responses in input order.

Return type:

list[EmbeddingResponse]

async close()[source]

Release resources. AsyncOpenAI clients do not require explicit close.

Return type:

None

Aggregators

See Aggregators for usage guide and examples.

Aggregation strategies for agent responses.

Every aggregator silently skips responses with success=False unless include_failures=True is passed.

aurora_swarm.aggregators.majority_vote(responses, include_failures=False)[source]

Return (winner, confidence) where confidence is the vote fraction.

Responses are stripped and compared case-insensitively.

Return type:

tuple[str, float]
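The normalisation described above (strip, then case-insensitive comparison) can be sketched as follows. This mirrors the documented semantics on plain strings rather than the actual source, which operates on Response objects:

```python
from collections import Counter


def majority_vote_sketch(texts: list[str]) -> tuple[str, float]:
    # Strip whitespace and compare case-insensitively, as documented.
    normalised = [t.strip().lower() for t in texts]
    counts = Counter(normalised)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(normalised)


print(majority_vote_sketch(["Yes", " yes ", "no"]))  # winner 'yes', confidence 2/3
```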

aurora_swarm.aggregators.concat(responses, separator='\\n', include_failures=False)[source]

Join all response texts with separator.

Return type:

str

aurora_swarm.aggregators.best_of(responses, score_fn, include_failures=False)[source]

Return the single highest-scoring response.

Return type:

Response

aurora_swarm.aggregators.top_k(responses, k, score_fn, include_failures=False)[source]

Return the k highest-scoring responses (descending).

Return type:

list[Response]

aurora_swarm.aggregators.structured_merge(responses, include_failures=False)[source]

Parse each response as JSON and merge into a flat list.

Returns (merged_list, errors) where errors captures parse failures with the agent index and error message.

Return type:

tuple[list[Any], list[dict[str, Any]]]
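A sketch of the documented merge-and-collect-errors behaviour, on plain strings. The flattening rule for non-list JSON values is an assumption; the real aggregator may handle scalars differently:

```python
import json
from typing import Any


def structured_merge_sketch(texts: list[str]) -> tuple[list[Any], list[dict[str, Any]]]:
    merged: list[Any] = []
    errors: list[dict[str, Any]] = []
    for i, text in enumerate(texts):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError as exc:
            # Record the failure with the agent index, as documented.
            errors.append({"agent_index": i, "error": str(exc)})
            continue
        # Assumption: JSON lists are flattened, other values appended as-is.
        if isinstance(parsed, list):
            merged.extend(parsed)
        else:
            merged.append(parsed)
    return merged, errors


merged, errors = structured_merge_sketch(['[1, 2]', '3', 'not json'])
print(merged)        # [1, 2, 3]
print(len(errors))   # 1
```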

aurora_swarm.aggregators.statistics(responses, extract_fn=None, include_failures=False)[source]

Compute summary statistics over numeric response values.

If extract_fn is None, response text is converted to float directly.

Returns dict with keys mean, std, median, min, max.

Return type:

dict[str, float]

aurora_swarm.aggregators.failure_report(responses)[source]

Return a diagnostic summary of successes and failures.

Keys: total, success_count, failure_count, failures (list of {agent_index, error} dicts).

Return type:

dict[str, Any]

Communication patterns

Broadcast

async aurora_swarm.patterns.broadcast.broadcast(pool, prompt)[source]

Send prompt to every agent in pool, return all responses in order.

Return type:

list[Response]

async aurora_swarm.patterns.broadcast.broadcast_and_reduce(pool, prompt, reduce_prompt, reducer_agent_index=0)[source]

Two-phase broadcast: gather all responses, then reduce with one agent.

Parameters:
  • pool (AgentPool) – The agent pool to broadcast to.

  • prompt (str) – The prompt sent to every agent.

  • reduce_prompt (str) – A template string containing {responses} which will be replaced with the concatenated agent outputs.

  • reducer_agent_index (int) – Index of the agent (within pool) used for the reduction step.

Return type:

Response
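The reduce_prompt is an ordinary str.format template; a minimal sketch of how phase two fills the {responses} placeholder (the newline separator used for joining is an assumption):

```python
gathered = ["Paris is the capital.", "It is Paris.", "Paris."]
reduce_prompt = "Combine these answers into one:\n{responses}"

# Assumed: gathered outputs are newline-joined before substitution.
final_prompt = reduce_prompt.format(responses="\n".join(gathered))
print(final_prompt)
```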

Scatter-Gather

async aurora_swarm.patterns.scatter_gather.scatter_gather(pool, prompts)[source]

Send prompts[i] to agent[i % pool.size], gather in input order.

If there are more prompts than agents, the work wraps around in round-robin order. Uses the batch API when available (VLLMPool) for improved throughput.

Return type:

list[Response]

async aurora_swarm.patterns.scatter_gather.map_gather(pool, items, prompt_template)[source]

Higher-level scatter: format prompt_template with each item.

The template must contain an {item} placeholder.

Parameters:
  • pool (AgentPool) – Agent pool.

  • items (list[Any]) – Work items — each is str()-ified and inserted into the template.

  • prompt_template (str) – Prompt with an {item} placeholder.

Return type:

list[Response]
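The {item} contract means each work item is str()-ified into the template before the underlying scatter. The prompt construction step can be sketched as:

```python
items = [3, 7, 11]
prompt_template = "Is {item} prime? Answer yes or no."

# Each item is str()-ified and substituted into the {item} placeholder.
prompts = [prompt_template.format(item=str(it)) for it in items]
print(prompts[0])  # Is 3 prime? Answer yes or no.
```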

Scatter-Gather (embeddings)

async aurora_swarm.patterns.embedding.scatter_gather_embeddings(embed_pool, texts)[source]

Send texts[i] to endpoint[i % pool.size], gather in input order.

Parameters:
  • embed_pool (EmbeddingPool) – Embedding pool (e.g. from parse_hostfile + by_tag).

  • texts (list[str]) – Texts to embed.

Returns:

One response per text, in the same order as texts.

Return type:

list[EmbeddingResponse]

Tree-Reduce

async aurora_swarm.patterns.tree_reduce.tree_reduce(pool, prompt, reduce_prompt, fanin=50, items=None)[source]

Run a hierarchical tree-reduce over pool.

Parameters:
  • pool (AgentPool) – The agent pool (used for both leaf work and supervisors).

  • prompt (str) – Leaf-level task. If items is provided the template should contain an {item} placeholder.

  • reduce_prompt (str) – Supervisor summarisation task. Must contain {responses} and may contain {level}.

  • fanin (int) – Number of responses each supervisor handles per group.

  • items (list[Any] | None) – If given, scatter items across leaf agents (one per agent, round-robin). Otherwise the same prompt is broadcast.

Return type:

Response
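The hierarchical shape of tree-reduce (group fanin responses per supervisor, repeat per level until one remains) can be sketched with a plain string-combining function standing in for the supervisor agents; the grouping logic mirrors the documented fan-in, not the actual implementation:

```python
def tree_reduce_sketch(leaves: list[str], fanin: int) -> str:
    # Each level hands `fanin` responses to one "supervisor" (here a string
    # wrapper labelled with the level) until a single result remains.
    level = 0
    current = leaves
    while len(current) > 1:
        level += 1
        current = [
            f"L{level}({' + '.join(current[i:i + fanin])})"
            for i in range(0, len(current), fanin)
        ]
    return current[0]


print(tree_reduce_sketch(["a", "b", "c", "d"], fanin=2))
# L2(L1(a + b) + L1(c + d))
```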

Blackboard

class aurora_swarm.patterns.blackboard.Blackboard(sections, prompt_fn)[source]

Bases: object

Shared-state workspace for multi-round agent collaboration.

Parameters:
  • sections (list[str]) – Names of the board sections (e.g. ["hypotheses", "critiques"]).

  • prompt_fn (Callable[[str, dict[str, list[str]]], str]) – prompt_fn(role, board_state) -> str — generates the prompt that an agent with the given role should receive, given the current board contents.

property board: dict[str, list[str]]

Current board state (mutable reference).

property round: int

Number of completed rounds.

snapshot()[source]

Return a serialisable deep copy of the board state.

Return type:

dict[str, Any]

async run(pool, max_rounds=10, convergence_fn=None)[source]

Execute rounds until max_rounds or convergence.

Agent roles are determined by the role tag on each endpoint. Agents whose role matches a board section contribute to that section. Agents with no role tag or a role not in the board sections are skipped.

Parameters:
  • pool (AgentPool) – Agent pool with role-tagged endpoints.

  • max_rounds (int) – Upper bound on the number of rounds.

  • convergence_fn (Optional[Callable[[dict[str, list[str]]], bool]]) – Optional convergence_fn(board_state) -> bool. If it returns True after a round the session stops early.

Returns:

The final board.

Return type:

BoardState
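A minimal illustrative prompt_fn matching the documented signature. The section names, rendering format, and wording are hypothetical; only the (role, board_state) -> str contract comes from the reference:

```python
def prompt_fn(role: str, board: dict[str, list[str]]) -> str:
    # Render the whole board, then ask the agent to extend its own section.
    rendered = "\n".join(
        f"## {section}\n" + "\n".join(entries) for section, entries in board.items()
    )
    return f"{rendered}\n\nAs the '{role}' agent, add one new entry to the '{role}' section."


board = {"hypotheses": ["H1: caching helps"], "critiques": []}
print(prompt_fn("critiques", board))
```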

Pipeline

class aurora_swarm.patterns.pipeline.Stage(name, prompt_template, n_agents, output_transform=None, output_filter=None)[source]

Bases: object

One step of a pipeline.

name

Human-readable label for the stage.

prompt_template

Must contain {input}, which is replaced with the previous stage’s output (or the initial input for the first stage).

n_agents

How many agents this stage should use.

output_transform

f(responses) -> Any — reshapes the list of responses into a single value to feed the next stage. If None, responses are concatenated with newlines.

output_filter

f(response) -> bool — drops responses that return False before the transform step.

async aurora_swarm.patterns.pipeline.run_pipeline(pool, stages, initial_input, reuse_agents=True)[source]

Execute stages sequentially; the output of each flows to the next.

Parameters:
  • pool (AgentPool) – The full agent pool.

  • stages (list[Stage]) – Ordered list of pipeline stages.

  • initial_input (Any) – Value substituted into {input} for the first stage.

  • reuse_agents (bool) – If True all stages draw agents from the same pool (up to n_agents). If False the pool is partitioned so each stage receives a dedicated, non-overlapping subset.

Returns:

The transformed output of the final stage.

Return type:

Any
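The per-stage post-processing order (filter first, then transform, with newline concatenation as the default) can be sketched as follows. This mirrors the documented defaults on plain strings for brevity; the real callbacks receive Response objects:

```python
from typing import Any, Callable, Optional


def stage_output(texts: list[str],
                 output_filter: Optional[Callable[[str], bool]] = None,
                 output_transform: Optional[Callable[[list[str]], Any]] = None) -> Any:
    # Documented order: drop responses failing the filter, then transform;
    # with no transform, concatenate the survivors with newlines.
    kept = [t for t in texts if output_filter is None or output_filter(t)]
    return output_transform(kept) if output_transform else "\n".join(kept)


print(stage_output(["idea A", "", "idea B"], output_filter=bool))
# idea A
# idea B
```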