API Reference¶
Core¶
Package (aurora_swarm)¶
Aurora Swarm — communication patterns for large-scale LLM agent orchestration.
- class aurora_swarm.AgentEndpoint(host, port, tags=<factory>)[source]¶
Bases: object
A single agent's network address plus optional metadata tags.
- host: str¶
- port: int¶
- tags: dict[str, str]¶
- property url: str¶
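For orientation, the snippet below models the documented fields with a stand-in dataclass (not the real class); the `url` scheme shown is an assumption, as the docs do not specify how the property formats the address:

```python
from dataclasses import dataclass, field

# Stand-in mirroring the documented AgentEndpoint fields (not the real class).
@dataclass
class AgentEndpoint:
    host: str
    port: int
    tags: dict[str, str] = field(default_factory=dict)

    @property
    def url(self) -> str:
        # Assumed http scheme; the real property may differ.
        return f"http://{self.host}:{self.port}"

ep = AgentEndpoint("10.0.0.1", 8000, {"role": "critic"})
print(ep.url)  # http://10.0.0.1:8000
```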
- class aurora_swarm.AgentPool(endpoints, concurrency=512, connector_limit=1024, timeout=120.0)[source]¶
Bases: object
Async pool of agent HTTP endpoints with concurrency control.
- Parameters:
endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Agent endpoints: either AgentEndpoint objects or plain (host, port) tuples (tags will be empty).
concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).
connector_limit (int) – Maximum number of TCP connections in the aiohttp pool.
timeout (float) – Per-request timeout in seconds.
- property size: int¶
Number of agents in the pool.
- property timeout: float¶
Base per-request timeout in seconds.
- property endpoints: list[AgentEndpoint]¶
- async post(agent_index, prompt, max_tokens=None)[source]¶
Send prompt to the agent at agent_index and return its response.
The call is throttled by the pool-wide semaphore so that at most concurrency requests are in flight at once.
- Parameters:
agent_index (int) – Index of the agent to send the prompt to.
prompt (str) – The prompt text.
max_tokens (int | None) – Optional maximum tokens to generate. Ignored by the base AgentPool (only used by VLLMPool and subclasses).
- Return type:
Response
- async send_all(prompts)[source]¶
Send prompts[i] to agent[i % size] concurrently.
Returns responses in input order (i.e. results[i] corresponds to prompts[i]).
- Return type:
list[Response]
- async send_all_batched(prompts, max_tokens=None)[source]¶
Send prompts with batching if supported, otherwise use send_all.
The base AgentPool implementation simply delegates to send_all; VLLMPool overrides this to use the batch API for efficiency.
- Parameters:
prompts (list[str]) – List of prompts to send.
max_tokens (int | None) – Optional max tokens override (ignored in the base implementation).
- Returns:
Responses in the same order as input prompts.
- Return type:
list[Response]
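The round-robin contract of send_all can be illustrated with a toy pool. EchoPool below is a hypothetical stand-in that returns canned text instead of making HTTP calls, but mirrors the documented index mapping (prompts[i] goes to agent[i % size]) and order preservation:

```python
import asyncio

# Toy stand-in for AgentPool: no HTTP, just echoes which agent got which prompt.
class EchoPool:
    def __init__(self, size: int, concurrency: int = 4):
        self.size = size
        self._sem = asyncio.Semaphore(concurrency)  # pool-wide throttle

    async def post(self, agent_index: int, prompt: str) -> str:
        async with self._sem:  # at most `concurrency` requests in flight
            return f"agent{agent_index}:{prompt}"

    async def send_all(self, prompts: list[str]) -> list[str]:
        # prompts[i] -> agent[i % size]; gather preserves input order.
        tasks = [self.post(i % self.size, p) for i, p in enumerate(prompts)]
        return await asyncio.gather(*tasks)

results = asyncio.run(EchoPool(size=3).send_all(["a", "b", "c", "d"]))
print(results)  # ['agent0:a', 'agent1:b', 'agent2:c', 'agent0:d']
```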
- class aurora_swarm.EmbeddingPool(endpoints, model, concurrency=512, timeout=60.0)[source]¶
Bases: object
Async pool of embedding endpoints (OpenAI-compatible /v1/embeddings).
- Parameters:
endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Embedding endpoints: either AgentEndpoint objects or (host, port) tuples (tags will be empty).
model (str) – Embedding model id (e.g. sentence-transformers/all-MiniLM-L6-v2).
concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).
timeout (float) – Per-request timeout in seconds.
- property size: int¶
Number of endpoints in the pool.
- property timeout: float¶
Per-request timeout in seconds.
- property endpoints: list[AgentEndpoint]¶
- async embed_one(agent_index, text)[source]¶
Request embedding for text from the endpoint at agent_index.
- Return type:
EmbeddingResponse
- async embed_all(texts)[source]¶
Scatter texts across endpoints round-robin; return responses in input order.
- Return type:
list[EmbeddingResponse]
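A typical consumer of embed_all compares the returned vectors. The helper below is not part of aurora_swarm; it shows a common downstream step, cosine similarity over the embedding field of successful responses:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# With real results: vecs = [r.embedding for r in responses if r.success]
print(cosine([1.0, 0.0], [0.6, 0.8]))  # 0.6
```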
- class aurora_swarm.EmbeddingResponse(success, embedding, error=None, agent_index=-1)[source]¶
Bases: object
Result of a single embedding request.
- success: bool¶
- embedding: list[float] | None¶
- error: str | None = None¶
- agent_index: int = -1¶
- class aurora_swarm.Response(success, text, error=None, agent_index=-1)[source]¶
Bases: object
Result of a single agent call.
- success: bool¶
- text: str¶
- error: str | None = None¶
- agent_index: int = -1¶
- class aurora_swarm.VLLMPool(endpoints, model='openai/gpt-oss-120b', max_tokens=None, max_tokens_aggregation=None, model_max_context=None, buffer=512, use_batch=True, concurrency=512, connector_limit=1024, timeout=300.0, batch_concurrency=256, timeout_per_sequence=None, batch_timeout_cap=None)[source]¶
Bases: AgentPool
Agent pool that communicates via vLLM's OpenAI-compatible API.
- Parameters:
endpoints (list[AgentEndpoint]) – Agent endpoints (host + port where vLLM is listening).
model (str) – Model identifier passed in the "model" field of every request (e.g. "openai/gpt-oss-120b").
max_tokens (int | None) – Maximum tokens to generate per request (default context). Can be overridden via the AURORA_SWARM_MAX_TOKENS env var.
max_tokens_aggregation (int | None) – Maximum tokens for aggregation/reduce steps (larger prompts). Can be overridden via the AURORA_SWARM_MAX_TOKENS_AGGREGATION env var. Defaults to 2 * max_tokens if not specified.
model_max_context (int | None) – The model's maximum context length. If None, it is fetched from vLLM's /v1/models endpoint on first request. Can be overridden via the AURORA_SWARM_MODEL_MAX_CONTEXT env var.
buffer (int) – Safety margin (in tokens) for dynamic sizing, to account for reasoning overhead. Defaults to 512.
use_batch (bool) – If True, use batch prompting via the completions API for send_all_batched. If False, fall back to individual requests. Defaults to True.
concurrency (int) – Maximum number of in-flight requests.
connector_limit (int) – Maximum TCP connections in the aiohttp pool.
timeout (float) – Base per-request timeout in seconds. Single requests use this; batch requests use max(timeout, scaled), where scaled depends on batch size.
batch_concurrency (int) – vLLM's maximum number of concurrent sequences (waves). Used to scale the batch timeout; default 256.
timeout_per_sequence (float | None) – Estimated seconds per sequence for batch timeout scaling. Can be set via AURORA_SWARM_TIMEOUT_PER_SEQUENCE. Default 60.0.
batch_timeout_cap (float | None) – If set, caps the computed batch timeout so one huge batch does not get an extreme value. Optional.
- async post(agent_index, prompt, max_tokens=None)[source]¶
Send prompt via the OpenAI chat-completions API on the agent.
The prompt is wrapped as a single user message.
- Parameters:
agent_index (int) – Index of the agent to send the prompt to.
prompt (str) – The prompt text.
max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on prompt length and the model context limit.
- Return type:
Response
- async post_batch(agent_index, prompts, max_tokens=None)[source]¶
Send multiple prompts to one agent via the completions API.
Uses the OpenAI completions endpoint which supports batch prompts (a list of strings). This reduces N HTTP requests to 1.
- Parameters:
agent_index (int) – Index of the agent to send prompts to.
prompts (list[str]) – List of prompts to send in one batch.
max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on average prompt length.
- Returns:
One Response per prompt, in the same order as the input.
- Return type:
list[Response]
- async send_all_batched(prompts, max_tokens=None)[source]¶
Send prompts using batch API, grouping by target agent.
Groups prompts by their target agent (round-robin based on index), then sends one batched request per agent. Reconstructs results in input order.
- Parameters:
prompts (list[str]) – List of prompts to send.
max_tokens (int | None) – Optional max tokens override.
- Returns:
Responses in the same order as input prompts.
- Return type:
list[Response]
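The dynamic sizing mentioned for post and post_batch can be pictured as simple budget arithmetic: generation budget = model_max_context minus estimated prompt tokens minus buffer. The helper below is a hypothetical sketch; the real pool's token estimation is not documented here, so a characters-per-token heuristic stands in:

```python
def dynamic_max_tokens(prompt: str, model_max_context: int,
                       buffer: int = 512, chars_per_token: int = 4) -> int:
    # Rough prompt-token estimate; the real implementation may use a tokenizer.
    prompt_tokens = len(prompt) // chars_per_token + 1
    # Leave `buffer` tokens of headroom for reasoning overhead.
    return max(1, model_max_context - prompt_tokens - buffer)

print(dynamic_max_tokens("x" * 4000, model_max_context=8192))  # 6679
```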
- aurora_swarm.parse_hostfile(path)[source]¶
Parse a hostfile and return a list of AgentEndpoint objects.
- Parameters:
path (str | Path) – Path to the hostfile.
- Returns:
Parsed endpoints in file order.
- Return type:
list[AgentEndpoint]
Hostfile¶
- class aurora_swarm.hostfile.AgentEndpoint(host, port, tags=<factory>)[source]¶
Bases: object
A single agent's network address plus optional metadata tags.
- host: str¶
- port: int¶
- tags: dict[str, str]¶
- property url: str¶
- aurora_swarm.hostfile.parse_hostfile(path)[source]¶
Parse a hostfile and return a list of AgentEndpoint objects.
- Parameters:
path (str | Path) – Path to the hostfile.
- Returns:
Parsed endpoints in file order.
- Return type:
list[AgentEndpoint]
Pool¶
- class aurora_swarm.pool.Response(success, text, error=None, agent_index=-1)[source]¶
Bases: object
Result of a single agent call.
- success: bool¶
- text: str¶
- error: str | None = None¶
- agent_index: int = -1¶
- class aurora_swarm.pool.AgentPool(endpoints, concurrency=512, connector_limit=1024, timeout=120.0)[source]¶
Bases: object
Async pool of agent HTTP endpoints with concurrency control.
- Parameters:
endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Agent endpoints: either AgentEndpoint objects or plain (host, port) tuples (tags will be empty).
concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).
connector_limit (int) – Maximum number of TCP connections in the aiohttp pool.
timeout (float) – Per-request timeout in seconds.
- property size: int¶
Number of agents in the pool.
- property timeout: float¶
Base per-request timeout in seconds.
- property endpoints: list[AgentEndpoint]¶
- async post(agent_index, prompt, max_tokens=None)[source]¶
Send prompt to the agent at agent_index and return its response.
The call is throttled by the pool-wide semaphore so that at most concurrency requests are in flight at once.
- Parameters:
agent_index (int) – Index of the agent to send the prompt to.
prompt (str) – The prompt text.
max_tokens (int | None) – Optional maximum tokens to generate. Ignored by the base AgentPool (only used by VLLMPool and subclasses).
- Return type:
Response
- async send_all(prompts)[source]¶
Send prompts[i] to agent[i % size] concurrently.
Returns responses in input order (i.e. results[i] corresponds to prompts[i]).
- Return type:
list[Response]
- async send_all_batched(prompts, max_tokens=None)[source]¶
Send prompts with batching if supported, otherwise use send_all.
The base AgentPool implementation simply delegates to send_all; VLLMPool overrides this to use the batch API for efficiency.
- Parameters:
prompts (list[str]) – List of prompts to send.
max_tokens (int | None) – Optional max tokens override (ignored in the base implementation).
- Returns:
Responses in the same order as input prompts.
- Return type:
list[Response]
VLLM pool¶
VLLMPool is an AgentPool subclass for vLLM OpenAI-compatible endpoints. It supports:
Config-based and dynamic context length (see Context length configuration)
Batch prompting for high-throughput inference (see Batch Prompting)
Both /v1/completions (batch mode) and /v1/chat/completions (non-batch) endpoints
- class aurora_swarm.vllm_pool.VLLMPool(endpoints, model='openai/gpt-oss-120b', max_tokens=None, max_tokens_aggregation=None, model_max_context=None, buffer=512, use_batch=True, concurrency=512, connector_limit=1024, timeout=300.0, batch_concurrency=256, timeout_per_sequence=None, batch_timeout_cap=None)[source]¶
Bases: AgentPool
Agent pool that communicates via vLLM's OpenAI-compatible API.
- Parameters:
endpoints (list[AgentEndpoint]) – Agent endpoints (host + port where vLLM is listening).
model (str) – Model identifier passed in the "model" field of every request (e.g. "openai/gpt-oss-120b").
max_tokens (int | None) – Maximum tokens to generate per request (default context). Can be overridden via the AURORA_SWARM_MAX_TOKENS env var.
max_tokens_aggregation (int | None) – Maximum tokens for aggregation/reduce steps (larger prompts). Can be overridden via the AURORA_SWARM_MAX_TOKENS_AGGREGATION env var. Defaults to 2 * max_tokens if not specified.
model_max_context (int | None) – The model's maximum context length. If None, it is fetched from vLLM's /v1/models endpoint on first request. Can be overridden via the AURORA_SWARM_MODEL_MAX_CONTEXT env var.
buffer (int) – Safety margin (in tokens) for dynamic sizing, to account for reasoning overhead. Defaults to 512.
use_batch (bool) – If True, use batch prompting via the completions API for send_all_batched. If False, fall back to individual requests. Defaults to True.
concurrency (int) – Maximum number of in-flight requests.
connector_limit (int) – Maximum TCP connections in the aiohttp pool.
timeout (float) – Base per-request timeout in seconds. Single requests use this; batch requests use max(timeout, scaled), where scaled depends on batch size.
batch_concurrency (int) – vLLM's maximum number of concurrent sequences (waves). Used to scale the batch timeout; default 256.
timeout_per_sequence (float | None) – Estimated seconds per sequence for batch timeout scaling. Can be set via AURORA_SWARM_TIMEOUT_PER_SEQUENCE. Default 60.0.
batch_timeout_cap (float | None) – If set, caps the computed batch timeout so one huge batch does not get an extreme value. Optional.
- async post(agent_index, prompt, max_tokens=None)[source]¶
Send prompt via the OpenAI chat-completions API on the agent.
The prompt is wrapped as a single user message.
- Parameters:
agent_index (int) – Index of the agent to send the prompt to.
prompt (str) – The prompt text.
max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on prompt length and the model context limit.
- Return type:
Response
- async post_batch(agent_index, prompts, max_tokens=None)[source]¶
Send multiple prompts to one agent via the completions API.
Uses the OpenAI completions endpoint which supports batch prompts (a list of strings). This reduces N HTTP requests to 1.
- Parameters:
agent_index (int) – Index of the agent to send prompts to.
prompts (list[str]) – List of prompts to send in one batch.
max_tokens (int | None) – Optional override for max tokens. If None, uses dynamic sizing based on average prompt length.
- Returns:
One Response per prompt, in the same order as the input.
- Return type:
list[Response]
- async send_all_batched(prompts, max_tokens=None)[source]¶
Send prompts using batch API, grouping by target agent.
Groups prompts by their target agent (round-robin based on index), then sends one batched request per agent. Reconstructs results in input order.
- Parameters:
prompts (list[str]) – List of prompts to send.
max_tokens (int | None) – Optional max tokens override.
- Returns:
Responses in the same order as input prompts.
- Return type:
list[Response]
Embedding pool¶
EmbeddingPool provides scatter-gather over OpenAI-compatible /v1/embeddings endpoints. It uses the same hostfile/endpoint model as AgentPool (e.g. parse_hostfile() and by_tag() for role-based filtering). Use it with scatter_gather_embeddings() for the same “pool + pattern” style as LLM scatter-gather.
- class aurora_swarm.embedding_pool.EmbeddingResponse(success, embedding, error=None, agent_index=-1)[source]¶
Bases: object
Result of a single embedding request.
- success: bool¶
- embedding: list[float] | None¶
- error: str | None = None¶
- agent_index: int = -1¶
- class aurora_swarm.embedding_pool.EmbeddingPool(endpoints, model, concurrency=512, timeout=60.0)[source]¶
Bases: object
Async pool of embedding endpoints (OpenAI-compatible /v1/embeddings).
- Parameters:
endpoints (Sequence[AgentEndpoint | tuple[str, int]]) – Embedding endpoints: either AgentEndpoint objects or (host, port) tuples (tags will be empty).
model (str) – Embedding model id (e.g. sentence-transformers/all-MiniLM-L6-v2).
concurrency (int) – Maximum number of in-flight requests (asyncio semaphore size).
timeout (float) – Per-request timeout in seconds.
- property size: int¶
Number of endpoints in the pool.
- property timeout: float¶
Per-request timeout in seconds.
- property endpoints: list[AgentEndpoint]¶
- async embed_one(agent_index, text)[source]¶
Request embedding for text from the endpoint at agent_index.
- Return type:
EmbeddingResponse
- async embed_all(texts)[source]¶
Scatter texts across endpoints round-robin; return responses in input order.
- Return type:
list[EmbeddingResponse]
Aggregators¶
See Aggregators for usage guide and examples.
Aggregation strategies for agent responses.
Every aggregator silently skips responses with success=False unless
include_failures=True is passed.
- aurora_swarm.aggregators.majority_vote(responses, include_failures=False)[source]¶
Return (winner, confidence), where confidence is the vote fraction.
Responses are stripped and compared case-insensitively.
- Return type:
tuple[str, float]
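A minimal reimplementation of the documented semantics (strip, compare case-insensitively, confidence = vote fraction), using plain strings where the real function takes Response objects:

```python
from collections import Counter

def majority_vote(texts: list[str]) -> tuple[str, float]:
    # Strip and lowercase so "Yes" and " yes " count as the same vote.
    normed = [t.strip().lower() for t in texts]
    winner, votes = Counter(normed).most_common(1)[0]
    return winner, votes / len(normed)

winner, confidence = majority_vote(["Yes", " yes ", "no"])
print(winner, round(confidence, 2))  # yes 0.67
```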
- aurora_swarm.aggregators.concat(responses, separator='\\n', include_failures=False)[source]¶
Join all response texts with separator.
- Return type:
str
- aurora_swarm.aggregators.best_of(responses, score_fn, include_failures=False)[source]¶
Return the single highest-scoring response.
- Return type:
Response
- aurora_swarm.aggregators.top_k(responses, k, score_fn, include_failures=False)[source]¶
Return the k highest-scoring responses (descending).
- Return type:
list[Response]
- aurora_swarm.aggregators.structured_merge(responses, include_failures=False)[source]¶
Parse each response as JSON and merge into a flat list.
Returns (merged_list, errors), where errors captures parse failures with the agent index and error message.
- Return type:
tuple[list[Any], list[dict[str, Any]]]
- aurora_swarm.aggregators.statistics(responses, extract_fn=None, include_failures=False)[source]¶
Compute summary statistics over numeric response values.
If extract_fn is None, response text is converted to float directly.
Returns a dict with keys mean, std, median, min, max.
- Return type:
dict[str, float]
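The output shape of the statistics aggregator can be sketched with the standard library. Note that the std convention here (population rather than sample) is an assumption, as the docs do not specify it:

```python
import statistics as st

def summary(values: list[float]) -> dict[str, float]:
    # Mirrors the documented result keys: mean, std, median, min, max.
    return {
        "mean": st.fmean(values),
        "std": st.pstdev(values),  # population std assumed; may be sample std
        "median": st.median(values),
        "min": min(values),
        "max": max(values),
    }

print(summary([1.0, 2.0, 3.0, 4.0]))
```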
Communication patterns¶
Broadcast¶
- async aurora_swarm.patterns.broadcast.broadcast(pool, prompt)[source]¶
Send prompt to every agent in pool, return all responses in order.
- Return type:
list[Response]
- async aurora_swarm.patterns.broadcast.broadcast_and_reduce(pool, prompt, reduce_prompt, reducer_agent_index=0)[source]¶
Two-phase broadcast: gather all responses, then reduce with one agent.
- Parameters:
pool (AgentPool) – The agent pool to broadcast to.
prompt (str) – The prompt sent to every agent.
reduce_prompt (str) – A template string containing {responses}, which will be replaced with the concatenated agent outputs.
reducer_agent_index (int) – Index of the agent (within pool) used for the reduction step.
- Return type:
Response
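The reduce step is plain template substitution: the {responses} placeholder in reduce_prompt is filled with the concatenated phase-one outputs, and the result is sent to the reducer agent. The newline join below is an assumption; the docs only say the outputs are concatenated:

```python
reduce_prompt = "Pick the most common answer:\n{responses}"
agent_outputs = ["Paris", "Paris", "Lyon"]  # phase-one broadcast results

# Newline join is an assumed separator; the library may use another.
final_prompt = reduce_prompt.format(responses="\n".join(agent_outputs))
print(final_prompt)
```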
Scatter-Gather¶
- async aurora_swarm.patterns.scatter_gather.scatter_gather(pool, prompts)[source]¶
Send prompts[i] to agent[i % pool.size]; gather in input order.
If there are more prompts than agents, the work wraps round-robin. Uses the batch API when available (VLLMPool) for improved throughput.
- Return type:
list[Response]
Scatter-Gather (embeddings)¶
- async aurora_swarm.patterns.embedding.scatter_gather_embeddings(embed_pool, texts)[source]¶
Send texts[i] to endpoint[i % pool.size]; gather in input order.
- Parameters:
embed_pool (EmbeddingPool) – Embedding pool (e.g. from parse_hostfile + by_tag).
texts (list[str]) – Texts to embed.
- Returns:
One response per text, in the same order as texts.
- Return type:
list[EmbeddingResponse]
Tree-Reduce¶
- async aurora_swarm.patterns.tree_reduce.tree_reduce(pool, prompt, reduce_prompt, fanin=50, items=None)[source]¶
Run a hierarchical tree-reduce over pool.
- Parameters:
pool (AgentPool) – The agent pool (used for both leaf work and supervisors).
prompt (str) – Leaf-level task. If items is provided, the template should contain an {item} placeholder.
reduce_prompt (str) – Supervisor summarisation task. Must contain {responses} and may contain {level}.
fanin (int) – Number of responses each supervisor handles per group.
items (list[Any] | None) – If given, scatter items across leaf agents (one per agent, round-robin). Otherwise the same prompt is broadcast.
- Return type:
Response
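The shape of the reduction tree follows from fanin alone: each level groups the previous level's responses fanin at a time until one remains. The sketch below computes the node count per level; the supervisor calls themselves are elided:

```python
def reduction_levels(n_leaves: int, fanin: int) -> list[int]:
    # Node counts per tree level: leaves, then ceil(prev / fanin) until 1.
    counts = [n_leaves]
    while counts[-1] > 1:
        counts.append(-(-counts[-1] // fanin))  # ceiling division
    return counts

print(reduction_levels(1000, fanin=50))  # [1000, 20, 1]
```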
Blackboard¶
- class aurora_swarm.patterns.blackboard.Blackboard(sections, prompt_fn)[source]¶
Bases: object
Shared-state workspace for multi-round agent collaboration.
- Parameters:
sections (list[str]) – Names of the board sections (e.g. ["hypotheses", "critiques"]).
prompt_fn (Callable[[str, dict[str, list[str]]], str]) – prompt_fn(role, board_state) -> str; generates the prompt that an agent with the given role should receive, given the current board contents.
- property board: dict[str, list[str]]¶
Current board state (mutable reference).
- property round: int¶
Number of completed rounds.
- async run(pool, max_rounds=10, convergence_fn=None)[source]¶
Execute rounds until max_rounds or convergence.
Agent roles are determined by the role tag on each endpoint. Agents whose role matches a board section contribute to that section. Agents with no role tag, or a role not in the board sections, are skipped.
- Parameters:
pool (AgentPool) – Agent pool with role-tagged endpoints.
max_rounds (int) – Upper bound on the number of rounds.
convergence_fn (Optional[Callable[[dict[str, list[str]]], bool]]) – Optional convergence_fn(board_state) -> bool. If it returns True after a round, the session stops early.
- Returns:
The final board.
- Return type:
BoardState
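A prompt_fn receives the agent's role and the current board state and returns that agent's prompt. A minimal example for a two-section board (section names taken from the docstring's own example; the prompt wording is illustrative):

```python
def prompt_fn(role: str, board: dict[str, list[str]]) -> str:
    # Critics see the hypotheses gathered so far; others propose new ones.
    if role == "critiques":
        hyps = "\n".join(board["hypotheses"]) or "(none yet)"
        return f"Critique these hypotheses:\n{hyps}"
    return "Propose a new hypothesis."

board = {"hypotheses": ["All swans are white."], "critiques": []}
print(prompt_fn("critiques", board))
```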
Pipeline¶
- class aurora_swarm.patterns.pipeline.Stage(name, prompt_template, n_agents, output_transform=None, output_filter=None)[source]¶
Bases: object
One step of a pipeline.
- name¶
Human-readable label for the stage.
- prompt_template¶
Must contain {input}, which is replaced with the previous stage's output (or the initial input for the first stage).
- n_agents¶
How many agents this stage should use.
- output_transform¶
f(responses) -> Any; reshapes the list of responses into a single value to feed the next stage. If None, responses are concatenated with newlines.
- output_filter¶
f(response) -> bool; responses for which it returns False are dropped before the transform step.
- async aurora_swarm.patterns.pipeline.run_pipeline(pool, stages, initial_input, reuse_agents=True)[source]¶
Execute stages sequentially; the output of each flows to the next.
- Parameters:
pool (AgentPool) – The full agent pool.
stages (list[Stage]) – Ordered list of pipeline stages.
initial_input (Any) – Value substituted into {input} for the first stage.
reuse_agents (bool) – If True, all stages draw agents from the same pool (up to n_agents). If False, the pool is partitioned so each stage receives a dedicated, non-overlapping subset.
- Returns:
The transformed output of the final stage.
- Return type:
Any
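Per stage, responses are first filtered, then reshaped into the next stage's {input}; with no transform they are joined with newlines, per the Stage docs. The sketch below isolates that step, with plain strings standing in for Response objects:

```python
from typing import Any, Callable, Optional

def stage_output(responses: list[str],
                 output_filter: Optional[Callable[[str], bool]] = None,
                 output_transform: Optional[Callable[[list[str]], Any]] = None) -> Any:
    # Drop responses the filter rejects, before transforming.
    if output_filter is not None:
        responses = [r for r in responses if output_filter(r)]
    if output_transform is not None:
        return output_transform(responses)
    return "\n".join(responses)  # documented default: newline concatenation

out = stage_output(["ok: A", "error", "ok: B"],
                   output_filter=lambda r: r.startswith("ok"))
print(out)
```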