Starway is an ultra-fast communication library that wraps OpenUCX to deliver lock-free, asynchronous, zero-copy messaging for Python applications.
- Zero-copy transfers into preallocated NumPy buffers.
- Full-duplex messaging with independent async send/recv APIs.
- Choice of connection style: traditional TCP socket address or direct UCX worker-address handshakes (no TCP listener required).
- Flush primitives (aflush, aflush_ep) to force completion ordering when you need delivery guarantees.
- Built-in performance probes via evaluate_perf for quick transport telemetry.
Starway depends on dynamic OpenUCX libraries:
- Install OpenUCX system-wide (for example via your distro's package manager or an HPC toolchain), or
- Install the libucx-cu12 wheel, which ships redistributable shared objects.
Starway does not hard-depend on libucx-cu12; it is treated as an optional
fallback.
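To see which of the two sources is present on a machine, a small check along these lines may help. It is only a sketch, not part of Starway: `ucx_sources` is a hypothetical helper, and it assumes that the libucx-cu12 wheel installs an importable module named `libucx` and that the system library is discoverable as `ucp`.

```python
import ctypes.util
import importlib.util


def ucx_sources():
    """Report which UCX sources look available on this machine."""
    # find_library returns a library name or path, or None if libucp is not found.
    system_lib = ctypes.util.find_library("ucp")
    # ASSUMPTION: the libucx-cu12 wheel provides a top-level module named "libucx".
    wheel_spec = importlib.util.find_spec("libucx")
    return {"system": system_lib is not None, "wheel": wheel_spec is not None}


if __name__ == "__main__":
    print(ucx_sources())
```

If both entries are False, importing Starway is unlikely to succeed until one of the two installation routes above has been followed.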
Use the STARWAY_USE_SYSTEM_UCX environment variable to choose whether the system libraries or the wheel are tried first:
import os

# Defaults to "true". Set to "false" to prefer the wheel first.
os.environ["STARWAY_USE_SYSTEM_UCX"] = "false"

import starway  # falls back to system UCX if the wheel is unavailable

import asyncio
import numpy as np
from starway import Client, Server
SERVER_ADDR, SERVER_PORT = "127.0.0.1", 19198

async def main():
    server = Server()
    client = Client()
    server.listen(SERVER_ADDR, SERVER_PORT)
    await client.aconnect(SERVER_ADDR, SERVER_PORT)

    # Establish endpoint object on the server side
    client_ep = next(iter(server.list_clients()))

    send_buf = np.arange(4, dtype=np.uint8)
    recv_buf = np.zeros_like(send_buf)

    recv_task = server.arecv(recv_buf, tag=1, tag_mask=0xFFFF)
    await client.asend(send_buf, tag=1)
    await recv_task

    await asyncio.gather(client.aflush(), server.aflush_ep(client_ep))
    await asyncio.gather(client.aclose(), server.aclose())

asyncio.run(main())

import asyncio
import numpy as np
from starway import Client, Server

async def main():
    server = Server()
    worker_address = server.listen_address()  # bytes blob, no TCP listener

    client = Client()
    await client.aconnect_address(worker_address)

    # Accept callback still fires; poll list_clients() to obtain the peer ep
    for _ in range(100):
        clients = server.list_clients()
        if clients:
            client_ep = next(iter(clients))
            break
        await asyncio.sleep(0.01)

    recv_buf = np.zeros(4, dtype=np.uint8)
    recv_task = server.arecv(recv_buf, tag=2, tag_mask=0xFFFF)
    await client.asend(np.arange(4, dtype=np.uint8), tag=2)
    await recv_task

    await asyncio.gather(client.aclose(), server.aclose())

asyncio.run(main())

Server methods:
- listen(addr, port) – create a TCP-backed listener.
- listen_address() – start worker-only mode and return the local UCX worker address for out-of-band distribution.
- set_accept_cb(callback) – invoked for each new endpoint (both listener and worker-address flows).
- list_clients() – inspect active UCX endpoints.
- asend(client_ep, buffer, tag) / arecv(buffer, tag, tag_mask) – async one-sided messaging.
- aflush() – wait for all server-side operations to complete.
- aflush_ep(client_ep) – flush operations targeting a specific endpoint.
- evaluate_perf(client_ep, msg_size) – request UCX transport estimates.
- get_worker_address() – read back the cached worker address bytes.
- aclose() – gracefully shut down and release resources.

Client methods:
- aconnect(addr, port) – connect to a listener.
- aconnect_address(worker_address) – connect using a UCX worker address only.
- asend(buffer, tag) / arecv(buffer, tag, tag_mask) – async messaging APIs.
- aflush() – wait for in-flight client operations to finish.
- evaluate_perf(msg_size) – ask UCX for local transport estimates.
- get_worker_address() – shareable bytes for reciprocal connections.
- aclose() – close the connection and stop the worker.
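
The tag and tag_mask arguments of arecv can be read through UCX's tag-matching rule: a message matches a posted receive when the two tags agree on every bit set in the mask. The sketch below models that rule in plain Python. It assumes Starway forwards both values to UCX unchanged, which this README does not state, and `tag_matches` is an illustrative helper rather than a Starway function.

```python
def tag_matches(incoming_tag: int, expected_tag: int, tag_mask: int) -> bool:
    """Return True when the masked bits of both tags agree."""
    return (incoming_tag & tag_mask) == (expected_tag & tag_mask)


# With the mask used in the examples, only the low 16 bits are compared.
print(tag_matches(0x0001, 1, 0xFFFF))      # True
print(tag_matches(0xABCD0001, 1, 0xFFFF))  # True: high bits are ignored
print(tag_matches(0x0002, 1, 0xFFFF))      # False
print(tag_matches(0x1234, 1, 0x0))         # True: a zero mask matches any tag
```

Under this reading, a mask of 0xFFFF leaves the upper tag bits free for application metadata such as a sender identifier, at the cost of having to keep the low 16 bits unique per message stream.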
Inspect transport metadata for debugging or topology decisions:
- .name – UCX endpoint name.
- .local_addr, .local_port, .remote_addr, .remote_port – socket details when available (address-only mode may leave these empty).
- .view_transports() – list (device, transport) tuples negotiated by UCX.
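
As one way to act on this metadata, the sketch below groups (device, transport) tuples by transport and checks whether a connection fell back to TCP only. The sample list is made up for illustration and only mimics the shape described above; the helper names are hypothetical and the real device and transport strings depend on the hardware and UCX build.

```python
from collections import defaultdict

# Hypothetical sample shaped like the (device, transport) tuples described above.
sample = [("mlx5_0:1", "rc_mlx5"), ("eth0", "tcp"), ("memory", "sysv")]


def group_by_transport(pairs):
    """Map each transport name to the list of devices that use it."""
    grouped = defaultdict(list)
    for device, transport in pairs:
        grouped[transport].append(device)
    return dict(grouped)


def tcp_only(pairs):
    """True when at least one transport was negotiated and all of them are TCP."""
    return bool(pairs) and all(transport == "tcp" for _, transport in pairs)


print(group_by_transport(sample))
print(tcp_only(sample))  # False: the sample also lists rc_mlx5 and sysv
```

A check like tcp_only could, for example, trigger a warning when an RDMA-capable cluster silently negotiates a slower path.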
UCX operations are inherently asynchronous. Starway exposes lightweight futures on top of UCX requests and lets you opt into ordering guarantees:
- Client.aflush() and Server.aflush() drain all pending operations on their respective workers.
- Server.aflush_ep(ep) waits for sends targeting a single endpoint.
- All async flushes integrate with Python's event loop, making them suitable for structured concurrency patterns (await asyncio.gather(...)).
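
Because the flushes are ordinary awaitables, they can be combined with standard asyncio tools such as a timeout. The following sketch shows that pattern with a stand-in class so it runs without UCX. `FakePeer` and `flush_all` are hypothetical names and not part of Starway; the only thing assumed about a real client or server is the aflush() coroutine listed above.

```python
import asyncio


class FakePeer:
    """Stand-in for a client or server; NOT part of Starway."""

    def __init__(self, delay):
        self.delay = delay
        self.flushed = False

    async def aflush(self):
        await asyncio.sleep(self.delay)  # pretend to drain pending operations
        self.flushed = True


async def flush_all(peers, timeout):
    """Flush every peer concurrently; raise asyncio.TimeoutError if too slow."""
    await asyncio.wait_for(
        asyncio.gather(*(peer.aflush() for peer in peers)), timeout
    )


async def demo():
    peers = [FakePeer(0.01), FakePeer(0.02)]
    await flush_all(peers, timeout=1.0)
    return [peer.flushed for peer in peers]


print(asyncio.run(demo()))  # [True, True]
```

Bounding the flush with a timeout keeps a stalled peer from blocking shutdown indefinitely. What a timed-out flush means for messages still in flight is up to the application to decide.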
Pytest drives the integration suite. Typical commands:
uv sync --group dev --group test
uv run pytest tests/test_basic.py -vv