Starway aims to be an ultra-fast communication library, which features:
- Zero Copy, supports sending from pointer and receiving into pre-allocated buffer.
- RDMA support, by utilizing OpenMPI/OpenUCX for transportation.
- Ease of use, generally should work out of the box, and don't require much configuration efforts.
- Full-duplex, asynchronous API.
Current Python alternatives are lacking core features:
- ZeroMQ, no support for RDMA.
- MPI, hard to use, hard to setup environment properly.
Starway depends on OpenUCX, which must be linked dynamically. There are two options:
- Install OpenUCX system-wide, or whatever method you like as long as it can be found dynamically.
- Install
libucx-cu12wheel package, which containslibucx.sofiles that can be loaded by Starway during initialization.
We don't add libucx-cu12 as Starway Python dependency by default, but generally you can install it on your cluster machines as a fallback option: when libucx cannot be found in system, it would look for wheel installation.
You can use environment variable to control System/Wheel preference:
import os
os.environ["STARWAY_USE_SYSTEM_UCX"] = "false" # defaults to true
import starway # now we will use libucx-cu12 pypi wheel package, while falling back to system if not foundStarway now supports full-duplex communication, allowing for simultaneous, two-way data exchange between the client and server. This feature significantly improves the library's responsiveness and efficiency in handling real-time data streams.
Here's an example of how to use the full-duplex communication feature:
import asyncio
import time
import numpy as np
from starway import Client, Server
async def tester():
server = Server("127.0.0.1", 19198)
client = Client("127.0.0.1", 19198)
# Client to Server
send_buf_c2s = np.arange(10, dtype=np.uint8)
recv_buf_c2s = np.empty(10, dtype=np.uint8)
# Server to Client
send_buf_s2c = np.arange(20, dtype=np.uint8)
recv_buf_s2c = np.empty(10, dtype=np.uint8)
# Wait for client to connect
while not (clients := server.list_clients()):
time.sleep(0.1)
client_ep = clients[0]
# Perform concurrent send and receive
client_send_future = client.asend(send_buf_c2s, tag=1)
server_recv_future = server.arecv(recv_buf_c2s, tag=1, tag_mask=0xFFFF)
server_send_future = server.asend(client_ep, send_buf_s2c, tag=2)
client_recv_future = client.arecv(recv_buf_s2c, tag=2, tag_mask=0xFFFF)
await asyncio.gather(
client_send_future,
server_recv_future,
server_send_future,
client_recv_future
)
assert np.allclose(send_buf_c2s, recv_buf_c2s)
assert np.allclose(send_buf_s2c, recv_buf_s2c)
asyncio.run(tester())