A document to track the project's directions from 0.3 onward, replacing #26. It lists our mid- and long-term goals, their [priority], (assignee) and any sub-tasks.
Any help is welcome, with mentoring available for most tasks!
Remaining enhancements from v0.3
Will be updated after the prioritization discussion.
Client-side protocols
Replace capnp RPC and the current monitoring dashboard HTTP API with a common protocol.
Part of #11 (more discussion there) but specific to the public API.
- Design the API calls (@gavento) [medium]
- Implement in the server (@gavento) [medium]
- Update the Python API (using aiohttp for an async API) (@gavento) [medium]
- Update the dashboard (@gavento) [medium] (Improved live and post-mortem monitoring #38)
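As an illustration only, a common protocol shared by the Python client and the dashboard could use a single JSON request/response envelope in the style of JSON-RPC 2.0. The method name `get_task_info`, its parameters and the field layout below are hypothetical; the actual API calls are still being discussed in #11. A minimal sketch of encoding a request and decoding a reply:

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Request:
    """Hypothetical request envelope shared by the Python client and the dashboard."""
    id: int
    method: str
    params: dict[str, Any] = field(default_factory=dict)

    def encode(self) -> str:
        # JSON-RPC 2.0-style framing; an assumption, not Rain's actual protocol.
        return json.dumps({"jsonrpc": "2.0", "id": self.id,
                           "method": self.method, "params": self.params})


def decode_response(raw: str, expected_id: int) -> Any:
    """Return the result of a reply, or raise if it is an error or mismatched."""
    msg = json.loads(raw)
    if msg.get("id") != expected_id:
        raise ValueError(f"response id {msg.get('id')} does not match request {expected_id}")
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    return msg["result"]


# Hypothetical call asking for the state of two tasks.
req = Request(id=1, method="get_task_info", params={"task_ids": [3, 4]})
print(req.encode())
print(decode_response('{"jsonrpc": "2.0", "id": 1, "result": {"3": "finished"}}', 1))
```

With one envelope, the async Python client (e.g. over aiohttp) and the dashboard would only need a single encoder/decoder each, instead of separate capnp and HTTP code paths.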
Improve the dashboard with more information and post-mortem analysis
- Design and revamp the dashboard. Depends on the client API work (@gavento) [medium/low] (Improved live and post-mortem monitoring #38)
- Include stats for task/object groups and possibly names/labels from Add task/data object groups and names, clarify names #32 [low] (Improved live and post-mortem monitoring #38)
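Per-group statistics for the dashboard could be computed by aggregating task records by their group label. The `TaskRecord` fields (`group`, `state`, `duration`) are hypothetical; the real monitoring event format is not specified in this issue. A minimal sketch:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Hypothetical monitoring record; not Rain's actual event schema.
    task_id: int
    group: str
    state: str        # e.g. "finished" or "failed"
    duration: float   # seconds


def group_stats(records):
    """Aggregate task count, failure count and total/mean duration per group."""
    acc = defaultdict(lambda: {"count": 0, "failed": 0, "total_time": 0.0})
    for r in records:
        s = acc[r.group]
        s["count"] += 1
        s["failed"] += 1 if r.state == "failed" else 0
        s["total_time"] += r.duration
    # Every group in acc has count >= 1, so the division is safe.
    for s in acc.values():
        s["mean_time"] = s["total_time"] / s["count"]
    return dict(acc)


records = [
    TaskRecord(1, "preprocess", "finished", 2.0),
    TaskRecord(2, "preprocess", "finished", 4.0),
    TaskRecord(3, "train", "failed", 10.0),
]
print(group_stats(records))
```

The same aggregation works for live monitoring (over records seen so far) and for post-mortem analysis (over a complete log).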
Fix current bugs
- Rare server crash (parallel inter-task dependencies + other conditions) #7 (occurs under heavy load only) [medium]
- Worker crashes with 'DataObject is not finished' #13 (seems to be bound to Exoscale deployment) [high]
Custom tasks (subworkers) in more languages
- Python subworker as a library [low] (run standalone scripts as opposed to defining them in the client only)
Easier deployment in the cloud
- Deployment in the Amazon cloud (@vojtechcima) [medium] (Startup and control scripts for CloudStack and others #37)
Packaging for easier deployment
Multiple options, priorities may vary. (@spirali)
- AppImage/Snap packages [low] (we already have static binaries)
- Snapcraft has a Rust plugin
- Deb/other distro packages [low]
- There is cargo-deb
Improve Python API
Pythonize the client API.
- Draft content-type loaders/extensions (@gavento) [low]
- Task/object groups and names/labels (Add task/data object groups and names, clarify names #32) [low]
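Content-type loaders could take the form of a registry mapping a content-type name to a function that decodes raw bytes into a Python object, with extensions registering additional types. The decorator `register_loader`, the function `load` and the content-type names used here are hypothetical, chosen for illustration, and not part of the current Python API. A minimal sketch:

```python
import json
import pickle
from typing import Any, Callable

# Hypothetical registry: content-type name -> loader(bytes) -> object.
_LOADERS: dict[str, Callable[[bytes], Any]] = {}


def register_loader(content_type: str):
    """Decorator that registers a loader for the given content type."""
    def decorator(fn: Callable[[bytes], Any]) -> Callable[[bytes], Any]:
        if content_type in _LOADERS:
            raise ValueError(f"loader for {content_type!r} already registered")
        _LOADERS[content_type] = fn
        return fn
    return decorator


@register_loader("text")
def _load_text(data: bytes) -> str:
    return data.decode("utf-8")


@register_loader("json")
def _load_json(data: bytes) -> Any:
    return json.loads(data)  # json.loads accepts bytes directly


@register_loader("pickle")
def _load_pickle(data: bytes) -> Any:
    # Only unpickle data from trusted sources.
    return pickle.loads(data)


def load(data: bytes, content_type: str) -> Any:
    """Decode raw bytes according to their content type."""
    try:
        loader = _LOADERS[content_type]
    except KeyError:
        raise ValueError(f"no loader for content type {content_type!r}") from None
    return loader(data)


print(load(b'{"a": [1, 2]}', "json"))
```

An extension package (for example one adding the Apache Arrow content-type listed further down) would then only need to decorate its own loader function with `register_loader`.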
Improve testing infrastructure
- Scripts/containers/... to test deployment and running in a network. (@vojtechcima) [medium]
- Test starting Rain (rain start) and running it on OpenStack, Exoscale, and AWS. This does not have to be part of CI (it could even be run locally). Depends on / part of Startup and control scripts for CloudStack and others #37.
More real-world code examples
Lower priority; ideally based on real use cases. Ideas: NumPy subtasks, C++/Rust subworkers.
Enhancements to revisit in the (not so distant) future
- Integration with some popular libraries
- Apache Arrow content-type
- Basic type and loading is implemented. We could add more operations (filter, split, merge, ...)
- XGBoost tasks, etc ...
- Why not now: Not clear what the demand would be
- Worker configuration files (needed for common (CPU) and special (GPU) resources, different subworker locations and configurations, ...)
- Partially done
- Why not now: Needs to be thought through (esp. w.r.t. resources); not needed now
- Separate session construction and running (save/load session)
- Why not now: Not clear what the use cases would be; not difficult once the API has stabilized
- Clients in other languages: Rust, C++, Java, ...
- Why not now: Not clear what the demand would be. Easier once the protocol and Python API have stabilized.
- Scale the scheduler, benchmarks
- There is a benchmark in utils/bench/simple_task_scaling.py. The results as of 0.2 are here.
- Why not now: While eventually crucial, the scheduler is sufficient when there are <1000 tasks to be scheduled at once.
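For the "separate session construction and running (save/load session)" item, the idea is that a constructed task graph is plain data that can be serialized and later reloaded for running. The `Task` fields, `save_session`, `load_session` and the JSON layout below are all hypothetical; a real session would need to store more state than this. A minimal sketch of a round trip with a dependency check:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    # Hypothetical, minimal task description; not Rain's session format.
    id: int
    task_type: str
    inputs: list[int] = field(default_factory=list)  # ids of tasks this one depends on
    config: dict = field(default_factory=dict)


def save_session(tasks: list[Task]) -> str:
    """Serialize a constructed (not yet submitted) task graph to JSON."""
    return json.dumps({"version": 1, "tasks": [asdict(t) for t in tasks]})


def load_session(raw: str) -> list[Task]:
    """Rebuild the task graph and check that every dependency is present."""
    doc = json.loads(raw)
    tasks = [Task(**t) for t in doc["tasks"]]
    known = {t.id for t in tasks}
    for t in tasks:
        missing = set(t.inputs) - known
        if missing:
            raise ValueError(f"task {t.id} depends on unknown tasks {sorted(missing)}")
    return tasks


graph = [Task(1, "load", config={"path": "data.csv"}),
         Task(2, "train", inputs=[1])]
restored = load_session(save_session(graph))
print(restored == graph)  # True
```

Keeping the graph as plain data is what would let construction and running happen in separate processes once the API is stable.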