sapkota-aayush
Contributor

Optimize example Dockerfiles - Reduce image size by 60%

Fixed broken multi-stage builds in all 17 example Dockerfiles by changing FROM builder AS udf to FROM python:3.10-slim-bullseye AS udf. This removes build tools (gcc, git, build-essential, Poetry) from final images and only copies the virtual environment and application code.
Results
Before: ~971MB images (with build tools)
After: ~385MB images (clean runtime only)
Improvement: 60% size reduction (586MB saved per image)
Testing
Verified with examples/reduce/counter/Dockerfile - build tools successfully removed, functionality preserved.
Files Changed
All 17 example Dockerfiles across map, reduce, sink, source, and sourcetransform categories.
Closes #181

@sapkota-aayush
Contributor Author

@vigith @kohlisid - I successfully reduced image size by 60% (from ~971MB to ~385MB) by fixing the broken multi-stage builds in all 17 Dockerfiles. The optimization removes build tools from final images while maintaining full functionality.
However, build times remain similar across both versions. Looking for suggestions on how to optimize build speed while keeping these size improvements.
The key change was switching from FROM builder AS udf to FROM python:3.10-slim-bullseye AS udf to properly separate build and runtime stages.
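The reason `FROM builder AS udf` keeps the build tools is that the final stage inherits every layer of `builder`. A rough sketch of the corrected pattern follows. It is an illustration, not the Dockerfile from this PR: the file name `example.py`, the presence of a `poetry.lock` beside `pyproject.toml`, and the in-project `.venv` location are all assumptions.

```dockerfile
# Build stage: holds compilers, git and Poetry. It is never shipped.
FROM python:3.10-slim-bullseye AS builder

# Ask Poetry to create the virtual environment at /app/.venv (assumed layout).
ENV POETRY_VIRTUALENVS_IN_PROJECT=true \
    POETRY_NO_INTERACTION=1

RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential git \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --no-cache-dir poetry

WORKDIR /app
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --only main

# Runtime stage: starts from a clean base image instead of "FROM builder",
# so none of the layers above end up in the final image.
FROM python:3.10-slim-bullseye AS udf

WORKDIR /app
# Copy the environment to the same path it was built at.
COPY --from=builder /app/.venv /app/.venv
# example.py is a hypothetical entry point used only for this sketch.
COPY example.py ./

ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "example.py"]
```

Only the `COPY --from=builder` line carries anything across the stage boundary, which is where the size saving comes from.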

@vigith
Member

vigith commented Jun 30, 2025

The jobs are failing with the following error. Please run `poetry lock`:

Using virtualenv: /home/runner/.cache/pypoetry/virtualenvs/pynumaflow-Fn3toN8H-py3.9
Installing dependencies from lock file

pyproject.toml changed significantly since poetry.lock was last generated. Run `poetry lock` to fix the lock file.
Error: Process completed with exit code 1.

pyproject.toml Outdated
aiorun = "^2023.7"
uvloop = "^0.19.0"
psutil = "^6.0.0"
numpy = "^1.26.0"
Member

why would you want to add numpy?

Contributor Author

> why would you want to add numpy?

Thank you for catching that! I had temporarily added numpy while testing one of the examples, but it’s not actually required for any of the examples or the project itself. I’ve now removed it from pyproject.toml and updated the lock file accordingly.


codecov bot commented Jun 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.26%. Comparing base (42f9fbd) to head (05a5649).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #232   +/-   ##
=======================================
  Coverage   94.26%   94.26%           
=======================================
  Files          60       60           
  Lines        2441     2441           
  Branches      124      124           
=======================================
  Hits         2301     2301           
  Misses        101      101           
  Partials       39       39           


@vigith
Member

vigith commented Jun 30, 2025

Could you please run black?

@sapkota-aayush
Contributor Author

> Could you please run black?

Done

@vigith
Member

vigith commented Jun 30, 2025

@kohlisid
Contributor

kohlisid commented Jun 30, 2025

@sapkota-aayush You can use the built-in make lint command for the repo

@sapkota-aayush
Contributor Author

> @sapkota-aayush You can use the built-in make lint command for the repo

@vigith @kohlisid

I wasn’t able to run make lint directly because of some setup issues, but I manually fixed the formatting errors with Black and double-checked everything looks good now. That should take care of the lint errors.

Whenever you get a chance, maybe you could suggest the best way to run make lint on my setup?

Thanks!

@vigith
Member

vigith commented Jul 1, 2025

looks like it ran successfully now.

@kohlisid
Contributor

kohlisid commented Jul 1, 2025

You can try setting up a virtual env for the project and installing the required dependencies from the project's pyproject.toml file.

@sapkota-aayush
Contributor Author

Thanks @vigith! Appreciate the tip, @kohlisid. I'll try setting up the virtual environment properly for smoother runs next time.

I also noticed that even after reducing the image size, each UDF image still takes over 2 minutes to build, which feels like a lot. Not sure if there's a better way; maybe we could try caching or something else to speed it up?

Open to any suggestions, just trying to streamline things a bit. Appreciate the help as always!
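One common way to shorten rebuilds, offered here as a sketch and not as what the maintainers later recommended, is to copy the dependency manifests before the source code so the install layer is reused, and to use BuildKit cache mounts for package downloads. The file name `example.py`, the presence of a `poetry.lock`, and the assumption that every dependency ships a prebuilt wheel (so no compiler is installed) are illustrative only.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.10-slim-bullseye AS builder

ENV POETRY_VIRTUALENVS_IN_PROJECT=true \
    POETRY_NO_INTERACTION=1

# The cache mount keeps pip's download cache between builds (needs BuildKit).
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install poetry

WORKDIR /app

# Copy only the dependency manifests first: this layer and the install
# below are rebuilt only when pyproject.toml or poetry.lock change.
COPY pyproject.toml poetry.lock ./
RUN --mount=type=cache,target=/root/.cache/pypoetry \
    poetry install --no-root --only main

# Source edits invalidate only the layers from here on.
# example.py is a hypothetical file name for this sketch.
COPY example.py ./
```

With this ordering, a change to the UDF code alone should skip the dependency install on the next build, since the earlier layers are served from cache.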

@vigith
Member

vigith commented Jul 1, 2025

Did you look into #181 (comment)?

@kohlisid
Contributor

kohlisid commented Jul 1, 2025

@sapkota-aayush We can try to divide it into three multi-stage layers:

  1. Base
  2. Env setup
  3. Builder

Then for each we can look at specific optimizations
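One possible reading of this three-stage split is sketched below. The stage names follow the list above, but what goes into each stage is an assumption, as are the file name `example.py`, the `poetry.lock` file, and the separate final `udf` stage that keeps the runtime image free of build tools.

```dockerfile
# Stage 1 (base): settings shared by every later stage.
FROM python:3.10-slim-bullseye AS base
ENV PYTHONUNBUFFERED=1
WORKDIR /app

# Stage 2 (env setup): build toolchain and Poetry. Changes rarely,
# so it is usually served from the layer cache.
FROM base AS env-setup
ENV POETRY_VIRTUALENVS_IN_PROJECT=true \
    POETRY_NO_INTERACTION=1
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential git \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --no-cache-dir poetry

# Stage 3 (builder): install dependencies into /app/.venv.
FROM env-setup AS builder
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --only main

# Final image: restart from base so nothing from env-setup is shipped.
FROM base AS udf
COPY --from=builder /app/.venv /app/.venv
# example.py is a hypothetical entry point used only for this sketch.
COPY example.py ./
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "example.py"]
```

Splitting the toolchain from the dependency install means each stage can be tuned or cached on its own, which matches the "specific optimizations" idea in the comment.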

@sapkota-aayush
Contributor Author

> @sapkota-aayush We can try to divide it into three multi-stage layers
>
>   1. Base
>   2. Env setup
>   3. Builder
>
> Then for each we can look at specific optimizations

Thanks for the suggestion, Sid! Breaking it into Base, Env Setup, and Builder stages makes a lot of sense, and I'm working on it now. I'll follow up soon with progress and results. Appreciate the input!

sapkota-aayush force-pushed the optimize-dockerfile-size branch 2 times, most recently from c72542b to d6cbcb7, on July 4, 2025 01:39
sapkota-aayush force-pushed the optimize-dockerfile-size branch from d6cbcb7 to 6ef3117 on July 4, 2025 02:29
@sapkota-aayush
Contributor Author

@vigith @kohlisid
I’ve optimized all the example Dockerfiles using a 3-layer multi-stage build, fixed some file and formatting issues, removed unnecessary dependencies, and tested the results. The only one I didn’t fully update is examples/reduce/batchmap/flatmap/Dockerfile, since it’s missing a pyproject.toml and looks incomplete. If you could run the tests and let me know whether anything else needs fixing, or share any other feedback, that’d be great. Thanks!

@kohlisid
Contributor

kohlisid commented Jul 7, 2025

@sapkota-aayush Have you tested the new images end-to-end by running in a pipeline?

@sapkota-aayush
Contributor Author

> @sapkota-aayush Have you tested the new images end-to-end by running in a pipeline?

Yes, I have tested the new images end-to-end by running them in a Numaflow pipeline on Kubernetes.
The pipeline was deployed successfully, all pods were running and healthy, and the custom Docker image was used in the workflow.
Please see the attached screenshot for details.
Let me know if you need any more information!
[Attached screenshots: test3, test2, test1]

kohlisid changed the title from "Reduce Docker image size by 60% using proper multi-stage builds" to "chore: optimize example docker files using multi-stage builds" on Jul 8, 2025
kohlisid merged commit 9a390a5 into numaproj:main on Jul 8, 2025
11 checks passed
@kohlisid
Contributor

kohlisid commented Jul 8, 2025

Thanks for taking it up and the collaboration on this @sapkota-aayush :D

@sapkota-aayush
Contributor Author

> Thanks for taking it up and the collaboration on this @sapkota-aayush :D

Thank you @vigith and @kohlisid for your guidance. Learned a lot.

Development

Successfully merging this pull request may close these issues.

Optimize example Dockerfiles
3 participants