Description
The Docker image for v3.12.4 is over 4GB and keeps growing. For comparison, the eclipse-temurin:17-jre base image is only 90MB.
Docker images should only contain the required runtime libraries, and the aim of containers is to be lightweight. Even on a modern system, it takes several minutes to download and extract the vitrivr/cineast image. This makes deployment harder and leads to longer downtimes. Furthermore, on cloud-native infrastructure this could lead to higher costs due to the bandwidth and storage requirements. Since all dependencies are bundled into just two JARs, it's also impossible for Docker to cache or deduplicate image layers.
I took a look inside the image: 2GB are used by resources (I'm not sure if there is any room for optimization there), and cineast-api.jar and cineast-cli.jar are about 1GB each. Looking inside the JARs, you will see that they share a large amount of data. On closer inspection, most of that data consists of native binaries (e.g. tensorflow or ffmpeg). The JARs even contain the same libraries for multiple platforms, such as x86, ARM, Windows, macOS, etc.
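To put numbers on this overlap, here is a minimal Java 17 sketch. The class `JarOverlap` and its methods are hypothetical, not part of Cineast. It reads both JARs with `java.util.jar.JarFile`, treats entries with the same name, uncompressed size and CRC-32 as identical, and sums native-library sizes per platform. It assumes that the directory directly containing each native library names its platform (e.g. a path like `org/example/linux-x86_64/libfoo.so`). JARs that use a different layout will be grouped differently.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarFile;

/** Hypothetical helper: measures content shared by two JARs and native-library size per platform. */
class JarOverlap {

    /** Two entries are treated as identical if name, uncompressed size and CRC-32 all match. */
    record EntryKey(String name, long size, long crc) {}

    /** Reads every file entry (directories skipped) from the JAR's central directory. */
    static Map<String, EntryKey> readEntries(String path) throws IOException {
        Map<String, EntryKey> entries = new HashMap<>();
        try (JarFile jar = new JarFile(path)) {
            jar.stream()
               .filter(e -> !e.isDirectory())
               .forEach(e -> entries.put(e.getName(), new EntryKey(e.getName(), e.getSize(), e.getCrc())));
        }
        return entries;
    }

    /** Sums the uncompressed size of entries present with identical content in both JARs. */
    static long sharedBytes(Map<String, EntryKey> a, Map<String, EntryKey> b) {
        long total = 0;
        for (EntryKey e : a.values()) {
            if (e.equals(b.get(e.name()))) {
                total += e.size();
            }
        }
        return total;
    }

    /** Matches common native-library extensions, including versioned names like libfoo.so.58. */
    static boolean isNativeLibrary(String name) {
        String file = name.substring(name.lastIndexOf('/') + 1);
        return file.endsWith(".so") || file.contains(".so.") || file.endsWith(".dll")
                || file.endsWith(".dylib") || file.endsWith(".jnilib");
    }

    /** Groups native-library bytes by parent directory, which is assumed to name the platform. */
    static Map<String, Long> nativeBytesByPlatform(Map<String, EntryKey> entries) {
        Map<String, Long> totals = new TreeMap<>();
        for (EntryKey e : entries.values()) {
            if (!isNativeLibrary(e.name())) {
                continue;
            }
            String[] parts = e.name().split("/");
            String platform = parts.length >= 2 ? parts[parts.length - 2] : "(root)";
            totals.merge(platform, e.size(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarOverlap.java cineast-api.jar cineast-cli.jar
        Map<String, EntryKey> first = readEntries(args[0]);
        Map<String, EntryKey> second = readEntries(args[1]);
        System.out.printf("Identical content in both JARs: %d MB%n", sharedBytes(first, second) >> 20);
        nativeBytesByPlatform(first).forEach((platform, bytes) ->
                System.out.printf("%-20s %6d MB%n", platform, bytes >> 20));
    }
}
```

The size and CRC come from the JAR's central directory, so the comparison never has to decompress the files.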
So here are some thoughts:
- Can the two JARs share a common dependency to avoid duplication?
- Are both JARs required in the context of Docker, or would it make sense to have two different images/tags for api and cli?
- The image runs only on `linux/amd64`, and therefore it should only contain the libraries for `linux/amd64`. Is there a way to use targeted builds with Gradle for Docker? In the future, `docker buildx` could be used to support other platforms.
- Alternatively, use the OS package manager to install native libraries if available.
- This may also apply to the general release process of the JARs. It would also avoid unexpected issues on platforms that are not included in the JAR archives, e.g. anything other than glibc-based systems or older ARM platforms.
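Until the Gradle build can select platforms itself, the `linux/amd64`-only idea could be tried as a post-processing step that drops the native libraries for every other platform from a finished JAR. The following is a minimal Java sketch of that workaround, not a Gradle-native targeted build. The class `PlatformStripper` and its `KNOWN_PLATFORMS` list are hypothetical. The sketch assumes that native libraries sit under directories named after their platform, such as `linux-x86_64`. The directory names the bundled libraries actually use may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Hypothetical post-processing step: copies a JAR, dropping native code for other platforms. */
class PlatformStripper {

    // Assumed platform directory names; adjust to whatever the bundled libraries actually use.
    static final Set<String> KNOWN_PLATFORMS = Set.of(
            "linux-x86_64", "linux-arm64", "linux-armhf", "linux-ppc64le",
            "macosx-x86_64", "macosx-arm64", "windows-x86", "windows-x86_64");

    /** Keeps an entry unless a path segment names a known platform other than the target. */
    static boolean keep(String entryName, String targetPlatform) {
        for (String segment : entryName.split("/")) {
            if (KNOWN_PLATFORMS.contains(segment) && !segment.equals(targetPlatform)) {
                return false;
            }
        }
        return true;
    }

    /** Writes the filtered copy to {@code out} and returns the number of dropped entries. */
    static int strip(Path in, Path out, String targetPlatform) throws IOException {
        int removed = 0;
        try (JarFile jar = new JarFile(in.toFile());
             JarOutputStream os = new JarOutputStream(Files.newOutputStream(out))) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!keep(entry.getName(), targetPlatform)) {
                    removed++;
                    continue;
                }
                // A fresh entry lets the stream recompute sizes and CRC instead of reusing stale ones.
                os.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream is = jar.getInputStream(entry)) {
                    is.transferTo(os);
                }
                os.closeEntry();
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PlatformStripper.java cineast-api.jar cineast-api-linux.jar linux-x86_64
        int removed = strip(Path.of(args[0]), Path.of(args[1]), args[2]);
        System.out.println("Removed " + removed + " entries for other platforms");
    }
}
```

The resulting JAR would only work on the target platform, which is exactly the trade-off the image already makes by running only on `linux/amd64`.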