add Directories config object #5011
Thanks! That's a great step towards a more structured (and semantically clear) directory separation. :)
localstack/config.py
Outdated
localstack container, some live only on the host and some only in the container.

Attributes:
    infra: container only; static infrastructure: binaries and libraries packaged with the image
Actually, I think the infra folder is a subtype of the cache. It just happens to be pre-seeded when building the image (basically to have cache hits on the first run), but it will also be loaded on demand if any of it isn't there when needed. So I think a more proper description would be "runtime only / individual; static infrastructure: binaries and libraries packaged with the image or loaded on-demand."
If my assumption here is correct it might as well just be a subfolder of the cache (since newly fetched infra content should persist across runs and reboots).
Good point, agreed. Given that these are static files baked into the image (in both the online and the offline version of the image), perhaps something like static_infra would be appropriate.
On that point, actually - I also noticed that after some recent changes, some of the artifacts that the startup hooks manage in the ./infra folder (e.g., Azure API specs) are now downloaded on each startup of Pro. So, that would be a good candidate for the cached infra folder.
i agree with the semantics. it's just a question of what you can mount into the container. if the static infra is a subtype of cache, you cannot mount the cache directory from the host, as that would shadow the subdirectory that already exists in the container, right?
localstack/config.py
Outdated
infra=INSTALL_DIR_INFRA,
var="/var/lib/localstack/libs",
cache="/var/lib/localstack/cache",
tmp=TMP_FOLDER,  # TODO: move to /tmp/localstack
shared_tmp=HOST_TMP_FOLDER,  # TODO: move to /var/lib/localstack/tmp
data=DATA_DIR,  # TODO: move to /var/lib/localstack/data
config="/etc/localstack/",  # TODO: will be introduced once .localstack config file has been refactored
logs="/var/lib/localstack/logs",
init="/docker-entrypoint-initaws.d",
The folders that have string literal values here, rather than being assigned from an already existing constant, are the ones that are not used yet, right?
Which basically means that this PR is non-breaking.
correct! the ones that have static values are not used anywhere yet. once you start using them, any user code that relies on hardcoded paths (to get logs, for example) would fail.
/var/lib/localstack. Returns Localstack directory paths as they are defined within the container. Everything
shared and writable lives in /var/lib/localstack or /tmp/localstack.
The directory structure with /var and /etc would be problematic when we consider moving to a more restrictive user configuration (i.e., using a specific lower-privileged / non-root user), since localstack itself would not be allowed to write there anymore. Should we move this to the home directory (of the user running localstack in host mode, or of the localstack user in the container)? On the other hand, considering the nature of LocalStack with all its low-level services (DNS server, ...), it might be really challenging to restrict its permissions while supporting the full feature set.
i may have done a poor job communicating there: /var/ and /etc will only be used inside the container. the local mount would be, e.g.:
~/.cache/localstack -> /var/lib/localstack
~/.localstack -> /etc/localstack (ro)
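As a rough illustration of the host-to-container mapping described in this comment, the two mounts could be expressed as Docker `-v` flags. This is only a sketch: the helper names `default_volume_mounts` and `to_docker_flags` are hypothetical and not part of the PR, and the paths are the examples given above.

```python
import posixpath
from typing import List, Tuple


def default_volume_mounts(home: str) -> List[Tuple[str, str, str]]:
    """Return (host_path, container_path, mode) triples for the two example mounts.

    Hypothetical helper: the paths mirror the examples in the comment above.
    """
    return [
        (posixpath.join(home, ".cache", "localstack"), "/var/lib/localstack", "rw"),
        (posixpath.join(home, ".localstack"), "/etc/localstack", "ro"),
    ]


def to_docker_flags(mounts: List[Tuple[str, str, str]]) -> List[str]:
    """Render the triples as `-v host:container:mode` arguments for `docker run`."""
    flags: List[str] = []
    for host, container, mode in mounts:
        flags += ["-v", f"{host}:{container}:{mode}"]
    return flags
```

The config directory is mounted read-only here, which matches the "(ro)" note in the comment.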
Yeah, I know, I was thinking about restricting the permissions of localstack in the container itself. For example, OpenShift clusters do not allow running as the root uid (unless security context constraints are explicitly configured to permit it). So in the future it might be necessary to restrict localstack (also within the container) to not run as root.
an open question is how to deal with it seems this is only used for the |
Great changes @thrau 👍. Added a few minor comments - the most critical to me is how to deal with the existing HOST_TMP_FOLDER, as it is sort of a special case (should not get created inside the container).
localstack/config.py
Outdated
var=TMP_FOLDER,  # TODO: add variable
cache=TMP_FOLDER,  # TODO: add variable
tmp=TMP_FOLDER,  # TODO: move default value to /tmp/localstack/host
shared_tmp=HOST_TMP_FOLDER,  # TODO: move default value to /tmp/localstack/shared
Not sure about this one - HOST_TMP_FOLDER is currently only the mapped folder from the host (required for mounting code into spawned Lambda containers). There should not be a need for it to be accessed or created directly inside the container (specifically, should not be created in mkdirs() in line 120 below). It is currently only used for setting the volume mount flag (-v ...:...) when spawning Lambda containers, with LAMBDA_REMOTE_DOCKER=false enabled.
Maybe this is just a matter of nomenclature - the shared_ prefix already indicates that this is one end of a shared folder. Maybe we just need to make it a bit more clear that this is the external (on the host) part of the shared tmp (e.g., shared_tmp_host), whereas the current tmp folder is the corresponding internal part of the shared folder.
how about we make this explicitly a lambda dir, that's shared only between the host and lambdas? (see my comment below)
localstack/utils/bootstrap.py
Outdated
target_dirs = Directories.for_container()

# default shared directories
for name in ["var", "cache", "init", "logs"]:
nit: Maybe we could define this list as string constants (or a static list of strings) on the Directories class (to maintain the folder names only in a single place).
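One way to read this suggestion is sketched below. It assumes a simplified `Directories` dataclass with only a few fields; the constant name `SHARED_BY_DEFAULT` and the helper `default_mounts` are hypothetical and do not come from the PR.

```python
from dataclasses import dataclass
from typing import ClassVar, List, Tuple


@dataclass
class Directories:
    # Simplified stand-in for the class under review (assumption: fewer fields).
    var: str
    cache: str
    init: str
    logs: str
    tmp: str

    # Hypothetical constant: names of the directories that are mounted by default.
    # ClassVar keeps it out of the generated __init__.
    SHARED_BY_DEFAULT: ClassVar[Tuple[str, ...]] = ("var", "cache", "init", "logs")


def default_mounts(source: Directories, target: Directories) -> List[Tuple[str, str]]:
    """Pair host and container paths for every directory listed in the constant."""
    mounts = []
    for name in Directories.SHARED_BY_DEFAULT:
        src = getattr(source, name, None)
        dst = getattr(target, name, None)
        if src and dst:
            mounts.append((src, dst))
    return mounts
```

With this shape, the loop in bootstrap.py would iterate over the class-level constant instead of an inline list, so the folder names live in one place.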
localstack/utils/bootstrap.py
Outdated
src = getattr(source_dirs, name, None)
target = getattr(target_dirs, name, None)
if src and target:
    container.volumes.add((src, target))
Could we have overlapping volume mounts here that could become problematic, potentially?
yes that's definitely possible, should add a guard there.
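For illustration only, a guard of the kind mentioned here might detect container paths that are equal to or nested inside one another. The function name `find_overlapping_targets` and the `(host, container)` tuple shape are assumptions; note that the thread goes on to prefer relying on Docker's own behavior instead.

```python
import posixpath
from typing import Iterable, List, Tuple


def find_overlapping_targets(volumes: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return pairs of container paths where one equals or contains the other."""
    targets = [posixpath.normpath(target) for _, target in volumes]
    overlaps = []
    for i, a in enumerate(targets):
        for b in targets[i + 1:]:
            # Compare with a trailing slash so /var/lib/a does not match /var/lib/ab.
            a_prefix = a.rstrip("/") + "/"
            b_prefix = b.rstrip("/") + "/"
            if a == b or b.startswith(a_prefix) or a.startswith(b_prefix):
                overlaps.append((a, b))
    return overlaps
```

A caller could log a warning or raise an error when the returned list is non-empty.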
actually we should just fall back to the behavior that docker provides for these cases, so if the user provides a wonky config then docker will complain.
I think in general caching would not be advisable in the long run, since a new folder is created for every created lambda, which can add up to quite a lot.
Yes, the lambda tmp folder will fill up with ephemeral data. If we add it to
Probably not. Modifying Lambda code on the host would only be relevant for the special S3 bucket name
The shared docker volume is an interesting idea - I think we should definitely explore that 👍 . This could help us alleviate some of the issues that users are currently facing if |
self.var_libs,
self.cache,
self.tmp,
self.functions,
@thrau I'm still not 100% clear on the naming of the functions config. Currently (prior to us moving to a Docker volume), this still corresponds to HOST_TMP_FOLDER - i.e., we should probably remove it here from this list (should not get created inside the container). (Note that this config could also be something like C:\temp under Windows.)
is it bad that this is created in the localstack container? i'm not too familiar with the way lambdas and in particular the code mounting works, but it seems to me to make sense that the localstack container can also access the shared lambda volume?
this PR introduces a Directories object within the config that holds the localstack directories that we defined as follows:
- infra: static infra (pre-seeded packages)
- var: variable data (shared between host and container that should persist between reboots, e.g., lazy-loaded packages, ssl cert, decryption key, ...) -> changes semantic to "static" in offline image
- shared_tmp: shared temp folder (shared between host and container, for things like temp lambda code files. needs to be available to, e.g., lambda containers)
- tmp: temp folder (ephemeral data, can be mounted from the host, but doesn't have to be)
- data: data dir (has to be persistent on the host, shared with the container)
- config: ~/.localstack/ (host configuration directory with, e.g., pre-configured environment variables or credentials)
- init: user-defined container provisioning scripts
- logs: log files

currently, most of these directories map to TMP_DIR, but that should change in the future. I was thinking to have most directories in the container under /var/lib/localstack, that way we can stick to one additional mount from the host (in addition to tmp), which simplifies docker compose configs that would otherwise need very fine-grained bind mounts to enable all the different directories.
The semantics of the tmp/shared tmp folders are currently not the way I described them in the doc strings, but I think they should be. We can discuss and iterate :)
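To make the shape of such an object concrete, here is a minimal sketch. It is not the PR's implementation: the field names follow the list in this description, while the `under` factory and the `skip` parameter of `mkdirs` are hypothetical additions for illustration.

```python
import os
from dataclasses import dataclass, fields
from typing import Iterable


@dataclass
class Directories:
    """Holds one path per directory role described in the PR description."""

    infra: str
    var: str
    cache: str
    tmp: str
    shared_tmp: str
    data: str
    config: str
    init: str
    logs: str

    @classmethod
    def under(cls, root: str) -> "Directories":
        """Hypothetical factory: place every directory below a single root."""
        return cls(**{f.name: os.path.join(root, f.name) for f in fields(cls)})

    def mkdirs(self, skip: Iterable[str] = ()) -> None:
        """Create all directories except those named in `skip` (assumed parameter)."""
        skipped = set(skip)
        for f in fields(self):
            if f.name not in skipped:
                os.makedirs(getattr(self, f.name), exist_ok=True)
```

Calling `Directories.under("/var/lib/localstack")` would reflect the idea of keeping most container directories below one root; passing `skip=["shared_tmp"]` to `mkdirs` is one way to avoid creating the host-side folder inside the container, which is the open question raised in the review.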
The PR also changes the references throughout the code from the config constants to the global dirs object.

Update November 26
The revised layout is: