add Directories config object #5011

thrau · 2021-11-25T17:18:13Z

this PR introduces a Directories object within the config that holds the localstack directories that we defined as follows:

infra: static infra (pre-seeded packages)
var: variable data (shared between host and container; should persist between reboots, e.g., lazy-loaded packages, ssl cert, decryption key, ...) -> semantics change to "static" in the offline image
shared_tmp: shared temp folder (shared between host and container, for things like temp lambda code files. needs to be available to e.g., lambda containers)
tmp: temp folder (ephemeral data, can be mounted from the host, but doesn't have to be)
data: data dir (has to be persistent on the host, shared with the container)
config: ~/.localstack/ (host configuration directory with, e.g., pre-configured environment variables or credentials)
init: user-defined container provisioning scripts
logs: log files

currently, most of these directories map to TMP_DIR, but that should change in the future. I was thinking to have most directories in the container under /var/lib/localstack, that way we can stick to one additional mount from the host (in addition to tmp), which simplifies docker compose configs that would otherwise need very fine grained mount binds to enable all the different directories.

The semantics of the tmp/shared tmp folders currently do not match what I described in the docstrings, but I think they should. We can discuss and iterate :)

The PR also changes the references throughout the code from the config constants to the global dirs object

Update November 26

The revised layout is:

static_libs: container only; binaries and libraries statically packaged with the image
var_libs:    shared; binaries and libraries+data computed at runtime: lazy-loaded binaries, ssl cert, ...
cache:       shared; ephemeral data that has to persist across localstack runs and reboots
tmp:         shared; ephemeral data that has to persist across localstack runs but not reboots
functions:   shared; volume to communicate between host<->lambda containers
data:        shared; holds localstack state, pods, ...
config:      host only; pre-defined configuration values, cached credentials, machine id, ...
init:        shared; user-defined provisioning scripts executed in the container when it starts
logs:        shared; log files produced by localstack
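To make the revised layout concrete, here is a minimal sketch of how such an object could be modelled. It is not the merged implementation: the class body, the `all_paths` helper, and the example paths for `static_libs` and `functions` are assumptions for illustration, while the remaining paths are taken from the defaults and TODO notes quoted elsewhere in this PR.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Directories:
    """Illustrative sketch of the revised layout (not the merged code)."""

    static_libs: str  # container only
    var_libs: str     # shared
    cache: str        # shared; persists across runs and reboots
    tmp: str          # shared; persists across runs but not reboots
    functions: str    # shared; host <-> lambda containers
    data: str         # shared; localstack state
    config: str       # host only
    init: str         # shared; provisioning scripts
    logs: str         # shared; log files

    def all_paths(self) -> List[str]:
        # hypothetical helper: every configured path, in declaration order
        return [getattr(self, f.name) for f in fields(self)]


# Example values; static_libs and functions are placeholders, not real defaults.
example = Directories(
    static_libs="/usr/lib/localstack",
    var_libs="/var/lib/localstack/libs",
    cache="/var/lib/localstack/cache",
    tmp="/tmp/localstack",
    functions="/var/lib/localstack/functions",
    data="/var/lib/localstack/data",
    config="/etc/localstack",
    init="/docker-entrypoint-initaws.d",
    logs="/var/lib/localstack/logs",
)
```

Keeping the object frozen means code that reads `example.cache` or `example.logs` cannot accidentally repoint a directory at runtime.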

coveralls · 2021-11-25T17:58:26Z

Coverage increased (+0.05%) to 81.487% when pulling d2a5839 on refactor-directory-config into c39a0c1 on master.

github-actions · 2021-11-25T17:58:47Z

LocalStack integration with Pro

    3 files ±0     3 suites ±0 30m 30s ⏱️ + 1m 17s
694 tests ±0 676 ✔️ ±0 18 💤 ±0 0 ❌ ±0
826 runs ±0 795 ✔️ ±0 31 💤 ±0 0 ❌ ±0

Results for commit d2a5839. ± Comparison against base commit c39a0c1.

♻️ This comment has been updated with latest results.

alexrashed

Thanks! That's a great step towards a more structured (and semantically clear) directory separation. :)

alexrashed · 2021-11-26T08:38:35Z

localstack/config.py

+    localstack container, some live only on the host and some only in the container.
+
+    Attributes:
+        infra:      container only; static infrastructure: binaries and libraries packaged with the image


Actually, I think the infra folder is a subtype of the cache. It just happens to be pre-seeded when building the image (basically to have cache-hits on the first-run), but it will also be loaded on-demand if any of it isn't there when needed. So I think a more proper description would be "runtime only / individual; static infrastructure: binaries and libraries packaged with the image or loaded on-demand."
If my assumption here is correct it might as well just be a subfolder of the cache (since newly fetched infra content should persist across runs and reboots).

Good point, agreed. Given that these are static files baked into the image (in any case - online and offline version of the image), perhaps something like static_infra would be appropriate.

On that point, actually - I also noticed that after some recent changes some of the startup hooks which are managing artifacts in the ./infra folder (e.g., Azure API specs) are now downloaded on each startup of Pro. So, that would be a good candidate for the cached infra folder.

i agree with the semantics. it's just a question of what you can mount into the container. if the static infra is a subtype of cache, you cannot mount the cache directory from the host, as that would hide the subdirectory that already exists in the container, right?

alexrashed · 2021-11-26T08:39:50Z

localstack/config.py

+            infra=INSTALL_DIR_INFRA,
+            var="/var/lib/localstack/libs",
+            cache="/var/lib/localstack/cache",
+            tmp=TMP_FOLDER,  # TODO: move to /tmp/localstack
+            shared_tmp=HOST_TMP_FOLDER,  # TODO: move to /var/lib/localstack/tmp
+            data=DATA_DIR,  # TODO: move to /var/lib/localstack/data
+            config="/etc/localstack/",  # TODO: will be introduced once .localstack config file has been refactored
+            logs="/var/lib/localstack/logs",
+            init="/docker-entrypoint-initaws.d",


The folders which have string values here, and are not assigned with an already existing static, are the ones which are not used yet, right?
Which basically means that this PR is non-breaking.

correct! the ones that have static values are not used anywhere. once you start using them, any user code that relies on hardcoding of paths to get logs for example would fail.

alexrashed · 2021-11-26T08:49:25Z

localstack/config.py

+        /var/lib/localstack. Returns Localstack directory paths as they are defined within the container. Everything
+        shared and writable lives in /var/lib/localstack or /tmp/localstack.


The directory structure with /var and /etc would be problematic when we consider moving to a more restrictive user configuration (i.e. using a specific lower-privileged / non-root user), since localstack itself would not be allowed to write there anymore. Should we move this to the home directory (of the user running localstack in host-mode, or of the localstack user in the container)? On the other hand, considering the nature of LocalStack with all its low-level services (DNS server,...) it might be really challenging to restrict its permissions while supporting the full feature-set.

i may have done a poor job communicating there: /var/ and /etc will only be used inside the container. the local mounts would be, e.g.:

~/.cache/localstack -> /var/lib/localstack
~/.localstack -> /etc/localstack (ro)
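As a rough illustration of that mapping, the sketch below turns the two host-to-container pairs into `docker run -v` arguments. The function name and the tuple layout are made up for this example; only the paths and the read-only flag come from the comment above.

```python
import os
from typing import List, Tuple

# (host path, container path, read-only), as described in the comment above
DEFAULT_MOUNTS: List[Tuple[str, str, bool]] = [
    ("~/.cache/localstack", "/var/lib/localstack", False),
    ("~/.localstack", "/etc/localstack", True),
]


def to_docker_volume_flags(mounts: List[Tuple[str, str, bool]]) -> List[str]:
    """Build `-v host:container[:ro]` arguments (hypothetical helper)."""
    flags: List[str] = []
    for host, container, read_only in mounts:
        # expand "~" so docker receives an absolute host path
        spec = f"{os.path.expanduser(host)}:{container}"
        if read_only:
            spec += ":ro"
        flags += ["-v", spec]
    return flags
```

Calling `to_docker_volume_flags(DEFAULT_MOUNTS)` yields two `-v` pairs, with the config mount marked `:ro`.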

Yeah, I know, I was thinking about restricting the permissions of localstack in the container itself. For example, OpenShift clusters do not allow any root uid usage (without explicitly configured security context constraints). So in the future it might be necessary to restrict localstack (also within the container) to not run as root.

thrau · 2021-11-26T11:18:24Z

an open question is how to deal with shared_tmp, which is very fuzzily defined.

it seems this is only used for the DOCKER_TASK_FOLDER, which is currently the folder in the lambda containers holding the lambda code /var/task. this is mounted from the HOST_TMP_FOLDER so /tmp/localstack -> /var/task (lambda container).
i would like to make this more explicit, but not sure how. is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack? because then we could introduce something like /var/lib/localstack/lambdas on the container, and selectively mount in the ~/.cache/localstack/lambda folder from the host into each lambda container.
is there a situation where the user needs to modify this data on the host? otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.

any thoughts @dfangl @whummer ?

whummer

Great changes @thrau 👍 . Added a few minor comments - the most critical to me is how to deal with the existing HOST_TMP_FOLDER, as it is sort of a special case (should not get created inside the container..)

whummer · 2021-11-26T09:52:04Z

localstack/config.py

+    localstack container, some live only on the host and some only in the container.
+
+    Attributes:
+        infra:      container only; static infrastructure: binaries and libraries packaged with the image


Good point, agreed. Given that these are static files baked into the image (in any case - online and offline version of the image), perhaps something like static_infra would be appropriate.

On that point, actually - I also noticed that after some recent changes some of the startup hooks which are managing artifacts in the ./infra folder (e.g., Azure API specs) are now downloaded on each startup of Pro. So, that would be a good candidate for the cached infra folder.

whummer · 2021-11-26T11:03:48Z

localstack/config.py

+            var=TMP_FOLDER,  # TODO: add variable
+            cache=TMP_FOLDER,  # TODO: add variable
+            tmp=TMP_FOLDER,  # TODO: move default value to /tmp/localstack/host
+            shared_tmp=HOST_TMP_FOLDER,  # TODO: move default value to /tmp/localstack/shared


Not sure about this one - HOST_TMP_FOLDER is currently only the mapped folder from the host (required for mounting code into spawned Lambda containers). There should not be a need for it to be accessed or created directly inside the container (specifically, should not be created in mkdirs() in line 120 below). It is currently only used for setting the volume mount flag (-v ...:...) when spawning Lambda containers, with LAMBDA_REMOTE_DOCKER=false enabled.

Maybe this is just a matter of nomenclature - the shared_ prefix already indicates that this is one end of a shared folder. Maybe we just need to make it a bit more clear that this is the external (on the host) part of the shared tmp (e.g., shared_tmp_host), whereas the current tmp folder is the corresponding internal part of the shared folder.

how about we make this explicitly a lambda dir, that's shared only between the host and lambdas? (see my comment below)

whummer · 2021-11-26T11:11:37Z

localstack/utils/bootstrap.py

+    target_dirs = Directories.for_container()
+
+    # default shared directories
+    for name in ["var", "cache", "init", "logs"]:


nit: Maybe we could define this list as string constants (or a static list of strings) on the Directories class. (to maintain the folder names only in a single place)
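A small sketch of what that suggestion might look like, assuming a class-level tuple named `DEFAULT_SHARED` (a hypothetical name) and a simplified stand-in for the real `Directories` class. The loop body mirrors the diff in localstack/utils/bootstrap.py quoted in this thread.

```python
from typing import List, Tuple


class Directories:
    # Assumed constant: names of the directories mounted from the host by default.
    DEFAULT_SHARED = ("var", "cache", "init", "logs")

    def __init__(self, **paths: str) -> None:
        # simplified stand-in: store each given path as an attribute
        for name, path in paths.items():
            setattr(self, name, path)


def default_volumes(source_dirs: Directories, target_dirs: Directories) -> List[Tuple[str, str]]:
    """Pair host and container paths for every default shared directory."""
    volumes = []
    for name in Directories.DEFAULT_SHARED:
        src = getattr(source_dirs, name, None)
        target = getattr(target_dirs, name, None)
        if src and target:  # skip names missing on either side
            volumes.append((src, target))
    return volumes
```

With the names defined once on the class, the bootstrap code and any future callers iterate over the same list instead of repeating string literals.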

whummer · 2021-11-26T11:12:33Z

localstack/utils/bootstrap.py

+        src = getattr(source_dirs, name, None)
+        target = getattr(target_dirs, name, None)
+        if src and target:
+            container.volumes.add((src, target))


Could we have overlapping volume mounts here that could become problematic, potentially?

yes that's definitely possible, should add a guard there.

actually we should just fall back to the behavior that docker provides for these cases, so if the user provides a wonky config then docker will complain.
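For reference, the guard that was briefly considered could look roughly like the sketch below, even though the thread settles on leaving the check to Docker. The function name is hypothetical, and it only inspects container-side targets given as POSIX paths.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def find_overlapping_targets(volumes: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (outer, inner) pairs where one mount target equals or contains another."""
    # Sorting puts a parent path before any of its children.
    targets = sorted(PurePosixPath(target) for _, target in volumes)
    overlaps = []
    for i, outer in enumerate(targets):
        for inner in targets[i + 1:]:
            if outer == inner or outer in inner.parents:
                overlaps.append((str(outer), str(inner)))
    return overlaps
```

Comparing path components rather than string prefixes avoids false positives such as `/var/lib/localstack` versus `/var/lib/localstack-extra`.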

dfangl · 2021-11-26T11:34:56Z

> an open question is how to deal with shared_tmp, which is very fuzzily defined.
>
> it seems this is only used for the DOCKER_TASK_FOLDER, which is currently the folder in the lambda containers holding the lambda code /var/task. this is mounted from the HOST_TMP_FOLDER so /tmp/localstack -> /var/task (lambda container). i would like to make this more explicit, but not sure how. is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack? because then we could introduce something like /var/lib/localstack/lambdas on the container, and selectively mount in the ~/.cache/localstack/lambda folder from the host into each lambda container. is there a situation where the user needs to modify this data on the host? otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.
>
> any thoughts @dfangl @whummer ?

I think in general caching would not be advisable in the long run, since a new folder is created for every created lambda, which can add up quickly.
If LAMBDA_REMOTE_DOCKER=1 (which is the default when run in docker), the code location of lambdas (wherever it might be) does not have to be exposed to the host, and I do not think there are cases where you would need to modify it from the host.
We definitely need to consider https://docs.localstack.cloud/tools/lambda-tools/hot-swapping/ though; in that case it needs to be mounted to the host correctly.

whummer · 2021-11-26T11:42:29Z

> it seems this is only used for the DOCKER_TASK_FOLDER, which is currently the folder in the lambda containers holding the lambda code /var/task. this is mounted from the HOST_TMP_FOLDER so /tmp/localstack -> /var/task (lambda container).
> i would like to make this more explicit, but not sure how. is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack? because then we could introduce something like /var/lib/localstack/lambdas on the container, and selectively mount in the ~/.cache/localstack/lambda folder from the host into each lambda container.

> is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack?

Yes, the lambda tmp folder will fill up with ephemeral data. If we add it to ~/.cache/localstack, we'd need to revisit some of our cleanup logic (currently Lambda folders can survive LS restarts, e.g., if the Lambda fails we keep the folder in certain circumstances). Which is not a big problem in /tmp, but should be considered if we move to ~/.cache. The other option would be to add a cleanup routine somewhere in the Lambda executor, to clean up some of the old Lambda handler folders (e.g., based on creation/modification timestamp - clean any folders that are older than X days).
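The timestamp-based cleanup mentioned here could be sketched as follows. This is an assumption-laden illustration, not existing Lambda executor code: the function name, the `max_age_days` parameter, and the choice of modification time over creation time are all hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_dirs(root: str, max_age_days: float, now: Optional[float] = None) -> List[str]:
    """Delete direct subfolders of `root` last modified more than `max_age_days` ago.

    Returns the sorted names of the removed folders.
    """
    current = now if now is not None else time.time()
    cutoff = current - max_age_days * 86400  # seconds per day
    removed: List[str] = []
    root_path = Path(root)
    if not root_path.is_dir():
        return removed
    for child in root_path.iterdir():
        # only real directories; never follow symlinks out of the root
        if child.is_dir() and not child.is_symlink() and child.stat().st_mtime < cutoff:
            shutil.rmtree(child, ignore_errors=True)
            removed.append(child.name)
    return sorted(removed)
```

Such a routine would only be needed if the Lambda folders move out of /tmp into a location that survives reboots.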

> is there a situation where the user needs to modify this data on the host? otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.

Probably not. Modifying Lambda code on the host would only be relevant for the special S3 bucket name __local__ which mounts code into the Lambda containers, but not in any of the tmp lambda folders. 👍

> otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.

The shared docker volume is an interesting idea - I think we should definitely explore that 👍 . This could help us alleviate some of the issues that users are currently facing if HOST_TMP_FOLDER is not properly configured. The more I think about this, the more I like this idea, actually!

whummer · 2021-11-27T13:13:34Z

localstack/config.py

+            self.var_libs,
+            self.cache,
+            self.tmp,
+            self.functions,


@thrau I'm still not 100% clear on the naming of the functions config. Currently (prior to us moving to a Docker volume), this still corresponds to HOST_TMP_FOLDER - i.e., we should probably remove it here from this list (should not get created inside the container). (Note that this config could also be something like C:\temp under Windows.)

is it bad that this is created in the localstack container? i'm not too familiar with the way lambdas and in particular the code mounting works, but it seems to me to make sense that the localstack container can also access the shared lambda volume?

add Directories config object

debd495

thrau temporarily deployed to localstack-ext-tests November 25, 2021 17:18 Inactive

add more paths and better documentation

f555325

thrau temporarily deployed to localstack-ext-tests November 25, 2021 19:40 Inactive

thrau requested review from alexrashed and whummer November 25, 2021 22:40

alexrashed approved these changes Nov 26, 2021

whummer reviewed Nov 26, 2021

revise folder structure

d2a5839

thrau temporarily deployed to localstack-ext-tests November 26, 2021 12:46 Inactive

thrau merged commit b21178e into master Nov 26, 2021

thrau deleted the refactor-directory-config branch November 26, 2021 13:22

whummer reviewed Nov 27, 2021

thrau mentioned this pull request Jun 19, 2022

add new localstack filesystem hierarchy #6302

Merged
