add Directories config object #5011

Merged
merged 3 commits into from
Nov 26, 2021
@thrau (Member) commented Nov 25, 2021

this PR introduces a Directories object within the config that holds the localstack directories that we defined as follows:

  • infra: static infra (pre-seeded packages)
  • var: variable data (shared between host and container and should persist between reboots, e.g., lazy-loaded packages, ssl cert, decryption key, ...) -> semantics change to "static" in the offline image
  • shared_tmp: shared temp folder (shared between host and container, for things like temp lambda code files. needs to be available to e.g., lambda containers)
  • tmp: temp folder (ephemeral data, can be mounted from the host, but doesn't have to be)
  • data: data dir (has to be persistent on the host, shared with the container)
  • config: ~/.localstack/ (host configuration directory with, e.g., pre-configured environment variables or credentials)
  • init: user-defined container provisioning scripts
  • logs: log files

currently, most of these directories map to TMP_DIR, but that should change in the future. I was thinking of putting most directories in the container under /var/lib/localstack; that way we can stick to one additional mount from the host (in addition to tmp), which simplifies docker compose configs that would otherwise need very fine-grained bind mounts to enable all the different directories.

The semantics of the tmp/shared_tmp folders are currently not what I described in the docstrings, but I think they should be. We can discuss and iterate :)

The PR also changes the references throughout the code from the config constants to the global dirs object.

Update November 26

The revised layout is:

static_libs: container only; binaries and libraries statically packaged with the image
var_libs:    shared; binaries and libraries+data computed at runtime: lazy-loaded binaries, ssl cert, ...
cache:       shared; ephemeral data that has to persist across localstack runs and reboots
tmp:         shared; ephemeral data that has to persist across localstack runs but not reboots
functions:   shared; volume to communicate between host<->lambda containers
data:        shared; holds localstack state, pods, ...
config:      host only; pre-defined configuration values, cached credentials, machine id, ...
init:        shared; user-defined provisioning scripts executed in the container when it starts
logs:        shared; log files produced by localstack
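
To make the revised layout concrete, here is a minimal sketch of how such a directories object might look. It is not the code merged in this PR: the class shape, the `SHARED` tuple, the `shared()` helper, and every concrete path (in particular the ones for `static_libs`, `var_libs`, `functions`, and `data`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Directories:
    """Illustrative holder for the directory layout listed above."""

    static_libs: Optional[str]
    var_libs: Optional[str]
    cache: Optional[str]
    tmp: Optional[str]
    functions: Optional[str]
    data: Optional[str]
    config: Optional[str]
    init: Optional[str]
    logs: Optional[str]

    # Names described as "shared" in the layout above (not a dataclass field).
    SHARED = ("var_libs", "cache", "tmp", "functions", "data", "init", "logs")

    @classmethod
    def for_container(cls) -> "Directories":
        # Placeholder paths, assumed for this sketch only.
        root = "/var/lib/localstack"
        return cls(
            static_libs="/usr/lib/localstack",
            var_libs=f"{root}/lib",
            cache=f"{root}/cache",
            tmp="/tmp/localstack",
            functions=f"{root}/functions",
            data=f"{root}/data",
            config=None,  # listed as host only, so left unset in the container
            init="/docker-entrypoint-initaws.d",
            logs=f"{root}/logs",
        )

    def shared(self) -> Dict[str, str]:
        """Return the shared directories that have a path set."""
        return {n: getattr(self, n) for n in self.SHARED if getattr(self, n)}
```

Keeping the shared names in one place on the class means the mount logic and the documentation cannot drift apart.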

@thrau thrau temporarily deployed to localstack-ext-tests November 25, 2021 17:18 Inactive
@coveralls commented Nov 25, 2021

Coverage Status

Coverage increased (+0.05%) to 81.487% when pulling d2a5839 on refactor-directory-config into c39a0c1 on master.

@github-actions bot commented Nov 25, 2021

LocalStack integration with Pro

  3 files  ±0    3 suites  ±0   30m 30s ⏱️ +1m 17s
694 tests  ±0  676 ✔️ ±0   18 💤 ±0   0 ❌ ±0
826 runs   ±0  795 ✔️ ±0   31 💤 ±0   0 ❌ ±0

Results for commit d2a5839. ± Comparison against base commit c39a0c1.

♻️ This comment has been updated with latest results.

@thrau thrau temporarily deployed to localstack-ext-tests November 25, 2021 19:40 Inactive
@thrau thrau requested review from alexrashed and whummer November 25, 2021 22:40
@alexrashed (Member) left a comment

Thanks! That's a great step towards a more structured (and semantically clear) directory separation. :)

localstack container, some live only on the host and some only in the container.

Attributes:
infra: container only; static infrastructure: binaries and libraries packaged with the image

Member

Actually, I think the infra folder is a subtype of the cache. It just happens to be pre-seeded when building the image (basically to have cache-hits on the first-run), but it will also be loaded on-demand if any of it isn't there when needed. So I think a more proper description would be "runtime only / individual; static infrastructure: binaries and libraries packaged with the image or loaded on-demand."
If my assumption here is correct it might as well just be a subfolder of the cache (since newly fetched infra content should persist across runs and reboots).

Member

Good point, agreed. Given that these are static files baked into the image (in any case - online and offline version of the image), perhaps something like static_infra would be appropriate.

On that point, actually - I also noticed that after some recent changes some of the startup hooks which are managing artifacts in the ./infra folder (e.g., Azure API specs) are now downloaded on each startup of Pro. So, that would be a good candidate for the cached infra folder.

Member Author

i agree with the semantics. it's just a question of what you can mount into the container. if the static infra is a subtype of cache, you cannot mount the cache directory from the host, as that would hide the subdirectory that already exists in the container, right?

Comment on lines 109 to 117
infra=INSTALL_DIR_INFRA,
var="/var/lib/localstack/libs",
cache="/var/lib/localstack/cache",
tmp=TMP_FOLDER, # TODO: move to /tmp/localstack
shared_tmp=HOST_TMP_FOLDER, # TODO: move to /var/lib/localstack/tmp
data=DATA_DIR, # TODO: move to /var/lib/localstack/data
config="/etc/localstack/", # TODO: will be introduced once .localstack config file has been refactored
logs="/var/lib/localstack/logs",
init="/docker-entrypoint-initaws.d",

Member

The folders that have string values here, rather than being assigned from an already existing constant, are the ones that are not used yet, right?
That basically means this PR is non-breaking.

Member Author

correct! the ones with hardcoded string values are not used anywhere yet. once we start using them, any user code that relies on hardcoded paths (to get logs, for example) would fail.

Comment on lines +103 to +104
/var/lib/localstack. Returns Localstack directory paths as they are defined within the container. Everything
shared and writable lives in /var/lib/localstack or /tmp/localstack.

Member

The directory structure with /var and /etc would be problematic when we consider moving to a more restrictive user configuration (i.e., using a specific lower-privileged / non-root user), since localstack itself would not be allowed to write there anymore. Should we move this to the home directory (of the user running localstack in host mode, or of the localstack user in the container)? On the other hand, considering the nature of LocalStack with all its low-level services (DNS server, ...), it might be really challenging to restrict its permissions while supporting the full feature set.

Member Author

i may have done a poor job communicating there: /var/ and /etc will only be used inside the container. the local mounts would be, e.g.:
~/.cache/localstack -> /var/lib/localstack
~/.localstack -> /etc/localstack (ro)
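
As a rough illustration of the two mounts described in this comment, the sketch below turns host-to-container pairs into `docker run` volume flags. The `MOUNTS` list and the `volume_flags` helper are hypothetical names introduced here; only the `-v host:container:mode` flag syntax comes from Docker itself.

```python
import os

# Host -> container mounts as described in the comment above.
MOUNTS = [
    ("~/.cache/localstack", "/var/lib/localstack", "rw"),
    ("~/.localstack", "/etc/localstack", "ro"),
]


def volume_flags(mounts, home=None):
    """Build `docker run` volume flags from (host, container, mode) tuples."""
    flags = []
    for host, target, mode in mounts:
        if home is not None and host.startswith("~"):
            # Use the explicitly supplied home directory.
            host_path = home + host[1:]
        else:
            # Fall back to the home directory of the current user.
            host_path = os.path.expanduser(host)
        flags += ["-v", f"{host_path}:{target}:{mode}"]
    return flags
```

With this shape, a compose file or CLI wrapper only needs one writable mount plus the read-only config mount, which is the simplification the PR description argues for.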

@alexrashed (Member), Nov 26, 2021

Yeah, I know, I was thinking about restricting the permissions of localstack in the container itself. For example, OpenShift clusters do not allow running as the root uid (unless security context constraints are explicitly configured). So in the future it might be necessary to restrict localstack (also within the container) to not run as root.

@thrau (Member Author) commented Nov 26, 2021

an open question is how to deal with shared_tmp, which is very fuzzily defined.

it seems this is only used for the DOCKER_TASK_FOLDER, which is currently the folder in the lambda containers holding the lambda code /var/task. this is mounted from the HOST_TMP_FOLDER so /tmp/localstack -> /var/task (lambda container).
i would like to make this more explicit, but not sure how. is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack? because then we could introduce something like /var/lib/localstack/lambdas on the container, and selectively mount in the ~/.cache/localstack/lambda folder from the host into each lambda container.
is there a situation where the user needs to modify this data on the host? otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.

any thoughts @dfangl @whummer ?

@whummer (Member) left a comment

Great changes @thrau 👍. Added a few minor comments. The most critical one to me is how to deal with the existing HOST_TMP_FOLDER, as it is sort of a special case (it should not get created inside the container).

var=TMP_FOLDER, # TODO: add variable
cache=TMP_FOLDER, # TODO: add variable
tmp=TMP_FOLDER, # TODO: move default value to /tmp/localstack/host
shared_tmp=HOST_TMP_FOLDER, # TODO: move default value to /tmp/localstack/shared

Member

Not sure about this one - HOST_TMP_FOLDER is currently only the mapped folder from the host (required for mounting code into spawned Lambda containers). There should not be a need for it to be accessed or created directly inside the container (specifically, should not be created in mkdirs() in line 120 below). It is currently only used for setting the volume mount flag (-v ...:...) when spawning Lambda containers, with LAMBDA_REMOTE_DOCKER=false enabled.

Maybe this is just a matter of nomenclature - the shared_ prefix already indicates that this is one end of a shared folder. Maybe we just need to make it a bit more clear that this is the external (on the host) part of the shared tmp (e.g., shared_tmp_host), whereas the current tmp folder is the corresponding internal part of the shared folder.

Member Author

how about we make this explicitly a lambda dir, that's shared only between the host and lambdas? (see my comment below)

target_dirs = Directories.for_container()

# default shared directories
for name in ["var", "cache", "init", "logs"]:

Member

nit: Maybe we could define this list as string constants (or a static list of strings) on the Directories class. (to maintain the folder names only in a single place)

src = getattr(source_dirs, name, None)
target = getattr(target_dirs, name, None)
if src and target:
container.volumes.add((src, target))

Member

Could we have overlapping volume mounts here that could become problematic, potentially?

Member Author

yes that's definitely possible, should add a guard there.

Member Author

actually we should just fall back to the behavior that docker provides for these cases, so if the user provides a wonky config then docker will complain.
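
For reference, the guard that was considered and then dropped in favor of Docker's own behavior could look roughly like the sketch below. `find_overlapping_targets` is a hypothetical helper, and it assumes volumes are `(source, target)` tuples like the ones added to `container.volumes` in the diff.

```python
from itertools import combinations
from pathlib import PurePosixPath


def find_overlapping_targets(volumes):
    """Return pairs of container paths that are equal or nested in each other."""
    targets = [PurePosixPath(target) for _, target in volumes]
    overlaps = []
    for a, b in combinations(targets, 2):
        # Overlap means identical paths, or one path being an ancestor of the other.
        if a == b or a in b.parents or b in a.parents:
            overlaps.append((str(a), str(b)))
    return overlaps
```

A caller could log the returned pairs as a warning instead of failing, which would keep the final decision with Docker as suggested in the thread.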

@dfangl (Member) commented Nov 26, 2021

> an open question is how to deal with shared_tmp, which is very fuzzily defined.
>
> it seems this is only used for the DOCKER_TASK_FOLDER, which is currently the folder in the lambda containers holding the lambda code /var/task. this is mounted from the HOST_TMP_FOLDER so /tmp/localstack -> /var/task (lambda container). i would like to make this more explicit, but not sure how. is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack? because then we could introduce something like /var/lib/localstack/lambdas on the container, and selectively mount in the ~/.cache/localstack/lambda folder from the host into each lambda container. is there a situation where the user needs to modify this data on the host? otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.
>
> any thoughts @dfangl @whummer ?

I think in general caching would not be advisable in the long run, since a new folder is created for every created lambda, which can add up quickly.
If LAMBDA_REMOTE_DOCKER=1 (which is the default when run in docker), the code location of lambdas (wherever it might be) does not have to be exposed to the host. I do not think there are cases where you would need to modify this from the host.
We definitely need to consider https://docs.localstack.cloud/tools/lambda-tools/hot-swapping/ though; in that case it needs to be mounted to the host correctly.

@whummer (Member) commented Nov 26, 2021

> it seems this is only used for the DOCKER_TASK_FOLDER, which is currently the folder in the lambda containers holding the lambda code /var/task. this is mounted from the HOST_TMP_FOLDER so /tmp/localstack -> /var/task (lambda container).
> i would like to make this more explicit, but not sure how. is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack? because then we could introduce something like /var/lib/localstack/lambdas on the container, and selectively mount in the ~/.cache/localstack/lambda folder from the host into each lambda container.

> is this a folder that will quickly fill up with ephemeral data? then it makes sense to keep this in the /tmp/localstack folder, otherwise we can think about putting it into ~/.cache/localstack?

Yes, the lambda tmp folder will fill up with ephemeral data. If we add it to ~/.cache/localstack, we'd need to revisit some of our cleanup logic (currently Lambda folders can survive LS restarts, e.g., if the Lambda fails we keep the folder in certain circumstances). Which is not a big problem in /tmp, but should be considered if we move to ~/.cache. The other option would be to add a cleanup routine somewhere in the Lambda executor, to clean up some of the old Lambda handler folders (e.g., based on creation/modification timestamp - clean any folders that are older than X days).

> is there a situation where the user needs to modify this data on the host? otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.

Probably not. Modifying Lambda code on the host would only be relevant for the special S3 bucket name __local__ which mounts code into the Lambda containers, but not in any of the tmp lambda folders. 👍

> otherwise we can think about simply creating a shared docker volume and circumvent the HOST_TMP_DIRECTORY in the first place.

The shared docker volume is an interesting idea - I think we should definitely explore that 👍 . This could help us alleviate some of the issues that users are currently facing if HOST_TMP_FOLDER is not properly configured. The more I think about this, the more I like this idea, actually!
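
The timestamp-based cleanup routine mentioned earlier in this comment could be sketched as follows. This is only an assumption about how such a routine might work: `remove_stale_dirs` is a hypothetical helper, it looks at direct subdirectories only, and it uses the modification time as the age criterion.

```python
import os
import shutil
import time


def remove_stale_dirs(root, max_age_days, now=None):
    """Delete direct subdirectories of `root` not modified within `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    # Collect candidates first so nothing is deleted while the scan is running.
    with os.scandir(root) as entries:
        stale = [
            entry.path
            for entry in entries
            if entry.is_dir(follow_symlinks=False)
            and entry.stat(follow_symlinks=False).st_mtime < cutoff
        ]

    for path in stale:
        shutil.rmtree(path)
    return sorted(os.path.basename(path) for path in stale)
```

Something like `remove_stale_dirs("/tmp/localstack", max_age_days=7)` could then run on startup; both the path and the threshold here are placeholders, and files directly under the root are left untouched.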

@thrau thrau temporarily deployed to localstack-ext-tests November 26, 2021 12:46 Inactive
@thrau thrau merged commit b21178e into master Nov 26, 2021
@thrau thrau deleted the refactor-directory-config branch November 26, 2021 13:22
self.var_libs,
self.cache,
self.tmp,
self.functions,

@whummer (Member), Nov 27, 2021

@thrau I'm still not 100% clear on the naming of the functions config. Currently (prior to us moving to a Docker volume), this still corresponds to HOST_TMP_FOLDER - i.e., we should probably remove it here from this list (should not get created inside the container). (Note that this config could also be something like C:\temp under Windows.)

Member Author

is it bad that this is created in the localstack container? i'm not too familiar with the way lambdas and in particular the code mounting works, but it seems to me to make sense that the localstack container can also access the shared lambda volume?
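
If the outcome of this question is that some directories should not be created inside the container, one possible shape for that is a directory-creation helper with an explicit skip list. The sketch below is hypothetical: the function signature and the `skip` parameter are assumptions and do not reflect the `mkdirs()` method in the merged code.

```python
import os


def mkdirs(paths, skip=()):
    """Create every named directory except unset ones and those listed in `skip`."""
    created = []
    for name, path in paths.items():
        if name in skip or not path:
            continue
        # exist_ok avoids errors when the directory is already mounted or present.
        os.makedirs(path, exist_ok=True)
        created.append(name)
    return created
```

Calling `mkdirs(dirs, skip=("functions",))` would leave the host-side Lambda folder alone, while calling it without `skip` keeps the behavior thrau describes, where the localstack container can also access the shared folder.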
