@malob
Contributor

@malob malob commented Feb 4, 2020

No description provided.

@joshmgross joshmgross added documentation Improvements or additions to documentation help wanted Extra attention is needed labels Feb 19, 2020
@fosskers

Looks like there's a conflict eh.

@malob
Contributor Author

malob commented Mar 27, 2020

@fosskers, rebased off master so conflict is now fixed.

examples.md Outdated
  name: Cache .stack-work
  with:
    path: .stack-work
    key: ${{ runner.os }}-stack-work-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}-${{ '**/*.hs' }}

@andys8 andys8 Mar 29, 2020

.hs files are source files. Is it a good or bad idea to create a hash over all sources?

With the restore keys, other hashes should still match. But will this create lots of caches and lead to issues?

Actually yes, I'd disagree with that as well. In my own post on the matter I recommend just considering the hash of the stack.yaml.

Interesting :) What about package.yaml? All dependencies are listed there (it seems to be the equivalent of package.json or pom.xml). My stack.yaml(.lock) mainly contains the resolver. With restore-keys it'll fall back to a cache that is defined by the stack.yaml. Would you advise against using package.yaml (with hpack)?

Btw. I wanted to ask for feedback in my own PR, but mixed those two up ☺️ #236 (comment)
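
One way to get the fallback described here is to list restore keys from most to least specific. A minimal sketch, assuming the ~/.stack cache and the `stack-global` key prefix from this PR's example; the two-tier `restore-keys` list is an illustration and not something the PR itself contains:

```yaml
- uses: actions/cache@v1   # version tag copied from the PR's example
  name: Cache ~/.stack
  with:
    path: ~/.stack
    # Exact key: changes when either stack.yaml or package.yaml changes.
    key: ${{ runner.os }}-stack-global-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}
    # Fallbacks, tried in order: first a cache built with the same stack.yaml,
    # then any ~/.stack cache for this OS.
    restore-keys: |
      ${{ runner.os }}-stack-global-${{ hashFiles('stack.yaml') }}-
      ${{ runner.os }}-stack-global-
```

On a miss for the exact key, the action restores the most recent cache whose key starts with the first matching prefix, and a new cache is saved under the exact key if the job succeeds.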

Contributor Author

@malob malob Mar 30, 2020

@fosskers, your post describes caching the ~/.stack folder, and I definitely agree having hashes in the key related to source files doesn't make any sense for that. The example I'm adding with this PR has a cache for both the global ~/.stack folder, and the project's .stack-work folder.

For the global cache (~/.stack), I agree with @andys8 that both stack.yaml and package.yaml should be included in the key, since changes to either will affect that cache.

For the project cache (.stack-work), I'm pretty sure we want to include all source files in the key since changes to those files are the main thing that change the contents of that cache.

Contributor Author

Perhaps there's a discussion to be had though about whether the example should include a cache for .stack-work.

I definitely find it helpful to include in my workflows, especially for large projects. I'm very keen for my workflows to run as quickly as possible, and not caching .stack-work means the workflow will always have to recompile the project from scratch.

If we are going to cache .stack-work then having the key include a hash of all the source files seems like the right call, since the contents of the cache will likely change if any of those files change. If we only include hashes for the .yaml configuration files, then the cache will only update when those files change, which is much less often than changes to the source files. As a result, the cache will become stale pretty quickly, and workflow runtimes will increase, since more of the project will need to be recompiled on each workflow run.

Contributor Author

Just looked through my open PR and realized this was still pending. Any thoughts on my comments?

gasi added a commit to zoomhub/zoomhub that referenced this pull request Feb 28, 2021
gasi added a commit to zoomhub/zoomhub that referenced this pull request Feb 28, 2021
gasi added a commit to zoomhub/zoomhub that referenced this pull request Sep 6, 2021
@vsvipul
Contributor

vsvipul commented Feb 21, 2022

@malob Looks good. I also have the same concern as @fosskers . Why do we need to include all source files in the key? If you could point to haskell docs which say that we need to compulsorily do that and we can't just use package.yaml and stack.yaml, that'll be great. I can review this further after getting that info.

@malob
Contributor Author

malob commented Feb 21, 2022

@vsvipul, thanks for bumping this :)

If you could point to haskell docs which say that we need to compulsorily do that and we can't just use package.yaml and stack.yaml, that'll be great.

The Stack User Guide is the best official documentation I could quickly find, but unfortunately it doesn't go into much depth about what the project-specific working directory (which defaults to .stack-work in the project root) contains.

None of this is compulsory, i.e., nothing will break/fail if the cache of the .stack-work folder is only updated when the package.yaml or stack.yaml files change, just like nothing will break/fail if nothing is ever cached at all. However, the value of caching .stack-work (where the value of caching anything in this context comes from the reduction in the time it takes for the workflow to build a Stack project) will be substantially diminished by not including the source files in the key, since changes to source files will invalidate some or all of the cache, even if the package.yaml or stack.yaml files are never changed. In fact, not including the source files in the key for the .stack-work cache could result in the workflow taking longer to run (compared to not caching .stack-work at all) in the case where most or all of the cache is invalid, since the cache action itself could very well take longer to run than the time saved by having the outdated cache present.

So my overall take here is something like (mostly restating what I said in my previous comments):

  1. the usefulness of caches comes down to the time they save;
  2. the time loading a cache saves you is directly related to how up to date that cache is;
  3. keeping a cache up to date requires updating that cache whenever files that affect the contents of that cache are modified;
  4. for Stack projects, changes to source files are one of the most frequent (if not the most frequent) reasons the contents of .stack-work change; so
  5. it seems to me like it doesn't make much sense to cache .stack-work unless source files are included in the key.

It may very well be the case that some folks won't want to cache .stack-work because of how frequently it will be updated; for large projects that could use up the available cache space quickly. It may therefore make sense to add some text to that effect alongside the example. But if the example includes caching .stack-work (which I think it should, since it can substantially reduce the time it takes workflows that build Stack projects to run), then I think including the source files in the key is the right way to go.


Maybe a concrete example would be helpful?

Here's a small Haskell project of mine: https://github.com/malob/prefmanager.

The first use of the cache action in my example in this PR would create a cache for the global ~/.stack folder:

cache/examples.md

Lines 106 to 112 in 44222d2

- uses: actions/cache@v1
  name: Cache ~/.stack
  with:
    path: ~/.stack
    key: ${{ runner.os }}-stack-global-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}
    restore-keys: |
      ${{ runner.os }}-stack-global-

This cache will contain the GHC compiler required to build the project, as well as built versions of the external dependencies of the project (and some other stuff). The contents of this directory will only change if I make changes to the stack.yaml or package.yaml files.

For example:

  • If I add a new package to the dependency list in package.yaml, that package (and its dependencies) will be added to this cache.
  • If I change the resolver in stack.yaml, the version of GHC in the cache might change, as well as the versions of the external packages my project depends on.

(I think we all agree about the above, but I'm just including it for completeness.)

The second use of the cache action in my example in this PR would create a cache for the project's .stack-work folder:

cache/examples.md

Lines 113 to 119 in 44222d2

- uses: actions/cache@v1
  name: Cache .stack-work
  with:
    path: .stack-work
    key: ${{ runner.os }}-stack-work-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}-${{ '**/*.hs' }}
    restore-keys: |
      ${{ runner.os }}-stack-work-

This is the working directory that Stack uses when building my project, and it contains build artifacts for the project.
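
As quoted, the last segment of this key, `${{ '**/*.hs' }}`, is a string literal, so it would expand to the text `**/*.hs` rather than to a hash of the source files. A sketch of the key that the surrounding discussion describes, assuming `hashFiles` was intended for that segment:

```yaml
- uses: actions/cache@v1   # version tag copied from the quoted snippet
  name: Cache .stack-work
  with:
    path: .stack-work
    # Assumption: the final segment hashes every .hs file in the repository,
    # so the key changes whenever any Haskell source file changes.
    key: ${{ runner.os }}-stack-work-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}-${{ hashFiles('**/*.hs') }}
    # Fallback: restore the most recent .stack-work cache for this OS.
    restore-keys: |
      ${{ runner.os }}-stack-work-
```

With this form, a push that touches any source file misses the exact key, restores the latest cache through the `restore-keys` prefix, and saves a fresh cache after a successful job.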

I just built the prefmanager project on my machine. This is what the folder contains:

❯ tree .stack-work/
.stack-work/
├── dist
│   └── aarch64-osx-nix
│       └── Cabal-3.2.1.0
│           ├── build
│           │   ├── Defaults
│           │   │   ├── Pretty.dyn_hi
│           │   │   ├── Pretty.dyn_o
│           │   │   ├── Pretty.hi
│           │   │   ├── Pretty.o
│           │   │   ├── Types.dyn_hi
│           │   │   ├── Types.dyn_o
│           │   │   ├── Types.hi
│           │   │   └── Types.o
│           │   ├── Defaults.dyn_hi
│           │   ├── Defaults.dyn_o
│           │   ├── Defaults.hi
│           │   ├── Defaults.o
│           │   ├── Paths_prefmanager.dyn_hi
│           │   ├── Paths_prefmanager.dyn_o
│           │   ├── Paths_prefmanager.hi
│           │   ├── Paths_prefmanager.o
│           │   ├── Prelude.dyn_hi
│           │   ├── Prelude.dyn_o
│           │   ├── Prelude.hi
│           │   ├── Prelude.o
│           │   ├── autogen
│           │   │   ├── Paths_prefmanager.hs
│           │   │   └── cabal_macros.h
│           │   ├── libHSprefmanager-0.1.0.0-1iWRVnWMmZMJ686FXZvjSD-ghc8.10.7.dylib
│           │   ├── libHSprefmanager-0.1.0.0-1iWRVnWMmZMJ686FXZvjSD.a
│           │   └── prefmanager
│           │       ├── autogen
│           │       │   ├── Paths_prefmanager.hs
│           │       │   └── cabal_macros.h
│           │       ├── prefmanager
│           │       └── prefmanager-tmp
│           │           ├── Main.hi
│           │           ├── Main.o
│           │           ├── Paths_prefmanager.hi
│           │           └── Paths_prefmanager.o
│           ├── build-lock
│           ├── package.conf.inplace
│           │   ├── package.cache
│           │   ├── package.cache.lock
│           │   └── prefmanager-0.1.0.0-1iWRVnWMmZMJ686FXZvjSD.conf
│           ├── setup-config
│           ├── stack-build-caches
│           │   └── 22db420d881b88e3a9fab458cd4f7448ded2894604a37ea67e13cbd8a3d2f2b6
│           │       ├── exe-prefmanager
│           │       └── lib
│           ├── stack-cabal-mod
│           └── stack-setup-config-mod
├── install
│   └── aarch64-osx-nix
│       └── 22db420d881b88e3a9fab458cd4f7448ded2894604a37ea67e13cbd8a3d2f2b6
│           └── 8.10.7
│               ├── bin
│               │   └── prefmanager
│               ├── doc
│               │   └── prefmanager-0.1.0.0
│               │       └── LICENSE
│               ├── lib
│               │   └── aarch64-osx-ghc-8.10.7
│               │       ├── libHSprefmanager-0.1.0.0-1iWRVnWMmZMJ686FXZvjSD-ghc8.10.7.dylib
│               │       └── prefmanager-0.1.0.0-1iWRVnWMmZMJ686FXZvjSD
│               │           ├── Defaults
│               │           │   ├── Pretty.dyn_hi
│               │           │   ├── Pretty.hi
│               │           │   ├── Types.dyn_hi
│               │           │   └── Types.hi
│               │           ├── Defaults.dyn_hi
│               │           ├── Defaults.hi
│               │           ├── Paths_prefmanager.dyn_hi
│               │           ├── Paths_prefmanager.hi
│               │           ├── Prelude.dyn_hi
│               │           ├── Prelude.hi
│               │           └── libHSprefmanager-0.1.0.0-1iWRVnWMmZMJ686FXZvjSD.a
│               └── pkgdb
│                   ├── package.cache
│                   ├── package.cache.lock
│                   └── prefmanager-0.1.0.0-1iWRVnWMmZMJ686FXZvjSD.conf
├── stack.sqlite3
└── stack.sqlite3.pantry-write-lock

If I make changes to package.yaml or stack.yaml like I mentioned above, some or maybe all of the contents of this directory will change since a rebuild will be required due to changes to external package dependencies or the compiler used to build the project.

However, the contents of this directory will also change whenever I change a source file in my project. In this project, no other source files depend on app/Main.hs, so none of the other modules will be rebuilt if I make a change to Main.

For example, I just added a new function to Main, then rebuilt the project, and only the following files in .stack-work changed:

.stack-work/dist/aarch64-osx-nix/Cabal-3.2.1.0/build/prefmanager/prefmanager
.stack-work/dist/aarch64-osx-nix/Cabal-3.2.1.0/build/prefmanager/prefmanager-tmp/Main.hi
.stack-work/dist/aarch64-osx-nix/Cabal-3.2.1.0/build/prefmanager/prefmanager-tmp/Main.o
.stack-work/dist/aarch64-osx-nix/Cabal-3.2.1.0/stack-build-caches/22db420d881b88e3a9fab458cd4f7448ded2894604a37ea67e13cbd8a3d2f2b6/exe-prefmanager
.stack-work/install/aarch64-osx-nix/22db420d881b88e3a9fab458cd4f7448ded2894604a37ea67e13cbd8a3d2f2b6/8.10.7/bin/prefmanager

So let's say I set up this project on GitHub with a workflow that included the two cache actions from my example in this PR, as well as a step that builds the project using Stack. The first time I pushed my code to GitHub, Stack would do all the work to set up the ~/.stack folder (download the required version of GHC, build all the required external dependencies, etc.), then build my project, which would result in .stack-work being generated containing all the build artifacts for my code, and the cache action would cache those two folders.
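
A minimal sketch of such a workflow, for illustration only. The workflow name, trigger, runner image, and checkout step are assumptions, as is the `stack` executable already being available on the runner; the `hashFiles('**/*.hs')` segment assumes that a hash of the source files is what the example's key is meant to contain:

```yaml
# Hypothetical workflow file, e.g. .github/workflows/build.yml
name: build
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # Global cache: GHC and built external dependencies.
      - uses: actions/cache@v1
        name: Cache ~/.stack
        with:
          path: ~/.stack
          key: ${{ runner.os }}-stack-global-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}
          restore-keys: |
            ${{ runner.os }}-stack-global-

      # Project cache: build artifacts for the project's own modules.
      - uses: actions/cache@v1
        name: Cache .stack-work
        with:
          path: .stack-work
          key: ${{ runner.os }}-stack-work-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}-${{ hashFiles('**/*.hs') }}
          restore-keys: |
            ${{ runner.os }}-stack-work-

      # Assumes stack is already on the runner's PATH.
      - name: Build
        run: stack build
```

The action version tags mirror the ones quoted in this thread; a workflow written today would likely pin newer releases.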

If I then pushed a new commit that only contained changes to the Defaults.Types module, Defaults.Pretty, Defaults, and Main would all be rebuilt since they depend either directly or indirectly on Defaults.Types, and the cache action would update the .stack-work cache.

If I then pushed another commit that only made a change to Main, then only Main would be rebuilt, since the .stack-work folder that the cache action put into place would already contain the correct build artifacts for all the other modules I just mentioned.

However, if the cache action for .stack-work didn't contain all the project's source files in the key, then the cache of .stack-work would not have been updated after the first commit (that only made changes to Defaults.Types) was pushed, and so the entire project would be rebuilt when the second commit (that only made changes to Main) was pushed.
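
For contrast, a sketch of the config-only variant that this paragraph argues against. It is hypothetical and not part of the PR:

```yaml
- uses: actions/cache@v1   # version tag copied from the PR's example
  name: Cache .stack-work (config-only key)
  with:
    path: .stack-work
    # This key stays the same until stack.yaml or package.yaml changes.
    # While it matches exactly, the action restores the existing cache and
    # does not save a new one, so artifacts built after later source changes
    # never reach the cache.
    key: ${{ runner.os }}-stack-work-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}
    restore-keys: |
      ${{ runner.os }}-stack-work-
```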


In the above example, that really won't be a big deal since this is a very small project, and so rebuilding the whole project takes very little time, but for much larger projects this can have a much bigger impact.

Imagine a project with dozens or hundreds of source files. If someone pushed a commit that included a change to a source file that required most or all of the project to be rebuilt, then a later commit made a change to (or added) say a test (that nothing else in the project depended on), if the cache action for .stack-work did not include the source files in the key, the entire project would be rebuilt every time the action ran until someone happened to change the package.yaml or stack.yaml files (since those are the only files that would cause the cache action to update the .stack-work cache). I've worked on such a project, and the difference in time it took the workflow to run was sometimes something like 2-5 minutes vs 20+ minutes.

Contributor

@vsvipul vsvipul left a comment

@malob Wow. Thank you for such a detailed explanation. Now I clearly understand it. The changes look good. Will go ahead and merge. Thanks for contributing.
🎉

@vsvipul vsvipul merged commit 29dbbce into actions:master Feb 22, 2022
@vsvipul
Contributor

vsvipul commented Feb 22, 2022

@malob I didn't see the target branch and this ended up getting merged to master. We use "main" branch as the default branch now. Can you raise a new PR against that with same changes so I can merge? Thanks.
