
Conversation


@houdini91 houdini91 commented Jul 27, 2021

Hi all.
This is my first contribution here, so apologies if I am missing something.

This PR adds support for directory-type sources to the power-user command.

The following issues have been addressed:

  • The digest, contents, and secret catalogers require read permissions; encountering a permission error failed the whole cataloger.
    Fix: Log the error (debug) instead of failing, and skip ahead to the next file.
  • The digest and metadata catalogers failed with an error saying that none of the files passed in exist in the resolver.
    Reason: directoryResolver.AllLocations and directoryResolver.FilesByGlob returned locations that were not connected to the directory resolver's file references.
    Fix: Return locations that are connected to the directory resolver's file references.

Test

  • Added CLI test default-dir-results-w-pkg-coverage in test/cli/power_user_cmd_test.go, using the dir:test-fixtures/image-pkg-coverage source.
  • Added CLI test defaut-secrets-dir-results-w-reveal-values in test/cli/power_user_cmd_test.go, using the dir:test-fixtures/image-secrets source.
  • Unit tests: the location assertion changed from a structural comparison to a string comparison on the location's real path field
    (TestDirectoryResolverDoesNotIgnoreRelativeSystemPaths, TestClassifierCataloger_DefaultClassifiers_PositiveCases).

Open issue

  • Should directory resolver locations include the relative path in the virtual path fields and use the reference path for the real path field?

Hope this minor patch helps.
P.S. Love your work.

@houdini91 houdini91 force-pushed the power-user-dir branch 2 times, most recently from a545caf to 6b921c8 Compare August 1, 2021 10:13

houdini91 commented Aug 1, 2021

Update: Added support for sharing the Filetree between directory resolvers. This is needed so that catalogers running concurrently can work on the same references.

The following issues are targeted:

  • Share the Filetree between resolvers via the source struct.
  • Thread-safe directory resolver indexing: add a lock around the indexing logic.
  • Share indexed file information: copy the shared tree references before indexing.
  • Fix the file metadata GID/UID sampling (Linux-based only).
  • indexAllRoots skips indexing duplicate roots.

Test

  • Added shared directory unit test - TestDirectoryResolver_SharedTreeMultipleFilesByPath

Signed-off-by: Mikey Strauss <[email protected]>

Signed-off-by: houdini91 <[email protected]>
Signed-off-by: houdini91 <[email protected]>
* Shared directory resolver filetree

Signed-off-by: houdini91 <[email protected]>
@wagoodman

@houdini91 very nice addition!

I see the performance reason for introducing the shared trees between directoryResolver instances, and it's a good one. However, I think there is an opportunity to simplify that approach while keeping the same performance. Instead of creating multiple directoryResolver instances and injecting a shared file tree as a dependency, it could be simpler to memoize the Source.FileResolver() method so that only one directoryResolver instance is created and then cached for future calls. An additional benefit is that the FileTree remains an internal concern of the directoryResolver, and there would no longer be a need to keep a mutex on Source.

What do you think about this as an alternative approach to the shared filetree?


uid := -1
gid := -1
if stat, ok := info.Sys().(*syscall.Stat_t); ok {

@wagoodman wagoodman Aug 19, 2021


good catch 👍

@houdini91

@wagoodman I totally agree, sharing a directoryResolver is probably simpler than sharing a FileTree pointer.
I originally thought about sharing the directoryResolver pointer, but I was not sure how it would affect parts I have not yet explored.

Can you please clarify the mutex comment a bit?
It seems to me that even if Source.FileResolver() creates only one directoryResolver (storing the pointer in the struct), the initialization itself can still race (two threads both initializing the first directoryResolver).
E.g.:

// Called concurrently by the catalogers' child threads.
func (s Source) FileResolver(scope Scope) (FileResolver, error) {
	switch s.Metadata.Scheme {
	case DirectoryScheme:
		if s.dirResolver == nil {
			// Not thread safe: two callers can both observe nil here
			// and each construct a resolver.
			resolver, err := newDirectoryResolver(s.Tree, s.Metadata.Path)
			if err != nil {
				return nil, err
			}
			s.dirResolver = resolver
		}
		return s.dirResolver, nil
...
...
  • Maybe we can initialize the directory resolver in the source New() constructor function (called by the main thread)?
  • Maybe leave the mutex and only replace the FileTree pointer with a directoryResolver pointer?

@wagoodman

Maybe we can initialize the directory resolver in the source New() constructor function? (Called by main thread)?

This wouldn't be ideal, since FileResolver calls may be passed different scopes. I realize that for the directory resolver this doesn't technically matter (a directory resolver is like an image with a single layer, so different scope options have no effect), but the concern is doing resolver initialization in two places when it could be kept in one.

Maybe leave the Mutex and only replace the FileTree pointer with a directoryResolver pointer?

You're right, I overlooked this: the mutex for the resolver technically should already be there today. Let's leave the mutex 👍

* Use pointer to source struct

Signed-off-by: houdini91 <[email protected]>

houdini91 commented Aug 25, 2021

@wagoodman

it could be simpler to memoize the Source.FileResolver() method such that only one directoryResolver instance

Updated the code to reflect this change.

  • One issue I had was that the source struct was passed around by value, so any pointer initialized inside a method was not visible outside that copy.
    I have updated the code to use a pointer to the source struct instead.
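
The value-versus-pointer issue described in this bullet is general Go behaviour and can be reproduced in isolation. In the sketch below, `cache`, `setByValue`, and `setByPointer` are hypothetical names used only for illustration:

```go
package main

import "fmt"

// cache is a hypothetical struct holding a lazily set pointer field.
type cache struct{ resolver *string }

// setByValue receives a copy of the struct, so the assignment is lost
// as soon as the method returns.
func (c cache) setByValue(v string) { c.resolver = &v }

// setByPointer receives the caller's struct, so the assignment persists.
func (c *cache) setByPointer(v string) { c.resolver = &v }

func main() {
	var c cache
	c.setByValue("dir")
	fmt.Println(c.resolver == nil) // true: the copy was modified, not c
	c.setByPointer("dir")
	fmt.Println(c.resolver == nil) // false: c itself was modified
}
```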

@wagoodman wagoodman added the enhancement New feature or request label Sep 3, 2021
@wagoodman

@houdini91 this looks good, thanks for the updates! There is just one last change needed to get past validation, a linter failure for a variable name: https://github.com/anchore/syft/pull/467/checks?check_run_id=3508971289. Once the checks pass, we'll merge this in.

Signed-off-by: houdini91 <[email protected]>
@houdini91

@wagoodman Fixed the lint error.
You're welcome, it was my pleasure.


@luhring luhring left a comment


Thanks for this! Approving per review by @wagoodman

@luhring luhring merged commit 2f99a35 into anchore:main Sep 8, 2021
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
* Power-user directory source support
Signed-off-by: Mikey Strauss <[email protected]>

Signed-off-by: houdini91 <[email protected]>

* Remove newline

Signed-off-by: houdini91 <[email protected]>

* Shared filetree (#1)

* Shared directory resolver filetree

Signed-off-by: houdini91 <[email protected]>

* PR - change error ErrObserve to ErrPath

Signed-off-by: houdini91 <[email protected]>

* PR - share directory resolver
* Use pointer to source struct

Signed-off-by: houdini91 <[email protected]>

* Fix Lint

Signed-off-by: houdini91 <[email protected]>