
feat: Add Revamped Artifact V2 Adapter with Storage Backend#74

Open
abhisek wants to merge 27 commits into main from feat/add-packageregistry-adapter-v2

Conversation

Member

@abhisek abhisek commented Oct 13, 2025

  • feat: Add initial artifact v2 base implementation
  • fix: Add support for multiple ID strategy
  • feat: Add npm adapter with tests


@abhisek abhisek requested a review from a team October 13, 2025 03:34
@abhisek abhisek force-pushed the feat/add-packageregistry-adapter-v2 branch from d3b7da7 to a61bde7 on October 14, 2025 12:13
@abhisek abhisek marked this pull request as ready for review October 29, 2025 08:29
Copilot AI review requested due to automatic review settings October 29, 2025 08:29

safedep bot commented Oct 29, 2025

SafeDep Report Summary

All three badges are green: Malicious Packages, Vulnerable Packages, Risky License.

No dependency changes detected. Nothing to scan.

This report is generated by the SafeDep GitHub App

Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces a next-generation artifact adapter system (v2) for the package registry with improved storage abstraction, caching, and content-addressable design. The system provides a unified interface for fetching, storing, and managing package artifacts across different ecosystems.

Key Changes:

  • Extended storage interface with new methods (Exists, GetMetadata, List, Delete)
  • Implemented storage manager with multiple artifact ID strategies (convention, content-hash, hybrid)
  • Created NPM adapter v2 with HTTP mirror support and intelligent retry logic
  • Added archive utilities for tar.gz file operations with index caching
  • Implemented in-memory metadata store for artifact tracking

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file

  • storage/storage.go - Extended Storage interface with context-aware metadata and list operations
  • storage/gcs.go - Implemented new storage interface methods for Google Cloud Storage
  • storage/fs.go - Implemented new storage interface methods for filesystem storage
  • packageregistry/artifactv2/types.go - Core type definitions for artifact adapter v2 system
  • packageregistry/artifactv2/storage.go - Storage manager implementation with artifact ID strategies
  • packageregistry/artifactv2/npm_adapter.go - NPM-specific artifact adapter with mirror support
  • packageregistry/artifactv2/metadata.go - In-memory metadata store implementation
  • packageregistry/artifactv2/config.go - Configuration system with functional options pattern
  • packageregistry/artifactv2/archive_utils.go - Archive reading utilities with index caching
  • packageregistry/artifactv2/adapter_utils.go - HTTP fetching utilities with retry and mirror logic
  • .tool-versions - Go version update to 1.25.1
  • .github/workflows/go.yml - Workflow file formatting improvements


Copilot AI review requested due to automatic review settings December 4, 2025 02:54
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.



abhisek and others added 7 commits December 16, 2025 19:44
* feat: Add struct validation utils

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Abhisek Datta <[email protected]>

---------

Signed-off-by: Abhisek Datta <[email protected]>
Co-authored-by: Copilot <[email protected]>
* feat: Add support for container exec IO capture

* fix: Linter issues

* refactor: Test to use Go TDD
* feat: support token limit error handling

* Apply suggestions from code review

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Abhisek Datta <[email protected]>

---------

Signed-off-by: Abhisek Datta <[email protected]>
Co-authored-by: Abhisek Datta <[email protected]>
Co-authored-by: Copilot <[email protected]>
@github-actions

vet Summary Report

This report is generated by vet

Policy Checks

  • ✅ Vulnerability
  • ✅ Malware
  • ✅ License
  • ✅ Popularity
  • ✅ Maintenance
  • ✅ Security Posture
  • ✅ Threats

Malicious Package Analysis

Malicious package analysis was performed using SafeDep Cloud API

Malicious Package Analysis Report
  • ℹ️ 0 packages have been actively analyzed for malicious behaviour.
  • ✅ No malicious packages found.

Copilot AI review requested due to automatic review settings December 16, 2025 14:42
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.



Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View issue and 6 additional flags in Devin Review.


Copilot AI review requested due to automatic review settings February 8, 2026 13:43

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 11 additional findings in Devin Review.


Comment on lines +202 to +203
reqCtx, cancel := context.WithTimeout(ctx, config.Timeout)
defer cancel()


🔴 Single timeout context is shared across all retry attempts, making retries ineffective

The Timeout field is documented as "Timeout for each fetch attempt" (adapter_utils.go:24), but fetchHTTPWithMirrors creates a single context.WithTimeout at line 202 that spans ALL retry attempts including sleep delays between them.

Root Cause

At packageregistry/artifactv2/adapter_utils.go:202:

reqCtx, cancel := context.WithTimeout(ctx, config.Timeout)
defer cancel()

This creates one context for the entire retry loop. With default settings (Timeout: 30s, RetryAttempts: 3, RetryDelay: 1s with linear backoff), the first attempt uses most of the timeout budget. Subsequent attempts reuse the same reqCtx which may already be expired or nearly expired, especially after time.Sleep(delay) calls at line 217. The retry loop at lines 205-290 creates HTTP requests with this same reqCtx, so later retries will immediately fail with context.DeadlineExceeded.

Impact: Retries after the first attempt may be ineffective or fail immediately because the shared context has expired. For example, if the first attempt takes 25 seconds to fail and there's a 1-second retry delay, the second attempt only has ~4 seconds instead of the configured 30 seconds.

Prompt for agents
In packageregistry/artifactv2/adapter_utils.go, move the context.WithTimeout call inside the retry loop so each attempt gets its own fresh timeout. Replace the single reqCtx at line 202-203 with a per-attempt context created inside the for loop (after the sleep delay). Each iteration should create its own context: reqCtx, cancel := context.WithTimeout(ctx, config.Timeout), and cancel should be deferred or called at the end of each iteration. Alternatively, rename the Timeout field documentation from "Timeout for each fetch attempt" to "Timeout for the entire fetch operation including retries" if the current behavior is intended.



checksum := ""
if attrs.MD5 != nil {
    checksum = string(attrs.MD5)


🔴 GCS GetMetadata produces garbled checksum by converting raw MD5 bytes to string instead of hex-encoding

The GCS GetMetadata at storage/gcs.go:170 uses string(attrs.MD5) to convert the raw MD5 hash bytes to a string, producing non-printable binary characters instead of a hex-encoded checksum.

Root Cause

At storage/gcs.go:169-170:

if attrs.MD5 != nil {
    checksum = string(attrs.MD5)
}

GCS ObjectAttrs.MD5 is []byte containing the raw 16-byte MD5 hash. Using string() converts these raw bytes directly to a string with non-printable characters. The filesystem driver at storage/fs.go:129 correctly uses hex.EncodeToString(hash.Sum(nil)) to produce a human-readable hex string.

Impact: Any code comparing checksums across storage backends (or expecting hex-encoded checksums from ObjectMetadata.Checksum) will get garbled binary data from GCS instead of the expected hex string. Checksum comparisons will silently fail.

Suggested change
checksum = string(attrs.MD5)
checksum = hex.EncodeToString(attrs.MD5)
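The suggested one-line change can be sketched as a small helper; the function name gcsChecksum is illustrative, but the behavior (hex-encoding the raw 16-byte digest so it matches the filesystem driver's hex-string checksums) is exactly what the finding asks for:

```go
package main

import "encoding/hex"

// gcsChecksum hex-encodes the raw MD5 bytes GCS returns in ObjectAttrs.MD5,
// producing a printable checksum consistent with other storage backends.
// The helper name is an assumption for illustration.
func gcsChecksum(rawMD5 []byte) string {
	if rawMD5 == nil {
		return ""
	}
	return hex.EncodeToString(rawMD5)
}
```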


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return err
}

keys = append(keys, relPath)

Copilot AI Feb 8, 2026


List returns relPath using OS-specific separators. Other parts of the codebase build storage keys with path.Join (forward slashes), so on Windows this will return keys with \ and break prefix-based callers. Consider normalizing via filepath.ToSlash(relPath) before appending.

Suggested change
keys = append(keys, relPath)
keys = append(keys, filepath.ToSlash(relPath))

Comment on lines +117 to +149
func (sm *storageManager) Store(ctx context.Context, info ArtifactInfo, reader io.Reader) (string, error) {
    var artifactID string
    var buf bytes.Buffer
    var contentHash string

    needsContentHash := sm.config.ArtifactIDStrategy == ArtifactIDStrategyContentHash ||
        sm.config.ArtifactIDStrategy == ArtifactIDStrategyHybrid ||
        sm.config.IncludeContentHash

    if needsContentHash {
        hash := sha256.New()
        tee := io.TeeReader(reader, &buf)

        if _, err := io.Copy(hash, tee); err != nil {
            return "", fmt.Errorf("failed to compute hash: %w", err)
        }

        hashBytes := hash.Sum(nil)
        contentHash = hex.EncodeToString(hashBytes[:8])
    } else {
        if _, err := io.Copy(&buf, reader); err != nil {
            return "", fmt.Errorf("failed to read content: %w", err)
        }
    }

    artifactID = generateArtifactID(info, sm.config.ArtifactIDStrategy, contentHash)

    if sm.config.CacheEnabled {
        exists, err := sm.Exists(ctx, artifactID)
        if err == nil && exists {
            return artifactID, nil
        }
    }

Copilot AI Feb 8, 2026


StorageManager.Store fully buffers (and sometimes hashes) the artifact before checking the cache. For ArtifactIDStrategyConvention the ID can be generated without reading the content, so you can check Exists first and skip the expensive read when the artifact is already present.

Comment on lines 202 to 205
reqCtx, cancel := context.WithTimeout(ctx, config.Timeout)
defer cancel()

for attempt := 0; attempt <= config.RetryAttempts; attempt++ {

Copilot AI Feb 8, 2026


fetchHTTPWithMirrors creates a single reqCtx with config.Timeout for the whole retry loop, but the struct comment says the timeout is per fetch attempt. As written, earlier delays/retries consume the same deadline and later attempts can fail immediately. Create a fresh per-attempt context (and make the retry sleep respect ctx.Done()), or update the comment if total-timeout is intended.

Comment on lines 13 to +27
type Storage interface {
    Put(key string, reader io.Reader) error
    Get(key string) (io.ReadCloser, error)

    // Exists checks if a key exists in storage
    Exists(ctx context.Context, key string) (bool, error)

    // GetMetadata retrieves metadata for a stored object
    GetMetadata(ctx context.Context, key string) (*ObjectMetadata, error)

    // List returns keys matching a prefix
    List(ctx context.Context, prefix string) ([]string, error)

    // Delete removes an object from storage
    Delete(ctx context.Context, key string) error

Copilot AI Feb 8, 2026


The Storage interface mixes context-less methods (Put/Get) with context-aware methods (Exists/List/Delete/GetMetadata). This makes it impossible to propagate cancellation/timeouts for the most expensive operations on backends like GCS. Consider adding context.Context to Put/Get (or adding new PutCtx/GetCtx methods) for a consistent contract.

Comment on lines +168 to +171
checksum := ""
if attrs.MD5 != nil {
checksum = string(attrs.MD5)
}

Copilot AI Feb 8, 2026


In GetMetadata, attrs.MD5 is raw bytes; converting it with string(attrs.MD5) will produce non-printable data and is not a stable textual checksum. Encode it (e.g., hex/base64) or use GCS-provided hash fields consistently with other backends (filesystem uses SHA256 hex).

Comment on lines +162 to +182
func (a *npmAdapterV2) Exists(ctx context.Context, info ArtifactInfo) (bool, string, error) {
    // Try to find by metadata first (more efficient)
    if a.config.metadataEnabled && a.config.storageManager != nil {
        // For Convention strategy, we can predict the artifact ID using common function
        if a.config.artifactIDStrategy == ArtifactIDStrategyConvention {
            // Use common ID generation function (single source of truth)
            predictedID := generateArtifactID(info, ArtifactIDStrategyConvention, "")

            exists, err := a.storage.Exists(ctx, predictedID)
            if err == nil && exists {
                return true, predictedID, nil
            }
        }

        // For other strategies, we need to query metadata
        // This is not implemented in the current MetadataStore interface
        // but could be added via GetByPackage/GetByArtifact
    }

    return false, "", nil
}

Copilot AI Feb 8, 2026


Exists currently only attempts a predicted-ID check for Convention strategy and only when metadataEnabled is true; otherwise it returns (false, "", nil) even if the artifact is already in storage. This breaks adapter-level caching and also ignores the MetadataStore.GetByArtifact capability already present in the interface for looking up existing artifacts by (ecosystem,name,version).

Comment on lines +345 to +351
fileKey := path.Join(baseKey, fileInfo.Path)

// Stream file content directly to storage using LimitReader to avoid memory buffering
// LimitReader ensures we only read fileInfo.Size bytes from the tar stream
limitedReader := io.LimitReader(fileInfo.Reader, fileInfo.Size)

if err := store.Put(fileKey, limitedReader); err != nil {

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path traversal risk during extraction: fileInfo.Path comes directly from the tar header and is joined into fileKey without validation. A malicious archive can use paths like ../artifact or absolute paths to escape baseKey and overwrite sibling keys. Sanitize/validate entry paths (reject absolute paths and any cleaned path starting with ..), and ensure the final key remains within baseKey.
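The validation the comment asks for can be sketched as a small helper. The name safeEntryKey is illustrative; the checks (reject absolute paths, reject any cleaned path that escapes the base) are the standard defense against zip-slip style entries:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// safeEntryKey joins a tar entry path onto baseKey, rejecting absolute
// paths and any entry that escapes the base after cleaning. The helper
// name is an assumption for illustration.
func safeEntryKey(baseKey, entryPath string) (string, error) {
	if strings.HasPrefix(entryPath, "/") {
		return "", fmt.Errorf("absolute path not allowed: %s", entryPath)
	}
	// path.Clean collapses "a/../../b" into "../b", so a single prefix
	// check after cleaning catches nested traversal too.
	cleaned := path.Clean(entryPath)
	if cleaned == ".." || strings.HasPrefix(cleaned, "../") {
		return "", fmt.Errorf("path escapes base: %s", entryPath)
	}
	return path.Join(baseKey, cleaned), nil
}
```

Storage keys use forward slashes regardless of OS, so the path package (not filepath) is the right tool here.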

Comment on lines 53 to 58
func applyFetchConfigDefaults(config *fetchConfig) {
    if config.RetryAttempts == 0 {
        config.RetryAttempts = defaultRetryAttempts
    }
    if config.RetryDelay == 0 {
        config.RetryDelay = defaultRetryDelay

Copilot AI Feb 8, 2026


fetchConfig doc says RetryAttempts of 0 means "no retries" (single attempt), but applyFetchConfigDefaults overwrites RetryAttempts==0 with the default (3). This makes it impossible to disable retries and contradicts the contract; treat 0 as a valid value and only default when the caller truly left it unset (e.g., use a pointer/optional or a separate boolean).
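The pointer-based option the comment suggests can be sketched like this; the field and helper names are assumptions, but the pattern (nil means unset, an explicit 0 means "no retries") is the standard way to distinguish the two in Go:

```go
package main

// fetchConfig uses a pointer so "unset" (nil) is distinguishable from an
// explicit 0, which the docs define as a single attempt with no retries.
// The struct shape is an assumption paraphrasing the review comment.
type fetchConfig struct {
	RetryAttempts *int
}

const defaultRetryAttempts = 3

// effectiveRetryAttempts applies the default only when the caller truly
// left the field unset, so RetryAttempts = 0 is honored as "no retries".
func effectiveRetryAttempts(c fetchConfig) int {
	if c.RetryAttempts == nil {
		return defaultRetryAttempts
	}
	return *c.RetryAttempts
}
```

A functional option like WithRetryAttempts(n int) that sets the pointer would keep the public API ergonomic while preserving the nil-means-unset distinction internally.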

SHA256: sha256Hash,
Size: int64(len(content)),
FetchedAt: time.Now(),
StorageKey: computeStorageKeyFromID(artifactID, ""),

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storeArtifactWithMetadata records StorageKey using computeStorageKeyFromID(artifactID, ""), which ignores any configured storage prefix (WithStoragePrefix / StorageConfig.KeyPrefix). This will store incorrect metadata when a prefix is used; compute the key with the active prefix (or have StorageManager expose a helper for the effective key).

Suggested change
StorageKey: computeStorageKeyFromID(artifactID, ""),
StorageKey: "",
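One way to make the prefix flow through metadata, sketched under the assumption of a helper on the manager (the name effectiveStorageKey and the prefix field are illustrative, not the PR's API):

```go
package main

import "path"

// effectiveStorageKey computes the key actually used in the backend,
// applying the configured prefix when one is set. The helper name is an
// assumption; the PR could equally expose this from StorageManager.
func effectiveStorageKey(prefix, artifactID string) string {
	if prefix == "" {
		return artifactID
	}
	// path.Join normalizes separators and avoids double slashes.
	return path.Join(prefix, artifactID)
}
```

Recording this value in ArtifactMetadata.StorageKey keeps metadata consistent with where the object really lives, whether or not WithStoragePrefix was used.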

Comment on lines +174 to +178
if sm.config.MetadataEnabled && sm.metadata != nil {
    _, err := sm.metadata.Get(ctx, artifactID)
    if err == nil {
        return true, nil
    }

Copilot AI Feb 8, 2026


In StorageManager.Exists, returning true purely because metadata exists can lead to false positives (e.g., metadata present but artifact missing/corrupted in the underlying storage). This can cause cache hits where Get later fails. Consider checking both metadata and the storage backend (or making metadata a hint and still verifying storage existence).

Suggested change
if sm.config.MetadataEnabled && sm.metadata != nil {
    _, err := sm.metadata.Get(ctx, artifactID)
    if err == nil {
        return true, nil
    }
// Metadata can be consulted as a hint, but must not be treated as authoritative
// for existence; we always verify against the underlying storage backend.
if sm.config.MetadataEnabled && sm.metadata != nil {
    _, _ = sm.metadata.Get(ctx, artifactID)
