
Conversation

@kriswest
Contributor

resolves #946

Updates push parsing to extract committer emails and to associate commits with users via email, rather than via the git user.name config.

Note that:

The Git username is not the same as your GitHub username.

https://docs.github.com/en/get-started/git-basics/setting-your-username-in-git

and

GitHub uses your commit email address to associate commits with your account on GitHub.

https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-personal-account-on-github/managing-email-preferences/setting-your-commit-email-address
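A minimal sketch of the idea described above, assuming the committer identity is read from the header line that git stores in each commit object (`committer Name <email> timestamp timezone`). The names `parseIdentityLine`, `findUserByEmail`, and the `User` shape are hypothetical and are not taken from `parsePush.ts`; comparing emails case-insensitively is also an assumption.

```typescript
// Hypothetical shapes, for illustration only.
interface GitIdentity {
  name: string;
  email: string;
  timestamp: string;
}

interface User {
  username: string;
  email: string;
}

// Parses a commit header line such as:
//   "committer Jane Doe <jane@example.com> 1744329600 +0100"
// Returns null when the line is malformed or is for a different role.
function parseIdentityLine(
  line: string,
  role: 'author' | 'committer',
): GitIdentity | null {
  const match = /^(author|committer) (.*?) ?<([^<>]*)> (\d+) ([+-]\d{4})$/.exec(line.trim());
  if (!match || match[1] !== role) {
    return null;
  }
  return { name: match[2], email: match[3], timestamp: match[4] };
}

// Associates an identity with a known user by email, not by display name.
function findUserByEmail(users: User[], email: string): User | undefined {
  const wanted = email.toLowerCase();
  return users.find((u) => u.email.toLowerCase() === wanted);
}
```

With this shape, two people who share the same `user.name` but have different commit emails resolve to different users, and a user whose git `user.name` differs from their account name is still matched.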

@netlify

netlify bot commented Apr 11, 2025

Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name Link
🔨 Latest commit 5b9fe88
🔍 Latest deploy log https://app.netlify.com/projects/endearing-brigadeiros-63f9d0/deploys/68945033da7b2f0008630f96

@kriswest kriswest changed the title from "946 associate commits by email" to "fix: 946 associate commits by email" Apr 11, 2025
@kriswest
Contributor Author

Looks like I didn't check the CLI test cases and need to sort something out there

@github-actions github-actions bot added the fix label Apr 11, 2025
@kriswest
Contributor Author

Fixed the failing tests but am waiting on a review (in git proxy); will update Monday.

@codecov

codecov bot commented Apr 15, 2025

Codecov Report

❌ Patch coverage is 97.77778% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.25%. Comparing base (9913250) to head (5b9fe88).
⚠️ Report is 54 commits behind head on main.

Files with missing lines Patch % Lines
src/proxy/processors/push-action/parsePush.ts 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #973      +/-   ##
==========================================
+ Coverage   81.28%   83.25%   +1.97%     
==========================================
  Files          59       59              
  Lines        2458     2449       -9     
  Branches      279      280       +1     
==========================================
+ Hits         1998     2039      +41     
+ Misses        416      366      -50     
  Partials       44       44              


@kriswest
Contributor Author

Code coverage is as good as I can get it, will pick up a little more when #979 is merged

@kriswest
Contributor Author

@JamieSlome @coopernetes @grovesy this is ready for a review. I think this will be important to supporting anything other than GitHub (alongside a fix for #950).

@kriswest
Contributor Author

@JamieSlome This is the PR we discussed expediting the review of - it'll be needed for supporting git repositories other than GitHub (work that is going well this week - but will generate another fairly large PR soon).

Contributor

@sam-holmes2 sam-holmes2 left a comment


Thanks for your contribution! LGTM aside from the conflicts, may need additional approval before being merged.

@kriswest
Contributor Author

@jescalada @coopernetes @06kellyjac I've maintained this against the recent merges to main and it should be ready to go again.

One thing to note is that we're now doubling up on some of the tests, those in /test/testCheckUserPushPermission.test.js are integration tests, those in /test/processors/checkUserPushPermission.test.js are better unit tests (don't require data to be set up in the DB). However, I noted this latter set did NOT fail when there was an issue that was denying access to users that should have passed through. I've left both sets in place for now.

@kriswest
Contributor Author

kriswest commented Jul 31, 2025

Scratch that, there are still type errors that have led me to some code in main that may not be correct: in getMissingData, the committer and author information is being confused:

committer: entry.author_name || '',

I'd like to understand the use case for getMissingData - why do we need to pull the git log to get data that should be in the push? Why would commitData be null?

Next, checking the push permissions for the authors and committer happens before that data is retrieved (getMissingData being the last (Edit: later) step in the push chain), with checkUserPushPermission specifically passing through pushes that are missing that data. Hence, it seems like you can just bypass that check through the omission. That seems a serious flaw if true (I haven't had/won't have time to test that in the next few days - but it has me concerned).

Finally, I've been meaning to mention the fact that we have duplicate types in multiple places, e.g.:

  • src/types/models.ts
  • src/proxy/actions/Action.ts

@jescalada @coopernetes @06kellyjac Can you let me know your thoughts and confirm whether the above was discussed in review or overlooked, please?

@kriswest
Contributor Author

kriswest commented Jul 31, 2025

Looking closer at getMissingData I can see there is an attempt to validate that a user has push permissions, but it is only the author of the last commit - I get that you don't see the committer in the git log - which is why I'm asking about the use case for that addition. That's not the same as a check on the committer (as per checkUserPushPermission), nor is it ensuring that all the authors were allowed (which we only check for 'legality' in checkAuthorEmails, right? We don't require them all to have push perms).

@kriswest
Contributor Author

With getMissingData and the exception in checkUserPushPermission, I suspect I could construct an artificial push with missing commitData, cherry-pick a commit from someone approved to push into the first (or possibly last) slot authored by an approved committer and pass the validation of pushPermissions, without actually having that permission.

If that's not the case, I'd love to see some comments added explaining.
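To make the concern concrete, here is a minimal sketch of a permission check that fails closed when committer data is missing, instead of passing the push through. It is not the project's `checkUserPushPermission` implementation: the `PushLike`, `CommitLike`, and `Decision` shapes and the `checkPushPermissionFailClosed` name are hypothetical, and using the last commit's committer email as the pushing identity is an assumption.

```typescript
// Hypothetical shapes, for illustration only.
interface CommitLike {
  committerEmail: string;
}

interface PushLike {
  commitData?: CommitLike[];
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Denies the push whenever the committer cannot be determined,
// so omitting commit data can never be used to skip the check.
function checkPushPermissionFailClosed(
  push: PushLike,
  allowedEmails: Set<string>,
): Decision {
  const commits = push.commitData ?? [];
  const last = commits[commits.length - 1];
  if (!last || !last.committerEmail) {
    return { allowed: false, reason: 'committer data is missing' };
  }
  const email = last.committerEmail.toLowerCase();
  if (!allowedEmails.has(email)) {
    return { allowed: false, reason: `${email} is not permitted to push` };
  }
  return { allowed: true };
}
```

A deferred check (validating later in the chain once the data has been recovered) can be equally safe, provided every path through the chain ends in either a validation or a rejection.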


@jescalada
Contributor

@kriswest Thanks for letting me know about the getMissingData issues! It's meant to intercept certain cases where commitData is missing, such as:

  • Empty branch pushes
  • Fast-forward merges
  • New branches from unapproved/unpushed commits

This was the PR description for the fix:

This PR builds on the previous vulnerability fix https://github.com/jescalada/git-proxy-security-fixes/pull/5 and fixes an error when making a new branch from an unapproved commit.

The flow to reproduce the issue is:

  • Make a commit in branch a
  • Make a new branch b from that commit without approving it
  • Make a new commit in b, then approve it
  • Go back to a, and attempt to push this commit to the proxy

Alternatively: You can make multiple commits to branch a in 1), then continue the flow. The new code is capable of handling multiple unapproved prior commits.

This was causing an undefined error due to the commitData being empty. Git assumes that the proxy already has the required commit in its history (which is true). However, we don't actually store this commitData anywhere, and the pulled repo gets deleted in clearBareClone.

My solution was to create an additional action getMissingData which activates only when the commitData is empty. It fetches the commitData directly from the git log to populate the missing commits. It also fetches and validates the user data, which relies on the latest commit.

Changelog

  • Add getMissingData for fetching action.commitData and action.user for fast-forward pushes (or any push missing the commitData in the pack file)
  • Extend checkUserPushPermission action to export user validation logic
  • Patch parsePush and checkUserPushPermission actions to defer commitData-reliant checks to getMissingData if missing rather than erroring out

@jescalada
Contributor

As for the checkUserPushPermission issue, validateUser gets deferred rather than skipped. I extracted the validateUser logic into its own function, which gets executed:

  • when the user is present:
  if (!user) {
    console.log('Action has no user set. This may be due to a fast-forward ref update. Deferring to getMissingData action.');
    return action;
  }

  return await validateUser(user, action, step);
  • or otherwise, when the commitData is missing (in other words, user must be obtained through git.log):
  if (action.commitData && action.commitData.length > 0) {
    console.log('getMissingData', action);
    return action;
  }

  if (await isEmptyBranch(action)) {
    step.setError('Push blocked: Empty branch. Please make a commit before pushing a new branch.');
    action.addStep(step);
    step.error = true;
    return action;
  }
  console.log(`commitData not found, fetching missing commits from git...`);

  try {
    const path = `${action.proxyGitPath}/${action.repoName}`;
    const git = simpleGit(path);
    const log = await git.log({ from: action.commitFrom, to: action.commitTo });

    action.commitData = [...log.all].reverse().map((entry, i, array) => {
      const parent = i === 0 ? action.commitFrom : array[i - 1].hash;
      const timestamp = Math.floor(new Date(entry.date).getTime() / 1000).toString();
      return {
        message: entry.message || '',
        committer: entry.author_name || '',
        tree: entry.hash || '',
        parent: parent || EMPTY_COMMIT_HASH,
        author: entry.author_name || '',
        authorEmail: entry.author_email || '',
        commitTimestamp: timestamp,
      }
    });
    console.log(`Updated commitData:`, { commitData: action.commitData });

    if (action.commitFrom === EMPTY_COMMIT_HASH) {
      action.commitFrom = action.commitData[action.commitData.length - 1].parent;
    }

    const user = action.commitData[action.commitData.length - 1].committer;
    action.user = user;
  } catch (e: any) {
    step.setError(e.toString('utf-8'));
  } finally {
    action.addStep(step);
  }
  return await validateUser(action.user || '', action, step);

@jescalada
Contributor

If you're referring to the issue with the committer being set to entry.author_name, this was an oversight of mine when trying to map the git.log entries to the local Commit objects in commitData (I mistakenly assumed they'd always be the same).

A possible fix for this is to modify the git.log() call to extract the data:

const path = `${action.proxyGitPath}/${action.repoName}`;
const git = simpleGit(path);
const log = await git.log({ 
  from: action.commitFrom,
  to: action.commitTo,
  format: {
    hash: '%H',
    date: '%ad',
    message: '%s',
    author_name: '%an',
    author_email: '%ae',
    tree: '%T',
    parent: '%P',
    committer: '%cn',
  }
});

action.commitData = [...log.all].reverse().map((entry, i, array) => {
  const parent = i === 0 ? action.commitFrom : array[i - 1].hash;
  const timestamp = Math.floor(new Date(entry.date).getTime() / 1000).toString();
  return {
    message: entry.message || '',
    committer: entry.committer || '',
    tree: entry.hash || '',
    parent: parent || EMPTY_COMMIT_HASH,
    author: entry.author_name || '',
    authorEmail: entry.author_email || '',
    commitTimestamp: timestamp,
  }
});

There is another potential solution using git.raw() with a custom prettified output, but I think that one introduces safety concerns from injections.
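Since this PR associates commits with users by email, the custom format could also request the committer's email. The following is a minimal sketch under stated assumptions: `%cn`, `%ce`, and `%aI` are standard git pretty-format placeholders (committer name, committer email, strict ISO 8601 author date), but the field names `committer_name` and `committer_email`, the `CommitData` shape, the `toCommitData` helper, and the value of `EMPTY_COMMIT_HASH` (40 zeros) are hypothetical. The mapping is written as a pure function so it can be exercised without a repository; the `logFormat` object is what would be passed as `format` to `git.log()`.

```typescript
// Assumed value; the real constant is defined elsewhere in the project.
const EMPTY_COMMIT_HASH = '0'.repeat(40);

// Format object intended for the `format` option of git.log().
const logFormat = {
  hash: '%H',
  date: '%aI',
  message: '%s',
  author_name: '%an',
  author_email: '%ae',
  committer_name: '%cn',
  committer_email: '%ce',
} as const;

// One log entry has a string value for every key requested in the format.
type LogEntry = { [K in keyof typeof logFormat]: string };

// Hypothetical target shape, for illustration only.
interface CommitData {
  hash: string;
  parent: string;
  message: string;
  author: string;
  authorEmail: string;
  committer: string;
  committerEmail: string;
  commitTimestamp: string;
}

// git log lists the newest commit first, so reverse before linking parents.
function toCommitData(newestFirst: LogEntry[], commitFrom: string): CommitData[] {
  return [...newestFirst].reverse().map((entry, i, oldestFirst) => ({
    hash: entry.hash,
    parent: i === 0 ? (commitFrom || EMPTY_COMMIT_HASH) : oldestFirst[i - 1].hash,
    message: entry.message,
    author: entry.author_name,
    authorEmail: entry.author_email,
    committer: entry.committer_name,
    committerEmail: entry.committer_email,
    commitTimestamp: Math.floor(new Date(entry.date).getTime() / 1000).toString(),
  }));
}
```

Keeping author and committer fields separate means the last entry's `committerEmail` can be used to identify who is pushing, while `authorEmail` remains available for checks such as checkAuthorEmails.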

@jescalada
Contributor

I've looked at this in more detail: I don't recall 100%, but I'm pretty sure that the flow mentioned previously (making a new branch from an unapproved commit) was why I added the getMissingData action.

When reproducing this flow, I notice that @fabiovincenzi's checkHiddenCommits action is preventing branches from unapproved commits from going through (and checkHiddenCommits executes first):

image

In other words, getMissingData may not be necessary anymore and can be safely deleted. It'd be great to have a specific set of commands to validate your suspicion that this flow isn't safe @kriswest. I'm also happy to figure out if getMissingData can be safely removed - but we need to make sure that checkHiddenCommits is safe against non-linear pushes instead.

@kriswest
Contributor Author

kriswest commented Aug 1, 2025

@jescalada thank you for looking into this. I wonder if we can create a test to investigate the issue, then leave it in place if getMissingData is removed/unreachable?

I'm out of the office today, but should be able to think more on it next week.

I think the differentiation between authors and committers is important (to us as a large enterprise). We might have multiple authors contribute to a branch, then rely on a trusted committer and reviewer (who have fulfilled other policy requirements such as specified training or approval by leadership) to contribute that onwards to an external project. Hence, we'd be keen to make sure pushes are always tied to those doing the pushing, rather than the authors. It is awkward that git log doesn't return the committer details by default, but you can pass args to change the format to include it (try the 'fuller' format or a custom one). That may be moot (if getMissingData is unreachable), but it is at least possible. All that being said, I imagine there must be committer details with the branch, even if the commitData is missing... perhaps it's just not where we are usually extracting it from (on the commits themselves).

Finally, if getMissingData is removed, the exception in checkUserPushPermission must also go (as that's what's exploitable, I think).

P.S. I imagine other orgs might want to break out authors and govern those as well (beyond the 'legality' of their email address), but that's a conversation for another time ;-).

@jescalada
Contributor

@kriswest I've been doing manual testing with various kinds of merge commits and it seems getMissingData never gets executed (only the first check for empty branches goes through).

As for the empty branch check, I made a unit test that covers this in parsePush.ts, specifically by checking that commitData is empty after being processed here. I wasn't able to write extra passing unit tests for this, but I did find a few issues along the way while testing for cases with empty commitData:

  • Deleting branches via --delete is not supported
  • Pushing the first commit of a repo is not supported
    • Both cases error out on missing PACK file check
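For context on the cases listed above: in git's push protocol each ref update carries an old and a new object ID, and an all-zero ID marks a creation (zero old ID) or a deletion (zero new ID). The sketch below classifies an update on that basis. The `classifyRefUpdate` name and `RefUpdateKind` type are hypothetical, and the 40-character zero ID assumes a SHA-1 repository.

```typescript
// All-zero object ID used by git to mean "no object" (SHA-1 length assumed).
const ZERO_OID = '0'.repeat(40);

type RefUpdateKind = 'create' | 'delete' | 'update';

// Classifies a ref update from its old and new object IDs.
function classifyRefUpdate(oldOid: string, newOid: string): RefUpdateKind {
  if (newOid === ZERO_OID) {
    return 'delete'; // e.g. git push --delete origin my-branch
  }
  if (oldOid === ZERO_OID) {
    return 'create'; // new branch, including the first push to a repository
  }
  return 'update';
}
```

A push that consists only of deletions is not accompanied by a packfile in git's pack protocol, which may be related to the missing PACK file error noted above; a check like this could let such pushes be handled or rejected with a clearer message before the PACK check runs.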

@jescalada
Contributor

@kriswest Just merged in the getMissingData PR, this should be good to merge after the conflicts are solved. Sorry about that!

This will probably go into 2.0.0-rc.1, and #1043 can go into an rc.2 release to simplify debugging if needed. Would love to have #1043 ready to merge as well - I can take a final look once it's ready.

@kriswest
Contributor Author

kriswest commented Aug 7, 2025

np @jescalada, conflicts resolved and ready for one last look!

I've had a go at resolving the conflicts between #973 and #1043 and have some commits lined up to resolve. I agree on the version numbering and can add the version number change to #1043 if desired.

Contributor

@jescalada jescalada left a comment


Took a quick look at the code changes, and everything seems good to go!

@jescalada jescalada merged commit ddff723 into finos:main Aug 7, 2025
14 checks passed
@kriswest kriswest deleted the 946-associate-commits-by-email-rebase branch August 7, 2025 12:39

Development

Successfully merging this pull request may close these issues.

Commits should be associated with users via email rather than the git user.name config

5 participants