
Conversation

pinakiz

@pinakiz pinakiz commented Aug 27, 2025

Fix: allow . in github-identifier-pattern

This updates the regex in the schema so that repository identifiers
can include the . character. Previously, such identifiers were rejected.

Changelog:

  • Updated github-identifier-pattern regex to include the . character.

Github Issue: Fixes #7776

@pinakiz pinakiz requested a review from a team as a code owner August 27, 2025 17:32
@pinakiz pinakiz requested review from lotas, petemoore and matt-boris and removed request for a team August 27, 2025 17:32
Added a new file under changelog/ to document the fix for taskcluster#7776.
Member

@petemoore petemoore left a comment

Many thanks @pinakiz!

Unfortunately, the fix will need to be a bit more sophisticated. See lines 5 and 6 above your change, which say:

Specifically, the length limitation and the fact that identifiers can't contain dots . is critical.

The issue is that github repositories appear in AMQP routing keys, for example in Pull Request exchanges.

You see the github repository is the third (index 2) entry in the routing key, so having a . in the name would break the message binding.

In order to fix that we would need to change the routing key to be an encoded form of the github repository name, for example, which could also break existing clients and software that relies on the raw (unencoded) repository name being used in the routing key.

Your fix would have been fine if it weren't for this, but this rather complicates matters. Probably the real solution is for all routing key entries to escape the . somehow, but that would be a major change.
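
The routing key concern described above can be illustrated with a small sketch. The key layout used here (`primary.<organization>.<repository>.<action>`) and the `parseRoutingKey` helper are assumptions made for illustration only; the thread confirms just that the repository is the third (index 2) entry and that `.` separates the entries.

```javascript
// Hypothetical routing key layout, assumed for illustration:
//   primary.<organization>.<repository>.<action>
// AMQP topic exchanges treat '.' as the separator between words.
function parseRoutingKey(routingKey) {
  const [kind, organization, repository, ...rest] = routingKey.split('.');
  return { kind, organization, repository, rest };
}

// A repository name without dots lands at index 2, as expected.
const ok = parseRoutingKey('primary.taskcluster.taskcluster.opened');

// A raw name such as "my.repo" adds an extra word, so index 2 holds only
// "my" and every later entry shifts by one position.
const broken = parseRoutingKey('primary.taskcluster.my.repo.opened');

console.log(ok.repository);        // taskcluster
console.log(broken.repository);    // my
console.log(broken.rest.join('.')); // repo.opened
```

A binding written for the four-word layout would no longer line up with the five-word key, which is why an unescaped dot in a repository name is a problem for consumers.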

However, we appreciate your offer of a contribution here, and sorry that it did not work out this time!

Many thanks again.

@pinakiz
Author

pinakiz commented Aug 28, 2025

@petemoore Thanks a lot for the detailed explanation 🙏. Makes perfect sense about the routing key issue; I definitely learned something new here! Looking forward to contributing again in the future.

@pinakiz pinakiz closed this Aug 28, 2025
@lotas
Contributor

lotas commented Aug 28, 2025

Hey @pinakiz, I checked the code as well and found two places suggesting that dots might actually be fine:

}, {
  name: 'organization',
  summary: 'The GitHub `organization` which had an event. ' +
    'All periods have been replaced by % - such that ' +
    'foo.bar becomes foo%bar - and all other special ' +
    'characters aside from - and _ have been stripped.',
  maxSize: 100,
  required: true,
}, {
  name: 'repository',
  summary: 'The GitHub `repository` which had an event.' +
    'All periods have been replaced by % - such that ' +
    'foo.bar becomes foo%bar - and all other special ' +
    'characters aside from - and _ have been stripped.',
  maxSize: 100,
  required: true,
},

And there's a sanitization that is being used in API:

// Strips/replaces undesirable characters which GitHub allows in
// repository/organization names (notably .)
function sanitizeGitHubField(field) {
  return field.replace(/[^a-zA-Z0-9-_\.]/gi, '').replace(/\./g, '%');
}

So it might be that this restriction on the schema level can be lifted, but the challenge here would be to actually test it, and see if existing tests are covering all cases where this might happen.
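
As a rough sketch of what such a test could check, the snippet below runs the quoted `sanitizeGitHubField` on a dotted name and compares the raw and sanitized forms against the pre-change schema pattern. The variable names and the sample name `foo.bar` are illustrative; this is not taken from the project's test suite.

```javascript
// Same body as the sanitizeGitHubField helper quoted in this thread.
function sanitizeGitHubField(field) {
  return field.replace(/[^a-zA-Z0-9-_\.]/gi, '').replace(/\./g, '%');
}

// The github-identifier-pattern as it stood before this PR (no '.' allowed).
const identifierPattern = /^([a-zA-Z0-9-_%]*)$/;

const raw = 'foo.bar';
const sanitized = sanitizeGitHubField(raw);

console.log(sanitized);                         // foo%bar
console.log(identifierPattern.test(raw));       // false
console.log(identifierPattern.test(sanitized)); // true

// The sanitized form contains no '.', so it stays a single routing key word.
console.log(sanitized.split('.').length);       // 1
```

This also suggests why `%` is part of the existing pattern: sanitized names contain it wherever the raw name had a dot.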

Maybe we can reopen this and make sure sanitization is used everywhere

@pinakiz pinakiz reopened this Aug 28, 2025
@petemoore
Member

Ah, I just saw in the description of repository in my link earlier:

The GitHub repository which had an event. All periods have been replaced by % - such that foo.bar becomes foo%bar - and all other special characters aside from - and _ have been stripped.

So indeed, tests here are key, to ensure the dots are replaced as desired, and that anything consuming messages from the message bus is listening for the converted repository name, rather than the raw name.

An orthogonal concern from this comment above (not from the change in this PR) is about two different repositories sharing the same identifier (e.g. repositories foo and foo^ both being indexed under foo). If the repository name is used as a key, and isn't guaranteed to be unique, this could potentially be exploited, or just cause corrupted data (where records overwrite each other).
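
The collision concern can be made concrete with a short sketch. It reuses the quoted `sanitizeGitHubField`; the `findCollisions` helper is hypothetical, and the names `foo` and `foo^` are the ones from the comment above, used whether or not GitHub would accept them as repository names.

```javascript
// Same body as the sanitizeGitHubField helper quoted in this thread.
function sanitizeGitHubField(field) {
  return field.replace(/[^a-zA-Z0-9-_\.]/gi, '').replace(/\./g, '%');
}

// Hypothetical helper: group raw names by their sanitized form and return
// only the groups in which more than one raw name shares the same key.
function findCollisions(names) {
  const bySanitized = new Map();
  for (const name of names) {
    const key = sanitizeGitHubField(name);
    const group = bySanitized.get(key) || [];
    group.push(name);
    bySanitized.set(key, group);
  }
  return [...bySanitized].filter(([, group]) => group.length > 1);
}

const collisions = findCollisions(['foo', 'foo^', 'foo.bar']);

// 'foo' and 'foo^' both sanitize to 'foo'; 'foo.bar' becomes 'foo%bar'
// and does not collide with either of them.
for (const [key, group] of collisions) {
  console.log(`${key} <- ${group.join(', ')}`); // foo <- foo, foo^
}
```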

# all common identifiers. It's not personal, it's just that without these
# limitation, the identifiers won't be useful as routing keys in RabbitMQ
# topic exchanges. Specifically, the length limitation and the fact that
# identifiers can't contain dots `.` is critical.
Contributor

We should update this comment to say that the github service sanitizes these identifiers and replaces all dots.

# topic exchanges. Specifically, the length limitation and the fact that
# identifiers can't contain dots `.` is critical.
- github-identifier-pattern: "^([a-zA-Z0-9-_%]*)$"
+ github-identifier-pattern: "^([a-zA-Z0-9-_.%]*)$"
Contributor

Can you please run yarn generate to update all the autogenerated schemas as well? Or just let me know and I can run it.

Contributor

Also, I didn't spot any tests that would cover such checks. Would you be comfortable checking whether you could add some new tests for repositories with dots?
It might be tricky to set up data for the tests, though; I suggest starting with api_test.js

Thanks

Author

Should I allow '%'? Since '.' is later transformed into '%', allowing '%' could create ambiguity.

Contributor

[image] Looks like GitHub is already strict with repository names. I wonder how '%' got there in the first place.

Author

The github-identifier-pattern also allows whitespace-only strings. Is this something we should be concerned about?
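
One way to check this question is to run the two patterns from the diff against a few edge cases. This is only a sketch using plain JavaScript `RegExp`; it assumes the schema validator applies the pattern with the same regex semantics, which the thread does not show. Under that assumption, the `*` quantifier lets the empty string through, while strings made only of spaces are rejected because a space is not in the character class.

```javascript
// Patterns copied from the diff in this PR.
const oldPattern = /^([a-zA-Z0-9-_%]*)$/;
const newPattern = /^([a-zA-Z0-9-_.%]*)$/;

// Edge cases: empty string, spaces only, a dotted name, a name with a space.
const samples = ['', '   ', 'foo.bar', 'foo bar'];

const results = samples.map((s) => ({
  input: JSON.stringify(s),
  oldMatches: oldPattern.test(s),
  newMatches: newPattern.test(s),
}));

console.table(results);
```

If empty identifiers should be rejected as well, replacing `*` with `+` in the pattern would be one option, though that is a separate change from the one proposed here.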

Successfully merging this pull request may close these issues.

The github/builds API returns 500s if any repository in the return value contains a .
3 participants