
Commit c6a116b

[#1118] Remove authorship tag validation regex (#1857)
GitHub's IDs have certain requirements to be valid, but Git does not have such requirements. This is evident in .gitconfig files, where the username can be set to anything the user may like. The current validation regex is based off GitHub's ID requirement. However, we are currently relying on Git author names found in .gitconfig files to attribute code authorship. This may result in missing code authorship attribution, when the Git author names do not match GitHub's ID requirements. Let's remove this restriction to allow any valid author names to be accepted and properly attributed.
1 parent 7dccf1f commit c6a116b
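
To illustrate the problem described in the commit message, the sketch below compares the GitHub-username pattern that this commit removes from `AnnotatorAnalyzer.java` with the non-empty check that replaces it. The class and method names (`AuthorNameValidationSketch`, `acceptedBefore`, `acceptedAfter`) are made up for this illustration and are not part of RepoSense.

```java
import java.util.regex.Pattern;

final class AuthorNameValidationSketch {
    // The GitHub-username pattern removed by this commit.
    private static final Pattern GITHUB_ID =
            Pattern.compile("^[a-zA-Z0-9](?:[a-zA-Z0-9]|-(?=[a-zA-Z0-9])){0,38}$");

    // Old behaviour: the name had to look like a GitHub ID.
    static boolean acceptedBefore(String name) {
        return GITHUB_ID.matcher(name).find();
    }

    // New behaviour: any non-empty name is accepted.
    static boolean acceptedAfter(String name) {
        return !name.trim().isEmpty();
    }

    public static void main(String[] args) {
        String[] names = {"harryggg", "Eugene Peh", "-invalidGitUsername_TreatedAsUnknownUser"};
        for (String name : names) {
            System.out.println(name + ": before=" + acceptedBefore(name)
                    + ", after=" + acceptedAfter(name));
        }
    }
}
```

Running `main` shows that `harryggg` passes both checks, while `Eugene Peh` and `-invalidGitUsername_TreatedAsUnknownUser` are rejected by the old pattern and accepted by the new check.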

10 files changed: +84 -57 lines changed


docs/ug/usingAuthorTags.md

Lines changed: 6 additions & 7 deletions
@@ -15,8 +15,8 @@
 If you want to override the code authorship deduced by RepoSense (which is based on Git blame/log data), you can use `@@author` tags to specify certain code segments that should be credited to a certain author irrespective of git history. An example scenario where this is useful is when a method was originally written by one author but a second author did some minor refactoring to it; in this case, RepoSense might attribute the code to the second author while you may want to attribute the code to the first author.

 There are 2 types of `@@author` tags:
-- Start Tags (format: `@@author AUTHOR_GITHUB_ID`): A start tag indicates the start of a code segment written by the author identified by the `AUTHOR_GITHUB_ID`.
-- End Tags (format: `@@author`): Optional. An end tag indicates the end of a code segment written by the author identified by the `AUTHOR_GITHUB_ID` of the start tag.
+- Start Tags (format: `@@author AUTHOR_GIT_AUTHOR_NAME`): A start tag indicates the start of a code segment written by the author identified by the `AUTHOR_GIT_AUTHOR_NAME`.
+- End Tags (format: `@@author`): Optional. An end tag indicates the end of a code segment written by the author identified by the `AUTHOR_GIT_AUTHOR_NAME` of the start tag.

 <box type="info" seamless>

@@ -28,7 +28,7 @@ If an end tag is not provided, the code till the next start tag (or the end of t
 If an end tag is provided without a corresponding start tag, the code until the next start tag, the next end tag, or the end of the file, will not be attributed to any author. This should only be used if the code should not be attributed to any author.
 </box>

-The `@@author` tags should be enclosed within a comment, using the comment syntax of the file in concern. Below are some examples:
+The `@@author` tags should be enclosed within a single-line comment, using the comment syntax of the file in concern. Below are some examples:

 ![author tags](../images/add-author-tags.png)

@@ -46,11 +46,10 @@ Currently, the following comment formats are supported:

 <box type="info" seamless>

-First, RepoSense checks whether the line matches the supported comment formats. If the line does not match the formats,
-RepoSense treats it as a normal line. Else, it continues to check whether the GitHub username is in valid format.
+RepoSense checks whether the line matches the supported comment formats. If the line does not match the formats,
+RepoSense treats it as a normal line.

-If the username is valid, the code till the next start tag, the end tag, or the end of file will be attributed to that author.
-Otherwise, the code will not be attributed to any author.
+The code until the next start tag, the end tag, or the end of file will be attributed to that author.
 </box>

 Note: Remember to **commit** the files after the changes. (reason: RepoSense can see committed code only)
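
The attribution rules in the documentation above can be modelled as a small state machine. The sketch below is a simplified illustration rather than RepoSense's implementation: it only recognises `//` comments, and the names `AuthorTagWalk`, `attribute`, and `blameAuthors` are hypothetical. It assumes that lines outside any tagged segment fall back to the Git blame author, and that unattributed code is reported as `-`, as in the test data of this commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AuthorTagWalk {
    // Simplified: only "//" single-line comments are recognised here.
    private static final Pattern TAG_LINE =
            Pattern.compile("^\\s*//\\s*@@author(\\s+.*)?\\s*$");
    private static final String UNKNOWN = "-";

    /**
     * Returns the author credited for each line. blameAuthors[i] is the author
     * that Git blame reports for lines[i]; it is used outside tagged segments.
     */
    static List<String> attribute(String[] lines, String[] blameAuthors) {
        List<String> result = new ArrayList<>();
        String override = null; // null means "fall back to Git blame"
        for (int i = 0; i < lines.length; i++) {
            Matcher m = TAG_LINE.matcher(lines[i]);
            if (!m.matches()) {
                // Normal line: use the open segment's author, if any.
                result.add(override != null ? override : blameAuthors[i]);
                continue;
            }
            String name = m.group(1) == null ? "" : m.group(1).trim();
            if (!name.isEmpty()) {
                override = name;      // start tag opens a new segment
                result.add(name);
            } else if (override != null) {
                result.add(override); // end tag closes the open segment
                override = null;
            } else {
                override = UNKNOWN;   // end tag without a start tag
                result.add(UNKNOWN);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {"fake", "//@@author Eugene Peh", "line 1", "//@@author", "fake"};
        String[] blame = {"fakeAuthor", "fakeAuthor", "fakeAuthor", "fakeAuthor", "fakeAuthor"};
        System.out.println(attribute(lines, blame));
    }
}
```

In this model an end tag line is credited to the segment it closes, which matches how the `//@@author` lines are attributed in the updated `annotationTest.java` expectations.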

src/main/java/reposense/authorship/analyzer/AnnotatorAnalyzer.java

Lines changed: 12 additions & 10 deletions
@@ -21,17 +21,14 @@
  */
 public class AnnotatorAnalyzer {
     private static final String AUTHOR_TAG = "@@author";
-    // GitHub username format
-    private static final String REGEX_AUTHOR_NAME_FORMAT = "^[a-zA-Z0-9](?:[a-zA-Z0-9]|-(?=[a-zA-Z0-9])){0,38}$";
-    private static final Pattern PATTERN_AUTHOR_NAME_FORMAT = Pattern.compile(REGEX_AUTHOR_NAME_FORMAT);
-    private static final String REGEX_AUTHOR_TAG_FORMAT = "@@author(\\s+[^\\s]+)?";
+    private static final String REGEX_AUTHOR_TAG_FORMAT = "@@author(\\s+.*)?";

     private static final String[][] COMMENT_FORMATS = {
-            {"//", "\\s"},
+            {"//", null},
             {"/\\*", "\\*/"},
-            {"#", "\\s"},
+            {"#", null},
             {"<!--", "-->"},
-            {"%", "\\s"},
+            {"%", null},
             {"\\[.*]:\\s*#\\s*\\(", "\\)"},
             {"<!---", "--->"}
     };
@@ -106,18 +103,23 @@ public static Optional<String> extractAuthorName(String line) {
                 .map(l -> l.split(AUTHOR_TAG))
                 .filter(array -> array.length >= 2)
                 // separates by end-comment format to obtain the author's name at the zeroth index
-                .map(array -> array[1].trim().split(COMMENT_FORMATS[getCommentTypeIndex(line)][1]))
+                .map(array -> COMMENT_FORMATS[getCommentTypeIndex(line)][1] != null
+                        ? array[1].trim().split(COMMENT_FORMATS[getCommentTypeIndex(line)][1])
+                        : new String[]{ array[1].trim() })
                 .filter(array -> array.length > 0)
                 .map(array -> array[0].trim())
-                // checks if the author name is valid
-                .filter(trimmedParameters -> PATTERN_AUTHOR_NAME_FORMAT.matcher(trimmedParameters).find());
+                // checks if the author name is not empty
+                .filter(trimmedParameters -> !trimmedParameters.isEmpty());
     }

     /**
      * Generates regex for valid comment formats in which author tag is found, with {@code REGEX_AUTHOR_TAG_FORMAT}
      * flanked by {@code commentStart} and {@code commentEnd}.
      */
     private static String generateCommentRegex(String commentStart, String commentEnd) {
+        if (commentEnd == null) {
+            return "^[\\s]*" + commentStart + "[\\s]*" + REGEX_AUTHOR_TAG_FORMAT + "[\\s]*$";
+        }
         return "^[\\s]*" + commentStart + "[\\s]*" + REGEX_AUTHOR_TAG_FORMAT + "[\\s]*(" + commentEnd + ")?[\\s]*$";
     }

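
The effect of the `null` comment terminator can be tried out in isolation. The following is a minimal sketch that reuses the constants and regex construction from the diff above; `AuthorTagExtractionSketch` and `commentTypeIndex` are hypothetical names, and `commentTypeIndex` is only an assumed stand-in for `getCommentTypeIndex`, whose body is not shown in this commit.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class AuthorTagExtractionSketch {
    private static final String AUTHOR_TAG = "@@author";
    private static final String REGEX_AUTHOR_TAG_FORMAT = "@@author(\\s+.*)?";

    // {comment start, comment end}; null marks a single-line comment with no terminator.
    private static final String[][] COMMENT_FORMATS = {
        {"//", null},
        {"/\\*", "\\*/"},
        {"#", null},
        {"<!--", "-->"},
        {"%", null},
        {"\\[.*]:\\s*#\\s*\\(", "\\)"},
        {"<!---", "--->"}
    };

    static String generateCommentRegex(String commentStart, String commentEnd) {
        String prefix = "^[\\s]*" + commentStart + "[\\s]*" + REGEX_AUTHOR_TAG_FORMAT + "[\\s]*";
        return commentEnd == null ? prefix + "$" : prefix + "(" + commentEnd + ")?[\\s]*$";
    }

    // Assumed stand-in for getCommentTypeIndex: first matching format, or -1.
    static int commentTypeIndex(String line) {
        for (int i = 0; i < COMMENT_FORMATS.length; i++) {
            String regex = generateCommentRegex(COMMENT_FORMATS[i][0], COMMENT_FORMATS[i][1]);
            if (Pattern.compile(regex).matcher(line).find()) {
                return i;
            }
        }
        return -1;
    }

    static Optional<String> extractAuthorName(String line) {
        int index = commentTypeIndex(line);
        if (index < 0) {
            return Optional.empty(); // not a tag line
        }
        String[] parts = line.split(AUTHOR_TAG);
        if (parts.length < 2) {
            return Optional.empty(); // end tag: nothing follows "@@author"
        }
        String commentEnd = COMMENT_FORMATS[index][1];
        String rest = parts[1].trim();
        // Single-line comments keep the whole remainder; others drop the terminator.
        String[] pieces = commentEnd == null ? new String[] {rest} : rest.split(commentEnd);
        if (pieces.length == 0) {
            return Optional.empty();
        }
        String name = pieces[0].trim();
        return name.isEmpty() ? Optional.empty() : Optional.of(name);
    }

    public static void main(String[] args) {
        System.out.println(extractAuthorName("// @@author Eugene Peh"));
        System.out.println(extractAuthorName("/* @@author Eugene Peh */"));
        System.out.println(extractAuthorName("//@@author"));
    }
}
```

With the previous `"\\s"` terminator, a name such as `Eugene Peh` in a `//` comment would have been split at the first space; treating the terminator as absent keeps the full name.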

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
[{"path":"README.md","fileType":"md","lines":[{"lineNumber":1,"author":{"gitId":"eugenepeh"},"content":"This is a test repository for [RepoSense](https://github.com/reposense/RepoSense)."}],"authorContributionMap":{"eugenepeh":1}},{"path":"_reposense/config.json","fileType":"json","lines":[{"lineNumber":1,"author":{"gitId":"Eugene Peh"},"content":"{"},{"lineNumber":2,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"about-us/**\", \"**index.html\"],"},{"lineNumber":3,"author":{"gitId":"Eugene Peh"},"content":" \"formats\": [\"html\", \"css\"],"},{"lineNumber":4,"author":{"gitId":"FH-30"},"content":" \"ignoreCommitsList\": [\"\", \"67890def\"],"},{"lineNumber":5,"author":{"gitId":"Eugene Peh"},"content":" \"authors\":"},{"lineNumber":6,"author":{"gitId":"Eugene Peh"},"content":" ["},{"lineNumber":7,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":8,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"alice\","},{"lineNumber":9,"author":{"gitId":"Eugene Peh"},"content":" \"displayName\": \"Alice T.\","},{"lineNumber":10,"author":{"gitId":"Eugene Peh"},"content":" \"authorNames\": [\"AT\", \"A\"],"},{"lineNumber":11,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"**.css\"]"},{"lineNumber":12,"author":{"gitId":"Eugene Peh"},"content":" },"},{"lineNumber":13,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":14,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"bob\""},{"lineNumber":15,"author":{"gitId":"Eugene Peh"},"content":" }"},{"lineNumber":16,"author":{"gitId":"Eugene Peh"},"content":" ]"},{"lineNumber":17,"author":{"gitId":"Eugene Peh"},"content":"}"}],"authorContributionMap":{"FH-30":1,"Eugene Peh":16}},{"path":"annotationTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"fakeAuthor"},"content":"fake all the lines in this file is writtened by 
fakeAuthor"},{"lineNumber":2,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":5,"author":{"gitId":"harryggg"},"content":"//@@author harryggg"},{"lineNumber":6,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":7,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":8,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":9,"author":{"gitId":"harryggg"},"content":"//@@author"},{"lineNumber":10,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":11,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":12,"author":{"gitId":"-"},"content":"//@@author -invalidGitUsername_TreatedAsUnknownUser"},{"lineNumber":13,"author":{"gitId":"-"},"content":"unknown"},{"lineNumber":14,"author":{"gitId":"-"},"content":"System.out.println(\"//@@author invalidAuthorLineFormat\"); unknown"},{"lineNumber":15,"author":{"gitId":"-"},"content":"unknown"},{"lineNumber":16,"author":{"gitId":"-"},"content":"//@@author"},{"lineNumber":17,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":18,"author":{"gitId":"fakeAuthor"},"content":"//@@author harryggg invalidAuthorLineFormat"},{"lineNumber":19,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":20,"author":{"gitId":"-"},"content":"//@@author"},{"lineNumber":21,"author":{"gitId":"-"},"content":"unknown"},{"lineNumber":22,"author":{"gitId":"-"},"content":"unknown"}],"authorContributionMap":{"fakeAuthor":9,"harryggg":5,"-":8}},{"path":"blameTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 
3"}],"authorContributionMap":{"fakeAuthor":1,"harryggg":3}},{"path":"newFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"}],"authorContributionMap":{"harryggg":2}},{"path":"newPos/movedFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 4"}],"authorContributionMap":{"harryggg":4}},{"path":"space test.txt","fileType":"txt","lines":[{"lineNumber":1,"author":{"gitId":"chan-j-d"},"content":"1"}],"authorContributionMap":{"chan-j-d":1}}]
1+
[{"path":"README.md","fileType":"md","lines":[{"lineNumber":1,"author":{"gitId":"eugenepeh"},"content":"This is a test repository for [RepoSense](https://github.com/reposense/RepoSense)."}],"authorContributionMap":{"eugenepeh":1}},{"path":"_reposense/config.json","fileType":"json","lines":[{"lineNumber":1,"author":{"gitId":"Eugene Peh"},"content":"{"},{"lineNumber":2,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"about-us/**\", \"**index.html\"],"},{"lineNumber":3,"author":{"gitId":"Eugene Peh"},"content":" \"formats\": [\"html\", \"css\"],"},{"lineNumber":4,"author":{"gitId":"FH-30"},"content":" \"ignoreCommitsList\": [\"\", \"67890def\"],"},{"lineNumber":5,"author":{"gitId":"Eugene Peh"},"content":" \"authors\":"},{"lineNumber":6,"author":{"gitId":"Eugene Peh"},"content":" ["},{"lineNumber":7,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":8,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"alice\","},{"lineNumber":9,"author":{"gitId":"Eugene Peh"},"content":" \"displayName\": \"Alice T.\","},{"lineNumber":10,"author":{"gitId":"Eugene Peh"},"content":" \"authorNames\": [\"AT\", \"A\"],"},{"lineNumber":11,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"**.css\"]"},{"lineNumber":12,"author":{"gitId":"Eugene Peh"},"content":" },"},{"lineNumber":13,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":14,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"bob\""},{"lineNumber":15,"author":{"gitId":"Eugene Peh"},"content":" }"},{"lineNumber":16,"author":{"gitId":"Eugene Peh"},"content":" ]"},{"lineNumber":17,"author":{"gitId":"Eugene Peh"},"content":"}"}],"authorContributionMap":{"FH-30":1,"Eugene Peh":16}},{"path":"annotationTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"fakeAuthor"},"content":"fake all the lines in this file is writtened by 
fakeAuthor"},{"lineNumber":2,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":5,"author":{"gitId":"harryggg"},"content":"//@@author harryggg"},{"lineNumber":6,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":7,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":8,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":9,"author":{"gitId":"harryggg"},"content":"//@@author"},{"lineNumber":10,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":11,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":12,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"//@@author -invalidGitUsername_TreatedAsUnknownUser"},{"lineNumber":13,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"unknown"},{"lineNumber":14,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"System.out.println(\"//@@author invalidAuthorLineFormat\"); unknown"},{"lineNumber":15,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"unknown"},{"lineNumber":16,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"//@@author"},{"lineNumber":17,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":18,"author":{"gitId":"harryggg invalidAuthorLineFormat"},"content":"//@@author harryggg invalidAuthorLineFormat"},{"lineNumber":19,"author":{"gitId":"harryggg invalidAuthorLineFormat"},"content":"fake"},{"lineNumber":20,"author":{"gitId":"harryggg invalidAuthorLineFormat"},"content":"//@@author"},{"lineNumber":21,"author":{"gitId":"fakeAuthor"},"content":"unknown"},{"lineNumber":22,"author":{"gitId":"fakeAuthor"},"content":"unknown"}],"authorContributionMap":{"fakeAuthor":9,"-invalidGitUsername_TreatedAsUnknownUser":5,"harryggg":5,"harryggg 
invalidAuthorLineFormat":3}},{"path":"blameTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 3"}],"authorContributionMap":{"fakeAuthor":1,"harryggg":3}},{"path":"newFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"}],"authorContributionMap":{"harryggg":2}},{"path":"newPos/movedFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 4"}],"authorContributionMap":{"harryggg":4}},{"path":"space test.txt","fileType":"txt","lines":[{"lineNumber":1,"author":{"gitId":"chan-j-d"},"content":"1"}],"authorContributionMap":{"chan-j-d":1}}]
