Add stats #2924
System.out.println("unique terms: " + results.get("unique_terms"));
System.out.println("total terms: " + results.get("total_terms"));
System.out.println("physical location: " + indexPath.toAbsolutePath());
System.out.println("total size on disk: " + new File(indexPath.toString()).length() + " bytes");
You probably want to gather these in Map<String, Object> results = IndexReaderUtils.getIndexStats(reader, args.field); so that other calls to getIndexStats will also have these key-value pairs?
That's a much better way to handle it; I will fix it.
Reorganizing of imports is fine.
long totalSize = findDirectorySize(indexPath);
results.put("total_size_disk", totalSize);

extra newline
Yeah, I saw that. I have been making changes and will commit shortly to resolve a few things.
lintool
left a comment
Please show a sample of the output.
reader.close();
}

public static long findDirectorySize(Path path) throws IOException {
two-space indent please.
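For context, the earlier revision called new File(indexPath.toString()).length() on the index directory, and the Java documentation leaves the return value of File.length() unspecified for directories. The body of findDirectorySize is not shown in this conversation, so the following is only a sketch of one way such a method could be written, assuming it should sum the sizes of all regular files under the path. The class name DirectorySize is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class DirectorySize {
  // Hypothetical body: the PR's actual implementation is not shown here.
  // Sums the sizes of all regular files under the given path, recursively.
  public static long findDirectorySize(Path path) throws IOException {
    // Files.walk returns a lazily populated stream that holds open
    // directory handles, so it is closed with try-with-resources.
    try (Stream<Path> files = Files.walk(path)) {
      return files
          .filter(Files::isRegularFile)
          .mapToLong(p -> p.toFile().length())
          .sum();
    }
  }
}
```

The result can then be stored with results.put(...) as in the diff above, so every caller of getIndexStats sees the same value.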
return String.format("%.1f %s", size, units[unitIndex]);
}
}

trim extra lines.
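Only the final return statement of the size formatter appears in the diff. A minimal sketch of how the surrounding method might look is given below, assuming 1024-byte steps and a units array running from B to TB. The names SizeFormat, formatSize and UNITS are hypothetical, and Locale.ROOT is added here so the decimal separator does not vary by machine.

```java
import java.util.Locale;

public class SizeFormat {
  // Assumed unit labels; the PR's actual array is not shown.
  private static final String[] UNITS = {"B", "KB", "MB", "GB", "TB"};

  // Converts a byte count into a human-readable string such as "1.5 KB".
  public static String formatSize(long bytes) {
    double size = bytes;
    int unitIndex = 0;
    // Divide by 1024 until the value fits the current unit,
    // stopping at the largest unit available.
    while (size >= 1024 && unitIndex < UNITS.length - 1) {
      size /= 1024;
      unitIndex++;
    }
    return String.format(Locale.ROOT, "%.1f %s", size, UNITS[unitIndex]);
  }
}
```

Under these assumptions, formatSize(1536) yields "1.5 KB".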
Also, please fix the broken tests and add an additional test case to cover the new code.
Change "physical location of index" to "index path" and you can align all the values?
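One possible way to align the values, sketched here rather than taken from the PR, is to pad every label to the width of the longest one with a left-justified format specifier. The class StatsPrinter and method formatStats are hypothetical names, and the sample labels and values in the usage note are made up for illustration.

```java
import java.util.Map;

public class StatsPrinter {
  // Renders "label: value" lines with all values starting in the same column.
  public static String formatStats(Map<String, Object> stats) {
    // Find the longest label so shorter ones can be padded to match.
    int width = 0;
    for (String label : stats.keySet()) {
      width = Math.max(width, label.length());
    }
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Object> entry : stats.entrySet()) {
      // "%-Ns" left-justifies the label (plus colon) in a field of N characters.
      sb.append(String.format("%-" + (width + 2) + "s%s\n",
          entry.getKey() + ":", entry.getValue()));
    }
    return sb.toString();
  }
}
```

With a LinkedHashMap holding "unique terms" and "index path", both values would begin in column 15, regardless of label length.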
…ini into add-index-stats
lintool
left a comment
Something still seems off... see comments?
System.out.println("documents (non-empty): " + results.get("non_empty_documents"));
System.out.println("unique terms: " + results.get("unique_terms"));
System.out.println("total terms: " + results.get("total_terms"));
System.out.println("physical location of index: " + results.get("physical_location"));
This doesn't seem right?
How about just "index_path" and "total_size"?
Yeah, you are right; I'll change them. As a side note, I will squash the commits and clean up the git history.
This PR adds
Sample of output:
I took a subset of the MS MARCO corpus (10,000 passages) and built an index. Then I ran:
bin/run.sh io.anserini.index.IndexReaderUtils -index indexes/msmarco-passage/small-index -stats
OUTPUT: