feat: add SaveStrategy to allow flexibility in saving localized evaluation outputs #281
4de8c13 to 605814c
Most comments are nits about docstring issues, but I also left some suggestions on the save strategy source code and unit tests. Thanks!
    },
)
] * 3
with patch.object(s3_client, "upload_part", return_value={"ETag": 1}) as upload_part, patch.object(
Can we use a side_effect for upload_part so that successive calls return {"ETag": "etag_1"}, {"ETag": "etag_2"}, {"ETag": "etag_3"}? Later, we can validate that self._parts_info contains these values.
for _ in range(num_of_save_times):
    save_strategy.save(records)
assert upload_part.call_count == 3
assert complete_multipart_upload.call_count == 1
Following up from the comment above, at this point self._parts_info[PARTS] should look like
[
{PART_NUMBER: 1, E_TAG: "etag_1"},
{PART_NUMBER: 2, E_TAG: "etag_2"},
{PART_NUMBER: 3, E_TAG: "etag_3"}
]
so we can use
complete_multipart_upload.assert_called_once_with(
Bucket=# mocked value,
Key=# mocked value,
UploadId="1234",
MultipartUpload={PARTS: # the list above},
)
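The suggestion above can be sketched end to end as follows. This is only an illustration under stated assumptions: FakeMultipartSaver is a hypothetical stand-in for the save strategy under test, the string values of PARTS, PART_NUMBER and E_TAG are assumed, and the bucket and key names are placeholders. The s3 client is a plain MagicMock, so nothing here touches AWS.

```python
from unittest.mock import MagicMock

# Assumed values for the constants named in the review comment.
PARTS, PART_NUMBER, E_TAG = "Parts", "PartNumber", "ETag"


class FakeMultipartSaver:
    """Hypothetical stand-in for the S3 save strategy under test."""

    def __init__(self, s3_client, bucket, key, upload_id):
        self._s3 = s3_client
        self._bucket, self._key, self._upload_id = bucket, key, upload_id
        self._parts_info = {PARTS: []}

    def save(self, body):
        # Part numbers start at 1 and increase with every uploaded part.
        part_number = len(self._parts_info[PARTS]) + 1
        response = self._s3.upload_part(
            Bucket=self._bucket, Key=self._key, UploadId=self._upload_id,
            PartNumber=part_number, Body=body,
        )
        self._parts_info[PARTS].append(
            {PART_NUMBER: part_number, E_TAG: response["ETag"]}
        )

    def finish(self):
        self._s3.complete_multipart_upload(
            Bucket=self._bucket, Key=self._key, UploadId=self._upload_id,
            MultipartUpload=self._parts_info,
        )


s3_client = MagicMock()
# side_effect with a list makes each successive call return the next item.
s3_client.upload_part.side_effect = [{"ETag": f"etag_{i}"} for i in (1, 2, 3)]

saver = FakeMultipartSaver(s3_client, "my-bucket", "my-key", "1234")
for _ in range(3):
    saver.save(b"records")
saver.finish()

expected_parts = [{PART_NUMBER: i, E_TAG: f"etag_{i}"} for i in (1, 2, 3)]
assert s3_client.upload_part.call_count == 3
s3_client.complete_multipart_upload.assert_called_once_with(
    Bucket="my-bucket", Key="my-key", UploadId="1234",
    MultipartUpload={PARTS: expected_parts},
)
```

Compared with a fixed return_value, the side_effect list lets the final assertion check that each ETag ends up paired with the right part number.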
* feat: add SaveStrategy to allow flexibility in saving localized evaluation outputs (#281)
* feat: modify GeneratePrompt transform to take placeholder_dict (#288)
* feat: modify GeneratePrompt transform to take placeholder_dict
* fix: unit test
* fix: requested changes
---------
Co-authored-by: keerthanvasist <[email protected]>
Co-authored-by: Xiaoyi Cheng <[email protected]>
Add SaveStrategy to allow flexibility in saving localized evaluation outputs

Description of changes:
Introduces a new class SaveStrategy that allows users to define their own saving strategy for localized evaluation outputs. Because the computations are distributed, pulling all of the data to the head node when the dataset is large might lead to OOM errors. To avoid that, the data is pulled in batches, and the save function is called on one batch at a time. To support this mechanism while allowing more flexibility in how outputs are saved, this class works as a ContextManager.

This PR looks big, but is in essence a small change that is reflected in every evaluation algorithm. The main change is in save_strategy.py and eval_algorithms/common.py (which are new files).

Incidentally, this PR also updates the ray version from 2.9.1 to 2.23.0.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
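For readers unfamiliar with the pattern, the batch-wise save mechanism described above might look roughly like the sketch below. It is not the code from this PR: only the SaveStrategy name, the save method and the context-manager behavior come from the description, while the start and clean_up hooks, the JsonLinesSaveStrategy class and the file name are hypothetical.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List


class SaveStrategy(ABC):
    """Sketch of a save strategy used as a context manager (assumed interface)."""

    def __enter__(self) -> "SaveStrategy":
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.clean_up()

    def start(self) -> None:
        """Optional setup before the first batch (hypothetical hook)."""

    @abstractmethod
    def save(self, records: List[Dict]) -> None:
        """Persist one batch of records."""

    def clean_up(self) -> None:
        """Optional teardown after the last batch (hypothetical hook)."""


class JsonLinesSaveStrategy(SaveStrategy):
    """Hypothetical strategy: append each batch to a local JSON Lines file."""

    def __init__(self, path: str):
        self._path = path

    def start(self) -> None:
        # Truncate any previous output so repeated runs do not accumulate.
        open(self._path, "w").close()

    def save(self, records: List[Dict]) -> None:
        with open(self._path, "a") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")


# Only one batch is held in memory at a time; save is called once per batch.
batches = [[{"id": 0}, {"id": 1}], [{"id": 2}]]
output_path = os.path.join(tempfile.mkdtemp(), "outputs.jsonl")
with JsonLinesSaveStrategy(output_path) as strategy:
    for batch in batches:
        strategy.save(batch)

with open(output_path) as f:
    saved = [json.loads(line) for line in f]
```

The context manager gives each strategy one place for setup and teardown around an arbitrary number of save calls, which suits backends that need a finalization step, such as completing a multipart upload.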