Releases · IBM/eval-assist
v1.0.2
What's Changed
- Support for in context learning examples in the frontend
- Downloading the test case as a notebook now generates evalassist code, not unitxt
- Improved fix instance feature: new model with text difference visualization
Full Changelog: v1.0.0...v1.0.2
v1.0.1
What's Changed
- Add system prompt to in-house judges by @martinscooper in #133
- Update documentation by @mclanza in #132
- fix: incorrect model used in borderline generation by @martinscooper in #134
More changes:
- Correct self_consistency attribute type
- Add idx to parser failures logging
- JSON parser: sanitize output only if parsing fails
- Add json object as the response format for litellm
- Convert persona prompt into message format
- Add more comments to sanitize_and_parse_json
- use logger instead of root_pkg_logger
- Format frontend code
- Add more logs to the parser
- Improve json sanitizer
- First benchmark updates after replacing langchain
- Update tests after sanitizer changes
Full Changelog: v0.3.2...v1.0.1
v1.0.0
⚠️ Breaking changes were introduced in this version.
Changes:
- Criteria's `prediction_field` was renamed to `to_evaluate_field`.
- The `DirectInstance` and `PairwiseInstance` models were removed and unified under the `Instance` model. `Instance` now just holds a `fields` attribute; the `context`, `response` and `responses` fields were removed.
- The logic that determines which text is evaluated and which fields act as context was re-designed: the criteria now defines the role of each of the instance fields (see the hypothetical sketch after this list).
- Langchain usage was heavily reduced and replaced by custom logic; the dependency will be removed soon.
- EvalAssist's in-house judge prompts were changed: they now use system prompts and the message format.
- Langchain's output fixer was replaced with custom logic.
- Synthetic instance generation was improved.
- More tests were added.
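
For orientation, here is a minimal, hypothetical sketch of how the unified `Instance` model and criteria-driven field roles might look in user code. The import path and constructor arguments are assumptions, not the library's confirmed API; only the names `Instance`, `fields`, and `to_evaluate_field` come from the notes above.

```python
# Hypothetical illustration of the unified Instance model described above.
# Import path, constructors, and keyword arguments are assumptions; consult
# the evalassist documentation for the actual API.
from evalassist import Criteria, Instance  # assumed import path

# An Instance now just carries a dict of named fields.
instance = Instance(
    fields={
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe.",
        "answer": "Paris is the capital of France.",
    }
)

# The criteria decides which field is evaluated (formerly `prediction_field`,
# now `to_evaluate_field`) and how the remaining fields are used as context.
criteria = Criteria(
    name="answer_correctness",
    description="Is the answer factually correct given the context?",
    to_evaluate_field="answer",  # renamed from prediction_field
)
```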
v0.3.2
v0.3.1
v0.3.0
What's Changed
- Pairwise with tie by @martinscooper in #129
Important:
- The in-house DirectJudge's prompt was changed, so you may see slightly different (and better) results.
- Some types were changed to accommodate a pairwise tie as a possible option. Moreover, the pairwise comparison result type was updated to accommodate both global results (selected option and explanation) and detailed results (all-vs-all strategy); a hypothetical sketch follows below.
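
As a rough, hypothetical illustration of the shape described above (type and field names are illustrative only, not the library's actual types):

```python
# Hypothetical sketch of a pairwise comparison result that accommodates ties.
# Names and structure are illustrative only, not the library's actual types.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PairwiseComparison:
    """One head-to-head comparison in the all-vs-all strategy."""
    option_a: str
    option_b: str
    winner: Optional[str]  # None represents a tie
    explanation: str


@dataclass
class PairwiseResult:
    """Global result plus the detailed all-vs-all comparisons."""
    selected_option: Optional[str]  # None if the overall outcome is a tie
    explanation: str
    comparisons: list[PairwiseComparison] = field(default_factory=list)
```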
Full Changelog: v0.2.4...v0.3.0
v0.2.4
v0.2.3
What's Changed
- Add support for in-context examples by @martinscooper in #125
- Minor fixes by @martinscooper in #127
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's Changed
- Fix no event loop issue by @martinscooper in #122
- Improvements by @martinscooper in #123
- Add more judge test using patched inference engine by @martinscooper in #124
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Add job that runs run_judge* files by @martinscooper in #118
- Rate limit in judge examples by @martinscooper in #119
- main judge: Use coroutines to aparse responses by @martinscooper in #120
- Add evaluate_with_custom_prompt method by @martinscooper in #121
Full Changelog: v0.2.0...v0.2.1