Releases: IBM/eval-assist

v1.0.2

14 Nov 10:42

What's Changed

  • Support for in-context learning examples in the frontend
  • Downloading the test case as a notebook now generates EvalAssist code instead of unitxt code
  • Improved fix instance feature: new model with text difference visualization

Full Changelog: v1.0.0...v1.0.2

v1.0.1

29 Oct 21:39

What's Changed

More changes:

  • Correct self_consistency attribute type
  • Add idx to parser failures logging
  • JSON parser: sanitize output only if parsing fails
  • Add json object as the response format for litellm
  • Convert persona prompt into message format
  • Add more comments to sanitize_and_parse_json
  • Use logger instead of root_pkg_logger
  • Format frontend code
  • Add more logs to the parser
  • Improve json sanitizer
  • First benchmark updates after replacing langchain
  • Update tests after sanitizer changes
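
The "sanitize output only if parsing fails" item describes a parse-first, sanitize-on-failure flow. A minimal sketch of that idea follows. It assumes nothing about EvalAssist's real code: `sanitize_json_text` and `parse_json_lenient` are hypothetical names, and the cleanup rules (stripping Markdown fences and surrounding chatter) are illustrative, not the actual logic of sanitize_and_parse_json.

```python
import json
import re
from typing import Any


def sanitize_json_text(raw: str) -> str:
    """Hypothetical cleanup: drop Markdown fences and text around the outermost JSON object."""
    text = raw.strip()
    # Remove a leading ```json / ``` fence and a trailing ``` fence, if present.
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start : end + 1]
    return text


def parse_json_lenient(raw: str) -> Any:
    """Try strict parsing first; sanitize and retry only if that fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Well-formed output never reaches the sanitizer, so it cannot be altered by it.
        return json.loads(sanitize_json_text(raw))
```

Parsing first means valid model output is returned untouched, and the sanitizer only runs on responses that would otherwise be discarded.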

Full Changelog: v0.3.2...v1.0.1

v1.0.0

27 Oct 16:37

!!Breaking changes were introduced in this version.

Changes:

  • Criteria's prediction_field was renamed to to_evaluate_field.
  • DirectInstance and PairwiseInstance models were removed and unified under the Instance model.
  • Instance now holds only a fields attribute; the context, response and responses fields were removed.
  • The logic that determines which text is evaluated and which text serves as context was redesigned. The criteria now define the role of each of the instance fields.
  • LangChain usage was heavily reduced and replaced by custom logic; the dependency will be removed soon.
  • The prompts of EvalAssist's in-house judges were changed: they now use system prompts and the message format.
  • LangChain's output fixer was replaced with custom logic.
  • Synthetic instance generation was improved.
  • More tests were added.
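
To make the breaking changes above concrete, here is a rough sketch of the new shape using plain dataclasses. These classes are stand-ins written for illustration and are not EvalAssist's real models: only the names Instance, fields and to_evaluate_field come from the notes above, while `context_fields`, `migrate_direct_instance` and `split_roles` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Instance:
    """Stand-in for the unified model: it only carries named fields."""
    fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class Criteria:
    """Stand-in criteria that assigns a role to each instance field."""
    name: str
    to_evaluate_field: str  # formerly prediction_field
    context_fields: List[str] = field(default_factory=list)  # hypothetical attribute


def migrate_direct_instance(
    context: Dict[str, str], response: str, to_evaluate_field: str = "response"
) -> Instance:
    """Hypothetical helper: fold old-style context/response data into `fields`."""
    return Instance(fields={**context, to_evaluate_field: response})


def split_roles(criteria: Criteria, instance: Instance) -> Tuple[str, Dict[str, str]]:
    """Return the text to evaluate and the context, as decided by the criteria."""
    text = instance.fields[criteria.to_evaluate_field]
    context = {name: instance.fields[name] for name in criteria.context_fields}
    return text, context
```

With this layout the instance stays a flat bag of fields, and swapping which field is judged only requires a different criteria object.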

v0.3.2

15 Oct 02:46

What's Changed

Full Changelog: v0.3.1...v0.3.2

v0.3.1

14 Oct 17:39

What's Changed

Full Changelog: v0.3.0...v0.3.1

v0.3.0

09 Oct 18:11

What's Changed

Important:

  • The in-house DirectJudge prompt was changed, so you may see slightly different (and better) results.
  • Some types changed to accommodate a pairwise tie as a possible option. In addition, the pairwise comparison result type was updated to accommodate both global results (selected option and explanation) and detailed results (all-vs-all strategy).
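
One way to picture a result type that carries both a global outcome and all-vs-all details, with a tie allowed, is sketched below. This is an assumption-laden illustration, not EvalAssist's actual types: `PairwiseResult`, `per_comparison`, `TIE` and the win-count aggregation in `aggregate` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

TIE = "tie"  # hypothetical sentinel for "no single winner"
Outcome = Union[int, str]  # index of the winning option, or TIE


@dataclass
class PairwiseResult:
    selected_option: Outcome  # global result
    explanation: str  # global result
    # Detailed results: outcome of each (i, j) comparison in the all-vs-all strategy.
    per_comparison: Dict[Tuple[int, int], Outcome] = field(default_factory=dict)


def aggregate(n_options: int, outcomes: Dict[Tuple[int, int], Outcome]) -> PairwiseResult:
    """Hypothetical aggregation: the option with the most wins is selected, else a tie."""
    wins = [0] * n_options
    for winner in outcomes.values():
        if winner != TIE:
            wins[winner] += 1
    best = max(wins)
    leaders = [i for i, count in enumerate(wins) if count == best]
    selected: Outcome = leaders[0] if len(leaders) == 1 else TIE
    return PairwiseResult(selected, f"win counts per option: {wins}", dict(outcomes))
```

Keeping the per-pair outcomes next to the global decision lets a caller inspect why an option was selected, or why the comparison ended in a tie.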

Full Changelog: v0.2.4...v0.3.0

v0.2.4

30 Sep 23:08

What's Changed

Full Changelog: v0.2.3...v0.2.4

v0.2.3

22 Sep 15:10

What's Changed

Full Changelog: v0.2.2...v0.2.3

v0.2.2

15 Sep 20:02

What's Changed

Full Changelog: v0.2.1...v0.2.2

v0.2.1

12 Sep 14:21

What's Changed

Full Changelog: v0.2.0...v0.2.1