Releases · IBM/eval-assist
v1.0.2
What's Changed
- Support for in context learning examples in the frontend
- Downloading the test case as a notebook now generates evalassist code, not unitxt
- Improved fix instance feature: new model with text difference visualization
Full Changelog: v1.0.0...v1.0.2
v1.0.1
What's Changed
- Add system prompt to in-house judges by @martinscooper in #133
- Update documentation by @mclanza in #132
- fix: incorrect model used in borderline generation by @martinscooper in #134
More changes:
- Correct self_consistency attribute type
- Add idx to parser failures logging
- JSON parser: sanitize output only if parsing fails
- Add json object as the response format for litellm
- Convert persona prompt into message format
- Add more comments to sanitize_and_parse_json
- use logger instead of root_pkg_logger
- Format frontend code
- Add more logs to the parser
- Improve json sanitizer
- First benchmark updates after replacing langchain
- Update tests after sanitizer changes
Full Changelog: v0.3.2...v1.0.1
v1.0.0
⚠️ Breaking changes were introduced in this version.
Changes:
- Criteria's `prediction_field` was renamed to `to_evaluate_field`.
- The `DirectInstance` and `PairwiseInstance` models were removed and unified under the `Instance` model. `Instance` now just holds a `fields` attribute; the `context`, `response` and `responses` fields were removed.
- The logic that determines which text is evaluated and which fields act as context was re-designed: the criteria now defines the role of each of the instance fields (see the hypothetical sketch after this list).
- Langchain usage was heavily reduced and replaced by custom logic; the dependency will be removed soon.
- EvalAssist's in-house judge prompts were changed: they now use system prompts and the message format.
- Langchain's output fixer was replaced with custom logic.
- Synthetic instance generation was improved.
- More tests were added.
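
For orientation, here is a minimal, hypothetical sketch of how the unified `Instance` model and criteria-driven field roles might look in user code. The import path and constructor arguments are assumptions, not the library's confirmed API; only the names `Instance`, `fields`, and `to_evaluate_field` come from the notes above.

```python
# Hypothetical illustration of the unified Instance model described above.
# Import path, constructors, and keyword arguments are assumptions; consult
# the evalassist documentation for the actual API.
from evalassist import Criteria, Instance  # assumed import path

# An Instance now just carries a dict of named fields.
instance = Instance(
    fields={
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe.",
        "answer": "Paris is the capital of France.",
    }
)

# The criteria decides which field is evaluated (formerly `prediction_field`,
# now `to_evaluate_field`) and how the remaining fields are used as context.
criteria = Criteria(
    name="answer_correctness",
    description="Is the answer factually correct given the context?",
    to_evaluate_field="answer",  # renamed from prediction_field
)
```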
v0.3.2
v0.3.1
v0.3.0
What's Changed
- Pairwise with tie by @martinscooper in #129
Important:
- The in-house DirectJudge's prompt was changed, so you may see slightly different (and better) results.
- Some types were changed to accommodate a pairwise tie as a possible option. Moreover, the pairwise comparison result type was updated to accommodate both global results (selected option and explanation) and detailed results (all-vs-all strategy); a hypothetical sketch follows below.
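
As a rough, hypothetical illustration of the shape described above (type and field names are illustrative only, not the library's actual types):

```python
# Hypothetical sketch of a pairwise comparison result that accommodates ties.
# Names and structure are illustrative only, not the library's actual types.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PairwiseComparison:
    """One head-to-head comparison in the all-vs-all strategy."""
    option_a: str
    option_b: str
    winner: Optional[str]  # None represents a tie
    explanation: str


@dataclass
class PairwiseResult:
    """Global result plus the detailed all-vs-all comparisons."""
    selected_option: Optional[str]  # None if the overall outcome is a tie
    explanation: str
    comparisons: list[PairwiseComparison] = field(default_factory=list)
```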
Full Changelog: v0.2.4...v0.3.0
v0.2.4
v0.2.3
What's Changed
- Add support for in-context examples by @martinscooper in #125
- Minor fixes by @martinscooper in #127
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's Changed
- Fix no event loop issue by @martinscooper in #122
- Improvements by @martinscooper in #123
- Add more judge test using patched inference engine by @martinscooper in #124
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Add job that runs run_judge* files by @martinscooper in #118
- Rate limit in judge examples by @martinscooper in #119
- main judge: Use coroutines to aparse responses by @martinscooper in #120
- Add evaluate_with_custom_prompt method by @martinscooper in #121
Full Changelog: v0.2.0...v0.2.1