
Conversation

@danielezhu
Contributor

Description of changes:
This PR updates GetModelResponse so that it supports the semantic robustness use case where a model is invoked on the same input multiple times.

Currently, the instance attribute input_to_output_keys maps a model input key to a list of model response keys, where a response takes the form (model_output, log_probability) and both model_output and log_probability are optional.

This PR changes input_to_output_keys such that a model input key is mapped to a list of tuples, where each tuple is of the form (model_output, log_probability).
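For illustration, here is a rough sketch of the before/after shape of input_to_output_keys; the key names below are hypothetical placeholders, not taken from the codebase:

```python
# Hypothetical key names, for illustration only.

# Before this PR: a model input key maps to a flat list of response keys.
input_to_output_keys_before = {
    "prompt": ["model_output", "log_probability"],
}

# After this PR: a model input key maps to a list of
# (model_output, log_probability) tuples, one per invocation, so the same
# input can be invoked multiple times for semantic robustness.
input_to_output_keys_after = {
    "prompt": [
        ("model_output_0", "log_probability_0"),
        ("model_output_1", "log_probability_1"),
        ("model_output_2", "log_probability_2"),
    ],
}
```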

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


@lucfra lucfra left a comment


Since we're making changes to this part of the code, I would strongly suggest decoupling generation and input probability computation (i.e., making them two different functions).

This slipped into the first version of the library, but I see no reason to compute them together in the same function. No eval algo except stereotyping needs the probability, and stereotyping only needs the probability, not the output. So in practically all evals we're wasting a lot of compute.

Let me know if I'm missing something :)

@lucfra

lucfra commented Mar 20, 2024

I see that we would actually need to change another part of the code, but I still think it's worthwhile to at least have two different transforms if we don't want to change the predict method now.
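A rough sketch of what two separate transforms could look like without touching the predict method; the class names and the record-dict convention here are illustrative assumptions, not the actual fmeval transform API:

```python
from typing import Any, Dict

# Illustrative sketch only: these class names and the record-dict convention
# are assumptions, not the actual fmeval transform API.

class GetModelOutputs:
    """Transform that keeps only the generated text from predict()."""

    def __init__(self, model_runner, input_key: str, output_key: str):
        self.model_runner = model_runner
        self.input_key = input_key
        self.output_key = output_key

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        output, _ = self.model_runner.predict(record[self.input_key])
        record[self.output_key] = output
        return record


class GetLogProbabilities:
    """Transform that keeps only the input log probability from predict()."""

    def __init__(self, model_runner, input_key: str, log_prob_key: str):
        self.model_runner = model_runner
        self.input_key = input_key
        self.log_prob_key = log_prob_key

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        _, log_prob = self.model_runner.predict(record[self.input_key])
        record[self.log_prob_key] = log_prob
        return record
```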

@danielezhu
Contributor Author

danielezhu commented Mar 20, 2024

I see your point about having a separate transform for getting log probabilities, but note that GetModelResponse doesn't do any extra computation to get the log probabilities. The model outputs and log probabilities are both returned when we invoke the model runner's predict method, and GetModelResponse simply does a tiny bit of processing to parse the response payload. The time taken to execute the parsing code is trivial compared with the latency of the predict method itself.
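As a simplified illustration of that point (not the actual GetModelResponse code): the expensive step is the single predict() call, which already returns both values, and the "parsing" is essentially just unpacking.

```python
from typing import Optional, Tuple

def invoke_and_parse(model_runner, prompt: str) -> Tuple[Optional[str], Optional[float]]:
    # The predict() call dominates the latency and returns both values at once.
    model_output, log_probability = model_runner.predict(prompt)
    # Everything after the invocation is trivial bookkeeping.
    return model_output, log_probability
```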

@danielezhu danielezhu merged commit 520d510 into aws:main Mar 20, 2024
@danielezhu danielezhu deleted the multiple_invocations_same_input branch March 20, 2024 18:05
@lucfra

lucfra commented Mar 21, 2024

The implementations of input log probability and generation are model/framework dependent. For example, see https://github.com/aws/fmeval/blob/main/examples/custom_model_runner_hf.ipynb for the HF model runner: there, the model is called twice. So are you referring to Jumpstart models? Also, many models, like those in Bedrock, don't return the log prob, so I think it would be much cleaner to raise a NotImplementedError (or some similar error) rather than returning a tuple containing a None.
Also note that for stereotyping, keeping the two together may mean generating output that is never used (and generation is computationally expensive).

Finally, it's pretty standard for other eval frameworks to separate the two. See the LM base class here: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/api/model.py#L275. Our log prob is called likelihood_rolling there: https://github.com/EleutherAI/lm-evaluation-harness/blob/c7b03ad404784c8fb112eabf737b27829cbf0db8/lm_eval/api/model.py#L57. We don't have an equivalent of their likelihood function yet, and I think it would be very nice to add in the future to unlock some evals. Overall, carrying around tuples rather than having separate transforms is awkward and unclear, and not well justified from a usage perspective.
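For reference, a minimal sketch of a separated interface in the style being described, with method names loosely mirroring the lm-evaluation-harness LM base class (this is illustrative, not fmeval code):

```python
from abc import ABC, abstractmethod
from typing import List

class LanguageModel(ABC):
    """Illustrative base class with generation and likelihood kept separate."""

    @abstractmethod
    def generate(self, prompts: List[str]) -> List[str]:
        """Return generated continuations; never computes log probabilities."""

    @abstractmethod
    def loglikelihood(self, contexts: List[str], continuations: List[str]) -> List[float]:
        """Log probability of each continuation given its context."""

    @abstractmethod
    def loglikelihood_rolling(self, inputs: List[str]) -> List[float]:
        """Log probability of each full input string (the 'input log prob' case).

        A backend that cannot provide log probabilities (e.g. some hosted APIs)
        would raise NotImplementedError here instead of returning None.
        """
```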

I'm fine with changing this later on, but let's make sure we actually do it at some point.
