
Conversation

@danielezhu
Contributor

Description of changes:
This PR updates GetModelResponse so that it supports the semantic robustness use case where a model is invoked on the same input multiple times.

Currently, the instance attribute input_to_output_keys maps a model input key to a list of model response keys, where a response takes the form (model_output, log_probability) and both model_output and log_probability are optional.

This PR changes input_to_output_keys such that a model input key is mapped to a list of tuples, where each tuple is of the form (model_output, log_probability).
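For illustration, here is a rough sketch of the before/after shape of input_to_output_keys; the key names below are hypothetical placeholders, not taken from the codebase:

```python
# Hypothetical key names, for illustration only.

# Before this PR: a model input key maps to a flat list of response keys.
input_to_output_keys_before = {
    "prompt": ["model_output", "log_probability"],
}

# After this PR: a model input key maps to a list of
# (model_output, log_probability) tuples, one per invocation, so the same
# input can be invoked multiple times for semantic robustness.
input_to_output_keys_after = {
    "prompt": [
        ("model_output_0", "log_probability_0"),
        ("model_output_1", "log_probability_1"),
        ("model_output_2", "log_probability_2"),
    ],
}
```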

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


@lucfra lucfra left a comment


Since we're making changes to this part of the code, I would strongly suggest decoupling generation and input probability computation (i.e., making them two different functions).

This slipped into the first version of the library, but I see no reason to compute them together in the same function. No eval algo except stereotyping needs the probability, and stereotyping only needs the probability, not the output. So in practically all evals we're wasting a lot of compute.

Let me know if I'm missing something :)

@lucfra

lucfra commented Mar 20, 2024

I see that we would actually need to change another part of the code, but I still think it's worthwhile to at least have two different transforms if we don't want to change the predict method now.
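A rough sketch of what two separate transforms could look like without touching the predict method; the class names and the record-dict convention here are illustrative assumptions, not the actual fmeval transform API:

```python
from typing import Any, Dict

# Illustrative sketch only: these class names and the record-dict convention
# are assumptions, not the actual fmeval transform API.

class GetModelOutputs:
    """Transform that keeps only the generated text from predict()."""

    def __init__(self, model_runner, input_key: str, output_key: str):
        self.model_runner = model_runner
        self.input_key = input_key
        self.output_key = output_key

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        output, _ = self.model_runner.predict(record[self.input_key])
        record[self.output_key] = output
        return record


class GetLogProbabilities:
    """Transform that keeps only the input log probability from predict()."""

    def __init__(self, model_runner, input_key: str, log_prob_key: str):
        self.model_runner = model_runner
        self.input_key = input_key
        self.log_prob_key = log_prob_key

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        _, log_prob = self.model_runner.predict(record[self.input_key])
        record[self.log_prob_key] = log_prob
        return record
```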

@danielezhu
Contributor Author

danielezhu commented Mar 20, 2024

I see your point about having a separate transform for getting log probabilities, but note that GetModelResponse doesn't do any extra computation to get the log probabilities. The model outputs and log probabilities are both returned when we invoke the model runner's predict method, and GetModelResponse simply does a tiny bit of processing to parse the response payload. The time taken to execute the parsing code is trivial compared with the latency of the predict method itself.
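As a simplified illustration of that point (not the actual GetModelResponse code): the expensive step is the single predict() call, which already returns both values, and the "parsing" is essentially just unpacking.

```python
from typing import Optional, Tuple

def invoke_and_parse(model_runner, prompt: str) -> Tuple[Optional[str], Optional[float]]:
    # The predict() call dominates the latency and returns both values at once.
    model_output, log_probability = model_runner.predict(prompt)
    # Everything after the invocation is trivial bookkeeping.
    return model_output, log_probability
```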

@danielezhu danielezhu merged commit 520d510 into aws:main Mar 20, 2024
@danielezhu danielezhu deleted the multiple_invocations_same_input branch March 20, 2024 18:05
@lucfra

lucfra commented Mar 21, 2024

The implementations of input log probability and generation are model/framework dependent. For example, see https://github.com/aws/fmeval/blob/main/examples/custom_model_runner_hf.ipynb for the HF model runner: there, the model is called twice. So are you referring to Jumpstart models? Also, many models, like those in Bedrock, don't return the log prob, so I think it would be much cleaner to raise a NotImplementedError (or some similar error) rather than returning a tuple containing a None.
Also note that for stereotyping, keeping the two together may mean generating output that is never used (and generation is computationally expensive).

Finally, it's pretty standard for other eval frameworks to separate the two. See the LM base class here: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/api/model.py#L275. Our log prob is called likelihood_rolling there: https://github.com/EleutherAI/lm-evaluation-harness/blob/c7b03ad404784c8fb112eabf737b27829cbf0db8/lm_eval/api/model.py#L57. We don't have an equivalent of their likelihood function yet, and I think it would be very nice to add in the future to unlock some evals. Overall, carrying around tuples rather than having separate transforms is awkward and unclear, and not well justified from a usage perspective.
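For reference, a minimal sketch of a separated interface in the style being described, with method names loosely mirroring the lm-evaluation-harness LM base class (this is illustrative, not fmeval code):

```python
from abc import ABC, abstractmethod
from typing import List

class LanguageModel(ABC):
    """Illustrative base class with generation and likelihood kept separate."""

    @abstractmethod
    def generate(self, prompts: List[str]) -> List[str]:
        """Return generated continuations; never computes log probabilities."""

    @abstractmethod
    def loglikelihood(self, contexts: List[str], continuations: List[str]) -> List[float]:
        """Log probability of each continuation given its context."""

    @abstractmethod
    def loglikelihood_rolling(self, inputs: List[str]) -> List[float]:
        """Log probability of each full input string (the 'input log prob' case).

        A backend that cannot provide log probabilities (e.g. some hosted APIs)
        would raise NotImplementedError here instead of returning None.
        """
```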

I'm fine with changing this later on, but let's make sure we actually do it at some point.
