[FEATURE REQUEST] Bad words list #43

I don't have access to the GPT-3 API yet (a guy can dream, eh?), but I have been reading through the docs, and the completion module seems perfect for my use case, _except_ that it lacks a "bad words list" feature.

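This feature would prevent certain words from being generated in the completion output. I am aware of the `logit_bias` argument, but it only stops individual tokens from being generated, and many words span several tokens.
My idea would take an arbitrary string (or a list of token IDs) as input and then prevent the model from completing that string, given the tokens generated before it: if the output so far ends with every token of the banned sequence except the last, that last token is disallowed at the next step.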
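To make the difference from `logit_bias` concrete, here is a minimal Python sketch of how such a filter could work at each decoding step. It is only an illustration under stated assumptions, not the OpenAI API: the function names, the token IDs and the five-token toy "model" are all hypothetical. A banned sequence blocks its final token only when the preceding tokens already match, so the pieces of a multi-token word stay usable in other contexts, whereas a `logit_bias` on a single token blocks it everywhere.

```python
from typing import Callable, List, Sequence, Set

NEG_INF = float("-inf")


def banned_next_tokens(context: Sequence[int], bad_sequences: List[List[int]]) -> Set[int]:
    """Token IDs that must not be generated next (hypothetical helper).

    A bad sequence [t1, ..., tn] bans tn only when the context already ends
    with [t1, ..., t(n-1)]. A one-token sequence [t1] is banned at every step,
    which is roughly what a strongly negative logit_bias on t1 does.
    """
    banned: Set[int] = set()
    for seq in bad_sequences:
        if not seq:
            continue  # ignore empty entries
        prefix, last = list(seq[:-1]), seq[-1]
        if not prefix or list(context[-len(prefix):]) == prefix:
            banned.add(last)
    return banned


def mask_logits(logits: List[float], context: Sequence[int],
                bad_sequences: List[List[int]]) -> List[float]:
    """Set banned tokens to -inf so they get zero probability after softmax."""
    masked = list(logits)
    for tok in banned_next_tokens(context, bad_sequences):
        masked[tok] = NEG_INF
    return masked


def greedy_decode(next_logits: Callable[[List[int]], List[float]], prompt: List[int],
                  bad_sequences: List[List[int]], steps: int) -> List[int]:
    """Greedy decoding with the bad-sequence mask applied at every step."""
    context = list(prompt)
    for _ in range(steps):
        logits = mask_logits(next_logits(context), context, bad_sequences)
        context.append(max(range(len(logits)), key=logits.__getitem__))
    return context


def toy_logits(context: List[int]) -> List[float]:
    """Hypothetical 5-token 'model' (needs a non-empty context).

    It prefers token (last + 1) % 5, with token 0 as the runner-up.
    """
    logits = [0.0] * 5
    logits[0] = 1.0
    logits[(context[-1] + 1) % 5] = 2.0
    return logits


if __name__ == "__main__":
    # Pretend the "bad word" tokenizes to the two IDs [1, 2].
    print(greedy_decode(toy_logits, [0], [], 4))        # → [0, 1, 2, 3, 4]
    print(greedy_decode(toy_logits, [0], [[1, 2]], 4))  # → [0, 1, 0, 1, 0]
    print(banned_next_tokens([3], [[1, 2]]))            # → set(): token 2 is fine after 3
```

One practical caveat: BPE tokenizers typically encode the same word differently depending on leading whitespace or capitalisation, so each variant of a word would likely need its own entry in the list.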

Many moons ago, I successfully requested this feature for the Hugging Face `.generate` API. Please see that feature request for a fuller run-down of how it could be implemented (link: https://github.com/huggingface/transformers/issues/3061).

This would be useful for customers because it would give them peace of mind that the models they serve will never output the specific words they consider unsavoury. One alternative would be to train the model not to output generally bad language (e.g. overly aggressive or xenophobic language) through careful curation of the training data, but since everyone's definition of bad language is different, it would be nice to be able to customise the output for each use case.

Thanks!
