
Introduce modular files for speech models #35902


Merged
merged 26 commits into huggingface:main from modular-speech-ssl-models on Apr 4, 2025

Conversation

nikosanto13
Contributor

@nikosanto13 nikosanto13 commented Jan 27, 2025

What does this PR do?

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @Cyrilvallez

Additional details

  • Added modular files for models that have heavy duplication with classes from modeling_wav2vec2.py: Hubert, WavLM, Data2VecAudio, Wav2Vec2Conformer, Wav2Vec2Bert, UniSpeech, UniSpeechSat (see the sketch after this list for the general pattern)
  • Made some modifications to the modular converter script to address issues that came up while writing the modular files above (see inline comments for justification)
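
As a rough illustration of the modular pattern (a hypothetical sketch, not the exact contents of any of the new files), a modular file mostly declares thin subclasses of the wav2vec2 classes and lets the converter generate the full modeling file:

# Hypothetical excerpt of a modular file, e.g. modular_hubert.py (illustrative only)
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    Wav2Vec2Encoder,
    Wav2Vec2FeatureEncoder,
    Wav2Vec2Model,
)


class HubertFeatureEncoder(Wav2Vec2FeatureEncoder):
    pass


class HubertEncoder(Wav2Vec2Encoder):
    pass


class HubertModel(Wav2Vec2Model):
    pass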

"""
for assignment, node in assignments.items():
should_keep = any(re.search(pattern, assignment) for pattern in ASSIGNMENTS_REGEX_TO_KEEP)

# If it's a DOCSTRING var and is assigned to None, the parent's docstring is kept.
Contributor Author

I had to add this because, for many of the models I've worked on, the docstring was kinda custom (e.g. it contained a link to the original paper). So instead of just copying the docstring from the modular file, I figured it would be best to adopt this hybrid approach.

If you agree with the change, I should also update the modular docs: https://github.com/huggingface/transformers/blob/main/docs/source/en/modular_transformers.md

Member

Hmm, I don't really get this one. It is already the current behavior for the docstring to use the parent's when it's None.

Contributor Author

Well, I wanted to say "instead of copying the docstring from the parent ..." (my comment in the code is also a bit obscure).
Essentially, there are now two possibilities:

  • either set MYMODEL_INPUT_DOCSTRING = None, in which case the assignment is copied from the parent (as is already the case),
  • or set it to something else (a new docstring), in which case the assignment is copied from the modular file.

So it is more flexible than the existing approach; see the sketch below.
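
To make the two options concrete, here is a minimal sketch (using a hypothetical MYMODEL_INPUT_DOCSTRING variable) of what a modular file could contain:

# Option 1: keep the parent's docstring by assigning None; the converter copies the
# assignment from the parent modeling file.
MYMODEL_INPUT_DOCSTRING = None

# Option 2: provide a new, model-specific docstring; the converter copies this
# assignment from the modular file instead.
MYMODEL_INPUT_DOCSTRING = r"""
    Args:
        input_values (`torch.FloatTensor` of shape `(batch_size, sequence_length)`):
            Raw speech waveform. See the original paper for details.
"""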

new_node = node.with_changes(body=node.body.with_changes(body=new_statements))
imports_to_keep.append(new_node)
existing_protected_statements.update({str(stmt) for stmt in new_statements})
import_statements = [
Contributor Author

I added this because the code before had problematic behaviour for "safe" imports that had multiple other statements inside them, e.g. lines 381–395 of modeling_wav2vec2.py:

if is_deepspeed_zero3_enabled():
    import deepspeed

    with deepspeed.zero.GatheredParameters(self.conv.weight, modifier_rank=0):
 ...

The whole block after the import statement would be displaced to the top of the new modeling script (among the import statements).

Member

Yes, it's one of the current limitations. However, removing everything else does not seem like a good solution either; I could not wrap my mind around a nice rule for this. For now, maybe the best option is to patch the original modeling file to dissociate the safe import from the other logic? Would that require a lot of change?
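
For illustration, dissociating the guarded import from the logic in the original modeling file could look roughly like this (a sketch based on the wav2vec2 snippet above, not the actual patch):

# Before: the protected import and the deepspeed-specific logic share one block,
# so the converter hoists the whole block to the top of the generated file.
if is_deepspeed_zero3_enabled():
    import deepspeed

    with deepspeed.zero.GatheredParameters(self.conv.weight, modifier_rank=0):
        ...

# After: the "safe" import stands alone and can be handled cleanly,
# while the model logic stays where it is.
if is_deepspeed_zero3_enabled():
    import deepspeed

if is_deepspeed_zero3_enabled():
    with deepspeed.zero.GatheredParameters(self.conv.weight, modifier_rank=0):
        ...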

Contributor Author

yeah let's do it like this, thanks

Contributor Author

@nikosanto13 nikosanto13 Jan 30, 2025

However, in the example above it would be better to move:

if is_deepspeed_zero3_enabled():
    import deepspeed

outside of the constructor, because in the current state the newly created module (prior to running ruff inside the modular converter) would contain two such statements, and the first one would become:

if is_deepspeed_zero3_enabled():
    pass

after the run_ruff call.

But if we move it to the top of the file, deepspeed would no longer be lazily imported. I think this is not a problem, right?

@Rocketknight1
Member

cc @ArthurZucker @qubvel

Member

@Cyrilvallez Cyrilvallez left a comment

Hey! Thanks for the contribution! I just looked at the modular part, let me know if something is unclear!! 🤗

Comment on lines 58 to 70
# Exclude names to prevent edge cases where we want to keep a name that may
# exist in the mapping, e.g. `Wav2Vec2BaseModelOutput` where `Wav2Vec2` is
# a "base" model identifier but we want the type to pass as is in the produced modeling file
EXCLUDE_NAMES = ["Wav2Vec2BaseModelOutput"]


def preserve_case_replace(text, patterns: dict, default_name: str):
    # Create a regex pattern to match all variations
    regex_pattern = "|".join(re.escape(key) for key in patterns.keys())
    compiled_regex = re.compile(f"(?<![a-z0-9])({regex_pattern})(.|$)", re.IGNORECASE | re.DOTALL)

    # Create exclude pattern
    exclude_pattern = "|".join(re.escape(key) for key in EXCLUDE_NAMES)
    compiled_regex = re.compile(f"(?<![a-z0-9])(?!{exclude_pattern})({regex_pattern})(.|$)", re.IGNORECASE | re.DOTALL)
Member

Definitely not a fan of having exclusions here. And the regex is already way too complicated 🥲 Moreover, I don't think we actually want an output type from another model, do we?

Contributor Author

@nikosanto13 nikosanto13 Jan 29, 2025

Yeah you're right, it felt bad while doing it 😂 Unfortunately we need output types from other models in the files I introduced (almost all of them need the Wav2Vec2BaseModelOutput).

But it could be done more cleanly with "type aliasing", e.g. for the WavLM model that needs Wav2Vec2BaseModelOutput, we could add
WavLMBaseOutput = Wav2Vec2BaseModelOutput
inside the modular file.

What do you think?
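
For example, a minimal sketch of that aliasing inside a hypothetical modular_wavlm.py (illustrative; it assumes Wav2Vec2BaseModelOutput is importable from transformers.modeling_outputs):

from transformers.modeling_outputs import Wav2Vec2BaseModelOutput

# Alias the shared output type under a WavLM-specific name, so the converter
# does not need an exclusion list to leave the wav2vec2 type untouched.
WavLMBaseOutput = Wav2Vec2BaseModelOutput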

"""
for assignment, node in assignments.items():
should_keep = any(re.search(pattern, assignment) for pattern in ASSIGNMENTS_REGEX_TO_KEEP)

# If it's a DOCSTRING var and is assigned to None, the parent's docstring is kept.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, I don't really get here. This is already the actual behavior to have the docstring use the parent if it's None

Comment on lines +1019 to +1045

        # Keep return annotation in `modular_xxx.py` if any, else original return annotation
        new_return_annotation = updated_methods[name].returns if updated_methods[name].returns else func.returns

        if not re.match(
            r"\ndef .*\(.*\):\n    raise.*Error\(.*",
            mapper.python_module.code_for_node(updated_methods[name]),
        ):
            func = func.with_changes(body=updated_methods[name].body, params=new_params, decorators=new_decorators)
            func = func.with_changes(
                body=updated_methods[name].body,
                params=new_params,
                decorators=new_decorators,
                returns=new_return_annotation,
            )
Member

Love this one! Nice!

@nikosanto13 nikosanto13 force-pushed the modular-speech-ssl-models branch from edba3d2 to 5c47d86 on March 3, 2025 20:09
@nikosanto13
Contributor Author

@Cyrilvallez hey, could you take a look again?

@Cyrilvallez
Member

Cyrilvallez commented Mar 14, 2025

Hey @nikosanto13! Super sorry about the delay, last week we were on an offsite with the whole transformers team, and this week was a bit crazy because of some big refactoring of core parts and releases! 🙂 This PR is still very much welcome, and I'll take a deeper look at all the models ASAP! Please bear with me in the meantime 🙏 Be assured that this is definitely on my to-do list! 🤗 You can check out #36688 as well, which proposes similar changes regarding the assignments 😉

@Cyrilvallez
Member

Actually, tagging @eustlb as this is mostly audio models, maybe you have some time to help review the modular parts?

In the meantime, @nikosanto13, I believe some changes can be reverted since #36279; the imports inside functions should no longer need to be moved 😉

Collaborator

@ArthurZucker ArthurZucker left a comment

HUGE! Kudos @nikosanto13, that's a big, big piece of work!

  • most of the modular files are missing a licence! (we can probably add it automatically!)
if is_peft_available():
    from peft.tuners.lora import LoraLayer

this seems to be imported in quite a few places where it was not needed before


  • let's not add new features at the same time (it's already super huge as it is!)

Otherwise LGTM! Let's run all tests and GO! 🚀

@@ -1188,14 +1163,21 @@ def forward(
        if not return_dict:
            return (hidden_states, extract_features) + encoder_outputs[1:]

        return Wav2Vec2BaseModelOutput(
Collaborator

lol yeah good catch here!

Comment on lines 965 to 1132
if is_deepspeed_zero3_enabled():
    import deepspeed

Collaborator

in general this should stay as an import here rather than at the top

Comment on lines +103 to +121
class UniSpeechSatPositionalConvEmbedding(Wav2Vec2PositionalConvEmbedding):
    pass


class UniSpeechSatFeatureEncoder(Wav2Vec2FeatureEncoder):
    pass


class UniSpeechSatFeatureProjection(Wav2Vec2FeatureProjection):
    pass


class UniSpeechSatEncoder(Wav2Vec2Encoder):
    pass


class UniSpeechSatEncoderStableLayerNorm(Wav2Vec2EncoderStableLayerNorm):
    pass

Collaborator

unused ones should not be needed!

Contributor Author

While the modeling file is the same if I remove them, removing them would lead to several undefined-name (F821) violations in the modular file, since the defined classes are needed in later parts of the modular file.
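
For example (a hypothetical excerpt, not the actual file contents), a later class in modular_unispeech_sat.py may reference those names directly, which is why the pass-through definitions are kept even though the generated modeling file would be identical without them:

class UniSpeechSatModel(Wav2Vec2Model):
    def __init__(self, config):
        super().__init__(config)
        # Without the pass-through subclasses above, these names would trigger
        # undefined-name (F821) errors in the modular file, even though the
        # converter would still generate the same modeling code.
        self.feature_extractor = UniSpeechSatFeatureEncoder(config)
        self.feature_projection = UniSpeechSatFeatureProjection(config)
        self.encoder = UniSpeechSatEncoder(config)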

Collaborator

okay no worries I did not check if they are used / unused!

Comment on lines 504 to 513
# layer normalization (has no effect when `config.do_stable_layer_norm == False`)
# extract_features = self.layer_norm_for_extract(extract_features)
# quantized_features, codevector_perplexity = self.quantizer(extract_features)
#
# project quantized features twice
# quantized_features = self.project_q(quantized_features)
# quantized_features = self.project_hid(quantized_features)
#
# loss = None
# logits = quantized_features
Collaborator

to clean up!

Comment on lines 72 to 74
if is_peft_available():
    from peft.tuners.lora import LoraLayer

Collaborator

should not be here!

@@ -855,11 +860,18 @@ def _merge_assignments(self, assignments: dict[str, cst.CSTNode], object_mapping

Merging rule: if any assignment with the same name was redefined in the modular, we use it and its dependencies ONLY if it matches
a pattern in `ASSIGNMENTS_REGEX_TO_KEEP`. Otherwise, we use the original value and dependencies. This rule was chosen to avoid having to rewrite the
big docstrings.
big docstrings. If the assignment is a DOCSTRING var and is assigned to None, the parent's docstring is kept.
Collaborator

not super intuitive, but not a real problem!

@nikosanto13
Contributor Author

@Cyrilvallez hey, no worries about the delay! Thanks for the pointers to the latest changes on the modular converter.

The #36279 fix was not enough for lazy imports inside class definitions, as it only works for functions. Inspired by it, I added a similar change to handle class definitions (see inline comment).

@@ -677,14 +678,18 @@ def leave_FunctionDef(self, node):

    def visit_If(self, node):
        # If we are inside a function, do not add the import to the list of imports
        if self.current_function is None:
        if self.current_function is None and self.current_class is None:
Contributor Author

cc @Cyrilvallez fix similar to #36279

Member

Indeed, when adding it for functions I thought we never had imports directly inside classes, so I did not add it... Turns out I was wrong... 🥲🥲
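
For context, the case that the new current_class check covers looks roughly like this (an illustrative example, not a specific file in the repository): a guarded import placed directly in a class body should stay where it is instead of being collected as a top-level protected import.

from transformers import PreTrainedModel
from transformers.utils import is_peft_available


class SomeSpeechModel(PreTrainedModel):
    # A guarded import directly inside the class body (not inside a method).
    # Previously visit_If only checked current_function, so a block like this
    # was treated as a module-level protected import and moved to the top.
    if is_peft_available():
        from peft.tuners.lora import LoraLayer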

@nikosanto13
Contributor Author

@ArthurZucker ty for your review 🤗 The deepspeed and peft lazy import statements should be OK now; I added a fix to the modular converter that makes my previous changes redundant.

let me know if there is anything else

Collaborator

@ArthurZucker ArthurZucker left a comment

All good for me! @eustlb can you give a final look and merge? 🤗

Contributor

@eustlb eustlb left a comment

Thanks a lot for the good work @nikosanto13 !! 🤗
LGTM, I'll just run slow tests for the affected models.

BTW subsequent work could focus on propagating to other speech models that rely partially on wav2vec modelling: seamless m4t, speecht5, sew, sew_d

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Collaborator

run-slow: wavlm

@eustlb
Contributor

eustlb commented Mar 21, 2025

run-slow: wav2vec2_bert, wav2vec2_conformer, unispeech, unispeech_sat, hubert, data2vec

This comment contains run-slow, running the specified jobs:

models: ['models/data2vec', 'models/hubert', 'models/unispeech', 'models/unispeech_sat', 'models/wav2vec2_bert', 'models/wav2vec2_conformer']
quantizations: [] ...

@eustlb
Contributor

eustlb commented Mar 28, 2025

run-slow: wav2vec2_bert, wav2vec2_conformer, unispeech, unispeech_sat, hubert, data2vec

This comment contains run-slow, running the specified jobs:

models: ['models/data2vec', 'models/hubert', 'models/unispeech', 'models/unispeech_sat', 'models/wav2vec2_bert', 'models/wav2vec2_conformer']
quantizations: [] ...

@eustlb
Contributor

eustlb commented Mar 31, 2025

run-slow: wav2vec2_bert, wav2vec2_conformer, unispeech, unispeech_sat, hubert, data2vec

This comment contains run-slow, running the specified jobs:

models: ['models/data2vec', 'models/hubert', 'models/unispeech', 'models/unispeech_sat', 'models/wav2vec2_bert', 'models/wav2vec2_conformer']
quantizations: [] ...

@eustlb eustlb closed this Apr 2, 2025
@eustlb eustlb reopened this Apr 2, 2025
@github-actions github-actions bot marked this pull request as draft April 2, 2025 14:46
@eustlb eustlb marked this pull request as ready for review April 2, 2025 14:48
@huggingface huggingface deleted a comment from github-actions bot Apr 2, 2025
@eustlb eustlb merged commit f74d7da into huggingface:main Apr 4, 2025
20 checks passed
@nikosanto13
Contributor Author

@eustlb thanks for taking care of this

BTW subsequent work could focus on propagating to other speech models that rely partially on wav2vec modelling: seamless m4t, speecht5, sew, sew_d

yeah I skipped them by mistake, maybe I could open another PR now that this has been merged

@eustlb
Contributor

eustlb commented Apr 4, 2025

@nikosanto13 thanks again for the work !!
I would love to review such a PR 🤗

yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Apr 5, 2025
* WAV_2_VEC_2 to WAV2VEC2

* added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio

* remove unnessary definitions in modulars

* added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer

* docstring fix for UniSpeechForCTC

* removed unneccessary re-definition of modular classes

* reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput

* top-level import of deepspeed in seamless_m4t, speecht5

* avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations

* convert modular

* tiny modular typing fixes

* some more modular fixes

* make style

---------

Co-authored-by: eustlb <[email protected]>
Co-authored-by: Eustache Le Bihan <[email protected]>
@nikosanto13 nikosanto13 mentioned this pull request Apr 13, 2025