
Introduce modular files for speech models #35902


Merged
merged 26 commits into huggingface:main from modular-speech-ssl-models on Apr 4, 2025

Conversation

nikosanto13
Contributor

@nikosanto13 nikosanto13 commented Jan 27, 2025

What does this PR do?

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @Cyrilvallez

Additional details

  • Added modular files for models that have heavy duplication with classes from modeling_wav2vec2.py: Hubert, WavLM, Data2VecAudio, Wav2Vec2Conformer, Wav2Vec2Bert, UniSpeech, UniSpeechSat (see the sketch after this list for the general pattern)
  • Made some modifications to the modular converter script to address issues that came up while writing the modular files above (see inline comments for justification)
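
As a rough illustration of the modular pattern (a hypothetical sketch, not the exact contents of any of the new files), a modular file mostly declares thin subclasses of the wav2vec2 classes and lets the converter generate the full modeling file:

# Hypothetical excerpt of a modular file, e.g. modular_hubert.py (illustrative only)
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    Wav2Vec2Encoder,
    Wav2Vec2FeatureEncoder,
    Wav2Vec2Model,
)


class HubertFeatureEncoder(Wav2Vec2FeatureEncoder):
    pass


class HubertEncoder(Wav2Vec2Encoder):
    pass


class HubertModel(Wav2Vec2Model):
    pass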

"""
for assignment, node in assignments.items():
should_keep = any(re.search(pattern, assignment) for pattern in ASSIGNMENTS_REGEX_TO_KEEP)

# If it's a DOCSTRING var and is assigned to None, the parent's docstring is kept.
Contributor Author

I had to add this because, for many of the models I've worked on, the docstring was kinda custom (e.g. it contained a link to the original paper). So instead of just copying the docstring from the modular file, I figured it would be best to adopt this hybrid approach.

If you agree with the change, I should also update the modular docs: https://github.com/huggingface/transformers/blob/main/docs/source/en/modular_transformers.md

Member

Hmm, I don't really get this one. It is already the current behavior for the docstring to use the parent's when it's None.

Contributor Author

Well, I wanted to say "instead of copying the docstring from the parent ..." (my comment in the code is also a bit obscure).
Essentially, there are now two possibilities:

  • either set MYMODEL_INPUT_DOCSTRING = None, in which case the assignment is copied from the parent (as is already the case),
  • or set it to something else (a new docstring), in which case the assignment is copied from the modular file.

So it is more flexible than the existing approach; see the sketch below.
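
To make the two options concrete, here is a minimal sketch (using a hypothetical MYMODEL_INPUT_DOCSTRING variable) of what a modular file could contain:

# Option 1: keep the parent's docstring by assigning None; the converter copies the
# assignment from the parent modeling file.
MYMODEL_INPUT_DOCSTRING = None

# Option 2: provide a new, model-specific docstring; the converter copies this
# assignment from the modular file instead.
MYMODEL_INPUT_DOCSTRING = r"""
    Args:
        input_values (`torch.FloatTensor` of shape `(batch_size, sequence_length)`):
            Raw speech waveform. See the original paper for details.
"""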

new_node = node.with_changes(body=node.body.with_changes(body=new_statements))
imports_to_keep.append(new_node)
existing_protected_statements.update({str(stmt) for stmt in new_statements})
import_statements = [
Contributor Author

I added this because the code before had problematic behaviour for "safe" imports that had multiple other statements inside them, e.g. lines 381–395 of modeling_wav2vec2.py:

if is_deepspeed_zero3_enabled():
    import deepspeed

    with deepspeed.zero.GatheredParameters(self.conv.weight, modifier_rank=0):
 ...

The whole block after the import statement would be displaced to the top of the new modeling script (among the import statements).

Member

Yes, it's one of the current limitations. However, removing everything else does not seem like a good solution either; I could not wrap my mind around a nice rule for this. For now, maybe the best option is to patch the original modeling file to dissociate the safe import from the other logic? Would that require a lot of change?
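
For illustration, dissociating the guarded import from the logic in the original modeling file could look roughly like this (a sketch based on the wav2vec2 snippet above, not the actual patch):

# Before: the protected import and the deepspeed-specific logic share one block,
# so the converter hoists the whole block to the top of the generated file.
if is_deepspeed_zero3_enabled():
    import deepspeed

    with deepspeed.zero.GatheredParameters(self.conv.weight, modifier_rank=0):
        ...

# After: the "safe" import stands alone and can be handled cleanly,
# while the model logic stays where it is.
if is_deepspeed_zero3_enabled():
    import deepspeed

if is_deepspeed_zero3_enabled():
    with deepspeed.zero.GatheredParameters(self.conv.weight, modifier_rank=0):
        ...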

Contributor Author

yeah let's do it like this, thanks

Contributor Author

@nikosanto13 nikosanto13 Jan 30, 2025

However, in the example above it would be better to move:

if is_deepspeed_zero3_enabled():
    import deepspeed

outside of the constructor, because in the current state the newly created module (prior to running ruff inside the modular converter) would contain two such statements, and the first one would become:

if is_deepspeed_zero3_enabled():
    pass

after the run_ruff call.

But if we move it to the top of the file, deepspeed would no longer be lazily imported. I think this is not a problem, right?

@Rocketknight1
Member

cc @ArthurZucker @qubvel

Member

@Cyrilvallez Cyrilvallez left a comment

Hey! Thanks for the contribution! I just looked at the modular part, let me know if something is unclear!! 🤗

Comment on lines 58 to 70
# Exclude names to prevent edge cases where we want to keep a name that may
# exist in the mapping, e.g. `Wav2Vec2BaseModelOutput` where `Wav2Vec2` is
# a "base" model identifier but we want the type to pass as is in the produced modeling file
EXCLUDE_NAMES = ["Wav2Vec2BaseModelOutput"]


def preserve_case_replace(text, patterns: dict, default_name: str):
    # Create a regex pattern to match all variations
    regex_pattern = "|".join(re.escape(key) for key in patterns.keys())
    compiled_regex = re.compile(f"(?<![a-z0-9])({regex_pattern})(.|$)", re.IGNORECASE | re.DOTALL)

    # Create exclude pattern
    exclude_pattern = "|".join(re.escape(key) for key in EXCLUDE_NAMES)
    compiled_regex = re.compile(f"(?<![a-z0-9])(?!{exclude_pattern})({regex_pattern})(.|$)", re.IGNORECASE | re.DOTALL)
Member

Definitely not a fan of having exclusions here. And the regex is already way too complicated 🥲 Moreover, I don't think we actually want an output type from another model, do we?

Contributor Author

@nikosanto13 nikosanto13 Jan 29, 2025

Yeah you're right, it felt bad while doing it 😂 Unfortunately we need output types from other models in the files I introduced (almost all of them need the Wav2Vec2BaseModelOutput).

But it could be done more cleanly with "type aliasing", e.g. for the WavLM model that needs Wav2Vec2BaseModelOutput, we could add
WavLMBaseOutput = Wav2Vec2BaseModelOutput
inside the modular file.

What do you think?
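
For example, a minimal sketch of that aliasing inside a hypothetical modular_wavlm.py (illustrative; it assumes Wav2Vec2BaseModelOutput is importable from transformers.modeling_outputs):

from transformers.modeling_outputs import Wav2Vec2BaseModelOutput

# Alias the shared output type under a WavLM-specific name, so the converter
# does not need an exclusion list to leave the wav2vec2 type untouched.
WavLMBaseOutput = Wav2Vec2BaseModelOutput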

"""
for assignment, node in assignments.items():
should_keep = any(re.search(pattern, assignment) for pattern in ASSIGNMENTS_REGEX_TO_KEEP)

# If it's a DOCSTRING var and is assigned to None, the parent's docstring is kept.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, I don't really get here. This is already the actual behavior to have the docstring use the parent if it's None

Comment on lines +1019 to +1045

        # Keep return annotation in `modular_xxx.py` if any, else original return annotation
        new_return_annotation = updated_methods[name].returns if updated_methods[name].returns else func.returns

        if not re.match(
            r"\ndef .*\(.*\):\n    raise.*Error\(.*",
            mapper.python_module.code_for_node(updated_methods[name]),
        ):
            func = func.with_changes(body=updated_methods[name].body, params=new_params, decorators=new_decorators)
            func = func.with_changes(
                body=updated_methods[name].body,
                params=new_params,
                decorators=new_decorators,
                returns=new_return_annotation,
            )
Member

Love this one! Nice!

@nikosanto13 nikosanto13 force-pushed the modular-speech-ssl-models branch from edba3d2 to 5c47d86 on March 3, 2025 20:09
@nikosanto13
Contributor Author

@Cyrilvallez hey, could you take a look again?

@Cyrilvallez
Member

Cyrilvallez commented Mar 14, 2025

Hey @nikosanto13! Super sorry about the delay, last week we were on an offsite with the whole transformers team, and this week was a bit crazy because of some big refactoring of core parts and releases! 🙂 This PR is still very much welcome, and I'll take a deeper look at all the models ASAP! Please bear with me in the meantime 🙏 Be assured that this is definitely on my to-do list! 🤗 You can check out #36688 as well, which proposes similar changes regarding the assignments 😉

@Cyrilvallez
Member

Actually, tagging @eustlb as this is mostly audio models, maybe you have some time to help review the modular parts?

In the meantime, @nikosanto13, I believe some changes can be reverted since #36279; the imports inside functions should no longer need to be moved 😉

Collaborator

@ArthurZucker ArthurZucker left a comment

HUGE! Kudos @nikosanto13, that's a big, big piece of work!

  • most of the modular files are missing a licence! (we can probably add it automatically!)
if is_peft_available():
    from peft.tuners.lora import LoraLayer

this seems to be imported in quite a few places where it was not needed before


  • let's not add new features at the same time (it's already super huge as it is!)

Otherwise LGTM! Let's run all tests and GO! 🚀

@@ -1188,14 +1163,21 @@ def forward(
        if not return_dict:
            return (hidden_states, extract_features) + encoder_outputs[1:]

        return Wav2Vec2BaseModelOutput(
Collaborator

lol yeah good catch here!

Comment on lines 965 to 1132
if is_deepspeed_zero3_enabled():
    import deepspeed

Collaborator

in general this should stay as an import here rather than at the top

Comment on lines +103 to +121
class UniSpeechSatPositionalConvEmbedding(Wav2Vec2PositionalConvEmbedding):
    pass


class UniSpeechSatFeatureEncoder(Wav2Vec2FeatureEncoder):
    pass


class UniSpeechSatFeatureProjection(Wav2Vec2FeatureProjection):
    pass


class UniSpeechSatEncoder(Wav2Vec2Encoder):
    pass


class UniSpeechSatEncoderStableLayerNorm(Wav2Vec2EncoderStableLayerNorm):
    pass

Collaborator

unused ones should not be needed!

Contributor Author

While the modeling file is the same if I remove them, removing them would lead to several undefined-name (F821) violations in the modular file, since the defined classes are needed in later parts of the modular file.
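
For example (a hypothetical excerpt, not the actual file contents), a later class in modular_unispeech_sat.py may reference those names directly, which is why the pass-through definitions are kept even though the generated modeling file would be identical without them:

class UniSpeechSatModel(Wav2Vec2Model):
    def __init__(self, config):
        super().__init__(config)
        # Without the pass-through subclasses above, these names would trigger
        # undefined-name (F821) errors in the modular file, even though the
        # converter would still generate the same modeling code.
        self.feature_extractor = UniSpeechSatFeatureEncoder(config)
        self.feature_projection = UniSpeechSatFeatureProjection(config)
        self.encoder = UniSpeechSatEncoder(config)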

Collaborator

okay no worries I did not check if they are used / unused!

Comment on lines 504 to 513
# layer normalization (has no effect when `config.do_stable_layer_norm == False`)
# extract_features = self.layer_norm_for_extract(extract_features)
# quantized_features, codevector_perplexity = self.quantizer(extract_features)
#
# project quantized features twice
# quantized_features = self.project_q(quantized_features)
# quantized_features = self.project_hid(quantized_features)
#
# loss = None
# logits = quantized_features
Collaborator

to clean up!

Comment on lines 72 to 74
if is_peft_available():
    from peft.tuners.lora import LoraLayer

Collaborator

should not be here!

@@ -855,11 +860,18 @@ def _merge_assignments(self, assignments: dict[str, cst.CSTNode], object_mapping

Merging rule: if any assignment with the same name was redefined in the modular, we use it and its dependencies ONLY if it matches
a pattern in `ASSIGNMENTS_REGEX_TO_KEEP`. Otherwise, we use the original value and dependencies. This rule was chosen to avoid having to rewrite the
big docstrings.
big docstrings. If the assignment is a DOCSTRING var and is assigned to None, the parent's docstring is kept.
Collaborator

not super intuitive, but not a real problem!

@nikosanto13
Contributor Author

@Cyrilvallez hey, no worries about the delay! Thanks for the pointers to the latest changes on the modular converter.

The #36279 fix was not enough for lazy imports inside class definitions, as it only works for functions. Inspired by it, I added a similar change to handle class definitions (see inline comment).

@@ -677,14 +678,18 @@ def leave_FunctionDef(self, node):

    def visit_If(self, node):
        # If we are inside a function, do not add the import to the list of imports
        if self.current_function is None:
        if self.current_function is None and self.current_class is None:
Contributor Author

cc @Cyrilvallez fix similar to #36279

Member

Indeed, when adding it for functions I thought we never had imports directly inside classes, so I did not add it... Turns out I was wrong... 🥲🥲
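
For context, the case that the new current_class check covers looks roughly like this (an illustrative example, not a specific file in the repository): a guarded import placed directly in a class body should stay where it is instead of being collected as a top-level protected import.

from transformers import PreTrainedModel
from transformers.utils import is_peft_available


class SomeSpeechModel(PreTrainedModel):
    # A guarded import directly inside the class body (not inside a method).
    # Previously visit_If only checked current_function, so a block like this
    # was treated as a module-level protected import and moved to the top.
    if is_peft_available():
        from peft.tuners.lora import LoraLayer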

@nikosanto13
Contributor Author

@ArthurZucker ty for your review 🤗 The deepspeed and peft lazy import statements should be OK now; I added a fix to the modular converter that makes my previous changes redundant.

let me know if there is anything else

Collaborator

@ArthurZucker ArthurZucker left a comment

All good for me! @eustlb can you give a final look and merge? 🤗

Contributor

@eustlb eustlb left a comment

Thanks a lot for the good work @nikosanto13 !! 🤗
LGTM, I'll just run slow tests for the affected models.

BTW subsequent work could focus on propagating to other speech models that rely partially on wav2vec modelling: seamless m4t, speecht5, sew, sew_d

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Collaborator

run-slow: wavlm

@eustlb
Contributor

eustlb commented Mar 21, 2025

run-slow: wav2vec2_bert, wav2vec2_conformer, unispeech, unispeech_sat, hubert, data2vec

This comment contains run-slow, running the specified jobs:

models: ['models/data2vec', 'models/hubert', 'models/unispeech', 'models/unispeech_sat', 'models/wav2vec2_bert', 'models/wav2vec2_conformer']
quantizations: [] ...

@eustlb
Contributor

eustlb commented Mar 28, 2025

run-slow: wav2vec2_bert, wav2vec2_conformer, unispeech, unispeech_sat, hubert, data2vec

This comment contains run-slow, running the specified jobs:

models: ['models/data2vec', 'models/hubert', 'models/unispeech', 'models/unispeech_sat', 'models/wav2vec2_bert', 'models/wav2vec2_conformer']
quantizations: [] ...

@eustlb
Contributor

eustlb commented Mar 31, 2025

run-slow: wav2vec2_bert, wav2vec2_conformer, unispeech, unispeech_sat, hubert, data2vec

This comment contains run-slow, running the specified jobs:

models: ['models/data2vec', 'models/hubert', 'models/unispeech', 'models/unispeech_sat', 'models/wav2vec2_bert', 'models/wav2vec2_conformer']
quantizations: [] ...

@eustlb eustlb closed this Apr 2, 2025
@eustlb eustlb reopened this Apr 2, 2025
@github-actions github-actions bot marked this pull request as draft April 2, 2025 14:46
@eustlb eustlb marked this pull request as ready for review April 2, 2025 14:48
@huggingface huggingface deleted a comment from github-actions bot Apr 2, 2025
@eustlb eustlb merged commit f74d7da into huggingface:main Apr 4, 2025
20 checks passed
@nikosanto13
Contributor Author

@eustlb thanks for taking care of this

BTW subsequent work could focus on propagating to other speech models that rely partially on wav2vec modelling: seamless m4t, speecht5, sew, sew_d

yeah I skipped them by mistake, maybe I could open another PR now that this has been merged

@eustlb
Contributor

eustlb commented Apr 4, 2025

@nikosanto13 thanks again for the work !!
I would love to review such a PR 🤗

yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Apr 5, 2025
* WAV_2_VEC_2 to WAV2VEC2

* added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio

* remove unnessary definitions in modulars

* added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer

* docstring fix for UniSpeechForCTC

* removed unneccessary re-definition of modular classes

* reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput

* top-level import of deepspeed in seamless_m4t, speecht5

* avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations

* convert modular

* tiny modular typing fixes

* some more modular fixes

* make style

---------

Co-authored-by: eustlb <[email protected]>
Co-authored-by: Eustache Le Bihan <[email protected]>
@nikosanto13 nikosanto13 mentioned this pull request Apr 13, 2025