[BUG] Conversation processing error. #1036

@SummerFall1819

Description

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched the FAQ

Current Behavior

The function conversation_to_ids in finetune/dataset.py contains the following code:

    if llm_type == "llama3":
        input_ids, context, raw_msg = conversation_to_ids_llama3(
            conversation, tokenizer
        )
    elif llm_type == "qwen":
        input_ids, context, raw_msg = conversation_to_ids_qwen2(
            conversation, tokenizer
        )
    else:
        input_ids, context, raw_msg = conversation_to_ids_minicpm(
            conversation, tokenizer
        )

    ids = torch.from_numpy(np.hstack(input_ids, dtype=np.int32))
    context = torch.from_numpy(np.hstack(context, dtype=np.int8))
    if input_ids.shape[-1] > max_length:
    [omitted]

However, conversation_to_ids_minicpm returns a tuple of plain Python lists, which have no shape attribute, so setting LLM_TYPE to minicpm leads to the following error:

    Traceback (most recent call last):
      File "/finetune/dataset.py", line 60, in __getitem__
        ret = preprocess(
              ^^^^^^^^^^^
      File "dataset.py", line 399, in preprocess
        input_dict = conversation_to_ids(conversations, tokenizer, llm_type, new_schema, max_length)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/finetune/dataset.py", line 150, in conversation_to_ids
        if input_ids.shape[-1] > max_length:
           ^^^^^^^^^^^^^^^
    AttributeError: 'list' object has no attribute 'shape'
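The failure can be reproduced without a model or tokenizer. The sketch below uses a hypothetical stand-in, `fake_minicpm_ids`, that returns plain lists the way this issue describes `conversation_to_ids_minicpm` doing. It shows that `np.hstack` accepts the lists, so the two conversion lines succeed, and that the error only appears at the `.shape` check. It leaves out `torch.from_numpy` to stay NumPy-only, and uses `.astype` instead of the `dtype=` keyword of `np.hstack` only so it also runs on NumPy versions older than 1.24.

```python
import numpy as np


def fake_minicpm_ids():
    # Hypothetical stand-in for conversation_to_ids_minicpm: returns plain
    # Python lists, as described in this issue (real token IDs not modeled).
    input_ids = [1, 2, 3, 4]
    context = [1, 1, 0, 0]
    raw_msg = "example"
    return input_ids, context, raw_msg


input_ids, context, raw_msg = fake_minicpm_ids()

# np.hstack accepts a list of ints, so these conversions succeed.
ids = np.hstack(input_ids).astype(np.int32)
ctx = np.hstack(context).astype(np.int8)
print(ids.shape)  # (4,)

# The length check reads .shape from the original return value, a list.
try:
    input_ids.shape[-1]
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'shape'
```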

Expected Behavior

conversation_to_ids_minicpm should return NumPy arrays, as conversation_to_ids_llama3 and conversation_to_ids_qwen2 do.
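Two possible fixes, as a sketch only: make the MiniCPM path return NumPy arrays like the other branches, or base the length check on the already-converted `ids`, which works whatever type the helper returns. `to_arrays` and `exceeds_max_length` are hypothetical helper names, not functions in finetune/dataset.py, and the per-turn list layout in the example input is an assumption.

```python
import numpy as np


def to_arrays(input_ids, context):
    # Option 1 (hypothetical helper): convert the lists produced by the MiniCPM
    # path into NumPy arrays before returning, so every llm_type branch
    # returns the same types. np.hstack flattens both flat and per-turn lists.
    return np.hstack(input_ids).astype(np.int32), np.hstack(context).astype(np.int8)


def exceeds_max_length(ids, max_length):
    # Option 2 (hypothetical helper): test the length of the array built by
    # np.hstack instead of the raw return value, so the check works whether
    # the helper returned lists or arrays.
    return ids.shape[-1] > max_length


# Hypothetical per-turn token lists standing in for the MiniCPM return value.
input_ids = [[1, 2], [3, 4, 5]]
context = [[0, 0], [1, 1, 1]]

arr_ids, arr_ctx = to_arrays(input_ids, context)
print(arr_ids.shape[-1])  # 5

ids = np.hstack(input_ids).astype(np.int32)
print(exceeds_max_length(ids, max_length=4))  # True
```

With option 2, the check in conversation_to_ids would read `if ids.shape[-1] > max_length:`; any other use of `input_ids.shape` in the omitted part would need the same change.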

Steps To Reproduce

None

Environment

Not needed.

Anything else?

This issue was first raised in #535. Is this legacy code, and are there plans to update it?

Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
