Open
Description
Is there an existing issue / discussion for this?
- I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- I have searched FAQ
Current Behavior
The function conversation_to_ids in finetune/dataset.py contains the following code:
if llm_type == "llama3":
    input_ids, context, raw_msg = conversation_to_ids_llama3(
        conversation, tokenizer
    )
elif llm_type == "qwen":
    input_ids, context, raw_msg = conversation_to_ids_qwen2(
        conversation, tokenizer
    )
else:
    input_ids, context, raw_msg = conversation_to_ids_minicpm(
        conversation, tokenizer
    )
ids = torch.from_numpy(np.hstack(input_ids, dtype=np.int32))
context = torch.from_numpy(np.hstack(context, dtype=np.int8))
if input_ids.shape[-1] > max_length:
[omitted]
However, conversation_to_ids_minicpm returns a tuple of lists, and a list has no shape attribute. The length check reads input_ids rather than the converted ids tensor, so setting LLM_TYPE to minicpm raises the following error:
Traceback (most recent call last):
  File "/finetune/dataset.py", line 60, in __getitem__
    ret = preprocess(
          ^^^^^^^^^^^
  File "dataset.py", line 399, in preprocess
    input_dict = conversation_to_ids(conversations, tokenizer, llm_type, new_schema, max_length)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/finetune/dataset.py", line 150, in conversation_to_ids
    if input_ids.shape[-1] > max_length:
       ^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'shape'
Expected Behavior
conversation_to_ids_minicpm should return a tuple of NumPy arrays, as the other conversion functions do.
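To illustrate, here is a minimal sketch of one way the length check could be made independent of the return type. It is not the repository's code: the helper `to_flat_array` and the sample id lists are hypothetical, and the truncation branch is an assumption, since the original body of the `if` is omitted above.

```python
import numpy as np


def to_flat_array(chunks, dtype):
    """Concatenate a list of 1-D chunks (lists or arrays) into one 1-D array.

    Hypothetical helper, not part of finetune/dataset.py.
    """
    return np.hstack(chunks).astype(dtype)


# Hypothetical stand-in for the lists that conversation_to_ids_minicpm returns.
input_ids = [[1, 2, 3], [4, 5]]
context = [[1, 1, 1], [0, 0]]

# A plain list has no .shape, which is what triggers the AttributeError.
list_has_shape = hasattr(input_ids, "shape")

# Convert first, then check the length on the converted array.
ids = to_flat_array(input_ids, np.int32)
ctx = to_flat_array(context, np.int8)

max_length = 4
if ids.shape[-1] > max_length:
    # Assumed behavior: truncate both sequences to max_length.
    ids = ids[:max_length]
    ctx = ctx[:max_length]
```

Checking `ids.shape[-1]` after conversion works whether the conversion function returns lists or arrays. Alternatively, conversation_to_ids_minicpm could wrap its results in `np.array(...)` before returning.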
Steps To Reproduce
None
Environment
No need.
Anything else?
This issue was first raised in #535. Is this legacy code? Would you consider updating it?