
ilab model train finishes with the error: "Error: Data path must be to a valid .jsonl file. Value given: /path/to/.local/share/instructlab/datasets". #2257

@junaruga

Description


Describe the bug

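ilab model train fails with the error "Data path must be to a valid .jsonl file. Value given: /home/jaruga/.local/share/instructlab/datasets". Before that error, it prints a Pydantic serializer warning: "Expected list[str] but got tuple with value ('q_proj', 'k_proj', 'v_proj', 'o_proj') - serialized value may not be as expected".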

To Reproduce
Steps to reproduce the behavior:

  1. Run ilab model train.
  2. See the following output. The warning is similar to the one reported in "Click converts lists into tuples when parsing args" #2061, but the message is slightly different.
(venv) $ ilab model train
/home/jaruga/doc/dev/instructlab/venv/lib64/python3.11/site-packages/pydantic/main.py:387: UserWarning: Pydantic serializer warnings:
  Expected `list[str]` but got `tuple` with value `('q_proj', 'k_proj', 'v_proj', 'o_proj')` - serialized value may not be as expected
  return self.__pydantic_serializer__.to_python(
Usage: ilab model train [OPTIONS]
Try 'ilab model train --help' for help.

Error: Data path must be to a valid .jsonl file. Value given: /home/jaruga/.local/share/instructlab/datasets
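The comparison with #2061 points at how Click handles options declared with multiple=True. As a minimal sketch (it needs the click package; the command name train and the --target-module option are hypothetical, not ilab's actual options), this shows that Click passes such values to the callback as a tuple, even when the declared default is a list:

```python
import click
from click.testing import CliRunner


# Hypothetical command: only the multiple=True option matters here.
@click.command()
@click.option(
    "--target-module",
    "target_modules",
    multiple=True,
    default=["q_proj", "k_proj", "v_proj", "o_proj"],  # declared as a list...
)
def train(target_modules):
    # ...but Click converts multiple=True values (including defaults) to a tuple.
    click.echo(type(target_modules).__name__)
    # Converting explicitly gives the list[str] a Pydantic field would expect.
    click.echo(repr(list(target_modules)))


if __name__ == "__main__":
    runner = CliRunner()
    print(runner.invoke(train, []).output)
    print(runner.invoke(train, ["--target-module", "q_proj"]).output)
```

If a tuple like this is later stored in a Pydantic field typed list[str], serializing the model is presumably how a warning like the one above arises. Converting with list(...) before storing the value avoids the mismatch.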

Expected behavior

ilab model train finishes without an error.

Screenshots

Device Info:

Hardware Specs:

$ neofetch --stdout
jaruga@fedora
-------------
OS: Fedora release 39 (Thirty Nine) x86_64
Host: 20KGS23S0P ThinkPad X1 Carbon 6th
Kernel: 6.9.12-100.fc39.x86_64
Uptime: 14 days, 3 hours, 10 mins
Packages: 3791 (rpm), 21 (brew)
Shell: bash 5.2.26
Resolution: 1920x1080
WM: sway
Theme: Adwaita [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: alacritty
CPU: Intel i7-8650U (8) @ 4.200GHz
GPU: Intel UHD Graphics 620
Memory: 5854MiB / 15635MiB

OS Version: Fedora Linux 39

Python Version:

(venv) $ python --version
Python 3.11.9
$ rpm -q python3.11
python3.11-3.11.9-6.fc39.x86_64

InstructLab Version:

(venv) $ ilab system info
sys.version: 3.11.9 (main, Aug 23 2024, 00:00:00) [GCC 13.3.1 20240522 (Red Hat 13.3.1-1)]
sys.platform: linux
os.name: posix
platform.release: 6.9.12-100.fc39.x86_64
platform.machine: x86_64
platform.node: localhost.localdomain
platform.python_version: 3.11.9
os-release.ID: fedora
os-release.VERSION_ID: 39
os-release.PRETTY_NAME: Fedora Linux 39 (Workstation Edition)
instructlab.version: 0.18.4
instructlab-dolomite.version: 0.1.1
instructlab-eval.version: 0.1.2
instructlab-quantize.version: 0.1.0
instructlab-schema.version: 0.3.1
instructlab-sdg.version: 0.2.7
instructlab-training.version: 0.4.2
torch.version: 2.3.1+cu121
torch.backends.cpu.capability: AVX2
torch.version.cuda: 12.1
torch.version.hip: None
torch.cuda.available: False
torch.backends.cuda.is_built: True
torch.backends.mps.is_built: False
torch.backends.mps.is_available: False
llama_cpp_python.version: 0.2.79
llama_cpp_python.supports_gpu_offload: False

Additional context

I ran the following commands before running ilab model train.

(venv) $ ilab data generate --pipeline simple
...
INFO 2024-09-12 14:03:35,328 instructlab.sdg.datamixing:200: Mixed Dataset saved to /home/jaruga/.local/share/instructlab/datasets/skills_train_msgs_2024-09-12T13_05_40.jsonl
INFO 2024-09-12 14:03:35,329 instructlab.sdg:438: Generation took 3474.95s
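The log shows that data generation wrote a timestamped .jsonl file inside the datasets directory, while ilab model train received the directory itself. Here is a minimal standard-library sketch that picks the newest skills_train_msgs_*.jsonl file. It assumes the timestamp format in the file name shown above (e.g. 2024-09-12T13_05_40) sorts chronologically. The helper looks_like_valid_data_path is hypothetical and only approximates the condition the error message describes (an existing file ending in .jsonl). It is not ilab's actual check.

```python
from pathlib import Path


def looks_like_valid_data_path(path: Path) -> bool:
    """Hypothetical approximation of the condition in the error message:
    the data path must be an existing .jsonl file, not a directory."""
    return path.is_file() and path.suffix == ".jsonl"


def newest_training_file(
    datasets_dir: Path, pattern: str = "skills_train_msgs_*.jsonl"
) -> Path:
    """Return the most recent generated training file in datasets_dir.

    Assumes the timestamp embedded in the file name sorts lexicographically
    in chronological order, as in skills_train_msgs_2024-09-12T13_05_40.jsonl.
    """
    candidates = sorted(p for p in datasets_dir.glob(pattern) if p.is_file())
    if not candidates:
        raise FileNotFoundError(f"no file matching {pattern!r} in {datasets_dir}")
    return candidates[-1]


def default_datasets_dir() -> Path:
    # The directory reported in the error message of this issue.
    return Path.home() / ".local" / "share" / "instructlab" / "datasets"
```

For example, print(newest_training_file(default_datasets_dir())) prints a file path that could then be passed explicitly to training. This assumes the --data-path option of ilab model train in this version; check ilab model train --help to confirm.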

(venv) $ ilab taxonomy diff
compositional_skills/miscellaneous_unknown/qna.yaml
Taxonomy in /home/jaruga/.local/share/instructlab/taxonomy is valid :)

(venv) $ ilab model list
+------------------------------+---------------------+--------+
| Model Name                   | Last Modified       | Size   |
+------------------------------+---------------------+--------+
| merlinite-7b-lab-Q4_K_M.gguf | 2024-09-09 18:44:12 | 4.1 GB |
+------------------------------+---------------------+--------+

The installed pip package versions:

(venv) $ pip list | grep instructlab
instructlab               0.18.4
instructlab-dolomite      0.1.1
instructlab-eval          0.1.2
instructlab-quantize      0.1.0
instructlab-schema        0.3.1
instructlab-sdg           0.2.7
instructlab-training      0.4.2
(venv) $ pip list | grep pydantic
pydantic                  2.9.1
pydantic_core             2.23.3
pydantic-settings         2.4.0
pydantic_yaml             1.3.0

Pydantic 2.9.1 is installed, so the warning is emitted from pydantic/main.py:387 in that version.

(venv) $ ilab model train
/home/jaruga/doc/dev/instructlab/venv/lib64/python3.11/site-packages/pydantic/main.py:387: UserWarning: Pydantic serializer warnings:
  Expected `list[str]` but got `tuple` with value `('q_proj', 'k_proj', 'v_proj', 'o_proj')` - serialized value may not be as expected
  return self.__pydantic_serializer__.to_python(
Usage: ilab model train [OPTIONS]
Try 'ilab model train --help' for help.
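The following minimal sketch shows the kind of situation that produces this warning. It needs pydantic 2.x. The LoraSettings model and its target_modules field are hypothetical stand-ins that mirror only the list[str] type in the warning; they are not ilab's real configuration classes. Assigning a tuple to the field without validation and then serializing the model triggers a serializer warning, while a real list does not.

```python
import warnings

from pydantic import BaseModel


class LoraSettings(BaseModel):
    # Hypothetical model: only the field type mirrors the warning above.
    target_modules: list[str] = ["q_proj"]


def serializer_warnings(value) -> list[str]:
    """Store `value` without validation, dump the model, return warning texts."""
    settings = LoraSettings()
    # validate_assignment is off by default, so a tuple is stored unchanged.
    settings.target_modules = value
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        settings.model_dump()  # serialization checks the value against list[str]
    return [str(w.message) for w in caught]


modules = ("q_proj", "k_proj", "v_proj", "o_proj")
print(serializer_warnings(modules))        # tuple: one warning mentioning list[str]
print(serializer_warnings(list(modules)))  # list: no serializer warning
```

The exact wording of the warning differs between Pydantic releases, but in both cases it names the expected list[str] type.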

In addition, though I am not sure whether it is relevant to this issue: after ilab config init set up /home/jaruga/.local/share/instructlab/taxonomy, I manually ran git pull in that directory.

$ pwd
/home/jaruga/.local/share/instructlab/taxonomy

$ git pull

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), stale
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests