Commit f90cd62

Update README.md

Author: jimmy.xj
1 parent c8b1ee1, commit f90cd62

2 files changed: 97 additions, 17 deletions

README.md

Lines changed: 49 additions & 9 deletions
@@ -26,6 +26,8 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda
2626
## 📜 Table of Contents
2727

2828
- [🏆 Leaderboard](#-leaderboard)
29+
- [👀 DevOps](#-devops)
30+
- [🔥 AIOps](#-aiops)
2931
- [⏬ Data](#-data)
3032
- [👀 Notes](#-notes)
3133
- [🔥 AIOps Sample Example](#-aiops-sample-example)
@@ -36,19 +38,19 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda
3638

3739
## 🏆 Leaderboard
3840
Below are zero-shot and five-shot accuracies from the models that we evaluate in the initial release. We note that five-shot performance is better than zero-shot for many instruction-tuned models.
39-
41+
### DevOps
4042
#### Zero Shot
4143

4244
| **ModelName** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
4345
|:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:-----------:|
44-
| **DevOps-Model-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
45-
| **DevOps-Model-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
46+
| **DevOpsPal-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
47+
| **DevOpsPal-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
4648
| Qwen-14B-Chat | 60.61 | 75.4 | 85.32 | 84.21 | 89.62 | 82.75 | 83.58 | 80.56 | 79.28 |
4749
| Qwen-14B-Base | 57.58 | 73.81 | 84.4 | 85.53 | 86.32 | 81.18 | 82.09 | 80.09 | 77.92 |
4850
| Baichuan2-13B-Base | 60.61 | 69.42 | 79.82 | 79.82 | 82.55 | 81.18 | 85.07 | 83.8 | 75.10 |
4951
| Baichuan2-13B-Chat | 60.61 | 68.43 | 77.98 | 80.7 | 81.6 | 83.53 | 82.09 | 84.72 | 74.60 |
50-
| **DevOps-Model-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
51-
| **DevOps-Model-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
52+
| **DevOpsPal-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
53+
| **DevOpsPal-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
5254
| Qwen-7B-Base | 53.03 | 68.13 | 78.9 | 75.44 | 80.19 | 80 | 83.58 | 80.09 | 73.13 |
5355
| Qwen-7B-Chat | 57.58 | 66.01 | 80.28 | 79.82 | 76.89 | 77.65 | 80.6 | 79.17 | 71.96 |
5456
| Baichuan2-7B-Chat | 54.55 | 63.66 | 77.98 | 76.32 | 71.7 | 73.33 | 75.37 | 79.63 | 68.17 |
@@ -61,21 +63,59 @@ Below are zero-shot and five-shot accuracies from the models that we evaluate in
6163

6264
| **ModelName** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
6365
|:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:---------:|
64-
| **DevOps-Model-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
65-
| **DevOps-Model-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
66+
| **DevOpsPal-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
67+
| **DevOpsPal-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
6668
| Qwen-14B-Chat | 65.15 | 76 | 82.57 | 85.53 | 84.91 | 84.31 | 85.82 | 81.48 | 79.55 |
6769
| Qwen-14B-Base | 66.67 | 76.15 | 84.4 | 85.53 | 86.32 | 80.39 | 86.57 | 80.56 | 79.51 |
6870
| Baichuan2-13B-Base | 63.64 | 71.39 | 80.73 | 82.46 | 81.13 | 84.31 | 91.79 | 85.19 | 77.09 |
6971
| Qwen-7B-Base | 75.76 | 72.52 | 78.9 | 81.14 | 83.96 | 81.18 | 85.07 | 81.94 | 77.02 |
7072
| Baichuan2-13B-Chat | 62.12 | 69.95 | 76.61 | 84.21 | 83.49 | 79.61 | 88.06 | 80.56 | 75.32 |
71-
| **DevOps-Model-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
72-
| **DevOps-Model-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
73+
| **DevOpsPal-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
74+
| **DevOpsPal-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
7375
| Qwen-7B-Chat | 65.15 | 66.54 | 82.57 | 81.58 | 81.6 | 81.18 | 80.6 | 81.02 | 73.62 |
7476
| Baichuan2-7B-Base | 60.61 | 67.22 | 76.61 | 75 | 77.83 | 78.43 | 80.6 | 79.63 | 72.11 |
7577
| Internlm-7B-Chat | 60.61 | 63.06 | 79.82 | 80.26 | 67.92 | 75.69 | 73.88 | 77.31 | 71.09 |
7678
| Baichuan2-7B-Chat | 60.61 | 64.95 | 81.19 | 75.88 | 71.23 | 75.69 | 78.36 | 79.17 | 70.49 |
7779
| Internlm-7B-Base | 62.12 | 65.25 | 77.52 | 80.7 | 74.06 | 78.82 | 79.85 | 75.46 | 69.17 |
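
The note that five-shot performance is better than zero-shot for many instruction-tuned models can be checked against the **AVG** columns of the two DevOps tables. The following is a minimal Python sketch and not part of the repository: the dictionary names and the `shot_gain` helper are hypothetical, and the values are copied from the Chat rows visible in this diff (Internlm-7B-Chat is left out because its zero-shot row falls outside the hunk shown).

```python
# AVG accuracy (%) copied from the DevOps Zero Shot and Five Shot tables.
# Only Chat (instruction-tuned) models with both rows visible are listed.
zero_shot = {
    "DevOpsPal-14B-Chat": 80.34,
    "Qwen-14B-Chat": 79.28,
    "Baichuan2-13B-Chat": 74.60,
    "DevOpsPal-7B-Chat": 74.00,
    "Qwen-7B-Chat": 71.96,
    "Baichuan2-7B-Chat": 68.17,
}
five_shot = {
    "DevOpsPal-14B-Chat": 81.77,
    "Qwen-14B-Chat": 79.55,
    "Baichuan2-13B-Chat": 75.32,
    "DevOpsPal-7B-Chat": 75.25,
    "Qwen-7B-Chat": 73.62,
    "Baichuan2-7B-Chat": 70.49,
}


def shot_gain(zero, few):
    """Return {model: few - zero} in percentage points, rounded to 2 places."""
    return {m: round(few[m] - zero[m], 2) for m in zero if m in few}


gains = shot_gain(zero_shot, five_shot)

# Print models from largest to smallest five-shot gain.
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:22s} {gain:+.2f}")
```

For these six models the gain is positive in every case, ranging from +0.27 to +2.32 points.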
7880

81+
### AIOps
82+
#### Zero Shot
83+
| **ModelName** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
84+
|:-------------------:|:------------:|:------------------:|:---------------------------:|:-------------------------:|:-------:|
85+
| Qwen-14B-Base | 66.29 | 58.8 | 25.33 | 43.5 | 49.27 |
86+
| DevOpsPal-14B-Base | 63.14 | 53.6 | 23.33 | 43.5 | 46.55 |
87+
| DevOpsPal-14B-Chat | 60 | 56 | 24 | 43 | 46.18 |
88+
| Qwen-14B-Chat | 64.57 | 51.6 | 22.67 | 36 | 45 |
89+
| Qwen-7B-Base | 50 | 39.2 | 22.67 | 54 | 40.82 |
90+
| Qwen-7B-Chat | 57.43 | 38.8 | 22.33 | 39.5 | 40.36 |
91+
| DevOpsPal-7B-Chat | 56.57 | 30.4 | 25.33 | 45 | 40 |
92+
| Baichuan2-13B-Chat | 64 | 18 | 21.33 | 37.5 | 37.09 |
93+
| Baichuan2-7B-Chat | 60.86 | 10 | 28 | 34.5 | 35.55 |
94+
| Baichuan2-7B-Base | 53.43 | 12.8 | 27.67 | 36.5 | 34.09 |
95+
| Internlm-7B-Base | 48.57 | 18.8 | 23.33 | 37.5 | 32.91 |
96+
| Baichuan2-13B-Base | 54 | 12.4 | 23 | 34.5 | 32.55 |
97+
| DevOpsPal-7B-Base | 46.57 | 20.8 | 25 | 34 | 32.55 |
98+
| Internlm-7B-Chat | 58.86 | 8.8 | 22.33 | 28.5 | 32 |
99+
100+
#### One Shot
101+
| **ModelName** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
102+
|:-------------------:|:------------:|:------------------:|:---------------------------:|:-------------------------:|:-------:|
103+
| DevOpsPal-14B-Chat | 66.29 | 80.8 | 23.33 | 44.5 | 53.91 |
104+
| Qwen-14B-Base | 64.29 | 74.4 | 28 | 48.5 | 53.82 |
105+
| DevOpsPal-14B-Base | 60 | 74 | 25.33 | 43.5 | 50.73 |
106+
| Qwen-14B-Chat | 49.71 | 65.6 | 28.67 | 48 | 47.27 |
107+
| Qwen-7B-Base | 56 | 60.8 | 27.67 | 44 | 47.18 |
108+
| DevOpsPal-7B-Base | 52.86 | 44.4 | 28 | 44.5 | 42.64 |
109+
| Qwen-7B-Chat | 54.57 | 52 | 29.67 | 26.5 | 42.09 |
110+
| Baichuan2-13B-Base | 56 | 43.2 | 24.33 | 41 | 41.73 |
111+
| Baichuan2-13B-Chat | 57.43 | 44.4 | 25 | 25.5 | 39.82 |
112+
| Baichuan2-7B-Base | 48.29 | 40.4 | 27 | 42 | 39.55 |
113+
| Baichuan2-7B-Chat | 58.57 | 31.6 | 27 | 31.5 | 38.91 |
114+
| DevOpsPal-7B-Chat | 56.57 | 27.2 | 25.33 | 41.5 | 38.64 |
115+
| Internlm-7B-Base | 48 | 33.2 | 29 | 35 | 37.09 |
116+
| Internlm-7B-Chat | 62.57 | 12.8 | 22.33 | 21 | 32.73 |
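
A reader might assume **AVG** is the plain mean of the four task columns, but the numbers do not bear that out. The sketch below compares the reported AVG with the unweighted mean for three rows copied from the AIOps Zero Shot table; the helper name `unweighted_gap` is hypothetical. The consistent gap suggests the average is weighted, perhaps by the number of samples per task, though that weighting is an assumption and is not stated here.

```python
from statistics import mean

# (LogParsing, RootCauseAnalysis, TimeSeriesAnomalyDetection,
#  TimeSeriesClassification, reported AVG), copied from the AIOps Zero Shot table.
rows = {
    "Qwen-14B-Base": (66.29, 58.8, 25.33, 43.5, 49.27),
    "DevOpsPal-14B-Base": (63.14, 53.6, 23.33, 43.5, 46.55),
    "DevOpsPal-14B-Chat": (60, 56, 24, 43, 46.18),
}


def unweighted_gap(row):
    """Reported AVG minus the plain mean of the task columns, in points."""
    *tasks, reported = row
    return round(reported - mean(tasks), 2)


gaps = {model: unweighted_gap(row) for model, row in rows.items()}

for model, gap in gaps.items():
    print(f"{model:20s} reported AVG exceeds unweighted mean by {gap:.2f}")
```

For these rows the reported AVG sits roughly 0.4 to 0.8 points above the unweighted mean, so the column should not be recomputed by simple averaging.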
117+
118+
79119
## ⏬ Data
80120
#### Download
81121
* Method 1: Download the zip file (you can also simply open the following link with the browser):

README_zh.md

Lines changed: 48 additions & 8 deletions
@@ -26,6 +26,8 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
2626
## 📜 Table of Contents
2727

2828
- [🏆 Leaderboard](#-排行榜)
29+
- [👀 DevOps](#-devops)
30+
- [🔥 AIOps](#-aiops)
2931
- [⏬ Data](#-数据)
3032
- [👀 Notes](#-说明)
3133
- [🔥 AIOps Sample Example](#-AIOps样本示例)
@@ -41,14 +43,14 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
4143

4244
| **Model** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
4345
|:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:---------:|
44-
| **DevOps-Model-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
45-
| **DevOps-Model-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
46+
| **DevOpsPal-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
47+
| **DevOpsPal-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
4648
| Qwen-14B-Chat | 60.61 | 75.4 | 85.32 | 84.21 | 89.62 | 82.75 | 83.58 | 80.56 | 79.28 |
4749
| Qwen-14B-Base | 57.58 | 73.81 | 84.4 | 85.53 | 86.32 | 81.18 | 82.09 | 80.09 | 77.92 |
4850
| Baichuan2-13B-Base | 60.61 | 69.42 | 79.82 | 79.82 | 82.55 | 81.18 | 85.07 | 83.8 | 75.10 |
4951
| Baichuan2-13B-Chat | 60.61 | 68.43 | 77.98 | 80.7 | 81.6 | 83.53 | 82.09 | 84.72 | 74.60 |
50-
| **DevOps-Model-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
51-
| **DevOps-Model-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
52+
| **DevOpsPal-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
53+
| **DevOpsPal-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
5254
| Qwen-7B-Base | 53.03 | 68.13 | 78.9 | 75.44 | 80.19 | 80 | 83.58 | 80.09 | 73.13 |
5355
| Qwen-7B-Chat | 57.58 | 66.01 | 80.28 | 79.82 | 76.89 | 77.65 | 80.6 | 79.17 | 71.96 |
5456
| Baichuan2-7B-Chat | 54.55 | 63.66 | 77.98 | 76.32 | 71.7 | 73.33 | 75.37 | 79.63 | 68.17 |
@@ -61,21 +63,59 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
6163

6264
| **Model** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
6365
|:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:---------:|
64-
| **DevOps-Model-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
65-
| **DevOps-Model-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
66+
| **DevOpsPal-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
67+
| **DevOpsPal-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
6668
| Qwen-14B-Chat | 65.15 | 76 | 82.57 | 85.53 | 84.91 | 84.31 | 85.82 | 81.48 | 79.55 |
6769
| Qwen-14B-Base | 66.67 | 76.15 | 84.4 | 85.53 | 86.32 | 80.39 | 86.57 | 80.56 | 79.51 |
6870
| Baichuan2-13B-Base | 63.64 | 71.39 | 80.73 | 82.46 | 81.13 | 84.31 | 91.79 | 85.19 | 77.09 |
6971
| Qwen-7B-Base | 75.76 | 72.52 | 78.9 | 81.14 | 83.96 | 81.18 | 85.07 | 81.94 | 77.02 |
7072
| Baichuan2-13B-Chat | 62.12 | 69.95 | 76.61 | 84.21 | 83.49 | 79.61 | 88.06 | 80.56 | 75.32 |
71-
| **DevOps-Model-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
72-
| **DevOps-Model-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
73+
| **DevOpsPal-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
74+
| **DevOpsPal-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
7375
| Qwen-7B-Chat | 65.15 | 66.54 | 82.57 | 81.58 | 81.6 | 81.18 | 80.6 | 81.02 | 73.62 |
7476
| Baichuan2-7B-Base | 60.61 | 67.22 | 76.61 | 75 | 77.83 | 78.43 | 80.6 | 79.63 | 72.11 |
7577
| Internlm-7B-Chat | 60.61 | 63.06 | 79.82 | 80.26 | 67.92 | 75.69 | 73.88 | 77.31 | 71.09 |
7678
| Baichuan2-7B-Chat | 60.61 | 64.95 | 81.19 | 75.88 | 71.23 | 75.69 | 78.36 | 79.17 | 70.49 |
7779
| Internlm-7B-Base | 62.12 | 65.25 | 77.52 | 80.7 | 74.06 | 78.82 | 79.85 | 75.46 | 69.17 |
7880

81+
82+
### AIOps
83+
#### Zero Shot
84+
| **Model** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
85+
|:-------------------:|:-----:|:----:|:------:|:----:|:-------:|
86+
| Qwen-14B-Base | 66.29 | 58.8 | 25.33 | 43.5 | 49.27 |
87+
| DevOpsPal-14B-Base | 63.14 | 53.6 | 23.33 | 43.5 | 46.55 |
88+
| DevOpsPal-14B-Chat | 60 | 56 | 24 | 43 | 46.18 |
89+
| Qwen-14B-Chat | 64.57 | 51.6 | 22.67 | 36 | 45 |
90+
| Qwen-7B-Base | 50 | 39.2 | 22.67 | 54 | 40.82 |
91+
| Qwen-7B-Chat | 57.43 | 38.8 | 22.33 | 39.5 | 40.36 |
92+
| DevOpsPal-7B-Chat | 56.57 | 30.4 | 25.33 | 45 | 40 |
93+
| Baichuan2-13B-Chat | 64 | 18 | 21.33 | 37.5 | 37.09 |
94+
| Baichuan2-7B-Chat | 60.86 | 10 | 28 | 34.5 | 35.55 |
95+
| Baichuan2-7B-Base | 53.43 | 12.8 | 27.67 | 36.5 | 34.09 |
96+
| Internlm-7B-Base | 48.57 | 18.8 | 23.33 | 37.5 | 32.91 |
97+
| Baichuan2-13B-Base | 54 | 12.4 | 23 | 34.5 | 32.55 |
98+
| DevOpsPal-7B-Base | 46.57 | 20.8 | 25 | 34 | 32.55 |
99+
| Internlm-7B-Chat | 58.86 | 8.8 | 22.33 | 28.5 | 32 |
100+
101+
#### One Shot
102+
| **Model** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
103+
|:-------------------:|:------------:|:------------------:|:---------------------------:|:-------------------------:|:-------:|
104+
| DevOpsPal-14B-Chat | 66.29 | 80.8 | 23.33 | 44.5 | 53.91 |
105+
| Qwen-14B-Base | 64.29 | 74.4 | 28 | 48.5 | 53.82 |
106+
| DevOpsPal-14B-Base | 60 | 74 | 25.33 | 43.5 | 50.73 |
107+
| Qwen-14B-Chat | 49.71 | 65.6 | 28.67 | 48 | 47.27 |
108+
| Qwen-7B-Base | 56 | 60.8 | 27.67 | 44 | 47.18 |
109+
| DevOpsPal-7B-Base | 52.86 | 44.4 | 28 | 44.5 | 42.64 |
110+
| Qwen-7B-Chat | 54.57 | 52 | 29.67 | 26.5 | 42.09 |
111+
| Baichuan2-13B-Base | 56 | 43.2 | 24.33 | 41 | 41.73 |
112+
| Baichuan2-13B-Chat | 57.43 | 44.4 | 25 | 25.5 | 39.82 |
113+
| Baichuan2-7B-Base | 48.29 | 40.4 | 27 | 42 | 39.55 |
114+
| Baichuan2-7B-Chat | 58.57 | 31.6 | 27 | 31.5 | 38.91 |
115+
| DevOpsPal-7B-Chat | 56.57 | 27.2 | 25.33 | 41.5 | 38.64 |
116+
| Internlm-7B-Base | 48 | 33.2 | 29 | 35 | 37.09 |
117+
| Internlm-7B-Chat | 62.57 | 12.8 | 22.33 | 21 | 32.73 |
118+
79119
## ⏬ Data
80120
#### Download
81121
* Method 1: Download the zip file (you can also simply open the following link with the browser):
