Thanks to visit codestin.com
Credit goes to github.com

Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 45 additions & 0 deletions src/instructlab/sdg/configs/knowledge/atomic_facts.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
system: You are an AI assistant knowledgeable about {{domain}} domain. Be accurate but concise in response.

introduction: |
Please break down the following snippet from an article about {{domain}} into atomic facts.

principles: |
1. Make sure each fact is grounded in the given text.
2. Include any necessary information needed to explain the fact or concept
3. The atomic facts should be as simple as possible, if it’s compound sentence, break down one more time
4. For clarity, avoid using pronouns like ’it’, ’he’, ’she’, ’this’, ’that’ etc., and instead use the full names or titles.
5. Focus only on key concepts and facts. Skip any question or problems mentioned in the passage.

examples: |
To help you understand the task, here is an example:
[Passage]
The tournament was contested by ten national teams, maintaining the same format used in 2019. After six weeks of round-robin matches, India, South Africa, Australia, and New Zealand finished as the top four and qualified for the knockout stage. In the knockout stage, India and Australia beat New Zealand and South Africa, respectively, to advance to the final, played on 19 November at the Narendra Modi Stadium in Ahmedabad. Australia won the final by six wickets, winning their sixth Cricket World Cup title.
[Facts]
1. The tournament was contested by ten national teams.
2. The tournament maintained the same format used in 2019.
3. The round-robin matches lasted for six weeks.
4. India finished as one of the top four teams.
5. South Africa finished as one of the top four teams.
6. Australia finished as one of the top four teams.
7. New Zealand finished as one of the top four teams.
8. India, South Africa, Australia, and New Zealand qualified for the knockout stage.
9. In the knockout stage, India beat New Zealand.
10. In the knockout stage, Australia beat South Africa.
11. India advanced to the final.
12. Australia advanced to the final.
13. The final was played on 19 November.
14. The final was held at the Narendra Modi Stadium in Ahmedabad.
15. Australia won the final by six wickets.
16. Australia won their sixth Cricket World Cup title.
[End]


generation: |
Now it's your turn breakdown following snippet from article about {{domain}} into atomic facts following similar style as above examples
[Passage]
{{document}}
[Facts]


start_tags: [""]
end_tags: [""]
17 changes: 17 additions & 0 deletions src/instructlab/sdg/configs/knowledge/detailed_summary.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
system: You are an AI assistant that is expert at summarizing text.

introduction: |
Give me detailed summary for below document, making sure all key points are covered.

principles: |
Do not add any new information.
Do not miss any key points from the provided document

examples: ""

generation: |
Document:
{{document}}

start_tags: [""]
end_tags: [""]
17 changes: 17 additions & 0 deletions src/instructlab/sdg/configs/knowledge/extractive_summary.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
system: You are an AI assistant that is expert at summarizing text.

introduction: |
Give me detailed extractive summary for below document, making sure all key points are covered.

principles: |
Do not add any new information.
Do not miss any key points from the provided document

examples: ""

generation: |
Document:
{{document}}

start_tags: [""]
end_tags: [""]
Empty file.
53 changes: 53 additions & 0 deletions src/instructlab/sdg/pipelines/llama/freeform_skills.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
version: "1.0"
blocks:
- name: gen_questions
type: LLMBlock
config:
config_path: ../../configs/skills/freeform_questions.yaml
output_cols:
- question
batch_kwargs:
num_samples: 50
drop_duplicates:
- question
- name: eval_questions
type: LLMBlock
config:
config_path: ../../configs/skills/evaluate_freeform_questions.yaml
output_cols:
- evaluation
- score
- name: filter_questions
type: FilterByValueBlock
config:
filter_column: score
filter_value: 1.0
operation: eq
convert_dtype: float
drop_columns:
- evaluation
- score
- num_samples
- name: gen_responses
type: LLMBlock
config:
config_path: ../../configs/skills/freeform_responses.yaml
output_cols:
- response
- name: evaluate_qa_pair
type: LLMBlock
config:
config_path: ../../configs/skills/evaluate_freeform_pair.yaml
output_cols:
- evaluation
- score
- name: filter_qa_pair
type: FilterByValueBlock
config:
filter_column: score
filter_value: 2.0
operation: ge
convert_dtype: float
drop_columns:
- evaluation
- score
70 changes: 70 additions & 0 deletions src/instructlab/sdg/pipelines/llama/grounded_skills.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
version: "1.0"
blocks:
- name: gen_contexts
type: LLMBlock
config:
config_path: ../../configs/skills/contexts.yaml
output_cols:
- context
gen_kwargs:
temperature: 0.7
max_tokens: 4096
n: 10
seed: 42
drop_duplicates:
- context
- name: gen_grounded_questions
type: LLMBlock
config:
config_path: ../../configs/skills/grounded_questions.yaml
output_cols:
- question
batch_kwargs:
num_samples: 3
drop_duplicates:
- question
- name: eval_grounded_questions
type: LLMBlock
config:
config_path: ../../configs/skills/evaluate_grounded_questions.yaml
output_cols:
- evaluation
- score
- name: filter_grounded_questions
type: FilterByValueBlock
config:
filter_column: score
filter_value: 1.0
operation: eq
convert_dtype: float
drop_columns:
- evaluation
- score
- num_samples
- name: gen_grounded_responses
type: LLMBlock
config:
config_path: ../../configs/skills/grounded_responses.yaml
output_cols:
- response
- name: evaluate_grounded_qa_pair
type: LLMBlock
config:
config_path: ../../configs/skills/evaluate_grounded_pair.yaml
output_cols:
- evaluation
- score
- name: filter_grounded_qa_pair
type: FilterByValueBlock
config:
filter_column: score
filter_value: 2.0
operation: ge
convert_dtype: float
- name: combine_question_and_context
type: CombineColumnsBlock
config:
columns:
- context
- question
output_col: question
169 changes: 169 additions & 0 deletions src/instructlab/sdg/pipelines/llama/knowledge.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
version: "1.0"
blocks:
- name: duplicate_document_col
type: DuplicateColumnsBlock
config:
columns_map:
document: base_document

- name: gen_detailed_summary
type: LLMBlock
config:
config_path: ../../configs/knowledge/detailed_summary.yaml
output_cols:
- summary_detailed
gen_kwargs:
max_tokens: 2048

- name: gen_atomic_facts
type: LLMBlock
config:
config_path: ../../configs/knowledge/atomic_facts.yaml
output_cols:
- summary_atomic_facts
gen_kwargs:
max_tokens: 2048

- name: gen_extractive_summary
type: LLMBlock
config:
config_path: ../../configs/knowledge/extractive_summary.yaml
output_cols:
- summary_extractive
gen_kwargs:
max_tokens: 2048

- name: flatten_summary_columns
type: FlattenColumnsBlock
config:
var_cols:
- summary_detailed
- summary_extractive
- summary_atomic_facts
- base_document
value_name: summary
var_name: dataset_type

- name: rename_to_document_column
type: RenameColumnsBlock
config:
columns_map:
document: raw_document
summary: document

- name: knowledge generation
type: LLMBlock
config:
config_path: ../../configs/knowledge/generate_questions_responses.yaml
output_cols:
- question
- response
batch_kwargs:
batched: true
parser_kwargs:
parser_name: custom
parsing_pattern: '\[(?:Question|QUESTION)\]\s*(.*?)\s*\[(?:Answer|ANSWER)\]\s*(.*?)\s*(?=\[(?:Question|QUESTION)\]|$)'
parser_cleanup_tags:
- "[END]"
- "[End]"
gen_kwargs:
max_tokens: 4096

- name: eval_faithfulness_qa_pair
type: LLMBlock
config:
config_path: ../../configs/knowledge/evaluate_faithfulness.yaml
output_cols:
- explanation
- judgment
gen_kwargs:
max_tokens: 512

- name: filter_faithfulness
type: FilterByValueBlock
config:
filter_column: judgment
filter_value: "YES"
operation: eq
drop_columns:
- judgment
- explanation

- name: eval_relevancy_qa_pair
type: LLMBlock
config:
config_path: ../../configs/knowledge/evaluate_relevancy.yaml
output_cols:
- feedback
- score
gen_kwargs:
max_tokens: 512

- name: filter_relevancy
type: FilterByValueBlock
config:
filter_column: score
filter_value: 2.0
operation: eq
convert_dtype: float
drop_columns:
- feedback
- score

- name: eval_verify_question
type: LLMBlock
config:
config_path: ../../configs/knowledge/evaluate_question.yaml
output_cols:
- explanation
- rating
gen_kwargs:
max_tokens: 512

- name: filter_verify_question
type: FilterByValueBlock
config:
filter_column: rating
filter_value: 1.0
operation: eq
convert_dtype: float
drop_columns:
- explanation
- rating
- __index_level_0__

datamixing:
auxiliary_instructions:
summary_detailed:
- Provide me with a comprehensive summary of the given document.
- Prepare a detailed breakdown of the contents of the document for me.
- Summarize the document thoroughly, covering all important points.
- Create a detailed executive summary of the provided document.
- Compose a comprehensive overview of the document's content.
- Deliver a detailed synopsis of the material presented in the document.
- Furnish me with a detailed analysis of the document's key points.
- Generate a thorough summary of the main ideas in the document.
- Offer a detailed digest of the information contained in the document.
- Supply me with a comprehensive rundown of the document's contents.
summary_extractive:
- Provide me with a summary of the document using extractive methods.
- Create an extractive summary for the given document.
- Generate an extractive summary from the document that was given to you.
- Summarize the document using extractive techniques.
- Create a summary of the provided document using extractive methods.
- Generate an extractive summary for the document provided.
- Using extractive techniques, summarize the given document.
- Create a summary of the document using extractive summarization.
- Generate an extractive summary of the document that was provided.
- Summarize the provided document using extractive summarization techniques.
summary_atomic_facts:
- Identify and list all atomic facts from the document.
- Extract all key facts from the given document.
- List all the important facts from the provided document.
- Highlight all the atomic facts present in the document.
- Identify and enumerate all key facts from the given text.
- List out all the critical information from the document.
- Highlight all the essential facts from the provided text.
- Identify and summarize all the important details from the document.
- Extract all the atomic facts from the given document.
- List all the key takeaways from the provided text.
Loading