This project evaluates the performance of Meta's LLaMA 3.1 variants (8B, 70B, and 405B) at generating high-quality blogs from podcast transcripts. The evaluation uses an LLM-as-judge framework built on Google's Gemini-1.5-Flash, which scores the generated blogs on attributes such as clarity, grammar, and engagement.
The project focuses on converting podcast transcripts into coherent and engaging blogs using LLaMA 3.1 models. Key objectives include:
- Generating blogs from transcripts.
- Evaluating the output using an LLM-based judge.
- Analyzing the impact of model scaling on blog quality.
- Establishing a baseline with the 8B LLaMA variant.
- Blog Generation: Transforms podcast transcripts into blog posts using advanced LLaMA 3.1 models.
- LLM-Based Evaluation: Blogs are scored on clarity, grammar, tone, and engagement using Google's Gemini-1.5-Flash.
- Scalability Analysis: Compares the performance of models with 8B, 70B, and 405B parameters.
Dataset Preparation:
- Transcripts sourced from the Lex Fridman Podcast Dataset.
- Transcripts are segmented for processing within LLaMA's context window.
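The segmentation step can be sketched as simple word-based chunking with overlap, so each segment fits in the model's context window. The chunk size and overlap below are illustrative values, not the project's actual settings:

```python
def chunk_transcript(text, max_words=3000, overlap=200):
    """Split a long transcript into overlapping word-based segments.

    max_words and overlap are illustrative defaults; a ~23,000-word
    transcript would yield roughly eight segments at these settings.
    """
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        # Overlap carries a little context across segment boundaries.
        start = end - overlap
    return chunks
```

Token-based chunking with the model's tokenizer would be more precise, but word counts are a reasonable proxy for a sketch.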
Blog Generation:
- Summaries generated for transcript segments.
- Summaries combined into a final blog post.
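The two-stage generation above (per-segment summaries, then a final compose step) can be sketched as a small map-reduce pipeline. The `summarize` and `compose` callables stand in for calls to a LLaMA 3.1 endpoint (e.g. on Groq or SambaNova); injecting them as parameters is an assumption for illustration, not the project's actual code:

```python
def generate_blog(transcript_chunks, summarize, compose):
    """Two-stage blog generation: summarize each transcript segment,
    then compose the segment summaries into one blog post.

    `summarize` and `compose` are placeholders for LLM calls
    (e.g. a LLaMA 3.1 chat-completion request per stage).
    """
    # Map: one summary per transcript segment.
    segment_summaries = [summarize(chunk) for chunk in transcript_chunks]
    # Reduce: merge the summaries into a single coherent post.
    return compose("\n\n".join(segment_summaries))
```

With stub callables, e.g. `generate_blog(["part one", "part two"], str.upper, lambda s: "Blog:\n" + s)`, the pipeline logic can be exercised without any API access.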
Evaluation:
- Blogs scored by Gemini-1.5-Flash on multiple attributes.
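The judging step might look roughly like the following. The prompt wording and the JSON response format are assumptions for illustration; the project's actual Gemini-1.5-Flash rubric prompt is not reproduced here, and the call to the Gemini API itself is omitted:

```python
import json

# The six attributes the blogs are assessed on.
ATTRIBUTES = ["clarity", "grammar", "tone", "sentence_flow",
              "engagement", "conciseness"]

def build_judge_prompt(blog_text):
    """Build a rubric prompt asking the judge model to score each
    attribute from 1 to 10. Wording is illustrative only."""
    rubric = ", ".join(ATTRIBUTES)
    return (
        "You are an impartial writing judge. Score the blog below on "
        f"each of these attributes ({rubric}) from 1 to 10. "
        "Respond with only a JSON object mapping attribute to score.\n\n"
        f"Blog:\n{blog_text}"
    )

def parse_judge_scores(response_text):
    """Parse the judge's JSON reply into a dict of float scores."""
    scores = json.loads(response_text)
    return {attr: float(scores[attr]) for attr in ATTRIBUTES}
```

Asking for structured JSON output keeps the scores machine-readable, so runs across the three model sizes can be aggregated directly.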
Comparison:
- Models compared using baseline scores.
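Comparison against the 8B baseline can be expressed as per-attribute score deltas. The model names and any numbers used with this helper are hypothetical, not results from the evaluation:

```python
def relative_to_baseline(scores, baseline_model="llama-3.1-8b"):
    """Express each model's per-attribute judge scores as deltas from
    the baseline model's scores.

    `scores` maps model name -> {attribute: score}; the default
    baseline name is an illustrative label, not an API identifier.
    """
    base = scores[baseline_model]
    return {
        model: {attr: round(s - base[attr], 2) for attr, s in attrs.items()}
        for model, attrs in scores.items()
        if model != baseline_model
    }
```

A positive delta means the larger model outscored the 8B baseline on that attribute; a negative one (e.g. on conciseness) means it fell behind.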
Dataset:
- Source: Hugging Face Lex Fridman Podcast Dataset.
- Category: Science and Technology.
- Average Transcript Length: ~23,000 words.
Blogs are assessed on:
- Clarity
- Grammar & Syntax
- Tone Appropriateness
- Sentence Structure & Flow
- Engagement
- Conciseness
- Models: Meta LLaMA 3.1 (8B, 70B, 405B)
- Evaluation Framework: Gemini-1.5-Flash
- Platforms:
- Google Colab
- Groq Cloud
  - SambaNova Cloud
- Libraries: Hugging Face Transformers
The 405B model showed the strongest overall performance but struggled with conciseness. The 8B model, despite its smaller size, scored competitively on some metrics, particularly conciseness.
- Dynamic Chunking: Enhance context management for long transcripts.
- Tone Understanding: Improve model handling of conversational tones, humor, and sarcasm.
- Engagement Optimization: Reduce redundancy for better blog readability.
Special thanks to:
- Professor Ndapa Nakashole for guidance and insights.
- Yu Miaopeng (TA) for metric definitions and resources.
- Cloud Providers: Groq Cloud, SambaNova, and Google Cloud AI for computational support.
Feel free to suggest any improvements or ask questions!