ASSIGNMENT II (MST II)
By
Sajid Khursheed Bhat
2021A1R175
8th Semester
Computer Science Engineering
Model Institute of Engineering & Technology (Autonomous)
(Permanently Affiliated to the University of Jammu, Accredited by NAAC with “A” Grade)
Jammu, India
2024
ASSIGNMENT II (MST II)
Subject Code: COM-801 (Generative AI)    Due Date: 22 May 2025
Question Number | Course Outcome | Bloom's Level | Maximum Marks | Marks Obtained
Q1 | CO 4 | Understanding | 4 |
Q2 | CO 4 | Analysing | 4 |
Q3 | CO 5 | Evaluating | 4 |
Q4 | CO 5 | Creating | 4 |
Q5 | CO 5 | Evaluating | 4 |
Total Marks: 20
Faculty Signature
Email: [email protected]
Assignment Objectives:
The assignment aims to deepen students' understanding of sequence modeling and language
generation using modern generative AI techniques. It focuses on the architecture and operational
mechanisms of Transformer models, emphasizing components such as self-attention, multi-head
attention, and positional encoding. Students will analyze the limitations of RNNs, LSTMs, and GRUs in
handling long-term dependencies and compare these with Transformer-based models. The assignment
also explores the design and application of pre-trained models like GPT for conditional text generation.
Additionally, it covers evaluation frameworks for generative models, introducing key metrics like
BLEU, ROUGE, METEOR, and Perplexity to assess model output quality across different NLP tasks.
Assignment Questions:
Q1 (BL: Understanding, CO 4, Marks: 4, Total: 4): Explain the key components of the Transformer architecture, including the role of self-attention and feed-forward layers. How does positional encoding contribute to the model?
Q2 (BL: Analysing, CO 4, Marks: 4, Total: 4): Compare RNN, LSTM, and GRU in terms of their ability to handle long-term dependencies. Where do they fall short in modeling context in language tasks?
Q3 (BL: Evaluating, CO 5, Marks: 4, Total: 4): Analyze and summarize how GPT and BERT differ in their model architecture and training objectives. Highlight practical use cases where each excels.
Q4 (BL: Creating, CO 5, Marks: 4, Total: 4): Draft a step-by-step pipeline for a conditional text generation system using a pre-trained transformer (like GPT). Mention how input prompts and decoding strategies affect output.
Q5 (BL: Evaluating, CO 5, Marks: 4, Total: 4): List and explain evaluation metrics used for generative text models (BLEU, ROUGE, METEOR, Perplexity). Which metric would you choose for summarization vs. dialogue generation tasks and why?
Question 1: Explain the key components of the Transformer architecture,
including the role of self-attention and feed-forward layers. How does positional
encoding contribute to the model?
The Transformer architecture represents a significant shift in how natural language processing
tasks are performed. Proposed by Vaswani et al. in 2017, it removed the need for recurrence or
convolution, which were previously central to sequence modelling. The Transformer is
composed of an encoder-decoder structure, although models like BERT and GPT typically use
only one of these components. In a standard Transformer, the encoder processes the input and
the decoder generates the output.
At the heart of this architecture is the self-attention mechanism, which enables the model to
weigh and relate all words in the input sequence simultaneously. For each token, the model
computes three vectors: Query (Q), Key (K), and Value (V). The attention score is calculated
as the dot product of the Query and Key, scaled and passed through a softmax function to
assign weights. These weights are applied to the Value vectors to generate a weighted sum.
This allows each word to "attend" to other words based on relevance. For instance, in the
sentence “The cat sat on the mat,” the word “sat” will strongly attend to “cat” and “mat” to
understand context.
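A minimal NumPy sketch of this scaled dot-product attention computation is given below; the random Q, K, and V matrices are illustrative stand-ins for the learned projections of token embeddings, not values from any real model.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # how strongly each query matches every key
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax -> attention weights
    return weights @ V                                        # weighted sum of the value vectors

# toy example: 3 tokens with 4-dimensional Query/Key/Value vectors
np.random.seed(0)
Q, K, V = (np.random.randn(3, 4) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)            # (3, 4): one context-aware vector per token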
Multiple self-attention layers run in parallel (multi-head attention), allowing the model to learn
various semantic and syntactic relationships. After this, the output passes through a feed-forward
neural network which adds non-linearity and enables the model to learn complex
patterns. Each layer is wrapped with residual connections and layer normalization, which help
in gradient flow and training stability.
However, since Transformers do not process input sequentially like RNNs, they lack an
inherent sense of order. This is where positional encoding comes in. Positional encodings are
vectors added to input embeddings, derived from sine and cosine functions of different
frequencies. These encodings help the model understand the position of each word in a
sentence. Without positional encodings, “I love dogs” and “Dogs love I” would appear similar
to the model. Thus, positional encoding is a key enabler for preserving sequence information.
Following the self-attention block is the feed-forward layer, which is a fully connected neural
network applied independently to each position. It consists of two linear transformations with a
ReLU (or sometimes GELU) activation function in between. While self-attention enables the
model to exchange information between different positions in the sequence, the feed-forward
network provides a deeper transformation of each position’s representation, helping the model
generalize and learn complex patterns.
Positional encoding deserves a closer look. Because the model processes tokens in parallel, it
does not inherently know the position of a word in the sentence, so positional encodings are
added to the input embeddings to give the model information about the order of tokens. These
encodings are either fixed sinusoidal functions or learned positional vectors, added element-wise
to the word embeddings. This addition enables the model to distinguish between tokens based
not only on their identity but also on their position in the sequence. As a result, the model can
understand that in the sentence “He went home,” the word “home” comes after “went,” which is
crucial for understanding meaning.
In practice, positional encoding is a clever solution to the challenge of modeling sequential data
without using recurrence. The sinusoidal form has the advantage of being able to extrapolate to
longer sequences, while learned positional embeddings may adapt better to specific tasks.
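As an illustration, here is a small NumPy sketch of the sinusoidal form described above; the sequence length and embedding dimension are arbitrary example values.

import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    # even dimensions use sin, odd dimensions use cos, at geometrically spaced frequencies
    positions = np.arange(max_len)[:, None]            # shape (max_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]      # shape (1, d_model / 2)
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=512)
print(pe.shape)   # (50, 512): one vector per position, added element-wise to the word embeddings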
Another critical aspect of the Transformer architecture is the use of residual connections and
layer normalization. Each sub-layer in the Transformer is wrapped with a residual connection
followed by layer normalization. This means the output of each sub-layer is added to its input,
and then normalized. These techniques help stabilize training, speed up convergence, and allow
for the stacking of many layers without the vanishing gradient problem that typically affects
deep neural networks.
To illustrate this architecture simply, picture a single Transformer encoder layer: an input
embedding layer followed by positional encoding, then a self-attention block and a feed-forward
layer, each wrapped in residual connections and layer normalization. Stacking several such
layers produces a powerful architecture capable of handling complex language tasks.
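Below is a minimal PyTorch sketch of one such encoder layer, assuming PyTorch is installed; the dimensions (512-dimensional embeddings, 8 heads, a 2048-unit feed-forward layer) follow the original paper, and the random input tensor merely stands in for embedded, position-encoded tokens.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)     # self-attention: queries, keys, values all come from x
        x = self.norm1(x + attn_out)         # residual connection + layer normalization
        x = self.norm2(x + self.ff(x))       # position-wise feed-forward with its own residual
        return x

layer = EncoderLayer()
tokens = torch.randn(1, 10, 512)             # (batch, sequence length, embedding dimension)
print(layer(tokens).shape)                   # torch.Size([1, 10, 512])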
In summary, the Transformer model's key components—self-attention, multi-head attention,
feed-forward layers, and positional encodings—work together to enable powerful sequence
modeling without recurrence. The self-attention mechanism allows the model to focus on
relevant words regardless of their distance, while the feed-forward layers help refine these
representations. Positional encoding ensures the model captures word order, which is vital for
syntactic and semantic understanding. Together, these elements form the foundation of most
modern language models, including BERT, GPT, and many others that power applications
ranging from chatbots to translation systems.
Question 2: Compare RNN, LSTM, and GRU in terms of their ability to handle
long-term dependencies. Where do they fall short in modelling context in language
tasks?
Recurrent Neural Networks (RNNs) were among the first architectures used to model
sequential data such as text, speech, or time series. They process input one token at a time
while retaining a hidden state that captures prior information. However, RNNs struggle with
vanishing and exploding gradients during backpropagation through time (BPTT), especially
when sequences are long. This makes learning long-term dependencies extremely difficult,
which is a critical requirement for understanding language.
To overcome this, Long Short-Term Memory networks (LSTMs) were introduced. LSTMs add
a cell state and three gates—input, forget, and output—that regulate the flow of information.
The forget gate decides what to discard from the previous cell state, while the input gate
determines which new information to store. This gating system helps LSTMs maintain
information across longer sequences and mitigates the vanishing gradient issue.
Gated Recurrent Units (GRUs) are a simplified version of LSTMs. Instead of three gates,
GRUs use two: reset and update. The update gate decides how much past information to carry
forward, and the reset gate determines how much of the past to forget. GRUs are
computationally more efficient due to fewer parameters and are often preferred when training
resources or data are limited.
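A quick way to see the effect of the extra gates is to compare parameter counts of the three recurrent layers in PyTorch; the input and hidden sizes below are arbitrary example values.

import torch.nn as nn

def count_parameters(module):
    return sum(p.numel() for p in module.parameters())

# identical input and hidden sizes, so only the gating structure differs
for name, layer in [("RNN", nn.RNN(128, 256)),
                    ("LSTM", nn.LSTM(128, 256)),
                    ("GRU", nn.GRU(128, 256))]:
    print(name, count_parameters(layer))
# The LSTM has roughly 4x and the GRU roughly 3x the parameters of the vanilla RNN,
# reflecting their additional gates.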
Despite their advantages, both LSTM and GRU still process sequences sequentially, which
makes them slower to train compared to parallelizable architectures like Transformers.
Moreover, they are still limited in their ability to capture very long-range dependencies due to
their inherent step-by-step design. They may also suffer from memory compression, where too
much information gets squeezed into a fixed-size hidden state, causing loss of nuanced context.
In contrast, Transformers use attention mechanisms to directly relate any two words in a
sequence, regardless of their distance. This allows them to capture global dependencies and
bidirectional context more effectively. Therefore, while RNNs, LSTMs, and GRUs are good
for moderate-length sequences, they fall short when modeling long or complex textual data
compared to attention-based models.
Question 3: Analyse and summarize how GPT and BERT differ in their model
architecture and training objectives. Highlight practical use cases where each
excels.
Both GPT and BERT are derived from the Transformer architecture but are built and trained
with fundamentally different objectives and components. GPT (Generative Pre-trained
Transformer) is a decoder-only architecture trained in an autoregressive manner. This means it
learns to predict the next word in a sentence given the previous words. It processes data from
left to right, making it well-suited for tasks involving generation, such as text completion or
creative writing.
On the other hand, BERT (Bidirectional Encoder Representations from Transformers) uses
only the encoder part of the Transformer. It is trained using masked language modelling, where
random tokens in a sentence are masked, and the model learns to predict them using the context
from both left and right. This bidirectional nature allows BERT to better understand the full
context, making it ideal for tasks requiring deep comprehension, such as reading
comprehension, named entity recognition, and sentiment analysis.
Architecturally, GPT stacks decoder blocks with masked self-attention, preventing the model
from seeing future tokens. BERT uses encoder blocks with full self-attention, allowing every
token to attend to all others. As a result, GPT is inherently generative, while BERT is
discriminative and contextual.
In terms of use cases, GPT shines in creative writing, conversational agents, story generation,
and even code generation. It’s frequently used in chatbot backends for generating human-like
responses. BERT is ideal for classification tasks, such as spam detection or intent recognition,
and extractive question answering (e.g., identifying exact answers in a passage).
The training objectives also impact generalization. GPT is better at open-ended tasks, while
BERT is more precise and consistent in structured understanding. Hybrid models like T5 and
BART try to combine the best of both approaches.
Another key distinction between GPT and BERT lies in how they handle downstream
fine-tuning. BERT typically requires task-specific architecture augmentation for fine-tuning. For
instance, in sentence classification, a classification head is added on top of the [CLS] token
output. In question answering tasks, two additional layers are added to predict the start and end
tokens of the answer span. BERT’s versatility comes from its ability to be fine-tuned with
minimal data across a wide variety of supervised tasks.
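As a sketch of this fine-tuning setup, assuming the Hugging Face transformers library, the snippet below loads BERT with a freshly initialized two-class classification head on top of the pooled [CLS] representation; the example sentence and the number of labels are illustrative.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# attaches a randomly initialised classification head on top of the pooled [CLS] output;
# the head (and optionally the encoder) is then fine-tuned on labelled data
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("This product is fantastic!", return_tensors="pt")
logits = model(**inputs).logits   # shape (1, 2); before fine-tuning these scores are essentially random
print(logits.shape)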
GPT, on the other hand, tends to perform well with few-shot or zero-shot learning, especially in
its later versions like GPT-2, GPT-3, and GPT-4. These models are so large and generalized
that they can adapt to new tasks with only a few examples given at inference time. This is
enabled by prompt engineering, where task instructions and examples are embedded into the
input prompt itself. For example, a prompt like: “Translate ‘Hello’ to French: Bonjour.
Translate ‘Thank you’ to French:” is enough for GPT to generate “Merci” without any
additional training. This makes GPT especially powerful in scenarios where labeled training
data is scarce or unavailable.
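A minimal sketch of this few-shot prompting idea, assuming the Hugging Face transformers library; the small public gpt2 checkpoint used here is only a stand-in and may not reliably produce "Merci", whereas larger GPT models handle such prompts far better.

from transformers import pipeline

# the task is demonstrated inside the prompt itself; no gradient updates are performed
generator = pipeline("text-generation", model="gpt2")
prompt = ("Translate 'Hello' to French: Bonjour.\n"
          "Translate 'Thank you' to French:")
result = generator(prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])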
Question 4: Draft a step-by-step pipeline for a conditional text generation system
using a pre-trained transformer (like GPT). Mention how input prompts and
decoding strategies affect output.
Conditional text generation refers to the process of producing coherent and relevant text based
on a specific input or condition. In this setup, the model does not generate text randomly but in
response to a prompt or context provided beforehand. This condition could be a sentence, a
phrase, a question, or even a structured instruction that guides the model toward a specific goal.
In recent years, pre-trained transformer models like GPT (Generative Pre-trained Transformer),
T5 (Text-to-Text Transfer Transformer), and BART have become the go-to choices for
implementing such systems. Among them, GPT is particularly popular due to its strong
performance in generating fluent and context-aware language across diverse tasks.
The process begins with identifying the task and the kind of output desired. Conditional
generation can be used for applications like storytelling, dialogue generation, summarization,
email drafting, code generation, and more. For example, if the objective is to create a product
review from a product description, the description becomes the input condition. This condition
is transformed into a prompt written in natural language. How the prompt is phrased plays a
crucial role in determining the quality of the output. For instance, a generic prompt such as
“Write” is vague and unhelpful. However, a prompt like “Write a short story about a dragon
who protects a village” provides a clear direction to the model. An even better approach might
be: “Story Prompt: A dragon guards a village. Continue the story:”, which sets the tone and
makes the model's task clear.
Once the input prompt is finalized, it is passed through a tokenizer, which breaks the text down
into smaller units called tokens. These tokens are then converted into numerical IDs, as the
transformer model can only process numerical data. Each token ID represents a sub-word or
character piece based on the model’s vocabulary. These token IDs are then fed into the
transformer’s input layer, initiating the generation process.
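For illustration, assuming the Hugging Face transformers library, tokenizing the story prompt above with the GPT-2 tokenizer looks roughly like this:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "Story Prompt: A dragon guards a village. Continue the story:"
token_ids = tokenizer.encode(prompt)
print(tokenizer.convert_ids_to_tokens(token_ids))  # sub-word pieces from the model's vocabulary
print(token_ids)                                   # the numerical IDs actually fed to the transformer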
The text generation itself is an autoregressive process, meaning the model generates one token
at a time, using previously generated tokens as additional context. The quality and style of the
generated output depend largely on the decoding strategy chosen. The most basic method,
greedy decoding, always selects the token with the highest probability at each step. Although
this is fast and easy, it often leads to repetitive or overly simplistic text. To improve upon this,
beam search considers multiple possible continuations of a sentence and selects the most
promising path among them. While beam search provides more coherent results than greedy
decoding, it can still lack diversity.
To generate more creative and varied text, sampling-based methods are preferred. In top-k
sampling, the model selects the next token from a limited set of k most probable options,
introducing randomness while maintaining some control. Another popular method is top-p or
nucleus sampling, where the model dynamically selects tokens from the smallest possible set
whose combined probability exceeds a threshold p. This method is considered more flexible
and tends to produce more natural and diverse outputs. Additionally, temperature control can
be applied to influence the randomness of the model. A higher temperature (like 1.0 or above)
makes the output more unpredictable and creative, whereas a lower temperature (around 0.3 to
0.6) makes the text more focused and conservative.
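A hedged sketch of how these decoding strategies map onto the generate() method of the Hugging Face transformers library; the gpt2 checkpoint and the parameter values are illustrative choices, not recommendations.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Story Prompt: A dragon guards a village. Continue the story:",
                   return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=40, do_sample=False)               # greedy decoding
beam = model.generate(**inputs, max_new_tokens=40, num_beams=5, do_sample=False)    # beam search
sampled = model.generate(**inputs, max_new_tokens=40, do_sample=True,
                         top_k=50, top_p=0.9, temperature=0.8)                      # top-k + nucleus sampling
print(tokenizer.decode(sampled[0], skip_special_tokens=True))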
After generating the token sequence, the model’s output is passed through a detokenizer, which
converts the numerical token IDs back into human-readable text. The final text may be
post-processed to remove special tokens, adjust punctuation, or format the response as needed. In
real-world systems, this output might be evaluated manually by users or automatically using
metrics like BLEU, ROUGE, METEOR, or newer ones like BERTScore or BLEURT.
To understand this pipeline better, consider an example where a company wants to automate
email replies. The incoming email serves as the condition. Suppose the user sends: “I would
like to schedule a meeting next week regarding the product update.” The prompt might then be
crafted as: “Email: I would like to schedule a meeting next week regarding the product update.
Reply:”, and the model could generate: “Thank you for reaching out. I’d be happy to schedule a
meeting. Please share your availability.” This is an excellent example of how conditional
generation can be practically applied.
Question 5: List and explain evaluation metrics used for generative text models
(BLEU, ROUGE, METEOR, Perplexity). Which metric would you choose for
summarization vs. dialogue generation tasks and why?
Evaluating generative models is challenging because multiple correct outputs can exist for a
single input. Therefore, evaluation relies on automatic metrics as well as human judgment.
Among automatic metrics, BLEU (Bilingual Evaluation Understudy) is a precision-based score
that measures n-gram overlap between generated and reference text. It is popular in machine
translation, but often penalizes valid paraphrases that do not use the same words.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is recall-based and widely used
in summarization. It measures how much content from the reference summary is retained in the
generated output. Variants like ROUGE-1, ROUGE-2, and ROUGE-L focus on unigram,
bigram, and longest common subsequence overlap, respectively.
METEOR improves over BLEU by accounting for synonyms, stemming, and word order. It
provides a more balanced and semantically aware evaluation, useful for tasks like dialogue
generation or paraphrasing.
Perplexity measures how well the model predicts the next word in a sequence. It is an internal
measure of fluency, where lower values indicate more confident predictions. However,
perplexity cannot assess content relevance or coherence and thus is insufficient on its own.
For summarization, ROUGE is preferred due to its emphasis on content coverage. For dialogue
generation, METEOR or human evaluation is better, as conversational quality depends more on
relevance, fluency, and diversity than word overlap.
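As a sketch, assuming the Hugging Face evaluate package (with its nltk and rouge_score dependencies) is installed, BLEU and ROUGE can be computed as follows; the candidate and reference sentences are made up for illustration.

import evaluate

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

predictions = ["The dragon protected the village from the invaders."]
references = ["A dragon guarded the village against the attackers."]

# BLEU expects a list of reference strings per prediction; ROUGE accepts one string each
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))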
While traditional metrics like BLEU, ROUGE, and METEOR focus on surface-level text
similarity using n-gram overlaps, they often fall short when evaluating the semantic quality of
generated text. To address this, newer metrics like BERTScore and BLEURT have been
proposed, leveraging the power of pre-trained transformer models to assess meaning rather
than just word overlap.
BERTScore uses contextual embeddings from BERT or other similar models to compare each
token in the candidate sentence with tokens in the reference sentence. Instead of looking for
exact word matches, it computes cosine similarity between embeddings. This allows
BERTScore to reward semantically similar words, even if the exact wording differs. For
example, “The boy is running” and “The child is sprinting” would receive a high BERTScore
despite having no n-gram overlap. This metric is particularly useful for tasks like
summarization, paraphrasing, and dialogue systems where semantic fidelity is more important
than surface similarity.
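A minimal sketch of computing BERTScore through the same evaluate package, assuming its bert_score backend is installed; the sentence pair is the example from above.

import evaluate

bertscore = evaluate.load("bertscore")
result = bertscore.compute(
    predictions=["The boy is running."],
    references=["The child is sprinting."],
    lang="en",            # selects a default English model for the contextual embeddings
)
print(result["f1"])       # high despite zero n-gram overlap, because the embeddings are semantically close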
BLEURT (Bilingual Evaluation Understudy with Representations from Transformers) goes one
step further. It is a learned evaluation metric trained on human judgment data. BLEURT
fine-tunes a BERT-like model to predict human evaluation scores. This enables it to detect nuances
like factual accuracy, grammar, fluency, and coherence better than rule-based metrics. Because
it mimics human scoring patterns, BLEURT has shown strong correlation with human
preferences in benchmark studies.
These modern metrics, however, are computationally intensive and may require GPU
acceleration. Despite this, they are becoming increasingly popular in research and industry
because they align better with human perception of text quality.
In conclusion, for summarization tasks, BERTScore or BLEURT can provide deeper insight
into the semantic adequacy of the output. For dialogue generation, where coherence and
contextual flow matter more than surface matching, BLEURT is a promising choice when the
computational resources to run it are available.