Generative AI Interview Flashcards
Q: What is GPT trained to do?
A: GPT is trained autoregressively: it predicts the next token given all previous tokens. Sampling
one token at a time from this learned distribution is what lets it generate coherent text.
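A minimal sketch of next-token prediction, assuming Hugging Face’s transformers library and GPT-2 (neither is named on the card):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits        # (batch, seq_len, vocab_size)
    next_id = int(logits[0, -1].argmax())      # highest-probability next token
    print(tokenizer.decode([next_id]))         # e.g. " Paris"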
Q: How does attention work in Transformers?
A: Attention scores each token against every other token (query-key dot products) and turns those
scores into weights, letting the model focus on the most relevant words for each prediction. This
captures long-range dependencies better than RNNs.
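A NumPy sketch of scaled dot-product attention, softmax(QK^T/sqrt(d_k))V; the shapes are illustrative:

    import numpy as np

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)        # each query scored against every key
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)     # softmax: weights over all tokens
        return w @ V                           # weighted sum of value vectors

    Q = K = V = np.random.randn(5, 8)          # self-attention: 5 tokens, dim 8
    out = attention(Q, K, V)                   # (5, 8): one context vector per token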
Q: Why are Transformers better than RNNs/LSTMs?
A: They parallelize training (process all tokens at once), handle long-range context with attention,
and scale more efficiently to billions of parameters.
Q: What’s the difference between pre-training and fine-tuning?
A: Pre-training: the model learns general language patterns from massive text corpora. Fine-tuning:
the model is adapted on a smaller, task-specific dataset (e.g., legal documents, customer support).
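A hedged fine-tuning sketch using Hugging Face’s Trainer; the IMDB dataset here stands in for whatever task-specific data you actually have:

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    ds = load_dataset("imdb")                  # stand-in task-specific dataset
    ds = ds.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length"), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1),
        train_dataset=ds["train"].shuffle(seed=0).select(range(1000)),
    )
    trainer.train()                            # adapts the pre-trained weights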
Q: What is prompt engineering?
A: Prompt engineering is designing inputs that guide the model toward the desired output, without
changing the model weights.
Q: Fine-tuning vs. Prompt Engineering?
A: Prompting is quick and flexible, but limited by the context window and by what the base model
already knows. Fine-tuning permanently teaches the model domain knowledge or style, useful when
prompts alone aren’t enough.
Q: What is few-shot and zero-shot prompting?
A: Zero-shot: ask the model directly without examples. Few-shot: provide some examples in the
prompt to guide the model’s style and format.
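The difference is easiest to see as literal prompt strings (the contents are made up):

    zero_shot = "Classify the sentiment as positive or negative:\nReview: 'Great battery life.' ->"

    few_shot = """Classify the sentiment as positive or negative.
    Review: 'Terrible screen.' -> negative
    Review: 'Fast shipping, works perfectly.' -> positive
    Review: 'Great battery life.' ->"""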
Q: What is RAG?
A: Retrieval-Augmented Generation combines an LLM (like GPT) with a retrieval system (like a vector
DB) to inject external knowledge into the prompt at query time.
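A minimal RAG sketch, assuming the sentence-transformers library; the documents and the final generate() call are hypothetical stand-ins:

    from sentence_transformers import SentenceTransformer, util

    docs = ["Refunds are processed within 5 business days.",
            "Shipping is free for orders over $50."]
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = encoder.encode(docs, convert_to_tensor=True)

    query = "How long do refunds take?"
    q_vec = encoder.encode(query, convert_to_tensor=True)
    best = int(util.cos_sim(q_vec, doc_vecs).argmax())   # most relevant document
    prompt = f"Answer using only this context:\n{docs[best]}\n\nQuestion: {query}"
    # answer = generate(prompt)                # hypothetical LLM call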
Q: Why use a vector database?
A: Vector DBs store embeddings that capture semantic meaning, enabling similarity search. This
lets GPT retrieve relevant text even when the wording differs from the query.
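A hedged sketch of similarity search with FAISS; the vectors are random placeholders where a real system would use an embedding model:

    import faiss
    import numpy as np

    dim = 384
    index = faiss.IndexFlatIP(dim)             # inner product = cosine after L2-normalizing
    vecs = np.random.rand(100, dim).astype("float32")
    faiss.normalize_L2(vecs)
    index.add(vecs)

    query = np.random.rand(1, dim).astype("float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, k=3)     # the 3 most similar stored vectors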
Q: Fine-tuning vs. RAG: which one for 10,000 PDFs?
A: RAG is better — it scales, is cheaper, and updates easily. Fine-tuning is costly and requires
retraining for updates.
Q: How are embeddings generated?
A: Embeddings are numerical vectors produced by models like BERT or OpenAI’s
text-embedding-ada-002. They represent meaning, so semantically similar texts end up close together
in vector space.
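A hedged sketch with OpenAI’s Python SDK (v1-style client; the interface varies across SDK versions and requires an API key):

    from openai import OpenAI

    client = OpenAI()                          # reads OPENAI_API_KEY from the environment
    resp = client.embeddings.create(model="text-embedding-ada-002",
                                    input="Where is my refund?")
    vec = resp.data[0].embedding               # list of floats (1536-dim for ada-002)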
Q: How do diffusion models work?
A: They learn to denoise: start from random noise, remove noise step by step until an image
emerges. Training teaches them how to reverse the noising process.
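A toy PyTorch sketch of the forward (noising) step that training learns to reverse; alpha_bar is a made-up schedule value:

    import torch

    x0 = torch.randn(1, 3, 64, 64)             # a clean (toy) image
    alpha_bar = 0.5                            # cumulative noise level at step t
    noise = torch.randn_like(x0)
    xt = alpha_bar**0.5 * x0 + (1 - alpha_bar)**0.5 * noise   # noised image at step t
    # training (conceptually): minimize ||model(xt, t) - noise||^2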
Q: Difference between GPT and Diffusion models?
A: GPT generates sequentially (next token prediction). Diffusion models generate iteratively
(denoising over many steps).
Q: What are GANs vs Diffusion Models?
A: GANs pit a generator against a discriminator in an adversarial game to create images. Diffusion
models use a probabilistic denoising process instead. Diffusion training tends to be more stable
(no mode collapse) and to produce higher-quality results.
Q: How to evaluate a text generation model?
A: Automatic: Perplexity, BLEU, ROUGE. Practical: human evaluation + hallucination checks.
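For instance, sentence-level BLEU with NLTK (one library choice among several):

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    reference = ["the cat sat on the mat".split()]
    candidate = "the cat is on the mat".split()
    score = sentence_bleu(reference, candidate,
                          smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {score:.2f}")                # n-gram overlap with the reference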
Q: How to evaluate an image generation model?
A: FID (Fréchet Inception Distance) compares generated images to real ones (lower is better), IS
(Inception Score) measures quality and diversity (higher is better), plus human evaluation.
Q: What is perplexity?
A: Perplexity measures how well a model predicts the next token: it is the exponential of the
average negative log-likelihood. Lower perplexity means better predictive performance.
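A quick worked example with made-up token probabilities:

    import torch

    # model’s probabilities for the tokens that actually occurred (made-up values)
    probs = torch.tensor([0.5, 0.25, 0.125])
    perplexity = torch.exp(-torch.log(probs).mean())
    print(perplexity)                          # 4.0; lower is better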
Q: How to deploy a GPT-based chatbot?
A: Wrap it in an API (FastAPI/Flask), connect to a vector DB for RAG, deploy on cloud (AWS/GCP),
add monitoring for hallucinations and logging for feedback.
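A hedged FastAPI sketch; retrieve() and generate() are stub stand-ins for the vector-DB lookup and the LLM call:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Query(BaseModel):
        question: str

    def retrieve(q: str) -> str:               # stand-in for a vector-DB search
        return "Refunds are processed within 5 business days."

    def generate(prompt: str) -> str:          # stand-in for the LLM call
        return "Refunds take about 5 business days."

    @app.post("/chat")
    def chat(q: Query):
        answer = generate(f"Context: {retrieve(q.question)}\n\nQ: {q.question}")
        # log the exchange here for monitoring and feedback
        return {"answer": answer}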
Q: How to reduce hallucinations?
A: Use RAG with trusted knowledge sources, add guardrails with prompt engineering, and apply
reinforcement learning from human feedback (RLHF).
Q: How to optimize large models for deployment?
A: Techniques include model quantization, pruning, knowledge distillation, and using efficient
runtimes like ONNX Runtime or TensorRT.
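A short sketch of one of these, post-training dynamic quantization in PyTorch (the toy model is a placeholder):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8)  # store Linear weights as int8
    # smaller and often faster on CPU, at a small accuracy cost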