Interaction Between Computers and Human Language

Uploaded by

imviswanthanss

1.​ Define the main focus of Natural Language Processing.


a) Image recognition​
b) Signal processing​
c) Interaction between computers and human language​
d) Circuit design​

2. Describe the two broad categories of NLP approaches.


a) Symbolic and Analog​
b) Rule-based and Statistical​
c) Linear and Nonlinear​
d) Sequential and Parallel​

3.​ Identify which component deals with sentence meaning.​


a) Syntax​
b) Semantics​
c) Morphology​
d) Phonology​

4.​ Classify the applications of NLP.​


a) Data mining, Sorting​
b) Machine translation, Chatbots, Sentiment analysis​
c) Hardware optimization, Storage​
d) Circuit evaluation, Compiling​

5.​ Examine which is a subfield of NLP.​


a) Compiler design​
b) Information Retrieval​
c) Operating systems​
d) Database indexing​

6. Locate the earliest of the following milestones in NLP history.


a) Google Translate​
b) ELIZA (1966)​
c) Siri​
d) Alexa

7. Recall the first stage of the NLP pipeline.
a) Lexical analysis​
b) Syntax analysis​
c) Semantic analysis​
d) Pragmatics​

8.​ Enumerate challenges of NLP.​


a) Ambiguity, Context, Sarcasm​
b) Sorting, Searching, Indexing​
c) Multiplication, Addition​
d) Compiling, Linking​

9.​ Identify the stage that checks grammar.​


a) Lexical analysis​
b) Syntax analysis​
c) Pragmatics​
d) Information retrieval​

10.​Distinguish between syntactic and semantic analysis.​


a) Syntax deals with meaning, semantics with structure​
b) Syntax deals with structure, semantics with meaning​
c) Both deal with phonetics​
d) Both are about speech recognition​

11.​Classify the biggest challenge in NLP.​


a) Large memory​
b) Ambiguity​
c) Parallel computation​
d) Indexing speed​

12.​Explain the role of pragmatics.​


a) Meaning of individual words​
b) Meaning in context of conversation​
c) Sound recognition​
d) Data mining

13.​Define regular expression.​


a) Random text​
b) A sequence of characters defining a search pattern​
c) Binary search tree​
d) Language compiler​

14.​Identify which symbol matches zero or more repetitions.​


a) +​
b) ?​
c) *​
d) ^​

15.​Match the symbol with its use: “^”.​


a) End of string​
b) Any digit​
c) Whitespace​
d) Start of string​

16.​Recall the regex for matching digits.​


a) [a-z]​
b) \s​
c) \d​
d) \w​

17.​Compare greedy vs non-greedy matching.​


a) Greedy takes longest match, non-greedy shortest match​
b) Both take same length​
c) Greedy is faster​
d) Non-greedy ignores regex rules​

18. Examine a practical use of regex.


a) Compiler optimization​
b) Email validation​
c) Machine learning training​
d) File compression
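
The regex questions above (13 to 18) can be tried out with a short sketch using Python's standard `re` module. The sample strings and the `looks_like_email` helper are illustrative assumptions, and the email pattern is deliberately simplified rather than standards-compliant.

```python
import re

# '*' matches zero or more repetitions of the preceding element.
star_matches = [bool(re.fullmatch(r"ab*", s)) for s in ["a", "ab", "abbb"]]

# '^' anchors the pattern at the start of the string.
starts_with_nlp = bool(re.search(r"^NLP", "NLP is fun"))
not_at_start = bool(re.search(r"^NLP", "I like NLP"))

# '\d' matches one digit; '\d+' matches a run of digits.
digits = re.findall(r"\d+", "Room 42, floor 7")

# Greedy '.*' takes the longest match; non-greedy '.*?' takes the shortest.
html = "<b>bold</b>"
greedy = re.search(r"<.*>", html).group()
non_greedy = re.search(r"<.*?>", html).group()

# Simplified email pattern (hypothetical; real validation is more involved).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

def looks_like_email(text):
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL.fullmatch(text) is not None
```

Here `greedy` holds the whole string `<b>bold</b>`, while `non_greedy` stops at the first closing bracket and holds `<b>`.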

19.​Define text normalization.​


a) Transforming text into standard format​
b) Compressing text​
c) Encrypting text​
d) Tokenizing text​

20.​Identify which is not part of normalization.​


a) Encryption​
b) Lowercasing​
c) Removing punctuation​
d) Expanding contractions​

21.​Describe stemming.​
a) Removing suffixes/prefixes to reach root form​
b) Converting to lowercase​
c) Adding tokens​
d) Encoding​

22.​Distinguish stemming from lemmatization.​


a) Both return random roots​
b) Lemmatization uses a dictionary, stemming cuts off suffixes
c) Lemmatization is faster​
d) Stemming uses POS tags​
23.​Recall the step applied before tokenization.​
a) Parsing​
b) Cleaning text (punctuation removal, lowercasing)​
c) Compiling​
d) POS tagging​

24.​Explain why normalization is necessary.​


a) To make text encrypted​
b) To reduce file size​
c) To make text consistent for processing​
d) To identify stopwords only
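
The normalization and stemming questions above (19 to 24) can be illustrated with a minimal sketch. The contraction table, the suffix list, and the function names are assumptions made for this example; real systems use much larger resources, and the stemmer here is far cruder than an established algorithm such as Porter's.

```python
import re
import string

# Tiny illustrative contraction table (an assumption, not a complete list).
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}

def normalize(text):
    """Lowercase, expand contractions, strip punctuation, collapse spaces."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def naive_stem(word):
    """Cut off one common suffix without consulting a dictionary."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = normalize("Don't  PANIC, it's fine!").split()
stems = [naive_stem(w) for w in ["jumping", "jumped", "cats", "running"]]
```

The last stem comes out as `runn`, which shows why suffix stripping alone can produce non-words where a dictionary-based lemmatizer would return `run`.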

25.​Define minimum edit distance.​


a) Number of operations to convert one word into another​
b) Number of sentences in a paragraph​
c) Steps in parsing​
d) Syllables in speech​

26.​Identify the three operations in edit distance.​


a) Merge, Delete, Sort​
b) Insert, Delete, Substitute​
c) Copy, Replace, Divide​
d) Tokenize, Encode, Decode​

27. Recall the edit distance between “kitten” and “sitting” (unit cost for each insertion, deletion, and substitution).


a) 2​
b) 3​
c) 4​
d) 1​

28. Describe the algorithm commonly used to compute minimum edit distance.


a) Merge Sort​
b) Quick Sort​
c) Dynamic Programming (Wagner-Fischer)​
d) BFS​

29.​Distinguish Levenshtein distance from Hamming distance.​


a) Both require equal length strings​
b) Hamming is for equal-length strings only, Levenshtein allows different lengths​
c) Levenshtein is faster​
d) Hamming allows insertions​

30.​Explain application of edit distance.​


a) POS tagging​
b) Parsing​
c) Spell correction​
d) Tokenization
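
The edit distance questions above (25 to 30) can be checked with a small dynamic programming sketch in the Wagner-Fischer style. It assumes a unit cost for insertion, deletion, and substitution; some textbooks charge 2 for substitution, which gives different totals. The candidate word list is made up for the example.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit costs, filled in row by row."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of source
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of target
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[-1][-1]

def hamming_distance(a, b):
    """Count differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

# Spell correction as nearest-neighbour search over a toy candidate list.
candidates = ["spelling", "selling", "spilling"]
best_guess = min(candidates, key=lambda w: edit_distance("speling", w))
```

The table has (len(source) + 1) × (len(target) + 1) cells, so time and memory both grow with the product of the two string lengths.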

31. Define a word-level n-gram.
a) Random set of n tokens​
b) Sequence of n words​
c) Sequence of n characters​
d) Sentence structure​

32.​Identify bigram model.​


a) Probability of word given previous word​
b) Probability of sentence length​
c) Word embedding method​
d) Grammar parser​

33.​Recall unigram model assumption.​


a) Words occur independently​
b) Words depend on previous two words​
c) Words are random noise​
d) Word order is preserved​

34.​Compare trigram vs bigram.​


a) Trigram considers two previous words, bigram one​
b) Trigram is faster​
c) Both ignore history​
d) Bigram uses three words​

35.​Examine main problem of n-grams.​


a) Tokenization​
b) Data sparsity​
c) Large vocabulary​
d) Lowercasing​

36.​Classify the type of model n-grams belong to.​


a) Neural models​
b) Statistical models​
c) Rule-based models​
d) Machine translation
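
The n-gram questions above (31 to 36) can be made concrete with a maximum-likelihood bigram sketch. The three-sentence corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions chosen for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, padding each sentence with markers."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return history_counts, bigram_counts

def bigram_prob(prev, word, history_counts, bigram_counts):
    """P(word | prev) = count(prev, word) / count(prev)."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
history_counts, bigram_counts = train_bigram(corpus)

p_am_given_i = bigram_prob("I", "am", history_counts, bigram_counts)
p_unseen = bigram_prob("Sam", "do", history_counts, bigram_counts)
```

`p_am_given_i` is 2/3, while `p_unseen` is 0.0 because "Sam do" never occurs in the corpus. That zero is the data sparsity problem in miniature.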
37. Identify the main intrinsic evaluation metric for language models.
a) Perplexity​
b) Accuracy​
c) Recall​
d) BLEU​

38.​Describe held-out test data.​


a) Data used for training​
b) Data kept aside for evaluation​
c) Validation set​
d) Augmented data​

39.​Recall the purpose of cross-validation.​


a) Reduce vocabulary size​
b) Ensure generalization​
c) Improve syntax​
d) Normalize text​

40.​Distinguish intrinsic vs extrinsic evaluation.​


a) Intrinsic: direct measure of model; Extrinsic: task-based​
b) Both are task-based​
c) Intrinsic uses BLEU​
d) Extrinsic ignores accuracy​

41.​Explain why log probability is used.​


a) To speed up compilation​
b) To avoid underflow and simplify multiplication​
c) To reduce grammar rules​
d) To create embeddings​

42.​Examine application of BLEU score.​


a) Sentiment analysis​
b) Machine translation​
c) Speech tagging​
d) Syntax checking
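
Question 41 above can be illustrated numerically. The sketch below multiplies 100 small probabilities directly and then sums their logarithms instead; the values are invented for the demonstration.

```python
import math

word_probs = [1e-5] * 100  # made-up per-word probabilities

# Direct product: 1e-500 is below the smallest positive float, so it underflows.
product = 1.0
for p in word_probs:
    product *= p

# Summing logs turns the product into an addition and stays representable.
log_prob = sum(math.log(p) for p in word_probs)
```

After running this, `product` is exactly 0.0, while `log_prob` is a finite negative number (about -1151.3) that can still be compared across models.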
43.​Identify the problem of zeros in n-grams.​
a) Negative probabilities​
b) Unseen events get probability zero​
c) Overflow in computation​
d) Division by zero​

44.​Recall why generalization is needed.​


a) To reduce file size​
b) To avoid ambiguity​
c) To assign probabilities to unseen words/sequences​
d) To improve tokenization​
45.​Compare open vs closed vocabulary.​
a) Both handle infinite words​
b) Closed has fixed vocabulary, open allows unseen words​
c) Open ignores OOV​
d) Closed allows infinite​

46.​Describe the solution for unseen words.​


a) Drop them​
b) Introduce unknown (UNK) token​
c) Ignore them​
d) Encode them​

47.​Distinguish OOV problem from ambiguity.​


a) OOV: unseen word; Ambiguity: multiple meanings​
b) Both are same​
c) OOV deals with multiple senses​
d) Ambiguity deals with spelling errors​

48.​Explain why zero probabilities are harmful.​


a) They improve speed​
b) They make sentence probability zero​
c) They reduce perplexity​
d) They simplify models
49.​Define smoothing.​
a) Technique to handle zero probabilities​
b) Removing stopwords​
c) Lowercasing text​
d) Tokenizing text​

50.​Identify a simple smoothing method.​


a) Add-one (Laplace) smoothing​
b) Regex​
c) POS tagging​
d) Parsing​

51.​Compare Laplace vs Good-Turing.​


a) Both same​
b) Good-Turing estimates probability of unseen events better​
c) Laplace is advanced​
d) Good-Turing ignores unseen events​

52.​Recall the problem with add-one smoothing.​


a) Too fast​
b) Overestimates unseen events​
c) Ignores seen events​
d) Reduces vocabulary​

53.​Describe backoff smoothing.​


a) Uses lower-order n-grams when higher-order is unavailable​
b) Ignores unseen words​
c) Only uses unigrams​
d) Normalizes text​

54.​Distinguish interpolation from backoff.​


a) Both drop higher n-grams​
b) Interpolation combines probabilities; Backoff falls back​
c) Both are same​
d) Backoff is faster
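
The smoothing questions above (49 to 54) can be compared side by side on a toy corpus. The sentence, the interpolation weight `lam=0.7`, and the function names are assumptions for illustration; in practice the weights are tuned on held-out data.

```python
from collections import Counter

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)  # vocabulary size (5 distinct words)
N = len(tokens)    # corpus size (6 tokens)

def mle(prev, word):
    """Unsmoothed estimate; assumes prev was seen in training."""
    return bigrams[(prev, word)] / unigrams[prev]

def add_one(prev, word):
    """Laplace smoothing: add 1 to every bigram count."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def interpolated(prev, word, lam=0.7):
    """Mix bigram and unigram estimates instead of falling back."""
    return lam * mle(prev, word) + (1 - lam) * unigrams[word] / N

p_seen_mle = mle("the", "cat")             # 1/2
p_seen_laplace = add_one("the", "cat")     # 2/7
p_unseen_laplace = add_one("the", "sat")   # 1/7
p_unseen_interp = interpolated("the", "sat")
```

Add-one smoothing lowers the seen bigram "the cat" from 1/2 to 2/7 in order to give mass to unseen bigrams, which shows how much probability it moves on small corpora. Interpolation always blends the two orders, whereas a backoff scheme would consult the unigram estimate only when the bigram count is zero.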
55.​Define perplexity.​
a) Random guessing​
b) Measure of how well a model predicts test data​
c) Grammar rule​
d) Probability of sentence length​

56.​Identify relation between perplexity and entropy.​


a) Perplexity = 2^(Entropy)​
b) Entropy = Perplexity²​
c) Both are unrelated​
d) Perplexity = Entropy/2​

57. Recall what lower perplexity means.


a) Worse model​
b) Better predictive model​
c) Random model​
d) No effect​

58.​Explain entropy in NLP.​


a) Word embeddings​
b) Average information content per word​
c) Tokenization​
d) Syntax rule​

59.​Distinguish perplexity from accuracy.​


a) Accuracy is probabilistic, perplexity is binary​
b) Accuracy measures correctness, perplexity measures uncertainty​
c) Both same​
d) Perplexity uses F1-score​
60.​Describe why perplexity is exponential.​
a) To simplify​
b) Because it is derived from entropy measured in bits​
c) To normalize data​
d) To reduce vocabulary​
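
The perplexity questions above (55 to 60) rest on one relation: perplexity is 2 raised to the per-word cross-entropy in bits. A minimal sketch, assuming the model's probability for each test word is already available as a list:

```python
import math

def cross_entropy_bits(word_probs):
    """Average negative log2 probability per word."""
    return -sum(math.log2(p) for p in word_probs) / len(word_probs)

def perplexity(word_probs):
    """Perplexity = 2 ** (cross-entropy in bits)."""
    return 2 ** cross_entropy_bits(word_probs)

# A model that spreads probability uniformly over 4 choices at every step.
uniform_four = perplexity([0.25] * 4)

# A more confident model assigns higher probabilities to the same words.
confident = perplexity([0.5, 0.5, 0.25, 0.5])
```

`uniform_four` is 4.0, matching the intuition that perplexity is the effective number of equally likely choices per word. The more confident model scores lower, which is better.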

61.​Define morphology in NLP.​


a) Syntax analysis​
b) Study of word structure and formation​
c) Sentence meaning​
d) Pragmatics​

62.​Identify the smallest unit of meaning.​


a) Phoneme​
b) Morpheme​
c) Grapheme​
d) Token​

63.​Classify “unhappiness” into morphemes.​


a) un + happy + ness​
b) unhappy + ness​
c) un + happiness​
d) happiness​

64.​Distinguish inflectional morphemes from derivational.​


a) Both change meaning​
b) Inflection changes tense/number; derivation changes category/meaning​
c) Derivational changes tense only​
d) Inflectional creates new words​

65.​Describe the type of morphology in English.​


a) Agglutinative​
b) Inflectional​
c) Polysynthetic​
d) Isolating​

66.​Recall example of an inflectional suffix.​


a) un-​
b) -ed​
c) re-​
d) mis-
67.​Identify the word class of “quickly”.​
a) Adjective​
b) Adverb​
c) Noun​
d) Pronoun​

68.​Define open word classes.​


a) Closed set of function words​
b) Classes that accept new members (nouns, verbs, adjectives, adverbs)​
c) Classes that never change​
d) Prepositions only​

69.​Recall which is a closed class.​


a) Verb​
b) Preposition​
c) Adjective​
d) Adverb​

70.​Classify “the” in word class.​


a) Verb​
b) Determiner​
c) Adjective​
d) Pronoun​

71.​Distinguish noun vs pronoun.​


a) Both are identical​
b) Noun names things; pronoun replaces noun​
c) Pronoun is descriptive​
d) Noun is functional​

72.​Describe interjections.​
a) Complex phrases​
b) Exclamatory expressions (Oh!, Wow!)​
c) Helping verbs​
d) Closed class
73.​Define POS tagging.​
a) Tokenizing text​
b) Assigning word classes to tokens​
c) Removing stopwords​
d) Normalizing text​

74.​Identify the POS tag for “run” in “I will run fast”.​


a) Noun​
b) Verb​
c) Adjective​
d) Adverb​
75.​Describe rule-based POS tagging.​
a) Uses probabilities​
b) Uses handcrafted grammar rules​
c) Uses embeddings​
d) Uses CRFs​

76.​Recall the Penn Treebank tag for plural noun.​


a) NN​
b) NNS​
c) VB​
d) JJ​

77.​Distinguish supervised from unsupervised tagging.​


a) Both require labeled data​
b) Supervised uses labeled corpora; unsupervised uses clustering​
c) Unsupervised is faster always​
d) Both use rules only​

78.​Examine application of POS tagging.​


a) Speech synthesis​
b) Parsing and information extraction​
c) Image recognition​
d) Sorting words
79.​Define HMM.​
a) Statistical model with hidden states and observed outputs​
b) Rule-based grammar model​
c) Embedding model​
d) Parsing algorithm​

80.​Identify hidden states in POS tagging.​


a) Words​
b) POS tags​
c) Sentences​
d) Morphemes​

81.​Recall observable sequence in HMM tagging.​


a) Words in a sentence​
b) POS tags​
c) Morphemes​
d) Syntax tree​

82.​Describe transition probabilities.​


a) Probability of tag given previous tag​
b) Probability of word given tag​
c) Probability of morpheme​
d) Probability of sentence length​

83.​Distinguish emission vs transition.​


a) Both same​
b) Emission: word given tag; Transition: tag given previous tag​
c) Transition is word-based​
d) Emission ignores probabilities​

84.​Explain limitation of HMM in tagging.​


a) Always accurate​
b) Cannot handle long dependencies well​
c) Ignores syntax​
d) Uses neural networks​

85.​Define Viterbi algorithm.​


a) Sorting method​
b) Dynamic programming algorithm for most probable sequence​
c) Neural embedding method​
d) Parsing algorithm​

86.​Identify what Viterbi computes in POS tagging.​


a) Lexicon​
b) Best sequence of tags​
c) Syntax tree​
d) Lemmas​

87.​Recall Viterbi initialization step.​


a) Probability = 1 for all tags​
b) Start probabilities assigned to first word​
c) Transition matrix only​
d) Zero for all​

88.​Distinguish forward vs Viterbi algorithm.​


a) Both same​
b) Forward sums probabilities; Viterbi chooses maximum​
c) Forward ignores states​
d) Viterbi ignores probabilities​

89.​Describe backtracking in Viterbi.​


a) Recovering best tag sequence​
b) Building syntax tree​
c) Tokenizing​
d) Counting words​

90.​Examine time complexity of Viterbi.​


a) O(n)​
b) O(n × T²) (n = words, T = tags)​
c) O(T^n)​
d) O(1)
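
The HMM and Viterbi questions above (79 to 90) can be traced through a compact sketch. The two-tag model and all probabilities below are invented for illustration, and plain probabilities are used instead of log probabilities to keep the example short.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for words under an HMM."""
    # Initialization: start probability times emission of the first word.
    best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    # Recursion: n words times T tags times T previous tags, i.e. O(n * T^2).
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            prev_tag = max(tags, key=lambda p: best[i - 1][p] * trans_p[p][tag])
            score = best[i - 1][prev_tag] * trans_p[prev_tag][tag]
            best[i][tag] = score * emit_p[tag].get(words[i], 0.0)
            back[i][tag] = prev_tag
    # Backtracking: follow the stored pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model (all numbers are assumptions).
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.5, "cats": 0.4, "bark": 0.1},
          "VERB": {"bark": 0.6, "run": 0.4}}

best_tags = viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p)
```

For this toy model `best_tags` is `["NOUN", "VERB"]`. Replacing each `max` over previous tags with a sum would give the forward algorithm, which computes the total probability of the sentence rather than the single best path.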
91.​Define Named Entity Recognition (NER).​
a) Identifying and classifying named entities such as persons, locations, and organizations
b) Tokenization​
c) POS tagging​
d) Parsing​

92. Identify the entities in “Google was founded in California”.


a) Founded​
b) Google = Organization, California = Location​
c) Organization only​
d) Action word​

93.​Recall common NER categories.​


a) Pronoun, Verb, Adjective​
b) Person, Location, Organization, Date​
c) Root, Stem, Affix​
d) Syntax, Pragmatics​

94.​Distinguish NER from POS tagging.​


a) Both same​
b) NER detects named entities; POS tags word classes​
c) POS is for parsing​
d) NER ignores text​

95.​Describe BIO tagging scheme.​


a) Bigram model​
b) Begin-Inside-Outside notation for entities​
c) Binary index operator​
d) Bag-of-words​

96.​Explain application of NER.​


a) Information extraction in text (e.g., news, resumes)​
b) Syntax analysis​
c) Tokenization​
d) Lowercasing
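
The BIO scheme from question 95 can be shown by decoding a tagged sentence into entity spans. The tag names (`ORG`, `LOC`, `PER`) and the handling of a stray `I-` tag (treated as outside an entity) are assumptions of this sketch, not part of any particular standard.

```python
def bio_to_entities(tokens, tags):
    """Group B-/I- tagged tokens into (text, type) entity pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Close the entity being built, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:  # "O", or an I- tag that does not continue the current entity
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities

tokens = "Google was founded in California".split()
tags = ["B-ORG", "O", "O", "O", "B-LOC"]
found = bio_to_entities(tokens, tags)
```

Here `found` is `[("Google", "ORG"), ("California", "LOC")]`. Multi-token names use one `B-` tag followed by `I-` tags, for example `B-LOC I-LOC` for "New York".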
97.​Define CRFs.​
a) Neural networks​
b) Probabilistic sequence models discriminatively trained​
c) Rule-based grammar​
d) Embedding models​

98.​Identify difference between HMM and CRF.​


a) Both are generative​
b) HMM is generative; CRF is discriminative​
c) Both discriminative​
d) HMM ignores probabilities​

99.​Recall why CRFs are better for NER.​


a) Faster​
b) They capture overlapping, global features​
c) Use fewer labels​
d) No probabilities needed​

100.​ Describe feature function in CRF.​


a) Maps input sequence and label sequence to real values​
b) Tokenizes text​
c) Embeds words​
d) Parses grammar​

101. Describe the linear-chain CRF.


a) Specialized for sequential data like text​
b) Ignores sequence order​
c) Used for parsing trees​
d) Random clustering​

102.​ Explain training challenge of CRF.​


a) Easy optimization​
b) High computational cost​
c) Small data requirement​
d) No labeling needed
103.​ Define the standard metrics for NER evaluation.​
a) BLEU, Perplexity​
b) Precision, Recall, F1-score​
c) Accuracy only​
d) Word error rate​

104.​ Identify what precision measures.​


a) Correct entities out of predicted entities​
b) Correct entities out of total entities​
c) Predicted entities out of all tokens​
d) Errors in tagging​

105. Recall the formula for recall.


a) TP / (TP+FP)​
b) TP / (TP+FN)​
c) FP / (TP+FN)​
d) FN / (TP+FP)​

106.​ Distinguish micro vs macro evaluation.​


a) Micro averages over all instances; Macro averages over classes​
b) Both same​
c) Micro ignores recall​
d) Macro ignores precision​

107.​ Describe effect of high recall but low precision.​


a) Few entities detected​
b) Many false positives included​
c) Many entities missed​
d) Perfect accuracy​

108.​ Explain CoNLL evaluation metric.​


a) F1 score for entity-level evaluation​
b) Word error rate​
c) BLEU score​
d) Entropy
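
The evaluation questions above (103 to 108) can be worked through with an entity-level sketch. It assumes each entity is represented as a `(start, end, type)` tuple and counts a prediction as correct only on an exact match; the gold and predicted sets are made up.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level scores from exact matches between two sets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # predicted and correct
    fp = len(predicted - gold)   # predicted but wrong
    fn = len(gold - predicted)   # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = {(0, 1, "ORG"), (4, 5, "LOC")}
# An over-eager tagger: finds both gold entities plus two spurious ones.
predicted = {(0, 1, "ORG"), (4, 5, "LOC"), (2, 3, "PER"), (3, 4, "LOC")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

This tagger reaches recall 1.0 but precision 0.5, the "high recall, low precision" case from question 107: nothing is missed, but half of the predictions are false positives.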