Vin AI

Uploaded by

Công Nguyễn


Given a grayscale image X of size 4x4 with intensity levels in the range [0, 15], this image is passed through a point processing function f(X) defined as $f(X) = a \cdot \log_2(1 + X) + b$, where a and b are two constants. Assume that some of the pixel values of the input and output are given as in the following figure:
[Figure: 4x4 input and output pixel grids containing the unknowns c and d; values not recoverable]
What are the values of c and d in the above figure?
a. c = 3, d = 7

Consider applying the attention mechanism to an encoder-decoder seq2seq architecture. Which of the following are applicable?
a. In the global attention mechanism the context is derived as the weighted sum of all input hidden states of the encoder.
b. Global attention uses all input hidden states of the encoder to derive a single context c which is then applied repeatedly to generate the decoded sequence.
c. In the local attention mechanism, the context is derived as the weighted sum of half of the input hidden states of the encoder.
d. Let $h_t$ and $h_s$ be an output hidden state at position t and an input hidden state at position s respectively; one way to measure the score between them to build the attention score is their dot product.

Given an RGBA image of size 200x200, compute its size, assuming no compression.
a. 120 kb
b. 80 kb
c. 160 kb
d. 200 kb

Consider an RNN language model for text modelling, with further reference to the architecture below, for the sequence of words denoted by $\{w_1, w_2, \ldots, w_n\}$. Which of the following statements are true?
a. One can train this RNN language model by maximizing the likelihood $p(w_1)\prod_{i} p(w_i \mid w_{1:i-1})$.
b. The task of the RNN language model is to model the conditional word in a sentence given the sequence of previous words.
c. This architecture is an example of a many-to-one seq2seq model.
d. The language model property implies $p(w_i \mid w_{1:i-1}) = p(w_i \mid h)$ [subscript on $h$ illegible in the source].

Suppose we have a collection of 3 documents: document 1 says "lions eat fat cats"; document 2 says "cats eat fat mice"; document 3 says "mice eat fat cheese". Compute the cosine similarity of document 1 and document 2 with equal TF weighting:
a. 0.25
b. 0.5
c. 1.0
d. 0.75
e. 0.0

Given the function $f(x) = \frac{1}{n}\sum_{i=1}^{n}(x - i)^2$ [normalizing constant and upper limit illegible in the source; written here as $n$]. We need to solve $\min_x f(x)$ using stochastic gradient descent with learning rate $\eta = 0.1$. Assume that at iteration $t$ we have $x_t = 10$ and we sample a batch $i_1 = 1$, $i_2 = 2$, $i_3 = 3$, $i_4 = 4$ of indices. What is the value of $x_{t+1}$ at the next iteration?
a. $x_{t+1} = 8.5$
b. $x_{t+1} = 1000$
c. $x_{t+1} = 8$
d. $x_{t+1} = 9$

What is the main reason for the non-robustness of the Transformer?
a. The masking is not robust against outliers in data
b. The positional encoding is not robust against outliers in data
c. The multi-head attention is not robust against outliers in data
d. The attention matrix in each layer is not robust against outliers in data

Which of the following actions can be taken to avoid overfitting?
a. Increasing the complexity of the model
b. Using dropout layers
c. Collecting more data
d. Increasing the learning rate
e. Decreasing the learning rate
f. Applying regulariser terms

Given a 2D convolution layer with kernel size 11, dilation 1, and stride 1. Which padding should we use for that layer so that the output tensor has the same spatial resolution as the input?
a. 0
b. 3
c. 1
d. [illegible in the source]

Given n labelled data samples $(x_i, y_i) \in \mathbb{R}^d \times \mathbb{R}$, $i = 1, \ldots, n$, we are interested in solving the linear regression problem $\min_\beta \sum_{i=1}^{n} (x_i^\top \beta - y_i)^2 + \lambda \|\beta\|_p$. If we want to recover a solution $\beta^*$ that is sparse, what is the recommended value for p?
Select one:
a. p = 1 or p = ∞
b. p = 1 or p = 2
c. None of these.
d. p = 0 or p = 1

Which loss function is most sensitive to outliers?
Select one:
a. The Huber loss.
b. The 0-1 loss.
c. The square error loss. [selected]
d. The absolute error loss.

Neural networks
Select one:
a. can be used for regression as well as classification. [selected]
b. optimize a convex cost function.
c. always output values between 0 and 1.
d. always use a differentiable loss function.

Let A and B be two 2 x 2 matrices such that the product matrix AB is equal to [matrix illegible in the source]. What is the trace of BABA?
a. 5
b. 6
c. None of these answers.
d. 4

Let X be a random variable taking only positive values with mean 1. Which of the following cannot be true?
a. P(X > 2) = 0.7
b. P(X > 2) = 0.4
c. P(X > 2) = 0.6
d. P(X > 2) = 0.5

What is the value of $\cos(\alpha)$, where $\alpha$ is the angle between the two planes $x + y + 2z = 1$ and $x - y - 2z = 0$?
[answer options not recoverable]

What is the rank of the linear map $f : \mathbb{R}^3 \to \mathbb{R}^3$ where $f(x, y, z) = (x + y + z,\; x - y - 2z,\; 3x + y + z)$?

What is the anti-derivative of the function $f(x) = x + 2x\ln(x)$?
a. $x + \ln(x)$
b. $x^2\ln(x) - x$
c. $x^2\ln(x)$

What is the value of the sum [series illegible in the source] for $|x| > 1$?
[answer options not recoverable]

What is the use of a matrix in linear algebra?
a. All of the mentioned.
b. Store data
c. Store coordinates of a linear map.

A fair coin is tossed four times. What is the probability that at least two consecutive Heads appear?
[answer options not recoverable]

Which of the following is true for reinforcement learning with human feedback (RLHF) to train large language models?
a. Human feedback for RLHF requires an iterative process for model training and human evaluation
b. Human feedback for RLHF is based on human annotation of data for the instruction classification task
c. RLHF allows multiple people to provide feedback for large language models
d. RLHF utilizes human feedback during the reinforcement learning process
e. RLHF avoids human-designed reward functions

What is a true statement for projective dependency parsing?
a. Multiple stacks are used for arc-standard parsing
b. Graph-based parsing cannot be used to produce projective dependency trees
c. Dependency arcs can be projected into a tree shape when words are put in their linear order
d. Parsing trees are obtained by adding a swap-transition operation to transition-based parsing
e. Organizing all arcs about the words in their linear order, dependency arcs are not crossed

Which of the following is NOT true for in-context learning with large language models?
a. In-context learning does not require the models to be trained with demonstrations
b. The number of demonstrations in in-context learning might be limited by the context window
c. In-context learning will mainly emerge when the models are large enough
d. In-context learning requires some demonstrations as the input

The words room and house are in a lexical semantic relation, in which room is the (1) and house is the (2).
a. (1) meronym (2) holonym
b. (1) hypernym (2) hyponym
c. (1) holonym (2) meronym
d. (1) hyponym (2) hypernym

What is the limitation of Graph Convolutional Networks (GCN)?
a. The training of GCNs is isomorphism that cannot learn to classify complex structures.
b. They cannot learn to distinguish certain simple graph structures.
c. They cannot be applied to text vision as there is no graph there.
d. They might need many layers to work well.

Which of the following statements is true when you use 1x1 convolutions in a CNN?
a. It can be used for feature pooling
b. It can help in dimensionality reduction
c. It suffers less overfitting due to small kernel size
d. All of the above
e. None of the above

In the Generative Pretrained Transformer model, how is the standard multi-head attention layer modified?
a. The dimensions of the hidden vectors are adaptively increased
b. It must be masked appropriately
c. More attention heads are introduced
d. The normalization operation is appended

With rare exception, the verb die appears without a complement. This is an example of a ___ constraint.
a. count noun
b. subcategorization
c. selection
d. animacy

Select the correct term for the following blank: ___ tagger uses probabilistic and statistical information to assign tags to words
a. Stochastic
b. Rule-based
c. Statistical
d. POS

Is the statement "Hierarchical Softmax increases the computation complexity when compared to Softmax" true?
a. It depends on the application
b. Not enough information to determine
c. False
d. True

In a Hidden Markov model for part-of-speech tagging, observation likelihoods measure
a. The likelihood of a word given a POS tag
b. The likelihood of a POS tag given a word
c. The likelihood of a POS tag given two preceding tags
d. The likelihood of a POS tag given the preceding tag

Suppose we wanted to extract reports of people visiting foreign countries, such as "Fred occasionally visited Morocco." What is the lexicalized dependency path (LDP) for this relation?
a. Fred -> visited -> Morocco
b. Fred <- visited -> Morocco
c. Fred -> occasionally <- visited -> Morocco
d. Fred <- visited <- Morocco
e. Fred <- occasionally <- visited -> Morocco

The GloVe word embeddings
a. cannot be applied to the languages that cannot be tokenized by spaces
b. predicts the current word based on the context words
c. predicts the context words based on the current word
d. None of the other three statements

Which of the following steps should be done to freeze a network for inference only?
a. Put the inference code in 'with torch.inference_mode():'
b. Set: 'net.eval()'
c. Set: 'net.train()'
d. Set: 'for p in net.parameters(): p.requires_grad = False'
e. Put the inference code in 'with torch.no_grad():'
f. Set: 'for p in net.parameters(): p.requires_grad = True'

Each image is represented by a color matrix whose elements are normalized to float numbers in [0, 1]. For an input image A, gamma correction transforms it to a new image $A' = A^{\gamma}$, where gamma is a hyper-parameter. Which value of gamma is suitable to get the enhancement result below?
[Figure: input image and enhanced output image; not recoverable]
a. [illegible in the source]
b. [illegible in the source]
c. 0.5
d. 0.25

In stereo matching, what is the biggest advantage of image rectification?
a. All epipolar lines are perfectly vertical
b. Images are scaled to a desirable size.
c. Epipoles are moved to the center of the image.
d. All epipolar lines intersect at the vanishing point
e. All epipolar lines are perfectly horizontal

What are the values for a and b so that the homogeneous 3-vectors (1, a, b) ~ (-2, 4, -6), where ~ represents the equivalence relationship?
a. a = -2, b = 3
b. a = -2, b = -3
c. a = 2, b = 3
d. a = 2, b = -3

In computer vision, there are many data modalities that can be represented as a sequence, such as videos. What are the common ways to process these kinds of data?
a. Vision Transformer
b. Multilayer perceptron (MLP)
c. Convolutional neural networks
d. Recurrent neural networks

In machine learning, regularization discourages learning a more complex or flexible model to prevent overfitting. Which of the following statements about the regularization parameter $\lambda$ is not correct?
Select one:
a. Using too small a value of $\lambda$ can cause your hypothesis to underfit the data.
b. None of these.
c. Using too large a value of $\lambda$ can cause your hypothesis to overfit the data.
d. Using a very large value of $\lambda$ cannot hurt the performance of your hypothesis. [selected]

Assume that we would like to minimize the function f, where f is a convex and smooth function. Then, the convergence rate of Nesterov's accelerated method is:
A. Sublinear convergence $1/t$.
B. Linear convergence
C. Sublinear convergence $1/t^2$.
D. Superlinear convergence

After training a linear regression model by minimizing the empirical loss of a training data set, we get a function $f^*(x)$ that has zero empirical loss. Which of the following statements may be WRONG?
Select one or more:
a. $f^*(x)$ may have a good generalization [selected]
b. $f^*(x)$ generalizes the best among all linear functions
c. $f^*(x)$ may be overfitting [selected]
d. $f^*(x)$ may be underfitting

[...] the linear regression problem [objective illegible in the source]. If $\beta^*$ is the solution of the above regression problem, then which first-order optimality condition should $\beta^*$ satisfy?
Select one:
[answer options not recoverable]

Consider the optimization problem to train a feed-forward NN: $\min_\theta J(\theta) = \Omega(\theta) + \frac{1}{N}\sum_{i=1}^{N} CE(y_i, f(x_i; \theta))$, where $\theta$ collects the weights and biases $(W^k, b^k)$ of all layers, $f(x_i; \theta)$ returns the prediction probabilities for $x_i$ with ground-truth label $y_i$, $\Omega(\cdot)$ is the regularization term, and CE is the cross entropy loss function. Choose all correct answers.
a. $\frac{1}{N}\sum_{i=1}^{N} CE(y_i, f(x_i; \theta))$ is known as a regularization term.
b. $\frac{1}{N}\sum_{i=1}^{N} CE(y_i, f(x_i; \theta))$ is known as the empirical loss.
c. Minimizing $\frac{1}{N}\sum_{i=1}^{N} CE(y_i, f(x_i; \theta))$ makes the model fitter to the testing set.
d. Minimizing $\frac{1}{N}\sum_{i=1}^{N} CE(y_i, f(x_i; \theta))$ can lead to overfitting.
e. Minimizing $\frac{1}{N}\sum_{i=1}^{N} CE(y_i, f(x_i; \theta))$ makes the model fitter to the training dataset.

For the K-means clustering problem, we have i.i.d. data $X_1, X_2, \ldots, X_n$, and we would like to partition the data into K clusters based on their similarity. The quality of the clusters mainly depends on:
A. Data generating distribution
B. The dimension of the data
C. The number of samples
D. The separation of the data

Which of the following are true about generative models?
Select one or more:
a. Support Vector Machine is a generative model.
b. They can be used for classification. [selected]
c. Linear discriminant analysis is a generative model. [selected]
d. The perceptron is a generative model.

When performing regression or classification, which of the following is the correct way to preprocess the data?
Select one:
a. PCA → normalize PCA output → training.
b. Normalize the data → PCA → training.
c. Normalize the data → training → PCA → evaluation of performance score.
d. Normalize the data → PCA → normalize PCA output → training.

A stick is broken in two at random. What is the average ratio of the smaller length to the larger length?
a. $2\log 2 - 1$
b. [fraction illegible in the source]
c. $2\log 3 - 2$
d. $\frac{1}{3}$

Given that f is a function of (x, y) from $\mathbb{R}^2$ to $\mathbb{R}$, which of the following could be the Hessian matrix of f?
a. [matrix illegible in the source]
b. [matrix illegible in the source]
c. None of these.
d. [matrix illegible in the source]

Let A and B be two events such that P(A) = P(B) = 0.8. Which of the following could be the value of P(B|A)?
a. 0.8 [selected]
b. 0.65
c. 0.5

Let f be a function from $\mathbb{R}^2$ to $\mathbb{R}$ with the property $\min_x \max_y f(x, y) = \max_y \min_x f(x, y)$ at $(x_0, y_0)$. Which of the following could be the Hessian matrix of f at $(x_0, y_0)$?
[answer options are 2 x 2 matrices; entries not recoverable]

There are m unit vectors in n-dimensional space such that the angles between any two of them are the same. What is the maximal value of m?
a. n [selected]
b. n + 1
c. n + 2
d. None of these choices.

Let X, Y be independent random variables with uniform distribution on (0, 1). Find E[(X - Y)...] [expression partly illegible in the source].
[answer options not recoverable, apart from "None of these."]

Which of the following could deduce that the function f is convex?
a. $t f(x) + (1 - t) f(y) \ge f(tx + (1 - t)y) \quad \forall x, y \in \mathbb{R};\ t \in [0, 1]$.
b. $f(x) \ge x^2 \quad \forall x \in \mathbb{R}$
c. The second derivative of f is always positive.
d. $f(x) \ge x^3 \quad \forall x \in \mathbb{R}$

Which of the following could be described by a Normal distribution?
a. Weights of the population.
b. The number of people queuing to buy a football match's ticket.
c. Blood pressure.
d. The number of phone calls in one hour.
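
Several of the numeric questions above can be checked with a few lines of Python. The sketch below is illustrative only and rests on stated assumptions: the helper names are hypothetical, the RGBA image is assumed to use 8 bits per channel, the TF weighting is taken as raw term counts, the SGD step is assumed to average the gradient of $(x - i)^2$ over the sampled indices (the normalization in the source is garbled), and the padding helper assumes an odd kernel with stride 1.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents using raw term counts as weights."""
    tf_a, tf_b = Counter(doc_a.split()), Counter(doc_b.split())
    dot = sum(tf_a[word] * tf_b[word] for word in tf_a)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b)


def sgd_step(x: float, batch: list, lr: float) -> float:
    """One mini-batch SGD step for per-sample loss (x - i)^2.

    Assumption: the stochastic gradient is the average of 2 * (x - i)
    over the sampled indices.
    """
    grad = sum(2 * (x - i) for i in batch) / len(batch)
    return x - lr * grad


def raw_image_bytes(width: int, height: int, channels: int,
                    bytes_per_channel: int = 1) -> int:
    """Uncompressed image size in bytes (assumes 8-bit channels by default)."""
    return width * height * channels * bytes_per_channel


def same_padding(kernel_size: int, dilation: int = 1) -> int:
    """Padding that preserves spatial size for stride 1 and an odd kernel."""
    return dilation * (kernel_size - 1) // 2


if __name__ == "__main__":
    print(cosine_similarity("lions eat fat cats", "cats eat fat mice"))
    print(sgd_step(10.0, [1, 2, 3, 4], lr=0.1))
    print(raw_image_bytes(200, 200, channels=4))
    print(same_padding(11))
```

Under these assumptions the two documents share three of four equally weighted terms, the SGD update moves $x_t = 10$ by $0.1 \times 15$, and a 200x200 RGBA image occupies 200 x 200 x 4 bytes.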
