
Course UGRD-CYBS6101-2323T, Midterm Examination: Midterm Exam

Started on Thursday, 7 March 2024, 12:49 PM


State Finished
Completed on Thursday, 7 March 2024, 2:10 PM
Time taken 1 hour 20 mins
Marks 36.00/50.00
Grade 72.00 out of 100.00

Question 1

Correct

Mark 1.00 out of 1.00

What is the Kullback-Leibler (KL) distance used for?

Select one:
a. To measure the dissimilarity between two probability distributions

b. To measure the uncertainty of a probability distribution

c. To measure the predictability of a probability distribution


d. To measure the similarity between two probability distributions

Question 2

Correct

Mark 1.00 out of 1.00

What is the "E" step in the EM algorithm?

Select one:
a. The step where the expectation of the latent variables is calculated

b. The step where the model parameters are updated


c. The step where the likelihood of the model is maximized

d. The step where the prediction accuracy of the model is calculated


Question 3

Correct

Mark 1.00 out of 1.00

What is the process of evaluating the performance of a trained perceptron on unseen data called?

Select one:
a. Pruning

b. Validation
c. Testing

d. Training
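The following is a minimal Python sketch of the train-then-test workflow behind Questions 3, 14, and 32. It assumes the classic perceptron error-correction rule (error = desired output minus actual output) and a made-up, linearly separable toy dataset. The data, learning rate, and function names are illustrative assumptions, not part of the course material.

```python
# Hypothetical toy data: label 1 when x1 + x2 > 1.5, else 0 (linearly separable).
train = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 1.0), 0), ((0.5, 0.5), 0),
         ((1.0, 1.0), 1), ((2.0, 0.5), 1), ((0.5, 2.0), 1), ((1.5, 1.5), 1)]
# Unseen points kept aside for testing; they never update the weights.
test = [((0.25, 0.25), 0), ((0.5, 0.0), 0), ((1.25, 1.25), 1), ((1.5, 1.0), 1)]

def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(data, lr=1.0, epochs=1000):
    """Training: adjust weights using the error between desired and actual output."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in data:
            error = target - predict(weights, bias, x)  # desired minus actual
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias

def accuracy(weights, bias, data):
    """Testing: fraction of examples whose prediction matches the label."""
    correct = sum(predict(weights, bias, x) == t for x, t in data)
    return correct / len(data)

w, b = train_perceptron(train)
print("train accuracy:", accuracy(w, b, train))
print("test accuracy:", accuracy(w, b, test))
```

Because the training data are separable, the loop stops after a mistake-free epoch; the held-out points are only scored, which is what distinguishes testing from training.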

Question 4

Correct

Mark 1.00 out of 1.00

The KL distance is always non-negative and is equal to zero only when the two probability distributions are:

Select one:
a. Identically distributed

b. Mutually exclusive
c. Independently distributed

d. Uniformly distributed

Question 5

Incorrect

Mark 0.00 out of 1.00

What is the process of using data mining techniques to identify trends and make predictions called?

Select one:
a. Data mining 
b. Data modeling

c. Data visualization

d. Data analysis

Question 6

Correct

Mark 1.00 out of 1.00

What is a batch learning algorithm?

Select one:
a. An algorithm that processes the training data one example at a time

b. An algorithm that processes the training data in small groups or batches


c. An algorithm that processes the training data in real-time

d. An algorithm that processes all of the training data at once
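To contrast batch and online learning, here is a hedged Python sketch that fits y = w * x by gradient descent on squared error. The data and learning rate are hypothetical. The batch version makes one update per pass using the gradient averaged over the whole training set, while the online version updates after every single example.

```python
def batch_gd(xs, ys, lr=0.05, epochs=200):
    """Batch learning: one update per pass, from the gradient over ALL examples."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def online_sgd(xs, ys, lr=0.05, epochs=200):
    """Online learning: update after EACH example, as if data arrived one at a time."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w -= lr * 2 * (w * x - y) * x
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]  # hypothetical noise-free data with true slope 2

print(batch_gd(xs, ys), online_sgd(xs, ys))
```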

Question 7

Incorrect

Mark 0.00 out of 1.00

How can users access the KNIME Marketplace?

Select one:
a. All of the above

b. From the KNIME interface 


c. From the KNIME website

d. From the KNIME forum

Question 8

Correct

Mark 1.00 out of 1.00

Which of the following is NOT a limitation of the k-means algorithm?

Select one:
a. It requires the user to specify the number of clusters in advance
b. It is sensitive to the initial placement of centroids

c. It may produce suboptimal results if the clusters are not spherical

d. It is not affected by the scale of the variables


Question 9

Correct

Mark 1.00 out of 1.00

Can the least squares method be used for nonlinear data sets?

Select one:
a. It depends on the data set

b. Yes
c. It depends on the method used to transform the data set

d. No

Question 10

Incorrect

Mark 0.00 out of 1.00

The ______________ linkage criterion is a popular choice for hierarchical clustering; it defines the distance between two clusters as the maximum distance between any pair of their points.

Select one:
a. Single

b. Complete 

c. Average
d. Centroid
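The linkage criteria in Questions 10, 30, 33, and 44 differ only in how the distance between two clusters is computed. A minimal Python sketch follows, assuming Euclidean distance and small made-up clusters. In agglomerative clustering, whichever criterion is chosen, the pair of clusters with the smallest linkage distance is merged next.

```python
import math
from itertools import product

def single_linkage(a, b):
    """Minimum distance over all cross-cluster point pairs."""
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    """Maximum distance over all cross-cluster point pairs."""
    return max(math.dist(p, q) for p, q in product(a, b))

def average_linkage(a, b):
    """Mean distance over all cross-cluster point pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def centroid_linkage(a, b):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))

def closest_pair(clusters, linkage):
    """One agglomerative step: indices of the pair with the SMALLEST linkage distance."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))

# Hypothetical clusters.
A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
C = [(10, 10)]
for f in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(f.__name__, round(f(A, B), 3), closest_pair([A, B, C], f))
```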

Question 11

Correct

Mark 1.00 out of 1.00

The KL distance is also known as what other measure?

Select one:
a. Shannon entropy

b. Mutual information
c. Joint entropy

d. Cross-entropy

Question 12

Correct

Mark 1.00 out of 1.00

What is the main disadvantage of the Hebb rule?

Select one:
a. It is unable to handle nonlinear relationships

b. It is prone to overfitting
c. It is unable to handle large datasets

d. It is slow to converge

Question 13

Incorrect

Mark 0.00 out of 1.00

What is a node in a Bayesian network?

Select one:
a. All of the above

b. A probabilistic relationship between two variables


c. A point in the network where two or more edges meet

d. A variable in the system being modeled 

Question 14

Correct

Mark 1.00 out of 1.00

What is the process of adjusting the weights of a perceptron based on the error between its desired and actual outputs called?

Select one:
a. Testing
b. Pruning

c. Training

d. Validation

Question 15

Correct

Mark 1.00 out of 1.00

How does the k-means algorithm determine which data points belong to which cluster?

Select one:
a. By computing the distance between data points and the centroid of each cluster

b. By evaluating the variance of each cluster


c. By comparing the data point to the characteristics of each cluster

d. By evaluating the probability that a data point belongs to each cluster

Question 16

Correct

Mark 1.00 out of 1.00

The KL distance between two discrete probability distributions P and Q is defined as:

Select one:
a. The sum of the products of the probabilities of each event in P and Q

b. The sum of the differences between the probabilities of each event in P and Q
c. The sum of the ratio of the probabilities of each event in P and Q

d. The sum, over all events, of the probability of each event in P times the logarithm of the ratio of its probabilities in P and Q
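A small Python sketch of the discrete KL distance, D_KL(P || Q) = sum_i p_i log(p_i / q_i), measured in nats here (natural logarithm, an assumption; base 2 gives bits). The example distributions are made up. The sketch also checks the identity D_KL(P || Q) = H(P, Q) - H(P), which links KL to cross-entropy, and shows that the measure is not symmetric.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # P puts mass where Q has none
            total += pi * math.log(pi / qi)
    return total

def entropy(p):
    """Shannon entropy H(P) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in nats."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total -= pi * math.log(qi)
    return total

p = [0.5, 0.5]  # hypothetical distributions over two events
q = [0.9, 0.1]
print(kl_divergence(p, p))                      # zero for identical distributions
print(kl_divergence(p, q), kl_divergence(q, p))  # positive, and not symmetric
```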

Question 17

Correct

Mark 1.00 out of 1.00

What is the "M" step in the EM algorithm?

Select one:
a. The step where the expectation of the latent variables is calculated
b. The step where the likelihood of the model is maximized

c. The step where the model parameters are updated

d. The step where the prediction accuracy of the model is calculated
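To make the E and M steps concrete, here is a minimal sketch of EM for a one-dimensional mixture of two Gaussians in Python. The data, the initialization (means at the minimum and maximum), the iteration count, and the variance floor are all assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, iters=50):
    # Assumed initial guesses: means at the extremes, unit variances, equal weights.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    n = len(data)
    for _ in range(iters):
        # E step: expected component memberships (responsibilities) of each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: update parameters to maximize the expected log-likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsing variance
    return weights, means, variances

# Hypothetical data: two clumps near 0 and near 5.
data = [-0.2, 0.0, 0.1, 0.3, -0.1, 4.8, 5.1, 5.0, 4.9, 5.2]
print(em_gmm_1d(data))
```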


Question 18

Correct

Mark 1.00 out of 1.00

What is a parent node in a Bayesian network?

Select one:
a. None of the above

b. A node that has no parents or children in the network


c. A node that is a direct ancestor of another node in the network

d. A node that is a direct descendant of another node in the network
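A toy Bayesian network in Python, assuming the hypothetical structure Rain -> WetGrass <- Sprinkler with made-up conditional probability tables. WetGrass has two parent nodes; the joint probability is the product of each node's probability given its parents, and a query is answered by brute-force enumeration (practical only for tiny networks).

```python
from itertools import product

# Hypothetical structure: each node lists its parent nodes.
parents = {"Rain": [], "Sprinkler": [], "WetGrass": ["Rain", "Sprinkler"]}

# Made-up conditional probability tables: P(node = True | parent values).
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.4},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0},
}

def joint(assignment):
    """Chain rule for a Bayesian network: product of P(node | its parents)."""
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

def query(target, evidence):
    """P(target = True | evidence) by enumerating every full assignment."""
    nodes = list(parents)
    num = den = 0.0
    for values in product([True, False], repeat=len(nodes)):
        a = dict(zip(nodes, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(a)
        den += p
        if a[target]:
            num += p
    return num / den

print(query("Rain", {"WetGrass": True}))
```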

Question 19

Correct

Mark 1.00 out of 1.00

How is the final set of clusters determined in the k-means algorithm?

Select one:
a. By selecting the set of clusters that minimize the within-cluster variance

b. By selecting the set of clusters that maximize the sum of squared errors
c. By selecting the set of clusters that minimize the sum of squared errors

d. By selecting the set of clusters that maximize the within-cluster variance
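A minimal sketch of Lloyd's k-means loop in Python for 2-D points, assuming hand-picked initial centroids and a made-up dataset. Each iteration assigns every point to its nearest centroid, moves each centroid to the mean of its cluster, and stops when no centroid moves. The returned sum of squared errors (within-cluster variation) is the quantity the algorithm tries to reduce.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    # Sum of squared distances from each point to its cluster's centroid.
    sse = sum(math.dist(p, centroids[k]) ** 2 for k, c in enumerate(clusters) for p in c)
    return centroids, clusters, sse

# Hypothetical data with two obvious groups.
points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8, 9)]
print(kmeans(points, [(0, 0), (10, 10)]))
```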

Question 20

Correct

Mark 1.00 out of 1.00

How is the line of best fit calculated using the least squares method?

Select one:
a. By minimizing the sum of the squares of the errors between the data points and the line of best fit
b. By minimizing the mean of the data set

c. By minimizing the sum of the absolute values of the errors between the data points and the line of best fit

d. By minimizing the variance of the data set


Question 21

Incorrect

Mark 0.00 out of 1.00

What are some advantages of batch learning algorithms?

Select one:
a. They can learn from a limited amount of resources

b. They can learn from streaming data in real-time


c. They can learn from a small amount of data

d. They can learn from very large datasets 

Question 22

Incorrect

Mark 0.00 out of 1.00

What is the EM algorithm used for?

Select one:
a. Classification

b. All of the above


c. Regression

d. Clustering 

Question 23

Correct

Mark 1.00 out of 1.00

What is the process of applying machine learning algorithms to data called?

Select one:
a. Data analysis
b. Data modeling

c. Data visualization

d. Data mining

Question 24

Correct

Mark 1.00 out of 1.00

What is the role of the centroid in the k-means algorithm?

Select one:
a. It is a data point that is randomly chosen to be the initial center of a cluster

b. It is a data point that is representative of the cluster


c. It is a data point that is randomly chosen to be removed from the cluster

d. It is the center point of a cluster

Question 25

Correct

Mark 1.00 out of 1.00

How is the Hebb rule used in the training of a neural network?

Select one:
a. It is used to determine the input to the neural network

b. It is used to adjust the weights of the neural network based on the input and output
c. It is used to calculate the output of the neural network

d. It is used to determine the structure of the neural network
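A minimal sketch of the Hebb rule, delta_w_i = eta * x_i * y, in Python. It assumes a supervised setup in which the output y is clamped to the desired target during training, and it uses the bipolar AND function with a constant bias input; the patterns and the learning rate (eta = 1) are illustrative assumptions.

```python
def hebb_update(weights, x, y, lr=1.0):
    """Hebb rule: change each weight by lr * input_i * output."""
    return [w + lr * xi * y for w, xi in zip(weights, x)]

def sign(v):
    return 1 if v > 0 else -1

# Hypothetical bipolar AND patterns; the last input component is a constant bias of 1.
patterns = [((1, 1, 1), 1), ((1, -1, 1), -1), ((-1, 1, 1), -1), ((-1, -1, 1), -1)]

weights = [0.0, 0.0, 0.0]
for x, target in patterns:
    # During training the output is clamped to the desired target value.
    weights = hebb_update(weights, x, target)

predictions = [sign(sum(w * xi for w, xi in zip(weights, x))) for x, _ in patterns]
print(weights)      # [2.0, 2.0, -2.0]
print(predictions)  # [1, -1, -1, -1]
```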

Question 26

Incorrect

Mark 0.00 out of 1.00

The KL distance can be used to measure the information lost when approximating one distribution with another. In this context, the distribution being approximated is known as the:

Select one:
a. Target distribution

b. Approximation distribution
c. Reference distribution 

d. Base distribution

Question 27

Correct

Mark 1.00 out of 1.00

How is the slope of the line of best fit calculated using the least squares method?

Select one:
a. By dividing the sum of the y values by the sum of the squares of the x values

b. By dividing the sum of the product of the x values and the y values by the sum of the x values
c. By dividing the sum of the products of the deviations of the x and y values from their means by the sum of the squared deviations of the x values from their mean

d. By dividing the sum of the y values by the sum of the x values
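For reference, the least-squares slope is usually written b = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2), with intercept a = y_mean - b * x_mean. A short Python sketch on made-up data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """The quantity least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, sum_squared_errors(xs, ys, slope, intercept))
```

With these numbers the uncentered ratio sum(x * y) / sum(x^2) is 66 / 55 = 1.2, which would be the least-squares slope only if the line were forced through the origin; the centered formula gives 0.6.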

Question 28

Correct

Mark 1.00 out of 1.00

What is the EM algorithm used to estimate in the "E" step?

Select one:
a. The likelihood of the model

b. The latent variables


c. The model parameters

d. The prediction accuracy of the model

Question 29

Correct

Mark 1.00 out of 1.00

What is the advantage of using the Gaussian Naive Bayes classifier over other types of Naive Bayes classifiers?

Select one:
a. It is able to handle continuous features
b. It is more accurate

c. It is faster to train and predict

d. It is able to handle categorical features


Question 30

Incorrect

Mark 0.00 out of 1.00

The ______________ linkage criterion is a popular choice for hierarchical clustering; it defines the distance between two clusters as the minimum distance between any pair of their points.

Select one:
a. Centroid
b. Complete

c. Average
d. Single 

Question 31

Correct

Mark 1.00 out of 1.00

What is an example of a regression task in supervised learning?

Select one:
a. Grouping customers into different segments based on their spending habits

b. Predicting the price of a house based on its characteristics

c. Determining whether an email is spam or not


d. Predicting the stock price for the next day based on historical data

Question 32

Incorrect

Mark 0.00 out of 1.00

What is the process of calculating the error between the desired output and the actual output of a perceptron called?

Select one:
a. Pruning

b. Testing
c. Validation

d. Training 

Question 33

Correct

Mark 1.00 out of 1.00

In hierarchical clustering, the distance between clusters is typically measured using the ______________ criterion.

Select one:
a. Euclidean distance

b. Linkage criterion
c. Manhattan distance

d. Cosine similarity

Question 34

Correct

Mark 1.00 out of 1.00

What is the assumption made by the Naive Bayes classifier?

Select one:
a. That the features in the data are normally distributed

b. That the features in the data are uniformly distributed


c. That the features in the data are dependent on each other

d. That the features in the data are independent of each other

Question 35

Incorrect

Mark 0.00 out of 1.00

What is an example of a batch learning algorithm?

Select one:
a. Linear regression 
b. K-nearest neighbors

c. All of the above

d. Support vector machine


Question 36

Incorrect

Mark 0.00 out of 1.00

What is the process of transforming data into a consistent format called?

Select one:
a. Filtering

b. Sampling
c. Normalizing

d. Cleaning 

Question 37

Correct

Mark 1.00 out of 1.00

What is the least squares method used for?

Select one:
a. To solve systems of linear equations

b. To find the line of best fit for a set of data


c. To calculate the variance of a data set

d. To calculate the mean of a data set

Question 38

Correct

Mark 1.00 out of 1.00

How does the Naive Bayes classifier calculate the probability of a data point belonging to a particular class?

Select one:
a. By using the least squares method
b. By using the gradient descent algorithm

c. By using the Bayes theorem

d. By using the maximum likelihood estimation
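A minimal Gaussian Naive Bayes sketch in Python, assuming made-up two-feature data. Each class gets a prior plus a mean and variance per feature; prediction applies Bayes' theorem with the naive independence assumption by summing per-feature log-likelihoods. The small variance floor is an assumption to avoid division by zero.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: prior, per-feature means, per-feature variances."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_nb(model, features):
    # Bayes' theorem with independent features:
    # log P(c | x) = log P(c) + sum_i log P(x_i | c) + constant
    scores = {
        label: math.log(prior) + sum(log_gaussian(x, m, v)
                                     for x, m, v in zip(features, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical training data.
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, (1.1, 2.1)), predict_nb(model, (5.1, 5.9)))
```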


Question 39

Incorrect

Mark 0.00 out of 1.00

What is a Bayesian network used for?

Select one:
a. To optimize the use of resources

b. To model and predict the behavior of systems 


c. All of the above

d. To perform machine learning tasks

Question 40

Correct

Mark 1.00 out of 1.00

What is the process of identifying and correcting errors in data called?

Select one:
a. Sampling

b. Filtering
c. Normalizing

d. Cleaning

Question 41

Correct

Mark 1.00 out of 1.00

What is an example of a classification task in supervised learning?

Select one:
a. Grouping customers into different segments based on their spending habits
b. Determining whether an email is spam or not

c. Predicting the price of a house based on its characteristics

d. Predicting the stock price for the next day based on historical data

Question 42

Correct

Mark 1.00 out of 1.00

In hierarchical clustering, the final clusters are represented using a ______________ diagram.

Select one:
a. Dendrogram

b. Line graph
c. Bar chart

d. Scatter plot

Question 43

Incorrect

Mark 0.00 out of 1.00

What is the main advantage of using a directed acyclic graph (DAG) over other types of graphs?

Select one:
a. DAGs are more efficient for storing and processing data

b. DAGs can represent more complex relationships between data


c. DAGs are easier to understand and visualize 

d. All of the above

Question 44

Incorrect

Mark 0.00 out of 1.00

The ______________ linkage criterion is a popular choice for hierarchical clustering, which merges the two clusters based on the distance between their centroids.

Select one:
a. Complete

b. Single
c. Average 

d. Centroid

Question 45

Correct

Mark 1.00 out of 1.00

What is an example of a batch learning algorithm used for feature selection tasks?

Select one:
a. Recursive feature elimination

b. Variance threshold
c. Mutual information

d. All of the above

Question 46

Correct

Mark 1.00 out of 1.00

What is an example of a batch learning algorithm used for clustering tasks?

Select one:
a. K-means

b. All of the above


c. DBSCAN

d. Agglomerative clustering

Question 47

Correct

Mark 1.00 out of 1.00

What is an example of a batch learning algorithm used for classification tasks?

Select one:
a. Support vector machine
b. K-nearest neighbors

c. Decision tree

d. Linear regression

Question 48

Correct

Mark 1.00 out of 1.00

Which of the following is NOT a disadvantage of the k-means algorithm?

Select one:
a. It may produce suboptimal results if the clusters are not spherical

b. It can be computationally expensive for large datasets


c. It is sensitive to the initial placement of centroids

d. It can handle categorical variables

Question 49

Correct

Mark 1.00 out of 1.00

What is the minimum required Java version to run KNIME?

Select one:
a. Java 10

b. Java 8
c. Java 9

d. Java 7

Question 50

Correct

Mark 1.00 out of 1.00

How does supervised learning differ from unsupervised learning?

Select one:
a. Supervised learning involves predicting a value, while unsupervised learning involves clustering data
b. Supervised learning involves clustering data, while unsupervised learning involves predicting a value

c. Supervised learning involves labeled data, while unsupervised learning involves unlabeled data

d. Supervised learning involves predicting a continuous value, while unsupervised learning involves predicting a categorical value
