Question Bank With 2 Marks
Ramapuram Campus.
Department of Computer Applications
Year/Semester: III year / V semester
b. Support information processing by providing a solid platform of consolidated,
historical data for analysis.
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection
of data in support of management’s decision-making process.”
12. What is Temporal Database?
A temporal database stores time-related data. It usually stores relational data that include
time-related attributes. These attributes may involve several timestamps, each having different
semantics.
13. What are Time-Series databases?
A Time-Series database stores sequences of values that change with time, such as data
collected regarding the stock exchange.
29. List out some of the real time applications of data mining.
Ex: Humidity, Raining, Temperature
Ex: The temperature dropped 15 degrees and then it started raining
Ex: If the humidity is very high and the temperature drops substantially, then the atmosphere
is unlikely to hold the moisture, so it rains.
37. Write the preprocessing steps that may be applied to the data for classification and
prediction.
o Data Cleaning
o Relevance Analysis
o Data Transformation
Part – B
33. What is linear regression?
In linear regression, data are modeled using a straight line. Linear regression is the simplest form of
regression. Bivariate linear regression models a random variable Y, called the response variable, as a
linear function of another random variable X, called the predictor variable: Y = a + bX, where a is
the intercept and b is the slope of the line.
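The coefficients a and b in Y = a + bX are commonly estimated by the method of least squares. The following is a minimal sketch under that assumption; the function name fit_line and the sample values are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical sample that lies exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
predicted = a + b * 6  # predict the response for a new predictor value
```

The sketch assumes the predictor values are not all identical, otherwise the slope is undefined.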
34. State the types of generalized linear models and their use.
Generalized linear models represent the theoretical foundation on which linear regression can be
applied to the modeling of categorical response variables. The types of generalized linear models
are
(i) Logistic regression
(ii) Poisson regression
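Both model types pass a linear predictor a + bX through a link function. As a rough sketch, logistic regression maps it to a probability with the sigmoid function, and Poisson regression maps it to an expected count with the exponential function. The coefficient values below are hypothetical, not fitted from data.

```python
import math


def logistic_probability(a, b, x):
    """P(Y = 1 | x) under logistic regression: sigmoid of a + b*x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def poisson_mean(a, b, x):
    """Expected count under Poisson regression: exp of a + b*x."""
    return math.exp(a + b * x)


# Hypothetical coefficients for illustration only.
a, b = -3.0, 1.5
p_low = logistic_probability(a, b, 0.0)  # linear predictor is -3, so probability is small
p_mid = logistic_probability(a, b, 2.0)  # linear predictor is 0, so probability is 0.5
rate = poisson_mean(0.0, 0.5, 0.0)       # linear predictor is 0, so expected count is 1.0
```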
40. What is a “decision tree”?
It is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each
branch represents an outcome of the test, and leaf nodes represent classes or class distributions.
A decision tree is a predictive model. Each branch of the tree is a classification question, and the
leaves of the tree are partitions of the dataset with their classification.
43. How will you solve a classification problem using decision trees?
a. Decision tree induction:
Construct a decision tree using the training data.
b. For each ti ∈ D, apply the decision tree to determine its class.
ti - tuple
D - database
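Step b can be sketched as follows. The tree here is written by hand rather than induced from training data, and the attribute names, values, and class labels are hypothetical.

```python
# An internal node is a dict naming the attribute to test and one subtree
# per outcome; a leaf is simply a class label (a string).
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"fair": "buys", "excellent": "does_not_buy"},
        },
    },
}


def classify(node, t):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][t[node["attribute"]]]
    return node


# D: the database of tuples to classify (hypothetical).
D = [
    {"age": "youth", "student": "yes", "credit_rating": "fair"},
    {"age": "senior", "student": "no", "credit_rating": "excellent"},
    {"age": "middle_aged", "student": "no", "credit_rating": "fair"},
]
labels = [classify(tree, t) for t in D]
```

The sketch raises a KeyError for an attribute value the tree has no branch for; a fuller version would need a fallback such as the majority class.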
50. What is the classification of association rules based on various criteria?
1. Based on the types of values handled in the rule.
a. Boolean Association rule.
b. Quantitative Association rule.
2. Based on the dimensions of data involved in the rule.
a. Single Dimensional Association rule.
b. Multi Dimensional Association rule.
3. Based on the levels of abstractions involved in the rule.
a. Single level Association rule.
b. Multi level Association rule.
4. Based on various extensions to association mining.
a. Maxpatterns.
b. Frequent closed itemsets.
o Separate
o Available
o Integrated
o Subject Oriented
o Not Dynamic
o Consistency
o Iterative Development
o Aggregation Performance
• Data mining for the Retail industry
• Data mining for the Telecommunication industry
64. What is the difference between “supervised” and “unsupervised” learning schemes?
In data mining, when the class label of each training sample is provided during classification, the
training is called supervised learning, i.e., the learning of the model is supervised in that it is told to
which class each training sample belongs. E.g.: Classification. In unsupervised learning, the class
label of each training sample is not known, and the number or set of classes to be learned may not
be known in advance. E.g.: Clustering.
65. Discuss the importance of the similarity metric in clustering. Why is it difficult to handle
categorical data for clustering?
The process of grouping a set of physical or abstract objects into classes of similar objects is called
clustering. The similarity metric is important because it decides which objects are grouped together,
and it is also used for outlier detection. Main-memory-based clustering algorithms typically operate
on one of the following two data structures:
A) Data matrix
B) Dissimilarity matrix
Both rely on a computed dissimilarity between objects, and categorical values have no natural
numeric distance, so it is difficult to handle categorical data.
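The relationship between the two data structures can be sketched as below: a data matrix holds one row per object, and the dissimilarity matrix holds one distance per pair of objects. Euclidean distance is assumed for the numeric case. For categorical attributes the sketch uses the simple matching ratio (mismatched attributes divided by total attributes), which is one common choice and not the only one. All sample values are hypothetical.

```python
import math


def dissimilarity_matrix(data):
    """Build an n-by-n matrix of Euclidean distances from an n-by-p data matrix."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            d[i][j] = d[j][i] = dist  # the matrix is symmetric
    return d


def simple_matching(a, b):
    """Dissimilarity of two categorical objects: share of attributes that differ."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


# Data matrix: three objects described by two numeric attributes.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = dissimilarity_matrix(data)

# Two categorical objects that differ in one of two attributes.
cat_d = simple_matching(("red", "small"), ("red", "large"))
```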
66. Why do we need to prune a decision tree? Why should we use a separate pruning data set
instead of pruning the tree with the training database?
When a decision tree is built, many of the branches will reflect anomalies in the training
data due to noise or outliers. Tree pruning methods are needed to address this problem of
overfitting the data.
70. What do you mean by high performance data mining?
Data mining refers to extracting or mining knowledge. It involves an integration of
techniques from multiple disciplines like database technology, statistics, machine learning, neural
networks, etc. When it involves techniques from high performance computing, it is referred to as
high performance data mining.
73. Define OLTP.
If an on-line operational database system is used for efficient retrieval, efficient storage,
and management of large amounts of data, then the system is said to be an on-line transaction
processing (OLTP) system.
74. Define OLAP.
Data warehouse systems serve users or knowledge workers in the role of data analysis
and decision-making. Such systems can organize and present data in various formats. These
systems are known as on-line analytical processing (OLAP) systems.
88. Point out the major difference between the star schema and the snowflake schema?
The dimension table of the snowflake schema model may be kept in normalized form to
reduce redundancies. Such a table is easy to maintain and saves storage space.
89. Which is popular in the data warehouse design, star schema model (or) snowflake schema
model?
The star schema model is more popular, because the snowflake structure can reduce query
effectiveness: more joins are needed to execute a query.
If the attributes of a dimension form a concept hierarchy such as “street < city <
province_or_state < country”, then the hierarchy is said to be a total order.
Country
Province or state
City
Street
Fig: Total order for location
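A total order over the levels means any two levels can be compared, and a measure can be rolled up from a lower level to any higher one. The sketch below illustrates this; the sales records and the helper names is_lower_level and roll_up are hypothetical.

```python
# Levels of the location hierarchy, from lowest to highest.
LEVELS = ["street", "city", "province_or_state", "country"]


def is_lower_level(a, b):
    """In a total order every pair of levels is comparable by position."""
    return LEVELS.index(a) < LEVELS.index(b)


def roll_up(records, level):
    """Aggregate the measure at the requested level of the hierarchy."""
    totals = {}
    for location, amount in records:
        key = location[level]
        totals[key] = totals.get(key, 0) + amount
    return totals


# Hypothetical (location, sales amount) records.
sales = [
    ({"street": "Main St", "city": "Vancouver",
      "province_or_state": "British Columbia", "country": "Canada"}, 100),
    ({"street": "Oak St", "city": "Vancouver",
      "province_or_state": "British Columbia", "country": "Canada"}, 150),
    ({"street": "King St", "city": "Toronto",
      "province_or_state": "Ontario", "country": "Canada"}, 100),
    ({"street": "State St", "city": "Chicago",
      "province_or_state": "Illinois", "country": "USA"}, 50),
]

by_city = roll_up(sales, "city")
by_country = roll_up(sales, "country")
```

For simplicity the sketch groups by the level's name alone, so two different cities with the same name would be merged.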
101. List out the steps of the data warehouse design process.
1. Choose a business process to model.
2. Choose the grain of the business process.
3. Choose the dimensions that will apply to each fact table record.
4. Choose the measures that will populate each fact table record.
109. Applications of DBMiner.
The DBMiner system can be used as a general-purpose online analytical mining system for
both OLAP and data mining in relational databases and data warehouses. It is used in medium to
large relational databases with fast response time.
126. What are the backend tools and utilities of Data warehouse?
Data extraction:
Get data from multiple, heterogeneous, and external sources
Data cleaning:
Detect errors in the data and rectify them when possible
Data transformation:
Convert data from legacy or host format to warehouse format
Load:
Sort, summarize, consolidate, compute views, check integrity, and build indices and
partitions
Refresh:
Propagate the updates from the data sources to the warehouse
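The back-end steps can be sketched as a small pipeline. This is an illustrative toy, not a real warehouse tool: the sources are in-memory lists, the warehouse is a dict of summarized totals per city, and every name and value is hypothetical.

```python
def extract(sources):
    """Data extraction: gather rows from multiple sources."""
    return [row for source in sources for row in source]


def clean(rows):
    """Data cleaning: drop rows with a missing amount, normalize city names."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # an error that cannot be rectified, so the row is skipped
        cleaned.append({**row, "city": row["city"].strip().title()})
    return cleaned


def transform(rows):
    """Data transformation: convert source rows to the warehouse format."""
    return [(row["city"], float(row["amount"])) for row in rows]


def load(warehouse, rows):
    """Load: sort and summarize the rows into the warehouse."""
    for city, amount in sorted(rows):
        warehouse[city] = warehouse.get(city, 0.0) + amount
    return warehouse


# Two hypothetical heterogeneous sources with inconsistent formats.
source_a = [{"city": " chennai ", "amount": "120.5"}, {"city": "Delhi", "amount": None}]
source_b = [{"city": "CHENNAI", "amount": 30}, {"city": "mumbai", "amount": "10"}]

warehouse = load({}, transform(clean(extract([source_a, source_b]))))
initial = dict(warehouse)

# Refresh: propagate a later update from a source to the warehouse.
update = [{"city": "Mumbai", "amount": 5}]
load(warehouse, transform(clean(extract([update]))))
```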
Information processing
• supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts
and graphs
Analytical processing
• Multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models, performing classification and
prediction, and presenting the mining results using visualization tools.
a. Tree construction
o At start, all the training examples are at the root
o Partition examples recursively based on selected attributes
b. Tree pruning
o Identify and remove branches that reflect noise or outliers
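Tree construction can be sketched as a recursive partition of the training examples. The source does not say how attributes are selected; this sketch assumes information gain (lowest weighted entropy after the split), and the training data is hypothetical. Tree pruning is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_attribute(rows, attributes, target):
    """Pick the attribute whose split leaves the lowest weighted entropy."""
    def remainder(attr):
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return min(attributes, key=remainder)


def build_tree(rows, attributes, target):
    """At start all examples are at the root; partition them recursively."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    attr = best_attribute(rows, attributes, target)
    rest = [a for a in attributes if a != attr]
    branches = {}
    for value in sorted({r[attr] for r in rows}):
        subset = [r for r in rows if r[attr] == value]
        branches[value] = build_tree(subset, rest, target)
    return {"attribute": attr, "branches": branches}


# Hypothetical training examples.
training = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(training, ["outlook", "windy"], "play")
```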
133. List the five primitives for specification of a data mining task.
a. task-relevant data
b. kind of knowledge to be mined
c. background knowledge
d. interestingness measures
e. knowledge presentation and visualization techniques to be used for displaying the
discovered patterns
Characterization
Discrimination
Association
Classification/prediction
Clustering
Outlier analysis
Other data mining tasks
Data reduction obtains a reduced representation of the data that is smaller in volume but produces
the same or similar analytical results.
Data discretization is a part of data reduction with particular importance, especially for
numerical data.
Discretization reduces the number of values for a given continuous attribute by dividing the range
of the attribute into intervals. Interval labels can then be used to replace actual data values.
A concept hierarchy reduces the data by collecting and replacing low-level concepts (such as numeric
values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).
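Both ideas can be sketched in a few lines. Equal-width binning is assumed as the discretization method, since the text does not name one, and the age cut-offs of 30 and 60 for the concept hierarchy are hypothetical.

```python
def equal_width_bins(values, k):
    """Replace each value with the (low, high) interval of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [(lo, hi) for _ in values]  # all values identical: a single interval
    intervals = []
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / width), k - 1)
        intervals.append((lo + idx * width, lo + (idx + 1) * width))
    return intervals


def age_concept(age):
    """Map a numeric age to a higher-level concept (cut-offs are assumptions)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


ages = [13, 25, 35, 47, 61, 73]
binned = equal_width_bins(ages, 3)           # six values reduced to three intervals
concepts = [age_concept(a) for a in ages]    # six values reduced to three concepts
```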
Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.