
(CSE902)

Submitted to: Robin Prakash Mathur, M. Tech. Dept
Submitted by: Amandeep Kaur, Roll no: 26, M. Tech. 1st Sem.

INDEX

PART-A
Q-1 Explain how the evolution of database led to data mining.
Q-2 Give a brief architecture of typical data mining system.
Q-3 “Data Mining is the extraction of knowledge” comment.

PART-B
Q-4 Compare the Operational database with Data Warehouse.
Q-5 Justify the terms Subject oriented, Non Volatile, Time variant in Data Warehouse.
Give brief about OLAP & OLTP. Try to differentiate between them.
Q-6 Give some ideas about latest research area in Data Mining.
PART-A

1. How evolution of database led to data mining:


Data mining takes this evolutionary process beyond retrospective data access and
navigation to prospective and proactive information delivery. Data mining is ready for
application in the business community because it is supported by three technologies that are now
sufficiently mature:
• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms
Data mining techniques are the result of a long process of research and product development.
This evolution began when business data was first stored on computers. At that time the data was
of little use for analysis because it was stored in flat files.
Then the DBMS came into existence, in which data was arranged in tables so that it could
be easily interpreted. It was also introduced to overcome the problems of integrity, redundancy,
and concurrency control.
Even then the problem of redundancy remained; it was addressed with the help of the
RDBMS. The RDBMS approach was then taken up as a research area, leading to advanced RDBMS
and data warehousing. Data was stored in various sections called data marts. The large and
growing volume of data needed some kind of analysis, which was supported by data mining,
text mining, and web mining.
Evolutionary Step                             Techniques
Data Collection and Creation (1960s)          Flat files
DBMS (1970 onwards)                           DBMS, OLTP, RDBMS
Advanced DBMS (1980 onwards)                  Adoption of RDBMS for research work
Data Warehousing and Mining (1980 onwards)    OLAP, Warehousing and Mining
Web Based DBMS (1990 onwards)                 Web Mining
IIS (2000 onwards)                            Integrated information

2. Typical Data Mining Architecture:

The architecture is essentially a 3-tier architecture: a database layer, a data mining engine,
and a front end.

The results of data mining are patterns, or various combinations of them, presented as
information. Only that information is of ultimate use. Cleaning, integration, etc. are excluded
from the data mining concept itself. Data mining applications can also refer to a knowledge base
to support advanced decision making.

Database Layer:

The database layer basically deals with metadata, i.e. data about data. The various data
sources are taken care of here. Data mining results are saved back to this layer.

Data Mining engine:

Front End:
The front end is the user interface layer. It has the following prime functionalities:
• Administration
• Input Parameter Settings
• Data Mining Results / Visualization

3. “Data Mining is the extraction of knowledge”:

Data mining refers to extracting or mining knowledge from large amounts of data. Most
companies already collect and refine massive quantities of data. When implemented on high
performance client/server or parallel processing computers, data mining tools can analyze
massive databases to deliver answers to questions such as, "Which clients are most likely to
respond to my next promotional mailing, and why?"

Examples:
Risk Analysis

Given a set of current customers and an assessment of their risk-worthiness, develop descriptions
for various classes. Use these descriptions to classify a new customer into one of the risk
categories.

Targeted Marketing

Given a database of potential customers and how they have responded to a solicitation, develop a
model of customers most likely to respond positively, and use the model for more focused new
customer solicitation. Other applications are to identify buying patterns from customers; to find
associations among customer demographic characteristics, and to predict the response to mailing
campaigns.
Retail/Marketing

• Identify buying patterns from customers
• Find associations among customer demographic characteristics
• Predict response to mailing campaigns
• Market basket analysis
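
As a rough illustration of market basket analysis, the sketch below counts how often pairs of items are bought together (support) and how often one item appears given another (confidence). The transactions and the function names `pair_support` and `confidence` are hypothetical and chosen only for this example; real tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` is bought when `antecedent` is bought."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(transactions)
bread_then_milk = confidence(transactions, "bread", "milk")
```

Pairs with high support and high confidence are candidates for rules such as "customers who buy butter also buy bread".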
Portfolio Management

Given a particular financial asset, predict the return on investment to determine whether or
not to include the asset in a portfolio.
Brand Loyalty

Given a customer and the product he/she uses, predict whether the customer will switch brands.

Banking

The application areas in banking are:
• Detecting patterns of fraudulent credit card use
• Identifying 'loyal' customers
• Predicting customers likely to change their credit card affiliation
• Determining credit card spending by customer groups
• Finding hidden correlations between different financial indicators
• Identifying stock trading rules from historical market data

PART-B
4. Features of Data Warehouse:
W. H. Inmon, author of Building the Data Warehouse, characterized a data
warehouse as "a subject-oriented, integrated, nonvolatile, time-variant collection of data
in support of management's decisions." Data warehouses provide access to data for
complex analysis, knowledge discovery, and decision-making.

1. Subject oriented:
Data are organized according to subject instead of application, e.g. an insurance
company using a data warehouse would organize its data by customer, premium, and
claim, instead of by different products (auto, life, etc.).
• Organized around major subjects, such as customer, product, sales.
• Focusing on the modeling and analysis of data for decision making, not on daily
operations or transaction processing.
• Provide a simple and concise view around particular subject by excluding data that
are not useful in the decision support process.
2. Integrated:
When data resides in many separate applications in the operational environment, the
encoding of data is often inconsistent. For instance, in one application gender might be
coded as “m” and “f”, and in another as 0 and 1. When data are moved from the operational
environment into the data warehouse, they assume a consistent coding convention, e.g.
gender data is transformed to “m” and “f”.
• Constructed by integrating multiple, heterogeneous data sources such as relational
databases, flat files, and on-line transaction records.
• Provides data cleaning and data integration techniques.
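
The gender-coding example can be sketched as a small transformation step applied while loading. The source names `app_a` and `app_b`, the function names, and the choice that 0 means “m” and 1 means “f” are assumptions made for illustration; the text does not say which digit stands for which value.

```python
# Hypothetical per-source mappings onto the warehouse convention ("m"/"f").
# Assumption: in the second application 0 means "m" and 1 means "f".
GENDER_MAPS = {
    "app_a": {"m": "m", "f": "f"},
    "app_b": {"0": "m", "1": "f"},
}


def to_warehouse_gender(source, raw_value):
    """Translate one source-specific gender code into the warehouse code."""
    key = str(raw_value).strip().lower()
    mapping = GENDER_MAPS[source]
    if key not in mapping:
        raise ValueError(f"unknown gender code {raw_value!r} from {source}")
    return mapping[key]


def integrate(records):
    """Return copies of (source, row) records with a consistent gender code."""
    return [
        {**row, "gender": to_warehouse_gender(source, row["gender"])}
        for source, row in records
    ]


operational_rows = [
    ("app_a", {"customer": "c1", "gender": "F"}),
    ("app_b", {"customer": "c2", "gender": 0}),
]
warehouse_rows = integrate(operational_rows)
```

Rejecting unknown codes instead of guessing keeps inconsistent values out of the warehouse, which is one simple form of data cleaning.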
3. Time variant:
• The data warehouse contains a place for storing data that are five to ten years old, or
older, to be used for comparisons, trends, and forecasting. These data are not
updated.
• The time horizon for the data warehouse is significantly longer than that of
operational systems.
• Every key structure in the data warehouse contains an element of time (explicitly or
implicitly).
4. Non-volatile:
Data are not updated or changed in any way once they enter the data warehouse, but
are only loaded and accessed.
• A physically separate store of data transformed from the operational environment.
• Does not require transaction processing, recovery, and concurrency control
mechanisms.
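
The time-variant and non-volatile properties can be pictured as a store that only supports loading and reading, with every row stamped with a time key. The class `WarehouseTable`, its method names, and the sample dates and premiums are hypothetical and serve only as a minimal sketch.

```python
from datetime import date


class WarehouseTable:
    """Load-and-read store: rows are appended with a time key, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, rows):
        """Append a batch of rows, stamping each with the snapshot date."""
        for row in rows:
            self._rows.append({**row, "snapshot_date": snapshot_date})

    def query(self, predicate=None):
        """Return copies of matching rows so callers cannot alter stored data."""
        return [dict(r) for r in self._rows if predicate is None or predicate(r)]


table = WarehouseTable()
table.load(date(2009, 1, 1), [{"customer": "c1", "premium": 100}])
table.load(date(2010, 1, 1), [{"customer": "c1", "premium": 120}])

# Both snapshots are kept, so the premium can be compared across time.
history = table.query(lambda r: r["customer"] == "c1")
```

Because a later load adds a new snapshot instead of overwriting the earlier one, trends across years remain available for analysis.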

Data warehouses have the following distinctive characteristics.


• Multidimensional conceptual view.
• Client-server architecture.
• Multi-user support.
• Accessibility.
• Transparency.
• Intuitive data manipulation.
• Consistent reporting performance.
• Flexible reporting

1. Operational systems vs. Data Warehousing:


The fundamental difference between operational systems and data warehousing systems is that
operational systems are designed to support transaction processing whereas data warehousing
systems are designed to support online analytical processing (or OLAP, for short).
Based on this fundamental difference, data usage patterns associated with operational systems
are significantly different than usage patterns associated with data warehousing systems.

Data Warehouse
The Data Warehouse is an evolving set of university data used for reporting, planning, and
decision making. The data may contain information extracted from the Operational Data Store,
campus operational systems, and external data sources. The Data Warehouse incorporates
web-based access through the eReports portal in addition to providing direct access through
desktop tools like Hyperion, Microsoft Access, and Filemaker Pro.
Operational Data Store
The OIT Operational Data Store (ODS) is a set of relational databases that contain data extracted
on a nightly basis from operational systems on campus relating to students, personnel, financial
aid, admissions, and the Billing and Accounts Receivable System (BARS). The ODS allows you
to query operational data using desktop tools like Microsoft Access and Filemaker Pro.

A comparison of operational systems and data warehousing systems

Comparison Base   Operational Database                Data Warehouse
Processing        High volume of transactional data   High volume of analytical processing
Reporting         Minimum back-end reporting          Frequent report generation
Base              Process oriented, process driven    Subject oriented
Data Concern      Current data                        Historic data
Updating          On a regular basis                  On a time-variant basis, as data is entered
                                                      regularly and loaded once; read-only data
Optimization      Faster retrieval and update of      Faster retrieval of
                  small volumes of data               high volumes of data
Specification     No data integrity and partially     Integrity and uniqueness at the
                  redundant data                      maximum possible level
Skills required   Non-trivial                         Expert
Examples          Billing system data                 Student data, financial data

Define & Compare OLAP and OLTP

OLAP:
In a white paper entitled ‘Providing OLAP (On-line Analytical Processing) to User-Analysts: An
IT Mandate’, E. F. Codd established 12 rules to define an OLAP system. In the same paper he
listed three characteristics of an OLAP system. Dr. Codd later added 6 additional features of an
OLAP system to his original twelve rules.
Three significant characteristics of an OLAP system
• Dynamic Data Analysis
This refers to time series analysis of data as opposed to static data analysis, which does not allow
for manipulation across time. In an OLAP system historical data must be able to be manipulated
over multiple data dimensions. This allows the analysts to identify trends in the business.
• Four Enterprise Data Models
The Categorical data model describes what has gone on before by comparing historical values
stored in the relational database. The Exegetical data model reflects what has previously
occurred to bring about the state, which the categorical model reflects. The Contemplative data
model supports exploration of ‘what-if’ scenarios. The Formulaic data model indicates which
values or behaviour across multiple dimensions must be introduced into the model to affect a
specific outcome.
OLAP database servers support common analytical operations including: consolidation,
drill-down, and "slicing and dicing".
• Consolidation - involves the aggregation of data such as simple roll-ups or complex
expressions involving inter-related data. For example, sales offices can be rolled-up to
districts and districts rolled-up to regions.
• Drill-Down - OLAP data servers can also go in the reverse direction and automatically
display the detail data that make up the consolidated data. This is called drill-down.
Consolidation and drill-down are inherent properties of OLAP servers.
• "Slicing and Dicing" - Slicing and dicing refers to the ability to look at the database from
different viewpoints. One slice of the sales database might show all sales of product type
within regions. Another slice might show all sales by sales channel within each product
type. Slicing and dicing is often performed along a time axis in order to analyse trends
and find patterns.
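
The three operations above can be sketched on a tiny in-memory fact table. The sales rows, dimension names, and the functions `roll_up`, `drill_down`, and `slice_by` are hypothetical; an actual OLAP server would compute these over a multidimensional store instead of a Python list.

```python
from collections import defaultdict

# Hypothetical sales facts with region > district > office as one hierarchy.
SALES = [
    {"region": "North", "district": "N1", "office": "A",
     "product": "books", "channel": "web", "amount": 100},
    {"region": "North", "district": "N1", "office": "B",
     "product": "music", "channel": "store", "amount": 50},
    {"region": "North", "district": "N2", "office": "C",
     "product": "books", "channel": "store", "amount": 70},
    {"region": "South", "district": "S1", "office": "D",
     "product": "books", "channel": "web", "amount": 30},
]


def roll_up(rows, *dims):
    """Consolidation: total the amount per combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)


def drill_down(rows, **fixed):
    """Drill-down: return the detail rows behind one consolidated cell."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def slice_by(rows, dim, value, *dims):
    """Slice: fix one dimension to a value, then consolidate over other dimensions."""
    return roll_up(drill_down(rows, **{dim: value}), *dims)


by_district = roll_up(SALES, "region", "district")   # offices rolled up to districts
by_region = roll_up(SALES, "region")                 # districts rolled up to regions
n1_detail = drill_down(SALES, district="N1")         # detail behind the N1 total
books_by_region = slice_by(SALES, "product", "books", "region")
```

Rolling up by `"region"` after `"region", "district"` mirrors the offices-to-districts-to-regions example, and `drill_down` walks the same hierarchy in the opposite direction.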
OLTP:
A database built for on-line transaction processing (OLTP) is generally
regarded as inappropriate for warehousing, because it has been designed with a different set
of needs in mind, i.e. maximizing transaction capacity, and typically has hundreds of
tables in order not to lock out users. Data warehouses are concerned with query processing
as opposed to transaction processing.
OLTP systems cannot serve as repositories of facts and historical data for
business analysis. They cannot quickly answer ad hoc queries, and rapid retrieval is
almost impossible. The data is inconsistent and changing, duplicate entries exist, entries
can be missing, and there is an absence of historical data, which is necessary to analyse
trends. Basically, OLTP offers large amounts of raw data, which is not easily understood.
The data warehouse offers the potential to retrieve and analyse information quickly and
easily. OLAP vs. OLTP is compared in the following table:
Features               OLTP                                   OLAP
Characteristics        Transactional                          Informational
Purpose                Operations of business                 Analysis of businesses
Users                  Expert                                 Knowledge workers
Types of transactions  Short, daily transactions              Complex queries for decision making
Type of data           Updated                                Historical
Memory requirements    Less, as MBs can also help             Depends upon data, generally more
Focus                  Data                                   Information
Data updates           Up to date and detailed,               Historic and summarised data
                       changing in complete
Output metric          Detailed view based, fast              Summary based, flexible enough to
                                                              cope with the changes
Access patterns        Concurrency controls to be maintained  Atomic operations need to be supported
Orientation            Customer oriented                      Market oriented
Data model             Normalised in RDBMS                    Multi-dimensional; base is also RDBMS
Access                 SQL                                    SQL plus data analysis extension

6. Latest Research Areas in Data Mining:


S. No.  Topics                                                  Description
1       Data analysis for generating Multi-Dimensional Reports  3-D or 4-D reports generation
        • Dimension reduction techniques to handle
          multi-dimensional data
        • Scalable algorithms for classification and clustering
        • Parallel implementations for interactive
          exploration of data
2       • Image processing techniques for de-noising,           Storing info in images in
          object identification, and feature extraction         secured format
3       Data Mining - A solution for business Enterprises       Giving business solutions for
        • Applied statistics to ensure that the conclusions     problems of marketing
          drawn from the data are statistically sound
4       Data Mining – Parallel Object-Oriented, De-noising      Wavelet applications
        System Using Wavelet Multi-resolution Analysis
5       Data Mining – Creating ensembles of oblique decision    Artificial Intelligence, research in
        trees with evolutionary algorithms and sampling         any area like sports selection
6       Data Mining – A tool for chemoinformatics for           Specific drug prescription
        drug identification                                     optimization
