
DATA MINING AND DATA WAREHOUSE

By,
M.E.PAAR
RIVANAN
Abstract:
The purpose of this paper is to give a short introduction to the
concepts of data mining and data warehousing, an explanation of
their general possibilities, and a short description of their uses in
the field of Enterprise System Integration. We also describe the
concept of data mining by comparing traditional marketing
research with relationship marketing. The background of data
mining is discussed with special emphasis on related terms such as
data warehouses and data marts, as well as knowledge discovery in
databases (KDD) and customer relationship management (CRM).
The steps necessary for companies to implement successful data
mining projects are enumerated, and there is much scope for future
research. An enterprise website has become one of the most
important channels between the enterprise and its existing and
potential customers (visitors).
We envision that better management of visitor relationships will
bring about loyalty from existing customers and stimulate interest
in the enterprise from potential customers. In this paper, we apply
the concept of CRM to the management of an enterprise website,
that is, visitor relationship management. In other words, customers
are differentiated by their different values and served with different
relationship-strengthening practices based on an understanding of
the visitors.
I. INTRODUCTION:

The promise of data mining in business environments is enormous. Until recently,
capitalizing on that promise in a real-world business environment has sometimes
been very difficult. The promise is still as bright as ever, and the recent past has taught
practitioners of data mining for CRM (customer relationship management) much about
delivering high-return, practical results. Data mining is not a universal panacea for CRM
success. Critical criteria include tool selection, business objective matching, and data
discovery, preparation, and delivery. Successful data mining in a CRM environment is far
more than the application of algorithms to data.
Customer Relationship Management is a broad approach to doing business. It
is holistic in that it encompasses all aspects and functions of a company, focusing on
managing the relationship between customer and company just as much as between
company and customer. CRM requires a two-way street: an exchange of information
just as much as of goods and services.
There are five crucial CRM strategic business areas. Each area is examined
separately to enable a clear view of the problems to be met, the business problem to be
solved, and the methods for delivering value.
The areas chosen for scrutiny are:
1) Data preparation
2) Customer segmentation
3) Attrition
4) Cross sell and
5) E-commerce.
These extend the miner’s skill set within CRM.

a) Data warehousing:

The volume of data that a company collects may be very large, and the databases that
hold it may be numerous. In such a case, a system that makes information retrieval
easier and faster is needed. This instrument is a data warehouse.
A common definition of a data warehouse: “A Data Warehouse is a repository of integrated
information, available for queries and analysis. Data and information are extracted from
heterogeneous sources as they are generated. This makes it much easier and more
efficient to run queries over data that originally came from different sources.” In other
words, a data warehouse is a database that stores data from the company's other
databases after the data have been pre-processed to make them more accessible.
b) Data mining:
Data mining is a method for data processing; nowadays it may be considered one of the
most powerful. Data mining is also known as Knowledge Discovery in Databases (KDD),
and it can be defined as a method for retrieving information from data. Information and
data are not the same thing: data is just something stored somewhere; information is
something richer.
Data mining has become a hot topic in recent years thanks to the increase in computing
power: data that had been compiled but never analysed have now been analysed,
and data mining techniques have improved.
The power of data mining is its ability to uncover information that is stored in the data
but not visible. Data mining finds patterns that turn data into information. No other
traditional data processing method is so independent of the human way of thinking: data
mining does not need a “guide” to obtain information. There is no need to tell it what to
search for, and that is why it can find valuable information that was previously unknown.

• Relation between data mining and data warehousing:

Data mining is especially useful when there is a large amount of data to analyse, and the
biggest and most complete data repositories are in fact data warehouses. So the link
between the two is very clear.

II. WHY USE DATA WAREHOUSING AND DATA MINING

a) Data warehousing:
In a company with different databases, each organized according to the needs of a single
department or unit of the enterprise, retrieving the information needed for strategy or
other “high level” decisions, such as marketing or customer service decisions, may be a
difficult and slow process. In addition, the databases of an enterprise are often based on
different systems, such as mainframes and other “old” systems, called legacy systems,
and “newer” systems such as client-server architectures. So, to provide an instrument
that can support high-level decisions and give the right information at the right time, the
databases must be integrated and the large amount of data pre-processed. These are the
functions that a data warehouse implements.
There is another task that a data warehouse can perform. It can be useful not only for
retrieving information but also for “creating” new knowledge from the available data. In
fact, a data warehouse is often used as a support for data mining.
b) Data mining:

There are many methods for processing data, but most of them are closely tied to the
ideas and way of thinking of the people who use them: they need to be guided in some
way by human intelligence. Data mining can also work in this way, but it can work more
independently of the human mind as well. This is very useful when there is no concrete
idea of the information to be found. In some fields of research this feature can be very
important: discovering a previously unconsidered relation between certain diseases and
other factors, for example, can lead to a new approach to the study of those diseases.

III. Use in enterprise system integration

One of the main purposes of enterprise system integration (ESI) is knowledge management,
which can be split into three categories: knowledge acquisition, knowledge organization,
and knowledge deployment. The first and second categories are related to data
mining and data warehousing. As previously said, data mining is a powerful instrument for
information retrieval, and this is directly related to knowledge acquisition. As regards
knowledge organization, one of the functions of data warehousing is storing data in order
to support business analysis and management decision-making. So the use of data
warehousing and data mining can help the ESI process. On the other hand, the process of
creating a data warehouse and then performing data mining has to be led by the business
policy of the company, especially during the pre-processing of the data from the different
databases, in phases such as the elimination of “not useful” data and aggregation.

a) Data warehousing:

Information is often split across different databases according to the needs of the different
components of the company. The marketing division has its own database, with a
structure that fulfills its needs, and so do the sales division and the product development
division. Data stored in this way are not very helpful for management purposes or for
obtaining a complete overview of the company. Through data warehousing it is possible
to process and combine data in an automated way in order to fulfill needs that were
previously unsatisfied. This is needed for developing a decision support system.

b) Data mining:

This instrument can be a very important help in discovering new information that can
support the planning of new strategies for the company, the analysis of current strategies,
the development of new products, and so on. One of the most important fields related
to ESI in which data mining is used is CRM (Customer Relationship Management).
CRM is a process that manages the interactions between a company and its customers.
The primary users of CRM software applications are database marketers who are looking
to automate the process of interacting with customers. Data mining applications automate
the process of searching the mountains of data to find patterns that are good predictors of
purchasing behaviors. After mining the data, marketers must feed the results into
campaign management software that, as the name implies, manages the campaign
directed at the defined market segments.
Data mining helps marketing users to target marketing campaigns more accurately, and
also to align campaigns more closely with the needs, wants, and attitudes of customers
and prospects. If the necessary information exists in a database, the data mining process
can model virtually any customer activity. The key is to find patterns relevant to current
business problems.

IV. Architecture of data warehouse:

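[Figure: data warehouse architecture. Components shown: external data sources and
operational databases; the extract, clean, transform, load, and refresh steps; the data
warehouse with its metadata repository; and the OLAP, data mining, and decision support
system tools that it serves.]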

• External data source: A source that is available outside the system and that can be
accessed by the data warehouse system.

• Operational Database: An operational database is the system used for the day-to-day
operations required by the business.

• Extract: Data extracts are subsets of data that are offloaded from the server
machine onto other machines. Extracts can be unscheduled user extracts of some
query results, or they can be scheduled extracts such as data mart refreshes. Data
extraction takes data from the source systems and makes it available to the data
warehouse; data load takes the extracted data and loads it into the data warehouse.

• Clean: Data that is no longer usable after some period of time is cleaned out,
that is, older data is removed.

• Transform: Metadata may be used during data transformation and load to
describe the source data and any changes that need to be made. Whether or not
you need to store any metadata for this process will depend on the complexity of
the transformations that are to be applied to the data when moving it from source to
data warehouse. It will also depend on the number of source systems and the type of
each system. The more sources that are used to feed the data warehouse, the more
likely it is that you will need to store metadata about the process.

• LOAD: Once the data is extracted from the source system, it is typically
loaded into a temporary data store so that it can be cleaned up and made
consistent. These checks can be quite complex, and they identify consistency issues
when integrating data from a number of data sources. In addition, as data changes
over time, errors become apparent that had gone unnoticed because the day-to-day
discrepancies were too small to detect.

• REFRESH: The data is updated from time to time.

• Metadata Repository: It will be necessary to keep changing the summaries that
are produced to match the query profiles at each point in time. If we had to
modify the warehouse manager every time we wished to add a new summary or
change an existing one, the system would be perpetually in flux. Metadata can be
used to address this issue, by data-driving the generation of summaries. Within
the database itself, we store descriptions of the summary tables we require in
terms of facts and dimensions.

• OLAP: The term OLAP is an acronym for online analytical processing. Much has
been written about the subject in the computer literature, and for a detailed
discussion the reader should consult some of that work. For our purpose it is
sufficient to have a basic understanding of the term.
OLAP is primarily about being able to access live data online and analyze it. It is
about the methods, structures, and tools required to perform this analysis, and about
rapid access to and analysis of data. OLAP tools are designed to allow
reasonably large quantities of data to be analyzed online. An OLAP tool will allow a
user to quickly perform standard analytical functions on the data and to represent the
results graphically. The idea is to allow the user to easily manipulate and
visualize the data.
Relational technology has been around for many years, and is fairly well understood
these days.
• Decision Support system: A DSS (decision-support system), also known as an EIS
(executive information system), supports an organization’s leading decision makers
with higher-level data for complex and important decisions.
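
The idea of data-driving the generation of summaries, described under Metadata
Repository above, can be illustrated with a minimal sketch. The fact rows, the summary
descriptions, and the function name build_summary are assumptions made only for
illustration; a real warehouse manager would keep these descriptions in database tables.

```python
from collections import defaultdict

# Hypothetical fact rows: each row holds dimension values and numeric facts.
facts = [
    {"product": "pen", "region": "north", "quarter": "Q1", "revenue": 100, "units": 10},
    {"product": "pen", "region": "south", "quarter": "Q1", "revenue": 50, "units": 5},
    {"product": "ink", "region": "north", "quarter": "Q2", "revenue": 70, "units": 7},
]

# Metadata: each summary table is described by its dimensions and its fact.
# Adding a new summary means adding an entry here, not changing the code below.
summary_metadata = {
    "revenue_by_product": {"dimensions": ["product"], "fact": "revenue"},
    "units_by_region_quarter": {"dimensions": ["region", "quarter"], "fact": "units"},
}


def build_summary(rows, description):
    """Sum one fact over the dimensions named in a metadata description."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in description["dimensions"])
        totals[key] += row[description["fact"]]
    return dict(totals)


# Every summary is generated from its description alone.
summaries = {name: build_summary(facts, description)
             for name, description in summary_metadata.items()}
```

Because the summaries are produced from the descriptions, changing the query profile only
requires changing the metadata, which is the point made in the text.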

V. WORKING OF DATA WAREHOUSING AND DATA MINING:

a) Data warehousing
Data warehousing is something more than keeping a second copy of the data; otherwise it
would simply be a backup database. Creating and maintaining a data warehouse implies
other operations, which can be classified as extraction, consolidation, filtering, cleansing,
transformation, aggregation, and updating.
• Extraction: periodic download of new data from the various databases.
• Consolidation: combination of data from different databases in order to
perform data analysis.
• Filtering: elimination of data not needed for analysis.
• Cleansing: finding and repairing errors due to data manipulations.
• Transformation: modification of data in order to make them consistent.
• Aggregation: summarization of data into appropriate units for analysis.
• Updating: adding new data.
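
The operations listed above can be sketched as a small pipeline. This is only an
illustration under stated assumptions: the two source "databases", their field names, and
the cleansing rule are hypothetical and are not taken from any real system. Cleansing is
simplified here to discarding unreadable rows rather than repairing them.

```python
from collections import defaultdict

# Hypothetical source "databases" with inconsistent formats (invented for illustration).
sales_db = [
    {"customer": "ACME ", "amount": "120.50", "region": "north"},
    {"customer": "Bolt", "amount": "80", "region": "South"},
    {"customer": "Bolt", "amount": "n/a", "region": "South"},  # unreadable amount
]
web_db = [
    {"cust_name": "acme", "order_total": 30.0, "area": "NORTH", "debug": True},
]


def extract():
    """Extraction: pull the current rows from every source."""
    return list(sales_db), list(web_db)


def consolidate(sales_rows, web_rows):
    """Consolidation: bring rows from both sources into one common layout."""
    rows = [dict(r) for r in sales_rows]
    for r in web_rows:
        rows.append({"customer": r["cust_name"], "amount": r["order_total"],
                     "region": r["area"], "debug": r.get("debug")})
    return rows


def filter_fields(rows):
    """Filtering: drop fields that are not needed for analysis."""
    keep = ("customer", "amount", "region")
    return [{k: r[k] for k in keep} for r in rows]


def cleanse(rows):
    """Cleansing (simplified): discard rows whose amount is not a number."""
    clean = []
    for r in rows:
        try:
            clean.append({**r, "amount": float(r["amount"])})
        except (TypeError, ValueError):
            continue
    return clean


def transform(rows):
    """Transformation: make names and regions consistent across sources."""
    return [{"customer": r["customer"].strip().lower(),
             "amount": r["amount"],
             "region": r["region"].strip().lower()} for r in rows]


def aggregate(rows):
    """Aggregation: summarize amounts per (customer, region)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["customer"], r["region"])] += r["amount"]
    return dict(totals)


def build_warehouse():
    """Run all steps; running it again after new rows arrive models updating."""
    sales_rows, web_rows = extract()
    rows = transform(cleanse(filter_fields(consolidate(sales_rows, web_rows))))
    return aggregate(rows)


warehouse = build_warehouse()
```

In this sketch the two spellings of the same customer end up in a single warehouse entry,
which is the kind of consistency the pre-processing is meant to provide.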

• Data modeling for data warehouses

Multidimensional models take advantage of inherent relationships in the data to populate
data in multidimensional matrices called data cubes. For data that lends itself to
dimensional formatting, query performance in multidimensional matrices can be much
better than in the relational data model. Three examples of dimensions in a corporate data
warehouse would be the corporation’s fiscal periods, products, and regions.

• A two dimensional matrix model

A standard spreadsheet is a two-dimensional matrix. One example would be a spreadsheet
of regional sales by product for a particular time period. Products could be shown as rows,
with sales revenues for each region comprising the columns.

• Three dimensional data cube model

Adding a time dimension, such as an organization’s fiscal quarters, would produce a
three-dimensional matrix, which could be represented using a data cube as shown in the
figure. The figure shows a three-dimensional data cube that organizes product sales
data by fiscal quarters and sales regions. Each cell could contain data for a specific
product, specific fiscal quarter, and specific region. By including additional dimensions, a
data hypercube can be obtained, although more than three dimensions cannot be easily
visualized or presented graphically. The data can be queried directly in any
combination of dimensions, bypassing complex database queries. Tools exist for viewing
data according to the user’s choice of dimensions.
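
A data cube of this kind can be sketched as a mapping from (product, quarter, region) to a
sales figure. The products, quarters, regions, and numbers below are invented, and the
helper name query is hypothetical; the sketch only shows how any combination of
dimensions can be queried in the same way.

```python
# Hypothetical sales figures; every cell of the cube is addressed by
# (product, quarter, region).
cube = {
    ("pen", "Q1", "north"): 100,
    ("pen", "Q1", "south"): 50,
    ("pen", "Q2", "north"): 120,
    ("ink", "Q1", "north"): 70,
    ("ink", "Q2", "south"): 30,
}
DIMENSIONS = ("product", "quarter", "region")


def query(cube, **fixed):
    """Sum all cells that match the given dimension values.

    Dimensions that are not mentioned are summed over, so any
    combination of dimensions can be queried the same way.
    """
    total = 0
    for cell, value in cube.items():
        coords = dict(zip(DIMENSIONS, cell))
        if all(coords[name] == wanted for name, wanted in fixed.items()):
            total += value
    return total


single_cell = query(cube, product="pen", quarter="Q1", region="north")
pen_total = query(cube, product="pen")
north_q1 = query(cube, quarter="Q1", region="north")
grand_total = query(cube)
```

Adding a fourth dimension would only lengthen the key tuple and the DIMENSIONS list,
which corresponds to the hypercube mentioned above.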
• Pivoting

Changing from one dimensional hierarchy (orientation) to another is easily accomplished
in a data cube by a technique called pivoting (also called rotation). In this technique the
data cube can be thought of as rotating to show a different orientation of the axes.
Multidimensional models lend themselves readily to hierarchical views in what is known
as roll-up display and drill-down display.
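
Pivoting and roll-up can be sketched on the same kind of cube. The data, the function
names view and roll_up_quarters, and the half-year hierarchy are assumptions chosen for
illustration only.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (product, quarter, region).
cube = {
    ("pen", "Q1", "north"): 100,
    ("pen", "Q2", "north"): 120,
    ("pen", "Q1", "south"): 50,
    ("ink", "Q1", "north"): 70,
}
AXES = ("product", "quarter", "region")


def view(cube, row_axis, col_axis):
    """Return a two-dimensional table {row: {column: total}}.

    The remaining axis is summed away. Swapping row_axis and col_axis,
    or choosing a different pair, corresponds to rotating the cube.
    """
    r, c = AXES.index(row_axis), AXES.index(col_axis)
    table = defaultdict(lambda: defaultdict(int))
    for cell, value in cube.items():
        table[cell[r]][cell[c]] += value
    return {row: dict(cols) for row, cols in table.items()}


# Assumed hierarchy for roll-up: quarters belong to half-years.
HALF_YEAR = {"Q1": "H1", "Q2": "H1", "Q3": "H2", "Q4": "H2"}


def roll_up_quarters(cube):
    """Roll up the time axis from quarters to the coarser half-year level."""
    rolled = defaultdict(int)
    for (product, quarter, region), value in cube.items():
        rolled[(product, HALF_YEAR[quarter], region)] += value
    return dict(rolled)


by_product_region = view(cube, "product", "region")
by_region_product = view(cube, "region", "product")  # the pivoted view
half_year_cube = roll_up_quarters(cube)
```

The two views contain the same totals with rows and columns exchanged, and the rolled-up
cube holds the same data at a coarser level of the time hierarchy.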

b) Data mining:

There are many techniques related to data mining, but the general process can be
described using the following steps:

1) Identification of the problem.

2) Data preparation: before data processing techniques are applied, the data needs to be
manipulated in order to select the relevant items.

3) Creation of data mining patterns: different patterns can be obtained by using different
techniques. The patterns are obtained by selecting a training set of data (a subset of the
existing data used to create the pattern) and by testing them using other subsets of data
called testing sets. Testing sets and techniques are needed to avoid problems like
overfitting, where the pattern fits the given data well but is not useful for other sets of
data because it is too closely tied to the training set. To choose between patterns
generated with different techniques, an evaluation of the kinds of errors that the patterns
are likely to produce is needed. The choice of technique is driven by the goal to be
achieved. For example, fraud recognition in an insurance company suggests the use of a
classification technique (data mining is used to find rules for classifying customers into
categories such as “safe” and “not safe”, using age, profession, and other parameters),
while product sales analysis in a supermarket needs an association recognition technique
(collected data are used to find new relations between products). Other
techniques are, for example, clustering and regression.
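
The role of training and testing sets can be illustrated with a small sketch. The customer
records are invented, and the two "patterns" are deliberately simple stand-ins for real
techniques: one copies the label of the closest training record, the other learns a single
age threshold. All names in the code are hypothetical.

```python
# Hypothetical customer records (age, label); the data is invented.
training_set = [(22, "not safe"), (25, "not safe"), (28, "safe"), (35, "safe"),
                (41, "safe"), (47, "not safe"), (52, "safe"), (60, "safe")]
testing_set = [(20, "not safe"), (27, "not safe"), (33, "safe"),
               (45, "safe"), (48, "safe"), (58, "safe")]


def accuracy(predict, records):
    """Fraction of records whose label is predicted correctly."""
    hits = sum(1 for age, label in records if predict(age) == label)
    return hits / len(records)


def fit_memorizer(records):
    """Pattern 1: copy the label of the closest training age (1-nearest neighbour)."""
    def predict(age):
        _, label = min(records, key=lambda rec: abs(rec[0] - age))
        return label
    return predict


def fit_threshold(records):
    """Pattern 2: one rule 'safe if age >= t', with t chosen on the training set."""
    def rule(t):
        return lambda age: "safe" if age >= t else "not safe"
    candidates = [age for age, _ in records]
    best_t = max(candidates, key=lambda t: accuracy(rule(t), records))
    return rule(best_t)


memorizer = fit_memorizer(training_set)
threshold = fit_threshold(training_set)

# Each entry is (accuracy on the training set, accuracy on the testing set).
results = {
    "memorizer": (accuracy(memorizer, training_set), accuracy(memorizer, testing_set)),
    "threshold": (accuracy(threshold, training_set), accuracy(threshold, testing_set)),
}
```

On this invented data the memorizing pattern is perfect on the training set but correct on
only half of the testing set, while the simpler threshold rule makes one training error and
classifies every testing record correctly. Only the testing set reveals the difference.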

VI. Characteristics of data warehouses:


The diagram illustrates the characteristics of the data warehouse.
Compared with transactional databases, data warehouses are nonvolatile. That means
that information in the data warehouse changes far less often and may be regarded as
non-real-time with periodic updating.
[Figure: characteristics of a data warehouse. Labels shown: databases, other data inputs,
cleaning, reformatting, updates/new data, back flushing, data warehouse, metadata, OLAP,
DSSI, EIS, data mining.]

VII. CUSTOMER RELATIONSHIP MANAGEMENT:

Customers are not new, relations are as old as buyers and sellers, and management is not
new either. The concepts of CRM have been there since the concept of buying and
selling came into being. Then what is creating waves in today's CRM industry? Is that
small electronic 'e' changing the trend?
CRM is considered to be a software tool and a technology solution in this Information
Technology industry. In fact CRM is a strategy towards achieving a holistic view of any
partner engagement. CRM, which is a combination of marketing and business processes,
is the basic understanding of customers and how organizations measure them. The mantra
behind CRM is catering to customized needs "centrally".
As defined by "gurus" of CRM - Customer Relationship Management is a business
strategy to select and manage the most valuable customer relationships. CRM requires
customer-centric business philosophy and culture to support effective marketing, sales
and service processes. CRM applications can enable effective customer relationship
management, provided that an enterprise has the right leadership, strategy and culture.
USE OF CRM: Given the pace at which technology is changing today, any
company that is a step ahead of others because of some web product or service will not
be able to hold on to that advantage for long. The key to stability in today's dynamic
marketplace is forging long-term relationships with customers.
Customers can be divided into three zones:
1. Zone of defection where customers are extremely hostile and have the lowest level
of satisfaction.
2. Zone of indifference where customers are not sure. They have a medium level of
satisfaction and loyalty towards the company.
3. The third level of customers is in the zone of affection described as "Apostles".
CRM focuses on bringing customers from level 1 to level 3 and retaining apostle
customers.

Traditional Marketing Research


Today the majority of companies that consider themselves market driven are still
organized around their products. These companies position their products to a carefully
researched segment of customers whose wants are unfulfilled. To virtually guarantee
success, these companies believe that they must give additional value to the chosen
segment by differentiating their product in some unique way. Companies of this type
emphasize the refining of internal processes and outputs to meet the needs of the mass-
market and customers are treated as a homogeneous and basically passive mass.
A number of companies attempted to change or redirect their efforts in the late 1980's and
early 1990's. At that time "customer service" became a "hot" topic. Everyone from
CEO's to brand managers to hourly employees was admonished to "Take Care of the
Customer."
Traditional surveys of what customers want or of the service they have received are
what many companies rely on today. Such a survey gives the company reliable
information on what customers think they think or what they think they want, but that
may not be what they really think or want. If you are only supplying what your customers
want, or think they want, today, you are not tapping into the unspoken needs and
unserved markets that may be the key to the customers of today and the potential
customers of tomorrow.
Companies that consider themselves market driven spend an inordinate amount of time
differentiating their product through quality improvement. It is estimated that focusing on
quality improvements is only about 10% of what you should be doing in your company.
The overriding strategy of the past was to acquire customers and respond to their
aggregate needs.
Relationship Marketing- The Modern View
Forward looking companies of today believe that customers are what sustain any business
and that they have "lifetime value," not just the value of a single sale. It is believed that
customer groups, if managed and maintained, cannot be easily copied by the competition
i.e., they are one of the few "sustainable" competitive advantages open to the company.
Progressive companies of the future will know and understand the difference between
knowledge of the customer and customer knowledge. For instance, knowledge of the
customer is knowing how many hits a browser makes on your web site, whereas
customer knowledge is knowing what to do with the hits. To benefit from this "new"
philosophy, a company must change the entire business operation so that research and
development and marketing work seamlessly together and financial resources are
allocated in the "right" places.
The producers and suppliers must be able to put together the right mix of service and
information surrounding the differentiated or personalized products of the future. This
mix will be customized by creating very separate portraits of individual customers.
The technology to develop these portraits exists in today's data mining technology.
Companies are able to take information from their own company's database and augment
it with enhancement information provided by a data compiler and then apply a predictive
model to the augmented data set using sophisticated data mining techniques. In this way
we can understand some of the things the individuals in the year 2020 will want to
achieve as customers.

VIII. Benefits of Customer Relationship Management

• Centralized Customer Database: VRM installs a Companies folder that
provides a one-to-many relationship between company and contacts. Company
information is stored only once and maintained in one place. All contacts
reference the same company information.

• Automatic E-mail Processing: VRM automates the process of creating a new
contact and company record from information within the e-mail, increasing team
productivity and information quality. It also automates the action of logging the
transaction, and can create related tasks and appointments.

• Instant Set-Up: VRM can be installed and used immediately; there is no
lengthy set-up and implementation time. The product can be used out-of-the-box
or easily customized to meet specific organizational requirements.

• Enhancements to Outlook: VRM implements a wealth of refinements and
extensions to existing Microsoft Outlook features such as: enhanced linking to the
journal; extending the logging of activities from a private journal folder to a
public journal folder for shared access; and enforcing consistency of categories
between multiple users.

• Builds on Existing Investment: VRM leverages your existing financial
investment in Microsoft Outlook by building on current Microsoft Outlook
features. It leverages your training and implementation

IX. Application of Data mining:

Data mining technologies can be applied to a large variety of decision-making contexts in
business. In particular, areas of significant payoffs are expected to include the following:
Marketing: Applications include analysis of customer behavior based on buying
patterns; determination of marketing strategies including advertising, store location, and
targeted mailing; segmentation of customers, stores, or products; and design of catalogs,
store layouts, and advertising campaigns.
Finance: Applications include analysis of creditworthiness of clients; segmentation of
accounts receivable; performance analysis of finance investments like stocks, bonds, and
mutual funds; evaluation of financing options; and fraud detection.
Manufacturing: Applications involve the optimization of resources like machines,
manpower, and materials; optimal design of manufacturing processes, shop-floor layouts,
and production design, such as for automobiles based on customer requirements.
Health Care: Applications include discovering patterns in radiological images, analysis of
microarray (gene-chip) experimental data to relate to diseases, and analyzing side effects
of drugs.
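
The analysis of buying patterns mentioned under Marketing is commonly approached with
association rules, measured by support and confidence. The following is a minimal sketch
on invented shopping baskets; the function names and the two thresholds are assumptions
made for illustration, not a description of any particular tool.

```python
from itertools import combinations

# Invented shopping baskets, used only for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the fraction that also hold the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


def pair_rules(baskets, min_support, min_confidence):
    """List rules 'left -> right' between single items that pass both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for left, right in ((a, b), (b, a)):
            if (support({left, right}, baskets) >= min_support
                    and confidence({left}, {right}, baskets) >= min_confidence):
                rules.append((left, right))
    return rules


rules = pair_rules(baskets, min_support=0.3, min_confidence=0.75)
```

With these thresholds the sketch reports, for example, that baskets containing jam also
contain bread, which is the kind of relation between products that could inform store
layouts or targeted mailings.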

