
MANYATTA TECHNICAL AND VOCATIONAL COLLEGE

ICT DEPARTMENT

COURSE: DIPLOMA IN INFORMATION COMMUNICATION TECHNOLOGY (ICT)
(MODULE II- KNEC)
UNIT NAME: DATABASE MANAGEMENT SYSTEMS
TRAINER: MR JAMES NJERU

COURSE SUMMARY AND TIME ALLOCATION

TOPIC (TIME: T, P, TOTAL) AND SUB TOPICS

INTRODUCTION TO DATABASE MANAGEMENT (T: 18, TOTAL: 18)
• Meaning of DBMS
• Historical evolution of DBMS
• Traditional vs. database approaches
• Components of a database management system
• Classification of database systems
• Advantages of DBMS
• Role of key players in database design and development

DATABASE ORGANIZATION (T: 6, TOTAL: 6)
• Meaning of database organization
• Database organization approaches:
o Distributed
o Centralized
o Client/server

PRINCIPLES AND TECHNIQUES OF DATABASE DESIGN (T: 8, TOTAL: 8)
• Meaning
• Database design cycle

RELATIONAL DATABASE SYSTEM (T: 10, TOTAL: 16)
• Meaning of relational database system
• Characteristics of relational database systems
• Relational algebra
• Relational calculus

ENTITY RELATIONSHIPS (T: 2, TOTAL: 8)
• Meaning of entity relationships
• Connotations of entity relationship
• Drawing ERDs

NORMALIZATION (T: 10, TOTAL: 14)
• Meaning and importance of normalization
• Normalization rules
• Performing normalization

QUERYING A DATABASE (T: 4, P: 20, TOTAL: 24)
• Meaning of database query
• Features of database query
• Categories of SQL statements
• Design SQL queries
• Use SQL statements to interrogate a database

FUNCTION OF DATABASE MANAGEMENT SYSTEM (T: 4, TOTAL: 4)
• Meaning
• Transaction processing
• Concurrency controls
• Database recovery
• Database security and authorization

EMERGING TRENDS (T: 2, TOTAL: 2)
• Emerging trends in database management system
• Challenges of emerging trends in database management system
• Coping with emerging trends in database management system


Table of Contents
1.0 Introduction to database management
1.1 Meaning of DBMS
1.2 Historical evolution of DBMS
1.3 Traditional vs. database approaches
1.4 Components of a database management system
1.5 Classification of database Systems
1.6 Advantages of DBMS
1.7 Role of key players in database design and development
2.0 Database organization
2.1 Centralized database
2.2 Client - Server Architecture
2.3 Distributed Database Systems
3.0 Principles and techniques of database design
3.1 Meaning
3.2 Database design cycle
4.0 Relational database system
4.1 Meaning of relational database system
4.2 Relational Database Characteristics
4.3 Relational algebra
4.4 Relational Calculus
5.0 Entity Relationships
5.1 Meaning of Entity Relationships
5.2 Connotations of entity Relationship
5.3 Drawing ERDs
6.0 Normalization
6.1 Meaning and importance of normalization
6.2 Normalization Rule
6.3 Performing Normalization
7.0 Querying a database
7.1 Meaning of database query
7.2 Features of database query
7.3 Categories of SQL statements
7.4 Design SQL queries
8.0 Function of database management system
8.1 Transaction processing
8.2 Concurrency controls
8.3 Database recovery
9.0 Emerging trends in database management system
9.1 Emerging trends in database management system
9.2 Coping with emerging trends in database management system
10.0 References

1.0 Introduction to database management

1.1 Meaning of DBMS

A database is a collection of information that is organized so that it can be easily accessed, updated and
managed.

DBMS

A DBMS is software that allows the creation and manipulation of databases, letting users store, process
and analyze data easily.

A DBMS provides an interface, or tool, for performing operations such as creating a database, creating
tables in it, storing data and updating data.
A DBMS also protects and secures the database, and it maintains data consistency when multiple users
access the data at the same time.
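
The operations listed above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as the DBMS; the student table, its columns and the sample names are made up for illustration and are not part of the course material.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (illustrative choice).
conn = sqlite3.connect(":memory:")

# Create a table, store two rows, then update one of them.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.execute("INSERT INTO student (id, name, course) VALUES (1, 'Amina', 'ICT')")
conn.execute("INSERT INTO student (id, name, course) VALUES (2, 'Brian', 'ICT')")
conn.execute("UPDATE student SET course = 'Business' WHERE id = 2")

# Retrieve the stored data through the DBMS rather than by reading a file directly.
rows = conn.execute("SELECT id, name, course FROM student ORDER BY id").fetchall()
print(rows)
conn.close()
```

Every step goes through the DBMS interface; the program never needs to know how or where the rows are physically stored.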

Examples of Popular Database Management Systems (DBMS)

MySQL Database- MySQL was first released in 1995. Sun Microsystems acquired MySQL in 2008,
and Oracle acquired Sun Microsystems in 2010.

MySQL is an open source relational database management system.

MySQL is one of the most widely used open source databases in the world. It is popular for its
efficiency, reliability and low cost.

MS Access- MS Access was developed by Microsoft. It is a desktop application used to create and
maintain databases on personal computers. It suits personal use and small businesses that need a
database.

Oracle Database- Oracle Database is a relational database management system developed by Oracle
Corporation. It is used mostly by large companies that need to manage large amounts of data. Oracle
Database is very flexible, and its most useful features include integrity constraints, triggers, shared SQL
and locking.

DB2- DB2 is developed by IBM Corporation and is also used to store data for large companies.
It is a relational database management system, and its extended versions also support object-oriented
features. The main drawback of DB2 is its cost.

Microsoft SQL Server- As its name shows, it was developed by Microsoft. It is an RDBMS that runs
primarily on MS Windows. MS SQL Server creates databases that can be accessed from workstations and
over the internet. Microsoft has produced many versions of SQL Server in response to customer
demands.

FileMaker- It was developed by FileMaker Inc. and is a cross-platform RDBMS widely used by many
companies. It has a database engine with a graphical user interface and can be used on both Windows and
Mac. It offers many security features, and it allows users to alter the database by simply dragging new
elements into forms, screens and layouts.

NoSQL- NoSQL stands for "not only SQL". It differs from the systems above in that it is a family of
non-relational database management systems rather than a single product. NoSQL systems are used in
distributed data stores, such as those of Google and Facebook, that collect terabytes of data every day.
They are designed to store the huge volumes of social media data that traditional SQL servers struggle to
handle.

PostgreSQL- PostgreSQL is a cross-platform ORDBMS (object-relational DBMS) that runs on operating
systems such as Linux, Windows and Solaris. It is developed by the PostgreSQL Global Development
Group. It is an open source database that is free to use under a free software license.

MS FoxPro- FoxPro is a DBMS initially developed by Fox Software and later by Microsoft Corporation.
FoxPro has features of both a file-based DBMS and an RDBMS: it supports multiple relationships between
DBF files, but it lacks transaction processing.

Characteristics of Database Management Systems

A database management system has the following characteristics:

1. Data stored in tables: Data is not stored in the database as one unstructured mass; it is stored in
tables created inside the database. A DBMS also allows relationships between tables, which make
the data more meaningful and connected. You can easily understand what type of data is stored
where by looking at the tables created in a database.
2. Reduced redundancy: Storage is cheap today, but when hard drives were expensive, unnecessary
repetition of data in a database was a big problem. A DBMS supports normalization, which divides
the data in such a way that repetition is kept to a minimum.
3. Data consistency: The DBMS ensures that data remains consistent by enforcing various integrity
constraints.
4. Support for multiple users and concurrent access: A DBMS allows multiple users to work on it
(update, insert and delete data) at the same time and still maintains data consistency.
5. Query language: A DBMS provides users with a simple query language with which data can be easily
fetched, inserted, deleted and updated in a database.
6. Security: The DBMS also takes care of the security of data, protecting it from unauthorized
access. In a typical DBMS, we can create user accounts with different access permissions and so
secure our data by restricting user access.
7. Transaction support: A DBMS supports transactions, which help to maintain data integrity in
real-world applications where many operations run concurrently.
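
Characteristics 3 and 7 can be illustrated together. The sketch below assumes Python's built-in sqlite3 module; the account table, the CHECK rule and the transfer function are hypothetical and chosen only to show an integrity constraint and a transaction working together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint (illustrative): a balance may never go below zero.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # allowed: account 1 has enough money
failed = transfer(conn, 1, 2, 500)  # rejected: would make account 1 negative
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the rejected transfer the credit to account 2 has already been applied when the constraint fails, so the rollback is what keeps the two balances consistent.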

Database Schema

A database schema is the skeleton structure that represents the logical view of the entire database.
It defines how the data is organized and how the relations among the data are associated, and it formulates
all the constraints that are to be applied to the data.
A database schema defines the entities and the relationships among them. It contains descriptive detail of
the database, which can be depicted by means of schema diagrams. Database designers design the schema
to help programmers understand the database and make it useful.

A database schema can be divided broadly into two categories:
• Physical Database Schema − this schema pertains to the actual storage of data and its
form of storage, such as files and indices. It defines how the data will be stored in secondary
storage.
• Logical Database Schema − this schema defines all the logical constraints that need to be
applied to the stored data. It defines tables, views, and integrity constraints.
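
As a rough sketch of the two categories, the script below defines a small logical schema (tables, a view and integrity constraints) and one physical-level object (an index). It assumes Python's sqlite3 module; every table, column and index name is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Logical schema: tables, a view and integrity constraints.
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
    );
    CREATE VIEW student_list AS
        SELECT s.name AS student, d.name AS department
        FROM student s JOIN department d ON d.dept_id = s.dept_id;
    -- Physical-level choice: an index changes how rows are located, not what they mean.
    CREATE INDEX idx_student_dept ON student(dept_id);
""")
conn.execute("INSERT INTO department VALUES (1, 'ICT')")
conn.execute("INSERT INTO student VALUES (1, 'Amina', 1)")

try:
    conn.execute("INSERT INTO student VALUES (2, 'Brian', 99)")  # department 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

listing = conn.execute("SELECT student, department FROM student_list").fetchall()
print(rejected, listing)
```

The row for a non-existent department is refused by the logical schema itself, without any checking code in the application.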

Levels of database architecture (views)

1. Physical Level
2. Conceptual Level
3. External Level

The three levels are linked by mappings:

• Mapping is the process of transforming requests and responses between the levels of the database architecture.
• Mapping adds processing time, so it is less suitable for a small database.
• In external/conceptual mapping, the DBMS transforms a request on an external schema into a request against the
conceptual schema.
• In conceptual/internal mapping, the DBMS transforms the request from the conceptual level to the internal level.

1. Physical Level
• The physical level describes the physical storage structure of data in the database.
• It is also known as the internal level.
• This level is very close to the physical storage of data.
• At the lowest level, data is stored as bits at physical addresses on the secondary storage device.
• At the highest level, it can be viewed in the form of files.
• The internal schema defines the various stored data types. It uses a physical data model.
2. Conceptual Level
• The conceptual level describes the structure of the whole database for a group of users.
• It is also called the logical level and is described using a data model.
• The conceptual schema is a representation of the entire content of the database.
• This schema contains all the information needed to build the relevant external records.
• It hides the internal details of physical storage.
3. External Level
• The external level is related to the data that is viewed by individual end users.
• This level includes a number of user views, or external schemas.
• This level is closest to the user.
• An external view describes the segment of the database that is required by a particular user group and hides
the rest of the database from that user group.
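
In many relational systems, external views are implemented as SQL views over the conceptual schema. The sketch below assumes Python's sqlite3 module; the student table and the two user groups (library and finance) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: one table describing the whole student record.
    CREATE TABLE student (
        id INTEGER PRIMARY KEY, name TEXT, book_loans INTEGER, fee_balance INTEGER
    );
    INSERT INTO student VALUES (1, 'Amina', 2, 0), (2, 'Brian', 0, 1500);
    -- External level: each user group gets only the segment it needs.
    CREATE VIEW library_view AS SELECT id, name, book_loans FROM student;
    CREATE VIEW finance_view AS SELECT id, name, fee_balance FROM student;
""")


def columns(conn, relation):
    """Return the column names visible to a user of `relation`."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [d[0] for d in cursor.description]


library_cols = columns(conn, "library_view")
finance_cols = columns(conn, "finance_view")
print(library_cols, finance_cols)
```

The library view never exposes fee balances and the finance view never exposes book loans, although both read the same stored table.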

DATA INDEPENDENCE

Data independence is the property of a database that ensures that a change made at one level of the
schema requires minimal or no change to the schema at the level immediately above it. It removes the
extra work of carrying a single change through all the levels above.

Data independence can be classified into the following two types:

1. Physical Data Independence: This means that for any change made in the physical schema, the need
to change the logical schema is minimal. This is easier to achieve in practice.

2. Logical Data Independence: This means that for any change made in the logical schema, the need to
change the external schema is minimal. This is more difficult to achieve.
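
Both kinds of independence can be observed in a small experiment. This is only a sketch, assuming Python's sqlite3 module and a made-up student table: the same query against a view is run before and after a physical change (a new index) and a logical change (a new column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT);
    INSERT INTO student VALUES (1, 'Amina', 'ICT'), (2, 'Brian', 'Business');
    CREATE VIEW ict_students AS SELECT id, name FROM student WHERE course = 'ICT';
""")
query = "SELECT id, name FROM ict_students ORDER BY id"
before = conn.execute(query).fetchall()

# Physical change: add an index. The logical schema and the query are untouched.
conn.execute("CREATE INDEX idx_student_course ON student(course)")
after_physical = conn.execute(query).fetchall()

# Logical change: add a column to the base table. The external view is untouched.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = conn.execute(query).fetchall()

print(before, after_physical, after_logical)
```

Adding a column is one of the easier logical changes; removing or renaming a column that a view depends on would force the view to change, which is why logical data independence is harder to achieve.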

1.2 Historical evolution of DBMS

The development of database technology can be divided into three eras based on data model or structure:
navigational, SQL/relational, and post-relational.

The two main early navigational data models were the hierarchical model, epitomized by IBM's IMS
system, and the CODASYL model (network model), implemented in a number of products such as IDMS.

The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting
that applications should search for data by content, rather than by following links.

The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in
the mid-1980s did computing hardware become powerful enough to allow the wide deployment of
relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated
in all large-scale data processing applications, and as of 2014 they remain dominant except in niche areas.
The dominant database language, standardized SQL for the relational model, has influenced database
languages for other data models.

Object databases were developed in the 1980s to overcome the inconvenience of object-relational
impedance mismatch, which led to the coining of the term "post-relational" and also the development of
hybrid object-relational databases.

The next generation of post-relational databases in the late 2000s became known as NoSQL databases,
introducing fast key-value stores and document-oriented databases. A competing "next generation" known
as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming
to match the high performance of NoSQL compared to commercially available relational DBMSs.

2000s, NoSQL and New-SQL


NoSQL databases are often very fast, do not require fixed table schemas, avoid join operations by storing
denormalized data, and are designed to scale horizontally. The most popular NoSQL systems include
MongoDB, Couchbase, Riak, memcached, Redis, CouchDB, Hazelcast, Apache Cassandra and HBase,
which are all open-source software products.
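
The difference between joining normalized tables and reading one denormalized record can be sketched with plain Python dictionaries. No real NoSQL product is used here; the customer and order data and the key "customer:1" are invented for the illustration.

```python
# Relational style: two "tables" linked by a key and combined with a join.
customers = {1: {"name": "Amina"}}
orders = [
    {"order_id": 10, "customer_id": 1, "item": "Laptop"},
    {"order_id": 11, "customer_id": 1, "item": "Mouse"},
]
joined = [(customers[o["customer_id"]]["name"], o["item"]) for o in orders]

# Document style: one denormalized record per customer, fetched by key with no join.
document_store = {
    "customer:1": {
        "name": "Amina",
        "orders": [
            {"order_id": 10, "item": "Laptop"},
            {"order_id": 11, "item": "Mouse"},
        ],
        "loyalty_tier": "gold",  # extra field added without altering any schema
    }
}
doc = document_store["customer:1"]
from_document = [(doc["name"], o["item"]) for o in doc["orders"]]

print(joined == from_document)
```

Because each document is self-contained, documents can be spread across many servers by key, which is what makes horizontal scaling straightforward.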

1.3 Traditional vs. database approaches

File based system

A file-based system is a collection of application programs that perform services for end users, such as
inserting, updating and deleting records and adding new files. Each program defines and manages its own
data.

Files are stored in specific locations (directories) on the hard disk. The user can create new files to hold
data, delete a file that contains data, rename a file, and so on. This is known as file management, a function
provided by the operating system (OS).

Disadvantage of Computer File-based Processing System

Although a computer file-based processing system has many advantages over a manual record keeping
system, it has some limitations. The basic disadvantages (or limitations) of a computer file-based
processing system are described below.

Data Redundancy - Redundancy means having multiple copies of the same data. In a computer file-based
processing system, each application program has its own data files, so the same data may be duplicated in
more than one file. The duplication of data creates problems such as:

• To update a specific record, the same data must be updated in all files; otherwise different files may
hold different information about the same item.

• Valuable storage space is wasted.

Data Inconsistency - Data inconsistency means that different files contain different information about the
same object. Redundancy leads to inconsistency: when the same data is stored in multiple locations, the
copies can drift apart.
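
A small sketch shows how redundancy turns into inconsistency. Two Python dictionaries stand in for the private data files of two application programs; the student ID, names and phone numbers are made up.

```python
# Each application program keeps its own copy of the student's phone number.
library_file = {"S001": {"name": "Amina", "phone": "0700 000 001"}}
finance_file = {"S001": {"name": "Amina", "phone": "0700 000 001"}}


def library_update_phone(student_id, phone):
    """The library program only knows about its own file."""
    library_file[student_id]["phone"] = phone


library_update_phone("S001", "0711 111 111")

# The finance copy was never told about the change, so the two files now disagree.
inconsistent = library_file["S001"]["phone"] != finance_file["S001"]["phone"]
print(inconsistent)
```

With a single shared database there would be only one copy of the phone number to update.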

Data Isolation - In a computer file-based system, data is isolated in separate files, so it is difficult to access
and update particular information that is spread across them.

Data Atomicity - Data atomicity means that a record or update is either entered as a whole or not entered at
all. A file-based system cannot guarantee this: a failure midway through an update can leave the data partly
changed.

Data Dependence - The data stored in a file depends upon the application program through which the file
was created. This means that the structure of the data files is coupled with the application program: the
physical structure of the files and records is defined in the application program code. It is therefore
difficult to change the structure of data files or records. If you want to change the structure (or format) of a
data file, you have to modify the application program.
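
Data dependence can be demonstrated with a fixed-width record layout. This sketch uses Python's struct module; the two layouts (a 10-byte and a 20-byte name field followed by a 4-byte age) are hypothetical.

```python
import struct

# Record layout hard-coded in the program: 10-byte name, 4-byte unsigned age.
OLD_FORMAT = "<10sI"
# Hypothetical new layout after the name field grows to 20 bytes.
NEW_FORMAT = "<20sI"


def read_record(raw, fmt=OLD_FORMAT):
    """Decode one record; the layout is baked into the program."""
    name, age = struct.unpack(fmt, raw)
    return name.rstrip(b"\x00").decode("ascii"), age


old_record = struct.pack(OLD_FORMAT, b"Amina", 20)
new_record = struct.pack(NEW_FORMAT, b"Amina", 20)

decoded = read_record(old_record)  # works: file and program agree on the layout

try:
    read_record(new_record)        # file structure changed, program did not
    broke = False
except struct.error:
    broke = True

print(decoded, broke)
```

Every program that reads the file would need the same change, which is the maintenance burden described in the next paragraph.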

Program Maintenance - In a computer file-based processing system, the structure of a data file is coupled
with the individual application programs. Therefore, any modification to a data file, such as a change to the
size or type of a data field, also requires modification of the application program. This process of modifying
the program is referred to as program maintenance.

Data Sharing - In computer file-based processing systems, each application program uses its own private
data files. Such systems do not provide a facility for sharing the data in a file among multiple users on the
network.

Data Security - A computer file-based processing system does not provide a proper security system
against unauthorized access to data. Anyone can easily change or delete valuable data stored in a data file.
This is the most complicated problem of a file-processing system.

Incompatible File Format - In computer file-based processing systems, the structure of a data file is
coupled with the application program and depends on the programming language in which the application
program was developed. Files created by a program written in one language may therefore be difficult to
process with a program written in another.

Database Management System Approach

The Database Management System (DBMS), which emerged in the 1960s, was an improvement on the
file-based system (FBS).

The DBMS removed the trouble of manually locating data and having to go through it. The user creates a
suitable structure for the data beforehand and places the information in the database that the DBMS
manages. The physical organization of files is thus hidden, and the user is given a logical view of the
data.

A database is a collection of interrelated information stored on a database server, where the data is held
in the form of tables. The primary aim of a database is to provide a way to store and retrieve information
quickly and efficiently.

Advantages

1. Control of data redundancy- Although the database approach does not remove redundancy
completely, it controls the amount of redundancy in the database.
2. Data consistency- By removing or controlling redundancy, the database approach reduces the risk
of inconsistencies occurring. It ensures all copies of the data are kept consistent.
3. Sharing of data- The database belongs to the entire organization and can be shared by all authorized
users.
4. Improved data integrity- Data integrity refers to the validity and consistency of stored data.
Integrity is usually expressed in terms of constraints, which are consistency rules that the database
is not permitted to violate.
5. Improved security- The database is protected from unauthorized users. User names and passwords
are required to identify the type of user and their access rights for operations including retrieval,
insertion, updating and deletion.
6. Enforcement of standards- The integration of the database enforces the necessary standards,
including data formats, naming conventions, documentation standards, update procedures and
access rules.
7. Economy of scale- Cost savings can be obtained by combining all of the organization's operational
data into one database, with applications working on one source of data.
8. Balance of conflicting requirements- With a structural design in the database, conflicts
between users or departments can be resolved. Decisions are based on the best use of resources
for the organization as a whole rather than for an individual person.
9. Improved data accessibility and responsiveness- With integration in the database approach,
data access can cross departmental boundaries. This feature provides more functionality and
better services to the users.
10. Improved maintenance- The database provides data independence. Because a change of data
structure in the database does not affect the application programs, database application
maintenance is simplified.
11. Increased concurrency- The database can manage concurrent data access effectively. It ensures that
users do not interfere with one another in ways that would cause loss of information or loss of
integrity.
12. Improved backup and recovery services- Modern database management systems provide facilities
to minimize the amount of processing that is lost following a failure by using the transaction
approach.

1.4 Components of a database management system

There are five major, interrelated components in the database system environment:

1. Hardware
2. Software
3. Data
4. Users
5. Procedures

1. Hardware: The hardware is the actual computer system used for keeping and accessing the database.
Conventional DBMS hardware consists of secondary storage devices, usually hard disks, on which the
database physically resides, together with the associated input/output devices, device controllers and so
forth. Databases run on a range of machines, from microcomputers to large mainframes. Other hardware
issues for a DBMS include database machines, which are hardware designed specifically to support a
database system.

2. Software: The software is the actual DBMS. Between the physical database itself (i.e. the data as
actually stored) and the users of the system is a layer of software, usually called the Database Management
System or DBMS. All requests from users for access to the database are handled by the DBMS. One
general function provided by the DBMS is thus the shielding of database users from complex
hardware-level detail.
The DBMS allows the users to communicate with the database. In a sense, it is the mediator between the
database and the users. The DBMS controls access and helps to maintain the consistency of the data.
Utilities are usually included as part of the DBMS. Some of the most common utilities are report writers
and application development tools.

3. Data: This is the most important component of the DBMS environment from the end user's point of
view. Data acts as a bridge between the machine components and the user components.
The database contains both the operational data and the metadata, the 'data about data'.
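
The difference between operational data and metadata can be seen by querying both. The sketch assumes Python's sqlite3 module, whose catalog table is called sqlite_master; other DBMSs keep their metadata in differently named catalogs. The student table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES (1, 'Amina')")

# Operational data: the rows themselves.
data = conn.execute("SELECT id, name FROM student").fetchall()

# Metadata: SQLite's catalog table lists the objects stored in the database...
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
# ...and PRAGMA table_info describes each column (cid, name, type, notnull, dflt_value, pk).
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(student)")]

print(data, tables, columns)
```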

The database should contain all the data needed by the organization. One of the major features of databases
is that the actual data are separated from the programs that use the data. A database should always be
designed, built and populated for a particular audience and for a specific purpose.

4. Users: Users access or retrieve data on demand using the applications and interfaces provided by the
DBMS. Each type of user needs different software capabilities. The users of a database system can be
classified into groups depending on their degree of expertise or their mode of interaction with the
DBMS.

5. Procedures: Procedures refer to the instructions and rules that govern the design and use of the
database. The users of the system and the staff who manage the database require documented procedures
on how to use or run the system.

These may consist of instructions on how to:

1. Log on to the DBMS.
2. Use a particular DBMS facility or application program.
3. Start and stop the DBMS.
4. Make backup copies of the database.
5. Handle hardware or software failures.
6. Change the structure of a table, reorganize the database across multiple disks, improve performance,
or archive data to secondary storage.

1.5 Classification of database Systems

Based on the data model

Relational database – This is the most popular data model used in industry. Relational databases are
queried using SQL. They are table oriented: data is stored in separate tables, each with a key field that
identifies each row. The tables are called relations, rows are called records or tuples, and columns are
referred to as attributes or fields. Examples are MySQL (Oracle, open source), Oracle Database (Oracle),
Microsoft SQL Server (Microsoft) and DB2 (IBM).

Object oriented database – The information is stored in the form of objects, as used in object oriented
programming. It adds database functionality to object programming languages. It requires less code, uses
more natural data modelling, and makes code bases easier to maintain. An example is ObjectDB (ObjectDB
Software).

Object relational database – Relational DBMSs are evolving continuously and have been
incorporating many concepts developed in object databases, leading to a new class called extended
relational databases or object relational databases.

Hierarchical database – Records are organized in parent/child relationships, giving a structure similar to
a tree. The data is stored as a series of records, each with a set of field values attached to it. Hierarchical
databases are used in industry on mainframe platforms. Examples are IMS (IBM) and the Windows
Registry (Microsoft).
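
The tree structure of a hierarchical database, and the way data is reached by navigating from the root, can be sketched with nested Python dictionaries. The record names below are invented; real hierarchical systems such as IMS use their own storage formats and access languages.

```python
# Each record has exactly one parent; access navigates from the root downwards.
college = {
    "name": "College",
    "children": [
        {"name": "ICT Department", "children": [
            {"name": "Diploma in ICT", "children": []},
        ]},
        {"name": "Business Department", "children": []},
    ],
}


def find_path(node, target, path=()):
    """Return the root-to-target path of record names, or None if absent."""
    path = path + (node["name"],)
    if node["name"] == target:
        return path
    for child in node["children"]:
        found = find_path(child, target, path)
        if found:
            return found
    return None


print(find_path(college, "Diploma in ICT"))
```

This is the navigational style of access that the relational model replaced with searching by content.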

Network database – Mainly used on large computers such as mainframes. It is efficient when the data has
many interconnections. It is similar to a hierarchical database, but a record can have more than one parent,
so the records look like a cobweb or interconnected network. Examples are CA-IDMS (Computer
Associates) and IMAGE (HP).

Based on the number of users

Single user – As the name indicates, it supports only one user at a time. It is mostly used on a
personal computer, where the data is accessible to a single person. The user may design, maintain
and write the database programs.

Multiple users – It supports multiple users concurrently. Data can be both integrated and shared. A
database is integrated when the same information does not need to be recorded in two places. For
example, a college keeps a student's information in the database, and it must be accessible to all the
departments that deal with the student, such as the library and the fees section. With an integrated
database, the data resides in only one place, yet both departments have access to it.

Page 15 of 19 | P a g e
Based on the sites over which network is distributed

Centralized database system – The DBMS and database are stored at the single site that is used by
several other systems too. We can simply say that data here is maintained on the centralized server.

Parallel network database system – This system improves processing and input/output speeds by
using multiple central processing units and data storage disks in parallel. It is mainly used in
applications that query large databases.

Distributed database system – In this system the data and the DBMS software are distributed over
several sites that are connected by a computer network.

Based on the usage

Online transaction processing (OLTP) DBMS – They manage the operational data. The database server
must be able to process many simple transactions per unit of time. Transactions are initiated in real
time and simultaneously by many users and applications, so the workload is a high volume of short,
simple queries.

Online analytical processing (OLAP) DBMS – They use the operational data for tactical and strategic
decision making. They have a limited number of users, who deal with huge amounts of data and
complex queries.
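
The contrast between the two workloads can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in DBMS and a hypothetical sales table; a real OLTP or OLAP system would be far larger.

```python
import sqlite3

# Hypothetical sales table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, branch TEXT, amount REAL)")

# OLTP style: many short, simple transactions, each touching one row.
for branch, amount in [("Embu", 120.0), ("Embu", 80.0), ("Nairobi", 300.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (branch, amount) VALUES (?, ?)", (branch, amount)
        )

# OLAP style: one heavier query that scans and summarises many rows.
totals = conn.execute(
    "SELECT branch, SUM(amount) FROM sales GROUP BY branch ORDER BY branch"
).fetchall()
print(totals)
```

The inserts stand for the operational side (one sale at a time), while the grouped query stands for the analytical side (a summary over all sales).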

Big data and analytics DBMS – To cope with big data, new database technologies have been introduced.
One of these is NoSQL (not only SQL), which abandons the well-known relational database scheme.

Multimedia DBMS – Stores data such as text, images, audio, video and 3D games, which are usually
stored as binary large objects (BLOBs).

1.6 Advantages of DBMS

1. Control of data redundancy- Although the database approach does not remove redundancy
completely, it controls the amount of redundancy in the database.
2. Data consistency- By removing or controlling redundancy, the database approach reduces the risk
of inconsistencies occurring. It ensures all copies of the data are kept consistent.
3. Sharing of data- Database belongs to the entire organization and can be shared by all authorized
users.
4. Improved data integrity- Database integrity provides the validity and consistency of stored data.
Integrity is usually expressed in terms of constraints, which are consistency rules that the database
is not permitted to violate.
5. Improved security- Provides protection of data from unauthorized users. It requires user names
and passwords to identify the user type and their access rights for operations including retrieval,
insertion, updating and deletion.
6. Enforcement of standards- The integration of the database enforces the necessary standards
including data formats, naming conventions, documentation standards, update procedures and
access rules.
7. Economy of scale- Cost savings can be obtained by combining all organization's operational data
into one database with applications to work on one source of data.
8. Balance of conflicting requirements- By having a structural design in the database, the conflicts
between users or departments can be resolved. Decisions will be based on the best use of resources
for the organization as a whole rather than for an individual person.
9. Improved data accessibility and responsiveness- Because the database approach integrates the
data, data access can cross departmental boundaries. This provides more functionality and
better services to the users.
10. Improved maintenance- Provides data independence. Because a change of data structure in the
database does not have to affect the application programs, database application maintenance is
simplified.
11. Increased concurrency- The database can manage concurrent data access effectively. It ensures
that users do not interfere with one another in a way that would result in loss of information or loss
of integrity.

Disadvantages of DBMS

The disadvantages of the database approach are summarized as follows:

1. Complexity: The provision of the functionality that is expected of a good DBMS makes the DBMS an
extremely complex system.
2. Size: The complexity and breadth of functionality makes the DBMS an extremely large piece of
software, occupying many megabytes of disk space and requiring substantial amounts of memory to run
efficiently.

3. Performance: A DBMS is written to be general, i.e. to cater for many applications rather than
just one. The effect is that some applications may not run as fast as they would in a file-based system.

4. Higher impact of a failure: The centralization of resources increases the vulnerability of the system.
Since all users and applications rely on the availability of the DBMS, the failure of any component can
bring operations to a halt.

5. Cost of DBMS: The cost of DBMS varies significantly, depending on the environment and functionality
provided. There is also the recurrent annual maintenance cost.

6. Cost of Conversion: In some situations, the cost of the DBMS and extra hardware may be insignificant
compared with the cost of converting existing applications to run on the new DBMS and hardware. This
cost also includes the cost of training staff to use these new systems and possibly the employment of
specialist staff to help with conversion and running of the system. This cost is one of the main reasons why
some organizations feel tied to their current systems and cannot switch to modern database technology.

1.7 Role of key players in database design and development

DATABASE ADMINISTRATOR

A database administrator (DBA) is an IT professional responsible for the installation, configuration,
upgrading, administration, monitoring, maintenance, and security of databases in an organization.
The role includes the development and design of database strategies, system monitoring and improving
database performance and capacity, and planning for future expansion requirements. They may also
plan, co-ordinate and implement security measures to safeguard the database.
Duties

A database administrator's responsibilities can include the following tasks:

 Installing and upgrading the database server and application tools
 Allocating system storage and planning future storage requirements for the database system
 Modifying the database structure, as necessary, from information given by application developers
 Enrolling users and maintaining system security
 Ensuring compliance with database vendor license agreement
 Controlling and monitoring user access to the database
 Monitoring and optimizing the performance of the database
 Planning for backup and recovery of database information
 Maintaining archived data
 Backing up and restoring databases
 Contacting database vendor for technical support
 Generating various reports by querying from database as per need

DATABASE DESIGNER

The database designer defines the tables, indexes, views, constraints, triggers, stored procedures, table
spaces or storage parameters, and other database-specific constructs needed to store, retrieve, and delete
persistent objects.

DATABASE ANALYST

Database Analyst: Maintains data storage and access by designing physical databases.

Database Analyst Duties:

 Confirms project requirements by studying user requirements; conferring with others on project team.
 Maintains data dictionary by revising and entering definitions.
 Maintains client confidence and protects operations by keeping information confidential.
 Maintains technical knowledge by attending educational workshops; reviewing publications;
establishing personal networks; participating in technical societies.
 Ensures operation of equipment by completing preventive maintenance requirements; following
manufacturer's instructions; troubleshooting malfunctions; calling for repairs; evaluating new equipment
and techniques.
 Contributes to team effort by accomplishing related results as needed.
 Determines changes in physical database by studying project requirements; identifying database
characteristics, such as location, amount of space, and access method.
 Changes database system by coding database descriptions.
 Protects database by developing access system; specifying user level of access.
 Maintains user reference by writing and rewriting database descriptions.

DATABASE DEVELOPER

Design, develop and implement database systems based on customer requirements.

Optimize database systems for performance efficiency.

Prepare design specifications and functional documentations for assigned database projects.

Perform space management and capacity planning for database systems.

Develop database tables and dictionaries.

Ensure data quality and integrity in databases.

Identify any issues related to database performance and provide corrective measures.

Create complex functions, scripts, stored procedures and triggers to support application development.

Participate in database design and architecture to support application development projects.

Perform data back-up and archival on regular basis.

Test databases and perform bug fixes.

Troubleshoot database related issues in a timely fashion.

Develop security procedures to protect databases from unauthorized usage.

Evaluate existing database and recommend improvements for performance efficiency.

Develop best practices for database design and development activities.

End Users

 Naive Users: Naive Users are those users who need not be aware of the presence of the database
system or any other system supporting their usage. Naive users are end users of the database who
work through a menu driven application program, where the type and range of response is always
indicated to the user.
 A user of an Automatic Teller Machine (ATM) falls in this category. The user is instructed through
each step of a transaction. He or she then responds by pressing a coded key or entering a numeric
value. The operations that can be performed by naive users are very limited and affect only a
precise portion of the database. For example, in the case of the user of the Automatic Teller
Machine, the user's action affects only one or more of his/her own accounts.
 Online Users: Online users are those who may communicate with the database directly via an
online terminal or indirectly via a user interface and application program. These users are aware of
the presence of the database system and may have acquired a certain amount of expertise within the
limited interaction permitted with a database.

 Sophisticated Users: Such users interact with the system without writing programs. Instead, they
form their requests in a database query language. Each such query is submitted to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands.
 Specialized Users: Such users write specialized database applications that do not fit
into the traditional data-processing framework. Examples are computer-aided design systems,
knowledge base and expert systems, and systems that store data with complex data types (for example,
graphics data and audio data).
 Application Programmers: Professional programmers who are responsible for developing
application programs or user interfaces. The application programs could be written using a general
purpose programming language or the commands available to manipulate a database.
 Sophisticated Users - They are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application or programs to request the
database. They directly interact with the database by means of query language like SQL. These
users will be scientists, engineers, analysts who thoroughly study SQL and DBMS to apply the
concepts in their requirement. In short, we can say this category includes designers and developers
of DBMS and SQL.
 Stand-alone Users - These users have a stand-alone database for their personal use. Such
databases usually come as ready-made database packages with menus and graphical
interfaces.
 Naïve or parametric Users - These are the users who use an existing application to interact with the
database. Examples are online library systems, ticket booking systems and ATMs, where an existing
application is used to interact with the database and fulfil the user's requests.

2.0: Database organization

Definition: Data organization, in broad terms, refers to the method of classifying and organizing data sets
to make them more useful.

Database organization is the physical arrangement of a database management system that makes the
system more useful in an organization. The choice of organization depends mainly on the size of the
organization and on the level of security required.

2.1 Centralized database

A centralized database (sometimes abbreviated CDB) is a database that is located, stored, and
maintained in a single location. This location is most often a central computer or database system, for
example a desktop or server computer, or a mainframe computer. In most cases, a centralized database
would be used by an organization (e.g. a business company) or an institution (e.g. a university). Users
access a centralized database through a computer network which gives them access to the central
computer, which in turn maintains the database itself.

Advantages

1. Data integrity is maximized and data redundancy is minimized as the single storing place of all the
data also implies that a given set of data only has one primary record. This aids in the maintaining
of data as accurate and as consistent as possible and enhances data reliability.
2. Generally better data security, as the single data storage location means there is only one possible
place from which the database can be attacked and sets of data can be stolen or tampered with.
3. Better data preservation than other types of databases due to an often-included fault-tolerant setup.
4. Easier for the end-user to use due to the simplicity of having a single database design.
5. Generally easier data portability and database administration.
6. More cost effective than other types of database systems as labor, power supply and maintenance
costs are all minimized.
7. Data kept in the same location is easier to change, re-organize, mirror, or analyze.
8. All the information can be accessed at the same time from the same location.
9. Updates to any given set of data are immediately received by every end-user.

Disadvantages

1. Centralized databases are highly dependent on network connectivity. The slower the internet
connection is, the longer the database access time will be.
2. Bottlenecks can occur as a result of high traffic.
3. Limited access by more than one person to the same set of data as there is only one copy of it and it
is maintained in a single location. This can lead to major decreases in the general efficiency of the
system.
4. If there is no fault-tolerant setup and a hardware failure occurs, all the data within the database
will be lost.
5. Since there is minimal to no data redundancy, a set of data that is unexpectedly lost is very hard to
retrieve; in most cases it would have to be restored manually.

2.2 Client - Server Architecture

The data processing is split into distinct parts. Each part is either a requester (client) or a provider
(server). During data processing, the client sends one or more requests to the servers to perform
specified tasks. The server part provides services for the clients.

Advantages of Client Server Architecture

 Centralization: Unlike P2P, where there is no central administration, this architecture has
centralized control. Servers help in administering the whole set-up. Access rights and resource
allocation are handled by the servers.
 Proper Management: All the files are stored in the same place, which makes them easier to manage
and to find.
 Back-up and Recovery possible: As all the data is stored on the server, it is easy to make a back-up
of it. If data is lost in a break-down, it can be recovered easily and efficiently. In peer computing,
by contrast, a back-up has to be taken at every workstation.
 Scalability in Client-server set-up: Changes can be made easily by just upgrading the server. Also,
new resources and systems can be added by making the necessary changes in the server.
 Accessibility: The server can be accessed remotely from various platforms in the network.
 As new information is uploaded in database, each workstation need not have its own storage
capacities increased (as may be the case in peer-to-peer systems). All the changes are made only in
central computer on which server database exists.
 Security: Rules defining security and access rights can be defined at the time of set-up of server.
 Servers can play different roles for different clients.

Disadvantages of Client Server Architecture v/s P-2-P Technology

 Congestion in Network: Too many requests from the clients may lead to congestion, which rarely
takes place in a P2P network. Overload can lead to the servers breaking down. In peer-to-peer, the
total bandwidth of the network increases as the number of peers increases.
 Client-Server architecture is not as robust as P2P: if the server fails, the whole network
goes down. Also, if a file download from the server is interrupted by some error, the download stops
altogether, whereas in a P2P network other peers could have provided the missing parts of the
file.
 Cost: It is very expensive to install and manage this type of computing.
 You need professional IT people to maintain the servers and other technical details of network.

Types of client server database architecture

Two-tier Client / Server architecture - A two-tier architecture is a software architecture in which a
presentation layer or interface runs on a client, and a data layer or data structure is stored on a server.
Separating these two components into different locations is what distinguishes two-tier architecture
from single-tier architecture. Other kinds of multi-tier architectures add further layers in distributed
software design.

Disadvantages:

 In two-tier architecture, application performance degrades as the number of users increases.
 Cost-ineffective.
 Tightly coupled.
 Not easy to scale.
 Performance degrades as the system scales.

3-tier client/server architecture - is a type of software architecture which is composed of three “tiers” or
“layers” of logical computing. It is often used in applications as a specific type of client-server
system. 3-tier architectures provide many benefits for production and development environments by
modularizing the user interface, business logic, and data storage layers. Doing so gives greater flexibility to
development teams by allowing them to update a specific part of an application independently of the other
parts. This added flexibility can improve overall time-to-market and decrease development cycle times by
giving development teams the ability to replace or upgrade independent tiers without affecting the other
parts of the system.

 Presentation Tier- The presentation tier is the front end layer in the 3-tier system and consists of
the user interface. This user interface is often a graphical one, accessible through a web browser or
web-based application, which displays content and information useful to an end user. This tier is
often built on web technologies such as HTML5, JavaScript, CSS, or through other popular web
development frameworks, and communicates with other layers through API calls.
 Application Tier- The application tier contains the functional business logic which drives an
application’s core capabilities. It’s often written in Java, .NET, C#, Python, C++, etc.
 Data Tier- The data tier comprises the database/data storage system and data access layer.
Examples of such systems are MySQL, Oracle, PostgreSQL, Microsoft SQL Server, MongoDB,
etc. Data is accessed by the application layer via API calls.
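
The separation of the three tiers can be sketched inside a single program. This is a minimal illustration only: in a real deployment each tier would normally run on its own machine or process, and the students table, the function names and the fee-clearance rule are all hypothetical. Python's built-in sqlite3 module stands in for the data tier.

```python
import sqlite3

# --- Data tier: stores and retrieves data (hypothetical students table) ---
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, fee_balance REAL)"
    )
    conn.execute(
        "INSERT INTO students VALUES (1, 'Achieng', 0.0), (2, 'Mwangi', 1500.0)"
    )
    return conn

def fetch_balance(conn, student_id):
    row = conn.execute(
        "SELECT fee_balance FROM students WHERE id = ?", (student_id,)
    ).fetchone()
    return None if row is None else row[0]

# --- Application tier: business logic (hypothetical rule: no fee arrears) ---
def can_sit_exam(conn, student_id):
    balance = fetch_balance(conn, student_id)
    return balance is not None and balance == 0

# --- Presentation tier: turns the result into something a user can read ---
def render(conn, student_id):
    return "Cleared for exams" if can_sit_exam(conn, student_id) else "Not cleared"

conn = open_database()
print(render(conn, 1))
print(render(conn, 2))
```

Because the presentation function never issues SQL itself, the storage system could be swapped for another DBMS by changing only the data-tier functions.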

Advantages

1. High performance, lightweight persistent objects.
2. Scalability – Each tier can scale horizontally.
3. Performance – Because the Presentation tier can cache requests, network utilization is minimized,
and the load is reduced on the Application and Data tiers.
4. Better re-usability.
5. Improved data integrity.
6. Improved security – The client does not have direct access to the database.
7. Forced separation of user interface logic and business logic.
8. Business logic sits on a small number of centralized machines (maybe just one).
9. Easy to maintain, manage and scale; loosely coupled.

2.3 Distributed Database Systems

A distributed database is a collection of multiple interconnected databases, which are spread physically
across various locations that communicate via a computer network.

Features

 Databases in the collection are logically interrelated with each other. Often they represent a single
logical database.
 Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
 The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.

Distributed Database Management System

A distributed database is a database that consists of two or more files located in different sites either on the
same network or on entirely different networks. Portions of the database are stored in multiple physical
locations and processing is distributed among multiple database nodes. A distributed database management
system (DDBMS) is a centralized software system that manages a distributed database in a manner as if it
were all stored in a single location.

Features

 It is used to create, retrieve, update and delete distributed databases.
 It synchronizes the database periodically and provides access mechanisms by virtue of which
the distribution becomes transparent to the users.
 It ensures that the data modified at any site is universally updated.
 It is used in application areas where large volumes of data are processed and accessed by numerous
users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.

Factors Encouraging DDBMS

The following factors encourage moving over to DDBMS −


 Distributed Nature of Organizational Units − Most organizations in the current times are
subdivided into multiple units that are physically distributed over the globe. Each unit requires its
own set of local data. Thus, the overall database of the organization becomes distributed.
 Need for Sharing of Data − the multiple organizational units often need to communicate with each
other and share their data and resources. This demands common databases or replicated databases
that should be used in a synchronized manner.
 Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online Analytical
Processing (OLAP) work upon diversified systems which may have common data. Distributed
database systems aid both these processing by providing synchronized data.
 Database Recovery − One of the common techniques used in DDBMS is replication of data across
different sites. Replication of data automatically helps in data recovery if the database at any site is
damaged. Users can access data from other sites while the damaged site is being reconstructed.
Thus, database failure may become almost inconspicuous to users.
 Support for Multiple Application Software − Most organizations use a variety of application
software each with its specific database support. DDBMS provides a uniform functionality for
using the same data among different platforms.
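
The recovery benefit of replication mentioned in the list above can be sketched as follows. This is a toy illustration under stated assumptions: two in-memory SQLite databases stand in for two sites, the site names and the accounts table are hypothetical, and every write is simply applied to both copies. A real DDBMS coordinates replicas with dedicated protocols rather than a plain loop.

```python
import sqlite3

def make_site():
    # Each "site" is an independent database holding a full copy of the data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    return conn

sites = {"site_a": make_site(), "site_b": make_site()}

def replicated_write(sql, params):
    # Apply the same change at every site so that the copies stay in step.
    for conn in sites.values():
        with conn:
            conn.execute(sql, params)

def read_balance(account_id, failed=()):
    # Read from the first site that is not marked as failed.
    for name, conn in sites.items():
        if name in failed:
            continue
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        return name, (row[0] if row else None)
    raise RuntimeError("no site available")

replicated_write("INSERT INTO accounts VALUES (?, ?)", (1, 500.0))
print(read_balance(1))                      # served by the first site
print(read_balance(1, failed=("site_a",)))  # first site down, second site answers
```

The caller gets the same balance whether or not the first site is available, which is what makes a site failure almost invisible to users.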
Types of Distributed Databases
Distributed databases can be broadly classified into homogeneous and heterogeneous distributed database
environments, each with further sub-divisions.
Homogeneous Distributed Databases

In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −
1. The sites use very similar software.
2. The sites use identical DBMS or DBMS from the same vendor.
3. Each site is aware of all other sites and cooperates with other sites to process user requests.
4. The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database

There are two types of homogeneous distributed database −

 Autonomous − each database is independent and functions on its own. They are integrated by a
controlling application and use message passing to share data updates.
 Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.

Heterogeneous Distributed Databases

In a heterogeneous distributed database, different sites have different operating systems, DBMS products
and data models. Its properties are −

Different sites use dissimilar schemas and software.

The system may be composed of a variety of database models like relational, network, hierarchical or
object oriented.

Query processing is complex due to dissimilar schemas.

Transaction processing is complex due to dissimilar software.

A site may not be aware of other sites and so there is limited co-operation in processing user requests.

Types of Heterogeneous Distributed Databases

Federated − the heterogeneous database systems are independent in nature and integrated together so that
they function as a single database system.

Un-federated − the database systems employ a central coordinating module through which the databases
are accessed.

Advantages of Distributed Databases

Following are the advantages of distributed databases over centralized databases.
 Modular Development/scalability − If the system needs to be expanded to new locations or new units,
in centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers and
local data to the new site and finally connecting them to the distributed system, with no interruption in
current functions.
 More Reliable − In case of database failures, the total system of centralized databases comes to a halt.
However, in distributed systems, when a component fails, the system continues to function, possibly
at reduced performance. Hence DDBMS is more reliable.
 Better Response − If data is distributed in an efficient manner, then user requests can be met from local
data itself, thus providing faster response.
 Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not feasible
in centralized systems.

Disadvantages of Distributed Databases

Following are some of the adversities associated with distributed databases.


1. Need for complex and expensive software − DDBMS demands complex and often expensive
software to provide data transparency and co-ordination across the several sites.
2. Processing overhead − Even simple operations may require a large number of communications and
additional calculations to provide uniformity in data across the sites.
3. Data integrity − The need for updating data in multiple sites poses problems of data integrity.
4. Overheads for improper data distribution − Responsiveness of queries is largely dependent upon
proper data distribution. Improper data distribution often leads to very slow response to user
requests.
5. Security risk − Data is accessed over the network or the internet, which may be insecure.

3.0 Principles and techniques of database design

3.1 Meaning

Database design is the organization of data according to a database model. The designer determines what
data must be stored and how the data elements interrelate. With this information, they can begin to fit the
data to the database model. Database design involves classifying data and identifying interrelationships.

3.2 Database design cycle

There are six main objectives which must be fulfilled effectively by a good database.

1. Usability
2. Extensibility
3. Data Integrity
4. Performance
5. Availability
6. Security
Let us discuss each of these in a little more detail.

Usability
Any information stored by an organization should be meaningful to that organization. Storing data
that does not fit the organization's requirements is simply a waste of resources.

The primary objective of any information system should be to meet the organization's requirements.
The following are a few points to consider when starting to design an architecture.

1. Gather the requirements in proper detail.
2. See how the information can fit the requirements.
3. Trace a requirements matrix to capture the mapping between the information architecture and the
requirements.
4. Keep the organization of the data simple.
5. Decide on a data format that can easily be converted to a meaningful representation.

Extensibility/Scalability
New business requirements come up every day, and the information system constantly needs to be
changed or enhanced to capture them. The information design should therefore be extensible, so
that it can adopt new requirements without much effort and without major breaking changes.

If the initial design is too complex or unorganized, it becomes difficult to adopt new things
effectively.

The following are a few points to consider when thinking of extensibility.


 Normalization and correct handling of optional data.
 Generalization of entities when designing the schema.
 Data-driven designs that not only model the data but also enable the organization to store the
behavioral patterns or flow which can be hooked up in different stages of information processing.
 A well-defined abstraction layer that decouples the database from all client access, including client
apps, middle tiers, ETL, and reports.
 Extensibility is also closely related to simplicity. Complexity breeds complexity. A simple solution
is easy to understand and adopt, and ultimately, easy to adjust later.

Data Integrity
By this point it is clear that information is very important for any organization. Every organization
bases its strategies and growth decisions on historic information. One small mistake in the data can
lead to major problems in an organization's key decisions and hence poses a big risk to growth.

When designing a good information system, we must keep the integrity and correctness of data in mind.
The system should be smart enough to detect incorrect or missing data attributes and then either take
corrective action or reject the data outright. Incorrect data should not be present in the system, or at
least should not be exposed to individuals, where it could create misunderstanding.

Data integrity can be of many types.

Entity Integrity

Involves the structure (primary key and its attributes) of the entity. If the primary key is unique and all
attributes are scalar and fully dependent on the primary key, then the integrity of the entity is good. In the
physical schema, the table’s primary key enforces entity integrity.

Domain Integrity

Domain integrity requires that data is of the correct type and that optional data is handled correctly.
Nullability should be applied to those attributes which are optional for the organization. Proper data
types should be defined for the different attributes, based on the organization's requirements, so that
only correctly formatted data is present in the system.

Referential Integrity

Referential integrity requires that if an entity depends on another one, the parent entity must exist in
the system and be uniquely identifiable. This is enforced by implementing foreign keys.
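
Entity, domain and referential integrity can be seen working together in a small sketch. It assumes Python's built-in sqlite3 module as the DBMS and hypothetical department and student tables; note that SQLite only enforces foreign keys when they are switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,   -- entity integrity: unique identifier
        name    TEXT NOT NULL          -- domain integrity: required text value
    )""")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        phone      TEXT,               -- optional attribute, so NULL is allowed
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id)  -- referential integrity
    )""")

conn.execute("INSERT INTO department VALUES (1, 'ICT')")
conn.execute("INSERT INTO student VALUES (100, 'Wanjiru', NULL, 1)")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

results = [
    rejected("INSERT INTO student VALUES (100, 'Otieno', NULL, 1)"),  # duplicate primary key
    rejected("INSERT INTO student VALUES (101, NULL, NULL, 1)"),      # required name missing
    rejected("INSERT INTO student VALUES (102, 'Kamau', NULL, 99)"),  # parent department absent
]
print(results)
```

Each of the three rejected rows breaks a different rule, so the DBMS itself, rather than the application, keeps the incorrect data out.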

Transactional Integrity

Transactional integrity requires every transaction to be atomic, consistent, isolated and durable. The
quality of a database product is measured by its transactions’ adherence to these ACID properties:
Atomic — all or nothing

Consistent — the database begins and ends the transaction in a consistent state

Isolated — one transaction does not affect another transaction

Durable — once committed always committed
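
The atomic property can be observed with a small experiment. The sketch below is not part of the original notes: it assumes Python's built-in sqlite3 module and a hypothetical account table, and shows a transfer that either completes fully or leaves no trace.

```python
import sqlite3

# Hypothetical table for illustration; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(1, 2, 30)       # both updates succeed and are committed together
failed = transfer(1, 2, 500)  # the debit breaks the CHECK rule, so the credit is undone too
print(ok, failed, balances())
```

In the second call the credit to account 2 has already been applied when the debit fails; the rollback removes it again, so the database returns to the consistent state it was in before the transaction began.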

User defined integrity

Some business rules cannot be validated by primary keys, foreign keys and similar constraints alone. A
mechanism is needed to validate these more complex integrity rules. They can be implemented in the
following ways:

 Check Constraints
 Triggers & Stored Procedures
 Queries that identify incorrect data so that it can be handled correctly.
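
The three mechanisms listed above can be tried out side by side. The following is only a sketch under assumptions: it uses Python's sqlite3 module, a hypothetical invoice table, and two invented business rules. SQLite has no stored procedures, so only the check constraint, the trigger and the audit query are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    amount      INTEGER NOT NULL,
    paid_amount INTEGER NOT NULL DEFAULT 0,
    status      TEXT    NOT NULL DEFAULT 'open',
    CHECK (paid_amount <= amount)          -- check constraint: never overpaid
);
-- Trigger: a closed invoice may no longer be changed.
CREATE TRIGGER invoice_closed_is_read_only
BEFORE UPDATE ON invoice
WHEN OLD.status = 'closed'
BEGIN
    SELECT RAISE(ABORT, 'closed invoices are read-only');
END;
""")


def run(sql):
    """Execute one statement and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute(sql)
        return "accepted"
    except sqlite3.DatabaseError:
        return "rejected"


results = [
    run("INSERT INTO invoice VALUES (1, 100, 0, 'open')"),
    run("INSERT INTO invoice VALUES (2, 100, 150, 'open')"),   # violates the CHECK
    run("INSERT INTO invoice VALUES (3, 80, 20, 'closed')"),   # allowed, but suspicious
    run("UPDATE invoice SET paid_amount = 100, status = 'closed' WHERE invoice_id = 1"),
    run("UPDATE invoice SET amount = 500 WHERE invoice_id = 1"),  # blocked by the trigger
]

# Audit query: rules that no constraint enforces can still be checked after the fact.
suspicious = [row[0] for row in conn.execute(
    "SELECT invoice_id FROM invoice"
    " WHERE status = 'closed' AND paid_amount < amount ORDER BY invoice_id"
)]
print(results, suspicious)
```

The check constraint and the trigger stop bad data at the door, while the audit query finds data (invoice 3 here) that was accepted but still breaks a rule nobody encoded.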

Performance
Information should be readily available when requested, so system performance must be up to the mark.
Data volumes grow day by day, and performance will eventually suffer if the database design is poor or no
action is taken to improve it.

The following strategies can be applied as the data grows:

 A well-designed schema with normalization and generalization


 A sound indexing strategy, including careful selection of clustered and nonclustered indexes
 Tight, fast transactions that reduce locking and blocking
 Partitioning, which is useful for advanced scalability
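
The effect of an index can be made visible by asking the DBMS how it plans to run a query. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module and a hypothetical orders table. SQLite does not distinguish clustered from nonclustered indexes the way SQL Server does, so it only shows the general idea of an index replacing a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql):
    """Return the readable 'detail' column of EXPLAIN QUERY PLAN for sql."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT total FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # without an index, every row has to be scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # with the index, only the matching rows are looked up

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan reports a scan of the table and the second names the index idx_orders_customer.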

Availability
The availability of information refers to the information’s accessibility when required regarding uptime,
locations, and the availability of the data for future analysis. Disaster recovery, redundancy, archiving, and
network delivery all affect availability.

The following factors affect availability:

 Quality, redundant hardware


 SQL Server’s high-availability features
 Proper DBA procedures regarding data backup and backup storage
 Disaster recovery planning

Security
The level of security applied to any organizational asset should depend on its value and sensitivity.
Organizations have suffered badly from data leaks, which cause a loss of trust and create business risk.
Security is therefore one of the most important aspects of good database design.

We can enhance Security by the following:

 Physical security and restricted access of the data center


 Defensively coding against SQL injection
 Appropriate operating system security
 Reducing the surface area of SQL Server to only those services and features required
 Identifying and documenting ownership of the data
 Granting access according to the principle of least privilege, which is the concept that users should
have only the minimum access rights required to perform necessary functions within the database
 Cryptography — data encryption of live databases, backups, and data warehouses
 Metadata and data audit trails documenting the source and veracity of the data, including updates
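
Of the measures above, defensive coding against SQL injection is the one most directly in the programmer's hands. The sketch below assumes Python's sqlite3 module and a hypothetical users table; it contrasts a query built by string concatenation with a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Input typed by an attacker instead of a real user name.
malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause is executed.
unsafe_sql = "SELECT secret FROM users WHERE name = '" + malicious + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder sends the input as a value, never as SQL.
safe = conn.execute("SELECT secret FROM users WHERE name = ?", (malicious,)).fetchall()
normal = conn.execute("SELECT secret FROM users WHERE name = ?", ("alice",)).fetchall()

print(len(leaked), safe, normal)
```

The concatenated query returns every secret in the table, while the parameterized query simply finds no user with that odd name.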

Based on the above principles, one can start designing databases and architectures.

The Database Life Cycle

The Database Life Cycle: The Database Initial Study
Overall Purpose of the Initial Study:

Analyze the company situation.


Define problems and constraints.
Define objectives.
Define scope and boundaries.

The Database Life Cycle: Analyze the Company Situation


What is the organization's general operating environment, and what is its mission within that environment?

What is the organization's structure?

The Database Life Cycle: Define Problems and Constraints


How does the existing system function?
What input does the system require?
What documents does the system generate?
How is the system output used? By Whom?
What are the operational relationships among business units?
What are the limits and constraints imposed on the system?

The Database Life Cycle: Define the Objective
What is the proposed system's initial objective?
Will the system interfere with other existing or future systems in the company?
Will the system share the data with other systems or users?

The Database Life Cycle: Define Scope and Boundaries


Scope -- What is the extent of the design based on operational requirements?
Boundaries -- What are the limits?

Budget
Hardware and software
Extent of organizational change required

Database Design: Business vs Designer View

The Database Life Cycle: Conceptual Design


Data modeling is used to create an abstract database structure that represents real-world objects.
The design must be software- and hardware-independent.
Minimal data rule: All that is needed is there, and all that is there is needed.
Four Steps:

Data analysis and requirements


Entity relationship modeling and normalization
Data model verification
Distributed database design

The Database Life Cycle: Data analysis and requirements


Designer's efforts are focused on

Information needs.
Information users.
Information sources.
Information constitution.

The Database Life Cycle: Data analysis and requirements
Sources of information for the designer

Developing and gathering end user data views


Direct observation of the current system: existing and desired output
Interface with the systems design group

The designer must identify the company's business rules and analyze their impacts.

The Database Life Cycle: Entity Relationship Modeling and Normalization

The Database Life Cycle: Tools and Information Sources

The Database Life Cycle: Entity Relationship Modeling and Normalization

Define entities, attributes, primary keys, and foreign keys.


Make decisions about adding new primary key attributes in order to satisfy end user and/or processing
requirements.
Make decisions about the treatment of multivalued attributes.
Make decisions about adding derived attributes to satisfy processing requirements.
Make decisions about the placement of foreign keys in 1:1 relationships.
Avoid unnecessary ternary relationships.
Draw the corresponding E-R diagram.
Normalize the data model.
Include all the data element definitions in the data dictionary.
Make decisions about standard naming conventions.

The Database Life Cycle: Entity Relationship Modeling and Normalization


Some Good Naming Conventions:

Use descriptive entity and attribute names wherever possible.


Composite entities usually are assigned a name that is descriptive of the relationships they represent.
An attribute name should be descriptive and it should contain a prefix that helps identify the table in which
it is found.

The Database Life Cycle: Data Model Verification

Purposes of close review of entities and attributes


Adding attribute details may lead to a revision of the entities themselves.
Attribute details can provide clues about the nature of the relationships as they are defined by the primary
and foreign keys.
To satisfy processing and/or end user requirements, it might be useful to create a new primary key to
replace an existing primary key.
Unless the entity details are precisely defined, it is difficult to evaluate the extent of the design's
normalization.

The Database Life Cycle: Data Model Verification


Run the data model through a series of tests against:

End user data views and their required transactions


Access paths, security, concurrency control
Business imposed data requirements and constraints

Advantages of the Modular Approach


Defining the design's major components as modules allows:

Delegating modules to design groups, greatly speeding up the development work.


Simplifying the design work by reducing the number of entities within each module.
Modules can be prototyped quickly. Implementation and applications programming trouble spots can be
identified more readily.

Even if the entire system can't be brought on line quickly, implementation of one or more modules will
demonstrate that progress is being made and that at least part of the system is ready to begin serving the
end users.

E-R Model Verification Process


Identify E-R model's central entity: participates in most relationships, is the focus of most system
operations
Identify each module and its components
Identify each module's internal and external transaction requirements
Verify all processes against the E-R model
Revise as necessary

See Figure 6.10

The Database Life Cycle: Analyzing modules


During the E-R model verification process, the DB designer must:

Ensure the module's cohesiveness -- the strength of the relationships found among the module's entities.
Analyze each module's relationships with other modules to address module coupling -- the extent to which
modules are independent of one another.

Processes may be classified according to their:

Frequency (daily, weekly, monthly, yearly, or exceptions).


Operational type (INSERT or ADD, UPDATE or CHANGE, DELETE, queries and reports, batches, maintenance,
and backups).

All identified processes must be verified against the E-R model. If necessary, appropriate changes are implemented.

The Database Life Cycle: Distributed Database Design


Portions of a database may reside in different physical locations.
If the database process is to be distributed across the system, the designer must also develop the data distribution
and allocation strategies for the database.

The Database Life Cycle: Database Software Selection


Common factors affecting the decision:

Existing systems: if the organization already has a DBMS it may be wise to use it.
Cost -- Purchase, maintenance, operational, license, installation, training, and conversion costs.
DBMS features and tools.

Development tools such as screen painters, report generators etc.


Database Administration facilities
Performance and scalability
DBMS hardware requirements.

Underlying model (almost always relational, sometimes object oriented).


Portability -- Platforms, systems, and languages.

The Database Life Cycle: Logical Design
Logical design translates the conceptual design into the internal model for a selected DBMS.
It includes mapping of all objects in the model to the specific constructs used by the selected database software.
For a relational DBMS, the logical design includes the design of tables, obvious indexes, views, transactions, access
authorities, and so on.

The Database Life Cycle: Physical Design


Physical design is the process of selecting the data storage and data access characteristics of the database. It affects
not only the location of the data in the storage device(s) but also the performance.

The storage characteristics are a function of:


The types of devices supported by the hardware.
The type of data access methods supported by the system.
The DBMS.

Physical design is particularly important in the older hierarchical and network models and in very large databases.
Relational databases are more insulated from physical layer details than hierarchical and network models.

The Database Life Cycle: Implementation and Loading


Create the database storage group.
Create the database within the storage group.
Assign the rights to use the database to a database administrator.
Create the table space(s) within the database.
Create the table(s) within the table space(s).
Assign access rights to the table spaces and the tables within specified table spaces.
Load the data.
See Figure 6.12 for an example of DB2 storage architecture

The Database Life Cycle: Physical Design Issues


Performance
Security

Physical security
Access rights and security methods (e.g. Passwords, smartcards, biometrics)
Audit trails
Data encryption
Client/Server, thin clients, web enabled databases
Backup and Recovery
Integrity
Company standards
Concurrency controls

The Database Life Cycle: Testing and Evaluation


The testing and evaluation phase occurs in parallel with application programming.
Programmers use database tools (e.g., report generators, screen painters, and menu generators) to prototype the
applications during the coding of the programs.
Options to enhance the system if the implementation fails:

Fine-tuning the specific system and DBMS configuration parameters.


Modify physical design.
Upgrade or change the DBMS and hardware platform.

The Database Life Cycle: Operation


Once the database has passed the evaluation stage, it is considered to be operational.
The beginning of the operational phase invariably starts the process of system evolution.

The Database Life Cycle: Maintenance and Evolution


Preventive maintenance
Corrective maintenance
Adaptive maintenance
Assignment and maintenance of access permissions
Generation of database access statistics
Periodic security audits based on the system-generated statistics
Periodic system-usage summaries for internal billing or budgeting purposes.

Database Life Cycle and Systems Development Life Cycle

A Special Note about Database Design Strategies
Two Classical Approaches to Database Design:
Top-down design starts by identifying the data sets, and then defines the data elements for each of these sets.
Bottom-up design first identifies the data elements (items), and then groups them together in data sets.

Centralized vs Decentralized Design: Two Different Database Design Philosophies:

Centralized design
It is productive when the data component is composed of a relatively small number of objects and procedures.

Two Different Database Design Philosophies:


Decentralized design
It may be used when the data component of the system has a considerable number of entities and complex
relations on which very complex operations are performed. (Figure 6.16)

4.0 Relational database system

4.1 Meaning of relational database system

A Relational database management system (RDBMS) is a database management system (DBMS) that is
based on the relational model as introduced by E. F. Codd.

The data in an RDBMS is stored in database objects called tables. A table is a collection of related data
entries and consists of columns and rows.

4.2 Relational Database Characteristics

 Data in the relational database must be represented in tables, with values in columns within rows.
 Data within a column must be accessible by specifying the table name, the column name, and the value
of the primary key of the row.
 The DBMS must support missing and inapplicable information in a systematic way, distinct from
regular values and independent of data type.
 The DBMS must support an active on-line catalogue.
 The DBMS must support at least one language that can be used independently and from within
programs, and supports data definition operations, data manipulation, constraints, and transaction
management.
 Views must be updatable by the system.
 The DBMS must support insert, update, and delete operations on sets.
 The DBMS must support logical data independence.
 The DBMS must support physical data independence.
 Integrity constraints must be stored within the catalogue, separate from the application.
 The DBMS must support distribution independence. The existing application should run when the
existing data is redistributed or when the DBMS is redistributed.
 If the DBMS provides a low level interface (row at a time), that interface cannot bypass the integrity
constraints.

A RELATIONAL DATABASE SYSTEM COMPRISES:

 Relational data structure


 Relational Integrity constraints
 Relational algebra or its equivalent SQL

The Relational Data Model

The relational data model was introduced by E. F. Codd in 1970. Currently, it is the most widely used data
model.

The relational model has provided the basis for:

 Research on the theory of data/relationship/constraint


 Numerous database design methodologies
 The standard database access language called structured query language (SQL)
 Almost all modern commercial database management systems

The relational data model describes the database as “a collection of inter-related relations (or tables).”

Fundamental Concepts in the Relational Data Model


Relation

A relation, also known as a table or file, is a named subset of the Cartesian product of a list of domains.
Within a table, each row represents a group of related data values. A row, or record, is also known as a
tuple.

A column in a table is a field, also referred to as an attribute. You can also think of it this way: an
attribute is used to define the record, and a record contains a set of attributes.

Column

A database stores pieces of information or facts in an organized way. Understanding how to use and get the
most out of databases requires us to understand that method of organization.

The principal storage units are called columns or fields or attributes.

When deciding which fields to create, you need to think generically about your information, for example,
drawing out the common components of the information that you will store in the database and avoiding
the specifics that distinguish one item from another.

Domain

A domain is the original set of atomic values used to model data. By atomic value, we mean that each
value in the domain is indivisible as far as the relational model is concerned. For example:

 The domain of Marital Status has a set of possibilities: Married, Single, Divorced.
 The domain of Shift has the set of all possible days: {Mon, Tue, Wed…}.
 The domain of Salary is the set of all floating-point numbers greater than 0 and less than
200,000.
 The domain of First Name is the set of character strings that represents names of people.

In summary, a domain is a set of acceptable values that a column is allowed to contain. This is based on
various properties and the data type for the column. We will discuss data types in another chapter.

Records

Just as the content of any one document or item needs to be broken down into its constituent bits of data for
storage in the fields, the link between them also needs to be available so that they can be reconstituted into
their whole form. Records allow us to do this. Records contain fields that are related, such as a customer or
an employee. As noted earlier, a tuple is another term used for record.

Records and fields form the basis of all databases. A simple table gives us the clearest picture of how
records and fields work together in a database storage project.

The simple table example in Figure 7.3 shows us how fields can hold a range of different sorts of data. This
one has:

 A Record ID field: this is an ordinal number; its data type is an integer.
 A PubDate field: this is displayed as day/month/year; its data type is date.
 An Author field: this is displayed as Initial. Surname; its data type is text.
 A Title field: free text can be entered here; its data type is text.

You can command the database to sift through its data and organize it in a particular way. For example,
you can request that a selection of records be limited by date: 1. all before a given date, 2. all after a given
date or 3. all between two given dates. Similarly, you can choose to have records sorted by date. Because
the field containing the data is set up as a Date field, the database reads the information in the
Date field not just as numbers separated by slashes, but rather, as dates that must be ordered according to a
calendar system.
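
The difference between reading a date as text and reading it as a date can be sketched in a few lines. The example below is an illustration only: the PubDate values are hypothetical, and Python's datetime module stands in for the Date data type of a DBMS.

```python
from datetime import date, datetime

# Hypothetical PubDate values in the day/month/year display format.
pub_dates = ["05/11/2019", "23/01/2020", "14/03/2019"]


def parse(text):
    """Interpret a day/month/year string as a calendar date."""
    return datetime.strptime(text, "%d/%m/%Y").date()


sorted_as_text = sorted(pub_dates)              # compares characters left to right
sorted_as_dates = sorted(pub_dates, key=parse)  # compares calendar dates

# Selection "between two given dates", as described in the text.
start, end = date(2019, 6, 1), date(2020, 12, 31)
between = [d for d in sorted_as_dates if start <= parse(d) <= end]

print(sorted_as_text)
print(sorted_as_dates)
print(between)
```

Sorted as text, 05/11/2019 comes before 14/03/2019 because "0" sorts before "1"; sorted as dates, March correctly precedes November.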

Degree

The degree is the number of attributes in a table.

Properties of a Table

 A table has a name that is distinct from all other tables in the database.
 There are no duplicate rows; each row is distinct.
 Entries in columns are atomic. The table does not contain repeating groups or multivalued attributes.
 Entries from columns are from the same domain based on their data type including:
   o Number (numeric, integer, float, smallint,…)
   o character (string)
   o date
   o logical (true or false)
 Operations combining different data types are disallowed.
 Each attribute has a distinct name.
 The sequence of columns is insignificant.
 The sequence of rows is insignificant.
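
Several of these properties, together with the degree defined earlier, can be mimicked with ordinary Python data structures. The sketch below is an assumption-laden illustration: the Book relation reuses the field names of the simple table described above, but the rows are invented.

```python
from typing import NamedTuple


class Book(NamedTuple):
    """One row (tuple) of a hypothetical Book table; each attribute has a distinct name."""
    record_id: int
    pub_date: str
    author: str
    title: str


# A Python set has no duplicate members and no defined order, like the rows of a table.
books = {
    Book(1, "14/03/2019", "J. Smith", "Databases"),
    Book(2, "05/11/2019", "A. Otieno", "Networks"),
    Book(1, "14/03/2019", "J. Smith", "Databases"),  # duplicate row, absorbed by the set
}

same_rows_other_order = {
    Book(2, "05/11/2019", "A. Otieno", "Networks"),
    Book(1, "14/03/2019", "J. Smith", "Databases"),
}

degree = len(Book._fields)  # number of attributes
cardinality = len(books)    # number of distinct rows
print(degree, cardinality, books == same_rows_other_order)
```

The duplicate row disappears and the two sets compare equal regardless of the order in which the rows were written, which mirrors the "no duplicate rows" and "sequence of rows is insignificant" properties.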

Database Schema

A database schema is the skeleton structure that represents the logical view of the entire database. It
defines how the data is organized and how the relations among them are associated. It formulates all the
constraints that are to be applied on the data.

A database schema defines its entities and the relationships among them. It contains the descriptive details
of the database, which can be depicted by means of schema diagrams. Database designers design
the schema to help programmers understand the database and make it useful.

A database schema can be divided broadly into two categories −

 Physical Database Schema − this schema pertains to the actual storage of data and its form of
storage like files, indices, etc. It defines how the data will be stored in a secondary storage.
 The physical database schema gives the blueprint for how each piece of data is stored in the
database.
 Logical Database Schema − this schema defines all the logical constraints that need to be applied
on the data stored. It defines tables, views, and integrity constraints.
 The logical schema gives structure to the tables and relationships inside of the database. Generally
speaking, the logical schema is created before the physical schema.

Relational databases Integrity Rules and Constraints

Constraints are useful because they allow a designer to specify the semantics of data in the
database. Constraints are the rules that force DBMSs to check that data satisfies the semantics.

Domain Integrity

Domain Level Integrity

A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a
database system, the domain integrity is defined by:
 Data Type - Basic data types are integer, decimal, or character. Most databases support variants of
these plus special data types for date and time.
 Length - This is the number of digits or characters in the value. For example, a value of 5 digits or 40
characters.
 Date Format - The format for date values such as dd/mm/yy or mm/dd/yyyy or yy/mm/dd.
 Range - The range specifies the lower and upper boundaries of the values the attribute may legally
have.
 Constraints - Are special restrictions on allowable values. For example, the LeavingDate for an
Employee must always be greater than the HireDate for that Employee.
 Null support - Indicates whether the attribute can have null values.
 Default value (if any) - The value an attribute instance will have if a value is not entered.
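
Most of the elements listed above map directly onto column definitions. The following sketch assumes Python's sqlite3 module and a hypothetical employee table; the column names and limits are taken from the examples in these notes where possible (Marital Status, Salary, HireDate, LeavingDate) and are otherwise invented. SQLite applies declared data types loosely, so the rules are expressed mainly through NOT NULL, DEFAULT and CHECK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_id       INTEGER PRIMARY KEY,
    first_name   TEXT NOT NULL CHECK (length(first_name) <= 40),    -- length
    marital      TEXT NOT NULL DEFAULT 'Single'                     -- default value
                 CHECK (marital IN ('Married', 'Single', 'Divorced')),
    salary       REAL NOT NULL CHECK (salary > 0 AND salary < 200000),  -- range
    hire_date    TEXT NOT NULL,                                     -- yyyy-mm-dd text
    leaving_date TEXT,                                              -- null allowed
    CHECK (leaving_date IS NULL OR leaving_date > hire_date)        -- constraint
)
""")


def insert(**columns):
    """Try to insert one row; return True if every domain rule is satisfied."""
    names = ", ".join(columns)  # column names come from this script, not from users
    marks = ", ".join("?" for _ in columns)
    try:
        with conn:
            conn.execute(
                f"INSERT INTO employee ({names}) VALUES ({marks})",
                tuple(columns.values()),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid = insert(first_name="Amina", salary=85000.0, hire_date="2021-02-01")
out_of_range = insert(first_name="Brian", salary=250000.0, hire_date="2021-02-01")
bad_status = insert(first_name="Carol", marital="Unknown", salary=500.0, hire_date="2021-02-01")
missing_name = insert(salary=500.0, hire_date="2021-02-01")
left_before_hired = insert(
    first_name="Dan", salary=500.0, hire_date="2021-02-01", leaving_date="2020-01-01"
)
default_status = conn.execute(
    "SELECT marital FROM employee WHERE first_name = 'Amina'"
).fetchone()[0]
print(valid, out_of_range, bad_status, missing_name, left_before_hired, default_status)
```

Only the first insert is accepted, and because it gave no marital status the default value is stored.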
There are several kinds of integrity constraints, described below.

Entity integrity

Entity Integrity ensures that there are no duplicate records within the table and that the field that identifies
each record within the table is unique and never null.
The existence of the Primary Key is the core of the entity integrity. If you define a primary key for each
entity, they follow the entity integrity rule.
Entity integrity specifies that the Primary Keys on every instance of an entity must be kept, must be unique
and must have values other than NULL.
Although most relational databases do not specifically dictate that a table needs to have a Primary Key, it is
good practice to design a Primary Key for each table in the relational model. This mandates
no NULL content, so that every row in a table must have a value that denotes the row as a unique element
of the entity.

Entity Integrity is the mechanism the system provides to maintain primary keys. The primary key serves as
a unique identifier for rows in the table. Entity Integrity ensures two properties for primary keys:
 The primary key for a row is unique; it does not match the primary key of any other row in the table.
 The primary key is not null, no component of the primary key may be set to null.
The uniqueness property ensures that the primary key of each row uniquely identifies it; there are no
duplicates. The second property ensures that the primary key has meaning, has a value; no component of
the key is missing.
The system enforces Entity Integrity by not allowing operations (INSERT, UPDATE) to produce an
invalid primary key. Any operation that creates a duplicate primary key or one containing nulls is rejected.
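
The two properties can be checked against a real DBMS. The sketch below assumes Python's sqlite3 module and a hypothetical enrolment table with a composite primary key. NOT NULL is written out explicitly because SQLite, for historical reasons, otherwise accepts NULL in primary key columns that are not of type INTEGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE enrolment (
    student_id TEXT NOT NULL,
    course_id  TEXT NOT NULL,
    grade      TEXT,
    PRIMARY KEY (student_id, course_id)
)
""")


def attempt(sql, params):
    """Run one INSERT or UPDATE and report whether entity integrity allowed it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    attempt("INSERT INTO enrolment VALUES (?, ?, ?)", ("S1", "DB101", "A")),
    attempt("INSERT INTO enrolment VALUES (?, ?, ?)", ("S1", "NET201", None)),  # new key
    attempt("INSERT INTO enrolment VALUES (?, ?, ?)", ("S1", "DB101", "B")),    # duplicate key
    attempt("INSERT INTO enrolment VALUES (?, ?, ?)", ("S2", None, "C")),       # null component
    # An UPDATE that would turn (S1, NET201) into a second (S1, DB101).
    attempt("UPDATE enrolment SET course_id = ? WHERE course_id = ?", ("DB101", "NET201")),
]
row_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
print(results, row_count)
```

A null grade is accepted because grade is not part of the key, whereas a null in any component of the key is not.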

Referential integrity

Referential integrity requires that a foreign key must have a matching primary key or it must be null. This
constraint is specified between two tables (parent and child); it maintains the correspondence between rows
in these tables. It means the reference from a row in one table to another table must be valid.

Cascade actions:

Cascade: a cascade action propagates the delete or update operation on the parent key to each dependent
child key.

On delete cascade action: when a parent row is deleted, each row in the child table that was associated with
the deleted parent row is also deleted.

On delete restrict action: rejects the delete operation on the parent table if there is a related
foreign key value in the child table.

On delete set null: if a record in the parent table is deleted, the foreign key values in the corresponding
child records are set to null. The child records themselves are not deleted.
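
The three delete actions can be compared in one small database. This is a sketch under assumptions: it uses Python's sqlite3 module and hypothetical department, employee, project and asset tables. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign key enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES department (dept_id) ON DELETE SET NULL
);
CREATE TABLE project (
    proj_id INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department (dept_id) ON DELETE CASCADE
);
CREATE TABLE asset (
    asset_id INTEGER PRIMARY KEY,
    dept_id  INTEGER NOT NULL REFERENCES department (dept_id) ON DELETE RESTRICT
);
INSERT INTO department VALUES (1, 'ICT'), (2, 'Finance');
INSERT INTO employee VALUES (10, 1), (11, NULL);   -- a null foreign key is allowed
INSERT INTO project VALUES (100, 1);
INSERT INTO asset VALUES (1000, 2);
""")


def run(sql):
    """Execute one statement and report whether referential integrity allowed it."""
    try:
        with conn:
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


orphan = run("INSERT INTO employee VALUES (12, 99)")          # there is no department 99
restricted = run("DELETE FROM department WHERE dept_id = 2")  # asset 1000 still refers to it
deleted = run("DELETE FROM department WHERE dept_id = 1")     # triggers SET NULL and CASCADE

employees = conn.execute("SELECT emp_id, dept_id FROM employee ORDER BY emp_id").fetchall()
projects_left = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]
departments = [row[0] for row in conn.execute("SELECT dept_id FROM department")]
print(orphan, restricted, deleted, employees, projects_left, departments)
```

After department 1 is deleted, employee 10 remains with a null dept_id (set null), project 100 is gone (cascade), and department 2 survives because the restrict action refused its deletion.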

4.3 Relational algebra and relational calculus

Relational database systems are expected to be equipped with a query language that can assist their users
in querying the database instances.

There are two kinds of query languages:

1. Relational algebra
2. Relational calculus

Relational Algebra

Relational algebra is a procedural query language, which takes instances of relations as input and yields
instances of relations as output. It uses operators to perform queries.

An operator can be either unary or binary. They accept relations as their input and yield relations as their
output. Relational algebra is performed recursively on a relation and intermediate results are also
considered relations.

The fundamental operations of relational algebra are as follows

1. Select
2. Project
3. Union
4. Set difference
5. Cartesian product
6. Rename

Select Operation σ

It selects tuples that satisfy the given predicate from a relation.
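
The six fundamental operations can be sketched in a few lines each. The code below is a minimal illustration, not a definitive implementation: it assumes a relation is represented as a Python set of tuples, with each tuple stored as a frozenset of (attribute, value) pairs, and the students relation is hypothetical.

```python
def relation(*rows):
    """Build a relation from dicts: a set of tuples, each a frozenset of (attribute, value)."""
    return {frozenset(row.items()) for row in rows}


def select(r, predicate):
    """Select (sigma): keep the tuples that satisfy the predicate."""
    return {t for t in r if predicate(dict(t))}


def project(r, attrs):
    """Project (pi): keep only the named attributes; duplicates vanish automatically."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def union(r, s):
    """Union: tuples that appear in r, in s, or in both."""
    return r | s


def difference(r, s):
    """Set difference: tuples of r that do not appear in s."""
    return r - s


def product(r, s):
    """Cartesian product: every tuple of r paired with every tuple of s.
    The attribute names of r and s are assumed to be disjoint."""
    return {t | u for t in r for u in s}


def rename(r, mapping):
    """Rename (rho): change attribute names according to mapping."""
    return {frozenset((mapping.get(a, a), v) for a, v in t) for t in r}


students = relation(
    {"id": 1, "name": "Amina", "course": "ICT"},
    {"id": 2, "name": "Brian", "course": "ICT"},
    {"id": 3, "name": "Carol", "course": "Business"},
)

ict = select(students, lambda t: t["course"] == "ICT")
courses = project(students, {"course"})
others = difference(students, ict)
pairs = product(project(students, {"name"}), rename(courses, {"course": "offered"}))
print(len(ict), len(courses), len(others), len(pairs))
```

Every function takes relations and returns a relation, so the results can be fed straight into further operations, as the last expression does with project, rename and product.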

Join

A join is a combination of a Cartesian product followed by a selection process. A join operation pairs
two tuples from different relations if and only if a given join condition is satisfied. We will briefly
describe various join types in the following sections.

Theta θ Join

Theta join combines tuples from different relations provided they satisfy the theta condition. The
join condition is denoted by the symbol θ.

Notation

R1 ⋈θ R2

R1 and R2 are relations with attributes (A1, A2, ..., An) and (B1, B2, ..., Bm) such that the attributes
have nothing in common, that is, R1 ∩ R2 = ∅. Theta join can use all kinds of comparison
operators.

Example

Equijoin

When Theta join uses only equality comparison operator, it is said to be equijoin. The above example
corresponds to equijoin.
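
A theta join and an equijoin differ only in the comparison operator used in the condition. The sketch below illustrates this with hypothetical students and modules relations, represented as Python lists of dicts; it is an assumption made for illustration and not the example from the original notes.

```python
def theta_join(r, s, theta):
    """Cartesian product of r and s, keeping only the pairs that satisfy theta.
    The attribute names of r and s are assumed to be disjoint."""
    return [{**t, **u} for t in r for u in s if theta(t, u)]


students = [
    {"sid": 1, "sname": "Amina", "level": 2},
    {"sid": 2, "sname": "Brian", "level": 1},
]
modules = [
    {"code": "DB", "min_level": 2},
    {"code": "OS", "min_level": 1},
]

# Theta join with >= : modules each student is allowed to take.
eligible = theta_join(students, modules, lambda t, u: t["level"] >= u["min_level"])

# Equijoin: the same operation restricted to the equality operator.
exact = theta_join(students, modules, lambda t, u: t["level"] == u["min_level"])

eligible_pairs = [(row["sname"], row["code"]) for row in eligible]
exact_pairs = [(row["sname"], row["code"]) for row in exact]
print(eligible_pairs)
print(exact_pairs)
```

Each result row carries the attributes of both relations, which is why the attribute names must not clash.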

Natural Join (⋈)

Natural join does not use any comparison operator. It does not concatenate the way a Cartesian product
does. We can perform a Natural Join only if there is at least one common attribute that exists between two
relations. In addition, the attributes must have the same name and domain. Natural join acts on those
matching attributes where the values of attributes in both the relations are same.
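
A natural join can be sketched as follows. The courses and heads relations are hypothetical, relations are represented as lists of dicts, and the function follows the convention of these notes by refusing to join relations that share no attribute.

```python
def natural_join(r, s):
    """Join r and s on all attributes they have in common; each shared attribute appears once."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    if not common:
        raise ValueError("natural join needs at least one common attribute")
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in common)
    ]


courses = [
    {"cid": "CS01", "course": "Database", "dept": "CS"},
    {"cid": "ME01", "course": "Mechanics", "dept": "ME"},
    {"cid": "EE01", "course": "Electronics", "dept": "EE"},
]
heads = [
    {"dept": "CS", "head": "Alex"},
    {"dept": "ME", "head": "Maya"},
    {"dept": "MA", "head": "Mira"},
]

joined = natural_join(courses, heads)
summary = [(row["cid"], row["head"]) for row in joined]
print(summary)
```

No join condition is written by the user: the shared attribute dept is found automatically, and the EE course and the MA head drop out because they have no partner.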

Outer Joins

Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those tuples with
matching attributes and the rest are discarded in the resulting relation. Therefore, we need to use outer joins
to include all the tuples from the participating relations in the resulting relation. There are three kinds of
outer joins − left outer join, right outer join, and full outer join.
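
The three outer joins differ only in which unmatched tuples they keep. The sketch below is an illustration under assumptions: relations are non-empty lists of dicts, the join is on a single shared attribute, None stands in for NULL, and the courses and heads data are hypothetical.

```python
def outer_join(r, s, on, keep_left=False, keep_right=False):
    """Equijoin of r and s on attribute `on`, optionally keeping unmatched tuples
    from the left and/or right relation, padded with None."""
    r_attrs, s_attrs = list(r[0]), list(s[0])
    result, matched_right = [], set()
    for t in r:
        found = False
        for i, u in enumerate(s):
            if t[on] == u[on]:
                result.append({**t, **u})
                matched_right.add(i)
                found = True
        if not found and keep_left:
            result.append({**{a: None for a in s_attrs}, **t})
    if keep_right:
        for i, u in enumerate(s):
            if i not in matched_right:
                result.append({**{a: None for a in r_attrs}, **u})
    return result


courses = [
    {"cid": "CS01", "dept": "CS"},
    {"cid": "ME01", "dept": "ME"},
    {"cid": "EE01", "dept": "EE"},
]
heads = [
    {"dept": "CS", "head": "Alex"},
    {"dept": "ME", "head": "Maya"},
    {"dept": "MA", "head": "Mira"},
]

inner = outer_join(courses, heads, "dept")
left = outer_join(courses, heads, "dept", keep_left=True)
right = outer_join(courses, heads, "dept", keep_right=True)
full = outer_join(courses, heads, "dept", keep_left=True, keep_right=True)
print(len(inner), len(left), len(right), len(full))
```

The left outer join keeps course EE01 with a missing head, the right outer join keeps head Mira with a missing course, and the full outer join keeps both.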

4.4 Relational Calculus

Relational calculus

Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus,
that are part of the relational model for databases and provide a declarative way to specify database queries.
This is in contrast to the relational algebra, which is also part of the relational model but provides a more
procedural way of specifying queries.

The relational algebra might suggest these steps to retrieve the phone numbers and names of book stores
that supply Some Sample Book:

1. Join book stores and titles over the BookstoreID.


2. Restrict the result of that join to tuples for the book Some Sample Book.
3. Project the result of that restriction over StoreName and StorePhone.

The relational calculus would formulate a descriptive, declarative way:

Get StoreName and StorePhone for book stores such that there exists a title BK with the same BookstoreID
value and with a BookTitle value of Some Sample Book.

The relational algebra and the relational calculus are essentially logically equivalent: for any algebraic
expression, there is an equivalent expression in the calculus, and vice versa. This result is known as Codd's
theorem.
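
The book store query can be written both ways in Python to make the contrast concrete. This is only a sketch: the store names and phone numbers are invented, and Python comprehensions are used as a stand-in for algebra and calculus notation.

```python
stores = [
    {"BookstoreID": 1, "StoreName": "Campus Books", "StorePhone": "555-0101"},
    {"BookstoreID": 2, "StoreName": "Town Reads", "StorePhone": "555-0102"},
]
titles = [
    {"BookstoreID": 1, "BookTitle": "Some Sample Book"},
    {"BookstoreID": 2, "BookTitle": "Another Book"},
]

# Algebra style: a sequence of steps, each producing an intermediate relation.
joined = [
    {**s, **t} for s in stores for t in titles if s["BookstoreID"] == t["BookstoreID"]
]                                                                              # 1. join
restricted = [row for row in joined if row["BookTitle"] == "Some Sample Book"]  # 2. restrict
algebra_result = {(row["StoreName"], row["StorePhone"]) for row in restricted}  # 3. project

# Calculus style: describe the wanted tuples; no order of steps is prescribed.
calculus_result = {
    (s["StoreName"], s["StorePhone"])
    for s in stores
    if any(
        bk["BookstoreID"] == s["BookstoreID"] and bk["BookTitle"] == "Some Sample Book"
        for bk in titles
    )
}
print(algebra_result, calculus_result)
```

Both formulations return the same set of tuples, which is a small instance of the equivalence stated by Codd's theorem.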

Tuple relational calculus

Tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational model, in order
to provide a declarative database-query language for this data model. It formed the inspiration for the
database-query languages QUEL and SQL, of which the latter, although far less faithful to the original
relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is
used by nearly every relational-database-management system. Lacroix and Pirotte proposed domain
calculus, which is closer to first-order logic, and it was shown that both of these calculi (as well as
relational algebra) are equivalent in expressive power. Subsequently, query languages for the relational
model were called relationally complete if they could express at least all of these queries.

Definition of the calculus

Since the calculus is a query language for relational databases we first have to define a relational database.
The basic relational building block is the domain, or data type. A tuple is an ordered multiset of attributes,
which are ordered pairs of domain and value; or just a row. A relvar (relation variable) is a set of ordered
pairs of domain and name, which serves as the header for a relation. A relation is a set of tuples. Although
these relational concepts are mathematically defined, those definitions map loosely to traditional database
concepts. A table is an accepted visual representation of a relation; a tuple is similar to the concept of row.

We first assume the existence of a set C of column names, examples of which are "name", "author",
"address" et cetera. We define headers as finite subsets of C. A relational database schema is defined as a
tuple S = (D, R, h) where D is the domain of atomic values (see relational model for more on the notions of
domain and atomic value), R is a finite set of relation names, and

h : R → 2^C

a function that associates a header with each relation name in R. (Note that this is a simplification from the
full relational model where there is more than one domain and a header is not just a set of column names
but also maps these column names to a domain.) Given a domain D we define a tuple over D as a partial
function

t : C → D

that maps some column names to an atomic value in D. An example would be (name : "Harry", age : 25).

The set of all tuples over D is denoted as T_D. The subset of C for which a tuple t is defined is called the
domain of t (not to be confused with the domain in the schema) and denoted as dom(t).

Finally we define a relational database given a schema S = (D, R, h) as a function

db : R → 2^(T_D)

that maps the relation names in R to finite subsets of T_D, such that for every relation name r in R and tuple t
in db(r) it holds that

dom(t) = h(r).

The latter requirement simply says that all the tuples in a relation should contain the same column names,
namely those defined for it in the schema.
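The definitions above can be sketched in Python. This is only an illustration under stated assumptions: a tuple is modelled as a dictionary from column names to values, a header as a set of column names, and the relation name Person with its second row is invented for the example (the first row reuses the tuple (name : "Harry", age : 25) from the text).

```python
# Header function h: relation name -> set of column names.
# "Person" and its rows are illustrative assumptions.
h = {"Person": frozenset({"name", "age"})}

# Database db: relation name -> list of tuples.
# A tuple is a partial function from column names to atomic values, modelled as a dict.
db = {
    "Person": [
        {"name": "Harry", "age": 25},
        {"name": "Sally", "age": 31},
    ]
}


def dom(t):
    """Return dom(t), the set of column names on which tuple t is defined."""
    return frozenset(t)


def is_valid_database(db, h):
    """Check that dom(t) = h(r) for every relation name r and every tuple t in db(r)."""
    return all(dom(t) == h[r] for r in db for t in db[r])


print(is_valid_database(db, h))  # True: both tuples define exactly {name, age}
```

A tuple that omits a column, such as {"name": "Harry"}, would make the check fail, because its domain no longer equals the header of Person.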

Atoms

For the construction of the formulae we will assume an infinite set V of tuple variables. The formulas are
defined given a database schema S = (D, R, h) and a partial function type : V -> 2^C that defines a type
assignment that assigns headers to some tuple variables. We then define the set of atomic formulas
A[S,type] with the following rules:

1. if v and w in V, a in type(v) and b in type(w) then the formula " v.a = w.b " is in A[S,type],
2. if v in V, a in type(v) and k denotes a value in D then the formula " v.a = k " is in A[S,type], and
3. if v in V, r in R and type(v) = h(r) then the formula " r(v) " is in A[S,type].

Examples of atoms are:

 (t.age = s.age) — t has an age attribute and s has an age attribute with the same value
 (t.name = "Codd") — tuple t has a name attribute and its value is "Codd"
 Book(t) — tuple t is present in relation Book.

The formal semantics of such atoms is defined given a database db over S and a tuple variable binding val :
V -> T_D that maps tuple variables to tuples over the domain in S:

1. " v.a = w.b " is true if and only if val(v)(a) = val(w)(b)
2. " v.a = k " is true if and only if val(v)(a) = k
3. " r(v) " is true if and only if val(v) is in db(r)

Formulae

The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧
(and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to
bind the variables. We define the set of formulas F[S,type] inductively with the following rules:

1. every atom in A[S,type] is also in F[S,type]


2. if f1 and f2 are in F[S,type] then the formula " f1 ∧ f2 " is also in F[S,type]
3. if f1 and f2 are in F[S,type] then the formula " f1 ∨ f2 " is also in F[S,type]
4. if f is in F[S,type] then the formula " ¬ f " is also in F[S,type]
5. if v in V, H a header and f a formula in F[S,type[v->H]] then the formula " ∃ v : H ( f ) " is also in
F[S,type], where type[v->H] denotes the function that is equal to type except that it maps v to H,
6. if v in V, H a header and f a formula in F[S,type[v->H]] then the formula " ∀ v : H ( f ) " is also in
F[S,type]

Examples of formulas:

 t.name = "C. J. Date" ∨ t.name = "H. Darwen"


 Book(t) ∨ Magazine(t)
 ∀ t : {author, title, subject} ( ¬ ( Book(t) ∧ t.author = "C. J. Date" ∧ ¬ ( t.subject = "relational
model")))

Note that the last formula states that all books that are written by C. J. Date have as their subject the
relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.

We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema.
This leads to the following formal semantics for formulas given a database db over S and a tuple variable
binding val : V -> T_D:

1. " f1 ∧ f2 " is true if and only if " f1 " is true and " f2 " is true,
2. " f1 ∨ f2 " is true if and only if " f1 " is true or " f2 " is true or both are true,
3. " ¬ f " is true if and only if " f " is not true,
4. " ∃ v : H ( f ) " is true if and only if there is a tuple t over D such that dom(t) = H and the formula " f
" is true for val[v->t], and
5. " ∀ v : H ( f ) " is true if and only if for all tuples t over D such that dom(t) = H the formula " f " is
true for val[v->t].
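As a rough illustration of these semantics, the universally quantified example formula about C. J. Date can be evaluated in Python. The rows of Book are hypothetical. The sketch relies on one observation: for any tuple that is not in Book, the atom Book(t) is false and the body of the formula is therefore true, so only the tuples in Book need to be inspected.

```python
# Hypothetical contents of relation Book; every tuple has header {author, title, subject}.
book = [
    {"author": "C. J. Date", "title": "Book A", "subject": "relational model"},
    {"author": "H. Darwen", "title": "Book B", "subject": "type theory"},
]


def date_writes_only_relational(book):
    """Evaluate: for all t ( not ( Book(t) and t.author = "C. J. Date"
    and not ( t.subject = "relational model" ) ) ).

    Tuples outside Book make Book(t) false, so the body holds for them;
    it is enough to check the tuples that are in Book.
    """
    return all(
        not (t["author"] == "C. J. Date" and not (t["subject"] == "relational model"))
        for t in book
    )
```

Adding a C. J. Date book with any other subject to the list would make the function return False, which matches the reading of the formula given in the text.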

SEMANTIC AND SYNTACTIC RESTRICTION OF THE CALCULUS

Domain-independent queries

Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the
schema it can be that a query may return a different result for a certain database if another schema is
presumed. For example, consider the two schemas S1 = ( D1, R, h ) and S2 = ( D2, R, h ) with domains D1 = {
1 }, D2 = { 1, 2 }, relation names R = { "r1" } and headers h = { ("r1", {"a"}) }. Both schemas have a
common instance:

db = { ( "r1", { ("a", 1) } ) }

If we consider the following query expression

{ t : {a} | t.a = t.a }

then its result on db is either { (a : 1) } under S1 or { (a : 1), (a : 2) } under S2. It will also be clear that if we
take the domain to be an infinite set, then the result of the query will also be infinite. To solve these
problems we will restrict our attention to those queries that are domain independent, i.e., the queries that
return the same result for a database under all of its schemas.

An interesting property of these queries is that if we assume that the tuple variables range over tuples over
the so-called active domain of the database, which is the subset of the domain that occurs in at least one
tuple in the database or in the query expression, then the semantics of the query expressions does not
change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is
defined, which makes all queries by definition domain independent.
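The example query { t : {a} | t.a = t.a } can be replayed in a few lines of Python. The function names are invented for this sketch. It shows that the result depends on the domain of the schema, and that it becomes fixed once the variables range only over the active domain.

```python
def query_all_tuples(domain):
    """Evaluate { t : {a} | t.a = t.a } with t ranging over all tuples over `domain`."""
    candidates = [{"a": value} for value in sorted(domain)]
    return [t for t in candidates if t["a"] == t["a"]]


def query_active_domain(db):
    """Same query, but t ranges only over values that occur in the database."""
    active = {value for tuples in db.values() for t in tuples for value in t.values()}
    return query_all_tuples(active)


db = {"r1": [{"a": 1}]}

result_s1 = query_all_tuples({1})        # schema S1 with D1 = {1}
result_s2 = query_all_tuples({1, 2})     # schema S2 with D2 = {1, 2}
result_active = query_active_domain(db)  # active-domain semantics

print(result_s1)      # [{'a': 1}]
print(result_s2)      # [{'a': 1}, {'a': 2}]
print(result_active)  # [{'a': 1}]
```

The query itself contains no constants, so here the active domain is just the set of values stored in db.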

Safe queries

In order to limit the query expressions such that they express only domain-independent queries a
syntactical notion of safe query is usually introduced. To determine whether a query expression is safe we
will derive two types of information from a query. The first is whether a variable-column pair t.a is bound
to the column of a relation or a constant, and the second is whether two variable-column pairs are directly
or indirectly equated (denoted t.v == s.w).

For deriving boundedness we introduce the following reasoning rules:

1. in " v.a = w.b " no variable-column pair is bound,


2. in " v.a = k " the variable-column pair v.a is bound,
3. in " r(v) " all pairs v.a are bound for a in type(v),
4. in " f1 ∧ f2 " all pairs are bound that are bound either in f1 or in f2,
5. in " f1 ∨ f2 " all pairs are bound that are bound both in f1 and in f2,
6. in " ¬ f " no pairs are bound,
7. in " ∃ v : H ( f ) " a pair w.a is bound if it is bound in f and w <> v, and
8. in " ∀ v : H ( f ) " a pair w.a is bound if it is bound in f and w <> v.

For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for
equivalence relations: reflexivity, symmetry and transitivity):

1. in " v.a = w.b " it holds that v.a == w.b,


2. in " v.a = k " no pairs are equated,
3. in " r(v) " no pairs are equated,
4. in " f1 ∧ f2 " it holds that v.a == w.b if it holds either in f1 or in f2,
5. in " f1 ∨ f2 " it holds that v.a == w.b if it holds both in f1 and in f2,
6. in " ¬ f " no pairs are equated,

7. in " ∃ v : H ( f ) " it holds that w.a == x.b if it holds in f and w<>v and x<>v, and
8. in " ∀ v : H ( f ) " it holds that w.a == x.b if it holds in f and w<>v and x<>v.

We then say that a query expression { v : H | f(v) } is safe if

 for every column name a in H we can derive that v.a is equated with a bound pair in f,
 for every subexpression of f of the form " ∀ w : G ( g ) " we can derive that for every column name
a in G we can derive that w.a is equated with a bound pair in g, and
 for every subexpression of f of the form " ∃ w : G ( g ) " we can derive that for every column name
a in G we can derive that w.a is equated with a bound pair in g.

The restriction to safe query expressions does not limit the expressiveness since all domain-independent
queries that could be expressed can also be expressed by a safe query expression. This can be proven by
showing that for a schema S = (D, R, h), a given set K of constants in the query expression, a tuple variable
v and a header H we can construct a safe formula for every pair v.a with a in H that states that its value is in
the active domain. For example, assume that K={1,2}, R={"r"} and h = { ("r", {"a", "b"}) } then the
corresponding safe formula for v.b is:

v.b = 1 ∨ v.b = 2 ∨ ∃ w ( r(w) ∧ ( v.b = w.a ∨ v.b = w.b ) )

This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query
expression by adding such a formula for every variable v and column name a in its type where it is used in
the expression. Effectively this means that we let all variables range over the active domain, which, as was
already explained, does not change the semantics if the expressed query is domain independent.

Domain relational calculus

In computer science, domain relational calculus (DRC) is a calculus that was introduced by Michel Lacroix
and Alain Pirotte as a declarative database query language for the relational data model.

In DRC, queries have the form:

{ ⟨X1, X2, ..., Xn⟩ | p(⟨X1, X2, ..., Xn⟩) }

where each Xi is either a domain variable or constant, and p(⟨X1, X2, ..., Xn⟩) denotes a DRC
formula. The result of the query is the set of tuples X1 to Xn which makes the DRC formula true.

This language uses the same operators as tuple calculus, the logical connectives ∧ (and), ∨ (or) and ¬ (not).
The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the variables.

Its computational expressiveness is equivalent to that of Relational algebra.

Examples

Let (A, B, C) mean (Rank, Name, ID) in the Enterprise relation

and let (D, E, F) mean (Name, DeptName, ID) in the Department relation

Find all captains of the starship USS Enterprise:

In this example, A, B, C denotes both the result set and a set in the table Enterprise.

Find names of Enterprise crew members who are in Stellar Cartography:

In this example, we're only looking for the name, and that's B. F = C is a requirement, because we need to
find Enterprise crew members AND they are in the Stellar Cartography Department.

An alternate representation of the previous example would be:

In this example, the value of the requested F domain is directly placed in the formula and the C domain
variable is re-used in the query for the existence of a department, since it already holds a crew member's id.
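The three queries described above can be sketched as Python set comprehensions. The DRC expressions in the comments are one common way to write them and should be read as a reconstruction from the surrounding description, not as the original formulas. All sample rows are invented; only the column layouts (Rank, Name, ID) and (Name, DeptName, ID) come from the text.

```python
# Hypothetical sample data; column order follows the text.
enterprise = {            # (Rank, Name, ID)
    ("Captain", "Alpha", 1),
    ("Commander", "Beta", 2),
    ("Lieutenant", "Gamma", 3),
}
department = {            # (Name, DeptName, ID)
    ("Beta", "Science", 2),
    ("Gamma", "Stellar Cartography", 3),
}

# { <A, B, C> | <A, B, C> in Enterprise and A = 'Captain' }
captains = {(a, b, c) for (a, b, c) in enterprise if a == "Captain"}

# { <B> | exists A, C ( <A, B, C> in Enterprise and
#         exists D, E, F ( <D, E, F> in Department and F = C
#                          and E = 'Stellar Cartography' ) ) }
cartographers = {
    b
    for (a, b, c) in enterprise
    for (d, e, f) in department
    if f == c and e == "Stellar Cartography"
}

# Alternate form: the constant is placed directly in the Department pattern
# and the variable C is reused, so no separate F = C condition is needed.
cartographers_alt = {
    b
    for (a, b, c) in enterprise
    if any(e == "Stellar Cartography" and f == c for (d, e, f) in department)
}

print(captains)           # {('Captain', 'Alpha', 1)}
print(cartographers)      # {'Gamma'}
print(cartographers_alt)  # {'Gamma'}
```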

Relational Data Model: The relational model uses relations (tables) to represent both entities and
relationships among entities. A relation may be visualized as a table; however, a table is just one of
many ways to represent a relation.

5.0 Entity Relationships

5.1 Meaning of Entity Relationships

An entity-relationship (ER) diagram is a specialized graphic that illustrates the relationships between
entities in a database. ER diagrams often use symbols to represent three different types of information.
Boxes are commonly used to represent entities. Diamonds are normally used to represent relationships and
ovals are used to represent attributes.

Examples: Consider the example of a database that contains information on the residents of a city. The ER
diagram shown in the image above contains two entities -- people and cities. There is a single "Lives In"
relationship. Each person lives in only one city, but each city can house many people.

Entity Relationship Diagrams

Entity Relationship diagrams (also known as E-R or ER diagrams) provide database designers with a
valuable tool for modeling the relationships between database entities in a clear, precise format. This
industry standard approach uses a series of block shapes and lines to describe the structure of a database in
a manner understandable to all database professionals. Many database software packages,
including Microsoft Access, SQL Server, and Oracle, provide automated methods to quickly create E-R
diagrams from existing databases.

This section provides an overview of E-R diagramming techniques to help you read, modify or create
your own data models.

Entities

In a database model, each object that you wish to track in the database is known as an entity. Normally,
each entity is stored in a database table and every instance of an entity corresponds to a row in that table. In
an ER diagram, each entity is depicted as a rectangular box with the name of the entity contained within it.

For example, a database containing information about individual people would likely have an entity called
Person. This would correspond to a table with the same name in the database and every person tracked in
the database would be an instance of that Person entity and have a corresponding row in the Person table.
Database designers creating an E-R diagram would draw the Person entity using a shape similar to this:

They would then repeat the process to create a rectangular box for each entity in the data model.
Types of DBMS Entities

The following are the types of entities in DBMS:

Strong Entity- A strong entity has a primary key, and its existence does not depend on any other entity.
Weak entities depend on a strong entity.

Strong Entity is represented by a single rectangle:

Weak Entity- A weak entity does not have a primary key of its own and depends on a parent (strong) entity
for its existence.

Weak Entity is represented by a double rectangle:

Attributes
Entities are represented by means of their properties, called attributes. All attributes have values. For
example, a student entity may have name, class, and age as attributes.

There exists a domain or range of values that can be assigned to attributes. For example, a student's name
cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Databases contain information about each entity. This information is tracked in individual fields known as
attributes, which normally correspond to the columns of a database table.

For example, the Person entity might have attributes corresponding to the person's first and last name, date
of birth, and a unique person identifier. Each of these attributes is depicted in an E-R diagram as an oval, as
shown in the figure below:

Attribute(s):
Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB,
Age, Address, Mobile_No are the attributes which define the entity type Student. In an ER diagram,
an attribute is represented by an oval.

Types

1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called the key
attribute. For example, Roll_No will be unique for each student. In an ER diagram, a key attribute
is represented by an oval with the attribute name underlined.

2. Composite Attribute –
An attribute composed of several other attributes is called a composite attribute. For
example, the Address attribute of the Student entity type consists of Street, City, State, and
Country. In an ER diagram, a composite attribute is represented by an oval comprising other ovals.

3. Multivalued Attribute –
An attribute that can hold more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In an ER diagram, a multivalued attribute is
represented by a double oval.

4. Derived Attribute –
An attribute which can be derived from other attributes of the entity type is known as a
derived attribute, e.g. Age (can be derived from DOB). In an ER diagram, a derived attribute is
represented by a dashed oval.

The complete entity type Student with its attributes can be represented as:

Relationships and Cardinality

The power of the E-R diagram lies in its ability to accurately display information about the relationships
between entities. For example, we might track information in our database about the city where each
person lives. Information about the city itself is tracked within a City entity and a relationship is used to tie
together Person and City instances.

Relationships are normally given names that are verbs, while attributes and entities are named after nouns.
This convention makes it easy to express relationships. For example, if we name our Person/City
relationship "Lives In", we can string them together to say "A person lives in a city." We express
relationships in E-R diagrams by drawing a line between the related entities and placing a diamond shape
that contains the relationship name in the middle of the line. Here's how our Person/City relationship would
look:

Notice that there are some additional shapes on the line. The double hashed line appearing just to the left of
the City entity indicates that this part of the relationship has a cardinality of 1. On the other hand, the
crow's foot symbol to the right of the Person entity indicates that this part of the relationship has a
cardinality of "many". Stated more plainly, each person may live in only one city, while a city may contain
many people.
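When such a diagram is turned into tables, a one-to-many relationship like "Lives In" is commonly implemented with a foreign key on the "many" side. The sketch below uses Python's built-in sqlite3 module. The column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE City (
        city_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE Person (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        -- each person lives in exactly one city (the "1" side of Lives In)
        city_id    INTEGER NOT NULL REFERENCES City(city_id)
    );
""")

# Hypothetical rows: one city, two residents.
conn.execute("INSERT INTO City VALUES (1, 'Embu')")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1, "Amina", 1), (2, "Brian", 1)],
)

# One city can contain many people: count the residents per city.
residents = conn.execute(
    "SELECT City.name, COUNT(*) FROM Person "
    "JOIN City USING (city_id) GROUP BY City.name"
).fetchall()
print(residents)  # [('Embu', 2)]
```

Because city_id is a single NOT NULL column in Person, a person cannot be linked to two cities, while nothing limits how many Person rows point at the same City row.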

Those are the basics of Entity-Relationship diagrams. You should now have the information you need to
create basic diagrams for your databases.

Relationship degree, cardinality and optionality

Relationships between entities have three characteristics: degree, cardinality and optionality.

Degree- The degree of a relationship is the number of entity types that participate in it; the "Lives In"
relationship between Person and City, for example, has degree two (binary). This is distinct from the degree
of a relation, which is the number of attributes in its header, or, in other words, the number of columns.

Cardinality refers to the maximum number of times an instance in one entity can relate
to instances of another entity. Ordinality (also called optionality), on the other hand, is the minimum
number of times an instance in one entity can be associated with an instance in the related entity.

Cardinality and Ordinality are shown by the styling of a line and its endpoint, according to the chosen
notation style.

Types of cardinality

 One-to-one − One instance of entity set A can be associated with at most one instance of entity
set B, and vice versa.

 One-to-many − One instance of entity set A can be associated with more than one instance of
entity set B; however, an instance of entity set B can be associated with at most one instance of
entity set A.

 Many-to-one − Each instance of entity set A can be associated with at most one instance of
entity set B; however, an instance of entity set B can be associated with more than one instance
of entity set A.

 Many-to-many − One instance of entity set A can be associated with more than one instance
of entity set B, and vice versa.

5.2 Connotations of entity Relationship


Relationships illustrate an association between two tables. In the physical data model, relationships are
represented by stylized lines.

Cardinality and ordinality, respectively, refer to the maximum number of times an instance in one entity
can be associated with instances in the related entity, and the minimum number of times an instance in one
entity can be associated with an instance in the related entity. Cardinality and ordinality are
represented by the styling of a line and its endpoint, as denoted by the chosen notation style.

5.3 Drawing ERDs

Procedure

6.0 Normalization

6.1 Meaning and importance of normalization

Database normalization is the process of restructuring a relational database in accordance with a series of
so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed
by Edgar F. Codd as an integral part of his relational model.

Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that
their dependencies are properly enforced by database integrity constraints. It is accomplished by applying
some formal rules either by a process of synthesis (creating a new database design) or decomposition
(improving an existing database design).

The objectives of normalization beyond 1NF (first normal form) were stated as follows by Codd:

1. To free the collection of relations from undesirable insertion, update and deletion
dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are
introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are
liable to change as time goes by.

Normalization is used mainly for two purposes:

 Eliminating redundant (duplicated) data.
 Ensuring data dependencies make sense, i.e. data is logically stored.

Problems without Normalization

If a table is not properly normalized and contains redundant data, it will not only take up extra storage
space but will also make the database difficult to handle and update without data loss. Insertion,
update and deletion anomalies are very frequent if a database is not normalized.
To understand these anomalies let us take an example of a Student table.

rollno   name   branch   hod     office_tel
401      Akon   CSE      Mr. X   53337
402      Bkon   CSE      Mr. X   53337
403      Ckon   CSE      Mr. X   53337
404      Dkon   CSE      Mr. X   53337

In the table above, we have data for 4 Computer Science students. As we can see, the data for the fields
branch, hod (Head of Department) and office_tel is repeated for the students who are in the same branch of
the college. This is data redundancy.

Insertion Anomaly

 Insertion anomaly. There are circumstances in which certain facts cannot be recorded at all. For
example, each record in a "Faculty and Their Courses" relation might contain a Faculty ID, Faculty
Name, Faculty Hire Date, and Course Code. Therefore, we can record the details of any faculty
member who teaches at least one course, but we cannot record a newly hired faculty member who has
not yet been assigned to teach any courses, except by setting the Course Code to null. This phenomenon
is known as an insertion anomaly.
 For a new admission, a student's data cannot be inserted until the student opts for a branch, or else
we will have to set the branch information to NULL.
 Also, if we have to insert data for 100 students of the same branch, then the branch information will be
repeated for all those 100 students.
 These scenarios are insertion anomalies.

Updating Anomaly

Update anomaly. The same information can be expressed on multiple rows; therefore updates to the
relation may result in logical inconsistencies. For example, each record in an "Employees' Skills" relation
might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular
employee may need to be applied to multiple records (one for each skill). If the update is only partially
successful – the employee's address is updated on some records but not others – then the relation is left in
an inconsistent state. Specifically, the relation provides conflicting answers to the question of what this
particular employee's address is. This phenomenon is known as an update anomaly.

What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that case
all the student records will have to be updated, and if by mistake we miss any record, it will lead to data
inconsistency. This is an update anomaly.

Deletion Anomaly

 Deletion anomaly. Under certain circumstances, deletion of data representing certain facts necessitates
deletion of data representing completely different facts. The "Faculty and Their Courses" relation
described in the previous example suffers from this type of anomaly, for if a faculty member
temporarily ceases to be assigned to any courses, we must delete the last of the records on which that
faculty member appears, effectively also deleting the faculty member, unless we set the Course Code to
null. This phenomenon is known as a deletion anomaly.
 In our Student table, two different kinds of information are kept together: student information and
branch information. Hence, at the end of the academic year, if student records are deleted, we will also
lose the branch information. This is a deletion anomaly.
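One way to see how splitting the Student table removes these anomalies is the following sketch, which uses Python's built-in sqlite3 module. The two-table design (Branch and Student) and the replacement HOD name 'Mr. Y' are assumptions made for illustration. The remaining values come from the Student table in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Branch facts are stored once, keyed by branch.
    CREATE TABLE Branch (
        branch     TEXT PRIMARY KEY,
        hod        TEXT,
        office_tel TEXT
    );
    -- Student rows only reference the branch.
    CREATE TABLE Student (
        rollno INTEGER PRIMARY KEY,
        name   TEXT,
        branch TEXT REFERENCES Branch(branch)
    );
    INSERT INTO Branch VALUES ('CSE', 'Mr. X', '53337');
    INSERT INTO Student VALUES (401, 'Akon', 'CSE'), (402, 'Bkon', 'CSE'),
                               (403, 'Ckon', 'CSE'), (404, 'Dkon', 'CSE');
""")

# Update: a change of HOD touches exactly one row, so no record can be missed.
rows_changed = conn.execute(
    "UPDATE Branch SET hod = 'Mr. Y' WHERE branch = 'CSE'"
).rowcount

# Deletion: removing every student leaves the branch information intact.
conn.execute("DELETE FROM Student")
branches_left = conn.execute(
    "SELECT branch, hod, office_tel FROM Branch"
).fetchall()

print(rows_changed)   # 1
print(branches_left)  # [('CSE', 'Mr. Y', '53337')]
```

The same design also addresses the insertion case: a branch can be recorded in Branch before any student has joined it.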

Normalization is the process of splitting relations into well structured relations that allow users to insert,
delete, and update tuples without introducing database inconsistencies. Without normalization many
problems can occur when trying to load an integrated conceptual model into the DBMS. These problems, which
arise from relations that are generated directly from user views, are called anomalies. There are three
types of anomalies: update, deletion and insertion anomalies.

An update anomaly is a data inconsistency that results from data redundancy and a partial update. For
example, each employee in a company has a department associated with them as well as the student group
they participate in.

Employee_ID   Name            Department   Student_Group
123           J. Longfellow   Accounting   Beta Alpha Psi
234           B. Rech         Marketing    Marketing Club
234           B. Rech         Marketing    Management Club
456           A. Bruchs       CIS          Technology Org.
456           A. Bruchs       CIS          Beta Alpha Psi

If A. Bruchs’ department is in error, it must be updated at least 2 times or there will be inconsistent data
in the database. If the user performing the update does not realize the data is stored redundantly, the
update will not be done properly.

A deletion anomaly is the unintended loss of data due to deletion of other data. For example, if the student
group Beta Alpha Psi disbanded and was deleted from the table above, J. Longfellow and the Accounting
department would cease to exist. This results in database inconsistencies and is an example of how
combining information that does not really belong together into one table can cause problems.

An insertion anomaly is the inability to add data to the database due to absence of other data. For example,
assume Student_Group is defined so that null values are not allowed. If a new employee is hired but not
immediately assigned to a Student_Group then this employee could not be entered into the database. This
results in database inconsistencies due to omission.

Update, deletion, and insertion anomalies are very undesirable in any database. Anomalies are avoided by
the process of normalization.

6.2 Normalization Rule

Normalization rules are divided into the following normal forms:

1. First Normal Form


2. Second Normal Form
3. Third Normal Form
4. BCNF

5. Fourth Normal Form

A functional dependency (FD)

A functional dependency (FD) is a relationship between two attributes, typically between the PK and other
non-key attributes within a table. For any relation R, attribute Y is functionally dependent on attribute X
(usually the PK), if for every valid instance of X, that value of X uniquely determines the value of Y. This
relationship is indicated by the representation below:

X —> Y

The left side of the above FD diagram is called the determinant, and the right side is the dependent. Here
are a few examples.

In the first example, below, SIN determines Name, Address and Birthdate. Given SIN, we can determine
any of the other attributes within the table.

SIN —> Name, Address, Birthdate

For the second example, SIN and Course determine the date completed (DateCompleted). This must also
work for a composite PK.

SIN, Course —> DateCompleted

The third example indicates that ISBN determines Title.

ISBN —> Title
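A functional dependency can be tested against sample data with a short Python function. This is a sketch only: the function name and the sample rows are invented, and a check of this kind can refute a dependency for a given table instance but cannot prove that it holds for every valid instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts (one per row); lhs, rhs: lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first dependent value seen for each determinant value.
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True


# Hypothetical rows for illustration.
rows = [
    {"SIN": "111", "Name": "Ann", "Course": "DB", "DateCompleted": "2020-05-01"},
    {"SIN": "111", "Name": "Ann", "Course": "OS", "DateCompleted": "2021-05-01"},
    {"SIN": "222", "Name": "Bob", "Course": "DB", "DateCompleted": "2020-05-01"},
]

print(fd_holds(rows, ["SIN"], ["Name"]))                     # True
print(fd_holds(rows, ["SIN"], ["DateCompleted"]))            # False
print(fd_holds(rows, ["SIN", "Course"], ["DateCompleted"]))  # True
```

In the sample data SIN alone does not determine DateCompleted, because the same person completed two courses on different dates, while the composite determinant (SIN, Course) does.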

Inference Rules

Armstrong’s axioms are a set of inference rules used to infer all the functional dependencies on a relational
database. They were developed by William W. Armstrong. The following describes what will be used, in
terms of notation, to explain these axioms.

Let R(U) be a relation scheme over the set of attributes U. We will use the letters X, Y, Z to represent any
subset of U and, for short, write XY for the union of two sets of attributes X and Y, instead of the usual
X ∪ Y.

Axiom of reflexivity

This axiom says, if Y is a subset of X, then X determines Y (see Figure 11.1).

For example, PartNo —> NT123 where X (PartNo) is composed of more than one piece of information;
i.e., Y (NT) and partID (123).

Axiom of augmentation

The axiom of augmentation says that if X determines Y, then XZ determines YZ for any Z (see Figure 11.2 ).
Augmentation is why an attribute that depends on only part of a composite key is also determined by the
whole key; such a dependency is called a partial dependency.

A related design rule is that every non-key attribute must be fully dependent on the PK. In the
example shown below, StudentName, Address, City, Prov, and PC (postal code) are only dependent on the
StudentNo, not on the StudentNo and Course.

StudentNo, Course —> StudentName, Address, City, Prov, PC, Grade, DateCompleted

This situation is not desirable because every non-key attribute has to be fully dependent on the PK. In this
situation, student information is only partially dependent on the PK: it depends on StudentNo alone.

To fix this problem, we need to break the original table down into two as follows:

 Table 1: StudentNo, Course, Grade, DateCompleted


 Table 2: StudentNo, StudentName, Address, City, Prov, PC

Axiom of transitivity

The axiom of transitivity says if X determines Y, and Y determines Z, then X must also determine Z (see
Figure 11.3).

The table below has information not directly related to the student; for instance, ProgramID and
ProgramName should have a table of their own. ProgramName is not dependent on StudentNo; it’s dependent
on ProgramID.

StudentNo —> StudentName, Address, City, Prov, PC, ProgramID, ProgramName

This situation is not desirable because a non-key attribute (ProgramName) depends on another non-key
attribute (ProgramID).

To fix this problem, we need to break this table into two: one to hold information about the student and the
other to hold information about the program.

 Table 1: StudentNo —> StudentName, Address, City, Prov, PC, ProgramID


 Table 2: ProgramID —> ProgramName

However we still need to leave an FK in the student table so that we can identify which program the
student is enrolled in.

Union

This rule suggests that if two tables are separate, and the PK is the same, you may want to consider putting
them together. It states that if X determines Y and X determines Z then X must also determine Y and Z (see
Figure 11.4).

For example, if:

 SIN —> EmpName


 SIN —> SpouseName

You may want to join these two tables into one as follows:

SIN –> EmpName, SpouseName

Some database administrators (DBA) might choose to keep these tables separated for a couple of reasons.
One, each table describes a different entity so the entities should be kept apart. Two, if SpouseName is to
be left NULL most of the time, there is no need to include it in the same table as EmpName.

Decomposition

Decomposition is the reverse of the Union rule. If you have a table that appears to contain two entities that
are determined by the same PK, consider breaking them up into two tables. This rule states that if X
determines Y and Z, then X determines Y and X determines Z separately (see Figure 11.5).
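The inference rules above are what an attribute-closure computation applies repeatedly. The following Python sketch is one standard way to compute the closure of a set of attributes. The function name is an assumption, and the dependencies are a shortened version of the StudentNo and ProgramID example from the transitivity discussion.

```python
def closure(attributes, fds):
    """Return the set of all attributes determined by `attributes` under `fds`.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"StudentNo"}, {"StudentName", "ProgramID"}),
    ({"ProgramID"}, {"ProgramName"}),
]

print(sorted(closure({"StudentNo"}, fds)))
# ['ProgramID', 'ProgramName', 'StudentName', 'StudentNo']
print(sorted(closure({"ProgramID"}, fds)))
# ['ProgramID', 'ProgramName']
```

ProgramName appears in the closure of StudentNo only through ProgramID, which is the transitivity rule at work. StudentNo itself is included by reflexivity.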

Dependency Diagram

A dependency diagram, shown in Figure 11.6, illustrates the various dependencies that might exist in a
non-normalized table. A non-normalized table is one that has data redundancy in it.

The following dependencies are identified in this table:

 ProjectNo and EmpNo, combined, are the PK.


 Partial Dependencies:

o ProjectNo —> ProjName
o EmpNo —> EmpName, DeptNo
o ProjectNo, EmpNo —> HrsWork
 Transitive Dependency:
o DeptNo —> DeptName

Functional Dependency

Functional dependency is a relationship that exists when one attribute uniquely determines another
attribute.
If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as
X->Y, which specifies Y is functionally dependent on X. Here X is a determinant set and Y is a dependent
attribute. Each value of X is associated with precisely one Y value.
Functional dependency in a database serves as a constraint between two sets of attributes. Defining
functional dependencies is an important part of relational database design and contributes to
normalization.

Advantages of Functional Dependency

 Functional dependencies help avoid data redundancy, so that the same data is not repeated at
multiple locations in the same database.
 It maintains the quality of data in database.
 It allows clearly defined meanings and constraints of databases.
 It helps in identifying bad designs.
 It expresses the facts about the database design.
Types of Functional dependency

1. Trivial functional dependency

o A → B has trivial functional dependency if B is a subset of A.


o The following dependencies are also trivial: A → A, B → B

Example:

Consider a table with two columns, Employee_Id and Employee_Name.

{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency because Employee_Id is a
subset of {Employee_Id, Employee_Name}.
Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non-trivial functional dependency

o A → B has a non-trivial functional dependency if B is not a subset of A.


o When the intersection of A and B is empty, A → B is called a completely non-trivial dependency.

Example:

1. ID → Name,
2. Name → DOB

Transitive Dependency

A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.


For example, X -> Z is a transitive dependency if the following three conditions hold:

 X -> Y
 Y does not determine X
 Y -> Z

Note: A transitive dependency can only occur in a relation with three or more attributes. Identifying
transitive dependencies helps us normalize the database to 3NF (Third Normal Form).

Transitive Dependency Example

AUTHORS

Author_ID   Author             Book                   Author_Nationality

Auth_001    Orson Scott Card   Ender's Game           United States

Auth_001    Orson Scott Card   Children of the Mind   United States

Auth_002    Margaret Atwood    The Handmaid's Tale    Canada

In the AUTHORS example above:

 Book → Author: Here, the Book attribute determines the Author attribute. If you know the book
name, you can learn the author's name. However, Author does not determine Book, because an
author can write multiple books. For example, knowing the author's name Orson Scott Card does not
tell us the book name.
 Author → Author_Nationality: Likewise, the Author attribute determines the Author_Nationality,
but not the other way around; just because we know the nationality does not mean we can determine
the author.

But this table introduces a transitive dependency:

 Book → Author_Nationality: If we know the book name, we can determine the nationality via the
Author column.
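One way to see the transitive dependency mechanically is to compute the closure of an attribute set, that is, everything it determines directly or indirectly. The following is a small Python sketch under that idea; the function name closure and the way the dependencies are encoded are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given functional dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The two direct dependencies from the AUTHORS example
fds = [
    (("Book",), ("Author",)),
    (("Author",), ("Author_Nationality",)),
]

book_closure = closure({"Book"}, fds)
author_closure = closure({"Author"}, fds)
print(sorted(book_closure))
print(sorted(author_closure))
```

Author_Nationality appears in the closure of Book even though no dependency lists it directly, which is exactly the transitive dependency Book → Author_Nationality. Book does not appear in the closure of Author, matching the observation that an author does not determine a book.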

Avoiding Transitive Dependencies

To ensure Third Normal Form, let's remove the transitive dependency.

We can start by removing the Book column from the Authors table and creating a separate Books table:

BOOKS

Book_ID    Book                   Author_ID

Book_001   Ender's Game           Auth_001

Book_002   Children of the Mind   Auth_001

Book_003   The Handmaid's Tale    Auth_002

AUTHORS

Author_ID   Author             Author_Nationality

Auth_001    Orson Scott Card   United States

Auth_002    Margaret Atwood    Canada

Did this fix it? Let's examine our dependencies now:

BOOKS table:

 Book_ID → Book: The Book depends on the Book_ID.
 No transitive dependencies exist in this table, so we are okay. Note that the foreign key Author_ID
links this table to the AUTHORS table through its primary key Author_ID. We have created a
relationship to avoid a transitive dependency, a key design principle of relational databases.

AUTHORS table:

 Author_ID → Author: The Author depends on the Author_ID.


 Author → Author_Nationality: The nationality can be determined by the author.
 Author_ID → Author_Nationality: The nationality can be determined from the Author_ID through
the Author attribute. We still have a transitive dependency.

We need to add a third table to normalize this data:

COUNTRIES

Country_ID   Country

Coun_001     United States

Coun_002     Canada

AUTHORS

Author_ID   Author             Country_ID

Auth_001    Orson Scott Card   Coun_001

Auth_002    Margaret Atwood    Coun_002

Now we have three tables, making use of foreign keys to link between the tables:

 The BOOKS table's foreign key Author_ID links a book to an author in the AUTHORS table.
 The AUTHORS table's foreign key Country_ID links an author to a country in the
COUNTRIES table.
 The COUNTRIES table has no foreign key because it has no need to link to another table in this
design.
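The three-table design can be tried out with Python's built-in sqlite3 module. This is only a sketch: the column types are assumptions, each book is given its own Book_ID, and SQLite is used simply because it needs no server. A join follows the foreign keys to recover the original wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE Countries (
        Country_ID TEXT PRIMARY KEY,
        Country    TEXT NOT NULL
    );
    CREATE TABLE Authors (
        Author_ID  TEXT PRIMARY KEY,
        Author     TEXT NOT NULL,
        Country_ID TEXT REFERENCES Countries(Country_ID)
    );
    CREATE TABLE Books (
        Book_ID   TEXT PRIMARY KEY,
        Book      TEXT NOT NULL,
        Author_ID TEXT REFERENCES Authors(Author_ID)
    );
""")
conn.executemany("INSERT INTO Countries VALUES (?, ?)",
                 [("Coun_001", "United States"), ("Coun_002", "Canada")])
conn.executemany("INSERT INTO Authors VALUES (?, ?, ?)",
                 [("Auth_001", "Orson Scott Card", "Coun_001"),
                  ("Auth_002", "Margaret Atwood", "Coun_002")])
conn.executemany("INSERT INTO Books VALUES (?, ?, ?)",
                 [("Book_001", "Ender's Game", "Auth_001"),
                  ("Book_002", "Children of the Mind", "Auth_001"),
                  ("Book_003", "The Handmaid's Tale", "Auth_002")])
conn.commit()

# Follow the foreign keys: book -> author -> country
rows = conn.execute("""
    SELECT b.Book, a.Author, c.Country
    FROM Books b
    JOIN Authors a   ON a.Author_ID = b.Author_ID
    JOIN Countries c ON c.Country_ID = a.Country_ID
    ORDER BY b.Book_ID
""").fetchall()
for row in rows:
    print(row)
```

Each nationality is now stored once per author, so changing it means updating a single row in AUTHORS.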

Why Transitive Dependencies Are Bad Database Design

What is the value of avoiding transitive dependencies to help ensure 3NF? Let's consider our first table
again and see the issues it creates:

AUTHORS

Author_ID   Author             Book                   Author_Nationality

Auth_001    Orson Scott Card   Ender's Game           United States

Auth_001    Orson Scott Card   Children of the Mind   United States

Auth_002    Margaret Atwood    The Handmaid's Tale    Canada

This kind of design can contribute to data anomalies and inconsistencies, for example:

 If you deleted the two books "Children of the Mind" and "Ender's Game," you would delete the
author "Orson Scott Card" and his nationality completely from the database.

 You cannot add a new author to the database unless you also add a book; what if the author is yet
unpublished or you don't know the name of a book she has authored?
 If "Orson Scott Card" changed his citizenship, you would have to change it in all records in which
he appears. Having multiple records with the same author can result in inaccurate data: what if the
data entry person doesn't realize there are multiple records for him and changes the data in only one
record?
 You cannot delete a book like "The Handmaid's Tale" without also deleting the author completely.

Full functional dependency (FFD)

Full Functional Dependency: In a relation, an attribute Y is fully functionally dependent on a set of
attributes X when Y is functionally dependent on X and is not functionally dependent on any proper subset
of X.
Partial Functional Dependency: In a relation, a partial dependency exists when a non-prime attribute (an
attribute that is not part of any candidate key) is functionally dependent on a proper subset of a
candidate key.
For example: Let there be a relation R (Course, Sid, Sname, Fid, Schedule, Room, Marks) whose key is
{Course, Sid}.
Full Functional Dependency: {Course, Sid} -> Marks.
Partial Functional Dependencies: Course -> Schedule, Course -> Room and, assuming a student's name depends
only on the student ID, Sid -> Sname.
Full functional dependency is the requirement behind Second Normal Form (2NF): a relation is in 2NF when
it meets the requirements of First Normal Form (1NF) and all non-key attributes are fully functionally
dependent on the primary key.

De-normalization is a strategy used on a previously-normalized database to increase performance.


In computing, de-normalization is the process of trying to improve the read performance of a database, at
the expense of losing some write performance, by adding redundant copies of data or by grouping data. It is
often motivated by performance or scalability in relational database software needing to carry out very
large numbers of read operations. De-normalization should not be confused with Un-normalized form.
Databases/tables must first be normalized to efficiently de-normalize them.

6.3 Performing Normalization

Normalization is a method for removing these anomalies and bringing the database to a consistent state.

First Normal Form

First Normal Form is built into the definition of relations (tables) itself. This rule states that all the
attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

Second Normal Form

Before we learn about the second normal form, we need to understand the following:

 Prime attribute: An attribute that is part of a candidate key is known as a prime attribute.
 Non-prime attribute: An attribute that is not part of any candidate key is said to be a non-prime
attribute.

If we follow second normal form, then every non-prime attribute should be fully functionally dependent on
the key. That is, if X → A holds, where X is the key and A is a non-prime attribute, then there should not
be any proper subset Y of X for which Y → A also holds.
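The 2NF test can be sketched in Python by checking what each proper subset of the key determines on its own. The example reuses the ProjectNo/EmpNo dependencies listed under Dependency Diagram; the function names and the assumption of a single candidate key are illustrative only.

```python
from itertools import combinations


def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given functional dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-prime attributes it determines by itself."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - set(key)
            if dependents:
                found[subset] = dependents
    return found


key = {"ProjectNo", "EmpNo"}
fds = [
    (("ProjectNo",), ("ProjName",)),
    (("EmpNo",), ("EmpName", "DeptNo")),
    (("ProjectNo", "EmpNo"), ("HrsWork",)),
    (("DeptNo",), ("DeptName",)),
]

violations = partial_dependencies(key, fds)
for subset, dependents in violations.items():
    print(subset, "->", sorted(dependents))
```

Any entry in the result is a violation of 2NF. HrsWork does not appear because it needs the whole key, while DeptName appears under EmpNo because it is reached through DeptNo.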

Example here

7.0 Querying a database

7.1 Meaning of database query

A database query is a request for data from a database. Usually the request is to retrieve
data; however, data can also be manipulated using queries. The data can come from one or
more tables, or even other queries.

SQL

SQL (Structured Query Language) is a computer language for storing, manipulating and retrieving
data stored in a relational database.

SQL is the standard language for relational database systems. Relational Database Management
Systems (RDBMS) such as MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server all use SQL
as their standard database language.
However, each uses its own dialect, such as:

 MS SQL Server uses T-SQL,
 Oracle uses PL/SQL,
 The MS Access version of SQL is called JET SQL (native format), etc.

Why SQL?
SQL is widely popular because it offers the following advantages:

 Allows users to access data in the relational database management systems.


 Allows users to describe the data.
 Allows users to define the data in a database and manipulate that data.
 Allows embedding within other languages using SQL modules, libraries & pre-compilers.
 Allows users to create and drop databases and tables.
 Allows users to create view, stored procedure, functions in a database.
 Allows users to set permissions on tables, procedures and views.

7.2 Components of database query

When you execute an SQL command on any RDBMS, the system determines the best way to carry out your
request, and the SQL engine figures out how to interpret the task. Several components are involved in
this process:

 Query Dispatcher
 Optimization Engines
 Classic Query Engine
 SQL Query Engine, etc.

A classic query engine handles all the non-SQL queries, but a SQL query engine won't handle logical files.
Following is a simple diagram showing the SQL Architecture:

 Query parser: takes the query text and produces a parse tree (or reports syntax or semantic errors as
appropriate).
 Query optimizer: takes the parse tree and produces an execution plan. At this point the best indexes
to use, the join methods and the join order are determined and recorded in the execution plan.
 Query executor (or query processor): takes the execution plan and interacts with the storage
manager to fetch the data from storage (or cache), and presents the data to the user through the
client-side APIs.
 Database engine: a database engine (or storage engine) is the underlying software component
that a database management system (DBMS) uses to create, read, update and delete
(CRUD) data from a database.
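Most systems let you look at the plan the optimizer chose. As an illustration only, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table, the index name idx_customers_city and the helper plan are assumptions, and other RDBMS products expose their plans with different commands and wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT)")
conn.execute("CREATE INDEX idx_customers_city ON Customers (City)")


def plan(sql):
    """Return the human-readable detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# City is indexed, so the optimizer can search the index
indexed_plan = plan("SELECT CustomerName FROM Customers WHERE City = 'Berlin'")
# CustomerName has no index, so the whole table has to be scanned
scan_plan = plan("SELECT CustomerName FROM Customers WHERE CustomerName = 'Wolski'")

print(indexed_plan)
print(scan_plan)
```

The exact plan text varies between SQLite versions, but the first plan names the index and the second does not.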

7.3 Categories of SQL statements

DDL, DML, DCL and TCL Commands

Structured Query Language (SQL) is the database language used to create a database and to perform
operations on an existing one. SQL uses commands such as CREATE, DROP and INSERT to carry out the
required tasks.
These SQL commands are mainly categorized into four categories as discussed below:
1. DDL (Data Definition Language): DDL consists of the SQL commands that define the database schema. It
deals with descriptions of the database schema and is used to create and modify the structure of
database objects.
Examples of DDL commands:
 CREATE – is used to create the database or its objects (such as tables, indexes, functions, views,
stored procedures and triggers).
 DROP – is used to delete objects from the database.
 ALTER – is used to alter the structure of the database.
 TRUNCATE – is used to remove all records from a table, including all space allocated for the
records.
 COMMENT – is used to add comments to the data dictionary.
 RENAME – is used to rename an object existing in the database.
2. DML (Data Manipulation Language): The SQL commands that deal with the manipulation of the data
present in the database belong to DML, and this includes most of the SQL statements.
Examples of DML commands:
 SELECT – is used to retrieve data from the database.
 INSERT – is used to insert data into a table.
 UPDATE – is used to update existing data within a table.
 DELETE – is used to delete records from a database table.

3. DCL (Data Control Language): DCL includes commands such as GRANT and REVOKE, which mainly deal
with the rights, permissions and other controls of the database system.
Examples of DCL commands:
 GRANT – gives users access privileges to the database.
 REVOKE – withdraws access privileges that were given with the GRANT command.
4. TCL (Transaction Control Language): TCL commands deal with transactions within the database.
Examples of TCL commands:
 COMMIT – commits a transaction.
 ROLLBACK – rolls back a transaction if an error occurs.
 SAVEPOINT – sets a save point within a transaction.
 SET TRANSACTION – specifies characteristics for the transaction.
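The categories can be seen side by side in a short script. The sketch below uses Python's sqlite3 module with a made-up Accounts table; SQLite is chosen only because it needs no server. It has no user accounts, so the DCL commands (GRANT, REVOKE) cannot be demonstrated with it and are left out.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the structure
conn.execute("CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Balance INTEGER NOT NULL)")

# DML inside a transaction that is made permanent with COMMIT (TCL)
conn.execute("BEGIN")
conn.execute("INSERT INTO Accounts (AccountID, Balance) VALUES (1, 500)")
conn.execute("INSERT INTO Accounts (AccountID, Balance) VALUES (2, 100)")
conn.execute("COMMIT")

# TCL: SAVEPOINT marks a point we can return to without abandoning the whole transaction
conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET Balance = Balance - 200 WHERE AccountID = 1")
conn.execute("SAVEPOINT after_debit")
conn.execute("UPDATE Accounts SET Balance = Balance + 999 WHERE AccountID = 2")  # wrong amount
conn.execute("ROLLBACK TO after_debit")  # undo only the wrong credit
conn.execute("UPDATE Accounts SET Balance = Balance + 200 WHERE AccountID = 2")
conn.execute("COMMIT")

# TCL: ROLLBACK discards everything since BEGIN
conn.execute("BEGIN")
conn.execute("DELETE FROM Accounts")
conn.execute("ROLLBACK")

# DML: read the result
balances = conn.execute("SELECT AccountID, Balance FROM Accounts ORDER BY AccountID").fetchall()
print(balances)
```

After the script runs, both accounts still exist and the 200 transfer has been applied once.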

7.4 Design SQL queries

SQL General Data Types

Each column in a database table is required to have a name and a data type.
SQL developers have to decide what types of data will be stored inside each and every table column when
creating a SQL table. The data type is a label and a guideline for SQL to understand what type of data is
expected inside of each column, and it also identifies how SQL will interact with the stored data.
The following table lists the general data types in SQL:

MySQL Data Types

In MySQL there are three main groups of data types: text, number, and date.

Text data types:

Data type          Description

CHAR(size)         Holds a fixed-length string (can contain letters, numbers, and special characters).
                   The fixed size is specified in parentheses. Can store up to 255 characters.

VARCHAR(size)      Holds a variable-length string (can contain letters, numbers, and special characters).
                   The maximum size is specified in parentheses. Can store up to 255 characters in MySQL
                   versions before 5.0.3 (later versions allow up to 65,535, subject to the maximum row
                   size). Note: In those older versions, a size greater than 255 is converted to a TEXT
                   type.

TINYTEXT           Holds a string with a maximum length of 255 characters.

TEXT               Holds a string with a maximum length of 65,535 characters.

BLOB               For BLOBs (Binary Large OBjects). Holds up to 65,535 bytes of data.

MEDIUMTEXT         Holds a string with a maximum length of 16,777,215 characters.

MEDIUMBLOB         For BLOBs (Binary Large OBjects). Holds up to 16,777,215 bytes of data.

LONGTEXT           Holds a string with a maximum length of 4,294,967,295 characters.

LONGBLOB           For BLOBs (Binary Large OBjects). Holds up to 4,294,967,295 bytes of data.

ENUM(x,y,z,etc.)   Lets you enter a list of possible values. You can list up to 65,535 values in an ENUM
                   list. If a value is inserted that is not in the list, a blank value will be inserted.
                   Note: The values are sorted in the order you enter them.
                   You enter the possible values in this format: ENUM('X','Y','Z')

Number data types:

Data type          Description

TINYINT(size)      -128 to 127 normal. 0 to 255 UNSIGNED. An optional display width may be specified in
                   parentheses; it does not change the range of values.

SMALLINT(size)     -32768 to 32767 normal. 0 to 65535 UNSIGNED. An optional display width may be
                   specified in parentheses.

MEDIUMINT(size)    -8388608 to 8388607 normal. 0 to 16777215 UNSIGNED. An optional display width may be
                   specified in parentheses.

INT(size)          -2147483648 to 2147483647 normal. 0 to 4294967295 UNSIGNED. An optional display width
                   may be specified in parentheses.

BIGINT(size)       -9223372036854775808 to 9223372036854775807 normal. 0 to 18446744073709551615
                   UNSIGNED. An optional display width may be specified in parentheses.

FLOAT(size,d)      A small number with a floating decimal point. The maximum number of digits may be
                   specified in the size parameter. The maximum number of digits to the right of the
                   decimal point is specified in the d parameter.

DOUBLE(size,d)     A large number with a floating decimal point. The maximum number of digits may be
                   specified in the size parameter. The maximum number of digits to the right of the
                   decimal point is specified in the d parameter.

DECIMAL(size,d)    An exact number with a fixed decimal point (older MySQL versions stored it as a
                   string). The maximum number of digits may be specified in the size parameter. The
                   maximum number of digits to the right of the decimal point is specified in the d
                   parameter.

Date data types:

Data type          Description

DATE               A date. Format: YYYY-MM-DD
                   Note: The supported range is from '1000-01-01' to '9999-12-31'

DATETIME           A date and time combination. Format: YYYY-MM-DD HH:MI:SS
                   Note: The supported range is from '1000-01-01 00:00:00' to '9999-12-31 23:59:59'

TIMESTAMP          A timestamp. TIMESTAMP values are stored as the number of seconds since the Unix epoch
                   ('1970-01-01 00:00:00' UTC). Format: YYYY-MM-DD HH:MI:SS
                   Note: The supported range is from '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07'
                   UTC

TIME               A time. Format: HH:MI:SS
                   Note: The supported range is from '-838:59:59' to '838:59:59'

YEAR               A year in two-digit or four-digit format.
                   Note: Values allowed in four-digit format: 1901 to 2155. Values allowed in two-digit
                   format: 70 to 69, representing years from 1970 to 2069

SQL CREATE DATABASE Statement

Syntax

CREATE DATABASE databasename;


Example

CREATE DATABASE testDB;


Selecting a database

USE databasename;

The SQL DROP DATABASE Statement

The DROP DATABASE statement is used to drop an existing SQL database.

Syntax

DROP DATABASE databasename;


Example

DROP DATABASE testDB;


The SQL BACKUP DATABASE Statement

The BACKUP DATABASE statement is used in SQL Server to create a full back up of an existing SQL
database.

Syntax

BACKUP DATABASE databasename
TO DISK = 'filepath';

The SQL BACKUP WITH DIFFERENTIAL Statement

A differential back up only backs up the parts of the database that have changed since the last full database
backup.

Syntax

BACKUP DATABASE databasename
TO DISK = 'filepath'
WITH DIFFERENTIAL;

BACKUP DATABASE Example

The following SQL statement creates a full back up of the existing database "testDB" to the D disk:

Example

BACKUP DATABASE testDB
TO DISK = 'D:\backups\testDB.bak';

The SQL CREATE TABLE Statement

The CREATE TABLE statement is used to create a new table in a database.

Syntax

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
    ....
);

The column parameters specify the names of the columns of the table.
The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).
Tip: For an overview of the available data types, see the MySQL Data Types tables above.

SQL CREATE TABLE Example

The following example creates a table called "Persons" that contains five columns: PersonID, LastName,
FirstName, Address, and City:

Example

CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
);
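To confirm what a CREATE TABLE statement produced, you can ask the database to describe the table. The sketch below runs the same statement in SQLite through Python's sqlite3 module and lists the columns with SQLite's PRAGMA table_info; other systems use different commands for this (for example DESCRIBE in MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    )
""")

# Each row of table_info is (cid, name, type, notnull, default value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Persons)")]
for name, declared_type in columns:
    print(name, declared_type)
```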

Create Table Using another Table

A copy of an existing table can also be created using CREATE TABLE.


The new table gets the same column definitions. All columns or specific columns can be selected.
If you create a new table using an existing table, the new table will be filled with the existing values from
the old table.

Syntax

CREATE TABLE new_table_name AS
SELECT column1, column2,...
FROM existing_table_name
WHERE ....;

The following SQL creates a new table called "TestTable" (a copy of the "CustomerName" and "ContactName"
columns of the "Customers" table):

Example

CREATE TABLE TestTable AS
SELECT customername, contactname
FROM customers;

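The copy can be checked with a short script. This is a sketch using Python's sqlite3 module; the Customers table here is a cut-down stand-in with only a few columns and two rows taken from the demo data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerName TEXT,
        ContactName  TEXT,
        Country      TEXT
    )
""")
conn.executemany(
    "INSERT INTO Customers (CustomerName, ContactName, Country) VALUES (?, ?, ?)",
    [("Wilman Kala", "Matti Karttunen", "Finland"), ("Wolski", "Zbyszek", "Poland")],
)

# Copy two columns, together with their existing values, into a new table
conn.execute("CREATE TABLE TestTable AS SELECT CustomerName, ContactName FROM Customers")

copied_columns = [row[1] for row in conn.execute("PRAGMA table_info(TestTable)")]
copied_rows = conn.execute(
    "SELECT CustomerName, ContactName FROM TestTable ORDER BY CustomerName"
).fetchall()
print(copied_columns)
print(copied_rows)
```

Only the selected columns and their data are copied. In SQLite the new table does not inherit constraints such as the primary key, so they have to be added separately if needed.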
The SQL DROP TABLE Statement

The DROP TABLE statement is used to drop an existing table in a database.

Syntax

DROP TABLE table_name;

Note: Be careful before dropping a table. Deleting a table will result in loss of complete information stored
in the table!

SQL DROP TABLE Example

The following SQL statement drops the existing table "Shippers":

Example

DROP TABLE Shippers;


SQL TRUNCATE TABLE

The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.

Syntax

TRUNCATE TABLE table_name;

SQL ALTER TABLE Statement

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
The ALTER TABLE statement is also used to add and drop various constraints on an existing table.

ALTER TABLE - ADD Column

To add a column in a table, use the following syntax:

ALTER TABLE table_name


ADD column_name datatype;

The following SQL adds an "Email" column to the "Customers" table:

Example

ALTER TABLE Customers


ADD Email varchar(255);

ALTER TABLE - DROP COLUMN

To delete a column in a table, use the following syntax (notice that some database systems don't allow
deleting a column):

ALTER TABLE table_name


DROP COLUMN column_name;

The following SQL deletes the "Email" column from the "Customers" table:

Example

ALTER TABLE Customers


DROP COLUMN Email;

ALTER TABLE - ALTER/MODIFY COLUMN

To change the data type of a column in a table, use the following syntax:

SQL Server / MS Access:

ALTER TABLE table_name
ALTER COLUMN column_name datatype;

MySQL / Oracle (prior to version 10G):

ALTER TABLE table_name
MODIFY COLUMN column_name datatype;

Oracle 10G and later:

ALTER TABLE table_name
MODIFY column_name datatype;

SQL ALTER TABLE Example

Look at the "Persons" table:

ID   LastName    FirstName   Address        City

1    Hansen      Ola         Timoteivn 10   Sandnes
2    Svendson    Tove        Borgvn 23      Sandnes
3    Pettersen   Kari        Storgt 20      Stavanger

Now we want to add a column named "DateOfBirth" in the "Persons" table.


We use the following SQL statement:

ALTER TABLE Persons


ADD DateOfBirth date;

Notice that the new column, "DateOfBirth", is of type date and is going to hold a date. The data type
specifies what type of data the column can hold. For an overview of the data types available in MySQL, see
the MySQL Data Types tables earlier in this section.
The "Persons" table will now look like this:

ID   LastName    FirstName   Address        City        DateOfBirth

1    Hansen      Ola         Timoteivn 10   Sandnes
2    Svendson    Tove        Borgvn 23      Sandnes
3    Pettersen   Kari        Storgt 20      Stavanger
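The effect of adding a column to a table that already holds data can be checked as follows. This is a sketch in Python with the sqlite3 module; SQLite accepts the same ADD statement, and the existing rows receive NULL (shown as None in Python) in the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    )
""")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        (2, "Svendson", "Tove", "Borgvn 23", "Sandnes"),
        (3, "Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)

# Add the new column; rows that already exist get NULL in it
conn.execute("ALTER TABLE Persons ADD DateOfBirth date")

rows = conn.execute("SELECT * FROM Persons ORDER BY ID").fetchall()
for row in rows:
    print(row)
```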

Change Data Type Example

Now we want to change the data type of the column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:

ALTER TABLE Persons


ALTER COLUMN DateOfBirth year;

Notice that the "DateOfBirth" column is now of type year and is going to hold a year in a two- or four-digit
format.
DROP COLUMN Example

Next, we want to delete the column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:

ALTER TABLE Persons


DROP COLUMN DateOfBirth;

The "Persons" table will now look like this:

ID   LastName    FirstName   Address        City

1    Hansen      Ola         Timoteivn 10   Sandnes
2    Svendson    Tove        Borgvn 23      Sandnes
3    Pettersen   Kari        Storgt 20      Stavanger

SQL Constraints

SQL constraints are used to specify rules for the data in a table.
Constraints are used to limit the type of data that can go into a table. This ensures the accuracy and
reliability of the data in the table. If there is any violation between the constraint and the data action, the
action is aborted.
Constraints can be column level or table level. Column level constraints apply to a column, and table level
constraints apply to the whole table.
The following constraints are commonly used in SQL:

 NOT NULL - Ensures that a column cannot have a NULL value
 UNIQUE - Ensures that all values in a column are different
 PRIMARY KEY - A combination of NOT NULL and UNIQUE. Uniquely identifies each row in a table
 FOREIGN KEY - Uniquely identifies a row/record in another table
 CHECK - Ensures that all values in a column satisfy a specific condition
 DEFAULT - Sets a default value for a column when no value is specified
 INDEX - Used to create and retrieve data from the database very quickly

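The statement that a violating action is aborted can be seen in practice. The sketch below uses Python's sqlite3 module with a made-up Persons table (the Email column and the age limit of 18 are assumptions for illustration). Each rejected INSERT raises an error and leaves the table unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID       INTEGER PRIMARY KEY,
        LastName TEXT NOT NULL,
        Email    TEXT UNIQUE,
        Age      INTEGER CHECK (Age >= 18),
        City     TEXT DEFAULT 'Sandnes'
    )
""")
conn.execute("INSERT INTO Persons (ID, LastName, Email, Age) VALUES (1, 'Hansen', 'ola@example.com', 30)")
conn.commit()


def try_insert(sql):
    """Run one INSERT; return 'ok' or the constraint error reported by SQLite."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as error:
        return str(error)


results = {
    "not_null": try_insert("INSERT INTO Persons (ID, Email, Age) VALUES (2, 'a@example.com', 40)"),
    "unique": try_insert("INSERT INTO Persons (ID, LastName, Email, Age) VALUES (3, 'Svendson', 'ola@example.com', 23)"),
    "primary_key": try_insert("INSERT INTO Persons (ID, LastName, Age) VALUES (1, 'Pettersen', 20)"),
    "check": try_insert("INSERT INTO Persons (ID, LastName, Age) VALUES (4, 'Pettersen', 12)"),
    "valid": try_insert("INSERT INTO Persons (ID, LastName, Age) VALUES (5, 'Pettersen', 20)"),
}
for name, outcome in results.items():
    print(name, "->", outcome)

# Only the first row and the valid row exist; City was filled in by DEFAULT
rows = conn.execute("SELECT ID, City FROM Persons ORDER BY ID").fetchall()
print(rows)
```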
SQL PRIMARY KEY Constraint

The PRIMARY KEY constraint uniquely identifies each record in a table.


Primary keys must contain UNIQUE values, and cannot contain NULL values.
A table can have only one primary key, which may consist of single or multiple fields.

SQL PRIMARY KEY on CREATE TABLE

The following SQL creates a PRIMARY KEY on the "ID" column when the "Persons" table is created:

MySQL:

CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (ID)
);

SQL Server / Oracle / MS Access:

CREATE TABLE Persons (
    ID int NOT NULL PRIMARY KEY,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int
);

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on
multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Persons (


ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CONSTRAINT PK_Person PRIMARY KEY (ID,LastName)
);

Note: In the example above there is only ONE PRIMARY KEY (PK_Person). However, the VALUE of
the primary key is made up of TWO COLUMNS (ID + LastName).

SQL FOREIGN KEY Constraint

A FOREIGN KEY is a key used to link two tables together.


A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in
another table.
The table containing the foreign key is called the child table, and the table containing the candidate key is
called the referenced or parent table.
Look at the following two tables:
"Persons" table:

PersonID LastName FirstName Age

1 Hansen Ola 30

2 Svendson Tove 23

3 Pettersen Kari 20

"Orders" table:

OrderID OrderNumber PersonID

1 77895 3

2 44678 3

3 22456 2

4 24562 1

Notice that the "PersonID" column in the "Orders" table points to the "PersonID" column in the "Persons"
table.
The "PersonID" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column,
because it has to be one of the values contained in the table it points to.
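Both protections can be tried out on the Persons and Orders tables above. This is a sketch using Python's sqlite3 module; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas most server databases enforce them by default. The helper attempt is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite for enforcement
conn.execute("""
    CREATE TABLE Persons (
        PersonID  INTEGER PRIMARY KEY,
        LastName  TEXT NOT NULL,
        FirstName TEXT,
        Age       INTEGER
    )
""")
conn.execute("""
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        OrderNumber INTEGER NOT NULL,
        PersonID    INTEGER,
        FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
    )
""")
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?)",
                 [(1, "Hansen", "Ola", 30), (2, "Svendson", "Tove", 23), (3, "Pettersen", "Kari", 20)])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(1, 77895, 3), (2, 44678, 3), (3, 22456, 2), (4, 24562, 1)])
conn.commit()


def attempt(sql):
    """Run one statement; return True if it was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


orphan_insert_ok = attempt("INSERT INTO Orders VALUES (5, 11111, 99)")   # no person 99
parent_delete_ok = attempt("DELETE FROM Persons WHERE PersonID = 3")     # person 3 still has orders
valid_insert_ok = attempt("INSERT INTO Orders VALUES (5, 11111, 2)")     # person 2 exists
print(orphan_insert_ok, parent_delete_ok, valid_insert_ok)
```

The first two statements are rejected because they would leave an order pointing at a person who does not exist; the third is accepted.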

SQL FOREIGN KEY on CREATE TABLE

The following SQL creates a FOREIGN KEY on the "PersonID" column when the "Orders" table is
created:

MySQL:

CREATE TABLE Orders (


OrderID int NOT NULL,
OrderNumber int NOT NULL,
PersonID int,
PRIMARY KEY (OrderID),
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);

SQL Server / Oracle / MS Access:

CREATE TABLE Orders (
    OrderID int NOT NULL PRIMARY KEY,
    OrderNumber int NOT NULL,
    PersonID int FOREIGN KEY REFERENCES Persons(PersonID)
);

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on
multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Orders (


OrderID int NOT NULL,
OrderNumber int NOT NULL,
PersonID int,
PRIMARY KEY (OrderID),
CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
REFERENCES Persons(PersonID)
);

SQL FOREIGN KEY on ALTER TABLE

To create a FOREIGN KEY constraint on the "PersonID" column when the "Orders" table is already
created, use the following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Orders


ADD FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on
multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Orders
ADD CONSTRAINT FK_PersonOrder
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);

DROP a FOREIGN KEY Constraint

To drop a FOREIGN KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Orders
DROP FOREIGN KEY FK_PersonOrder;

SQL Server / Oracle / MS Access:

ALTER TABLE Orders
DROP CONSTRAINT FK_PersonOrder;

SQL PRIMARY KEY on ALTER TABLE

To create a PRIMARY KEY constraint on the "ID" column when the table is already created, use the
following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD PRIMARY KEY (ID);

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on
multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CONSTRAINT PK_Person PRIMARY KEY (ID,LastName);

Note: If you use the ALTER TABLE statement to add a primary key, the primary key column(s) must
already have been declared to not contain NULL values (when the table was first created).

DROP a PRIMARY KEY Constraint

To drop a PRIMARY KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Persons
DROP PRIMARY KEY;

SQL Server / Oracle / MS Access:

ALTER TABLE Persons
DROP CONSTRAINT PK_Person;

SQL INSERT INTO Statement

The INSERT INTO statement is used to insert new records in a table.

INSERT INTO Syntax

It is possible to write the INSERT INTO statement in two ways.
The first way specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

If you are adding values for all the columns of the table, you do not need to specify the column names in
the SQL query. However, make sure the order of the values is in the same order as the columns in the table.
The INSERT INTO syntax would be as follows:

INSERT INTO table_name
VALUES (value1, value2, value3, ...);

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID  CustomerName          ContactName      Address                      City       PostalCode  Country

89          White Clover Markets  Karl Jablonski   305 - 14th Ave. S. Suite 3B  Seattle    98128       USA
90          Wilman Kala           Matti Karttunen  Keskuskatu 45                Helsinki   21240       Finland
91          Wolski                Zbyszek          ul. Filtrowa 68              Walla      01-012      Poland

INSERT INTO Example

The following SQL statement inserts a new record in the "Customers" table:

Example

INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)


VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');

The selection from the "Customers" table will now look like this:

CustomerID  CustomerName          ContactName      Address                      City       PostalCode  Country

89          White Clover Markets  Karl Jablonski   305 - 14th Ave. S. Suite 3B  Seattle    98128       USA
90          Wilman Kala           Matti Karttunen  Keskuskatu 45                Helsinki   21240       Finland
91          Wolski                Zbyszek          ul. Filtrowa 68              Walla      01-012      Poland
92          Cardinal              Tom B. Erichsen  Skagen 21                    Stavanger  4006        Norway

Did you notice that we did not insert any number into the CustomerID field?
The CustomerID column is an auto-increment field and will be generated automatically when a new record
is inserted into the table.

Insert Data Only in Specified Columns

It is also possible to only insert data in specific columns.


The following SQL statement will insert a new record, but only insert data in the "CustomerName", "City",
and "Country" columns (CustomerID will be updated automatically):

Example

INSERT INTO Customers (CustomerName, City, Country)


VALUES ('Cardinal', 'Stavanger', 'Norway');

The selection from the "Customers" table will now look like this:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
89 | White Clover Markets | Karl Jablonski | 305 - 14th Ave. S. Suite 3B | Seattle | 98128 | USA
90 | Wilman Kala | Matti Karttunen | Keskuskatu 45 | Helsinki | 21240 | Finland
91 | Wolski | Zbyszek | ul. Filtrowa 68 | Walla | 01-012 | Poland
92 | Cardinal | null | null | Stavanger | null | Norway
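
The behaviour described above (an automatically generated CustomerID and null in every column that was not listed) can be tried out with a small sketch. It assumes SQLite through Python's built-in sqlite3 module rather than the database used in these notes, and a cut-down, hypothetical version of the Customers table; in SQLite an INTEGER PRIMARY KEY column plays the role of the auto-increment field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down Customers table. In SQLite an INTEGER PRIMARY KEY
# column is filled in automatically when no value is supplied.
conn.execute("""
    CREATE TABLE Customers (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerName TEXT,
        ContactName  TEXT,
        City         TEXT,
        Country      TEXT
    )
""")
conn.execute(
    "INSERT INTO Customers (CustomerID, CustomerName, ContactName, City, Country) "
    "VALUES (91, 'Wolski', 'Zbyszek', 'Walla', 'Poland')"
)
# Only three columns are named: CustomerID is generated, ContactName stays NULL.
conn.execute(
    "INSERT INTO Customers (CustomerName, City, Country) "
    "VALUES ('Cardinal', 'Stavanger', 'Norway')"
)
new_row = conn.execute(
    "SELECT CustomerID, CustomerName, ContactName, City, Country "
    "FROM Customers WHERE CustomerName = 'Cardinal'"
).fetchone()
print(new_row)  # (92, 'Cardinal', None, 'Stavanger', 'Norway')
```

Python shows SQL NULL as None. Other database systems declare auto-increment columns with their own keywords, so the CREATE TABLE line is the part most likely to need changing.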

7.5 Use SQL statements to interrogate a database

SQL SELECT Statement

The SQL SELECT Statement

The SELECT statement is used to select data from a database.


The data returned is stored in a result table, called the result-set.

SELECT Syntax

SELECT column1, column2, ...
FROM table_name;

Here, column1, column2, ... are the field names of the table you want to select data from. If you want to
select all the fields available in the table, use the following syntax:

SELECT * FROM table_name;


Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

SELECT Column Example

The following SQL statement selects the "CustomerName" and "City" columns from the "Customers"
table:

Example

SELECT CustomerName, City FROM Customers;


SELECT * Example

The following SQL statement selects all the columns from the "Customers" table:

Example

SELECT * FROM Customers;

The SQL SELECT DISTINCT Statement

The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values; and sometimes you only want to list the
different (distinct) values.

SELECT DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name;

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

SELECT Example

The following SQL statement selects all values, including duplicates, from the "Country" column in the
"Customers" table:

Example

SELECT Country FROM Customers;

Now, let us use the DISTINCT keyword with the above SELECT statement and see the result.

SELECT DISTINCT Examples

The following SQL statement selects only the DISTINCT values from the "Country" column in the
"Customers" table:

Example

SELECT DISTINCT Country FROM Customers;

The following SQL statement lists the number of different (distinct) customer countries:

Example

SELECT COUNT(DISTINCT Country) FROM Customers;

Note: The example above will not work in Microsoft Access, because COUNT(DISTINCT column_name) is
not supported in Microsoft Access databases.
Here is the workaround for MS Access:

Example

SELECT Count(*) AS DistinctCountries
FROM (SELECT DISTINCT Country FROM Customers);
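
To see the difference between SELECT, SELECT DISTINCT and the two ways of counting distinct values, the following sketch runs all four queries. It assumes SQLite through Python's sqlite3 module and a hypothetical two-column version of the Customers table holding the five demo rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Alfreds Futterkiste", "Germany"),
        ("Ana Trujillo Emparedados y helados", "Mexico"),
        ("Antonio Moreno Taquería", "Mexico"),
        ("Around the Horn", "UK"),
        ("Berglunds snabbköp", "Sweden"),
    ],
)

# Plain SELECT returns one value per row, so Mexico appears twice.
all_countries = [r[0] for r in conn.execute("SELECT Country FROM Customers")]
# DISTINCT collapses the duplicates (ORDER BY only makes the output predictable).
distinct_countries = [
    r[0] for r in conn.execute("SELECT DISTINCT Country FROM Customers ORDER BY Country")
]
# Counting distinct values directly ...
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT Country) FROM Customers"
).fetchone()[0]
# ... and with the subquery form shown as the MS Access workaround.
workaround_count = conn.execute(
    "SELECT COUNT(*) AS DistinctCountries FROM (SELECT DISTINCT Country FROM Customers)"
).fetchone()[0]

print(len(all_countries), distinct_countries, distinct_count, workaround_count)
```

Both counting forms should give the same number; the subquery form is simply the one that also works where COUNT(DISTINCT ...) is unavailable.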

SQL WHERE Clause

The SQL WHERE Clause

The WHERE clause is used to filter records.


The WHERE clause is used to extract only those records that fulfill a specified condition.

WHERE Syntax

SELECT column1, column2, ...
FROM table_name
WHERE condition;

Note: The WHERE clause is not only used in SELECT statements; it is also used in UPDATE, DELETE
and other statements.

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

WHERE Clause Example

The following SQL statement selects all the customers from the country "Mexico", in the "Customers"
table:

Example

SELECT * FROM Customers
WHERE Country='Mexico';

Text Fields vs. Numeric Fields

SQL requires single quotes around text values (most database systems will also allow double quotes).
However, numeric fields should not be enclosed in quotes:

Example

SELECT * FROM Customers
WHERE CustomerID=1;

Operators in The WHERE Clause

The following operators can be used in the WHERE clause:

Operator | Description
= | Equal
<> | Not equal. Note: In some versions of SQL this operator may be written as !=
> | Greater than
< | Less than
>= | Greater than or equal
<= | Less than or equal
BETWEEN | Between a certain range
LIKE | Search for a pattern
IN | To specify multiple possible values for a column

SQL AND, OR and NOT Operators

The SQL AND, OR and NOT Operators

The WHERE clause can be combined with AND, OR, and NOT operators.
The AND and OR operators are used to filter records based on more than one condition:

• The AND operator displays a record if all the conditions separated by AND are TRUE.
• The OR operator displays a record if any of the conditions separated by OR is TRUE.

The NOT operator displays a record if the condition is NOT TRUE.

AND Syntax

SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;

OR Syntax

SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;

NOT Syntax

SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

AND Example

The following SQL statement selects all fields from "Customers" where country is "Germany" AND city is
"Berlin":

Example

SELECT * FROM Customers
WHERE Country='Germany' AND City='Berlin';

OR Example

The following SQL statement selects all fields from "Customers" where city is "Berlin" OR "München":

Example

SELECT * FROM Customers
WHERE City='Berlin' OR City='München';

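
The three operators can be compared side by side with a short sketch. It assumes SQLite through Python's sqlite3 module and a hypothetical three-column version of the Customers table; the helper function ids() is introduced here only for illustration, and its string formatting is acceptable only because the conditions are fixed text written in the script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Berlin", "Germany"),
        (2, "México D.F.", "Mexico"),
        (3, "México D.F.", "Mexico"),
        (4, "London", "UK"),
        (5, "Luleå", "Sweden"),
    ],
)

def ids(where):
    """Return the CustomerIDs that satisfy a WHERE condition, in ID order."""
    sql = f"SELECT CustomerID FROM Customers WHERE {where} ORDER BY CustomerID"
    return [row[0] for row in conn.execute(sql)]

and_ids = ids("Country = 'Germany' AND City = 'Berlin'")  # both conditions must hold
or_ids = ids("City = 'Berlin' OR City = 'London'")        # either condition may hold
not_ids = ids("NOT Country = 'Mexico'")                   # the condition must be false

print(and_ids, or_ids, not_ids)  # [1] [1, 4] [1, 4, 5]
```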
SQL ORDER BY Keyword

The SQL ORDER BY Keyword

The ORDER BY keyword is used to sort the result-set in ascending or descending order.

The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending
order, use the DESC keyword.

ORDER BY Syntax

SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

ORDER BY Example

The following SQL statement selects all customers from the "Customers" table, sorted by the "Country"
column:

Example

SELECT * FROM Customers
ORDER BY Country;

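
The default ascending order, the DESC keyword, and sorting on more than one column can be checked with the sketch below. It assumes SQLite through Python's sqlite3 module and a hypothetical two-column Customers table; the helper column() exists only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Alfreds Futterkiste", "Germany"),
        ("Ana Trujillo Emparedados y helados", "Mexico"),
        ("Antonio Moreno Taquería", "Mexico"),
        ("Around the Horn", "UK"),
        ("Berglunds snabbköp", "Sweden"),
    ],
)

def column(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

ascending = column("SELECT Country FROM Customers ORDER BY Country")
descending = column("SELECT Country FROM Customers ORDER BY Country DESC")
# Country ascending first; the two Mexican customers are then ordered by name, descending.
mixed = column(
    "SELECT CustomerName FROM Customers ORDER BY Country ASC, CustomerName DESC"
)

print(ascending)   # ['Germany', 'Mexico', 'Mexico', 'Sweden', 'UK']
print(descending)  # ['UK', 'Sweden', 'Mexico', 'Mexico', 'Germany']
print(mixed)
```

Each column in the ORDER BY list can carry its own ASC or DESC, which is what the third query relies on.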
SQL UPDATE Statement

The SQL UPDATE Statement

The UPDATE statement is used to modify the existing records in a table.

UPDATE Syntax

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Note: Be careful when updating records in a table! Notice the WHERE clause in the UPDATE statement.
The WHERE clause specifies which record(s) should be updated. If you omit the WHERE clause, all
records in the table will be updated!

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

UPDATE Table

The following SQL statement updates the first customer (CustomerID = 1) with a new contact person and a
new city.

Example

UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;

The selection from the "Customers" table will now look like this:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Alfred Schmidt | Obere Str. 57 | Frankfurt | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

UPDATE Multiple Records

It is the WHERE clause that determines how many records will be updated.
The following SQL statement will update the ContactName to "Juan" for all records where Country is
"Mexico":

Example

UPDATE Customers
SET ContactName='Juan'
WHERE Country='Mexico';

The selection from the "Customers" table will now look like this:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Alfred Schmidt | Obere Str. 57 | Frankfurt | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Juan | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Juan | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

Update Warning!

Be careful when updating records. If you omit the WHERE clause, ALL records will be updated!

Example

UPDATE Customers
SET ContactName='Juan';

The selection from the "Customers" table will now look like this:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Juan | Obere Str. 57 | Frankfurt | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Juan | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Juan | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Juan | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Juan | Berguvsvägen 8 | Luleå | S-958 22 | Sweden
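
The effect of leaving out the WHERE clause can be measured by asking the database how many rows each UPDATE touched. The sketch below assumes SQLite through Python's sqlite3 module, where the cursor attribute rowcount reports the number of modified rows, and a hypothetical three-column Customers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, ContactName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Maria Anders", "Germany"),
        (2, "Ana Trujillo", "Mexico"),
        (3, "Antonio Moreno", "Mexico"),
        (4, "Thomas Hardy", "UK"),
        (5, "Christina Berglund", "Sweden"),
    ],
)

# With a WHERE clause only the matching rows change.
with_where = conn.execute(
    "UPDATE Customers SET ContactName = 'Juan' WHERE Country = 'Mexico'"
).rowcount
# Without a WHERE clause every row in the table changes.
without_where = conn.execute("UPDATE Customers SET ContactName = 'Juan'").rowcount

contact_names = {row[0] for row in conn.execute("SELECT ContactName FROM Customers")}
print(with_where, without_where, contact_names)  # 2 5 {'Juan'}
```

A cautious habit is to run a SELECT with the same WHERE condition first and check that it returns exactly the rows that are meant to change.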

The SQL DELETE Statement

The DELETE statement is used to delete existing records in a table.

DELETE Syntax

DELETE FROM table_name WHERE condition;

Note: Be careful when deleting records in a table! Notice the WHERE clause in the DELETE statement.
The WHERE clause specifies which record(s) should be deleted. If you omit the WHERE clause, all
records in the table will be deleted!

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

SQL DELETE Example

The following SQL statement deletes the customer "Alfreds Futterkiste" from the "Customers" table:

Example

DELETE FROM Customers WHERE CustomerName='Alfreds Futterkiste';

The "Customers" table will now look like this:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

Delete All Records

It is possible to delete all rows in a table without deleting the table. This means that the table structure,
attributes, and indexes will be intact:

DELETE FROM table_name;

The following SQL statement deletes all rows in the "Customers" table, without deleting the table:

Example

DELETE FROM Customers;
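
The two forms of DELETE can be compared in a short sketch. It assumes SQLite through Python's sqlite3 module and a hypothetical two-column Customers table; the sqlite_master lookup is SQLite's own way of listing tables and is used here only to confirm that the emptied table still exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Alfreds Futterkiste"), (4, "Around the Horn"), (5, "Berglunds snabbköp")],
)

# DELETE with a WHERE clause removes only the matching record.
deleted_one = conn.execute(
    "DELETE FROM Customers WHERE CustomerName = 'Alfreds Futterkiste'"
).rowcount
after_first = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

# DELETE without a WHERE clause removes every remaining record ...
conn.execute("DELETE FROM Customers")
after_second = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

# ... but the table itself is still defined and can receive new rows.
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Customers'"
).fetchone()[0] == 1

print(deleted_one, after_first, after_second, table_still_exists)  # 1 2 0 True
```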


SQL TOP, LIMIT or ROWNUM Clause

The SQL SELECT TOP Clause

The SELECT TOP clause is used to specify the number of records to return.

The SELECT TOP clause is useful on large tables with thousands of records. Returning a large number of
records can impact performance.
Note: Not all database systems support the SELECT TOP clause. MySQL supports the LIMIT clause to
select a limited number of records, while Oracle uses ROWNUM.

SQL Server / MS Access Syntax:

SELECT TOP number|percent column_name(s)
FROM table_name
WHERE condition;

MySQL Syntax:

SELECT column_name(s)
FROM table_name
WHERE condition
LIMIT number;

Oracle Syntax:

SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number;

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

SQL TOP, LIMIT and ROWNUM Examples

The following SQL statement selects the first three records from the "Customers" table:

Example

SELECT TOP 3 * FROM Customers;

The following SQL statement shows the equivalent example using the LIMIT clause:

Example

SELECT * FROM Customers
LIMIT 3;

The following SQL statement shows the equivalent example using ROWNUM:

Example

SELECT * FROM Customers
WHERE ROWNUM <= 3;

SQL TOP PERCENT Example

The following SQL statement selects the first 50% of the records from the "Customers" table:

Example

SELECT TOP 50 PERCENT * FROM Customers;


ADD a WHERE CLAUSE

The following SQL statement selects the first three records from the "Customers" table, where the country
is "Germany":

Example

SELECT TOP 3 * FROM Customers
WHERE Country='Germany';

The following SQL statement shows the equivalent example using the LIMIT clause:

Example

SELECT * FROM Customers
WHERE Country='Germany'
LIMIT 3;

The following SQL statement shows the equivalent example using ROWNUM:

Example

SELECT * FROM Customers
WHERE Country='Germany' AND ROWNUM <= 3;
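
Of the three forms above, the LIMIT form is the one that can be run in SQLite, so the sketch below uses it through Python's sqlite3 module; the two-column Customers table is a hypothetical cut-down version of the demo table. An ORDER BY is added because, without one, SQL does not promise which rows count as the "first" ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Germany"), (2, "Mexico"), (3, "Mexico"), (4, "UK"), (5, "Sweden")],
)

# First three customers by CustomerID.
first_three = [
    row[0]
    for row in conn.execute("SELECT CustomerID FROM Customers ORDER BY CustomerID LIMIT 3")
]
# LIMIT is applied after the WHERE filter: only one German customer exists,
# so asking for up to three returns just that one.
german_only = [
    row[0]
    for row in conn.execute(
        "SELECT CustomerID FROM Customers WHERE Country = 'Germany' "
        "ORDER BY CustomerID LIMIT 3"
    )
]

print(first_three, german_only)  # [1, 2, 3] [1]
```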

SQL COUNT(), AVG() and SUM() Functions

The SQL COUNT(), AVG() and SUM() Functions

The COUNT() function returns the number of rows that match a specified criterion.
The AVG() function returns the average value of a numeric column.
The SUM() function returns the total sum of a numeric column.

COUNT() Syntax

SELECT COUNT(column_name)
FROM table_name
WHERE condition;

AVG() Syntax

SELECT AVG(column_name)
FROM table_name
WHERE condition;

SUM() Syntax

SELECT SUM(column_name)
FROM table_name
WHERE condition;

Demo Database

Below is a selection from the "Products" table in the Northwind sample database:

ProductID | ProductName | SupplierID | CategoryID | Unit | Price
1 | Chais | 1 | 1 | 10 boxes x 20 bags | 18
2 | Chang | 1 | 1 | 24 - 12 oz bottles | 19
3 | Aniseed Syrup | 1 | 2 | 12 - 550 ml bottles | 10
4 | Chef Anton's Cajun Seasoning | 2 | 2 | 48 - 6 oz jars | 22
5 | Chef Anton's Gumbo Mix | 2 | 2 | 36 boxes | 21.35

COUNT() Example

The following SQL statement finds the number of products:

Example

SELECT COUNT(ProductID)
FROM Products;

Note: NULL values are not counted.

AVG() Example

The following SQL statement finds the average price of all products:

Example

SELECT AVG(Price)
FROM Products;

Note: NULL values are ignored.

Demo Database

Below is a selection from the "OrderDetails" table in the Northwind sample database:

OrderDetailID | OrderID | ProductID | Quantity
1 | 10248 | 11 | 12
2 | 10248 | 42 | 10
3 | 10248 | 72 | 5
4 | 10249 | 14 | 9
5 | 10249 | 51 | 40

SUM() Example

The following SQL statement finds the sum of the "Quantity" fields in the "OrderDetails" table:

Example

SELECT SUM(Quantity)
FROM OrderDetails;
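
The notes above say that NULL values are not counted and are ignored by AVG(). The sketch below makes that visible by adding one hypothetical product without a price to the five demo products. It assumes SQLite through Python's sqlite3 module and a three-column version of the Products table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        (1, "Chais", 18),
        (2, "Chang", 19),
        (3, "Aniseed Syrup", 10),
        (4, "Chef Anton's Cajun Seasoning", 22),
        (5, "Chef Anton's Gumbo Mix", 21.35),
        (6, "Unpriced sample", None),  # hypothetical row, added only to show NULL handling
    ],
)

count_rows, count_prices, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(Price), SUM(Price), AVG(Price) FROM Products"
).fetchone()

# COUNT(*) counts rows; COUNT(Price), SUM(Price) and AVG(Price) skip the NULL price.
print(count_rows, count_prices, round(total, 2), round(average, 2))  # 6 5 90.35 18.07
```

The average is 90.35 divided by 5, not by 6, because the row with the missing price takes no part in the calculation.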

SQL LIKE Operator

The SQL LIKE Operator

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are two wildcards used in conjunction with the LIKE operator:

• % - The percent sign represents zero, one, or multiple characters
• _ - The underscore represents a single character

Note: MS Access uses an asterisk (*) instead of the percent sign (%) and a question mark (?) instead of
the underscore (_).
The percent sign and the underscore can also be used in combination.

LIKE Syntax

SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;

Tip: You can also combine any number of conditions using AND or OR operators.
Here are some examples showing different LIKE operators with '%' and '_' wildcards:

LIKE Operator | Description
WHERE CustomerName LIKE 'a%' | Finds any values that start with "a"
WHERE CustomerName LIKE '%a' | Finds any values that end with "a"
WHERE CustomerName LIKE '%or%' | Finds any values that have "or" in any position
WHERE CustomerName LIKE '_r%' | Finds any values that have "r" in the second position
WHERE CustomerName LIKE 'a_%_%' | Finds any values that start with "a" and are at least 3 characters in length
WHERE ContactName LIKE 'a%o' | Finds any values that start with "a" and end with "o"

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

SQL LIKE Examples

The following SQL statement selects all customers with a CustomerName starting with "a":

Example

SELECT * FROM Customers
WHERE CustomerName LIKE 'a%';

The following SQL statement selects all customers with a CustomerName ending with "a":

Example

SELECT * FROM Customers
WHERE CustomerName LIKE '%a';

The following SQL statement selects all customers with a CustomerName that has "or" in any position:

Example

SELECT * FROM Customers
WHERE CustomerName LIKE '%or%';

The following SQL statement selects all customers with a CustomerName that has "r" in the second
position:

Example

SELECT * FROM Customers
WHERE CustomerName LIKE '_r%';

The following SQL statement selects all customers with a CustomerName that starts with "a" and is at
least 3 characters in length:

Example

SELECT * FROM Customers
WHERE CustomerName LIKE 'a_%_%';

The following SQL statement selects all customers with a ContactName that starts with "a" and ends with
"o":

Example

SELECT * FROM Customers
WHERE ContactName LIKE 'a%o';

The following SQL statement selects all customers with a CustomerName that does NOT start with "a":

Example

SELECT * FROM Customers
WHERE CustomerName NOT LIKE 'a%';

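
Several of the patterns above can be run against the five demo customer names to see which rows each one picks. The sketch assumes SQLite through Python's sqlite3 module, where LIKE ignores case for ASCII letters by default (so 'a%' also matches names beginning with "A"); other database systems may treat case differently. The helper matching_ids() is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
        (4, "Around the Horn"),
        (5, "Berglunds snabbköp"),
    ],
)

def matching_ids(pattern, negate=False):
    """Return CustomerIDs whose name matches (or, with negate, does not match) the pattern."""
    operator = "NOT LIKE" if negate else "LIKE"
    sql = (
        f"SELECT CustomerID FROM Customers WHERE CustomerName {operator} ? "
        "ORDER BY CustomerID"
    )
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_a = matching_ids("a%")       # name begins with "a"
ends_with_a = matching_ids("%a")         # name ends with "a"
contains_or = matching_ids("%or%")       # "or" anywhere in the name
r_in_second = matching_ids("_r%")        # any one character, then "r"
not_starting_a = matching_ids("a%", negate=True)

print(starts_with_a, ends_with_a, contains_or, r_in_second, not_starting_a)
# [1, 2, 3, 4] [3] [3, 4] [4] [5]
```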

Exercise:

Select all records where the value of the City column starts with the letter "a".

SQL Wildcard Characters

A wildcard character is used to substitute one or more other characters in a string.
Wildcard characters are used with the SQL LIKE operator. The LIKE operator is used in a WHERE clause
to search for a specified pattern in a column.
There are two wildcards used in conjunction with the LIKE operator:

• % - The percent sign represents zero, one, or multiple characters
• _ - The underscore represents a single character

Note: MS Access uses an asterisk (*) instead of the percent sign (%) and a question mark (?) instead of
the underscore (_).
In MS Access and SQL Server you can also use:

• [charlist] - Defines sets and ranges of characters to match
• [^charlist] or [!charlist] - Defines sets and ranges of characters NOT to match

The wildcards can also be used in combination.


Here are some examples showing different LIKE operators with '%' and '_' wildcards:

LIKE Operator | Description
WHERE CustomerName LIKE 'a%' | Finds any values that start with "a"
WHERE CustomerName LIKE '%a' | Finds any values that end with "a"
WHERE CustomerName LIKE '%or%' | Finds any values that have "or" in any position
WHERE CustomerName LIKE '_r%' | Finds any values that have "r" in the second position
WHERE CustomerName LIKE 'a_%_%' | Finds any values that start with "a" and are at least 3 characters in length
WHERE ContactName LIKE 'a%o' | Finds any values that start with "a" and end with "o"

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

Using the % Wildcard

The following SQL statement selects all customers with a City starting with "ber":

Example

SELECT * FROM Customers
WHERE City LIKE 'ber%';

The SQL IN Operator

The IN operator allows you to specify multiple values in a WHERE clause.


The IN operator is a shorthand for multiple OR conditions.

IN Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);

or:

SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden

IN Operator Examples

The following SQL statement selects all customers that are located in "Germany", "France" or "UK":

Example

SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');

The following SQL statement selects all customers that are NOT located in "Germany", "France" or "UK":

Example

SELECT * FROM Customers
WHERE Country NOT IN ('Germany', 'France', 'UK');

The following SQL statement selects all customers that are from the same countries as the suppliers:

Example

SELECT * FROM Customers
WHERE Country IN (SELECT Country FROM Suppliers);

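
The three uses of IN shown above (a value list, NOT IN, and a subquery) can be tried together. The sketch assumes SQLite through Python's sqlite3 module; the Customers table is cut down to two columns, and the Suppliers rows are invented for the example because the notes do not list any supplier data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.execute("CREATE TABLE Suppliers (SupplierName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Germany"), (2, "Mexico"), (3, "Mexico"), (4, "UK"), (5, "Sweden")],
)
# Hypothetical suppliers, used only to give the subquery something to return.
conn.executemany(
    "INSERT INTO Suppliers VALUES (?, ?)",
    [("Supplier A", "UK"), ("Supplier B", "USA"), ("Supplier C", "Sweden")],
)

def ids(where):
    """Return the CustomerIDs that satisfy a WHERE condition, in ID order."""
    sql = f"SELECT CustomerID FROM Customers WHERE {where} ORDER BY CustomerID"
    return [row[0] for row in conn.execute(sql)]

in_list = ids("Country IN ('Germany', 'France', 'UK')")
not_in_list = ids("Country NOT IN ('Germany', 'France', 'UK')")
in_subquery = ids("Country IN (SELECT Country FROM Suppliers)")

print(in_list, not_in_list, in_subquery)  # [1, 4] [2, 3, 5] [4, 5]
```

No customer is in France, so that value in the list simply matches nothing.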
SQL BETWEEN Operator

The SQL BETWEEN Operator

The BETWEEN operator selects values within a given range. The values can be numbers, text, or dates.
The BETWEEN operator is inclusive: begin and end values are included.

BETWEEN Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;

Demo Database

Below is a selection from the "Products" table in the Northwind sample database:

ProductID | ProductName | SupplierID | CategoryID | Unit | Price
1 | Chais | 1 | 1 | 10 boxes x 20 bags | 18
2 | Chang | 1 | 1 | 24 - 12 oz bottles | 19
3 | Aniseed Syrup | 1 | 2 | 12 - 550 ml bottles | 10
4 | Chef Anton's Cajun Seasoning | 2 | 2 | 48 - 6 oz jars | 22
5 | Chef Anton's Gumbo Mix | 2 | 2 | 36 boxes | 21.35

BETWEEN Example
The following SQL statement selects all products with a price BETWEEN 10 and 20:

Example

SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20;

NOT BETWEEN Example

To display the products outside the range of the previous example, use NOT BETWEEN:

Example

SELECT * FROM Products
WHERE Price NOT BETWEEN 10 AND 20;

BETWEEN with IN Example

The following SQL statement selects all products with a price BETWEEN 10 and 20 and, in addition,
excludes products with a CategoryID of 1, 2, or 3:

Example

SELECT * FROM Products
WHERE (Price BETWEEN 10 AND 20)
AND NOT CategoryID IN (1,2,3);

BETWEEN Text Values Example

The following SQL statement selects all products with a ProductName BETWEEN 'Carnarvon Tigers' and
'Mozzarella di Giovanni':

Example

SELECT * FROM Products
WHERE ProductName BETWEEN 'Carnarvon Tigers' AND 'Mozzarella di Giovanni'
ORDER BY ProductName;

NOT BETWEEN Text Values Example

The following SQL statement selects all products with a ProductName NOT BETWEEN 'Carnarvon
Tigers' and 'Mozzarella di Giovanni':

Example

SELECT * FROM Products
WHERE ProductName NOT BETWEEN 'Carnarvon Tigers' AND 'Mozzarella di Giovanni'
ORDER BY ProductName;

Sample Table

Below is a selection from the "Orders" table in the Northwind sample database:

OrderID | CustomerID | EmployeeID | OrderDate | ShipperID
10248 | 90 | 5 | 7/4/1996 | 3
10249 | 81 | 6 | 7/5/1996 | 1
10250 | 34 | 4 | 7/8/1996 | 2
10251 | 84 | 3 | 7/9/1996 | 1
10252 | 76 | 4 | 7/10/1996 | 2

BETWEEN Dates Example

The following SQL statement selects all orders with an OrderDate BETWEEN '01-July-1996' and
'31-July-1996'. The first form is MS Access syntax, where date literals are enclosed in # signs and read as
month/day/year:

Example

SELECT * FROM Orders
WHERE OrderDate BETWEEN #07/01/1996# AND #07/31/1996#;

OR:

Example

SELECT * FROM Orders
WHERE OrderDate BETWEEN '1996-07-01' AND '1996-07-31';

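
That BETWEEN includes both end values can be confirmed with the demo prices, where one product costs exactly 10. The sketch assumes SQLite through Python's sqlite3 module and cut-down Products and Orders tables. It also assumes that order dates are stored as text in year-month-day form, which is what lets them be compared as plain strings; the date range in the second query is chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Price REAL)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, 18), (2, 19), (3, 10), (4, 22), (5, 21.35)],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [
        (10248, "1996-07-04"),
        (10249, "1996-07-05"),
        (10250, "1996-07-08"),
        (10251, "1996-07-09"),
        (10252, "1996-07-10"),
    ],
)

def first_column(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Product 3 costs exactly 10 and is included: BETWEEN is inclusive.
in_range = first_column(
    "SELECT ProductID FROM Products WHERE Price BETWEEN 10 AND 20 ORDER BY ProductID"
)
out_of_range = first_column(
    "SELECT ProductID FROM Products WHERE Price NOT BETWEEN 10 AND 20 ORDER BY ProductID"
)
# Both boundary dates (5 July and 9 July) are included as well.
orders_in_range = first_column(
    "SELECT OrderID FROM Orders "
    "WHERE OrderDate BETWEEN '1996-07-05' AND '1996-07-09' ORDER BY OrderID"
)

print(in_range, out_of_range, orders_in_range)
# [1, 2, 3] [4, 5] [10249, 10250, 10251]
```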
SQL Aliases

SQL aliases are used to give a table, or a column in a table, a temporary name.
Aliases are often used to make column names more readable.
An alias only exists for the duration of the query.

Alias Column Syntax

SELECT column_name AS alias_name
FROM table_name;

Alias Table Syntax

SELECT column_name(s)
FROM table_name AS alias_name;

Demo Database

In this tutorial we will use the well-known Northwind sample database.


Below is a selection from the "Customers" table:

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK

And a selection from the "Orders" table:

OrderID | CustomerID | EmployeeID | OrderDate | ShipperID
10354 | 58 | 8 | 1996-11-14 | 3
10355 | 4 | 6 | 1996-11-15 | 1
10356 | 86 | 6 | 1996-11-18 | 2

Alias for Columns Examples

The following SQL statement creates two aliases, one for the CustomerID column and one for the
CustomerName column:

Example

SELECT CustomerID AS ID, CustomerName AS Customer
FROM Customers;

The following SQL statement creates two aliases, one for the CustomerName column and one for the
ContactName column. Note: Double quotation marks or square brackets are required if the alias name
contains spaces:

Example

SELECT CustomerName AS Customer, ContactName AS [Contact Person]
FROM Customers;

The following SQL statement creates an alias named "Address" that combines four columns (Address,
PostalCode, City and Country):

Example

SELECT CustomerName, Address + ', ' + PostalCode + ' ' + City + ', ' + Country AS Address
FROM Customers;

Note: To get the SQL statement above to work in MySQL use the following:

SELECT CustomerName, CONCAT(Address,', ',PostalCode,', ',City,', ',Country) AS Address
FROM Customers;

Alias for Tables Example

The following SQL statement selects all the orders from the customer with CustomerID=4 (Around the
Horn). We use the "Customers" and "Orders" tables, and give them the table aliases "c" and "o"
respectively (here we use aliases to make the SQL shorter):

Example

SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerName='Around the Horn' AND c.CustomerID=o.CustomerID;

The following SQL statement is the same as above, but without aliases:

Example

SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName
FROM Customers, Orders
WHERE Customers.CustomerName='Around the Horn' AND
Customers.CustomerID=Orders.CustomerID;
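
Column and table aliases can be seen at work in one query. The sketch below assumes SQLite through Python's sqlite3 module and cut-down Customers and Orders tables holding the rows shown in the demo selection; the column aliases ID, Ordered and Customer are chosen only for this example. The column names reported by the cursor are the aliases, not the original column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
        (4, "Around the Horn"),
    ],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10354, 58, "1996-11-14"), (10355, 4, "1996-11-15"), (10356, 86, "1996-11-18")],
)

# "c" and "o" are table aliases; ID, Ordered and Customer are column aliases.
cursor = conn.execute("""
    SELECT o.OrderID AS ID, o.OrderDate AS Ordered, c.CustomerName AS Customer
    FROM Customers AS c, Orders AS o
    WHERE c.CustomerName = 'Around the Horn' AND c.CustomerID = o.CustomerID
""")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['ID', 'Ordered', 'Customer']
print(rows)          # [(10355, '1996-11-15', 'Around the Horn')]
```

The aliases exist only for this one query; the tables and their columns keep their real names.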

Aliases can be useful when:

• There is more than one table involved in a query
• Functions are used in the query
• Column names are long or not very readable
• Two or more columns are combined together

SQL - Sub Queries

A subquery (also called an inner query or a nested query) is a query embedded within another SQL query,
most commonly in the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to further restrict the
data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with
operators such as =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow:
• Subqueries must be enclosed within parentheses.
• A subquery can have only one column in the SELECT clause, unless multiple columns are in the
main query for the subquery to compare its selected columns.
• An ORDER BY command cannot be used in a subquery, although the main query can use an
ORDER BY. The GROUP BY command can be used to perform the same function as the ORDER
BY in a subquery.
• Subqueries that return more than one row can only be used with multiple value operators such as
the IN operator.
• The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY,
CLOB, or NCLOB.
• A subquery cannot be immediately enclosed in a set function.
• The BETWEEN operator cannot be used with a subquery. However, the BETWEEN operator can
be used within the subquery.

Subqueries with the SELECT Statement

Subqueries are most frequently used with the SELECT statement. The basic syntax is as follows −

SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
   (SELECT column_name [, column_name ]
   FROM table1 [, table2 ]
   [WHERE])

Example

Consider the CUSTOMERS table having the following records −

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

Now, let us check the following subquery with a SELECT statement.

SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500) ;

This would produce the following result.

+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
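
The subquery above can be tried without a database server. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory SQLite database in place of the SQL> prompt used in these notes. The helper name customers_earning_more_than is hypothetical.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS ("
    "ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 35, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)


def customers_earning_more_than(connection, threshold):
    """Hypothetical helper: the inner SELECT yields IDs, the outer SELECT the full rows."""
    query = (
        "SELECT * FROM CUSTOMERS "
        "WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > ?) "
        "ORDER BY ID"
    )
    return connection.execute(query, (threshold,)).fetchall()


rows = customers_earning_more_than(conn, 4500)
for row in rows:
    print(row)
```

Because the inner query returns several rows, it is combined with IN rather than with a single-value operator such as =.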
Subqueries with the INSERT Statement

Subqueries also can be used with INSERT statements. The INSERT statement uses the data returned from
the subquery to insert into another table. The selected data in the subquery can be modified with any of the
character, date or number functions.
The basic syntax is as follows.

INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ * | column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]

Example

Consider a table CUSTOMERS_BKP with the same structure as the CUSTOMERS table. To copy the
complete CUSTOMERS table into the CUSTOMERS_BKP table, you can use the following statement.

SQL> INSERT INTO CUSTOMERS_BKP
     SELECT * FROM CUSTOMERS
     WHERE ID IN (SELECT ID
                  FROM CUSTOMERS) ;

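
The copy can be checked with a short script. This is a minimal sketch, assuming Python's built-in sqlite3 module and only three of the sample rows. The two tables are created with identical columns so that SELECT * lines up with the target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
conn.execute("CREATE TABLE CUSTOMERS " + columns)
conn.execute("CREATE TABLE CUSTOMERS_BKP " + columns)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 35, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
    ],
)

# INSERT ... SELECT: the rows returned by the SELECT (filtered by the
# subquery) become the rows inserted into the backup table.
conn.execute(
    "INSERT INTO CUSTOMERS_BKP "
    "SELECT * FROM CUSTOMERS WHERE ID IN (SELECT ID FROM CUSTOMERS)"
)

original = conn.execute("SELECT * FROM CUSTOMERS ORDER BY ID").fetchall()
backup = conn.execute("SELECT * FROM CUSTOMERS_BKP ORDER BY ID").fetchall()
print(len(backup), "rows copied")
```

Here the subquery selects every ID, so the whole table is copied. Adding a WHERE clause inside the subquery would restrict which rows are copied.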
Subqueries with the UPDATE Statement

The subquery can be used in conjunction with the UPDATE statement. Either single or multiple columns in
a table can be updated when using a subquery with the UPDATE statement.
The basic syntax is as follows.

UPDATE table_name
SET column_name = new_value
[ WHERE column_name OPERATOR
   (SELECT column_name
    FROM table_name
    [ WHERE condition ]) ]
Example

Assume we have a CUSTOMERS_BKP table that is a backup of the CUSTOMERS table. The
following example multiplies SALARY by 0.25 in the CUSTOMERS table for all the customers whose
AGE is greater than or equal to 27.

SQL> UPDATE CUSTOMERS
     SET SALARY = SALARY * 0.25
     WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
                   WHERE AGE >= 27);

This affects two rows, and the CUSTOMERS table would then contain the following records.

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 2125.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
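
The arithmetic can be verified with a short script. This is a minimal sketch, assuming Python's built-in sqlite3 module. The backup table is created here with CREATE TABLE ... AS SELECT, which is an assumption about how CUSTOMERS_BKP was produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS ("
    "ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 35, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)
# Assumed: the backup is a plain copy of the original table.
conn.execute("CREATE TABLE CUSTOMERS_BKP AS SELECT * FROM CUSTOMERS")

# The subquery returns the ages 35 and 27, so only Ramesh and Hardik match.
cursor = conn.execute(
    "UPDATE CUSTOMERS SET SALARY = SALARY * 0.25 "
    "WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)"
)
updated_rows = cursor.rowcount

salaries = dict(conn.execute("SELECT NAME, SALARY FROM CUSTOMERS"))
print(updated_rows, salaries["Ramesh"], salaries["Hardik"])
```

Running it shows two updated rows, with Ramesh at 500.0 and Hardik at 2125.0.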
Subqueries with the DELETE Statement

The subquery can be used in conjunction with the DELETE statement like with any other statements
mentioned above.
The basic syntax is as follows.

DELETE FROM table_name
[ WHERE column_name OPERATOR
   (SELECT column_name
    FROM table_name
    [ WHERE condition ]) ]

Example

Assume we have a CUSTOMERS_BKP table that is a backup of the CUSTOMERS table.
The following example deletes the records from the CUSTOMERS table for all the customers whose AGE
is greater than or equal to 27.

SQL> DELETE FROM CUSTOMERS
     WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
                   WHERE AGE >= 27 );

This affects two rows, and the CUSTOMERS table would then contain the following records.

+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
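
The same deletion can be reproduced as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module, with the ADDRESS column left out for brevity and the backup again created as a plain copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (1, "Ramesh", 35, 2000.00),
        (2, "Khilan", 25, 1500.00),
        (3, "kaushik", 23, 2000.00),
        (4, "Chaitali", 25, 6500.00),
        (5, "Hardik", 27, 8500.00),
        (6, "Komal", 22, 4500.00),
        (7, "Muffy", 24, 10000.00),
    ],
)
# Assumed: the backup is a plain copy of the original table.
conn.execute("CREATE TABLE CUSTOMERS_BKP AS SELECT * FROM CUSTOMERS")

# The subquery reads from the backup, so deleting from CUSTOMERS
# does not change the set of ages being compared against.
cursor = conn.execute(
    "DELETE FROM CUSTOMERS "
    "WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)"
)
deleted_rows = cursor.rowcount

remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]
backup_count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS_BKP").fetchone()[0]
print(deleted_rows, remaining_ids, backup_count)
```

The backup table keeps all seven rows, which is what makes it usable for restoring the deleted records later.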

SQL JOIN

A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
Let's look at a selection from the "Orders" table:

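+---------+------------+------------+
| OrderID | CustomerID | OrderDate  |
+---------+------------+------------+
| 10308   | 2          | 1996-09-18 |
| 10309   | 37         | 1996-09-19 |
| 10310   | 77         | 1996-09-20 |
+---------+------------+------------+

Then, look at a selection from the "Customers" table:

+------------+------------------------------------+----------------+---------+
| CustomerID | CustomerName                       | ContactName    | Country |
+------------+------------------------------------+----------------+---------+
| 1          | Alfreds Futterkiste                | Maria Anders   | Germany |
| 2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Mexico  |
| 3          | Antonio Moreno Taquería            | Antonio Moreno | Mexico  |
+------------+------------------------------------+----------------+---------+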

Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the "Customers"
table. The relationship between the two tables above is the "CustomerID" column.
Then, we can create the following SQL statement (that contains an INNER JOIN), that selects records that
have matching values in both tables:

Example

SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

It will produce something like this:

+---------+------------------------------------+------------+
| OrderID | CustomerName                       | OrderDate  |
+---------+------------------------------------+------------+
| 10308   | Ana Trujillo Emparedados y helados | 9/18/1996  |
| 10365   | Antonio Moreno Taquería            | 11/27/1996 |
| 10383   | Around the Horn                    | 12/16/1996 |
| 10355   | Around the Horn                    | 11/15/1996 |
| 10278   | Berglunds snabbköp                 | 8/12/1996  |
+---------+------------------------------------+------------+

Different Types of SQL JOINs

Here are the different types of the JOINs in SQL:

• (INNER) JOIN: Returns records that have matching values in both tables
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the
right table
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the
left table
• FULL (OUTER) JOIN: Returns all records from both tables, matched where possible

SQL INNER JOIN Keyword


The INNER JOIN keyword selects records that have matching values in both tables.

INNER JOIN Syntax

SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
Demo Database

In this tutorial we will use the well-known Northwind sample database.


Below is a selection from the "Orders" table:

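+---------+------------+------------+------------+-----------+
| OrderID | CustomerID | EmployeeID | OrderDate  | ShipperID |
+---------+------------+------------+------------+-----------+
| 10308   | 2          | 7          | 1996-09-18 | 3         |
| 10309   | 37         | 3          | 1996-09-19 | 1         |
| 10310   | 77         | 8          | 1996-09-20 | 2         |
+---------+------------+------------+------------+-----------+

And a selection from the "Customers" table:

+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| CustomerID | CustomerName                       | ContactName    | Address                       | City        | PostalCode | Country |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| 1          | Alfreds Futterkiste                | Maria Anders   | Obere Str. 57                 | Berlin      | 12209      | Germany |
| 2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico  |
| 3          | Antonio Moreno Taquería            | Antonio Moreno | Mataderos 2312                | México D.F. | 05023      | Mexico  |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+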

SQL INNER JOIN Example

The following SQL statement selects all orders with customer information:

Example

SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

Note: The INNER JOIN keyword selects all rows from both tables as long as there is a match between the
columns. If there are records in the "Orders" table that do not have matches in "Customers", these orders
will not be shown!

JOIN Three Tables

The following SQL statement selects all orders with customer and shipper information:

Example

SELECT Orders.OrderID, Customers.CustomerName, Shippers.ShipperName
FROM ((Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID)
INNER JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID);

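
The two INNER JOIN queries can be run against the sample rows shown above. This is a minimal sketch, assuming Python's built-in sqlite3 module and only the columns needed for the joins. The contents of the Shippers table are not shown in these notes, so the three shipper names below are assumed sample values. The three-table join is written without the optional parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT)")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ShipperID INTEGER)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
    ],
)
# Assumed sample values; the notes do not list the Shippers rows.
conn.executemany(
    "INSERT INTO Shippers VALUES (?, ?)",
    [(1, "Speedy Express"), (2, "United Package"), (3, "Federal Shipping")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10308, 2, 3), (10309, 37, 1), (10310, 77, 2)],
)

# Two tables: only orders whose CustomerID exists in Customers survive.
two_tables = conn.execute(
    "SELECT Orders.OrderID, Customers.CustomerName "
    "FROM Orders "
    "INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID"
).fetchall()

# Three tables: each additional INNER JOIN filters and extends the rows further.
three_tables = conn.execute(
    "SELECT Orders.OrderID, Customers.CustomerName, Shippers.ShipperName "
    "FROM Orders "
    "INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID "
    "INNER JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID"
).fetchall()

print(two_tables)
print(three_tables)
```

Orders 10309 and 10310 refer to customers 37 and 77, which are not in the three-row selection, so only order 10308 appears in either result.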
SQL LEFT JOIN Keyword

The LEFT JOIN keyword returns all records from the left table (table1), and the matched records from the
right table (table2). The result is NULL from the right side, if there is no match.

LEFT JOIN Syntax

SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;

Note: In some databases LEFT JOIN is called LEFT OUTER JOIN.

Demo Database

In this tutorial we will use the well-known Northwind sample database.


Below is a selection from the "Customers" table:

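+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| CustomerID | CustomerName                       | ContactName    | Address                       | City        | PostalCode | Country |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| 1          | Alfreds Futterkiste                | Maria Anders   | Obere Str. 57                 | Berlin      | 12209      | Germany |
| 2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico  |
| 3          | Antonio Moreno Taquería            | Antonio Moreno | Mataderos 2312                | México D.F. | 05023      | Mexico  |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+

And a selection from the "Orders" table:

+---------+------------+------------+------------+-----------+
| OrderID | CustomerID | EmployeeID | OrderDate  | ShipperID |
+---------+------------+------------+------------+-----------+
| 10308   | 2          | 7          | 1996-09-18 | 3         |
| 10309   | 37         | 3          | 1996-09-19 | 1         |
| 10310   | 77         | 8          | 1996-09-20 | 2         |
+---------+------------+------------+------------+-----------+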

SQL LEFT JOIN Example

The following SQL statement will select all customers, and any orders they might have:

Example

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

Note: The LEFT JOIN keyword returns all records from the left table (Customers), even if there are no
matches in the right table (Orders).
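
The effect of the LEFT JOIN on the sample rows can be seen with a short script. This is a minimal sketch, assuming Python's built-in sqlite3 module and only the columns used by the query. SQL NULL appears as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
    ],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10308, 2), (10309, 37), (10310, 77)],
)

# Every customer is returned; OrderID is NULL (None) where no order matches.
left_join_rows = conn.execute(
    "SELECT Customers.CustomerName, Orders.OrderID "
    "FROM Customers "
    "LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY Customers.CustomerName"
).fetchall()

for name, order_id in left_join_rows:
    print(name, order_id)
```

Customers 1 and 3 have no orders in this selection, so they are still listed but with no OrderID. With an INNER JOIN they would be dropped.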

SQL RIGHT JOIN Keyword


The RIGHT JOIN keyword returns all records from the right table (table2), and the matched records from
the left table (table1). The result is NULL from the left side, when there is no match.

RIGHT JOIN Syntax

SELECT column_name(s)
FROM table1
RIGHT JOIN table2 ON table1.column_name = table2.column_name;
Note: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.

Demo Database

In this tutorial we will use the well-known Northwind sample database.


Below is a selection from the "Orders" table:

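+---------+------------+------------+------------+-----------+
| OrderID | CustomerID | EmployeeID | OrderDate  | ShipperID |
+---------+------------+------------+------------+-----------+
| 10308   | 2          | 7          | 1996-09-18 | 3         |
| 10309   | 37         | 3          | 1996-09-19 | 1         |
| 10310   | 77         | 8          | 1996-09-20 | 2         |
+---------+------------+------------+------------+-----------+

And a selection from the "Employees" table:

+------------+-----------+-----------+-----------+------------+
| EmployeeID | LastName  | FirstName | BirthDate | Photo      |
+------------+-----------+-----------+-----------+------------+
| 1          | Davolio   | Nancy     | 12/8/1968 | EmpID1.pic |
| 2          | Fuller    | Andrew    | 2/19/1952 | EmpID2.pic |
| 3          | Leverling | Janet     | 8/30/1963 | EmpID3.pic |
+------------+-----------+-----------+-----------+------------+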

SQL RIGHT JOIN Example

The following SQL statement will return all employees, and any orders they might have placed:

Example

SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;

SQL FULL OUTER JOIN Keyword

The FULL OUTER JOIN keyword returns all records from both the left (table1) and the right (table2)
table: rows that match are combined, and rows without a match are returned with NULL in the
columns of the other table.
Note: FULL OUTER JOIN can potentially return very large result-sets!

FULL OUTER JOIN Syntax

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;

Demo Database

In this tutorial we will use the well-known Northwind sample database.


Below is a selection from the "Customers" table:

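+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| CustomerID | CustomerName                       | ContactName    | Address                       | City        | PostalCode | Country |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| 1          | Alfreds Futterkiste                | Maria Anders   | Obere Str. 57                 | Berlin      | 12209      | Germany |
| 2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico  |
| 3          | Antonio Moreno Taquería            | Antonio Moreno | Mataderos 2312                | México D.F. | 05023      | Mexico  |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+

And a selection from the "Orders" table:

+---------+------------+------------+------------+-----------+
| OrderID | CustomerID | EmployeeID | OrderDate  | ShipperID |
+---------+------------+------------+------------+-----------+
| 10308   | 2          | 7          | 1996-09-18 | 3         |
| 10309   | 37         | 3          | 1996-09-19 | 1         |
| 10310   | 77         | 8          | 1996-09-20 | 2         |
+---------+------------+------------+------------+-----------+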

SQL FULL OUTER JOIN Example

The following SQL statement selects all customers, and all orders:

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

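A selection from the result set may look like this (NULL marks a missing match):

+------------------------------------+---------+
| CustomerName                       | OrderID |
+------------------------------------+---------+
| Alfreds Futterkiste                | NULL    |
| Ana Trujillo Emparedados y helados | 10308   |
| Antonio Moreno Taquería            | 10365   |
| NULL                               | 10382   |
| NULL                               | 10351   |
+------------------------------------+---------+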

Note: The FULL OUTER JOIN keyword returns all the rows from the left table (Customers), and all the
rows from the right table (Orders). If there are rows in "Customers" that do not have matches in "Orders",
or if there are rows in "Orders" that do not have matches in "Customers", those rows will be listed as well.

The SQL UNION Operator

The UNION operator is used to combine the result-set of two or more SELECT statements.

• Each SELECT statement within UNION must have the same number of columns
• The columns must also have similar data types
• The columns in each SELECT statement must also be in the same order

UNION Syntax

SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

UNION ALL Syntax

The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

Note: The column names in the result-set are usually equal to the column names in the first SELECT
statement in the UNION.

Demo Database

In this tutorial we will use the well-known Northwind sample database.


Below is a selection from the "Customers" table:

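+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| CustomerID | CustomerName                       | ContactName    | Address                       | City        | PostalCode | Country |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+
| 1          | Alfreds Futterkiste                | Maria Anders   | Obere Str. 57                 | Berlin      | 12209      | Germany |
| 2          | Ana Trujillo Emparedados y helados | Ana Trujillo   | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico  |
| 3          | Antonio Moreno Taquería            | Antonio Moreno | Mataderos 2312                | México D.F. | 05023      | Mexico  |
+------------+------------------------------------+----------------+-------------------------------+-------------+------------+---------+

And a selection from the "Suppliers" table:

+------------+----------------------------+------------------+----------------+-------------+------------+---------+
| SupplierID | SupplierName               | ContactName      | Address        | City        | PostalCode | Country |
+------------+----------------------------+------------------+----------------+-------------+------------+---------+
| 1          | Exotic Liquid              | Charlotte Cooper | 49 Gilbert St. | London      | EC1 4SD    | UK      |
| 2          | New Orleans Cajun Delights | Shelley Burke    | P.O. Box 78934 | New Orleans | 70117      | USA     |
| 3          | Grandma Kelly's Homestead  | Regina Murphy    | 707 Oxford Rd. | Ann Arbor   | 48104      | USA     |
+------------+----------------------------+------------------+----------------+-------------+------------+---------+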

SQL UNION Example

The following SQL statement returns the cities (only distinct values) from both the "Customers" and the
"Suppliers" table:

Example

SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;

Note: If some customers or suppliers have the same city, each city will only be listed once, because
UNION selects only distinct values. Use UNION ALL to also select duplicate values!

SQL UNION ALL Example

The following SQL statement returns the cities (duplicate values also) from both the "Customers" and the
"Suppliers" table:

Example

SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;
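
The difference between UNION and UNION ALL shows up clearly on the six sample rows. This is a minimal sketch, assuming Python's built-in sqlite3 module and tables reduced to the City column plus a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT)")
conn.execute("CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Berlin"), (2, "México D.F."), (3, "México D.F.")],
)
conn.executemany(
    "INSERT INTO Suppliers VALUES (?, ?)",
    [(1, "London"), (2, "New Orleans"), (3, "Ann Arbor")],
)

# UNION removes duplicate rows from the combined result.
distinct_cities = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City"
    )
]

# UNION ALL keeps every row, including the repeated city.
all_cities = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers ORDER BY City"
    )
]

print(distinct_cities)
print(all_cities)
```

México D.F. appears once in the UNION result and twice in the UNION ALL result, because two customers share that city.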

SQL UNION With WHERE

The following SQL statement returns the German cities (only distinct values) from both the "Customers"
and the "Suppliers" table:

Example

SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;

SQL UNION ALL With WHERE

The following SQL statement returns the German cities (duplicate values also) from both the "Customers"
and the "Suppliers" table:

Example

SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;

Another UNION Example

The following SQL statement lists all customers and suppliers:

Example

SELECT 'Customer' AS Type, ContactName, City, Country
FROM Customers
UNION
SELECT 'Supplier', ContactName, City, Country
FROM Suppliers;

What is a Stored Procedure?

A stored procedure is prepared SQL code that you can save, so that the code can be reused over and
over again.
So if you have an SQL query that you write repeatedly, save it as a stored procedure, and then just
call it to execute it.
You can also pass parameters to a stored procedure, so that the stored procedure can act on the
parameter value(s) that are passed.

Stored Procedure Syntax

The syntax in this section is that of Microsoft SQL Server (Transact-SQL); other database systems
use different syntax. GO is a batch separator used by the SQL Server tools and is written on a line
of its own, without a semicolon.

CREATE PROCEDURE procedure_name
AS
sql_statement
GO

Execute a Stored Procedure

EXEC procedure_name;

Demo Database

Below is a selection from the "Customers" table in the Northwind sample database:

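+------------+------------------------------------+--------------------+-------------------------------+-------------+------------+---------+
| CustomerID | CustomerName                       | ContactName        | Address                       | City        | PostalCode | Country |
+------------+------------------------------------+--------------------+-------------------------------+-------------+------------+---------+
| 1          | Alfreds Futterkiste                | Maria Anders       | Obere Str. 57                 | Berlin      | 12209      | Germany |
| 2          | Ana Trujillo Emparedados y helados | Ana Trujillo       | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico  |
| 3          | Antonio Moreno Taquería            | Antonio Moreno     | Mataderos 2312                | México D.F. | 05023      | Mexico  |
| 4          | Around the Horn                    | Thomas Hardy       | 120 Hanover Sq.               | London      | WA1 1DP    | UK      |
| 5          | Berglunds snabbköp                 | Christina Berglund | Berguvsvägen 8                | Luleå       | S-958 22   | Sweden  |
+------------+------------------------------------+--------------------+-------------------------------+-------------+------------+---------+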

Stored Procedure Example

The following SQL statement creates a stored procedure named "SelectAllCustomers" that selects all
records from the "Customers" table:

Example

CREATE PROCEDURE SelectAllCustomers
AS
SELECT * FROM Customers
GO

Execute the stored procedure above as follows:

Example

EXEC SelectAllCustomers;
Stored Procedure with One Parameter

The following SQL statement creates a stored procedure that selects Customers from a particular City from
the "Customers" table:

Example

CREATE PROCEDURE SelectAllCustomers @City nvarchar(30)
AS
SELECT * FROM Customers WHERE City = @City
GO

Execute the stored procedure above as follows:

Example

EXEC SelectAllCustomers @City = 'London';

Stored Procedure With Multiple Parameters

Setting up multiple parameters is very easy. Just list each parameter and the data type separated by a
comma as shown below.
The following SQL statement creates a stored procedure that selects Customers from a particular City with
a particular PostalCode from the "Customers" table:

Example

CREATE PROCEDURE SelectAllCustomers @City nvarchar(30), @PostalCode nvarchar(10)
AS
SELECT * FROM Customers WHERE City = @City AND PostalCode = @PostalCode
GO

Execute the stored procedure above as follows:

Example

EXEC SelectAllCustomers @City = 'London', @PostalCode = 'WA1 1DP';

Note on views: a database view (described under DATABASE VIEW below) is a virtual table defined by
a stored SQL SELECT statement. A view is dynamic because it does not store data of its own and is
not tied to the physical storage of the tables. When the data in the underlying tables changes, the
view reflects those changes as well.

The SQL MIN() and MAX() Functions

The MIN() function returns the smallest value of the selected column.

The MAX() function returns the largest value of the selected column.

MIN() Syntax
SELECT MIN(column_name)
FROM table_name
WHERE condition;

MAX() Syntax
SELECT MAX(column_name)
FROM table_name
WHERE condition;

The SQL COUNT(), AVG() and SUM() Functions


The COUNT() function returns the number of rows that match a specified criterion.

The AVG() function returns the average value of a numeric column.

The SUM() function returns the total sum of a numeric column.

COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;

AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;

SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;
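
The five aggregate functions can be applied to the CUSTOMERS sample data used earlier in these notes. This is a minimal sketch, assuming Python's built-in sqlite3 module and a table reduced to the ID, NAME, AGE and SALARY columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (1, "Ramesh", 35, 2000.00),
        (2, "Khilan", 25, 1500.00),
        (3, "kaushik", 23, 2000.00),
        (4, "Chaitali", 25, 6500.00),
        (5, "Hardik", 27, 8500.00),
        (6, "Komal", 22, 4500.00),
        (7, "Muffy", 24, 10000.00),
    ],
)

# All five aggregates over the whole table in one query.
lowest, highest, how_many, average, total = conn.execute(
    "SELECT MIN(SALARY), MAX(SALARY), COUNT(ID), AVG(SALARY), SUM(SALARY) FROM CUSTOMERS"
).fetchone()

# With a WHERE clause, only the matching rows are aggregated.
count_25_or_older = conn.execute(
    "SELECT COUNT(ID) FROM CUSTOMERS WHERE AGE >= 25"
).fetchone()[0]

print(lowest, highest, how_many, average, total, count_25_or_older)
```

The salaries add up to 35000 across 7 rows, so the average is 5000. Four customers are 25 or older.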

DATABASE VIEW

A database view is a virtual (logical) table defined by a SQL SELECT query, often one that involves joins.
Because a database view, like a database table, consists of rows and columns, you can query data against it.
Most database management systems, including MySQL, allow you to update data in the underlying tables through
the database view, subject to some prerequisites.

Advantages of database view


The following are advantages of using database views.

 A database view allows you to simplify complex queries: Through a database view, you only have
to use simple SQL statements instead of complex ones with many joins.
 A database view helps limit data access to specific users. You can use a database view to expose
only non-sensitive data to a specific group of users.
 A database view provides extra security layer. The database view offers additional protection for a
database management system. The database view allows you to create the read-only view to expose
read-only data to specific users. Users can only retrieve data in read-only view but cannot update it.
• A database view enables computed columns. A database table should not store calculated columns,
but a database view can. When you query data from the database view, the data of the
computed column is calculated on the fly.
• A database view enables backward compatibility. Suppose you have a central database that many
applications use. One day, you decide to redesign the database to adapt to new business
requirements. You remove some tables and create new tables, and you don't want the changes to
affect other applications. In this scenario, you can create database views with the same schema as
the legacy tables that you will remove.

Disadvantages of database view


Besides the advantages above, there are several disadvantages of using database views:
• Performance: querying data from a database view can be slow, especially if the view is created
based on other views.
• Table dependency: a view is created from underlying tables of the database. Whenever you
change the structure of the tables that the view depends on, you have to change the view as well.

CREATING VIEW

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Let's use a simple example to illustrate. Say we have the following table:

Table Customer

+-------------+-----------+
| Column Name | Data Type |
+-------------+-----------+
| First_Name  | char(50)  |
| Last_Name   | char(50)  |
| Address     | char(50)  |
| City        | char(50)  |
| Country     | char(25)  |
| Birth_Date  | datetime  |
+-------------+-----------+

To create a view called V_Customer that contains only the First_Name, Last_Name, and Country
columns from this table, we would type:

CREATE VIEW V_Customer
AS SELECT First_Name, Last_Name, Country
FROM Customer;

EXAMPLE 2:
SQL > CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;

Now you can query CUSTOMERS_VIEW in the same way as you query an actual table, for
example:

SQL > SELECT * FROM CUSTOMERS_VIEW;

This would produce the following result.

+----------+-----+
| name | age |
+----------+-----+
| Ramesh | 35 |
| Khilan | 25 |
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
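
That a view stores no data of its own can be checked directly. This is a minimal sketch, assuming Python's built-in sqlite3 module and three of the sample rows. The new age value 36 is purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [(1, "Ramesh", 35, 2000.00), (2, "Khilan", 25, 1500.00), (3, "kaushik", 23, 2000.00)],
)

# The view exposes only two columns; SALARY stays hidden from users of the view.
conn.execute("CREATE VIEW CUSTOMERS_VIEW AS SELECT NAME, AGE FROM CUSTOMERS")
view_rows = conn.execute("SELECT * FROM CUSTOMERS_VIEW").fetchall()

# Change the base table, then read through the view again.
conn.execute("UPDATE CUSTOMERS SET AGE = 36 WHERE NAME = 'Ramesh'")
age_seen_through_view = conn.execute(
    "SELECT AGE FROM CUSTOMERS_VIEW WHERE NAME = 'Ramesh'"
).fetchone()[0]

print(view_rows)
print(age_seen_through_view)
```

The second query returns the updated age even though the view was created before the update, because the view's SELECT is evaluated each time it is queried.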

8.0: Function of database management system

8.1 Transaction processing

A transaction is a program unit consisting of a collection of database operations, executed as a
logical unit of data processing. The operations performed in a transaction include one or more
database operations such as insert, delete, update or retrieve.
A transaction is atomic: it is either performed to completion or not performed at all. A
transaction involving only data retrieval, without any data update, is called a read-only transaction.
Each high-level operation can be divided into a number of low-level tasks or operations. For example, a
data update operation can be divided into three tasks:
• read_item(): reads the data item from storage into main memory.
• modify_item(): changes the value of the item in main memory.
• write_item(): writes the modified value from main memory to storage.
Database access is restricted to the read_item() and write_item() operations. For all transactions,
read and write form the basic database operations.
Transaction Operations

The low level operations performed in a transaction are −


 begin_transaction − A marker that specifies start of transaction execution.
 read_item or write_item − Database operations that may be interleaved with main memory
operations as a part of transaction.
 end_transaction − A marker that specifies end of transaction.
 commit − A signal to specify that the transaction has been successfully completed in its entirety
and will not be undone.
 rollback − A signal to specify that the transaction has been unsuccessful and so all temporary
changes in the database are undone. A committed transaction cannot be rolled back.
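
The begin, commit and rollback markers listed above map directly onto SQL statements. This is a minimal sketch, assuming Python's built-in sqlite3 module. The ACCOUNTS table, its balances and the transfer function are hypothetical and are not part of these notes.

```python
import sqlite3

# isolation_level=None lets the script issue BEGIN / COMMIT / ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE ACCOUNTS ("
    "ID INTEGER PRIMARY KEY, BALANCE REAL NOT NULL CHECK (BALANCE >= 0))"
)
conn.execute("INSERT INTO ACCOUNTS VALUES (1, 100.0)")
conn.execute("INSERT INTO ACCOUNTS VALUES (2, 50.0)")


def transfer(connection, source, target, amount):
    """Hypothetical transfer: both updates succeed together or neither is kept."""
    connection.execute("BEGIN")  # begin_transaction
    try:
        connection.execute(
            "UPDATE ACCOUNTS SET BALANCE = BALANCE + ? WHERE ID = ?", (amount, target)
        )
        # Violates the CHECK constraint if the source balance would go negative.
        connection.execute(
            "UPDATE ACCOUNTS SET BALANCE = BALANCE - ? WHERE ID = ?", (amount, source)
        )
        connection.execute("COMMIT")  # changes become permanent
        return True
    except sqlite3.IntegrityError:
        connection.execute("ROLLBACK")  # undo the credit made above
        return False


first = transfer(conn, 1, 2, 30.0)    # enough funds: committed
second = transfer(conn, 1, 2, 500.0)  # not enough funds: rolled back
balances = conn.execute("SELECT ID, BALANCE FROM ACCOUNTS ORDER BY ID").fetchall()
print(first, second, balances)
```

In the second call the credit to account 2 has already been applied when the debit fails, and the rollback removes it, so the database never shows a half-finished transfer.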

States of Transactions

A transaction in a database can be in one of the following states −

 Active − In this state, the transaction is being executed. This is the initial state of every transaction.
 Partially Committed − When a transaction executes its final operation, it is said to be in a partially
committed state.
 Failed − A transaction is said to be in a failed state if any of the checks made by the database
recovery system fails. A failed transaction can no longer proceed further.
 Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery
manager rolls back all its write operations on the database to bring the database back to its original
state where it was prior to the execution of the transaction. Transactions in this state are called
aborted.
The database recovery module can select one of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
 Committed − If a transaction executes all its operations successfully, it is said to be committed. All
its effects are now permanently established on the database system.

ACID Properties in DBMS


A transaction is a single logical unit of work which accesses and possibly modifies the contents of
a database. Transactions access data using read and write operations.
To maintain consistency in a database before and after a transaction, certain properties
must hold. These are called the ACID properties.
Atomicity
Either the entire transaction takes place or it does not happen at all; transactions do not
occur partially. Each transaction is considered as one unit and either runs to completion or
is not executed at all. It involves the following two operations.
• Abort: If a transaction aborts, changes made to the database are not visible.
• Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the "all or nothing" rule.

ACID Properties
A transaction is a very small unit of a program and it may contain several low level
tasks. A transaction in a database system must
maintain Atomicity, Consistency, Isolation, and Durability − commonly known as
ACID properties − in order to ensure accuracy, completeness, and data integrity.

 Atomicity − this property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none. There must be no state in a
database where a transaction is left partially completed. States should be defined either
before the execution of the transaction or after the execution/abortion/failure of the
transaction.
 Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
 Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be
updated once the system springs back into action.
 Isolation − In a database system where more than one transaction are being executed
simultaneously and in parallel, the property of isolation states that all the transactions
will be carried out and executed as if it is the only transaction in the system. No
transaction will affect the existence of any other transaction.

8.2 Concurrency controls

Concurrency control is the activity of coordinating concurrent accesses to a database in a
multiuser database management system (DBMS). Concurrency control permits users to access
a database in a multiprogrammed fashion while preserving the illusion that each user is executing alone
on a dedicated system.
It manages simultaneous access to a database: it prevents two users from editing the same record
at the same time and also serializes transactions for backup and recovery.

Advantages of concurrency

The goal is to serve many users and provide better throughput by sharing resources.

• Reduced waiting time, response time or turnaround time.
• Increased throughput or resource utilization.
• Overlapping input-output activity with CPU activity also improves response time.

If only one transaction runs at a time, the ACID properties are sufficient. When multiple
transactions are executed concurrently, however, the database may become inconsistent:
interleaving the instructions of different transactions can lead to many problems, which is why
concurrency control is required.

We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions.
Concurrency control protocols can be broadly divided into two categories −

• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols

Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot
read or write data until it acquires an appropriate lock on it. Locks are of two kinds −
 Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
 Shared/exclusive − this type of locking mechanism differentiates the locks based on their uses. If a
lock is acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more
than one transaction to write on the same data item would lead the database into an inconsistent
state. Read locks are shared because no data value is being changed.
There are four types of lock protocols available −
Simplistic Lock Protocol

Simplistic lock-based protocols have transactions obtain a lock on every object before a 'write'
operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol

Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks.
Before initiating an execution, the transaction requests from the system all the locks it needs to execute to
completion. If all the locks are granted, the transaction executes and releases all the locks when all its
operations are over. If not all the locks are granted, the transaction rolls back and waits until all the locks
are granted.

Two-Phase Locking (2PL)

This locking protocol divides the execution of a transaction into three parts.
In the first part, when the transaction starts executing, it seeks permission for the locks it requires.
In the second part, the transaction acquires all the locks. As soon as the transaction releases its first
lock, the third part starts.
In the third part, the transaction cannot demand any new locks; it only releases the acquired locks.

Two-phase locking therefore has two phases: a growing phase, in which the transaction acquires
locks, and a shrinking phase, in which the locks held by the transaction are released.
A transaction that holds a shared (read) lock on an item can claim an exclusive (write) lock by
upgrading it; upgrades are only allowed during the growing phase.
Strict Two-Phase Locking

The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock
after using it. Strict-2PL holds all the locks until the commit point and releases them all at once.

Strict-2PL does not suffer from cascading aborts as 2PL does.
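
The two-phase rule itself is small enough to sketch in code. This is a minimal illustration in Python, not a real lock manager: the class TwoPhaseTransaction and its methods are hypothetical, and conflicts between different transactions are deliberately not modelled. It only enforces that no lock is requested after the first release, and shows the Strict-2PL variant releasing everything at commit.

```python
class TwoPhaseTransaction:
    """Hypothetical tracker for the growing and shrinking phases of one transaction."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True at the first release

    def lock(self, item):
        # Growing phase only: acquiring after a release breaks the 2PL rule.
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {item!r} after releasing a lock"
            )
        self.held.add(item)

    def unlock(self, item):
        # The first unlock starts the shrinking phase.
        self.held.remove(item)
        self.shrinking = True

    def commit(self):
        # Strict 2PL: every lock is kept until commit and released together.
        released = sorted(self.held)
        self.held.clear()
        self.shrinking = True
        return released


# Basic 2PL: T1 releases A early and is then refused a new lock.
t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")
t1.unlock("A")
try:
    t1.lock("C")
    violation_detected = False
except RuntimeError:
    violation_detected = True

# Strict 2PL: T2 never unlocks individually; commit releases everything.
t2 = TwoPhaseTransaction("T2")
t2.lock("A")
t2.lock("B")
released_at_commit = t2.commit()

print(violation_detected, released_at_commit)
```

Because T2 holds its locks until commit, no other transaction could have read its uncommitted writes, which is the reason Strict-2PL avoids cascading aborts.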
Timestamp-based Protocols

A commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either
the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time of
execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the
transaction. A transaction created at 0002 clock time would be older than all other transactions that come
after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger and the
priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system know when
the last ‘read and write’ operation was performed on the data item.
Timestamp Ordering Protocol

The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and
write operations. The protocol is responsible for making sure that each conflicting pair of operations is
executed in the order of the timestamp values of the transactions.

 The timestamp of transaction Ti is denoted as TS(Ti).
 The read timestamp of data item X is denoted by R-timestamp(X).
 The write timestamp of data item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows −
 If a transaction Ti issues a read(X) operation −
o If TS(Ti) < W-timestamp(X)
 Operation rejected and Ti rolled back.
o If TS(Ti) >= W-timestamp(X)
 Operation executed.
 R-timestamp(X) set to the larger of R-timestamp(X) and TS(Ti).
 If a transaction Ti issues a write(X) operation −
o If TS(Ti) < R-timestamp(X)
 Operation rejected and Ti rolled back.
o If TS(Ti) < W-timestamp(X)
 Operation rejected and Ti rolled back.
o Otherwise, operation executed and W-timestamp(X) set to TS(Ti).
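The read and write rules can be sketched in Python as follows. This is a minimal illustration, not a full scheduler: the names DataItem, Rollback, read and write are hypothetical, timestamps are plain integers, and the timestamp updates follow the usual textbook formulation of the basic protocol.

```python
from dataclasses import dataclass


class Rollback(Exception):
    """Signals that the operation is rejected and the transaction rolled back."""


@dataclass
class DataItem:
    value: object = None
    r_ts: int = 0   # R-timestamp(X): largest timestamp that has read X
    w_ts: int = 0   # W-timestamp(X): largest timestamp that has written X


def read(ts, x):
    # A younger transaction already wrote X, so this read arrives too late.
    if ts < x.w_ts:
        raise Rollback(f"read with TS={ts} rejected, W-timestamp is {x.w_ts}")
    x.r_ts = max(x.r_ts, ts)
    return x.value


def write(ts, x, value):
    # A younger transaction already read or wrote X, so this write is too late.
    if ts < x.r_ts or ts < x.w_ts:
        raise Rollback(f"write with TS={ts} rejected")
    x.value = value
    x.w_ts = ts
```

Here the caller is expected to catch Rollback and restart the transaction with a new timestamp; that restart logic is left out of the sketch.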
Thomas' Write Rule

Under the basic protocol, if TS(Ti) < W-timestamp(X), the write operation is rejected and Ti is rolled back.
Thomas' write rule modifies the timestamp-ordering rules to make the schedule view serializable.
If TS(Ti) >= R-timestamp(X) but TS(Ti) < W-timestamp(X), the write is obsolete, so instead of rolling Ti
back, the 'write' operation itself is ignored.
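A write operation under Thomas' write rule can be sketched in Python as below. The names DataItem, Rollback and write_thomas are hypothetical and chosen only for this illustration; the function returns False when the write is ignored so that the difference from a rollback is visible.

```python
from dataclasses import dataclass


class Rollback(Exception):
    """Signals that the transaction must be rolled back."""


@dataclass
class DataItem:
    value: object = None
    r_ts: int = 0   # R-timestamp(X)
    w_ts: int = 0   # W-timestamp(X)


def write_thomas(ts, x, value):
    """Apply a write under Thomas' write rule; return True if it was applied."""
    if ts < x.r_ts:
        # A younger transaction already read X: still a real conflict.
        raise Rollback(f"write with TS={ts} rejected, R-timestamp is {x.r_ts}")
    if ts < x.w_ts:
        # Obsolete write: a younger transaction already overwrote X, so skip it.
        return False
    x.value = value
    x.w_ts = ts
    return True
```

Only the case TS(Ti) < W-timestamp(X) changes compared with the basic protocol; a write that arrives after a younger read is still rejected.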

8.3 Database recovery

Data recovery is the process of restoring data that has been lost, accidentally deleted, corrupted or made
inaccessible. In enterprise IT, data recovery typically refers to the restoration of data to a desktop, laptop,
server or external storage system from a backup.

Database failure

Classification of failure:

To identify where a problem has occurred, failures are generalized into the following classes:

 Transaction failure
 System crash
 Disk failure

Types of Failure

1. Transaction failure: A transaction has to abort when it fails to execute or when it reaches a point from
which it cannot proceed any further. This is called transaction failure, where only a few transactions
or processes are affected. The reasons for transaction failure are:

 Logical errors: The transaction cannot complete because of an error in its code or an internal error
condition.
 System errors: The database system itself terminates an active transaction because the DBMS is not
able to execute it, or has to stop it because of some system condition. For example, in case of
deadlock or resource unavailability, the system aborts an active transaction.

2. System crash: Problems external to the system may cause it to stop abruptly and crash. For
instance, an interruption in the power supply may cause the underlying hardware or software to fail.
Other examples include operating system errors.
3. Disk failure: In the early days of technology evolution, it was common for hard-disk drives or
storage drives to fail frequently. Disk failures include the formation of bad sectors, unreachability of the
disk, a disk head crash, or any other failure that destroys all or part of the disk storage.

Storage Structure
In brief, the storage structure can be divided into two categories −

 Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile
storage devices are placed very close to the CPU; normally they are embedded onto the chipset
itself. Main memory and cache memory are examples of volatile storage. They are fast
but can store only a small amount of information.
 Non-volatile storage − These memories are made to survive system crashes. They are huge in data
storage capacity, but slower in accessibility. Examples may include hard-disks, magnetic tapes,
flash memory, and non-volatile (battery backed up) RAM.

Recovery Techniques:

1. Salvation program: Run after a crash to attempt to restore the system to a valid state. No recovery
data used. Used when all other techniques fail or were not used. Good for cases where buffers were
lost in a crash and one wants to reconstruct what was lost...
2. Incremental dumping: Modified files copied to archive after job completed or at intervals.
3. Audit trail: Sequences of actions on files are recorded. Optimal for "backing out" of transactions.
(Ideal if trail is written out before changes).
4. Differential files: Separate file is maintained to keep track of changes, periodically merged with the
main file.
5. Backup/current version: Present files form the current version of the database. Files containing
previous values form a consistent backup version.
6. Multiple copies: Multiple active copies of each file are maintained during normal operation of the
database. In cases of failure, comparison between the versions can be used to find a consistent
version.
7. Careful replacement: Nothing is updated in place, with the original only being deleted after
operation is complete.
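The audit trail technique (item 3) can be sketched in Python to show what "backing out" a transaction means. The class AuditedStore and its methods are hypothetical names for this illustration. The sketch keeps everything in memory and assumes that no other transaction has written the same keys after the one being backed out; a real DBMS writes the trail to stable storage before changing the data.

```python
class AuditedStore:
    """Key-value store that records a before-image for every write."""

    def __init__(self):
        self.data = {}
        self.trail = []   # entries: (transaction, key, key_existed, old_value)

    def write(self, txn, key, value):
        # Record the old state first, then apply the change.
        self.trail.append((txn, key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def back_out(self, txn):
        # Undo the transaction's writes in reverse order of execution.
        for t, key, existed, old in reversed(self.trail):
            if t != txn:
                continue
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)
        # Drop the undone entries from the trail.
        self.trail = [entry for entry in self.trail if entry[0] != txn]
```

Recording the before-image ahead of the change is what makes the trail usable for undoing work, which matches the remark that the technique is ideal if the trail is written out before changes.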

Database security threats

 Virus Infection- Depending on the type of virus, it could have the ability to steal, corrupt, modify
and even delete the complete database.
 Natural Disasters- Natural disasters like earthquakes or tsunamis have the ability to destroy the entire
infrastructure. In such an event, data that has no off-site backup may be impossible to find, let alone recover.
 Disgruntled Employees- A disgruntled employee could provide essential and confidential
information to outsiders, causing untold damage to an organization. And if the employee has access
or gains unauthorized access to systems or applications, he/she can inject a virus or delete data to
halt the company’s day to day operations.
 Excessive privileges. When workers are granted default database privileges that exceed the
requirements of their job functions, these privileges can be abused, Gerhart said. “For example, a
bank employee whose job requires the ability to change only account holder contact information
may take advantage of excessive database privileges and increase the account balance of a
colleague’s savings account.” Further, some companies fail to update access privileges for
employees who change roles within an organization or leave altogether.
 Legitimate privilege abuse. Users may abuse legitimate database privileges for unauthorized
purposes, Gerhart said.
 Database injection attacks. The two major types of database injection attacks are SQL injections
that target traditional database systems and NoSQL injections that target “big data” platforms.
 Malware. A perennial threat, malware is used to steal sensitive data via legitimate users using
infected devices.

 Exploitation of vulnerable databases. It generally takes organizations months to patch databases,
during which time they remain vulnerable. Attackers know how to exploit unpatched databases or
databases that still have default accounts and configuration parameters.
 Unmanaged sensitive data. Many companies struggle to maintain an accurate inventory of their
databases and the critical data objects contained within them. “Forgotten databases may contain
sensitive information, and new databases can emerge without visibility to the security team.
Sensitive data in these databases will be exposed to threats if the required controls and permissions
are not implemented,” Gerhart said.
 The human factor. The root cause for 30 percent of data breach incidents is human negligence,
according to the Ponemon Institute Cost of Data Breach Study. Often this is due to a lack of the
expertise required to implement security controls, enforce policies or conduct incident response
processes.
 Privilege abuse: When database users are provided with privileges that exceed their day-to-day
job requirements, these privileges may be abused intentionally or unintentionally.
 Operating System vulnerabilities: Vulnerabilities in underlying operating systems like Windows,
UNIX, Linux, etc., and in the services that are related to the databases could lead to unauthorized
access or to a Denial of Service (DoS) attack. This risk can be reduced by applying operating
system security patches as and when they become available.
 Weak authentication: Weak authentication models allow attackers to employ strategies such as
social engineering and brute force to obtain database login credentials and assume the identity of
legitimate database users.
 Weak audit trails: A weak audit logging mechanism in a database server represents a critical risk
to an organization, especially in retail, financial, healthcare, and other industries with stringent
regulatory compliance. Regulations such as PCI, SOX, and HIPAA demand extensive logging of
actions so that an event can be reproduced at a later point in time in case of an incident. Logging of
sensitive or unusual transactions happening in a database must be done in an automated manner for
resolving incidents. Audit trails act as the last line of database defense: they can detect the existence
of a violation and help trace it back to a particular point in time and a particular user.
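The SQL injection threat listed above can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch: the users table, the sample rows and the function names find_unsafe and find_safe are assumptions made for the example, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 100), ("bob", 50)])


def find_unsafe(name):
    # Vulnerable: the user input is pasted directly into the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(name):
    # The ? placeholder passes the input as data, never as SQL code.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


# Input crafted to turn the WHERE clause into a condition that is always true.
attack = "x' OR '1'='1"
```

Calling find_unsafe(attack) returns every row in the table, while find_safe(attack) returns nothing because no user has that literal name. Parameterized queries of this kind are one common defense; they do not replace the privilege and patching controls described in the list.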

9.0 Emerging trends in database management system

9.1 Emerging trends in database management system

Concepts in database management rarely come and go quickly, because the cost of shifting between
technical approaches is too high for producers, managers, and designers. However, there are several
trends in database management, and knowing how to take advantage of them will benefit your
organization. The following are some of the current trends:
1. Databases that bridge SQL/NoSQL- The latest trends in database products are those that don’t simply
embrace a single database structure. Instead, the databases bridge SQL and NoSQL, giving users the
best capabilities offered by both. This includes products that allow users to access a NoSQL database in
the same way as a relational database, for example.
2. Databases in the cloud/Platform as a Service- As developers continue pushing their enterprises to the
cloud, organizations are carefully weighing the trade-offs associated with public versus private clouds.
Developers are also determining how to combine cloud services with existing applications and
infrastructure. Providers of cloud services offer many options to database administrators. Making the
move towards the cloud doesn’t mean changing organizational priorities, but finding products and
services that help your group meet its goals.
3. Automated management- Automating database management is another emerging trend. These
techniques and tools are intended to simplify maintenance, patching, provisioning, updates and upgrades,
and even project workflow. However, the trend may have limited usefulness since database management
frequently needs human intervention.
4. An increased focus on security- While not exactly a trend given the constant focus on data security,
recent ongoing retail database breaches among US-based organizations show with ample clarity the
importance for database administrators to work hand-in-hand with their IT security colleagues to ensure
all enterprise data remains safe. Any organization that stores data is vulnerable.
Database administrators must also work with the security team to eliminate potential internal
weaknesses that could make data vulnerable. These could include issues related to network privileges,
even hardware or software misconfigurations that could be misused, resulting in data leaks.
5. In-memory databases- Within the data warehousing community there are open questions about
columnar versus row-based relational tables, the rise of in-memory databases, the use of flash or
solid-state disks (which also applies within transaction processing), clustered versus non-clustered
solutions, and so on.
6. Big Data- To be clear, big data does not necessarily mean lots of data. What it really refers to is the
ability to process any type of data: what is typically referred to as semi-structured and unstructured data
as well as structured data. Current thinking is that big data technologies will typically live alongside
conventional solutions as separate technologies, at least in large organizations, but this will not always
be the case.
7. Decentralized data management- Although there are benefits to decentralized data management, it
presents challenges as well. How will the data be distributed? What’s the best decentralization method?
What’s the proper degree of decentralization? A major challenge in designing and managing a
distributed database results from the inherent lack of centralized knowledge of the entire database.

9.2 Coping with emerging trends in database management system

1. Data Security Problems- For organizations to stay ahead of third-party threats, companies must start
by assessing their own security strategy. They should enact a multi-layered defense strategy that
covers their entire enterprise — all endpoints, all mobile devices, all applications and all data.
Following this assessment, companies should evaluate the technology, compliance procedures and
security standards that their partner network has in place.
2. Managing the Data Overload- Enterprises need to think about data traffic patterns in their
organizations, Vincent said, recognize when the traffic no longer flows through a central point
(whether public cloud or private cloud), and ready their corporate networks for a whole new traffic
flow as part of their digital transformation. To help de-saturate the enterprise of data, enterprises
need to think about storage and about moving inactive data from active enterprise applications to data
warehouses or the cloud. Generally, any workload that can process entities as a single object can be
a candidate for object storage. This includes archival and retrieval of database backups or storage of
unstructured data, such as images, video and text documents, said Ray Johnson, chief data scientist at
Chicago-based consultancy SPR.
3. Insufficient understanding and acceptance of big data- Oftentimes, companies fail to know even the
basics: what big data actually is, what its benefits are, what infrastructure is needed, etc. Without a
clear understanding, a big data adoption project is likely to fail. Companies may
waste lots of time and resources on things they don’t even know how to use.
If employees don’t understand big data’s benefits or don’t want to change the existing
processes for the sake of its adoption, they can resist it and impede the company’s progress.
Big data, being a huge change for a company, should be accepted by top management first and then
down the ladder. To ensure big data understanding and acceptance at all levels, IT departments need
to organize training sessions and workshops. To strengthen acceptance further, the
implementation and use of the new big data solution need to be monitored and controlled. However,
top management should not overdo the control, because it may have an adverse effect.


