Database Management Systems - MTVC
ICT DEPARTMENT
COURSE SUMMARY AND TIME ALLOCATION
Table of Contents
1.0 Introduction to database management
1.1 Meaning of DBMS
1.2 Historical evolution of DBMS
1.3 Traditional vs. database approaches
1.4 Components of a database management system
1.5 Classification of database systems
1.6 Advantages of DBMS
1.7 Role of key players in database design and development
2.0 Database organization
2.1 Centralized database
2.2 Client-server architecture
2.3 Distributed database systems
3.0 Principles and techniques of database design
3.1 Meaning
3.2 Database design cycle
4.0 Relational database system
4.1 Meaning of relational database system
4.2 Relational database characteristics
4.3 Relational algebra
4.4 Relational calculus
5.0 Entity relationships
5.1 Meaning of entity relationships
5.2 Connotations of entity relationship
5.3 Drawing ERDs
6.0 Normalization
6.1 Meaning and importance of normalization
6.2 Normalization rule
6.3 Performing normalization
7.0 Querying a database
7.1 Meaning of database query
7.2 Features of database query
7.3 Categories of SQL statements
7.4 Design SQL queries
8.0 Functions of a database management system
8.1 Transaction processing
8.2 Concurrency controls
8.3 Database recovery
9.0 Emerging trends in database management systems
9.1 Emerging trends in database management systems
9.2 Coping with emerging trends in database management systems
10.0 References
1.0 Introduction to database management
A database is a collection of information that is organized so that it can be easily accessed, updated and
managed.
DBMS
A DBMS is software that allows the creation and manipulation of databases, enabling users to store, process and analyze data easily.
A DBMS provides an interface, or a set of tools, for performing various operations such as creating a database, storing data in it, updating data, creating tables in the database, and more.
A DBMS also provides protection and security for databases, and it maintains data consistency when multiple users work on the same data.
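These basic operations can be sketched with Python's built-in sqlite3 module; the student table and its columns below are illustrative examples, not part of the notes:

```python
# Minimal sketch of common DBMS operations (create a table, insert,
# update, query) using Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")  # create a throwaway database
cur = conn.cursor()

# Create a table in the database
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")

# Store data in it
cur.execute("INSERT INTO student (name, course) VALUES (?, ?)", ("Alice", "ICT"))
cur.execute("INSERT INTO student (name, course) VALUES (?, ?)", ("Bob", "ICT"))

# Update data
cur.execute("UPDATE student SET course = ? WHERE name = ?", ("Business", "Bob"))

# Fetch data back with a query
rows = cur.execute("SELECT name, course FROM student ORDER BY id").fetchall()
print(rows)  # [('Alice', 'ICT'), ('Bob', 'Business')]

conn.close()
```

The same pattern, with only the connection line changed, applies to the client-server systems described below.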
MySQL Database- MySQL was first released in 1995. Sun Microsystems acquired MySQL in 2008, and Sun Microsystems was in turn acquired by Oracle in 2010.
MySQL is among the most widely used open-source database systems in the world, popular for its efficiency, reliability and low cost.
MS Access- MS Access was developed by Microsoft. It is a desktop application used to create and maintain databases on desktop computers. It can be used for personal purposes and by small businesses that need a database.
Oracle Database- Oracle Database is a relational database management system developed by Oracle Corporation. It is used mostly by big companies that need to manage large amounts of data. Oracle Database is very flexible, and its most useful features include integrity constraints, triggers, shared SQL, and locking.
DB2- DB2 is developed by IBM and is also used to store data for large companies. It is a relational database management system, and its extended versions also support object-oriented features. The main drawback of DB2 is its cost.
Microsoft SQL Server- As its name shows, it was developed by Microsoft. It is an RDBMS used to create databases for MS Windows. SQL Server databases can be accessed from workstations and over the internet. Microsoft has produced many versions of SQL Server to meet different customer demands.
FileMaker- Developed by FileMaker Inc., it is a cross-platform RDBMS widely used by many companies. It pairs a database engine with a graphical user interface and runs on both Windows and Mac. It provides several security features, and its interface allows users to alter the database by simply dragging new elements into forms, screens and layouts.
NoSQL- The name stands for "not only SQL". It differs from the systems above in that it is a non-relational approach to database management. NoSQL databases are used in distributed data stores, for example at Google and Facebook, which collect terabytes of data every day; they can store the huge volumes of social-media data that traditional SQL servers struggle with.
PostgreSQL- PostgreSQL is a cross-platform ORDBMS that runs on different operating systems such as Linux, Windows and Solaris. It is developed by the PostgreSQL Global Development Group and is an open-source database, free to use under a free-software license.
MS FoxPro- FoxPro is a DBMS initially developed by Fox Software and later by Microsoft. FoxPro combines features of both a DBMS and an RDBMS: it supports relationships between multiple DBF files, but it lacks transaction processing.
5. Query language: A DBMS provides users with a simple query language, with which data can easily be fetched, inserted, deleted and updated in a database.
6. Security: A DBMS also takes care of the security of data, protecting it from unauthorized access. In a typical DBMS we can create user accounts with different access permissions, and thereby secure the data by restricting each user's access.
7. Transactions: A DBMS supports transactions, which allow us to better handle and manage data integrity in real-world applications where many users work on the same data concurrently.
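The transaction idea in point 7 can be illustrated with sqlite3; the account table and the simulated failure below are illustrative assumptions, not part of the notes:

```python
# Sketch of how a DBMS transaction protects data integrity: either
# all statements in the transaction take effect, or none do.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 100)")
conn.commit()

try:
    # Transfer 50 from A to B as one transaction
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("power failure!")  # simulate a crash mid-transfer
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # undo the half-finished transfer

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 100} -- no money was lost
```

Without the rollback, account A would have lost 50 that B never received, which is exactly the inconsistency transactions exist to prevent.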
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database.
It defines how the data is organized and how the relations among them are associated. It formulates all the
constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive detail of
the database, which can be depicted by means of schema diagrams. It’s the database designers who design
the schema to help programmers understand the database and make it useful.
Page 7 of 19 | P a g e
A database schema can be divided broadly into two categories:
Physical database schema - This schema pertains to the actual storage of data and its form of storage, such as files and indices. It defines how the data will be stored in secondary storage.
Logical database schema - This schema defines all the logical constraints that need to be applied to the data stored. It defines tables, views, and integrity constraints.
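As a rough sketch of a logical schema, the following sqlite3 script defines tables, a view and integrity constraints; all table and column names are illustrative:

```python
# A small logical schema -- tables, a view, and integrity constraints --
# defined through sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE          -- integrity constraint
    );
    CREATE TABLE student (
        stud_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department -- relationship between entities
    );
    -- A view: part of the logical schema, stores no data itself
    CREATE VIEW student_dept AS
        SELECT s.name AS student, d.name AS department
        FROM student s JOIN department d USING (dept_id);
""")
conn.execute("INSERT INTO department VALUES (1, 'ICT')")
conn.execute("INSERT INTO student VALUES (10, 'Alice', 1)")

rows = conn.execute("SELECT * FROM student_dept").fetchall()
print(rows)  # [('Alice', 'ICT')]
```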
Data in a DBMS is described at three levels of abstraction, often called the three-schema architecture:
1. Physical Level
2. Conceptual Level
3. External Level
1. Physical Level
The physical level describes the physical storage structure of data in the database.
It is also known as the internal level, and it is very close to the physical storage of the data.
At the lowest level, data is stored as bits with physical addresses on the secondary storage device; at the highest level, it can be viewed in the form of files.
The internal schema defines the various stored data types. It uses a physical data model.
2. Conceptual Level
The conceptual level describes the structure of the whole database for a community of users.
It is also called the data model.
The conceptual schema is a representation of the entire content of the database; it contains all the information needed to build the relevant external records, and it hides the internal details of physical storage.
3. External Level
The external level relates to the data viewed by individual end users, and it is the level closest to the user.
This level includes a number of user views, or external schemas.
Each external view describes the segment of the database that is required by a particular user group and hides the rest of the database from that group.
DATA INDEPENDENCE
Data independence is the property of a database which ensures that if we change the schema at one level, the schema at the level immediately above it requires minimal or no change. It removes the extra work of propagating a single change through all the levels above.
1. Physical data independence: for any change made in the physical schema, the need to change the logical schema is minimal. This is relatively easy to achieve.
2. Logical data independence: for any change made in the logical schema, the need to change the external schemas is minimal. This is more difficult to achieve.
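A minimal sketch of logical data independence, assuming a hypothetical app_students view that shields an application's query from a change in table structure:

```python
# Logical data independence: the application reads through a view (its
# external schema), so the underlying table can be restructured without
# changing the application's query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_v1 (stud_id INTEGER, fullname TEXT);
    CREATE VIEW app_students AS SELECT stud_id, fullname AS name FROM student_v1;
""")
conn.execute("INSERT INTO student_v1 VALUES (1, 'Alice')")

app_query = "SELECT stud_id, name FROM app_students"  # what the application runs
before = conn.execute(app_query).fetchall()

# The logical schema changes: names are now split into two columns
conn.executescript("""
    DROP VIEW app_students;
    CREATE TABLE student_v2 (stud_id INTEGER, first TEXT, last TEXT);
    INSERT INTO student_v2 VALUES (1, 'Alice', '');
    CREATE VIEW app_students AS
        SELECT stud_id, rtrim(first || ' ' || last) AS name FROM student_v2;
""")
after = conn.execute(app_query).fetchall()  # the same query still works
print(before == after)  # True
```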
1.2 Historical evolution of DBMS
The development of database technology can be divided into three eras based on data model or structure:
navigational, SQL/relational, and post-relational.
The two main early navigational data models were the hierarchical model, epitomized by IBM's IMS
system, and the CODASYL model (network model), implemented in a number of products such as IDMS.
The relational model, first proposed in 1970, departed from this tradition by insisting that applications should search for data by content rather than by following links.
The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in
the mid-1980s did computing hardware become powerful enough to allow the wide deployment of
relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated
in all large-scale data processing applications, and as of 2014 they remain dominant except in niche areas.
The dominant database language, standardized SQL for the relational model, has influenced database
languages for other data models.
Object databases were developed in the 1980s to overcome the inconvenience of object-relational
impedance mismatch, which led to the coining of the term "post-relational" and also the development of
hybrid object-relational databases.
The next generation of post-relational databases in the late 2000s became known as NoSQL databases,
introducing fast key-value stores and document-oriented databases. A competing "next generation" known
as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming
to match the high performance of NoSQL compared to commercially available relational DBMSs.
NoSQL databases are often very fast, do not require fixed table schemas, avoid join operations by storing
denormalized data, and are designed to scale horizontally. The most popular NoSQL systems include
MongoDB, Couchbase, Riak, memcached, Redis, CouchDB, Hazelcast, Apache Cassandra and HBase,
which are all open-source software products.
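To make the key-value idea concrete, here is a toy sketch (not any real NoSQL product) of a store that spreads keys across several nodes by hashing, which is the mechanism behind horizontal scaling:

```python
# Toy sketch of a NoSQL-style key-value store that scales horizontally:
# each key is hashed to one of several independent "nodes" (plain dicts
# here; in a real system, separate servers). No fixed table schema needed.
import hashlib

class KeyValueStore:
    def __init__(self, num_nodes=3):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash the key to pick a node -- this is what lets the store
        # spread data across many machines (horizontal scaling).
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

store = KeyValueStore()
# Records with different shapes coexist -- no fixed table schema
store.put("user:1", {"name": "Alice", "likes": ["databases"]})
store.put("user:2", {"name": "Bob", "city": "Nairobi"})
print(store.get("user:1")["name"])  # Alice
```

Note the denormalization: each value carries everything an application needs, so no join operation is required at read time.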
File based system
A file-based system is a collection of application programs that perform services for the end users, such as updating records, inserting and deleting data, and adding new files. Each program defines and manages its own data.
Files are stored in specific locations on the hard disk (directories). The user can create new files to place data in, delete a file that contains data, rename a file, and so on; this is known as file management, a function provided by the operating system (OS).
Although a computer file-based processing system has many advantages over a manual record-keeping system, it has some limitations. The basic disadvantages (or limitations) of computer file-based processing systems are described below.
Data Redundancy - Redundancy means having multiple copies of the same data. In a computer file-based processing system each application program has its own data files, so the same data may be duplicated in more than one file. This duplication creates problems: to update a specific record, the same data must be updated in every file, otherwise different files will hold different information about the same item.
Data Inconsistency - Data inconsistency means that different files may contain different information about a particular object. Redundancy leads to inconsistency: when the same data is stored in multiple locations, the copies can drift apart.
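The redundancy problem can be demonstrated with two hypothetical per-application files holding the same student address:

```python
# The same student address is kept in two separate files (one per
# application program). Updating only one copy leaves them inconsistent.
import csv, tempfile, os

tmp = tempfile.mkdtemp()
library_file = os.path.join(tmp, "library.csv")
fees_file = os.path.join(tmp, "fees.csv")

# Each application program keeps its own copy of the student's address
for path in (library_file, fees_file):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["S001", "Alice", "Old Hostel"])

# The library application updates its copy only...
with open(library_file, "w", newline="") as f:
    csv.writer(f).writerow(["S001", "Alice", "New Hostel"])

def read_address(path):
    with open(path) as f:
        return next(csv.reader(f))[2]

addresses = (read_address(library_file), read_address(fees_file))
print(addresses)  # ('New Hostel', 'Old Hostel') -- inconsistent!
```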
Data Isolation - In a computer file-based system, data is isolated in separate files, making it difficult to update and to access particular information.
Data Atomicity - Data atomicity means a record is either entered as a whole or not entered at all; file-based systems make this hard to guarantee.
Data Dependence - The data stored in a file depends on the application program through which the file was created: the structure of the data files is coupled with the application program, and the physical structure of files and records is defined in the application code. This makes it difficult to change the structure of data files or records; to change a file's structure (or format) you have to modify the application program.
Program Maintenance - Because the structure of each data file is coupled with individual application programs, any modification to a data file, such as a change in the size or type of a data field, also requires modification of the application program. This process of modifying the program is referred to as program maintenance.
Data Sharing - Each application program uses its own private data files, so computer file-based processing systems do not provide a facility to share the data of one file among multiple users on a network.
Data Security - Computer file-based processing systems do not provide proper security against illegal access of data: anyone can easily change or delete valuable data stored in a data file. This is one of the most serious problems of file processing.
Incompatible File Formats - The structure of a data file is coupled with the application program, and it depends on the programming language in which that program was developed; files produced by different programs may therefore be incompatible.
The improvement on the file-based system (FBS) was the database management system (DBMS), which came up in the 1960s.
The DBMS removed the trouble of manually locating data and having to go through it. The user can create a suitable structure for the data beforehand and place the information in the database that the DBMS manages. The physical organization of files is thus done away with, and the user is given a logical view of the data.
A database is a collection of interrelated information stored in a database server, typically in the form of tables. The primary aim of a database is to provide a way to store and retrieve information quickly and efficiently.
Advantages
1. Control of data redundancy- Although the database approach does not remove redundancy
completely, it controls the amount of redundancy in the database.
2. Data consistency- By removing or controlling redundancy, the database approach reduces the risk of inconsistencies occurring. It ensures all copies of the data are kept consistent.
3. Sharing of data- Database belongs to the entire organization and can be shared by all authorized
users.
4. Improved data integrity- Database integrity provides the validity and consistency of stored data.
Integrity is usually expressed in terms of constraints, which are consistency rules that the database
is not permitted to violate.
5. Improved security- Provides protection of data from unauthorized users. It will require user names
and passwords to identify user type and their access right in the operation including retrieval,
insertion, updating and deletion.
6. Enforcement of standards- The integration of the database enforces the necessary standards
including data formats, naming conventions, documentation standards, update procedures and
access rules.
7. Economy of scale- Cost savings can be obtained by combining all organization's operational data
into one database with applications to work on one source of data.
8. Balance of conflicting requirements- By having a structural design in the database, the conflicts
between users or departments can be resolved. Decisions will be based on the base use of resources
for the organization as a whole rather than for an individual person.
9. Improved data accessibility and responsiveness- by having integration in the database approach,
data accessing can cross departmental boundaries. This feature provides more functionality and
better services to the users.
10. Improved maintenance- Provides data independence. As a change of data structure in the database
will not affect the application program, it simplifies database application maintenance.
11. Increased concurrency- A database can manage concurrent data access effectively, ensuring no interference between users and no loss of information or integrity.
12. Improved backup and recovery services- Modern database management systems provide facilities to minimize the amount of processing lost following a failure, by using the transaction approach.
There are five major interrelated components in the database system environment:
1. Hardware
2. Software
3. Data
4. Users
5. Procedures
1. Hardware: The hardware is the actual computer system used for keeping and accessing the database. Conventional DBMS hardware consists of secondary storage devices, usually hard disks, on which the database physically resides, together with the associated input/output devices and device controllers. Databases run on a range of machines, from microcomputers to large mainframes. Another hardware issue for a DBMS is database machines: hardware designed specifically to support a database system.
2. Software: The software is the actual DBMS. Between the physical database itself (i.e. the data as
actually stored) and the users of the system is a layer of software, usually called the Database Management
System or DBMS. All requests from users for access to the database are handled by the DBMS. One
general function provided by the DBMS is thus the shielding of database users from complex hardware-
level detail.
The DBMS allows the users to communicate with the database. In a sense, it is the mediator between the
database and the users. The DBMS controls the access and helps to maintain the consistency of the data.
Utilities are usually included as part of the DBMS. Some of the most common utilities are report writers
and application development.
3. Data: From the end users' point of view, data is the most important component of the DBMS environment. Data acts as a bridge between the machine components and the user components. The database contains the operational data and the meta-data, the 'data about data'.
The database should contain all the data needed by the organization. One of the major features of databases is that the actual data are separated from the programs that use them. A database should always be designed, built and populated for a particular audience and for a specific purpose.
4. Users: Users access or retrieve data on demand using the applications and interfaces provided by the DBMS. Each type of user needs different software capabilities, and the users of a database system can be classified into groups depending on their degree of expertise and the mode of their interaction with the DBMS.
5. Procedures: Procedures refer to the instructions and rules that govern the design and use of the database. The users of the system and the staff that manage the database require documented procedures on how to use or run the system: for example, how to change the structure of a table, reorganize the database across multiple disks, improve performance, or archive data to secondary storage.
Relational database – This is the most popular data model used in industry. It is based on SQL. Relational databases are table oriented: data is stored in access-controlled tables, each with a key field that identifies each row. The tables (files) holding the data are called relations, a row is called a record, and the columns are referred to as attributes or fields. Examples are MySQL (Oracle, open source), Oracle Database (Oracle), Microsoft SQL Server (Microsoft) and DB2 (IBM).
Object oriented database – Here information is represented as objects, as in object-oriented programming. An object database adds database functionality to object programming languages; it requires less code, uses more natural data models, and its code bases are easier to maintain. An example is ObjectDB (ObjectDB Software).
Object relational database – Relational DBMSs are evolving continuously and have been incorporating many concepts developed in object databases, leading to a new class called extended relational or object relational databases.
Hierarchical database – Here information about parent/child relationships is held in records arranged in a tree-like structure: the data follows a series of records, each with a set of values attached. Hierarchical databases are used in industry on mainframe platforms. Examples are IMS (IBM) and the Windows registry (Microsoft).
Network database – Mainly used on large digital computers, this model is efficient when there are many connections. Network databases are similar to hierarchical databases, but they look like a cobweb: an interconnected network of records. Examples are CA-IDMS (Computer Associates) and IMAGE (HP).
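The contrast between navigating links (the hierarchical model) and searching by content (the relational model) can be sketched in plain Python; the data below is invented for illustration:

```python
# Hierarchical model: records form a tree, and you reach data by
# following links from the root, like a Windows-registry path.
tree = {
    "school": {
        "ICT": {"students": ["Alice", "Bob"]},
        "Business": {"students": ["Carol"]},
    }
}

def lookup(path):
    node = tree
    for part in path.split("/"):   # navigate link by link
        node = node[part]
    return node

print(lookup("school/ICT/students"))   # ['Alice', 'Bob']

# Relational model: the same facts as a flat table, searched by content.
rows = [("Alice", "ICT"), ("Bob", "ICT"), ("Carol", "Business")]
ict_students = [name for name, dept in rows if dept == "ICT"]
print(ict_students)                    # ['Alice', 'Bob']
```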
Single user – As the name indicates, a single-user system supports only one user at a time. It is mostly used on a personal computer, where the data resides with and is accessible to a single person, who may design, maintain and write the database programs.
Multiple users – A multi-user system supports several users concurrently. Data can be both integrated and shared; a database is integrated when the same information need not be recorded in two places. For example, a student in a college should have one database record containing his or her information, accessible to all the departments related to that student, such as the library and the fee section. Because the database resides in only one place, both departments can access the same record.
Based on the sites over which the network is distributed
Centralized database system – The DBMS and the database are stored at a single site that is used by several other systems too. We can simply say that the data is maintained on a centralized server.
Parallel database system – This system improves processing and input/output speeds, and it is mostly used in applications that query large databases. It holds multiple central processing units and data storage disks in parallel.
Distributed database system – Here the data and the DBMS software are distributed over several sites that are connected by a computer network.
Online transaction processing (OLTP) DBMS – These manage operational data. The database server must be able to process lots of simple transactions per unit of time. Transactions are initiated in real time and simultaneously by many users and applications, so the workload is a high volume of short, simple queries.
Online analytical processing (OLAP) DBMS – These use operational data for tactical and strategic decision making. They serve a limited number of users who deal with huge amounts of data and complex queries.
Big data and analytics DBMS – New database technologies have been introduced to cope with big data. One such technology is NoSQL ("not only SQL"), which abandons the well-known relational database scheme.
Multimedia DBMS – Stores data such as text, images, audio, video and 3D games, usually held in binary large objects (BLOBs).
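Storing multimedia content as a BLOB can be sketched with sqlite3; the image bytes below are fake placeholders, not real media:

```python
# Storing multimedia data as a binary large object (BLOB) in sqlite3.
import sqlite3

fake_image = bytes([0x89, 0x50, 0x4E, 0x47]) * 4  # stand-in for real image data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, name TEXT, content BLOB)")
conn.execute("INSERT INTO media (name, content) VALUES (?, ?)", ("logo.png", fake_image))

stored = conn.execute("SELECT content FROM media WHERE name = 'logo.png'").fetchone()[0]
print(stored == fake_image)  # True -- the bytes come back unchanged
```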
Disadvantages of DBMS
1. Complexity: The provision of the functionality that is expected of a good DBMS makes the DBMS an
extremely complex system.
2. Size: The complexity and breadth of functionality makes the DBMS an extremely large piece of
software, occupying many megabytes of disk space and requiring substantial amounts of memory to run
efficiently.
3. Performance: A DBMS is written to be general, catering for many applications rather than just one. The effect is that some applications may not run as fast as they would in a file-based system.
4. Higher impact of a failure: The centralization of resources increases the vulnerability of the system. Since all users and applications rely on the availability of the DBMS, the failure of any component can bring operations to a halt.
5. Cost of DBMS: The cost of DBMS varies significantly, depending on the environment and functionality
provided. There is also the recurrent annual maintenance cost.
6. Cost of Conversion: In some situations, the cost of the DBMS and extra hardware may be insignificant
compared with the cost of converting existing applications to run on the new DBMS and hardware. This
cost also includes the cost of training staff to use these new systems and possibly the employment of
specialist staff to help with conversion and running of the system. This cost is one of the main reasons why
some organizations feel tied to their current systems and cannot switch to modern database technology.
DATABASE ADMINISTRATOR
DATABASE DESIGNER
The database designer role defines the tables, indexes, views, constraints, triggers, stored procedures, table
spaces or storage parameters, and other database-specific constructs needed to store, retrieve, and delete
persistent objects.
DATABASE ANALYST
Database Analyst: Maintains data storage and access by designing physical databases.
Confirms project requirements by studying user requirements; conferring with others on project team.
Maintains data dictionary by revising and entering definitions.
Maintains client confidence and protects operations by keeping information confidential.
Maintains technical knowledge by attending educational workshops; reviewing publications;
establishing personal networks; participating in technical societies.
Ensures operation of equipment by completing preventive maintenance requirements; following
manufacturer's instructions; troubleshooting malfunctions; calling for repairs; evaluating new equipment
and techniques.
Contributes to team effort by accomplishing related results as needed.
Determines changes in physical database by studying project requirements; identifying database
characteristics, such as location, amount of space, and access method.
Changes database system by coding database descriptions.
Protects database by developing access system; specifying user level of access.
Maintains user reference by writing and rewriting database descriptions.
DATABASE DEVELOPER
Design, develop and implement database systems based on customer requirements.
Prepare design specifications and functional documentations for assigned database projects.
Identify any issues related to database performance and provide corrective measures.
Create complex functions, scripts, stored procedures and triggers to support application development.
End Users
Naive Users: Naive Users are those users who need not be aware of the presence of the database
system or any other system supporting their usage. Naive users are end users of the database who
work through a menu driven application program, where the type and range of response is always
indicated to the user.
A user of an Automatic Teller Machine (ATM) falls in this category. The user is instructed through
each step of a transaction. He or she then responds by pressing a coded key or entering a numeric
value. The operations that can be performed by naive users are very limited and affect only a
precise portion of the database. For example, in the case of the user of the Automatic Teller
Machine, the user's actions affect only one or more of his/her own accounts.
Online Users: Online users are those who may communicate with the database directly via an
online terminal or indirectly via a user interface and application program. These users are aware of
the presence of the database system and may have acquired a certain amount of expertise with in the
limited interaction permitted with a database.
Sophisticated Users: Such users interact with the system without writing programs. Instead, they
form their requests in a database query language. Each such query is submitted to a query
processor whose function is to break down DML statements into instructions that the storage
manager understands.
Specialized Users: Such users write specialized database applications that do not fit
into the traditional data-processing framework. For example: computer-aided design systems,
knowledge-base and expert systems, and systems that store data with complex data types (for example,
graphics data and audio data).
Application Programmers: Professional programmers who are responsible for developing
application programs or user interfaces. The application programs can be written in a general-purpose
programming language or with the commands available to manipulate a database.
Sophisticated Users - These are database developers who write SQL queries to
select/insert/delete/update data. They do not use any application program to access the
database; they interact with it directly by means of a query language such as SQL. These
users may be scientists, engineers, or analysts who have studied SQL and the DBMS thoroughly
in order to apply the concepts to their requirements. In short, this category includes designers
and developers of DBMS applications and SQL.
Stand-alone Users - These users have a stand-alone database for their personal use. Such
databases usually come as ready-made packages with menus and graphical interfaces.
Naïve or parametric Users - These are users who use an existing application to interact with the
database. Examples include online library systems, ticket booking systems, and ATMs, where users
interact with the database through an existing application to fulfill their requests.
2.0: Database organization
Definition: Data organization, in broad terms, refers to the method of classifying and organizing data sets
to make them more useful.
In a DBMS context it also refers to the physical arrangement of the database management system to make the
system more useful in an organization. The organization of a database is based mainly on the size of the
organization and on the level of security required.
2.1 Centralized database
A centralized database (sometimes abbreviated CDB) is a database that is located, stored, and
maintained in a single location. This location is most often a central computer or database system, for
example a desktop or server CPU, or a mainframe computer. In most cases, a centralized database would
be used by an organization (e.g. a business company) or an institution (e.g. a university). Users access a
centralized database through a computer network which gives them access to the central CPU,
which in turn maintains the database itself.
Advantages
1. Data integrity is maximized and data redundancy is minimized, as the single storage location for all
the data implies that a given set of data has only one primary record. This helps keep data as
accurate and consistent as possible and enhances data reliability.
2. Generally greater data security, as the single storage location means there is only one possible place
from which the database can be attacked and sets of data can be stolen or tampered with.
3. Better data preservation than other types of databases due to an often-included fault-tolerant setup.
4. Easier for the end user to use due to the simplicity of having a single database design.
5. Generally easier data portability and database administration.
6. More cost-effective than other types of database systems, as labor, power supply and maintenance
costs are all minimized.
7. Data kept in the same location is easier to change, re-organize, mirror, or analyze.
8. All the information can be accessed at the same time from the same location.
9. Updates to any given set of data are immediately received by every end-user.
Disadvantages
1. Centralized databases are highly dependent on network connectivity. The slower the network
connection, the longer the database access time.
2. Bottlenecks can occur as a result of high traffic.
3. Access by more than one person to the same set of data is limited, as there is only one copy of it
maintained in a single location. This can lead to major decreases in the general efficiency of the
system.
4. If there is no fault-tolerant setup and hardware failure occurs, all the data within the database will
be lost.
5. Since there is minimal to no data redundancy, if a set of data is unexpectedly lost it is very hard to
retrieve it; in most cases recovery would have to be done manually.
2.2 Client-Server Architecture
In client-server architecture, data processing is split into distinct parts. A part is either a requester
(client) or a provider (server). During data processing the client sends one or more requests to the
servers to perform specified tasks, and the server part provides services for the clients.
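The request/response split described above can be sketched with Python's standard socket module; the loopback address, the OS-chosen port, and the trivial "upper-case" service are all invented for illustration:

```python
import socket
import threading

# Minimal sketch of the client-server split: the server provides one
# service (upper-casing text) and the client sends a request to it.

def server(sock):
    conn, _ = sock.accept()          # wait for a client (the requester)
    with conn:
        data = conn.recv(1024)       # receive the request
        conn.sendall(data.upper())   # provide the service (the response)

# Start a server on an OS-chosen free loopback port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=server, args=(listener,), daemon=True).start()

# The client side: send a request, read the reply.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello server")
    reply = client.recv(1024)
print(reply)  # b'HELLO SERVER'
```

In a real deployment the server would of course run on a separate machine and serve many clients, but the requester/provider roles are the same.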
Advantages and Disadvantages
Centralization: Unlike P2P, where there is no central administration, in this architecture there
is centralized control. Servers help in administering the whole set-up, and access rights and
resource allocation are done by the servers.
Proper Management: All the files are stored in the same place, so management of files
becomes easy. It also becomes easier to find files.
Back-up and Recovery possible: As all the data is stored on the server, it is easy to make a back-up of
it. Also, in case of a break-down, if data is lost it can be recovered easily and efficiently, while
in peer computing a back-up must be taken at every workstation.
Scalability in Client-server set-up: Changes can be made easily by just upgrading the server. New
resources and systems can also be added by making the necessary changes on the server.
Accessibility: The server can be accessed remotely from various platforms in the network.
As new information is uploaded to the database, each workstation need not have its own storage
capacity increased (as may be the case in peer-to-peer systems). All the changes are made only on
the central computer on which the server database exists.
Security: Rules defining security and access rights can be defined at the time of set-up of the server.
Servers can play different roles for different clients.
Congestion in Network: Too many requests from the clients may lead to congestion, which rarely
takes place in a P2P network. Overload can lead to the breakdown of servers. In peer-to-peer, the
total bandwidth of the network increases as the number of peers increases.
Client-Server architecture is not as robust as P2P: if the server fails, the whole network
goes down. Also, if you are downloading a file from the server and the transfer is interrupted
by an error, the download stops altogether, whereas peers could have supplied the missing
parts of the file.
Cost: It is very expensive to install and manage this type of computing.
You need professional IT people to maintain the servers and other technical details of the network.
Disadvantages of two-tier client/server architecture:
In two-tier architecture, application performance degrades as the number of users increases.
Cost-ineffective.
Tightly coupled.
Not easy to scale.
Performance degrades at scale.
3-tier client/server architecture - is a type of software architecture which is composed of three “tiers” or
“layers” of logical computing. It is often used in applications as a specific type of client-server
system. 3-tier architectures provide many benefits for production and development environments by
modularizing the user interface, business logic, and data storage layers. Doing so gives greater flexibility to
development teams by allowing them to update a specific part of an application independently of the other
parts. This added flexibility can improve overall time-to-market and decrease development cycle times by
giving development teams the ability to replace or upgrade independent tiers without affecting the other
parts of the system.
Presentation Tier- The presentation tier is the front end layer in the 3-tier system and consists of
the user interface. This user interface is often a graphical one accessible through a web browser or
web-based application and which displays content and information useful to an end user. This tier is
often built on web technologies such as HTML5, JavaScript, CSS, or through other popular web
development frameworks, and communicates with other layers through API calls.
Application Tier- The application tier contains the functional business logic which drives an
application’s core capabilities. It’s often written in Java, .NET, C#, Python, C++, etc.
Data Tier- The data tier comprises the database/data storage system and the data access layer.
Examples of such systems are MySQL, Oracle, PostgreSQL, Microsoft SQL Server, MongoDB,
etc. Data is accessed by the application layer via API calls.
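A minimal sketch of the three tiers in Python, assuming an in-memory SQLite database as the data tier (all table and function names are invented for the example):

```python
import sqlite3

# --- Data tier: owns storage and raw data access ---
def init_data_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                     [(19.5,), (40.0,), (5.25,)])
    return conn

def fetch_amounts(conn):
    return [amount for (amount,) in conn.execute("SELECT amount FROM orders")]

# --- Application tier: business logic only, no SQL and no presentation ---
def total_with_tax(amounts, tax_rate=0.1):
    return round(sum(amounts) * (1 + tax_rate), 2)

# --- Presentation tier: formats the result for the end user ---
def render(total):
    return f"Order total (incl. tax): {total}"

conn = init_data_tier()
message = render(total_with_tax(fetch_amounts(conn)))
print(message)
```

Because each tier talks only to its neighbor, the SQLite data tier could be swapped for MySQL or PostgreSQL, or the text renderer for a web page, without touching the business logic.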
2.3 Distributed Database Systems
A distributed database is a collection of multiple interconnected databases, which are spread physically
across various locations and communicate via a computer network.
Characteristics
Databases in the collection are logically interrelated with each other. Often they represent a single
logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.
A distributed database is a database that consists of two or more files located in different sites either on the
same network or on entirely different networks. Portions of the database are stored in multiple physical
locations and processing is distributed among multiple database nodes. A distributed database management
system (DDBMS) is a centralized software system that manages a distributed database in a manner as if it
were all stored in a single location.
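As a toy illustration of the idea (not a real DDBMS), the following Python sketch queries two separate SQLite databases, standing in for two sites, through one coordinating function; the branch table and its data are invented:

```python
import sqlite3

# Two physically separate databases, one per "site".
site_a = sqlite3.connect(":memory:")
site_b = sqlite3.connect(":memory:")
for site, rows in ((site_a, [("London", 120)]),
                   (site_b, [("Nairobi", 80), ("Mombasa", 45)])):
    site.execute("CREATE TABLE branch (city TEXT, staff INTEGER)")
    site.executemany("INSERT INTO branch VALUES (?, ?)", rows)

def query_all_sites(sql):
    """The coordinator: run the same query at every site and merge results,
    so the caller sees the data as if it were one database."""
    results = []
    for site in (site_a, site_b):
        results.extend(site.execute(sql).fetchall())
    return results

rows = query_all_sites("SELECT city, staff FROM branch")
print(rows)  # [('London', 120), ('Nairobi', 80), ('Mombasa', 45)]
```

A production DDBMS additionally handles distributed transactions, replication, and failure of individual sites, which this sketch deliberately omits.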
Features
In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −
1. The sites use very similar software.
2. The sites use identical DBMS or DBMS from the same vendor.
3. Each site is aware of all other sites and cooperates with other sites to process user requests.
4. The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database
Autonomous − Each database is independent and functions on its own. They are integrated by a
controlling application and use message passing to share data updates.
Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.
In a heterogeneous distributed database, different sites have different operating systems, DBMS products
and data models. Its properties are −
The system may be composed of a variety of database models like relational, network, hierarchical or
object oriented.
A site may not be aware of other sites and so there is limited co-operation in processing user requests.
Federated − The heterogeneous database systems are independent in nature and integrated together so that
they function as a single database system.
Un-federated − The database systems employ a central coordinating module through which the databases
are accessed.
Following are the advantages of distributed databases over centralized databases.
Modular Development/scalability − If the system needs to be expanded to new locations or new units,
in centralized database systems the action requires substantial effort and disruption of existing
functioning. However, in distributed databases, the work simply requires adding new computers and
local data to the new site and finally connecting them to the distributed system, with no interruption in
current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a halt.
However, in distributed systems, when a component fails, the functioning of the system continues,
possibly at reduced performance. Hence a DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local
data itself, thus providing faster response.
Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not feasible
in centralized systems.
3.1 Meaning
Database design is the organization of data according to a database model. The designer determines what
data must be stored and how the data elements interrelate. With this information, they can begin to fit the
data to the database model. Database design involves classifying data and identifying interrelationships.
There are six main objectives which must be fulfilled effectively by a good database.
1. Usability
2. Extensibility
3. Data Integrity
4. Performance
5. Availability
6. Security
Let’s discuss each in a little detail.
Usability
Any information which we store should be meaningful for the organization. If we store data that does
not actually fit the organization's requirements, then this is just a waste of resources.
The primary objective of any information system should be to meet the organization's requirements.
The objectives listed above are the points to consider when starting an architecture.
Extensibility/Scalability
New business requirements come up every day, and there is a continual need to change or enhance the
information system to capture them. The information design should therefore be extensible so that it can
adopt new requirements without much effort and without major breaking changes.
If your initial design is too complex or unorganized, it may be troublesome to adopt new
things effectively.
Data Integrity
At this point we understand that information is very important for any organization. Based on
historic information, every organization makes strategies and decisions for growth. One small
mistake in the data can lead to major issues with an organization's key decisions and is hence a big
risk to growth.
When designing a good information system we must keep in mind the integrity and correctness of the
data. The system should be smart enough to handle incorrect or missing data attributes and, based on
that, either take corrective action or reject the data outright. Incorrect data should not be present in the
system, or at least should not be exposed to individuals, where it could create misunderstanding.
Entity Integrity
Involves the structure (primary key and its attributes) of the entity. If the primary key is unique and all
attributes are scalar and fully dependent on the primary key, then the integrity of the entity is good. In the
physical schema, the table’s primary key enforces entity integrity.
Domain Integrity
Domain integrity requires that data be of the correct type and that optional data be handled correctly.
Nullability should be applied to those attributes which are optional for the organization, and proper
data types can be defined for different attributes based on the organization's requirements so that
correctly formatted data is present in the system.
Referential Integrity
This defines that if any entity is dependent on another, the parent entity should be present in the system
and should be uniquely identifiable. We can do this by implementing foreign keys.
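A hedged sketch of referential integrity in Python's sqlite3 module (note that SQLite enforces foreign keys only when the pragma shown is enabled; the department and employee tables are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES department(id))""")

conn.execute("INSERT INTO department VALUES (1, 'ICT')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 1)")  # parent exists: accepted

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Juma', 99)")  # no such department
    ok = True
except sqlite3.IntegrityError as e:
    ok = False
    print("rejected:", e)  # FOREIGN KEY constraint failed
```

The DBMS, not the application, rejects the orphan row: the child record cannot reference a parent that does not exist.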
Transactional Integrity
This defines that a transaction should have the ACID properties: any transaction should be atomic,
consistent, isolated and durable. The quality of a database product is measured by its transactions’
adherence to the ACID properties:
Atomic — all or nothing
Consistent — the database begins and ends the transaction in a consistent state
Isolated — concurrent transactions do not interfere with one another
Durable — once committed, a transaction's changes persist
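The atomic property can be illustrated with SQLite transactions in Python; the account table and the transfer scenario are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'A'")
        # Simulate a failed business-rule check part-way through the transfer.
        (bal,) = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'B'")
except ValueError:
    pass

# Neither half of the transfer took effect: all or nothing.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 100}
```

Even though the first UPDATE had already executed, the rollback undoes it, so the database never shows a half-completed transfer.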
There are a few business rules which we cannot validate just by primary keys, foreign keys, etc. There
has to be some mechanism by which we can validate complex rules for integrity. We can implement
these rules in the following ways:
Check Constraints
Triggers & Stored Procedures
Queries to identify incorrect data and handle it in the correct way.
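As one illustration, a business rule such as "an employee's leaving date must be later than the hire date" can be expressed as a CHECK constraint; this sketch uses Python's sqlite3 with invented table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint encodes a rule that keys alone cannot enforce.
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    hire_date TEXT NOT NULL,
    leaving_date TEXT,
    CHECK (leaving_date IS NULL OR leaving_date > hire_date))""")

conn.execute("INSERT INTO employee VALUES (1, '2020-01-15', '2023-06-30')")  # valid
try:
    conn.execute("INSERT INTO employee VALUES (2, '2020-01-15', '2019-12-01')")  # breaks rule
    violated = False
except sqlite3.IntegrityError:
    violated = True
```

ISO-format date strings compare correctly as text, which is why a plain `>` works here; triggers or stored procedures would be used for rules too complex for a single CHECK expression.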
Performance
Information should be readily available when requested, and the performance of the system should
be up to the mark. As data increases day by day, there will at some point be an impact on performance
if the database design is poor or if no action is taken to improve performance.
Appropriate strategies can be implemented as the need arises when data grows.
Availability
The availability of information refers to the information’s accessibility when required regarding uptime,
locations, and the availability of the data for future analysis. Disaster recovery, redundancy, archiving, and
network delivery all affect availability.
Security
For any organizational asset, the level of security must be set depending on its value and sensitivity.
Organizations have sometimes suffered greatly because of data leaks, which result in loss of faith and
create business risk. Security is therefore one of the most important aspects of good database design.
Based on the above principles, one can start designing databases and architectures.
The Database Life Cycle
The Database Life Cycle: The Database Initial Study
Overall Purpose of the Initial Study:
The Database Life Cycle: Define the Objective
What is the proposed system's initial objective?
Will the system interfere with other existing or future systems in the company?
Will the system share the data with other systems or users?
Budget
Hardware and software
Extent of organizational change required
The Database Life Cycle: Conceptual Design
Information needs.
Information users.
Information sources.
Information constitution.
The Database Life Cycle: Data analysis and requirements
Sources of information for the designer
The designer must identify the company's business rules and analyze their impacts.
The Database Life Cycle: Entity Relationship Modeling and Normalization
Even if the entire system can't be brought online quickly, implementation of one or more modules will
demonstrate that progress is being made and that at least part of the system is ready to begin serving the
end users.
Ensure the module's cohesivity -- the strength of the relationships found among the module's entities.
Analyze each module's relationships with other modules to address module coupling -- the extent to which
modules are independent of one another.
All identified processes must be verified against the E-R model. If necessary, appropriate changes are implemented.
Existing systems: if the organization already has a DBMS it may be wise to use it.
Cost -- Purchase, maintenance, operational, license, installation, training, and conversion costs.
DBMS features and tools.
The Database Life Cycle: Logical Design
Logical design translates the conceptual design into the internal model for a selected DBMS.
It includes mapping of all objects in the model to the specific constructs used by the selected database software.
For a relational DBMS, the logical design includes the design of tables, obvious indexes, views, transactions, access
authorities, and so on.
Physical design is particularly important in the older hierarchical and network models and in very large databases.
Relational databases are more insulated from physical layer details than hierarchical and network models.
Physical security
Access rights and security methods (e.g. Passwords, smartcards, biometrics)
Audit trails
Data encryption
Client/Server, thin clients, web enabled databases
Backup and Recovery
Integrity
Company standards
Concurrency controls
applications during the coding of the programs.
Options to enhance the system if the implementation fails.
A Special Note about Database Design Strategies
Two Classical Approaches to Database Design:
Top-down design starts by identifying the data sets, and then defines the data elements for each of these sets.
Bottom-up design first identifies the data elements (items), and then groups them together in data sets.
Centralized vs Decentralized Design: Two Different Database Design Philosophies:
Centralized design
It is productive when the data component is composed of a relatively small number of objects and procedures.
A Relational database management system (RDBMS) is a database management system (DBMS) that is
based on the relational model as introduced by E. F. Codd.
The data in an RDBMS is stored in database objects called tables. A table is basically a
collection of related data entries and it consists of numerous columns and rows.
Data in the relational database must be represented in tables, with values in columns within rows.
Data within a column must be accessible by specifying the table name, the column name, and the value
of the primary key of the row.
The DBMS must support missing and inapplicable information in a systematic way, distinct from
regular values and independent of data type.
The DBMS must support an active on-line catalogue.
The DBMS must support at least one language that can be used independently and from within
programs, and supports data definition operations, data manipulation, constraints, and transaction
management.
Views must be updatable by the system.
The DBMS must support insert, update, and delete operations on sets.
The DBMS must support logical data independence.
The DBMS must support physical data independence.
Integrity constraints must be stored within the catalogue, separate from the application.
The DBMS must support distribution independence. The existing application should run when the
existing data is redistributed or when the DBMS is redistributed.
If the DBMS provides a low level interface (row at a time), that interface cannot bypass the integrity
constraints.
The relational data model was introduced by E. F. Codd in 1970. Currently, it is the most widely used data
model.
The relational data model describes the database as “a collection of inter-related relations (or tables).”
A relation, also known as a table or file, is a subset of the Cartesian product of a list of domains
characterized by a name. Within a table, each row represents a group of related data values. A row, or
record, is also known as a tuple.
A column in a table is a field, also referred to as an attribute. You can also think of it this way: an
attribute is used to define the record, and a record contains a set of attributes.
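A quick illustration of attributes and records using Python's sqlite3 (the person table and its row are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (first_name TEXT, surname TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES ('Grace', 'Hopper', 85)")

cur = conn.execute("SELECT * FROM person")
attributes = [col[0] for col in cur.description]  # the column (attribute) names
record = cur.fetchone()                           # one row, returned as a tuple
print(attributes)  # ['first_name', 'surname', 'age']
print(record)      # ('Grace', 'Hopper', 85)
```

The row really does come back as a tuple, matching the relational terminology: each position in the tuple corresponds to one attribute of the relation.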
Column
A database stores pieces of information or facts in an organized way. Understanding how to use and get the
most out of databases requires us to understand that method of organization.
When deciding which fields to create, you need to think generically about your information, for example,
drawing out the common components of the information that you will store in the database and avoiding
the specifics that distinguish one item from another.
Domain
A domain is the original set of atomic values used to model data. By atomic value, we mean that each
value in the domain is indivisible as far as the relational model is concerned. For example:
The domain of Marital Status has a set of possibilities: Married, Single, Divorced.
The domain of Shift has the set of all possible days: {Mon, Tue, Wed…}.
The domain of Salary is the set of all floating-point numbers greater than 0 and less than
200,000.
The domain of First Name is the set of character strings that represents names of people.
In summary, a domain is a set of acceptable values that a column is allowed to contain. This is based on
various properties and the data type for the column. We will discuss data types in another chapter.
Records
Just as the content of any one document or item needs to be broken down into its constituent bits of data for
storage in the fields, the link between them also needs to be available so that they can be reconstituted into
their whole form. Records allow us to do this. Records contain fields that are related, such as a customer or
an employee. As noted earlier, a tuple is another term used for record.
Records and fields form the basis of all databases. A simple table gives us the clearest picture of how
records and fields work together in a database storage project.
The simple table example in Figure 7.3 shows us how fields can hold a range of different sorts of data. This
one has:
A Record ID field: this is an ordinal number; its data type is an integer.
A PubDate field: this is displayed as day/month/year; its data type is date.
An Author field: this is displayed as Initial. Surname; its data type is text.
A Title field: free text can be entered here; its data type is text.
You can command the database to sift through its data and organize it in a particular way. For example,
you can request that a selection of records be limited by date: 1. all before a given date, 2. all after a given
date or 3. all between two given dates. Similarly, you can choose to have records sorted by date. Because
the field containing the data is set up as a Date field, the database reads the information in it not just as
numbers separated by slashes, but rather as dates that must be ordered according to a
calendar system.
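The three date selections described above can be sketched in SQL; this example uses Python's sqlite3 with ISO-format date strings and an invented book table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, pub_date TEXT, title TEXT)")
conn.executemany("INSERT INTO book (pub_date, title) VALUES (?, ?)", [
    ("1999-03-01", "Old Title"),
    ("2005-07-15", "Middle Title"),
    ("2012-11-30", "New Title"),
])

# 1. all before a given date
before = conn.execute("SELECT title FROM book WHERE pub_date < '2005-01-01'").fetchall()
# 2. all after a given date
after = conn.execute("SELECT title FROM book WHERE pub_date > '2005-01-01'").fetchall()
# 3. all between two given dates, sorted by date
between = conn.execute(
    "SELECT title FROM book WHERE pub_date BETWEEN '2000-01-01' AND '2010-12-31' "
    "ORDER BY pub_date").fetchall()
print(before, after, between)
```

Because the dates are stored in year-month-day order, comparing and sorting them as text matches calendar order, which is what the Date field abstraction guarantees.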
Degree
The degree of a table (or relation) is the number of attributes (columns) it contains.
Properties of a Table
A table has a name that is distinct from all other tables in the database.
There are no duplicate rows; each row is distinct.
Entries in columns are atomic. The table does not contain repeating groups or multivalued attributes.
Entries in columns are from the same domain based on their data type, including:
Number (numeric, integer, float, smallint,…)
character (string)
date
logical (true or false)
Operations combining different data types are disallowed.
Each attribute has a distinct name.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database. It
defines how the data is organized and how the relations among them are associated. It formulates all the
constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive detail of
the database, which can be depicted by means of schema diagrams. It’s the database designers who design
the schema to help programmers understand the database and make it useful.
Physical Database Schema − this schema pertains to the actual storage of data and its form of
storage like files, indices, etc. It defines how the data will be stored in a secondary storage.
The physical database schema gives the blueprint for how each piece of data is stored in the
database.
Logical Database Schema − this schema defines all the logical constraints that need to be applied
on the data stored. It defines tables, views, and integrity constraints.
The logical schema gives structure to the tables and relationships inside of the database. Generally
speaking, the logical schema is created before the physical schema.
Constraints are useful because they allow a designer to specify the semantics of data in the
database. Constraints are the rules that force DBMSs to check that data satisfies the semantics.
Domain Integrity
A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a
database system, the domain integrity is defined by:
Data Type - Basic data types are integer, decimal, or character. Most databases support variants of
these plus special data types for date and time.
Length - This is the number of digits or characters in the value. For example, a value of 5 digits or 40
characters.
Date Format - The format for date values such as dd/mm/yy or mm/dd/yyyy or yy/mm/dd.
Range - The range specifies the lower and upper boundaries of the values the attribute may legally
have.
Constraints - Are special restrictions on allowable values. For example, the LeavingDate for an
Employee must always be greater than the HireDate for that Employee.
Null support - Indicates whether the attribute can have null values.
Default value (if any) - The value an attribute instance will have if a value is not entered.
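The domain-integrity components above can be seen in action with a small sketch. This is an illustrative example using Python's built-in sqlite3 module; the Employee table and its columns are hypothetical, not from the course text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Employee (
        EmpID       INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,                          -- null support: disallowed
        Age         INTEGER CHECK (Age BETWEEN 18 AND 70),  -- range
        HireDate    TEXT DEFAULT (date('now')),             -- default value
        LeavingDate TEXT,
        CHECK (LeavingDate IS NULL OR LeavingDate > HireDate)  -- constraint
    )
""")

# A row that violates the Age range is rejected by the DBMS.
try:
    con.execute("INSERT INTO Employee (Name, Age) VALUES ('Ann', 15)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The data type, length, and date format are expressed through the column types; the range, constraint, null support, and default value map directly onto CHECK, NOT NULL, and DEFAULT clauses.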
There are several kinds of integrity constraints, described below.
Entity integrity
Entity Integrity ensures that there are no duplicate records within the table and that the field that identifies
each record within the table is unique and never null.
The existence of the primary key is the core of entity integrity. If you define a primary key for each
entity, it follows the entity integrity rule.
Entity integrity specifies that the Primary Keys on every instance of an entity must be kept, must be unique
and must have values other than NULL.
Although most relational databases do not specifically dictate that a table needs to have a Primary Key, it is
good practice to design a Primary Key for each table in the relational model. This mandates
no NULL content, so that every row in a table must have a value that denotes the row as a unique element
of the entity.
Entity Integrity is the mechanism the system provides to maintain primary keys. The primary key serves as
a unique identifier for rows in the table. Entity Integrity ensures two properties for primary keys:
The primary key for a row is unique; it does not match the primary key of any other row in the table.
The primary key is not null, no component of the primary key may be set to null.
The uniqueness property ensures that the primary key of each row uniquely identifies it; there are no
duplicates. The second property ensures that the primary key has meaning, has a value; no component of
the key is missing.
The system enforces Entity Integrity by not allowing operations (INSERT, UPDATE) to produce an
invalid primary key. Any operation that creates a duplicate primary key or one containing nulls is rejected.
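A minimal sketch of this enforcement, again using Python's sqlite3 (the Student table is hypothetical). Note that SQLite, unlike most DBMSs, allows NULL in a non-INTEGER primary key unless NOT NULL is added explicitly:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (RollNo TEXT PRIMARY KEY NOT NULL, Name TEXT)")

con.execute("INSERT INTO Student VALUES ('S001', 'Alice')")

# Duplicate primary key: rejected (uniqueness property).
try:
    con.execute("INSERT INTO Student VALUES ('S001', 'Bob')")
except sqlite3.IntegrityError as e:
    print("duplicate key rejected:", e)

# Null primary key: rejected (no component of the key may be null).
try:
    con.execute("INSERT INTO Student VALUES (NULL, 'Carol')")
except sqlite3.IntegrityError as e:
    print("null key rejected:", e)
```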
Referential integrity
Referential integrity requires that a foreign key must have a matching primary key or it must be null. This
constraint is specified between two tables (parent and child); it maintains the correspondence between rows
in these tables. It means the reference from a row in one table to another table must be valid.
Cascade actions:
Cascade: a cascade action propagates the delete or update operation on the parent key to each dependent
child key.
On delete cascade action: when a parent row is deleted, each row in the child table that was associated with
the deleted parent row is also deleted.
On delete restrict action: rejects the delete or update operation for the parent table if there is a related
foreign key value in the child table.
On delete set null: if a record in the parent table is deleted, then the foreign key values in the
corresponding records of the child table are set to NULL; the child rows themselves are retained.
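The cascade actions above can be sketched with sqlite3 (hypothetical Dept/Emp tables; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

con.execute("CREATE TABLE Dept (DeptID INTEGER PRIMARY KEY, Name TEXT)")
con.execute("""
    CREATE TABLE Emp (
        EmpID  INTEGER PRIMARY KEY,
        DeptID INTEGER REFERENCES Dept(DeptID) ON DELETE CASCADE
    )
""")
con.execute("INSERT INTO Dept VALUES (1, 'Sales')")
con.execute("INSERT INTO Emp VALUES (10, 1)")

# A child row pointing at a non-existent parent is rejected.
try:
    con.execute("INSERT INTO Emp VALUES (11, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# Deleting the parent cascades to the dependent child rows.
con.execute("DELETE FROM Dept WHERE DeptID = 1")
print(con.execute("SELECT COUNT(*) FROM Emp").fetchone()[0])  # 0
```

Replacing ON DELETE CASCADE with ON DELETE RESTRICT or ON DELETE SET NULL yields the other two behaviors described above.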
Relational database systems are expected to be equipped with a query language that can assist its users to
query the database instances.
1. Relational algebra
2. Relational calculus
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields
instances of relations as output. It uses operators to perform queries.
An operator can be either unary or binary. They accept relations as their input and yield relations as their
output. Relational algebra is performed recursively on a relation and intermediate results are also
considered relations.
1. Select
2. Project
3. Union
4. Set difference
5. Cartesian product
6. Rename
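As a rough illustration of how these operators work, here is a toy Python model of relations as sets of tuples; the function names mirror the algebra operators and the sample data is invented:

```python
# Relations modeled as sets of tuples, each tuple a frozenset of
# (attribute, value) pairs so that rows are hashable and unordered.
def rel(*rows):
    return {frozenset(r.items()) for r in rows}

def select(r, pred):          # sigma_pred(r): keep tuples satisfying pred
    return {t for t in r if pred(dict(t))}

def project(r, attrs):        # pi_attrs(r): keep only the named attributes
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def union(r, s):              # r ∪ s (relations must be union-compatible)
    return r | s

def difference(r, s):         # r − s
    return r - s

Students = rel({"name": "Ann", "age": 21}, {"name": "Bob", "age": 25})

adults = select(Students, lambda t: t["age"] > 22)
names  = project(Students, {"name"})
print(len(adults), len(names))  # 1 2
```

Because every operator takes relations in and yields a relation out, the calls compose recursively, e.g. `project(select(Students, ...), {"name"})`, just as the algebra prescribes.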
Select Operation σ
Join
Join − A join is a combination of a Cartesian product followed by a selection process. A join operation pairs
two tuples from different relations, if and only if a given join condition is satisfied. We will briefly
describe various join types in the following sections.
Theta θ Join
Theta join combines tuples from different relations provided they satisfy the theta condition. The
join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes A1, A2, . . , An and B1, B2, . . , Bn such that the attributes
don’t have anything in common, that is R1 ∩ R2 = Φ. Theta join can use all kinds of comparison
operators.
Example
Equijoin
When a theta join uses only the equality comparison operator, it is called an equijoin. The above example
corresponds to an equijoin.
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate the way a Cartesian product
does. We can perform a Natural Join only if there is at least one common attribute that exists between two
relations. In addition, the attributes must have the same name and domain. Natural join acts on those
matching attributes where the values of attributes in both the relations are same.
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those tuples with
matching attributes and the rest are discarded in the resulting relation. Therefore, we need to use outer joins
to include all the tuples from the participating relations in the resulting relation. There are three kinds of
outer joins − left outer join, right outer join, and full outer join.
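A sketch of natural join and left outer join over relations modeled as lists of Python dicts (the Emp/Dept data is invented; this is illustrative, not how a DBMS implements joins):

```python
# Relations as lists of dicts; hypothetical sample data.
def natural_join(r, s):
    common = set(r[0]) & set(s[0])          # shared attribute names
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

def left_outer_join(r, s):
    common = set(r[0]) & set(s[0])
    out = []
    for a in r:
        matches = [b for b in s if all(a[c] == b[c] for c in common)]
        if matches:
            out += [{**a, **b} for b in matches]
        else:  # unmatched left tuple is kept, padded with NULLs
            out.append({**a, **{k: None for k in set(s[0]) - common}})
    return out

Emp  = [{"dept": "CS", "name": "Ann"}, {"dept": "EE", "name": "Bob"}]
Dept = [{"dept": "CS", "hod": "Dr. X"}]

print(natural_join(Emp, Dept))     # only the CS row matches
print(left_outer_join(Emp, Dept))  # Bob kept, hod = None
```

The inner (natural) join discards Bob because EE has no matching Dept tuple; the left outer join keeps him with a NULL-padded right side, which is exactly the distinction drawn above.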
4.4 Relational Calculus
Relational calculus
Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus,
that are part of the relational model for databases and provide a declarative way to specify database queries.
This is in contrast to the relational algebra, which is also part of the relational model but provides a more
procedural way of specifying queries.
Where the relational algebra would prescribe explicit step-by-step operations to retrieve the phone numbers
and names of book stores that supply Some Sample Book, the relational calculus describes the same query
declaratively:
Get StoreName and StorePhone for supplies such that there exists a book BK with the same BookstoreID
value and with a BookTitle value of Some Sample Book.
The relational algebra and the relational calculus are essentially logically equivalent: for any algebraic
expression, there is an equivalent expression in the calculus, and vice versa. This result is known as Codd's
theorem.
Tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational model, in order
to provide a declarative database-query language for this data model. It formed the inspiration for the
database-query languages QUEL and SQL, of which the latter, although far less faithful to the original
relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is
used by nearly every relational-database-management system. Lacroix and Pirotte proposed the domain
calculus, which is closer to first-order logic, and showed that both of these calculi (as well as
relational algebra) are equivalent in expressive power. Subsequently, query languages for the relational
model were called relationally complete if they could express at least all of these queries.
Since the calculus is a query language for relational databases we first have to define a relational database.
The basic relational building block is the domain, or data type. A tuple is an ordered multiset of attributes,
which are ordered pairs of domain and value; or just a row. A relvar (relation variable) is a set of ordered
pairs of domain and name, which serves as the header for a relation. A relation is a set of tuples. Although
these relational concepts are mathematically defined, those definitions map loosely to traditional database
concepts. A table is an accepted visual representation of a relation; a tuple is similar to the concept of row.
We first assume the existence of a set C of column names, examples of which are "name", "author",
"address" et cetera. We define headers as finite subsets of C. A relational database schema is defined as a
tuple S = (D, R, h) where D is the domain of atomic values (see relational model for more on the notions of
domain and atomic value), R is a finite set of relation names, and
h : R → 2^C
a function that associates a header with each relation name in R. (Note that this is a simplification from the
full relational model where there is more than one domain and a header is not just a set of column names
but also maps these column names to a domain.) Given a domain D we define a tuple over D as a partial
function that maps some column names to an atomic value in D. An example would be (name : "Harry",
age : 25).
Formally, such a tuple is a partial function t : C → D.
The set of all tuples over D is denoted as TD. The subset of C for which a tuple t is defined is called the
domain of t (not to be confused with the domain in the schema) and denoted as dom(t).
A relational database db over the schema S is then defined as a function
db : R → 2^TD
that maps the relation names in R to finite subsets of TD, such that for every relation name r in R and tuple t
in db(r) it holds that
dom(t) = h(r).
The latter requirement simply says that all the tuples in a relation should contain the same column names,
namely those defined for it in the schema.
Atoms
For the construction of the formulae we will assume an infinite set V of tuple variables. The formulas are
defined given a database schema S = (D, R, h) and a partial function type : V → 2^C that defines a type
assignment that assigns headers to some tuple variables. We then define the set of atomic formulas
A[S,type] with the following rules:
1. if v and w in V, a in type(v) and b in type(w) then the formula " v.a = w.b " is in A[S,type],
2. if v in V, a in type(v) and k denotes a value in D then the formula " v.a = k " is in A[S,type], and
3. if v in V, r in R and type(v) = h(r) then the formula " r(v) " is in A[S,type].
(t.age = s.age) — t has an age attribute and s has an age attribute with the same value
(t.name = "Codd") — tuple t has a name attribute and its value is "Codd"
Book(t) — tuple t is present in relation Book.
The formal semantics of such atoms is defined given a database db over S and a tuple variable binding val :
V -> TD that maps tuple variables to tuples over the domain in S:
1. " v.a = w.b " is true if and only if val(v)(a) = val(w)(b)
2. " v.a = k " is true if and only if val(v)(a) = k
3. " r(v) " is true if and only if val(v) is in db(r)
Formulae
The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧
(and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to
bind the variables. We define the set of formulas F[S,type] inductively with the following rules:
One example formula states that all books that are written by C. J. Date have as their subject the
relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.
We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema.
This leads to the following formal semantics for formulas given a database db over S and a tuple variable
binding val : V -> TD:
1. " f1 ∧ f2 " is true if and only if " f1 " is true and " f2 " is true,
2. " f1 ∨ f2 " is true if and only if " f1 " is true or " f2 " is true or both are true,
3. " ¬ f " is true if and only if " f " is not true,
4. " ∃ v : H ( f ) " is true if and only if there is a tuple t over D such that dom(t) = H and the formula " f
" is true for val[v->t], and
5. " ∀ v : H ( f ) " is true if and only if for all tuples t over D such that dom(t) = H the formula " f " is
true for val[v->t].
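The declarative flavor of the calculus can be imitated with a Python comprehension over a toy Book relation (sample data invented): we describe which tuples we want rather than the steps to retrieve them:

```python
# A toy "database": the relation Book as a list of tuples (dicts).
Book = [
    {"title": "Database in Depth", "author": "C. J. Date", "subject": "relational model"},
    {"title": "Some Sample Book",  "author": "A. N. Other", "subject": "SQL"},
]

# The tuple-calculus query { t | Book(t) ∧ t.author = "C. J. Date" },
# written as a comprehension: a condition on tuples, not a procedure.
result = [t for t in Book if t["author"] == "C. J. Date"]

# Checking the universally quantified example formula:
# ∀ t (Book(t) ∧ t.author = "C. J. Date" → t.subject = "relational model")
holds = all(t["subject"] == "relational model"
            for t in Book if t["author"] == "C. J. Date")
print(len(result), holds)  # 1 True
```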
Domain-independent queries
Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the
schema it can be that a query may return a different result for a certain database if another schema is
presumed. For example, consider the two schemas S1 = ( D1, R, h ) and S2 = ( D2, R, h ) with domains D1 = {
1 }, D2 = { 1, 2 }, relation names R = { "r1" } and headers h = { ("r1", {"a"}) }. Both schemas have a
common instance:
db = { ( "r1", { ("a", 1) } ) }
A query whose quantifiers range over the whole domain may then return { (a : 1) } under S1 but { (a : 1),
(a : 2) } under S2. It will also be clear that if we take the domain to be an infinite set, then the result of the
query will also be infinite. To solve these problems we will restrict our attention to those queries that are
domain independent, i.e., the queries that return the same result for a database under all of its schemas.
An interesting property of these queries is that if we assume that the tuple variables range over tuples over
the so-called active domain of the database, which is the subset of the domain that occurs in at least one
tuple in the database or in the query expression, then the semantics of the query expressions does not
change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is
defined, which makes all queries by definition domain independent.
Safe queries
In order to limit the query expressions such that they express only domain-independent queries a
syntactical notion of safe query is usually introduced. To determine whether a query expression is safe we
will derive two types of information from a query. The first is whether a variable-column pair t.a is bound
to the column of a relation or a constant, and the second is whether two variable-column pairs are directly
or indirectly equated (denoted t.a == s.b).
For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for
equivalence relations: reflexivity, symmetry and transitivity):
7. in " ∃ v : H ( f ) " it holds that w.a == x.b if it holds in f and w<>v and x<>v, and
8. in " ∀ v : H ( f ) " it holds that w.a == x.b if it holds in f and w<>v and x<>v.
A query expression { v : H | f } is then called safe if:
for every column name a in H we can derive that v.a is equated with a bound pair in f,
for every subexpression of f of the form " ∀ w : G ( g ) " we can derive that for every column name
a in G, w.a is equated with a bound pair in g, and
for every subexpression of f of the form " ∃ w : G ( g ) " we can derive that for every column name
a in G, w.a is equated with a bound pair in g.
The restriction to safe query expressions does not limit the expressiveness since all domain-independent
queries that could be expressed can also be expressed by a safe query expression. This can be proven by
showing that for a schema S = (D, R, h), a given set K of constants in the query expression, a tuple variable
v and a header H we can construct a safe formula for every pair v.a with a in H that states that its value is in
the active domain. For example, assume that K = {1, 2}, R = {"r"} and h = { ("r", {"a", "b"}) }; then the
corresponding safe formula for v.b is:
This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query
expression by adding such a formula for every variable v and column name a in its type where it is used in
the expression. Effectively this means that we let all variables range over the active domain, which, as was
already explained, does not change the semantics if the expressed query is domain independent.
In computer science, domain relational calculus (DRC) is a calculus that was introduced by Michel Lacroix
and Alain Pirotte as a declarative database query language for the relational data model.
This language uses the same operators as tuple calculus, the logical connectives ∧ (and), ∨ (or) and ¬ (not).
The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the variables.
Examples
Let (A, B, C) mean (Rank, Name, ID) in the Enterprise relation, and let (D, E, F) mean (Name, DeptName, ID) in the Department relation.
Find all captains of the starship USS Enterprise:
In this example, A, B, C denote both the result set and a set in the table Enterprise.
In this example, we're only looking for the name, and that's B. F = C is a requirement, because we need to
find Enterprise crew members AND they are in the Stellar Cartography Department.
In this example, the value of the requested F domain is directly placed in the formula and the C domain
variable is re-used in the query for the existence of a department, since it already holds a crew member's id.
Relational Data Model: The relational model uses relations (tables) to represent both entities and
relationships among entities. A relation may be visualized as a table; however, a table is just one of many
ways to represent a relation.
An entity-relationship (ER) diagram is a specialized graphic that illustrates the relationships between
entities in a database. ER diagrams often use symbols to represent three different types of information.
Boxes are commonly used to represent entities. Diamonds are normally used to represent relationships and
ovals are used to represent attributes.
Examples: Consider the example of a database that contains information on the residents of a city. The ER
diagram for this example contains two entities -- people and cities. There is a single "Lives In"
relationship. Each person lives in only one city, but each city can house many people.
Entity Relationship diagrams (also known as E-R or ER diagrams) provide database designers with a
valuable tool for modeling the relationships between database entities in a clear, precise format. This
industry standard approach uses a series of block shapes and lines to describe the structure of a database in
a manner understandable to all database professionals. Many database software packages,
including Microsoft Access, SQL Server, and Oracle, provide automated methods to quickly create E-R
diagrams from existing databases.
In this article, we provide an overview of E-R diagramming techniques to help you read, modify or create
your own data models.
Entities
In a database model, each object that you wish to track in the database is known as an entity. Normally,
each entity is stored in a database table and every instance of an entity corresponds to a row in that table. In
an ER diagram, each entity is depicted as a rectangular box with the name of the entity contained within it.
For example, a database containing information about individual people would likely have an entity called
Person. This would correspond to a table with the same name in the database and every person tracked in
the database would be an instance of that Person entity and have a corresponding row in the Person table.
Database designers creating an E-R diagram would draw the Person entity using a shape similar to this:
They would then repeat the process to create a rectangular box for each entity in the data model.
Types of DBMS Entities
Strong Entity − A strong entity has a primary key of its own, and its existence does not depend on any other
entity. Weak entities depend on strong entities.
Weak Entity − A weak entity in a DBMS does not have a primary key of its own and depends on its parent
(strong) entity for identification.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have values. For
example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a student's name
cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Databases contain information about each entity. This information is tracked in individual fields known as
attributes, which normally correspond to the columns of a database table.
For example, the Person entity might have attributes corresponding to the person's first and last name, date
of birth, and a unique person identifier. Each of these attributes is depicted in an E-R diagram as an oval, as
shown in the figure below:
Attribute(s):
Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB,
Age, Address, Mobile_No are the attributes which defines entity type Student. In ER diagram,
attribute is represented by an oval.
Types
1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called a key
attribute. For example, Roll_No will be unique for each student. In an ER diagram, a key attribute
is represented by an oval with the attribute name underlined.
2. Composite Attribute –
An attribute composed of several other attributes. For example, Address may be composed
of Street, City, and Postal_Code. In an ER diagram, a composite attribute is represented by
an oval connected to the ovals of its component attributes.
3. Multivalued Attribute –
An attribute consisting of more than one value for a given entity. For example, Phone_No
(there can be more than one for a given student). In an ER diagram, a multivalued attribute is
represented by a double oval.
4. Derived Attribute –
An attribute which can be derived from other attributes of the entity type is known as
derived attribute, e.g., Age (can be derived from DOB). In an ER diagram, a derived attribute is
represented by a dashed oval.
The power of the E-R diagram lies in its ability to accurately display information about the relationships
between entities. For example, we might track information in our database about the city where each
person lives. Information about the city itself is tracked within a City entity and a relationship is used to tie
together Person and City instances.
Relationships are normally given names that are verbs, while attributes and entities are named after nouns.
This convention makes it easy to express relationships. For example, if we name our Person/City
relationship "Lives In", we can string them together to say "A person lives in a city." We express
relationships in E-R diagrams by drawing a line between the related entities and placing a diamond shape
that contains the relationship name in the middle of the line. Here's how our Person/City relationship would
look:
Notice that there are some additional shapes on the line. The double hashed line appearing just to the left of
the City entity indicates that this part of the relationship has a cardinality of 1. On the other hand, the
crow's foot symbol to the right of the Person entity indicates that this part of the relationship has a
cardinality of "many". Stated more plainly, each person may live in only one city, while a city may contain
many people.
Those are the basics of Entity-Relationship diagrams. You should now have the information you need to
create basic diagrams for your databases.
Cardinality and Ordinality are shown by the styling of a line and its endpoint, according to the chosen
notation style.
Types of cardinality
One-to-one − One instance from entity set A can be associated with at most one instance of entity
set B and vice versa.
One-to-many − One instance from entity set A can be associated with more than one instance of
entity set B; however, an instance from entity set B can be associated with at most one instance of
entity set A.
Many-to-one − More than one instance from entity set A can be associated with at most one
instance of entity set B; however, an instance from entity set B can be associated with more than
one instance from entity set A.
Many-to-many − One instance from entity set A can be associated with more than one instance
from entity set B and vice versa.
Cardinality and ordinality refer, respectively, to the maximum and the minimum number of times an
instance in one entity can be associated with instances in the related entity. Both are represented by the
styling of a line and its endpoint, as denoted by the chosen notation style.
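The one-to-many Person/City cardinality discussed above can be enforced in a schema sketch (sqlite3; table and column names are illustrative): making the person the primary key of the relationship table caps each person at one city, while a city may appear many times:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE Person (pid INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE City   (cid INTEGER PRIMARY KEY, name TEXT)")
# One-to-many: each person lives in exactly one city.
con.execute("""CREATE TABLE LivesIn (
    pid INTEGER PRIMARY KEY REFERENCES Person(pid),  -- at most one city per person
    cid INTEGER NOT NULL REFERENCES City(cid)        -- a city may appear many times
)""")
con.executemany("INSERT INTO Person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
con.execute("INSERT INTO City VALUES (1, 'Kampala')")
con.executemany("INSERT INTO LivesIn VALUES (?, ?)", [(1, 1), (2, 1)])

# A second city for Ann would violate the one-to-many cardinality.
con.execute("INSERT INTO City VALUES (2, 'Jinja')")
try:
    con.execute("INSERT INTO LivesIn VALUES (1, 2)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

A many-to-many relationship would instead use a composite primary key (pid, cid) on the junction table, allowing repeats on both sides.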
6.0 Normalization
Database normalization is the process of restructuring a relational database in accordance with a series of
so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed
by Edgar F. Codd as an integral part of his relational model.
Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that
their dependencies are properly enforced by database integrity constraints. It is accomplished by applying
some formal rules either by a process of synthesis (creating a new database design) or decomposition
(improving an existing database design).
The objectives of normalization beyond 1NF (first normal form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion
dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are
introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are
liable to change as time goes by.
If a table is not properly normalized and has data redundancy, it will not only consume extra memory
space but will also make it difficult to handle and update the database without risking data loss. Insertion,
update, and deletion anomalies are very frequent if the database is not normalized.
To understand these anomalies let us take an example of a Student table.
In the table above, we have data for four Computer Science students. As we can see, the data for the fields
branch, hod (Head of Department) and office_tel is repeated for the students who are in the same branch of
the college; this is data redundancy.
Insertion Anomaly
Insertion anomaly. There are circumstances in which certain facts cannot be recorded at all. For
example, each record in a "Faculty and Their Courses" relation might contain a Faculty ID, Faculty
Name, Faculty Hire Date, and Course Code. Therefore, we can record the details of any faculty
member who teaches at least one course, but we cannot record a newly hired faculty member who has
not yet been assigned to teach any courses, except by setting the Course Code to null. This phenomenon
is known as an insertion anomaly.
Suppose that for a new admission, until the student opts for a branch, the data of the student cannot be
inserted; otherwise we will have to set the branch information to NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will be
repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
Updating Anomaly
Update anomaly. The same information can be expressed on multiple rows; therefore updates to the
relation may result in logical inconsistencies. For example, each record in an "Employees' Skills" relation
might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular
employee may need to be applied to multiple records (one for each skill). If the update is only partially
successful – the employee's address is updated on some records but not others – then the relation is left in
an inconsistent state. Specifically, the relation provides conflicting answers to the question of what this
particular employee's address is. This phenomenon is known as an update anomaly.
What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that case all
the student records will have to be updated, and if by mistake we miss any record, it will lead to data
inconsistency. This is an update anomaly.
Deletion Anomaly
Deletion anomaly. Under certain circumstances, deletion of data representing certain facts necessitates
deletion of data representing completely different facts. The "Faculty and Their Courses" relation
described in the previous example suffers from this type of anomaly, for if a faculty member
temporarily ceases to be assigned to any courses, we must delete the last of the records on which that
faculty member appears, effectively also deleting the faculty member, unless we set the Course Code to
null. This phenomenon is known as a deletion anomaly.
In our Student table, two different kinds of information are kept together: student information and branch
information. Hence, at the end of the academic year, if student records are deleted, we will also lose the
branch information. This is a deletion anomaly.
Normalization is the process of splitting relations into well-structured relations that allow users to insert,
delete, and update tuples without introducing database inconsistencies. Without normalization many
problems can occur when trying to load an integrated conceptual model into the DBMS. These problems
arise from relations that are generated directly from user views and are called anomalies. There are three
types of anomalies: update, deletion and insertion anomalies.
An update anomaly is a data inconsistency that results from data redundancy and a partial update. For
example, each employee in a company has a department associated with them as well as the student group
they participate in.
If A. Bruchs’ department is recorded incorrectly, it must be updated in at least two rows or there will be
inconsistent data in the database. If the user performing the update does not realize the data is stored
redundantly, the update will not be done properly.
A deletion anomaly is the unintended loss of data due to deletion of other data. For example, if the student
group Beta Alpha Psi disbanded and was deleted from the table above, J. Longfellow and the Accounting
department would cease to exist. This results in database inconsistencies and is an example of how
combining information that does not really belong together into one table can cause problems.
An insertion anomaly is the inability to add data to the database due to absence of other data. For example,
assume Student_Group is defined so that null values are not allowed. If a new employee is hired but not
immediately assigned to a Student_Group then this employee could not be entered into the database. This
results in database inconsistencies due to omission.
Update, deletion, and insertion anomalies are very undesirable in any database. Anomalies are avoided by
the process of normalization.
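The decomposition that normalization performs can be sketched in plain Python (the student data is invented): branch facts move to their own relation, so an update touches one row instead of many:

```python
# The redundant Student table from the discussion above (hypothetical data):
students = [
    {"roll": 1, "name": "Ann", "branch": "CS", "hod": "Mr. X", "office_tel": "100"},
    {"roll": 2, "name": "Bob", "branch": "CS", "hod": "Mr. X", "office_tel": "100"},
]

# Decompose: branch facts are stored once, students reference the branch.
branches = {s["branch"]: {"hod": s["hod"], "office_tel": s["office_tel"]}
            for s in students}
student_rel = [{"roll": s["roll"], "name": s["name"], "branch": s["branch"]}
               for s in students]

# Updating the HOD now touches exactly one row: no update anomaly.
branches["CS"]["hod"] = "Ms. Y"
assert all(branches[s["branch"]]["hod"] == "Ms. Y" for s in student_rel)
print(branches)
```

Deleting all CS students from student_rel no longer destroys the branch facts, and a new branch can be inserted before any student enrolls in it, eliminating the deletion and insertion anomalies as well.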
Functional Dependencies
A functional dependency (FD) is a relationship between two attributes, typically between the PK and other
non-key attributes within a table. For any relation R, attribute Y is functionally dependent on attribute X
(usually the PK), if for every valid instance of X, that value of X uniquely determines the value of Y. This
relationship is indicated by the representation below:
X ———–> Y
The left side of the above FD diagram is called the determinant, and the right side is the dependent. Here
are a few examples.
In the first example, below, SIN determines Name, Address and Birthdate. Given SIN, we can determine
any of the other attributes within the table.
For the second example, SIN and Course determine the date completed (DateCompleted). This must also
work for a composite PK.
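A functional dependency X → Y can be tested mechanically: it holds iff no two rows agree on X but differ on Y. The helper below is a hypothetical illustration, not a standard library function:

```python
# Check whether the functional dependency X -> Y holds in a relation
# (a list of dicts). X and Y are lists of attribute names, so this also
# works for composite determinants such as (SIN, Course).
def fd_holds(relation, X, Y):
    seen = {}
    for row in relation:
        lhs = tuple(row[a] for a in X)
        rhs = tuple(row[a] for a in Y)
        if lhs in seen and seen[lhs] != rhs:
            return False          # same X value, different Y value
        seen[lhs] = rhs
    return True

People = [
    {"SIN": 1, "Name": "Ann", "City": "Kampala"},
    {"SIN": 2, "Name": "Bob", "City": "Kampala"},
]
print(fd_holds(People, ["SIN"], ["Name"]))   # True: SIN -> Name
print(fd_holds(People, ["City"], ["Name"]))  # False: same city, different names
```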
Inference Rules
Armstrong’s axioms are a set of inference rules used to infer all the functional dependencies on a relational
database. They were developed by William W. Armstrong. The following describes what will be used, in
terms of notation, to explain these axioms.
Let R(U) be a relation scheme over the set of attributes U. We will use the letters X, Y, Z to represent any
subset of U and, for short, write XY for the union of two sets of attributes X and Y instead of the usual X ∪ Y.
Axiom of reflexivity
The axiom of reflexivity says that if Y is a subset of X, then X determines Y. For example, PartNo —>
NT123, where X (PartNo) is composed of more than one piece of information, i.e., Y (NT) and the part ID (123).
Axiom of augmentation
The axiom of augmentation says that if X determines Y, then XZ determines YZ for any Z (see Figure 11.2).
The axiom of augmentation says that every non-key attribute must be fully dependent on the PK. In the
example shown below, StudentName, Address, City, Prov, and PC (postal code) are only dependent on the
StudentNo, not on the StudentNo and Grade.
StudentNo, Course —> StudentName, Address, City, Prov, PC, Grade, DateCompleted
This situation is not desirable because every non-key attribute has to be fully dependent on the PK. In this
situation, student information is only partially dependent on the PK (StudentNo).
To fix this problem, we need to break the original table down into two as follows:
Axiom of transitivity
The axiom of transitivity says if X determines Y, and Y determines Z, then X must also determine Z (see
Figure 11.3).
The table below has information not directly related to the student; for instance, ProgramID and
ProgramName should have a table of their own. ProgramName is not dependent on StudentNo; it's dependent
on ProgramID.
This situation is not desirable because a non-key attribute (ProgramName) depends on another non-key
attribute (ProgramID).
To fix this problem, we need to break this table into two: one to hold information about the student and the
other to hold information about the program.
However we still need to leave an FK in the student table so that we can identify which program the
student is enrolled in.
Union
This rule suggests that if two tables are separate, and the PK is the same, you may want to consider putting
them together. It states that if X determines Y and X determines Z then X must also determine Y and Z (see
Figure 11.4).
You may want to join these two tables into one as follows:
Some database administrators (DBA) might choose to keep these tables separated for a couple of reasons.
One, each table describes a different entity so the entities should be kept apart. Two, if SpouseName is to
be left NULL most of the time, there is no need to include it in the same table as EmpName.
Decomposition
Decomposition is the reverse of the Union rule. If you have a table that appears to contain two entities that
are determined by the same PK, consider breaking them up into two tables. This rule states that if X
determines Y and Z, then X determines Y and X determines Z separately (see Figure 11.5).
Dependency Diagram
A dependency diagram, shown in Figure 11.6, illustrates the various dependencies that might exist in a
non-normalized table. A non-normalized table is one that has data redundancy in it.
o ProjectNo —> ProjName
o EmpNo —> EmpName, DeptNo,
o ProjectNo, EmpNo —> HrsWork
Transitive Dependency:
o DeptNo —> DeptName
Functional Dependency
Functional dependency is a relationship that exists when one attribute uniquely determines another
attribute.
If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as
X->Y, which specifies Y is functionally dependent on X. Here X is a determinant set and Y is a dependent
attribute. Each value of X is associated with precisely one Y value.
Functional dependency in a database serves as a constraint between two sets of attributes. Defining
functional dependencies is an important part of relational database design and underpins
normalization.
Functional dependency avoids data redundancy: the same data should not be repeated at
multiple locations in the same database.
It maintains the quality of data in the database.
It allows clearly defined meanings and constraints of databases.
It helps in identifying bad designs.
It expresses facts about the database design.
Types of Functional dependency
1. Trivial functional dependency
A functional dependency X → Y is trivial if Y is a subset of X.
Example: {ID, Name} → ID
2. Non-trivial functional dependency
A functional dependency X → Y is non-trivial if Y is not a subset of X.
Example:
1. ID → Name,
2. Name → DOB
Transitive Dependency
X → Z is a transitive dependency if the following three functional dependencies hold true:
X → Y
Y does not determine X
Y → Z
Note: A transitive dependency can only occur in a relation of three or more attributes. This dependency
helps us in normalizing the database to 3NF (Third Normal Form).
AUTHORS
Book → Author: Here, the Book attribute determines the Author attribute. If you know the book
name, you can learn the author's name. However, Author does not determine Book, because an
author can write multiple books. For example, just because we know the author's name Orson Scott
Card, we still don't know the book name.
Author → Author_Nationality: Likewise, the Author attribute determines the Author_Nationality,
but not the other way around; just because we know the nationality does not mean we can determine
the author.
Book → Author_Nationality: If we know the book name, we can determine the nationality via the
Author column.
We can start by removing the Book column from the Authors table and creating a separate Books table:
BOOKS
AUTHORS
BOOKS table:
AUTHORS table:
COUNTRIES table:
Country_ID | Country
Coun_002 | Canada
AUTHORS table:
Now we have three tables, making use of foreign keys to link between the tables:
The BOOK table's foreign key Author_ID links a book to an author in the AUTHORS table.
The AUTHORS table's foreign key Country_ID links an author to a country in the
COUNTRIES table.
The COUNTRIES table has no foreign key because it has no need to link to another table in this
design.
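The resulting three-table design can be sketched in SQLite (used here as a stand-in for any RDBMS; the ID values and sample rows are illustrative). Deleting every book no longer wipes out the author or his nationality:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this pragma is on

conn.executescript("""
CREATE TABLE Countries (Country_ID TEXT PRIMARY KEY, Country TEXT);
CREATE TABLE Authors (
    Author_ID  TEXT PRIMARY KEY,
    Author     TEXT,
    Country_ID TEXT REFERENCES Countries(Country_ID)
);
CREATE TABLE Books (
    Book      TEXT PRIMARY KEY,
    Author_ID TEXT REFERENCES Authors(Author_ID)
);
INSERT INTO Countries VALUES ('Coun_001', 'United States');
INSERT INTO Authors  VALUES ('Auth_001', 'Orson Scott Card', 'Coun_001');
INSERT INTO Books    VALUES ('Ender''s Game', 'Auth_001');
INSERT INTO Books    VALUES ('Children of the Mind', 'Auth_001');
""")

# Deleting both books no longer deletes the author or his nationality:
conn.execute("DELETE FROM Books")
author_count = conn.execute("SELECT COUNT(*) FROM Authors").fetchone()[0]
print(author_count)  # 1 -- the author survives
```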
What is the value of avoiding transitive dependencies to help ensure 3NF? Let's consider our first table
again and see the issues it creates:
AUTHORS
Author ID Author Book Author_Nationality
This kind of design can contribute to data anomalies and inconsistencies, for example:
If you deleted the two books "Children of the Mind" and "Ender's Game," you would delete the
author "Orson Scott Card" and his nationality completely from the database.
You cannot add a new author to the database unless you also add a book; what if the author is yet
unpublished or you don't know the name of a book she has authored?
If "Orson Scott Card" changed his citizenship, you would have to change it in all records in which
he appears. Having multiple records with the same author can result in inaccurate data: what if the
data entry person doesn't realize there are multiple records for him and changes the data in only one
record?
You cannot delete a book like "The Handmaid's Tale" without also deleting the author completely.
Full Functional Dependency: In a relation, there exists Full Functional Dependency between any two
attributes X and Y, when X is functionally dependent on Y and is not functionally dependent on any proper
subset of Y.
Partial Functional Dependency: In a relation, there exists a partial dependency when a non-prime attribute
(an attribute that is not part of any candidate key) is functionally dependent on a proper subset of a
candidate key.
For example: Let there be a relation R (Course, Sid, Sname, Fid, Schedule, Room, Marks)
Full Functional Dependencies: {Course, Sid} -> Sname, {Course, Sid} -> Marks, etc.
Partial Functional Dependencies: Course -> Schedule, Course -> Room
A full functional dependency is a state of database normalization that equates to the normalization
standard of Second Normal Form (2NF). In brief, this means that it meets the requirements of First
Normal Form (1NF), and all non-key attributes are fully functionally dependent on the primary key.
Normalization is a method to remove all these anomalies and bring the database to a consistent state.
First Normal Form − First Normal Form is defined in the definition of relations (tables) itself.
This rule defines that all the attributes in a relation must have atomic domains. The values in an atomic
domain are indivisible units.
Before we learn about the second normal form, we need to understand the following:
Prime attribute − an attribute that is a part of the prime key is known as a prime attribute.
Non-prime attribute − an attribute that is not a part of the prime key is said to be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully functionally dependent
on the prime key attributes. That is, if X → A holds, then there should not be any proper subset Y of X
for which Y → A also holds true.
Example here
7.0 Querying a database
A database query is a request for data from a database. Usually the request is to retrieve
data; however, data can also be manipulated using queries. The data can come from one or
more tables, or even other queries.
SQL
SQL is Structured Query Language, which is a computer language for storing, manipulating and retrieving
data stored in a relational database.
SQL is the standard language for relational database systems. All the Relational Database Management
Systems (RDBMS) like MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use SQL
as their standard database language.
Also, they use different dialects, such as T-SQL (MS SQL Server), PL/SQL (Oracle), and JET SQL (MS Access).
Why SQL?
SQL is widely popular because it offers the following advantages:
7.2 Components of database query
When you execute an SQL command for any RDBMS, the system determines the best way to carry
out your request, and the SQL engine figures out how to interpret the task. Various components are
included in this process.
These components are –
Query Dispatcher
Optimization Engines
Classic Query Engine
SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but a SQL query engine won't handle logical files.
Following is a simple diagram showing the SQL Architecture:
Query parser - takes the query text and produces a parse tree (or reports syntax or semantic errors as
appropriate).
Query optimizer - takes the parse tree and produces an execution plan data structure. At
this point, the best indexes to use are determined, join methods and join order are
figured out, etc., and all of this annotates the execution plan structure.
Query executor (or query processor) - takes the execution plan and interacts with the storage
manager to actually fetch the data from storage (or cache, etc.), and presents the data to the user
using outbound client-side APIs.
Database engine - a database engine (or storage engine) is the underlying software component
that a database management system (DBMS) uses to create, read, update and delete
(CRUD) data from a database.
Structured Query Language (SQL), as we all know, is the database language with which we can
perform operations on an existing database and also create new databases.
SQL uses commands like CREATE, DROP, INSERT, etc. to carry out the required tasks.
These SQL commands are mainly categorized into four categories as discussed below:
1. DDL (Data Definition Language): DDL or Data Definition Language consists of the SQL
commands that can be used to define the database schema. It deals with descriptions of the
database schema and is used to create and modify the structure of database objects in the database.
Examples of DDL commands:
CREATE – is used to create the database or its objects (like tables, indexes, functions, views, stored
procedures and triggers).
DROP – is used to delete objects from the database.
ALTER-is used to alter the structure of the database.
TRUNCATE – is used to remove all records from a table; all space allocated for the
records is also released.
COMMENT –is used to add comments to the data dictionary.
RENAME –is used to rename an object existing in the database.
2. DML (Data Manipulation Language): The SQL commands that deal with the manipulation of
data present in the database belong to DML or Data Manipulation Language, and this includes most of the
SQL statements.
Examples of DML:
SELECT – is used to retrieve data from the database.
INSERT – is used to insert data into a table.
UPDATE – is used to update existing data within a table.
DELETE – is used to delete records from a database table.
3. DCL (Data Control Language): DCL includes commands such as GRANT and REVOKE, which
mainly deal with the rights, permissions and other controls of the database system.
Examples of DCL commands:
GRANT – gives users access privileges to the database.
REVOKE – withdraws users' access privileges given by using the GRANT command.
4. TCL (Transaction Control Language): TCL commands deal with transactions within the
database.
Examples of TCL commands:
COMMIT – commits a transaction.
ROLLBACK – rolls back a transaction in case an error occurs.
SAVEPOINT – sets a savepoint within a transaction.
SET TRANSACTION – specifies characteristics for the transaction.
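COMMIT and ROLLBACK can be watched in action using SQLite from Python (the Accounts table and its values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None            # autocommit mode: we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Accounts VALUES (1, 100)")

# A committed transaction persists...
conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("COMMIT")

# ...while a rolled-back transaction leaves no trace.
conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET balance = 0 WHERE id = 1")
conn.execute("ROLLBACK")

balance = conn.execute("SELECT balance FROM Accounts WHERE id = 1").fetchone()[0]
print(balance)  # 70: the COMMIT stuck, the ROLLBACK did not
```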
7.4 Design SQL queries
Each column in a database table is required to have a name and a data type.
SQL developers have to decide what types of data will be stored inside each and every table column when
creating a SQL table. The data type is a label and a guideline for SQL to understand what type of data is
expected inside of each column, and it also identifies how SQL will interact with the stored data.
The following table lists the general data types in SQL:
In MySQL there are three main data types: text, number, and date.
Text data types:
CHAR(size): Holds a fixed-length string (can contain letters, numbers, and special characters). The fixed size is specified in parenthesis. Can store up to 255 characters.
VARCHAR(size): Holds a variable-length string (can contain letters, numbers, and special characters). The maximum size is specified in parenthesis. Can store up to 255 characters. Note: if you put a greater value than 255 it will be converted to a TEXT type.
BLOB: For BLOBs (Binary Large OBjects). Holds up to 65,535 bytes of data.
MEDIUMBLOB: For BLOBs (Binary Large OBjects). Holds up to 16,777,215 bytes of data.
LONGBLOB: For BLOBs (Binary Large OBjects). Holds up to 4,294,967,295 bytes of data.
ENUM(x,y,z,etc.): Lets you enter a list of possible values. You can list up to 65,535 values in an ENUM list. If a value is inserted that is not in the list, a blank value will be inserted. Note: the values are sorted in the order you enter them.
Number data types:
TINYINT(size): -128 to 127 normal. 0 to 255 UNSIGNED*. The maximum number of digits may be specified in parenthesis.
FLOAT(size,d): A small number with a floating decimal point. The maximum number of digits may be specified in the size parameter. The maximum number of digits to the right of the decimal point is specified in the d parameter.
DOUBLE(size,d): A large number with a floating decimal point. The maximum number of digits may be specified in the size parameter. The maximum number of digits to the right of the decimal point is specified in the d parameter.
DECIMAL(size,d): A DOUBLE stored as a string, allowing for a fixed decimal point. The maximum number of digits may be specified in the size parameter. The maximum number of digits to the right of the decimal point is specified in the d parameter.
The BACKUP DATABASE statement is used in SQL Server to create a full back up of an existing SQL
database.
Syntax
BACKUP DATABASE databasename
TO DISK = 'filepath';
A differential back up only backs up the parts of the database that have changed since the last full database
backup.
Syntax
BACKUP DATABASE databasename
TO DISK = 'filepath'
WITH DIFFERENTIAL;
The following SQL statement creates a full back up of the existing database "testDB" to the D disk:
Example
Syntax
The column parameters specify the names of the columns of the table.
The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).
Tip: For an overview of the available data types, go to our complete Data Types Reference.
The following example creates a table called "Persons" that contains five columns: PersonID, LastName,
FirstName, Address, and City:
Example
Create Table Using another Table
Syntax
The following SQL creates a new table called "TestTables" (which is a copy of the "Customers" table):
Example
Syntax
Note: Be careful before dropping a table. Deleting a table will result in loss of complete information stored
in the table!
Example
The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.
Syntax
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
The ALTER TABLE statement is also used to add and drop various constraints on an existing table.
Example
To delete a column in a table, use the following syntax (notice that some database systems don't allow
deleting a column):
The following SQL deletes the "Email" column from the "Customers" table:
Example
To change the data type of a column in a table, use the following syntax:
SQL Server / MS Access:
MySQL / Oracle (prior to version 10G):
Notice that the new column, "DateOfBirth", is of type date and is going to hold a date. The data type
specifies what type of data the column can hold. For a complete reference of all the data types available in
MS Access, MySQL, and SQL Server, go to our complete Data Types reference.
The "Persons" table will now look like this:
Now we want to change the data type of the column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:
Notice that the "DateOfBirth" column is now of type year and is going to hold a year in a two- or four-digit
format.
DROP COLUMN Example
Next, we want to delete the column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:
SQL Constraints
SQL constraints are used to specify rules for the data in a table.
Constraints are used to limit the type of data that can go into a table. This ensures the accuracy and
reliability of the data in the table. If there is any violation between the constraint and the data action, the
action is aborted.
Constraints can be column level or table level. Column level constraints apply to a column, and table level
constraints apply to the whole table.
The following constraints are commonly used in SQL:
The following SQL creates a PRIMARY KEY on the "ID" column when the "Persons" table is created:
MySQL:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (ID)
);
SQL Server / Oracle / MS Access:
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on
multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
Note: In the example above there is only ONE PRIMARY KEY (PK_Person). However, the VALUE of
the primary key is made up of TWO COLUMNS (ID + LastName).
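A named, two-column PRIMARY KEY like this can be exercised in SQLite (sample rows are illustrative): a row may reuse an existing ID with a different LastName, but repeating both key columns is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Persons (
    ID        INTEGER NOT NULL,
    LastName  TEXT    NOT NULL,
    FirstName TEXT,
    Age       INTEGER,
    CONSTRAINT PK_Person PRIMARY KEY (ID, LastName)
)""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 'Ola', 30)")
conn.execute("INSERT INTO Persons VALUES (1, 'Svendson', 'Tove', 23)")  # ok: LastName differs

try:
    conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 'Kari', 20)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # both key columns match an existing row

print(duplicate_rejected)  # True
```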
"Persons" table:
PersonID LastName FirstName Age
1 Hansen Ola 30
2 Svendson Tove 23
3 Pettersen Kari 20
"Orders" table:
OrderID OrderNumber PersonID
1 77895 3
2 44678 3
3 22456 2
4 24562 1
Notice that the "PersonID" column in the "Orders" table points to the "PersonID" column in the "Persons"
table.
The "PersonID" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column,
because it has to be one of the values contained in the table it points to.
The following SQL creates a FOREIGN KEY on the "PersonID" column when the "Orders" table is
created:
MySQL:
PersonID int FOREIGN KEY REFERENCES Persons(PersonID)
);
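A sketch of this constraint in SQLite (table contents are illustrative; note that SQLite enforces foreign keys only when the foreign_keys pragma is on). An order pointing at a non-existent person is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this pragma to enforce FKs
conn.executescript("""
CREATE TABLE Persons (PersonID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Orders (
    OrderID     INTEGER PRIMARY KEY,
    OrderNumber INTEGER,
    PersonID    INTEGER REFERENCES Persons(PersonID)
);
INSERT INTO Persons VALUES (1, 'Hansen'), (2, 'Svendson'), (3, 'Pettersen');
INSERT INTO Orders  VALUES (1, 77895, 3);
""")

try:
    conn.execute("INSERT INTO Orders VALUES (5, 99999, 42)")  # no Person 42 exists
    invalid_insert_rejected = False
except sqlite3.IntegrityError:
    invalid_insert_rejected = True

print(invalid_insert_rejected)  # True
```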
To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on
multiple columns, use the following SQL syntax:
To create a FOREIGN KEY constraint on the "PersonID" column when the "Orders" table is already
created, use the following SQL:
To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on
multiple columns, use the following SQL syntax:
MySQL:
ALTER TABLE Orders
DROP CONSTRAINT FK_PersonOrder;
To create a PRIMARY KEY constraint on the "ID" column when the table is already created, use the
following SQL:
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on
multiple columns, use the following SQL syntax:
Note: If you use the ALTER TABLE statement to add a primary key, the primary key column(s) must
already have been declared to not contain NULL values (when the table was first created).
MySQL:
It is possible to write the INSERT INTO statement in two ways.
The first way specifies both the column names and the values to be inserted:
If you are adding values for all the columns of the table, you do not need to specify the column names in
the SQL query. However, make sure the order of the values is in the same order as the columns in the table.
The INSERT INTO syntax would be as follows:
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement inserts a new record in the "Customers" table:
Example
The selection from the "Customers" table will now look like this:
Did you notice that we did not insert any number into the CustomerID field?
The CustomerID column is an auto-increment field and will be generated automatically when a new record
is inserted into the table.
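Both insert forms can be tried in SQLite (the table and rows are illustrative; CustomerID is an auto-assigned integer key, mirroring the auto-increment behaviour described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,   -- auto-assigned when omitted
    CustomerName TEXT, City TEXT, Country TEXT)""")

# Form 1: column names listed; CustomerID is omitted and generated automatically.
conn.execute("INSERT INTO Customers (CustomerName, City, Country) "
             "VALUES ('Cardinal', 'Stavanger', 'Norway')")

# Form 2: no column list, so a value is needed for every column, in table order.
conn.execute("INSERT INTO Customers VALUES (2, 'Greasy Burger', 'Tromso', 'Norway')")

rows = conn.execute(
    "SELECT CustomerID, CustomerName FROM Customers ORDER BY CustomerID").fetchall()
print(rows)  # [(1, 'Cardinal'), (2, 'Greasy Burger')]
```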
Example
The selection from the "Customers" table will now look like this:
SELECT Syntax
Here, column1, column2, ... are the field names of the table you want to select data from. If you want to
select all the fields available in the table, use the following syntax:
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement selects the "CustomerName" and "City" columns from the "Customers"
table:
Example
The following SQL statement selects all the columns from the "Customers" table:
Example
The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values, and sometimes you only want to list the
different (distinct) values.
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
SELECT Example
The following SQL statement selects all (and duplicate) values from the "Country" column in the
"Customers" table:
Example
Now, let us use the DISTINCT keyword with the above SELECT statement and see the result.
The following SQL statement selects only the DISTINCT values from the "Country" column in the
"Customers" table:
Example
The following SQL statement lists the number of different (distinct) customer countries:
Example
Note: The example above will not work in Firefox and Microsoft Edge! Because COUNT(DISTINCT
column_name) is not supported in Microsoft Access databases. Firefox and Microsoft Edge are using
Microsoft Access in our examples.
Here is the workaround for MS Access:
Example
SELECT Count(*) AS DistinctCountries
FROM (SELECT DISTINCT Country FROM Customers);
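The direct COUNT(DISTINCT ...) form and the subquery workaround can be compared in SQLite, which supports both (sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("Alfreds Futterkiste", "Germany"),
                  ("Ana Trujillo", "Mexico"),
                  ("Antonio Moreno", "Mexico"),
                  ("Around the Horn", "UK")])

direct = conn.execute(
    "SELECT COUNT(DISTINCT Country) FROM Customers").fetchone()[0]
workaround = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT Country FROM Customers)").fetchone()[0]

print(direct, workaround)  # 3 3 -- Germany, Mexico and UK are each counted once
```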
WHERE Syntax
Note: The WHERE clause is not only used in SELECT statements; it is also used in UPDATE, DELETE
statements, etc.!
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement selects all the customers from the country "Mexico", in the "Customers"
table:
Example
Text Fields vs. Numeric Fields
SQL requires single quotes around text values (most database systems will also allow double quotes).
However, numeric fields should not be enclosed in quotes:
Example
Operator Description
= Equal
<> Not equal. Note: In some versions of SQL this operator may be written as !=
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
BETWEEN Between a certain range
LIKE Search for a pattern
IN To specify multiple possible values for a column
The WHERE clause can be combined with AND, OR, and NOT operators.
The AND and OR operators are used to filter records based on more than one condition:
The AND operator displays a record if all the conditions separated by AND are TRUE.
The OR operator displays a record if any of the conditions separated by OR is TRUE.
AND Syntax
NOT Syntax
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
AND Example
The following SQL statement selects all fields from "Customers" where country is "Germany" AND city is
"Berlin":
Example
The following SQL statement selects all fields from "Customers" where city is "Berlin" OR "München":
Example
The ORDER BY keyword is used to sort the result-set in ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending
order, use the DESC keyword.
ORDER BY Syntax
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
ORDER BY Example
The following SQL statement selects all customers from the "Customers" table, sorted by the "Country"
column:
Example
UPDATE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Note: Be careful when updating records in a table! Notice the WHERE clause in the UPDATE statement.
The WHERE clause specifies which record(s) that should be updated. If you omit the WHERE clause, all
records in the table will be updated!
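The effect of the WHERE clause on UPDATE can be demonstrated in SQLite (the sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, ContactName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                 [(1, "Maria Anders", "Germany"),
                  (2, "Ana Trujillo", "Mexico"),
                  (3, "Antonio Moreno", "Mexico")])

# With WHERE, only the matching rows are touched.
with_where = conn.execute(
    "UPDATE Customers SET ContactName = 'Juan' WHERE Country = 'Mexico'").rowcount

# Without WHERE, every row in the table is updated.
without_where = conn.execute("UPDATE Customers SET ContactName = 'Juan'").rowcount

print(with_where, without_where)  # 2 3
```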
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 Berglunds snabbköp Christina Berglund Berguvsvägen 8 Luleå S-958 22 Sweden
UPDATE Table
The following SQL statement updates the first customer (CustomerID = 1) with a new contact person and a
new city.
Example
UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
The selection from the "Customers" table will now look like this:
2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constitución 2222 México D.F. 05021 Mexico
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 Berglunds snabbköp Christina Berglund Berguvsvägen 8 Luleå S-958 22 Sweden
It is the WHERE clause that determines how many records will be updated.
The following SQL statement will update the contactname to "Juan" for all records where country is
"Mexico":
Example
UPDATE Customers
SET ContactName='Juan'
WHERE Country='Mexico';
The selection from the "Customers" table will now look like this:
2 Ana Trujillo Emparedados y helados Juan Avda. de la Constitución 2222 México D.F. 05021 Mexico
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 Berglunds snabbköp Christina Berglund Berguvsvägen 8 Luleå S-958 22 Sweden
Update Warning!
Be careful when updating records. If you omit the WHERE clause, ALL records will be updated!
Example
UPDATE Customers
SET ContactName='Juan';
The selection from the "Customers" table will now look like this:
DELETE Syntax
Note: Be careful when deleting records in a table! Notice the WHERE clause in the DELETE statement.
The WHERE clause specifies which record(s) should be deleted. If you omit the WHERE clause, all
records in the table will be deleted!
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement deletes the customer "Alfreds Futterkiste" from the "Customers" table:
Example
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
It is possible to delete all rows in a table without deleting the table. This means that the table structure,
attributes, and indexes will be intact:
The following SQL statement deletes all rows in the "Customers" table, without deleting the table:
Example
The SELECT TOP clause is used to specify the number of records to return.
The SELECT TOP clause is useful on large tables with thousands of records. Returning a large number of
records can impact performance.
Note: Not all database systems support the SELECT TOP clause. MySQL supports the LIMIT clause to
select a limited number of records, while Oracle uses ROWNUM.
MySQL Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
LIMIT number;
Oracle Syntax:
SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number;
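The LIMIT form can be tried in SQLite, which follows MySQL here (the sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.executemany("INSERT INTO Customers (CustomerName) VALUES (?)",
                 [("Alfreds Futterkiste",), ("Ana Trujillo",), ("Antonio Moreno",),
                  ("Around the Horn",), ("Berglunds snabbkop",)])

# LIMIT caps the number of rows returned, regardless of how many match.
first_three = conn.execute("SELECT CustomerName FROM Customers LIMIT 3").fetchall()
print(len(first_three))  # 3 of the 5 rows are returned
```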
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement selects the first three records from the "Customers" table:
Example
The following SQL statement shows the equivalent example using the LIMIT clause:
Example
The following SQL statement shows the equivalent example using ROWNUM:
Example
The following SQL statement selects the first 50% of the records from the "Customers" table:
Example
The following SQL statement selects the first three records from the "Customers" table, where the country
is "Germany":
Example
The following SQL statement shows the equivalent example using the LIMIT clause:
Example
The following SQL statement shows the equivalent example using ROWNUM:
Example
SQL COUNT (), AVG() and SUM() Functions
The COUNT() function returns the number of rows that match a specified criterion.
The AVG() function returns the average value of a numeric column.
The SUM() function returns the total sum of a numeric column.
COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;
SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;
Demo Database
Below is a selection from the "Products" table in the Northwind sample database:
COUNT() Example
Example
SELECT COUNT(ProductID)
FROM Products;
Note: NULL values are not counted.
AVG() Example
The following SQL statement finds the average price of all products:
Example
SELECT AVG(Price)
FROM Products;
Demo Database
Below is a selection from the "OrderDetails" table in the Northwind sample database:
SUM() Example
The following SQL statement finds the sum of the "Quantity" fields in the "OrderDetails" table:
Example
SELECT SUM(Quantity)
FROM OrderDetails;
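All three aggregate functions can be tried in SQLite (the product IDs and prices are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER, Price REAL)")
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [(1, 18.0), (2, 19.0), (3, 10.0), (4, 22.0)])

count_ = conn.execute("SELECT COUNT(ProductID) FROM Products").fetchone()[0]
avg_   = conn.execute("SELECT AVG(Price) FROM Products").fetchone()[0]
sum_   = conn.execute("SELECT SUM(Price) FROM Products").fetchone()[0]
print(count_, avg_, sum_)  # 4 17.25 69.0
```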
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are two wildcards used in conjunction with the LIKE operator:
% - The percent sign represents zero, one, or multiple characters
_ - The underscore represents a single character
Note: MS Access uses a question mark (?) instead of the underscore (_).
The percent sign and the underscore can also be used in combinations!
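The two wildcards can be tried in SQLite (sample rows are illustrative; note that SQLite's LIKE is case-insensitive for ASCII letters, which matches the behaviour these examples assume):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?)",
                 [("Alfreds Futterkiste",), ("Ana Trujillo",), ("Berglunds snabbkop",)])

# '%' matches any run of characters; '_' matches exactly one character.
starts_with_a = conn.execute(
    "SELECT CustomerName FROM Customers WHERE CustomerName LIKE 'a%'").fetchall()
second_letter_n = conn.execute(
    "SELECT CustomerName FROM Customers WHERE CustomerName LIKE '_n%'").fetchall()

print(len(starts_with_a), len(second_letter_n))  # 2 1
```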
LIKE Syntax
Tip: You can also combine any number of conditions using AND or OR operators.
Here are some examples showing different LIKE operators with '%' and '_' wildcards:
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement selects all customers with a CustomerName starting with "a":
Example
The following SQL statement selects all customers with a CustomerName ending with "a":
Example
SELECT * FROM Customers
WHERE CustomerName LIKE '%a';
The following SQL statement selects all customers with a CustomerName that have "or" in any position:
Example
The following SQL statement selects all customers with a CustomerName that have "r" in the second
position:
Example
The following SQL statement selects all customers with a CustomerName that starts with "a" and are at
least 3 characters in length:
Example
The following SQL statement selects all customers with a ContactName that starts with "a" and ends with
"o":
Example
The following SQL statement selects all customers with a CustomerName that does NOT start with "a":
Example
Exercise:
Select all records where the value of the City column starts with the letter "a".
A wildcard character is used to substitute any other character(s) in a string.
Wildcard characters are used with the SQL LIKE operator. The LIKE operator is used in a WHERE clause
to search for a specified pattern in a column.
There are two wildcards used in conjunction with the LIKE operator:
% - represents zero, one, or multiple characters
_ - represents a single character
Note: MS Access uses a question mark (?) instead of the underscore (_).
In MS Access and SQL Server you can also use:
[charlist] - defines sets and ranges of characters to match
[^charlist] or [!charlist] - defines sets and ranges of characters NOT to match
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement selects all customers with a City starting with "ber":
Example
SELECT * FROM Customers
WHERE City LIKE 'ber%';
The SQL IN Operator
The IN operator allows you to specify multiple values in a WHERE clause. It is a shorthand for multiple OR conditions.
IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
or:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
CustomerID  CustomerName        ContactName         Address          City    PostalCode  Country
4           Around the Horn     Thomas Hardy        120 Hanover Sq.  London  WA1 1DP     UK
5           Berglunds snabbköp  Christina Berglund  Berguvsvägen 8   Luleå   S-958 22    Sweden
IN Operator Examples
The following SQL statement selects all customers that are located in "Germany", "France" and "UK":
Example
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');
The following SQL statement selects all customers that are NOT located in "Germany", "France" or "UK":
Example
SELECT * FROM Customers
WHERE Country NOT IN ('Germany', 'France', 'UK');
The following SQL statement selects all customers that are from the same countries as the suppliers:
Example
SELECT * FROM Customers
WHERE Country IN (SELECT Country FROM Suppliers);
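The subquery form of IN can be sketched with sqlite3 and two made-up tables (the rows below are illustrative, not the real Northwind data): a customer is kept only if some supplier shares its country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerName TEXT, Country TEXT);
CREATE TABLE Suppliers (SupplierName TEXT, Country TEXT);
INSERT INTO Customers VALUES ('Alfreds Futterkiste', 'Germany'),
                             ('Around the Horn', 'UK'),
                             ('Cactus Comidas', 'Argentina');
INSERT INTO Suppliers VALUES ('Exotic Liquid', 'UK'),
                             ('Heli Susswaren', 'Germany');
""")

# Customers from the same countries as the suppliers
rows = conn.execute("""
    SELECT CustomerName FROM Customers
    WHERE Country IN (SELECT Country FROM Suppliers)
""").fetchall()
print([r[0] for r in rows])  # Argentina has no supplier, so it is excluded
```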
The SQL BETWEEN Operator
The BETWEEN operator selects values within a given range. The values can be numbers, text, or dates.
The BETWEEN operator is inclusive: begin and end values are included.
BETWEEN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
Demo Database
Below is a selection from the "Products" table in the Northwind sample database:
ProductID  ProductName  SupplierID  CategoryID  Unit                Price
2          Chang        1           1           24 - 12 oz bottles  19
BETWEEN Example
The following SQL statement selects all products with a price BETWEEN 10 and 20:
Example
SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20;
To display the products outside the range of the previous example, use NOT BETWEEN:
Example
SELECT * FROM Products
WHERE Price NOT BETWEEN 10 AND 20;
The following SQL statement selects all products with a price BETWEEN 10 and 20. In addition, do not show products with a CategoryID of 1, 2, or 3:
Example
SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20
AND CategoryID NOT IN (1, 2, 3);
The following SQL statement selects all products with a ProductName BETWEEN 'Carnarvon Tigers' and 'Mozzarella di Giovanni':
Example
SELECT * FROM Products
WHERE ProductName BETWEEN 'Carnarvon Tigers' AND 'Mozzarella di Giovanni'
ORDER BY ProductName;
The following SQL statement selects all products with a ProductName NOT BETWEEN 'Carnarvon Tigers' and 'Mozzarella di Giovanni':
Example
SELECT * FROM Products
WHERE ProductName NOT BETWEEN 'Carnarvon Tigers' AND 'Mozzarella di Giovanni'
ORDER BY ProductName;
Sample Table
Below is a selection from the "Orders" table in the Northwind sample database:
OrderID  CustomerID  EmployeeID  OrderDate  ShipperID
10248    90          5           7/4/1996   3
10249    81          6           7/5/1996   1
10250    34          4           7/8/1996   2
10251    84          3           7/9/1996   1
10252    76          4           7/10/1996  2
The following SQL statement selects all orders with an OrderDate BETWEEN '01-July-1996' and '31-July-1996':
Example
SELECT * FROM Orders
WHERE OrderDate BETWEEN #01/07/1996# AND #31/07/1996#;
OR:
Example
SELECT * FROM Orders
WHERE OrderDate BETWEEN '1996-07-01' AND '1996-07-31';
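Because BETWEEN is inclusive, a product priced exactly 10 (or 20) is returned. A minimal sqlite3 sketch with made-up product rows shows this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductName TEXT, Price REAL);
INSERT INTO Products VALUES ('Chais', 18), ('Chang', 19),
                            ('Aniseed Syrup', 10), ('Mishi Kobe Niku', 97);
""")

# BETWEEN is inclusive: 'Aniseed Syrup' at exactly 10 is included
rows = conn.execute(
    "SELECT ProductName FROM Products WHERE Price BETWEEN 10 AND 20"
).fetchall()
print(sorted(r[0] for r in rows))
```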
SQL Aliases
SQL aliases are used to give a table, or a column in a table, a temporary name.
Aliases are often used to make column names more readable.
An alias only exists for the duration of the query.
Alias Column Syntax
SELECT column_name AS alias_name
FROM table_name;
Alias Table Syntax
SELECT column_name(s)
FROM table_name AS alias_name;
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
CustomerID  CustomerName     ContactName   Address          City    PostalCode  Country
4           Around the Horn  Thomas Hardy  120 Hanover Sq.  London  WA1 1DP     UK
And a selection from the "Orders" table:
OrderID  CustomerID  EmployeeID  OrderDate   ShipperID
10354    58          8           1996-11-14  3
10355    4           6           1996-11-15  1
10356    86          6           1996-11-18  2
The following SQL statement creates two aliases, one for the CustomerID column and one for the CustomerName column:
Example
SELECT CustomerID AS ID, CustomerName AS Customer
FROM Customers;
The following SQL statement creates two aliases, one for the CustomerName column and one for the
ContactName column. Note: It requires double quotation marks or square brackets if the alias name
contains spaces:
Example
SELECT CustomerName AS Customer, ContactName AS [Contact Person]
FROM Customers;
The following SQL statement creates an alias named "Address" that combine four columns (Address,
PostalCode, City and Country):
Example
SELECT CustomerName, Address + ', ' + PostalCode + ' ' + City + ', ' + Country AS Address
FROM Customers;
Note: To get the SQL statement above to work in MySQL use the following:
SELECT CustomerName, CONCAT(Address, ', ', PostalCode, ' ', City, ', ', Country) AS Address
FROM Customers;
The following SQL statement selects all the orders from the customer with CustomerID=4 (Around the Horn). We use the "Customers" and "Orders" tables, and give them the table aliases of "c" and "o" respectively (here we use aliases to make the SQL shorter):
Example
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerName = 'Around the Horn' AND c.CustomerID = o.CustomerID;
The following SQL statement is the same as above, but without aliases:
Example
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName
FROM Customers, Orders
WHERE Customers.CustomerName = 'Around the Horn' AND Customers.CustomerID = Orders.CustomerID;
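Column and table aliases can be checked in a small sqlite3 sketch (tables and rows below are made up): the table aliases "c" and "o" shorten the join condition, and the column alias renames CustomerName to Customer in the result set only, which is visible in the cursor's metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INT, CustomerName TEXT);
CREATE TABLE Orders (OrderID INT, CustomerID INT, OrderDate TEXT);
INSERT INTO Customers VALUES (4, 'Around the Horn');
INSERT INTO Orders VALUES (10355, 4, '1996-11-15');
""")

cur = conn.execute("""
    SELECT c.CustomerName AS Customer, o.OrderID
    FROM Customers AS c, Orders AS o
    WHERE c.CustomerID = o.CustomerID
""")
print(cur.description[0][0])  # the alias, not the original column name
```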
SQL - Sub Queries
A Subquery or Inner query or a Nested query is a query within another SQL query and embedded within
the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to further restrict the
data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the
operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow −
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless multiple columns are in the
main query for the subquery to compare its selected columns.
An ORDER BY command cannot be used in a subquery, although the main query can use an
ORDER BY. The GROUP BY command can be used to perform the same function as the ORDER
BY in a subquery.
Subqueries that return more than one row can only be used with multiple value operators such as
the IN operator.
The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY,
CLOB, or NCLOB.
A subquery cannot be immediately enclosed in a set function.
The BETWEEN operator cannot be used with a sub query. However, the BETWEEN operator can
be used within the sub query.
Subqueries are most frequently used with the SELECT statement. The basic syntax is as follows −
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
   (SELECT column_name [, column_name ]
    FROM table1 [, table2 ]
    [WHERE]);
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Now, let us check the following subquery with a SELECT statement.
SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500) ;
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
Subqueries with the INSERT Statement
Subqueries also can be used with INSERT statements. The INSERT statement uses the data returned from
the subquery to insert into another table. The selected data in the subquery can be modified with any of the
character, date or number functions.
The basic syntax is as follows.
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]
Example
Consider a table CUSTOMERS_BKP with a similar structure as the CUSTOMERS table. Now to copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you can use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
     SELECT * FROM CUSTOMERS
     WHERE ID IN (SELECT ID FROM CUSTOMERS);
Subqueries with the UPDATE Statement
The subquery can be used in conjunction with the UPDATE statement. Either single or multiple columns in a table can be updated when using a subquery with the UPDATE statement.
The basic syntax is as follows.
UPDATE table
SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
   (SELECT COLUMN_NAME
    FROM TABLE_NAME)
   [ WHERE ] ]
Example
Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table, the following example updates SALARY to 0.25 times its value in the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> UPDATE CUSTOMERS
     SET SALARY = SALARY * 0.25
     WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
                   WHERE AGE >= 27);
This would impact two rows and finally the CUSTOMERS table would have the following records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 2125.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
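The UPDATE-with-subquery pattern above can be run end to end with sqlite3. This sketch uses a trimmed-down version of the CUSTOMERS data; the WHERE AGE >= 27 condition in the subquery picks out Ramesh (35) and Hardik (27), whose salaries are multiplied by 0.25.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, AGE INT, SALARY REAL);
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh', 35, 2000.0),
                             (2, 'Khilan', 25, 1500.0),
                             (5, 'Hardik', 27, 8500.0);
-- Backup table with the same structure and data
CREATE TABLE CUSTOMERS_BKP AS SELECT * FROM CUSTOMERS;
""")

conn.execute("""
    UPDATE CUSTOMERS
    SET SALARY = SALARY * 0.25
    WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)
""")
salaries = dict(conn.execute("SELECT NAME, SALARY FROM CUSTOMERS"))
print(salaries)  # Khilan (25) is untouched
```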
Subqueries with the DELETE Statement
The subquery can be used in conjunction with the DELETE statement like with any other statements mentioned above.
The basic syntax is as follows.
DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
   (SELECT COLUMN_NAME
    FROM TABLE_NAME)
   [ WHERE ] ]
Example
Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table, the following example deletes the records from the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> DELETE FROM CUSTOMERS
     WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
                   WHERE AGE >= 27);
This would impact two rows and finally the CUSTOMERS table would have the following records.
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
Let's look at a selection from the "Orders" table:
OrderID  CustomerID  OrderDate
10308    2           1996-09-18
10309    37          1996-09-19
10310    77          1996-09-20
Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the "Customers"
table. The relationship between the two tables above is the "CustomerID" column.
Then, we can create the following SQL statement (that contains an INNER JOIN), that selects records that have matching values in both tables:
Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
It will produce a result-set with the columns:
OrderID  CustomerName  OrderDate
Here are the different types of the JOINs in SQL:
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Return all records from the left table, and the matched records from the
right table
RIGHT (OUTER) JOIN: Return all records from the right table, and the matched records from the
left table
FULL (OUTER) JOIN: Return all records when there is a match in either left or right table
SQL INNER JOIN Keyword
The INNER JOIN keyword selects records that have matching values in both tables.
INNER JOIN Syntax
SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
Demo Database
Below is a selection from the "Orders" table in the Northwind sample database:
OrderID  CustomerID  EmployeeID  OrderDate   ShipperID
10308    2           7           1996-09-18  3
10309    37          3           1996-09-19  1
10310    77          8           1996-09-20  2
The following SQL statement selects all orders with customer information:
Example
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Note: The INNER JOIN keyword selects all rows from both tables as long as there is a match between the
columns. If there are records in the "Orders" table that do not have matches in "Customers", these orders
will not be shown!
The following SQL statement selects all orders with customer and shipper information:
Example
SELECT Orders.OrderID, Customers.CustomerName, Shippers.ShipperName
FROM ((Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID)
INNER JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID);
SQL LEFT JOIN Keyword
The LEFT JOIN keyword returns all records from the left table (table1), and the matched records from the right table (table2). The result is NULL from the right side, if there is no match.
LEFT JOIN Syntax
SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;
Demo Database
Below is a selection from the "Customers" table:
CustomerID  CustomerName                        ContactName   Address                        City         PostalCode  Country
2           Ana Trujillo Emparedados y helados  Ana Trujillo  Avda. de la Constitución 2222  México D.F.  05021       Mexico
And a selection from the "Orders" table:
OrderID  CustomerID  EmployeeID  OrderDate   ShipperID
10308    2           7           1996-09-18  3
10309    37          3           1996-09-19  1
10310    77          8           1996-09-20  2
The following SQL statement will select all customers, and any orders they might have:
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
Note: The LEFT JOIN keyword returns all records from the left table (Customers), even if there are no
matches in the right table (Orders).
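The "NULL from the right side" behaviour is easy to see in a sqlite3 sketch with made-up rows: the customer without an order still appears, with None (SQL NULL) in the OrderID column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INT, CustomerName TEXT);
CREATE TABLE Orders (OrderID INT, CustomerID INT);
INSERT INTO Customers VALUES (1, 'Alfreds Futterkiste'), (2, 'Ana Trujillo');
INSERT INTO Orders VALUES (10308, 2);
""")

rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName
""").fetchall()
print(rows)  # Alfreds has no order, so its OrderID is NULL (None in Python)
```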
SQL RIGHT JOIN Keyword
The RIGHT JOIN keyword returns all records from the right table (table2), and the matched records from the left table (table1). The result is NULL from the left side, when there is no match.
RIGHT JOIN Syntax
SELECT column_name(s)
FROM table1
RIGHT JOIN table2 ON table1.column_name = table2.column_name;
Note: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.
Demo Database
Below is a selection from the "Orders" table:
OrderID  CustomerID  EmployeeID  OrderDate   ShipperID
10308    2           7           1996-09-18  3
10309    37          3           1996-09-19  1
10310    77          8           1996-09-20  2
The following SQL statement will return all employees, and any orders they might have placed:
Example
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
SQL FULL OUTER JOIN Keyword
The FULL OUTER JOIN keyword returns all records when there is a match in either left (table1) or right (table2) table records.
Tip: FULL OUTER JOIN and FULL JOIN are the same.
Note: FULL OUTER JOIN can potentially return very large result-sets!
FULL OUTER JOIN Syntax
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
Demo Database
Below is a selection from the "Customers" table:
CustomerID  CustomerName                        ContactName   Address                        City         PostalCode  Country
2           Ana Trujillo Emparedados y helados  Ana Trujillo  Avda. de la Constitución 2222  México D.F.  05021       Mexico
And a selection from the "Orders" table:
OrderID  CustomerID  EmployeeID  OrderDate   ShipperID
10308    2           7           1996-09-18  3
10309    37          3           1996-09-19  1
10310    77          8           1996-09-20  2
The following SQL statement selects all customers, and all orders:
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
A selection of the result-set may look like this:
CustomerName         OrderID
Alfreds Futterkiste  10382
Alfreds Futterkiste  10351
Note: The FULL OUTER JOIN keyword returns all the rows from the left table (Customers), and all the
rows from the right table (Orders). If there are rows in "Customers" that do not have matches in "Orders",
or if there are rows in "Orders" that do not have matches in "Customers", those rows will be listed as well.
The SQL UNION Operator
The UNION operator is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in each SELECT statement must also be in the same order
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
UNION ALL Syntax
The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
Note: The column names in the result-set are usually equal to the column names in the first SELECT
statement in the UNION.
Demo Database
Below is a selection from the "Suppliers" table:
SupplierID  SupplierName   ContactName       Address         City    PostalCode  Country
1           Exotic Liquid  Charlotte Cooper  49 Gilbert St.  London  EC1 4SD     UK
The following SQL statement returns the cities (only distinct values) from both the "Customers" and the
"Suppliers" table:
Example
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;
Note: If some customers or suppliers have the same city, each city will only be listed once, because
UNION selects only distinct values. Use UNION ALL to also select duplicate values!
The following SQL statement returns the cities (duplicate values also) from both the "Customers" and the "Suppliers" table:
Example
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;
The following SQL statement returns the German cities (only distinct values) from both the "Customers" and the "Suppliers" table:
Example
SELECT City FROM Customers WHERE Country='Germany'
UNION
SELECT City FROM Suppliers WHERE Country='Germany'
ORDER BY City;
The following SQL statement returns the German cities (duplicate values also) from both the "Customers" and the "Suppliers" table:
Example
SELECT City FROM Customers WHERE Country='Germany'
UNION ALL
SELECT City FROM Suppliers WHERE Country='Germany'
ORDER BY City;
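The difference between UNION (distinct) and UNION ALL (duplicates kept) can be checked with a sqlite3 sketch; the city lists below are made up, with 'London' appearing in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (City TEXT);
CREATE TABLE Suppliers (City TEXT);
INSERT INTO Customers VALUES ('Berlin'), ('London');
INSERT INTO Suppliers VALUES ('London'), ('Oslo');
""")

# UNION deduplicates; 'London' appears once
distinct = conn.execute(
    "SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City"
).fetchall()
# UNION ALL keeps both copies of 'London'
with_dups = conn.execute(
    "SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers ORDER BY City"
).fetchall()
print(len(distinct), len(with_dups))  # 3 4
```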
SQL Stored Procedures
A stored procedure is a prepared SQL code that you can save, so the code can be reused over and over again.
So if you have an SQL query that you write over and over again, save it as a stored procedure, and then just
call it to execute it.
You can also pass parameters to a stored procedure, so that the stored procedure can act based on the
parameter value(s) that is passed.
Stored Procedure Syntax
CREATE PROCEDURE procedure_name
AS
sql_statement
GO;
Execute a Stored Procedure
EXEC procedure_name;
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
The following SQL statement creates a stored procedure named "SelectAllCustomers" that selects all
records from the "Customers" table:
Example
CREATE PROCEDURE SelectAllCustomers
AS
SELECT * FROM Customers
GO;
Execute the stored procedure above as follows:
Example
EXEC SelectAllCustomers;
Stored Procedure with One Parameter
The following SQL statement creates a stored procedure that selects Customers from a particular City from the "Customers" table:
Example
CREATE PROCEDURE SelectAllCustomers @City nvarchar(30)
AS
SELECT * FROM Customers WHERE City = @City
GO;
Execute the stored procedure above as follows:
Example
EXEC SelectAllCustomers @City = 'London';
Stored Procedure with Multiple Parameters
Setting up multiple parameters is very easy. Just list each parameter and the data type separated by a comma as shown below.
The following SQL statement creates a stored procedure that selects Customers from a particular City with
a particular PostalCode from the "Customers" table:
Example
CREATE PROCEDURE SelectAllCustomers @City nvarchar(30), @PostalCode nvarchar(10)
AS
SELECT * FROM Customers WHERE City = @City AND PostalCode = @PostalCode
GO;
Execute the stored procedure above as follows:
Example
EXEC SelectAllCustomers @City = 'London', @PostalCode = 'WA1 1DP';
A database view is a virtual table or logical table which is defined as a SQL SELECT query with joins.
Because a database view is similar to a database table, which consists of rows and columns, so you can
query data against it. Most database management systems, including MySQL, allow you to update data in
the underlying tables through the database view with some prerequisites.
A database view is dynamic because it is not related to the physical schema. The database system stores
views as a SQL SELECT statement with joins. When the data of the tables changes, the view reflects that
changes as well.
The MIN() function returns the smallest value of the selected column.
The MAX() function returns the largest value of the selected column.
MIN() Syntax
SELECT MIN(column_name)
FROM table_name
WHERE condition;
MAX() Syntax
SELECT MAX(column_name)
FROM table_name
WHERE condition;
The COUNT() function returns the number of rows that matches a specified criterion.
COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
The AVG() function returns the average value of a numeric column.
AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;
The SUM() function returns the total sum of a numeric column.
SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;
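All five aggregate functions can be exercised in one query. This sqlite3 sketch uses three made-up product prices, so the expected values can be checked by hand: min 10, max 19, count 3, sum 47, average 47/3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("Chais", 18.0), ("Chang", 19.0), ("Aniseed Syrup", 10.0)])

mn, mx, cnt, avg, total = conn.execute(
    "SELECT MIN(Price), MAX(Price), COUNT(Price), AVG(Price), SUM(Price) "
    "FROM Products"
).fetchone()
print(mn, mx, cnt, round(avg, 2), total)
```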
DATABASE VIEW
A database view allows you to simplify complex queries: Through a database view, you only have
to use simple SQL statements instead of complex ones with many joins.
A database view helps limit data access to specific users. You can use a database view to expose
only non-sensitive data to a specific group of users.
A database view provides extra security layer. The database view offers additional protection for a
database management system. The database view allows you to create the read-only view to expose
read-only data to specific users. Users can only retrieve data in read-only view but cannot update it.
A database view enables computed columns. A database table should not have calculated columns
however a database view should. When you query data from the database view, the data of the
computed column is calculated on the fly.
A database view enables backward compatibility. Suppose you have a central database, which many
applications are using it. One day, you decide to redesign the database to adapt to the new business
requirements. You remove some tables and create new tables, and you don’t want the changes to
affect other applications. In this scenario, you can create database views with the same schema as
the legacy tables that you will remove.
CREATING VIEW
Let's use a simple example to illustrate. Say we have the following table:
Table Customer
We want to create a view called V_Customer that contains only the First_Name, Last_Name, and Country columns from this table. We would type in:
CREATE VIEW V_Customer
AS SELECT First_Name, Last_Name, Country
FROM Customer;
EXAMPLE 2:
SQL > CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual table. Following is an example:
SQL> SELECT * FROM CUSTOMERS_VIEW;
This would produce the following result.
+----------+-----+
| name | age |
+----------+-----+
| Ramesh | 32 |
| Khilan | 25 |
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
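That a view is dynamic, reflecting later changes to the underlying table, can be shown with sqlite3 (the rows below are a trimmed-down, made-up version of the CUSTOMERS data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (NAME TEXT, AGE INT, SALARY REAL);
INSERT INTO CUSTOMERS VALUES ('Ramesh', 32, 2000.0), ('Khilan', 25, 1500.0);
CREATE VIEW CUSTOMERS_VIEW AS SELECT NAME, AGE FROM CUSTOMERS;
""")

# A row inserted into the base table after the view was created
conn.execute("INSERT INTO CUSTOMERS VALUES ('Muffy', 24, 10000.0)")

# The view is queried exactly like a table, and sees the new row
rows = conn.execute("SELECT * FROM CUSTOMERS_VIEW").fetchall()
print(rows)
```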
States of Transactions
Active − In this state, the transaction is being executed. This is the initial state of every transaction.
Partially Committed − When a transaction executes its final operation, it is said to be in a partially
committed state.
Failed − A transaction is said to be in a failed state if any of the checks made by the database
recovery system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery
manager rolls back all its write operations on the database to bring the database back to its original
state where it was prior to the execution of the transaction. Transactions in this state are called
aborted.
The database recovery module can select one of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
Committed − If a transaction executes all its operations successfully, it is said to be committed. All
its effects are now permanently established on the database system.
ACID Properties
A transaction is a very small unit of a program and it may contain several low level
tasks. A transaction in a database system must
maintain Atomicity, Consistency, Isolation, and Durability − commonly known as
ACID properties − in order to ensure accuracy, completeness, and data integrity.
Atomicity − This property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none. There must be no state in a
database where a transaction is left partially completed. States should be defined either
before the execution of the transaction or after the execution/abortion/failure of the
transaction.
Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be
updated once the system springs back into action.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that all the transactions
will be carried out and executed as if each were the only transaction in the system. No
transaction will affect the existence of any other transaction.
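Atomicity in particular can be demonstrated with sqlite3's transaction support. This sketch (table, account names, and the "simulated crash" are all made up for illustration) debits one account and credits another; if a failure occurs between the two updates, rollback() undoes the partial debit, so the transfer is all-or-nothing.

```python
import sqlite3

def transfer(conn, amount, fail_midway=False):
    """Debit A and credit B as one transaction; roll back on failure."""
    try:
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE name = 'A'",
                     (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash between debit and credit")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE name = 'B'",
                     (amount,))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # atomicity: the partial debit is undone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (name TEXT, balance INT)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

transfer(conn, 50, fail_midway=True)   # fails: nothing changes
print(dict(conn.execute("SELECT name, balance FROM Accounts")))

transfer(conn, 50)                     # succeeds: both updates persist
balances = dict(conn.execute("SELECT name, balance FROM Accounts"))
print(balances)
```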
Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS). Concurrency control permits users to access a database in a multiprogrammed fashion while preserving the illusion that each user is executing alone on a dedicated system.
In a database management system (DBMS), concurrency control manages simultaneous access to a
database.
It prevents two users from editing the same record at the same time and also serializes transactions for
backup and recovery.
Advantages of concurrency
The goal is to serve many users and provide better throughput by sharing resources.
We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions.
Concurrency control protocols can be broadly divided into two categories −
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds −
Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive − this type of locking mechanism differentiates the locks based on their uses. If a
lock is acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more
than one transaction to write on the same data item would lead the database into an inconsistent
state. Read locks are shared because no data value is being changed.
There are four types of lock protocols available −
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs to execute to completion. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
Two-Phase Locking 2PL
This locking protocol divides the execution phase of a transaction into three parts.
In the first part, when the transaction starts executing, it seeks permission for the locks it requires.
The second part is where the transaction acquires all the locks. As soon as the transaction releases its first
lock, the third phase starts.
In the third phase, the transaction cannot demand any new locks; it only releases the acquired locks.
Two-phase locking has two phases, one is growing, where all the locks are being acquired by the
transaction; and the second phase is shrinking, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it
to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it. Strict-2PL holds all the locks until the commit point and releases them all at once.
Strict-2PL does not have cascading abort as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time of
execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the
transaction. A transaction created at 0002 clock time would be older than all other transactions that come
after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger and the
priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system know when
the last ‘read and write’ operation was performed on the data item.
Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and
write operations. This is the responsibility of the protocol system that the conflicting pair of tasks should
be executed according to the timestamp values of the transactions.
For example, one rule states that if TS(Ti) < W-timestamp(X), then the 'read' operation is rejected and Ti is rolled back.
Time-stamp ordering rules can be modified to make the schedule view serializable.
Instead of making Ti rolled back, the 'write' operation itself is ignored.
Data recovery is the process of restoring data that has been lost, accidentally deleted, corrupted or made
inaccessible. In enterprise IT, data recovery typically refers to the restoration of data to a desktop, laptop,
server or external storage system from a backup.
Database failure
Classification of failure:
To see where the problem has occurred, we generalize failures into various classes, as follows:
Transaction failure
System crash
Disk failure
Types of Failure
1. Transaction failure: A transaction has to abort when it fails to execute, or when it reaches a point from where it cannot proceed any further. This is called transaction failure, where only a few transactions or processes are affected. The reasons for transaction failure are:
Logical errors: Where a transaction cannot complete because of a code error or an internal error condition.
System errors: Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
2. System crash: There are problems − external to the system − that may cause the system to stop abruptly and cause the system to crash. For instance, interruptions in power supply may cause the failure of underlying hardware or software. Examples may include operating system errors.
3. Disk failure: In the early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail frequently. Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure which destroys all or part of the disk storage.
Storage Structure
In brief, the storage structure can be divided into two categories −
Volatile storage − As the name suggests, a volatile storage cannot survive system crashes. Volatile
storage devices are placed very close to the CPU; normally they are embedded onto the chipset
itself. For example, main memory and cache memory are examples of volatile storage. They are fast
but can store only a small amount of information.
Non-volatile storage − These memories are made to survive system crashes. They are huge in data
storage capacity, but slower in accessibility. Examples may include hard-disks, magnetic tapes,
flash memory, and non-volatile (battery backed up) RAM.
Recovery Techniques:
1. Salvation program: Run after a crash to attempt to restore the system to a valid state. No recovery data is used. Used when all other techniques fail or were not used. Good for cases where buffers were lost in a crash and one wants to reconstruct what was lost.
2. Incremental dumping: Modified files copied to archive after job completed or at intervals.
3. Audit trail: Sequences of actions on files are recorded. Optimal for "backing out" of transactions.
(Ideal if trail is written out before changes).
4. Differential files: Separate file is maintained to keep track of changes, periodically merged with the
main file.
5. Backup/current version: Present files form the current version of the database. Files containing
previous values form a consistent backup version.
6. Multiple copies: Multiple active copies of each file are maintained during normal operation of the
database. In cases of failure, comparison between the versions can be used to find a consistent
version.
7. Careful replacement: Nothing is updated in place; the original is deleted only after the operation is complete.
Threats to Database Security
Virus Infection- Depending on the type of virus, it could have the ability to steal, corrupt, modify and even delete the complete database.
Natural Disasters- Natural disasters like earthquake or tsunami have the ability to destroy the entire
infrastructure. In such an event, there is absolutely no way to even find, let alone recover, the data.
Disgruntled Employees- A disgruntled employee could provide essential and confidential
information to outsiders, causing untold damage to an organization. And if the employee has access
or gains unauthorized access to systems or applications, he/she can inject a virus or delete data to
halt the company’s day to day operations.
Excessive privileges. When workers are granted default database privileges that exceed the
requirements of their job functions, these privileges can be abused, Gerhart said. “For example, a
bank employee whose job requires the ability to change only account holder contact information
may take advantage of excessive database privileges and increase the account balance of a
colleague’s savings account.” Further, some companies fail to update access privileges for
employees who change roles within an organization or leave altogether.
Legitimate privilege abuse. Users may abuse legitimate database privileges for unauthorized
purposes, Gerhart said.
Database injection attacks. The two major types of database injection attacks are SQL injections
that target traditional database systems and NoSQL injections that target “big data” platforms.
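The classic SQL injection can be demonstrated, and defended against, with a parameterized query. The sketch below uses Python's built-in sqlite3 module; the `users` table and the attacker input are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

attacker_input = "' OR '1'='1"

# UNSAFE: string concatenation lets the input rewrite the query itself,
# turning the WHERE clause into a condition that is always true.
unsafe = "SELECT * FROM users WHERE name = '" + attacker_input + "'"
print(len(conn.execute(unsafe).fetchall()))   # 1 row leaked (the whole table)

# SAFE: a parameterized query treats the input as data, never as SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?",
                    (attacker_input,)).fetchall()
print(len(rows))                              # 0 rows
```

Every mainstream database driver offers the same placeholder mechanism, which is why parameterized queries are the standard defence against this attack.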
Malware. A perennial threat, malware is used to steal sensitive data via legitimate users using
infected devices.
Exploitation of vulnerable databases. It generally takes organizations months to patch databases,
during which time they remain vulnerable. Attackers know how to exploit unpatched databases or
databases that still have default accounts and configuration parameters.
Unmanaged sensitive data. Many companies struggle to maintain an accurate inventory of their
databases and the critical data objects contained within them. “Forgotten databases may contain
sensitive information, and new databases can emerge without visibility to the security team.
Sensitive data in these databases will be exposed to threats if the required controls and permissions
are not implemented,” he said.
The human factor. The root cause for 30 percent of data breach incidents is human negligence,
according to the Ponemon Institute Cost of Data Breach Study. “Often this is due to the lack of
expertise required to implement security controls, enforce policies or conduct incident response
processes.”
Privilege abuse: When database users are provided with privileges that exceed their day-to-day
job requirements, these privileges may be abused intentionally or unintentionally.
Operating System vulnerabilities: Vulnerabilities in underlying operating systems like Windows,
UNIX, Linux, etc., and the services that are related to the databases could lead to unauthorized
access and may lead to a Denial of Service (DoS) attack. This can be prevented by applying
operating-system security patches as soon as they become available.
Weak authentication: Weak authentication models allow attackers to employ strategies such as
social engineering and brute force to obtain database login credentials and assume the identity of
legitimate database users.
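One standard defence against the brute-force attacks mentioned above is to store only salted, deliberately slow password hashes instead of plain passwords. A minimal sketch using Python's standard library; the function names are illustrative, not from any particular product:

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # many PBKDF2 rounds make each guess expensive

def hash_password(password, salt=None):
    # A random per-user salt defeats precomputed (rainbow-table) attacks.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify(password, salt, digest):
    # Constant-time comparison avoids leaking information via timing.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("s3cret")
print(verify("s3cret", salt, digest))   # True
print(verify("guess", salt, digest))    # False
```

Stronger authentication models layer further controls on top of this, such as account lockout, rate limiting, and multi-factor authentication.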
Weak audit trails: A weak audit logging mechanism in a database server represents a critical risk
to an organization especially in retail, financial, healthcare, and other industries with stringent
regulatory compliance. Regulations such as PCI DSS, SOX, and HIPAA demand extensive logging of
actions so that an event can be reproduced at a later point in time in case of an incident. Logging of
sensitive or unusual database transactions must be automated to support incident resolution.
Audit trails act as the last line of database defense: they can detect the existence of a violation
and help trace it back to a particular point in time and a particular user.
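An automated audit trail of the kind described above can be approximated, even in a small system, with a database trigger. The sketch below uses Python's sqlite3 module; the `accounts` table, `audit_log` table, and trigger name are all invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE audit_log (
    at      TEXT DEFAULT CURRENT_TIMESTAMP,
    account INTEGER,
    old_bal REAL,
    new_bal REAL
);
-- Log every balance change automatically, so an unusual transaction
-- can later be traced back to a point in time.
CREATE TRIGGER trg_audit AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO audit_log (account, old_bal, new_bal)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 900.0 WHERE id = 1")

for row in conn.execute("SELECT account, old_bal, new_bal FROM audit_log"):
    print(row)   # (1, 100.0, 900.0)
```

Production audit systems add the acting user, the client address, and tamper protection for the log itself, but the trigger pattern is the same.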
Concepts in database management hardly fall in the category of come-and-go, as the cost of shifting
between technical approaches overwhelms producers, managers, and designers. However, there are several
trends in database management, and knowing how to take advantage of them will benefit your
organization. The following are some of the current trends:
1. Databases that bridge SQL/NoSQL- The latest trends in database products are those that don’t simply
embrace a single database structure. Instead, the databases bridge SQL and NoSQL, giving users the
best capabilities offered by both. This includes products that allow users to access a NoSQL database in
the same way as a relational database, for example.
2. Databases in the cloud/Platform as a Service- As developers continue pushing their enterprises to the
cloud, organizations are carefully weighing the trade-offs associated with public versus private.
Developers are also determining how to combine cloud services with existing applications and
infrastructure. Providers of cloud service offer many options to database administrators. Making the
move towards the cloud doesn’t mean changing organizational priorities, but finding products and
services that help your group meet its goals.
3. Automated management- Automating database management is another emerging trend. These
techniques and tools aim to simplify maintenance, patching, provisioning, updates, upgrades, and
even project workflow. However, the trend may have limited usefulness, since database management
frequently needs human intervention.
4. An increased focus on security- While not exactly a trend given the constant focus on data security,
recent ongoing retail database breaches among US-based organizations show with ample clarity the
importance for database administrators to work hand-in-hand with their IT security colleagues to ensure
all enterprise data remains safe. Any organization that stores data is vulnerable.
Database administrators must also work with the security team to eliminate potential internal
weaknesses that could make data vulnerable. These could include issues related to network privileges,
even hardware or software misconfigurations that could be misused, resulting in data leaks.
5. In-memory databases- Within the data warehousing community there are similar questions about
columnar versus row-based relational tables, the rise of in-memory databases, the use of flash or
solid-state disks (which also applies within transaction processing), clustered versus non-clustered
solutions, and so on.
6. Big Data- To be clear, big data does not necessarily mean lots of data. What it really refers to is the
ability to process any type of data: what is typically referred to as semi-structured and unstructured data
as well as structured data. Current thinking is that these will typically live alongside conventional
solutions as separate technologies, at least in large organizations, but this will not always be the case.
7. Decentralized data management- Although there are benefits to decentralized data management, it
presents challenges as well. How will the data be distributed? What’s the best decentralization method?
What’s the proper degree of decentralization? A major challenge in designing and managing a
distributed database results from the inherent lack of centralized knowledge of the entire database.
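One simple answer to the question "how will the data be distributed?" is hash partitioning: each record's key deterministically selects a site, so any node can locate data without consulting a central catalogue. A minimal sketch; the node names and customer keys are hypothetical:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical database sites

def owner(key):
    # Hash the key and map it onto one of the sites. Because the mapping
    # is deterministic, every node computes the same answer independently.
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

for customer in ["C1001", "C1002", "C1003"]:
    print(customer, "->", owner(customer))
```

The drawback of this naive scheme is that adding or removing a node remaps most keys; consistent hashing, used by many distributed stores, limits that disruption and illustrates the design trade-offs the paragraph above alludes to.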
Alongside these trends, organizations face a number of data management challenges:
1. Data Security Problems- For organizations to stay ahead of third-party threats, companies must start
by assessing their own security strategy. They should enact a multi-layered defense strategy that
covers their entire enterprise — all endpoints, all mobile devices, all applications and all data.
Following this assessment, companies should evaluate the technology, compliance procedures and
security standards that their partner network has in place.
2. Managing the Data Overload- Enterprises need to think about data traffic patterns in their
organizations, Vincent said, and recognize when the traffic no longer flows through a central point
(whether public cloud or private cloud) and ready their corporate networks for a whole new traffic
flow as part of their digital transformation. To help de-saturate the enterprise of data, enterprises
need to think about storage and moving inactive data from active enterprise applications to data
warehouses, or the cloud. Generally, any workload that can process entities as a single object can be
a candidate for object storage. This includes archival and retrieval of database backups or storage of
unstructured data, such as images, video and text documents, said Ray Johnson, chief data scientist
at Chicago-based consultancy SPR.
3. Insufficient understanding and acceptance of big data- Oftentimes, companies fail to know even the
basics: what big data actually is, what its benefits are, what infrastructure is needed, etc. Without a
clear understanding, a big data adoption project risks being doomed to failure, and companies may
waste lots of time and resources on things they do not even know how to use. And if employees do
not understand big data's benefits and/or do not want to change the existing processes for the sake
of its adoption, they can resist it and impede the company's progress.
Big data, being a huge change for a company, should be accepted by top management first, and then
down the ladder. To ensure big data understanding and acceptance at all levels, IT departments need
to organize numerous trainings and workshops. To further encourage acceptance, the
implementation and use of the new big data solution need to be monitored and controlled. However,
top management should not overdo the control, because it may have an adverse effect.
10.0 References
1. Database System Concepts by Abraham Silberschatz, Henry F. Korth, and S. Sudarshan.
2. Principles of Database Systems by Jeffrey D. Ullman.