
UNIT I

RELATIONAL DATABASES

Purpose of Database System – Views of data – Data Models – Database System
Architecture – Introduction to relational databases – Relational Model – Keys –
Relational Algebra – SQL fundamentals – Advanced SQL features – Embedded
SQL – Dynamic SQL

Data: Known facts that can be recorded and that have implicit meaning.
E.g. student roll numbers, names, addresses etc.
Database: collection of inter-related data organized meaningfully for a specific purpose.

DBMS: DBMS is a collection of interrelated data and a set of programs to access those data.
The primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient.

Database System: The database and the DBMS are collectively known as a database system.


Database Applications

• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
• Credit card transactions
• Telecommunications & Finance

PURPOSE OF DATABASE SYSTEMS


Drawbacks of Conventional File Processing System
i. Data redundancy and inconsistency

Since the files and application programs are created by different programmers
over a long period of time, the files have different formats and the programs
may be written in several programming languages. The same piece of
information may be duplicated in several files.

For Example: The address and phone number of a particular customer may
appear in a file that consists of personal information and also in the savings
account records file. This redundancy leads to data inconsistency; that is, the
various copies of the same data may no longer agree.

For example: a changed customer address may be reflected in the personal
information file but not in the savings account records file.
ii. Difficulty in accessing data

Conventional file processing environments do not allow needed data to be
retrieved in a convenient and efficient manner.

For Example: Suppose that a bank officer needs to find out the names of all
customers who live within the city's 411027 zip code. The bank officer now
has two choices: either get the list of all customers and extract the needed
information manually, or ask the data processing department to have a system
programmer write the necessary application program. Both alternatives are
unsatisfactory.
iii. Data isolation

Since data is scattered in various files, and files may be in different formats, it
is difficult to write new application programs to retrieve the appropriate data.
iv. Concurrent access anomalies

In order to improve the overall performance of the system and obtain a faster
response time, many systems allow multiple users to update the data
simultaneously. In such an environment, the interaction of concurrent updates
may result in inconsistent data.

For Example: Consider bank account A, containing $500. If two customers
withdraw funds (say $50 and $100 respectively) from account A at about the
same time, the result of the concurrent executions may leave the account in an
incorrect (or inconsistent) state: the balance may be $400 instead of $350. To
guard against this possibility, the system must maintain some form of
supervision.
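To see how the balance can end at $400, it helps to replay the two withdrawals step by step. The sketch below is a deterministic simulation in plain Python (no real threads and no DBMS); the function names are illustrative, and each withdrawal is assumed to read the balance first and write its own result back later.

```python
def withdraw_serial(balance, amounts):
    """Serial schedule: each withdrawal finishes before the next one starts."""
    for amount in amounts:
        balance -= amount
    return balance


def withdraw_interleaved(balance, first, second):
    """Interleaved schedule: both customers read before either one writes."""
    read_1 = balance          # customer 1 reads the balance (500 in the example)
    read_2 = balance          # customer 2 reads the same, still-unchanged balance
    stored = read_1 - first   # customer 1 writes its result (450)
    stored = read_2 - second  # customer 2 overwrites it (400); customer 1's update is lost
    return stored


print(withdraw_serial(500, [50, 100]))     # 350
print(withdraw_interleaved(500, 50, 100))  # 400
```

The supervision mentioned above is what a DBMS provides through concurrency control, for example by making the second read wait until the first write has finished.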

v. Atomicity Problem
 
System failure will lead to an atomicity problem. For example, a failure during a
transfer of funds from account A to account B may leave the amount debited
from A but not credited to B, resulting in an incorrect transaction.
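A DBMS avoids this by treating the debit and the credit as one transaction: either both are applied or neither is. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS; the table layout, the `transfer` helper and its `fail_midway` flag are illustrative assumptions, and a raised exception stands in for a real system failure (real crash recovery relies on the DBMS's log, which this sketch does not exercise).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
    except RuntimeError:
        pass  # the debit was rolled back together with the rest of the transaction


transfer(conn, "A", "B", 100, fail_midway=True)
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 500), ('B', 200)]  (no money lost)
transfer(conn, "A", "B", 100)
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 400), ('B', 300)]
```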
vi. Security problems

Not every user of the database system should be able to access all the data.
The system should be protected using proper security.

For Example: In a banking system, payroll personnel should only be given
authority to see the part of the database that has information about the various
bank employees. They do not need access to information about customer
accounts.

Since application programs are added to the system in an ad-hoc manner, it is
difficult to enforce such security constraints.
vii. Integrity problems

The data values stored in the database must satisfy certain types of consistency
constraints.

For Example: The balance of a bank account may never fall below a
prescribed amount (say $100). These constraints are enforced in the system by
adding appropriate code in the various application programs. When new
constraints are added, it is difficult to change all of those programs to enforce them.
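A DBMS lets such a rule be declared once in the schema instead of being coded into every application program. In this sketch (Python's sqlite3 module as a stand-in DBMS, with hypothetical table and column names), a CHECK constraint rejects any update that would take the balance below $100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The minimum-balance rule lives in the schema, not in each application program.
conn.execute("""
    CREATE TABLE account (
        account_no TEXT PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 100)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

try:
    # Would leave $50 in the account, so the DBMS refuses the statement.
    conn.execute("UPDATE account SET balance = balance - 450 WHERE account_no = 'A-101'")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT balance FROM account").fetchone()[0])  # 500
```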

Advantages of Database

A database consolidates the operational data and controls it centrally, which is a better
way to manage that data. The advantages of having centralized control of data
are:

i. Redundancy can be reduced


In non-database systems, each application or department has its own private
files, resulting in a considerable amount of redundancy in the stored data, and thus
wasted storage space. With a centralized database, most of this can be avoided.
ii. Inconsistency can be avoided
When the same data is duplicated and a change made at one site is not
propagated to the other site, it gives rise to inconsistency: the two entries
regarding the same data will not agree. So, if the redundancy is removed, the chances of
having inconsistent data are also removed.
iii. The data can be shared
The data stored by one application can be used by another application.
Thus, the data in the database stored for one application can be shared with new
applications.

iv. Standards can be enforced


With central control of the database, the DBA can ensure that all applicable
standards are observed in the representation of the data.
v. Security can be enforced
The DBA can define the access paths for accessing the data stored in the database and
can define authorization checks to be carried out whenever access to sensitive data is attempted.
vi. Integrity can be maintained
Integrity means that the data in the database is accurate. Centralized control of
the data allows the administrator to define integrity constraints on the data
in the database.

Disadvantages of DBMS
1. Complexity : The provision of the functionality that is expected of a good DBMS makes
the DBMS an extremely complex piece of software. Database designers, developers, database
administrators and end-users must understand this functionality to take full advantage of it.
Failure to understand the system can lead to bad design decisions, which can have serious
consequences for an organization.

2. Size : The complexity and breadth of functionality makes the DBMS an extremely large
piece of software, occupying many megabytes of disk space and requiring substantial
amounts of memory to run efficiently.

3. Performance: Typically, a file-based system is written for a specific application, such as
invoicing. As a result, performance is generally very good. However, the DBMS is written to
be more general, to cater for many applications rather than just one. The effect is that some
applications may not run as fast as they used to.

4. Higher impact of a failure: The centralization of resources increases the vulnerability of
the system. Since all users and applications rely on the availability of the DBMS, the failure
of any component can bring operations to a halt.

5. Cost of DBMS: The cost of DBMS varies significantly, depending on the environment
and functionality provided. There is also the recurrent annual maintenance cost.

6. Additional Hardware costs: The disk storage requirements for the DBMS and the
database may necessitate the purchase of additional storage space. Furthermore, to achieve
the required performance it may be necessary to purchase a larger machine, perhaps even a
machine dedicated to running the DBMS. The procurement of additional hardware results in
further expenditure.

7. Cost of Conversion: In some situations, the cost of the DBMS and extra hardware may
be insignificant compared with the cost of converting existing applications to run on the new
DBMS and hardware. This cost also includes the cost of training staff to use these new
systems and possibly the employment of specialist staff to help with conversion and running
of the system. This cost is one of the main reasons why some organizations feel tied to their
current systems and cannot switch to modern database technology.
VIEW OF DATA

A major purpose of a database system is to provide users with an abstract view of the
data. That is, the system hides certain details of how the data are stored and maintained.
Data abstraction
The complexity is hidden from the users through several levels of abstraction. There
are three levels of data abstraction:
i. Physical level: It is the lowest level of abstraction, which describes how the data are
actually stored. The physical level describes complex low-level data structures in
detail.
ii. Logical level: It is the next higher level of abstraction that describes what data are
stored in the database and what relationships exist among those data.
iii. View level: It is the highest level of abstraction that describes only part of the entire
database.
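A view is one way a DBMS offers the view level. In this sketch (Python's sqlite3 module, with a hypothetical subset of the CUSTOMERS table used later in this unit), the `customers` table plays the logical level, the view `customer_contacts` exposes only names and addresses, and the physical level (SQLite's own file and page format) is never visible to the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full table, including a sensitive column.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Ramesh", "Ahmedabad", 2000.0), (2, "Khilan", "Delhi", 1500.0)],
)

# View level: users of this view see only part of the database (no salary column).
conn.execute("CREATE VIEW customer_contacts AS SELECT name, address FROM customers")

print(conn.execute("SELECT * FROM customer_contacts ORDER BY name").fetchall())
# [('Khilan', 'Delhi'), ('Ramesh', 'Ahmedabad')]
```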

Data Independence
The ability to modify a schema definition at one level without affecting a schema definition at the
next higher level is called data independence. There are two types of data independence:

1. Physical data independence is the ability to modify the physical schema without causing application
programs to be rewritten. Modifications at the physical level are occasionally necessary in order to improve
performance.

2. Logical data independence is the ability to modify the conceptual (logical) schema without causing application
programs to be rewritten. Modifications at the conceptual level are necessary whenever the logical structure
of the database is altered.
Logical data independence is more difficult to achieve than physical data independence, since
application programs are heavily dependent on the logical structure of the data they access.
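Physical data independence can be observed with an index: adding one changes how the data is stored and searched, but the application's query text and its result stay the same. A sketch using Python's sqlite3 module; the table, data and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ramesh", 32), (2, "Khilan", 25), (3, "Kaushik", 23), (4, "Chaitali", 25)],
)

# The application's query refers only to the logical schema.
QUERY = "SELECT name FROM customers WHERE age = 25 ORDER BY name"
before = conn.execute(QUERY).fetchall()

# Physical-level change: an index to speed up lookups by age.
conn.execute("CREATE INDEX idx_customers_age ON customers (age)")
after = conn.execute(QUERY).fetchall()

print(before == after, after)  # True [('Chaitali',), ('Khilan',)]
```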
Instances and schemas
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema.

Types of database schemas

i. Physical schema: It describes the database design at the physical level.
ii. Logical schema: It describes the database design at the logical level.
iii. Subschema: A database may also have several schemas at the view level, called subschemas,
that describe different views of the database.
DATA MODELS

• The underlying structure of a database is called a data model.
• It is a collection of conceptual tools for describing data, data relationships, data semantics, and
consistency constraints.
• It is a way to describe the design of the database at the physical, logical and view levels.

Different types of data models are:


 Entity relationship model
 Relational model
 Hierarchical model
 Network model
 Object Based model
 Object Relational model
 Semi Structured Data model
Entity relationship model

• It is based on a collection of real-world things or objects, called entities, and the relationships among
these objects.
• The entity relationship model is widely used in database design.
• Its diagrams use basic shapes such as rectangles, ellipses, diamonds and lines.

Relational Model
• The relational model uses a collection of tables to represent both data and the relationships among
those data.
• Each table has multiple columns and each column has a unique name.
• Software such as Oracle, Microsoft SQL Server and Sybase is based on the relational model.
• The relational model is an example of a record-based model, which is based on fixed-format records of several types.


Hierarchical Model

A hierarchical database organizes data into a tree structure in which each record has only one
owner (parent).

Hierarchical structures were widely used in the first mainframe database management systems.

Links are possible vertically but not horizontally or diagonally.
• Relationships:
one-to-one
one-to-many

Advantages
 High speed of access to large datasets.

 Ease of updates.

 Simplicity: the design of a hierarchical database is simple.

 Data security: Hierarchical model was the first database model that offered the data security that is
provided and enforced by the DBMS.

 Efficiency: The hierarchical database model is a very efficient one when the database contains a large
number of transactions, using data whose relationships are fixed.
Disadvantages
 Implementation complexity

 Database management problems


 Lack of structural independence

Network Model
• The model is based on directed graph theory.
• The network model replaces the hierarchical tree with a graph, thus allowing more general connections
among the nodes.
• The main difference from the hierarchical model is its ability to handle many-to-many (M:N)
relationships; in other words, it allows a record to have more than one parent.
• Relationships:
one-to-one
one-to-many
many-to-one
many-to-many
• Example: an employee who works for two departments.

Sample network model
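The employee-and-two-departments example can be sketched as a graph of links in which a record may have several parents. This is a plain-Python illustration of the idea, not a network-model DBMS; the employee and department names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (employee, department) links; Asha works for two departments.
works_in = [("Asha", "Sales"), ("Asha", "Marketing"), ("Ravi", "Sales")]

departments_of = defaultdict(set)  # employee -> departments (more than one "parent" allowed)
employees_of = defaultdict(set)    # department -> employees
for employee, department in works_in:
    departments_of[employee].add(department)
    employees_of[department].add(employee)

print(sorted(departments_of["Asha"]))  # ['Marketing', 'Sales']
print(sorted(employees_of["Sales"]))   # ['Asha', 'Ravi']
```

In a strict hierarchy, Asha's record could sit under only one department, so her data would have to be duplicated to appear under both.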

Advantages:
• Conceptual simplicity
• Capability to handle more relationship types
• Data independence

Disadvantages:
• Detailed structural knowledge is required.
• Lack of structural independence.

Object-Based Data model

• The object-oriented model is an extension of the E-R model.
• The object-oriented model is based on a collection of objects.
• An object contains values stored in instance variables within the object.
• An object also contains bodies of code that operate on the object; these bodies of code are called methods.
• Objects that contain the same types of values and the same methods are grouped together into classes.

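The vocabulary above (object, instance variable, method, class) maps directly onto an object-oriented programming language. The sketch uses a Python class purely as an illustration of those terms; it is not an object database, and the `Account` class is hypothetical.

```python
class Account:
    """A class: objects with the same instance variables and methods."""

    def __init__(self, number: str, balance: float) -> None:
        # Instance variables: the values stored inside each object.
        self.number = number
        self.balance = balance

    def deposit(self, amount: float) -> None:
        # A method: code that operates on the object's own values.
        self.balance += amount


a = Account("A-101", 500.0)
a.deposit(50.0)
print(a.number, a.balance)  # A-101 550.0
```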
Advantages:
• Applications require less code.
• Applications use a more natural data model.
• Code is easier to maintain.
• It provides higher-performance management of objects and of complex interrelationships between objects.
• Object-oriented features improve productivity.
• Data access is easy.

Object Relational Model

Object-relational data model combines the features of modern object-oriented programming
languages with relational database features.

Some of the object-relational systems available in the market are IBM DB2 Universal Server,
Oracle Corporation's Oracle 8, Microsoft Corporation's SQL Server 7 and so on.

Semi Structured Data Model

This data model allows individual data items of the same type to have different sets of attributes.

In the other data models, every data item of a particular type must have the same set of attributes.

Extensible Markup Language (XML) is used to represent semi-structured data.
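A small XML document shows two items of the same type with different sets of attributes. The sketch parses it with Python's standard xml.etree.ElementTree module; the element names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two <customer> items of the same type, each with a different set of child elements.
DOC = """
<customers>
  <customer><name>Ramesh</name><city>Ahmedabad</city></customer>
  <customer><name>Khilan</name><phone>555-0100</phone><email>khilan@example.com</email></customer>
</customers>
"""

root = ET.fromstring(DOC.strip())
for customer in root.findall("customer"):
    print([child.tag for child in customer])
# ['name', 'city']
# ['name', 'phone', 'email']
```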

Database System Architecture


The architecture of a database system is greatly influenced by the underlying computer system on
which the database is running:
i. Centralized.
ii. Client-server.
iii. Parallel (multi-processor).
iv. Distributed
COMPONENTS OF DBMS ARE BROADLY CLASSIFIED AS FOLLOWS:

1. Query Processor Components :

• DML Compiler : It translates DML statements in a query language into low-level instructions that the
query evaluation engine understands. It also attempts to transform a user's request into an equivalent but more
efficient form.
• Embedded DML Pre-compiler : It converts DML statements embedded in an application program to
normal procedure calls in the host language. The Pre-compiler must interact with the DML compiler to
generate the appropriate code.
• DDL Interpreter : It interprets the DDL statements and records them in a set of tables containing meta
data or data dictionary.
• Query Evaluation Engine : It executes low-level instructions generated by the DML compiler.

2. Storage Manager Components :

They provide the interface between the low-level data stored in the database and application programs and
queries submitted to the system.
• Authorization and Integrity Manager : It tests for the satisfaction of integrity constraints and checks the
authority of users to access data.
• Transaction Manager : It ensures that the database remains in a consistent state despite system
failures and that concurrent transaction executions proceed without conflicting.
• File Manager : It manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.
• Buffer Manager : It is responsible for fetching data from disk storage into main memory and deciding
what data to cache in memory.

3. Data Structures :

Following data structures are required as a part of the physical system implementation.
• Data Files : It stores the database.
• Data Dictionary : It stores meta data (data about data) about the structure of the database.
• Indices : Provide fast access to data items that hold particular values.
• Statistical Data : It stores statistical information about the data in the database. This information is used
by query processor to select efficient ways to execute query.

Database Administrators
A person who has central control over both the data and the programs that access those data is called a database administrator (DBA).
The functions of a DBA include:

• Schema definition. The DBA creates the original database schema by executing a set of data definition
statements in the DDL.
• Storage structure and access-method definition.

• Schema and physical-organization modification. The DBA carries out changes to the schema and
physical organization to reflect the changing needs of the organization.
• Granting of authorization for data access. By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access.
Authorization information is kept in a special system structure that the database system consults whenever
someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator's routine maintenance activities are:
1. Periodically backing up the database.
2. Ensuring that enough free disk space is available for normal operations.
3. Monitoring jobs running on the database.
4. Ensuring that performance is not degraded by very expensive tasks submitted by some users.

DATABASE USERS
There are four types of database users, differentiated by the way they interact with the system.
1. Naive users

Naive users interact with the system by invoking one of the application programs that have
been written previously.

Naive users are typical users of form interface, where the user can fill in appropriate fields of
the form.

Naive users may also simply read reports generated from the database.

2. Application Programmers

Application programmers are computer professionals who write application programs.

Rapid application development (RAD) tools enable the application programmer to construct
forms and reports without writing a program.

Some special programming languages combine control structures with data manipulation
language statements. These languages are sometimes called fourth-generation languages.
3. Sophisticated users

Sophisticated users interact with the system without writing programs. Instead, they
form their requests in a database query language.

They submit each such query to a query processor, whose function is to break down DML
statements into instructions that the storage manager understands.

Online analytical processing (OLAP) tools simplify the analysis of data, and data mining tools
help find certain kinds of patterns in data.
4. Specialized users

Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework.

Examples of such applications are computer-aided design systems, knowledge-base and expert
systems, and systems that store data with complex data types.
Introduction to relational databases

What is RDBMS?

RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for many
widely used database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.

A Relational database management system (RDBMS) is a database management system (DBMS) that is
based on the relational model as introduced by E. F. Codd.

What is table?

The data in RDBMS is stored in database objects called tables. The table is a collection of related data
entries and it consists of columns and rows.

Remember, a table is the most common and simplest form of data storage in a relational database. Following
is the example of a CUSTOMERS table:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

What is field?

Every table is broken up into smaller entities called fields. The fields in the CUSTOMERS table consist of
ID, NAME, AGE, ADDRESS and SALARY.

A field is a column in a table that is designed to maintain specific information about every record in the
table.

What is record or row?

A record, also called a row of data, is each individual entry that exists in a table. For example there are 7
records in the above CUSTOMERS table. Following is a single row of data or record in the CUSTOMERS
table:

+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+

A record is a horizontal entity in a table.

What is column?

A column is a vertical entity in a table that contains all information associated with a specific field in a table.

For example, a column in the CUSTOMERS table is ADDRESS, which represents location description and
would consist of the following:

+-----------+
| ADDRESS   |
+-----------+
| Ahmedabad |
| Delhi     |
| Kota      |
| Mumbai    |
| Bhopal    |
| MP        |
| Indore    |
+-----------+

What is NULL value?

A NULL value in a table is a value in a field that appears to be blank, which means a field with a NULL
value is a field with no value.

It is very important to understand that a NULL value is different from a zero value or a field that contains
spaces. A field with a NULL value is one that has been left blank during record creation.
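The difference between NULL, zero and spaces shows up directly in queries. A sketch using Python's sqlite3 module, where Python's `None` is stored as SQL NULL; the table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, salary REAL, address TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 0.0, "  "),    # a real zero and a real (blank-looking) string of spaces
     (2, None, None)],  # None is stored as SQL NULL: no value at all
)

print(conn.execute("SELECT id FROM customers WHERE salary = 0").fetchall())            # [(1,)]
print(conn.execute("SELECT id FROM customers WHERE salary IS NULL").fetchall())        # [(2,)]
print(conn.execute("SELECT id FROM customers WHERE address IS NOT NULL").fetchall())   # [(1,)]
# Comparing with NULL using = is never true, so this returns no rows.
print(conn.execute("SELECT id FROM customers WHERE salary = NULL").fetchall())         # []
```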

CODD’S RULE:

Dr. Edgar F. Codd did extensive research on the relational model of database systems and came up with
twelve rules of his own which, according to him, a database must obey in order to be a true relational
database.

These rules can be applied to a database system that is capable of managing its stored data using only its
relational capabilities. This is a foundation rule (often called Rule 0), which provides the base on which the
other rules rest.

Rule 1: Information rule

This rule states that all information (data), which is stored in the database, must be a value of some table
cell. Everything in a database must be stored in table formats. This information can be user data or meta-
data.

Rule 2: Guaranteed Access rule

This rule states that every single data element (value) is guaranteed to be accessible logically with
combination of table-name, primary-key (row value) and attribute-name (column value). No other means,
such as pointers, can be used to access data.

Rule 3: Systematic Treatment of NULL values

This rule states that NULL values in the database must be given a systematic treatment, because a NULL may
have several meanings: data is missing, data is not known, data is not applicable, etc.

Rule 4: Active online catalog

This rule states that the structure description of whole database must be stored in an online catalog, i.e. data
dictionary, which can be accessed by the authorized users. Users can use the same query language to access
the catalog which they use to access the database itself.
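SQLite, used here through Python's sqlite3 module purely as an example, keeps the description of every table in a built-in catalog table named `sqlite_master`, which is read with an ordinary SELECT; other systems expose their catalogs under different names. The `customers` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The catalog is queried with the same language used for the data itself.
for name, sql in conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
    print(name, "->", sql)
```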

Rule 5: Comprehensive data sub-language rule

This rule states that a database must have a support for a language which has linear syntax which is capable
of data definition, data manipulation and transaction management operations. Database can be accessed by
means of this language only, either directly or by means of some application. If the database can be accessed
or manipulated in some way without any help of this language, it is then a violation.

Rule 6: View updating rule

This rule states that all views of database, which can theoretically be updated, must also be updatable by the
system.

Rule 7: High-level insert, update and delete rule

This rule states that the database must support high-level insertion, update and deletion. These operations
must not be limited to a single row; that is, the system must also support union, intersection and minus
operations to yield sets of data records.

Rule 8: Physical data independence

This rule states that the application should not have any concern about how the data is physically stored.
Also, any change in its physical structure must not have any impact on application.

Rule 9: Logical data independence

This rule states that the logical data must be independent of its user's view (application). Any change in
logical data must not imply any change in the application using it. For example, if two tables are merged or
one is split into two different tables, the change should have no impact on the user application. This is one of
the most difficult rules to apply.

Rule 10: Integrity independence

This rule states that the database must be independent of the application using it. All its integrity constraints
can be independently modified without the need of any change in the application. This rule makes database
independent of the front-end application and its interface.

Rule 11: Distribution independence

This rule states that the end user must not be able to see that the data is distributed over various locations.
Users should always get the impression that the data is located at one site only. This rule is regarded as the
foundation of distributed database systems.

Rule 12: Non-subversion rule

This rule states that if a system has an interface that provides access to low level records, this interface then
must not be able to subvert the system and bypass security and integrity constraints.

RELATIONAL MODEL
Relational data model is the primary data model, which is used widely around the world for data storage
and processing. This model is simple and it has all the properties and capabilities required to process data
with storage efficiency.

Concepts
Tables − In the relational data model, relations are saved in the format of tables. This format stores the
relation among entities. A table has rows and columns, where rows represent records and columns
represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents relation instance.
Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and their names.
Relation key − Each row has one or more attributes, known as relation key, which can identify the row in
the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.

Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are
called Relational Integrity Constraints.
There are three main integrity constraints −

 Key constraints
 Domain constraints
 Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely.
This minimal subset of attributes is called a key for that relation. If there is more than one such minimal
subset, these are called candidate keys.
Key constraints force that −
• in a relation with a key attribute, no two tuples can have identical values for the key attributes.
• a key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.

Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The
same kind of constraint is applied to the attributes of a relation: every attribute is bound to have a
specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a
digit outside 0-9.

Referential integrity Constraints


Referential integrity constraints work on the concept of foreign keys. A foreign key is a key attribute of a
relation that can be referenced in another relation.
The referential integrity constraint states that if a relation refers to a key attribute of a different or the same
relation, then that key element must exist.
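All three kinds of constraint can be declared in the schema and enforced by the DBMS. A sketch using Python's sqlite3 module, with hypothetical `branch` and `staff` tables modeled on the foreign-key example in this unit. Two SQLite specifics are assumed: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and a non-integer PRIMARY KEY column needs an explicit NOT NULL to reject NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled

conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY NOT NULL)")
conn.execute("""
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY NOT NULL,              -- key constraint
        age       INTEGER CHECK (age > 0),                -- domain constraint
        branch_no TEXT REFERENCES branch (branch_no)      -- referential integrity
    )
""")
conn.execute("INSERT INTO branch VALUES ('B1')")
conn.execute("INSERT INTO staff VALUES ('S1', 30, 'B1')")

bad_rows = [
    ("S1", 40, "B1"),  # duplicate key value
    ("S2", -5, "B1"),  # age outside its domain
    ("S3", 28, "B9"),  # refers to a branch that does not exist
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        print(row, "rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0])  # 1
```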

Keys

A key allows us to identify a set of attributes and thus distinguishes entities from each other.

Keys also help uniquely identify relationships, and thus distinguish relationships from each other.

Key Type: Definition

Superkey: Any attribute or combination of attributes that uniquely identifies a row in the table.
Example: The Roll_No attribute of the entity set 'student' distinguishes one student entity from
another. Customer_name and Customer_id together form a superkey.

Candidate Key: A minimal superkey, i.e. a superkey that does not contain a subset of attributes that is
itself a superkey.
Example: Student_name and Student_street together are sufficient to uniquely identify one particular
student.

Primary Key: The candidate key selected to uniquely identify all rows. It should rarely be changed and
cannot contain null values.
Example: Roll_No is a primary key.

Foreign Key: An attribute (or combination of attributes) in one table that must either match the primary
key of another table or be null.
Example: In the staff relation, the branch_no attribute exists to match staff to the branch office they
work in, so branch_no is a foreign key.

Secondary Key: An attribute or combination of attributes used to make data retrieval more efficient.
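The superkey and candidate-key definitions can be checked mechanically against sample rows. A plain-Python sketch with hypothetical student data; note that one instance of data can only show that a set of attributes is not a key, while real keys are declared from the meaning of the data.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    values = {tuple(row[a] for a in attrs) for row in rows}
    return len(values) == len(rows)


def candidate_keys(rows, all_attrs):
    """Minimal superkeys: superkeys with no proper subset that is also a superkey."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # A superset of a key already found cannot be minimal, so skip it.
            if any(set(key) <= set(attrs) for key in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical rows: names repeat and streets repeat, but never together.
students = [
    {"roll_no": 1, "name": "Asha", "street": "MG Road"},
    {"roll_no": 2, "name": "Asha", "street": "Park Street"},
    {"roll_no": 3, "name": "Ravi", "street": "MG Road"},
]

print(is_superkey(students, ["roll_no", "name"]))               # True (a superkey, but not minimal)
print(candidate_keys(students, ["roll_no", "name", "street"]))  # [('roll_no',), ('name', 'street')]
```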

Relational Algebra
• Relational algebra is a procedural query language, which takes instances of relations as input
and yields instances of relations as output.
• It uses operators to perform queries.
• An operator can be either unary or binary.
• They accept relations as their input and yield relations as their output.
• Relational algebra is performed recursively on a relation and intermediate results are also
considered relations.
• The operations can be classified as
– Basic operations: select, project, union, rename, set difference and Cartesian product
– Additional operations: set intersection, natural join, division and assignment
– Extended operations: aggregate operations and outer join

Select Operation
• It is used to select tuples from a relation.
• Notation: $\sigma_p(r)$, where p is called the selection predicate.
• Defined as: $\sigma_p(r) = \{t \mid t \in r \text{ and } p(t)\}$
• Each term of the predicate is one of:
<attribute> op <attribute> or <attribute> op <constant>, where op is one of: $=, \neq, >, \geq, <, \leq$.
• Example of selection:

$\sigma_{\text{branch-name} = \text{"Perryridge"}}(\text{account})$

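Selection can be sketched in plain Python by storing a relation as a list of rows (attribute name to value) and passing the predicate as a function. This is an illustration of the definition only, not how a DBMS evaluates queries; the sample account rows are hypothetical.

```python
# Hypothetical sample of the account relation.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Brighton", "balance": 900},
]


def select(predicate, relation):
    """sigma_p(r): keep exactly the tuples t of r for which p(t) is true."""
    return [t for t in relation if predicate(t)]


perryridge = select(lambda t: t["branch_name"] == "Perryridge", account)
print([t["account_number"] for t in perryridge])  # ['A-102']

# In this sketch, several terms are combined with Python's and/or/not.
big = select(lambda t: t["balance"] >= 500 and t["branch_name"] != "Downtown", account)
print([t["account_number"] for t in big])  # ['A-201']
```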
17
Project Operation
• It is used to select certain columns from the relation.

• Notation: $\Pi_{A_1, A_2, \ldots, A_k}(r)$

– where $A_1, \ldots, A_k$ are attribute names and $r$ is a relation name.
• The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
• Duplicate rows are removed from the result, since relations are sets.
• E.g. to eliminate the branch-name attribute of the account relation:

$\Pi_{\text{account-number},\, \text{balance}}(account)$
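A minimal Python sketch of projection, again assuming relations are lists of dicts and using hypothetical rows. Projecting onto branch-name shows the duplicate removal described above: three input tuples collapse to two.

```python
def project(relation, attrs):
    """Pi_{attrs}(r): keep only the listed columns and drop duplicate rows."""
    result, seen = [], set()
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # relations are sets, so each row appears once
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# Hypothetical instance of the account relation
account = [
    {"account-number": "A-101", "branch-name": "Downtown", "balance": 500},
    {"account-number": "A-102", "branch-name": "Perryridge", "balance": 400},
    {"account-number": "A-201", "branch-name": "Perryridge", "balance": 900},
]

print(project(account, ("account-number", "balance")))  # 3 rows, branch-name erased
print(project(account, ("branch-name",)))               # 2 rows: Downtown, Perryridge
```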

Union Operation
• The result of the union operation, denoted $r \cup s$, is a relation that includes all the tuples that are in r, in s, or in both r and s.
• Notation: $r \cup s$
• Defined as: $r \cup s = \{\, t \mid t \in r \text{ or } t \in s \,\}$
• For $r \cup s$ to be valid:
1. r, s must have the same number of attributes.
2. The attribute domains must be compatible (e.g., the 2nd column of r deals with the same type of values as does the 2nd column of s).
• E.g. to find all customers with either an account or a loan:

$\Pi_{\text{customer-name}}(depositor) \cup \Pi_{\text{customer-name}}(borrower)$
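The customer query above can be sketched in Python by modelling each relation as a set of tuples, so that set union gives the relational union directly. The `depositor` and `borrower` rows are hypothetical, and the sketch checks only the first validity condition (equal number of attributes); it does not check domain compatibility.

```python
def union(r, s):
    """r ∪ s for relations represented as sets of tuples."""
    # Condition 1: all tuples must have the same number of attributes
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("r and s must have the same number of attributes")
    return r | s

def project_first(r):
    """Pi on the first column (customer-name); the set drops duplicates."""
    return {(t[0],) for t in r}

# Hypothetical instances: depositor(customer-name, account-number), borrower(customer-name, loan-number)
depositor = {("Hayes", "A-102"), ("Johnson", "A-101"), ("Johnson", "A-201")}
borrower = {("Hayes", "L-15"), ("Jones", "L-17")}

customers = union(project_first(depositor), project_first(borrower))
print(sorted(customers))  # [('Hayes',), ('Johnson',), ('Jones',)]
```

Hayes has both an account and a loan but appears only once, because the result is a set.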

Set Difference Operation
• It is used to find tuples that are in one relation but are not in another relation.
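• Notation: $r - s$
• Defined as:
$r - s = \{\, t \mid t \in r \text{ and } t \notin s \,\}$
• Set differences must be taken between compatible relations.
– r and s must have the same number of attributes, and the domains of corresponding attributes must be compatible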
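A minimal Python sketch of set difference over sets of tuples, using hypothetical customer-name projections. It also shows that, unlike union, difference is not commutative.

```python
def difference(r, s):
    """r - s: the tuples of r that are not in s."""
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("r and s must have the same number of attributes")
    return {t for t in r if t not in s}

# Hypothetical results of Pi_customer-name(depositor) and Pi_customer-name(borrower)
depositor_names = {("Hayes",), ("Johnson",)}
borrower_names = {("Hayes",), ("Jones",)}

print(difference(depositor_names, borrower_names))  # {('Johnson',)}: an account but no loan
print(difference(borrower_names, depositor_names))  # {('Jones',)}: a loan but no account
```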

Cartesian-Product Operation
• It is used to combine information from two relations.
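• Notation: $r \times s$
• Defined as:
$r \times s = \{\, t\,q \mid t \in r \text{ and } q \in s \,\}$, where $t\,q$ denotes the tuple formed by concatenating $t$ and $q$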
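With relations modelled as sets of tuples, the concatenation $t\,q$ is just tuple addition in Python. The sketch below reuses the Table A and Table B values from the cross join example in these notes; the tuple representation is an assumption of the sketch.

```python
def cartesian_product(r, s):
    """r x s: concatenate every tuple t of r with every tuple q of s."""
    return {t + q for t in r for q in s}

table_a = {(2,), (3,)}
table_b = {(1, "A", "I"), (2, "B", "II"), (4, "D", "IV")}

product = cartesian_product(table_a, table_b)
print(len(product))  # 6 = 2 * 3
for row in sorted(product, key=lambda row: (row[0], row[1])):
    print(row)
```

The result always has |r| × |s| tuples, which is why a join is normally computed without materializing the full product.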

Rename Operation
• Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
Example:

$\rho_x(E)$ returns the result of expression E under the name x


Additional Operation
• Set Intersection
• Natural Join operation
• Division
Set-Intersection Operation
• Notation: $r \cap s$
• Defined as: $r \cap s = \{\, t \mid t \in r \text{ and } t \in s \,\}$

• Note: $r \cap s = r - (r - s)$
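The note above says intersection is not really a new operation: it can be written with set difference alone. A small Python check of that identity on hypothetical sets of tuples:

```python
def intersection(r, s):
    """r ∩ s from the definition: tuples that are in both r and s."""
    return {t for t in r if t in s}

def intersection_via_difference(r, s):
    """The same result using only set difference: r - (r - s)."""
    return r - (r - s)

# Hypothetical customer-name projections
depositor_names = {("Hayes",), ("Johnson",)}
borrower_names = {("Hayes",), ("Jones",)}

print(intersection(depositor_names, borrower_names))                 # {('Hayes',)}
print(intersection_via_difference(depositor_names, borrower_names))  # {('Hayes',)}
```

r - s contains exactly the tuples of r that are not in s, so removing those from r leaves the tuples that are in both.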

Join Operation
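• The join operation is used to combine information from two or more relations.
• A join can be defined as a Cartesian product followed by selections and projections: the natural join $r \bowtie s$, for example, selects the tuples of $r \times s$ that agree on all common attributes and then projects away the duplicate copies of those attributes.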
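The definition of a join as product, then selection, then projection can be followed literally in Python. In this sketch relations are lists of dicts and the sample `depositor` and `account` rows are hypothetical; a real DBMS would use indexes or hashing instead of examining every pair.

```python
def natural_join(r, s):
    """r ⋈ s: Cartesian product, selection on equal common attributes,
    then projection that keeps each common attribute only once."""
    result = []
    for t in r:
        for q in s:                                   # Cartesian product
            common = t.keys() & q.keys()
            if all(t[a] == q[a] for a in common):     # selection
                merged = {**t, **q}                   # projection: common attrs appear once
                if merged not in result:              # set semantics
                    result.append(merged)
    return result

depositor = [
    {"customer-name": "Hayes", "account-number": "A-102"},
    {"customer-name": "Johnson", "account-number": "A-101"},
]
account = [
    {"account-number": "A-101", "balance": 500},
    {"account-number": "A-102", "balance": 400},
    {"account-number": "A-305", "balance": 350},
]

for row in natural_join(depositor, account):
    print(row)
```

Account A-305 has no depositor, so it does not appear in the natural join; the outer join discussed later keeps such tuples.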

Division Operation
• Suited to queries that include the phrase “for all”.
• Let r and s be relations on schemas R and S respectively, where
– R = (A1, …, Am, B1, …, Bn)
– S = (B1, …, Bn)
The result of $r \div s$ is a relation on schema $R - S = (A_1, \ldots, A_m)$:
$r \div s = \{\, t \mid t \in \Pi_{R-S}(r) \wedge \forall u \in s\ (t\,u \in r) \,\}$
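The set-builder definition of division translates almost word for word into Python. This sketch assumes each tuple of r is laid out as (A1, …, Am, B1, …, Bn) and each tuple of s as (B1, …, Bn), with m passed explicitly. The data is hypothetical and answers a "for all" query: which customers have an account at every branch listed in s?

```python
def divide(r, s, m):
    """r ÷ s: every t in Pi_{R-S}(r) such that t+u is in r for all u in s.
    The first m columns of each r-tuple are the R - S attributes."""
    candidates = {tup[:m] for tup in r}               # Pi_{R-S}(r)
    return {t for t in candidates if all(t + u in r for u in s)}

# Hypothetical Pi_{customer-name, branch-name}(depositor ⋈ account)
has_account_at = {
    ("Johnson", "Downtown"),
    ("Johnson", "Brighton"),
    ("Hayes", "Downtown"),
    ("Smith", "Brighton"),
}
# Hypothetical set of Brooklyn branches
brooklyn_branches = {("Downtown",), ("Brighton",)}

print(divide(has_account_at, brooklyn_branches, 1))  # {('Johnson',)}
```

Following the formula, if s is empty every candidate satisfies the "for all" condition vacuously.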

Extended Relational-Algebra-Operations
• Generalized Projection
• Aggregate Functions
• Outer Join
Generalized Projection
• Extends the projection operation by allowing arithmetic functions to be used in the projection list.

$\Pi_{F_1, F_2, \ldots, F_n}(E)$

• E is any relational-algebra expression
• Each of $F_1, F_2, \ldots, F_n$ is an arithmetic expression involving constants and attributes in the schema of E.
• Example: given the relation creditinfo

custname | limit | creditbalance
aaa      | 2000  | 1750
bbb      | 1500  | 1500
ccc      | 1200  | 700

the credit still available to each customer is

$\Pi_{\text{custname},\ (\text{limit} - \text{creditbalance}) \text{ as creditavailable}}(creditinfo)$
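Working the creditinfo example through gives 2000 - 1750 = 250 for aaa, 1500 - 1500 = 0 for bbb and 1200 - 700 = 500 for ccc. The Python sketch below computes the same thing, assuming each $F_i$ is modelled as a function of the input tuple and the output column names are passed as keyword arguments.

```python
def generalized_project(relation, **exprs):
    """Pi_{F1,...,Fn}(E): each output column is computed from the input tuple."""
    result = []
    for t in relation:
        row = {name: f(t) for name, f in exprs.items()}
        if row not in result:  # keep set semantics
            result.append(row)
    return result

creditinfo = [
    {"custname": "aaa", "limit": 2000, "creditbalance": 1750},
    {"custname": "bbb", "limit": 1500, "creditbalance": 1500},
    {"custname": "ccc", "limit": 1200, "creditbalance": 700},
]

available = generalized_project(
    creditinfo,
    custname=lambda t: t["custname"],
    creditavailable=lambda t: t["limit"] - t["creditbalance"],  # "as creditavailable"
)
for row in available:
    print(row)
```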

Aggregate Functions and Operations


• An aggregate function takes a collection of values and returns a single value as a result:
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
• Aggregate operation in relational algebra:

${}_{G_1, G_2, \ldots, G_n}\,\mathcal{G}_{F_1(A_1),\, F_2(A_2),\, \ldots,\, F_n(A_n)}(E)$


– E is any relational-algebra expression
– G1, G2 …, Gn is a list of attributes on which to group
– Each Fi is an aggregate function
– Each Ai is an attribute name
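The grouping operation partitions E on $G_1, \ldots, G_n$ and then applies each $F_i$ to the values of $A_i$ within every group. A minimal Python sketch, assuming relations are lists of dicts; the output column names such as `sum_balance` are a naming convention invented for this sketch, and the `account` rows are hypothetical.

```python
from collections import defaultdict

AGGREGATES = {
    "avg": lambda vs: sum(vs) / len(vs),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}

def aggregate(relation, group_by, funcs):
    """G1..Gn g F1(A1),...,Fn(An)(E): partition E on the grouping attributes,
    then apply each aggregate function to its attribute within each group."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[g] for g in group_by)].append(t)
    result = []
    for key, rows in groups.items():
        row = dict(zip(group_by, key))
        for fname, attr in funcs:
            row[f"{fname}_{attr}"] = AGGREGATES[fname]([t[attr] for t in rows])
        result.append(row)
    return result

account = [
    {"account-number": "A-101", "branch-name": "Downtown", "balance": 500},
    {"account-number": "A-102", "branch-name": "Perryridge", "balance": 400},
    {"account-number": "A-201", "branch-name": "Perryridge", "balance": 900},
]

# branch-name g sum(balance), count(balance) (account)
print(aggregate(account, ("branch-name",), [("sum", "balance"), ("count", "balance")]))
# With no grouping attributes the whole relation forms one group: g avg(balance) (account)
print(aggregate(account, (), [("avg", "balance")]))
```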

Join operations
Join is a combination of a Cartesian product followed by a selection process. A Join operation pairs
two tuples from different relations, if and only if a given join condition is satisfied.

• Cross join: every row of Table A is paired with every row of Table B (the Cartesian product), so the result has |A| × |B| rows.

Table A:
2
3

Table B:
1 A I
2 B II
4 D IV

A CROSS JOIN B:
2 1 A I
2 2 B II
2 4 D IV
3 1 A I
3 2 B II
3 4 D IV

INNER JOIN
EQUI JOIN:
• Equi join is the first type of inner join.
• It joins two or more tables where the specified columns are equal.
• In this type of join, only the '=' operator can be used to compare the columns; operators like '>' and '<' are not allowed.

NATURAL JOIN
• It is the same as an equijoin, except that in a natural join each common attribute appears only once in the result.

• Outer join
– Left outer join
– Right outer join
– Full outer join
• Self-join
Outer join
• An extension of the join operation that avoids loss of information.
• Computes the join and then adds to the result the tuples from one relation that do not match any tuple in the other relation.
• Uses null values:
– null signifies that the value is unknown or does not exist
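A minimal Python sketch of a left outer join, assuming relations are lists of dicts and using Python's `None` to stand in for null. The `loan` and `borrower` rows are hypothetical: loan L-260 has no borrower, so it survives only because of the padding step.

```python
def left_outer_join(r, s):
    """r ⟕ s: the natural join, plus every tuple of r that matches nothing in s,
    padded with None (standing in for null) on the attributes only s has."""
    s_only = set().union(*(q.keys() for q in s)) - set().union(*(t.keys() for t in r))
    result = []
    for t in r:
        matches = [q for q in s if all(t[a] == q[a] for a in t.keys() & q.keys())]
        if matches:
            result.extend({**t, **q} for q in matches)
        else:
            result.append({**t, **{a: None for a in s_only}})
    return result

loan = [
    {"loan-number": "L-170", "branch-name": "Downtown", "amount": 3000},
    {"loan-number": "L-230", "branch-name": "Redwood", "amount": 4000},
    {"loan-number": "L-260", "branch-name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer-name": "Jones", "loan-number": "L-170"},
    {"customer-name": "Smith", "loan-number": "L-230"},
    {"customer-name": "Hayes", "loan-number": "L-155"},
]

for row in left_outer_join(loan, borrower):
    print(row)
```

A right outer join is the same operation with the arguments swapped, and a full outer join keeps the unmatched tuples from both sides.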

SQL FUNDAMENTALS
SQL Overview
• SQL (Structured Query Language) is the standard language for relational database management systems (RDBMS).
• Purpose of the SQL-92 and SQL-99 standards:
– Specify syntax/semantics for data definition and manipulation
– Define data structures
– Enable portability
– Specify minimal (level 1) and complete (level 2) standards
– Allow for later growth/enhancement to the standard
• Catalog: a set of schemas that constitute the description of a database
• Schema: the structure that contains descriptions of objects created by a user (base tables, views, constraints)

Advantages of SQL:

Data types in SQL:

String types
CHAR(n) – fixed-length character data, n characters long; maximum length = 2000 bytes
VARCHAR2(n) – variable-length character data; maximum 4000 bytes
LONG – variable-length character data, up to 2 GB; at most one LONG column per table
Numeric types
NUMBER(p,q) – general-purpose numeric data type (p = precision, the total number of digits; q = scale, the digits after the decimal point)
INTEGER(p) – signed integer, p digits wide
FLOAT(p) – floating point in scientific notation with p binary digits of precision
Date/time type
DATE – fixed-length date and time value, displayed by default in DD-MON-YY form (e.g. 01-JAN-12)

SQL Statements

Data Definition Language (DDL):


A data definition language or data description language (DDL) is a syntax similar to a computer
programming language for defining data structures, especially database schemas. Many data description
languages use a declarative syntax to define fields and data types. SQL, however, uses a collection of
imperative verbs whose effect is to modify the schema of the database by adding, changing, or deleting
definitions of tables or other objects.
DDL COMMANDS

• Create
• Alter (Add, Modify or Drop a column)
• Drop
• Rename
• Truncate

CREATE
• Used to create a table in the database
Syntax:
CREATE TABLE <tablename>
(<column 1> <data type>(<size>),
.........
<column n> <data type>(<size>));
Eg:
CREATE TABLE EMP1(EMPID NUMBER(10), EMPNAME VARCHAR(20), DOB DATE);
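The statement above can be tried without an Oracle server by using SQLite through Python's built-in sqlite3 module. This is a substitute engine, not Oracle: SQLite accepts Oracle-style type names such as NUMBER(10) but does not enforce their lengths, and its `PRAGMA table_info` is only a rough counterpart of the DESC command described later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# Same statement as in the example above
conn.execute("CREATE TABLE EMP1(EMPID NUMBER(10), EMPNAME VARCHAR(20), DOB DATE)")

# Each PRAGMA table_info row is (cid, name, type, notnull, default, pk)
columns = [(name, decl_type) for _, name, decl_type, *_ in conn.execute("PRAGMA table_info(EMP1)")]
for name, decl_type in columns:
    print(name, decl_type)
```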

ALTER Table Statement
• Add a new column
• Modify an existing column
• Define a default for a column
• Drop a column
Syntax:
• ALTER TABLE < tablename > ADD ( columnname DataType size,……..);
• ALTER TABLE < tablename > MODIFY ( columnname DataType size,……..);
• ALTER TABLE < tablename > MODIFY ( columnname DataType size [DEFAULT
Exp],……..);
• ALTER TABLE < tablename > DROP ( columnname);
Eg:
• ALTER TABLE emp ADD( salary number(5));

Eg:
• ALTER TABLE emp MODIFY(salary number(10));
• ALTER TABLE emp ADD(salary number(5) DEFAULT 10000);
• ALTER TABLE emp DROP(salary );

DROP TABLE Statement


• Deletes the table: both all of its data and its structure
Syntax:
DROP TABLE <tablename>;
Eg:
DROP TABLE EMP;
RENAME Statement
• Used to rename a table, view, etc.
Syntax:
RENAME old_Name TO new_Name;
Eg:
RENAME EMP TO EMP1;

TRUNCATE TABLE Statement


• Used to remove all the rows from a table while keeping the table's structure
Syntax:
TRUNCATE TABLE <tablename>;
Eg:
TRUNCATE TABLE Emp;

DML
• A data manipulation language (DML) is a family of syntax elements, similar to a computer programming language, used for selecting, inserting, deleting and updating data in a database.
• A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database.
DML Commands:

• Insert
• Select
• Update
• Delete

INSERT TABLE Statement


• Used to add a new row to a table
Syntax:
INSERT INTO <Tablename> (column,…..) VALUES (values,….);

Eg:
• INSERT INTO EMP(empid, firstname, lastname, hiredate, salary) VALUES (111, 'RAM', 'KUMAR', SYSDATE, 10000);
• INSERT INTO EMP(empid, firstname) VALUES (111, 'RAM');
• INSERT INTO EMP VALUES (&empid, '&firstname', '&lastname', '&hiredate', &salary);   (SQL*Plus prompts for a value for each &variable)

UPDATE TABLE Statement


• Used to update rows in the table
Syntax:
UPDATE <table name> SET <column name> = <value> [WHERE <condition>];

Eg:
• UPDATE EMP SET SALARY=10000 WHERE EMPID=111;   (updates only the row for employee 111)
• UPDATE EMP SET SALARY=10000;   (no WHERE clause, so every row is updated)

DELETE TABLE Statement

• Used to remove existing rows from a table

Syntax:

DELETE FROM <tablename> [WHERE <condition>];

Eg:
• DELETE FROM emp WHERE empname='aaa';
• DELETE FROM emp;   (no WHERE clause, so every row is deleted)
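The effect of leaving out the WHERE clause in UPDATE and DELETE can be seen in a small experiment. This sketch uses SQLite through Python's sqlite3 module instead of Oracle, and the emp rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE emp(empid INTEGER PRIMARY KEY, empname TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(111, "aaa", 8000), (112, "bbb", 9000), (113, "ccc", 9500)])

def salaries():
    return [s for (s,) in conn.execute("SELECT salary FROM emp ORDER BY empid")]

conn.execute("UPDATE emp SET salary = 10000 WHERE empid = 111")  # only employee 111
after_where = salaries()
conn.execute("UPDATE emp SET salary = 10000")                    # every row
after_all = salaries()

conn.execute("DELETE FROM emp WHERE empname = 'aaa'")            # only the 'aaa' row
remaining = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
conn.execute("DELETE FROM emp")                                  # every row
left = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

print(after_where, after_all, remaining, left)
```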

Delete Table statement

Deleting only the rows that satisfy a condition:

DELETE FROM <table name> WHERE <column name> = <value>;

Deleting all rows:

DELETE FROM <table name>;

DELETE FROM emp;

SELECT:

Selecting a column from every row:

SELECT <column name> FROM <table name>;

Selecting all columns of the rows that satisfy a condition:

SELECT * FROM <table name> WHERE <condition>;

Data Control Language

• A data control language (DCL) is a syntax, similar to a computer programming language, used to control access to data stored in a database.
• It is used to create roles and permissions and to control access to the database by securing it.
• These SQL commands are used for providing security to database objects.
• Examples of DCL commands include:
o GRANT to allow specified users to perform specified tasks.
o REVOKE to cancel previously granted or denied permissions.
• The operations for which privileges may be granted to or revoked from a user or role may include CONNECT, SELECT, INSERT, UPDATE, DELETE, EXECUTE and USAGE.
• In the Oracle database, executing a DCL command issues an implicit commit, so the command cannot be rolled back.
Grant
• Used to allow specified users to perform specified tasks.
Syn:
GRANT obj-privileges ON object TO user;

SQL> grant select on emp to user1;


Grant succeeded.

REVOKE
• Used to cancel previously granted or denied permissions.
Syn:
REVOKE obj-privileges ON object FROM user;

SQL> revoke select on emp from user1;


Revoke succeeded.
Transaction Control Language (TCL)
• A Transaction Control Language (TCL) is a subset of SQL used to control transactional processing in a database.
• A series of one or more logically related SQL statements, or a series of operations performed on Oracle table data, is termed a transaction.
• Oracle treats changes to table data as a two-step process. First, the requested changes are made. To make these changes permanent, a COMMIT statement has to be given at the SQL prompt. A ROLLBACK statement given at the SQL prompt can be used to undo part or all of the transaction.
• Examples of TCL commands include:
o COMMIT to apply the transaction by saving the database changes.
o ROLLBACK to undo all changes of a transaction.
o SAVEPOINT to divide the transaction into smaller sections. It defines breakpoints for a transaction to allow partial rollbacks.
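The three TCL commands can be exercised with SQLite through Python's sqlite3 module. This is a stand-in for Oracle with one visible difference: Oracle starts a transaction implicitly with the first DML statement, whereas here the connection is opened with `isolation_level=None` (so the module issues no hidden BEGIN/COMMIT) and BEGIN is written explicitly. The savepoint name `after_first` and the rows are hypothetical.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the transaction-control statements below behave exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE emp(empid INTEGER PRIMARY KEY, sal INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO emp VALUES (100, 25000)")
conn.execute("SAVEPOINT after_first")               # breakpoint inside the transaction
conn.execute("INSERT INTO emp VALUES (101, 30000)")
conn.execute("ROLLBACK TO SAVEPOINT after_first")  # partial rollback: undoes only 101
conn.execute("COMMIT")                              # makes the insert of 100 permanent
ids_after_commit = [r[0] for r in conn.execute("SELECT empid FROM emp")]

conn.execute("BEGIN")
conn.execute("DELETE FROM emp")
conn.execute("ROLLBACK")                            # undoes the whole transaction
ids_after_rollback = [r[0] for r in conn.execute("SELECT empid FROM emp")]

print(ids_after_commit, ids_after_rollback)
```

ROLLBACK TO a savepoint does not end the transaction; the following COMMIT still commits the work done before the savepoint.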

DESCRIBING THE TABLE:


Describe:
It is used to get the schema or structure of a table
Syntax
• Desc <table name>
Eg:
• Desc EMP;

DATA RETRIEVAL
• SELECT - Retrieve data from the database

• SQL>SELECT * FROM EMP;

INTEGRITY CONSTRAINTS
• Integrity constraints (ICs) are used to apply rules to the data in database tables.
• The main kinds are domain integrity, entity integrity and referential (foreign key) integrity constraints.
Domain Integrity
Domain integrity means the definition of a valid set of values for an attribute.
– Domain - NOT NULL ,CHECK

Entity Integrity Constraint

The entity integrity constraint states that primary keys can't be null. There must be a proper value in the
primary key field.
– Entity – UNIQUE, PRIMARY KEY
Referential Integrity Constraint
– The referential integrity constraint is specified between two tables and it is used to maintain the
consistency among rows between the two tables.
– Referential – FOREIGN KEY

– Creation of table with Primary Key and Foreign Key
SQL>CREATE TABLE EMP(EMPID NUMBER(5)
PRIMARY KEY,EMPNAME VARCHAR2(15),
DOJ DATE,DEPT VARCHAR2(15));
SQL>CREATE TABLE EMP1(EMPID NUMBER(5)
REFERENCES EMP(EMPID),SAL NUMBER(10));
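The EMP/EMP1 pair above can be used to watch entity and referential integrity being enforced. This sketch runs the same two CREATE TABLE statements on SQLite via Python's sqlite3 module, not on Oracle. One SQLite-specific detail: foreign keys are enforced only after `PRAGMA foreign_keys = ON`. The inserted rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when this is on
conn.execute("CREATE TABLE EMP(EMPID NUMBER(5) PRIMARY KEY, EMPNAME VARCHAR2(15), "
             "DOJ DATE, DEPT VARCHAR2(15))")
conn.execute("CREATE TABLE EMP1(EMPID NUMBER(5) REFERENCES EMP(EMPID), SAL NUMBER(10))")

conn.execute("INSERT INTO EMP(EMPID, EMPNAME) VALUES (100, 'aaaa')")
conn.execute("INSERT INTO EMP1 VALUES (100, 25000)")  # OK: 100 exists in EMP
conn.execute("INSERT INTO EMP1 VALUES (NULL, 1000)")  # OK: a foreign key may be null

try:
    conn.execute("INSERT INTO EMP1 VALUES (999, 5000)")  # no EMPID 999 in EMP
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

try:
    conn.execute("INSERT INTO EMP(EMPID) VALUES (100)")  # duplicate primary key
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True

print(fk_rejected, pk_rejected)
```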

FUNCTION
• Numeric Functions
• Group Functions
• Character Functions
• Date Functions, etc.

DATE FUNCTIONS
• SQL> select months_between('1-jan-09','1-dec-08') from dual;
MONTHS_BETWEEN('1-JAN-09','1-DEC-08')
-------------------------------------
1
• SQL> select sysdate from dual;
SYS
---------
01-JAN-12
• SQL> select CURRENT_TIMESTAMP from dual;
CURRENT_TIMESTAMP
-----------------------------------------
31-JAN-12 12.31.13.156000 AM +05:30
• SQL> select last_day('1-JAN-12') from dual;
LAST_DAY
---------
31-JAN-12
• SQL> select add_months('1-jan-09',2) from dual;
ADD_MONTHS('1-JAN-09',2)
-------------------------------------
1-MAR-09
• SQL> select to_char(sysdate,'fmdd-mm') from dual;
TO_CH
-----
10-8
Set operations
• UNION
• UNION ALL
• INTERSECT
• MINUS

SQL> select * from emp
2 union
3 select * from emp2;
EMPID EMPNAME SAL
---------- ---------- ----------
100 aaaa 30100
101 bbbb 35100
102 cccc 20000
103 dddd 35000
105 eeee 20000
SQL> select * from emp
2 minus
3 select * from emp2;
EMPID EMPNAME SAL
---------- ---------- ----------
100 aaaa 30100
102 cccc 20000
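The four set operators can be reproduced with SQLite through Python's sqlite3 module. SQLite has no MINUS keyword; it uses the SQL-standard EXCEPT, which plays the same role as Oracle's MINUS. The split of rows between emp and emp2 is hypothetical, chosen only so that the UNION and MINUS results match the listings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("emp", "emp2"):
    conn.execute(f"CREATE TABLE {table}(empid INTEGER, empname TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(100, "aaaa", 30100), (101, "bbbb", 35100), (102, "cccc", 20000)])
conn.executemany("INSERT INTO emp2 VALUES (?, ?, ?)",
                 [(101, "bbbb", 35100), (103, "dddd", 35000), (105, "eeee", 20000)])

def run(sql):
    return conn.execute(sql).fetchall()

union     = run("SELECT * FROM emp UNION SELECT * FROM emp2 ORDER BY empid")
union_all = run("SELECT * FROM emp UNION ALL SELECT * FROM emp2")      # keeps duplicates
intersect = run("SELECT * FROM emp INTERSECT SELECT * FROM emp2")
minus     = run("SELECT * FROM emp EXCEPT SELECT * FROM emp2 ORDER BY empid")  # Oracle: MINUS

print(union)
print(minus)
```

UNION removes the duplicate row for employee 101, while UNION ALL keeps both copies, so it returns six rows.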
Logical operations
• SELECT * FROM EMP WHERE NOT (sal BETWEEN 25000 AND 35000)
• SELECT * FROM EMP WHERE empid=101 AND dno=20
• SELECT * FROM emp WHERE empid=101 OR dno=20

View

• A view is a logical table based on one or more tables or views.
• A view contains no data itself.
• The tables upon which a view is based are called base tables.

JOINS
• An SQL join clause combines records from two or more tables in a database.
• Types of JOINs:
– INNER
• Equi-join
– Natural-join
– Cross-join
– OUTER
• Left outer-join
• Right outer-join
• Full outer-join
– SELF-JOIN.
Inner join
• Inner join creates a new result table by combining column values of two tables based upon the join-
predicate.

Natural join

Outer Join
• An extension of the join operation that avoids loss of information.
• Uses null values:
– null signifies that the value is unknown or does not exist
• Types
Left outer-join

Right outer-join

Full outer-join
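The inner and outer join variants can be compared on the same data with SQLite through Python's sqlite3 module. RIGHT and FULL OUTER JOIN were only added in SQLite 3.39.0, and the SQLite bundled with a given Python may be older, so this sketch writes the right outer join as a left outer join with the tables swapped, and the full outer join as the UNION of the two. The loan and borrower rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan(loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.execute("CREATE TABLE borrower(customer_name TEXT, loan_number TEXT)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [("L-170", "Downtown", 3000), ("L-230", "Redwood", 4000),
                  ("L-260", "Perryridge", 1700)])
conn.executemany("INSERT INTO borrower VALUES (?, ?)",
                 [("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")])

inner = conn.execute("""SELECT l.loan_number, b.customer_name
    FROM loan l INNER JOIN borrower b ON l.loan_number = b.loan_number
    ORDER BY l.loan_number""").fetchall()
left = conn.execute("""SELECT l.loan_number, b.customer_name
    FROM loan l LEFT OUTER JOIN borrower b ON l.loan_number = b.loan_number
    ORDER BY l.loan_number""").fetchall()
# Right outer join written as a left outer join with the tables swapped
right = conn.execute("""SELECT l.loan_number, b.customer_name
    FROM borrower b LEFT OUTER JOIN loan l ON l.loan_number = b.loan_number
    ORDER BY b.customer_name""").fetchall()
# Full outer join: the union of the left and right outer joins
full = conn.execute("""SELECT l.loan_number, b.customer_name
    FROM loan l LEFT OUTER JOIN borrower b ON l.loan_number = b.loan_number
    UNION
    SELECT l.loan_number, b.customer_name
    FROM borrower b LEFT OUTER JOIN loan l ON l.loan_number = b.loan_number""").fetchall()

print(inner, left, right, sorted(full, key=str), sep="\n")
```

Python prints SQL NULL as `None`: loan L-260 has no borrower, and borrower Hayes refers to a loan that is not in the loan table.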

Self-join

PL/SQL
• PL/SQL (Procedural Language/Structured Query Language).
• It combines SQL with the procedural features of programming languages.
• A PL/SQL Block consists of three sections:
– The Declaration section (optional).
– The Execution section (mandatory).
– The Exception Handling section (optional).

PL/SQL block to print the numbers from 1 to 10 in reverse order.

PL/SQL block to find the sum of 10 numbers.


PL/SQL block to display the empid of an employee from a table
SQL> set serveroutput on;
SQL> declare
2 id emp.empid%type;
3 begin
4 select empid into id from emp where empname='aaaa';
5 dbms_output.put_line(id);
6 end;
7 /
PL/SQL block to fetch rows from a table
Declare
t emp%rowtype;
Begin
For t in (select * from emp)
Loop
Dbms_output.put_line('Empid: '||t.empid||' Empname: '||t.empname||' Sal: '||t.sal);
End loop;
End;
/

Trigger
• A database trigger is procedural code that is automatically executed in response to certain events on
a particular table or view in a database.
Syntax:
create or replace trigger <triggername> {before | after} {insert | update | delete}
on <tablename> [for each row]

Types of triggers
• Row triggers: executed once for each row affected by the triggering insert/update/delete statement.
• Statement triggers: executed only once for the triggering statement, regardless of how many rows it affects; they fire each time the statement is executed.
• Example for Row trigger

SQL>create or replace trigger trig123 after update or delete on emp for each row
declare
begin
if updating then
dbms_output.put_line('UPDATING EMP TABLE');
end if;
if deleting then
dbms_output.put_line('DELETING EMP TABLE');
end if;
end;
/
Trigger created.

Example for statement trigger


SQL>create or replace trigger trig123 after update or delete on emp
declare
begin
if updating then
dbms_output.put_line('UPDATING EMP TABLE');
end if;
if deleting then
dbms_output.put_line('DELETING EMP TABLE');
end if;
end;
/
Trigger created.

SQL> update emp set sal=10500 where empid=101;


UPDATING EMP TABLE
1 row updated.
SQL> delete from emp where empid=101;
DELETING EMP TABLE
1 row deleted.
REPORT
SET PAGESIZE 25
SET LINESIZE 80
TTITLE CENTER 'EMPLOYEE SALARY REPORT'
BTITLE 'END OF THE REPORT'
COLUMN "NETPAY" FORMAT 9,9999.00
SELECT EMPID,ENAME,SAL,COMM,SAL+COMM "NETPAY" FROM emp;
TTITLE OFF
BTITLE OFF
CLEAR COLUMN

Procedure in PL/SQL
• A procedure (or simply a proc) is a named PL/SQL block that performs a specific task.
• A procedure has
– a header and
– a body.

Procedure to increase the salary of an employee in a table


SQL> create or replace procedure raise_salary(id integer, amt integer)is
2 current_sal integer;
3 sal_missing exception;
4 begin
5 select sal into current_sal from emp where empid=id;
6 if current_sal is null then
7 raise sal_missing;
8 else
9 update emp set sal=sal+amt where empid=id;
10 end if;
11 exception
12 when sal_missing then
13 insert into emp_audit values(id,'sal null');
14 end raise_salary;
15 /

SQL> select * from emp;


EMPID EMPNAME SAL
---------- ---------- ----------
100 aaaa 25000
101 bbbb 30000
102 cccc 20000

SQL> select * from emp;
EMPID EMPNAME SAL
---------- ---------- ----------
100 aaaa 30100
101 bbbb 30000
102 cccc 20000
Function in PL/SQL
• A function is a named PL/SQL Block which is similar to a procedure.
• The major difference between a procedure and a function is that a function must always return a value, whereas a procedure may or may not return a value.

Function to calculate the total salary of the employees in a dept.

SQL> create or replace function dept_total(deptid number)


2 return number is allsal number;
3 begin
4 select sum(sal) into allsal from emp where dno=deptid;
5 return allsal;
6 end;
7 /
Function created.

SQL> select * from emp;


EMPID EMPNAME SAL DNO
---------- ---------- ---------- ----------
100 aaaa 25000 20
101 bbbb 30000 25
102 cccc 20000 20
103 dddd 35000 25

SQL> select dept_total(20) from dual;


DEPT_TOTAL(20)
-----------------
45000

SQL> SHOW ERRORS FUNCTION dept_total;

is used to show the compilation errors of the function.

Comparison between Procedure and Function


• A function is a subprogram written to perform certain computations
• Functions must return a value, but for stored procedures this is not compulsory.
• Functions could be used in SELECT statements, provided they don’t do any data manipulation.
• However, procedures cannot be included in SELECT statements.

NESTED QUERIES

Embedded SQL
Embedded SQL is a method of combining the computing power of a programming language and the
database manipulation capabilities of SQL. Embedded SQL statements are SQL statements written inline
with the program source code of the host language. The embedded SQL statements are parsed by an
embedded SQL preprocessor and replaced by host-language calls to a code library. The output from the
preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in
programs written in any number of languages such as: C/C++, COBOL and Fortran.

The SQL standards committee defined the embedded SQL standard in two steps: a formalism called
Module Language was defined, then the embedded SQL standard was derived from Module Language.

Embedded SQL is a robust and convenient method of combining the computing power of a
programming language with SQL's specialized data management and manipulation capabilities.

• Approach: Embed SQL in the host language.


– A preprocessor converts the SQL statements into special API calls.
– Then a regular compiler is used to compile the code.
• Language constructs:
– Connecting to a database:
EXEC SQL CONNECT
– Declaring variables:
EXEC SQL BEGIN (END) DECLARE SECTION
– Statements:
EXEC SQL Statement

• Specify the query in SQL and declare a cursor for it

EXEC SQL
declare c cursor for
select depositor.customer_name, customer_city
from depositor, customer, account
where depositor.customer_name = customer.customer_name
and depositor.account_number = account.account_number
and account.balance > :amount
END_EXEC

• The open statement causes the query to be evaluated


EXEC SQL open c END_EXEC
• The fetch statement causes the values of one tuple in the query result to be placed in host-language variables.
EXEC SQL fetch c into :cn, :cc END_EXEC
Repeated calls to fetch get successive tuples in the query result
• A variable called SQLSTATE in the SQL communication area (SQLCA) gets set to '02000' to indicate that no more data is available
• The close statement causes the database system to delete the temporary relation that holds the result
of the query.
EXEC SQL close c END_EXEC
Note: above details vary with language. For example, the Java embedding defines Java iterators to step
through result tuples.
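Python's DB-API is not embedded SQL (there is no preprocessor), but its cursor follows the same open, fetch, close cycle, which makes it a convenient way to try the query above. In this sketch, using SQLite via the sqlite3 module, the host variable `:amount` becomes a named query parameter, `fetchone()` returning `None` plays the role of SQLSTATE '02000', and all table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer(customer_name TEXT, customer_city TEXT);
    CREATE TABLE account(account_number TEXT, balance INTEGER);
    CREATE TABLE depositor(customer_name TEXT, account_number TEXT);
    INSERT INTO customer VALUES ('Hayes', 'Harrison'), ('Johnson', 'Palo Alto'), ('Jones', 'Harrison');
    INSERT INTO account VALUES ('A-101', 500), ('A-102', 400), ('A-201', 900);
    INSERT INTO depositor VALUES ('Hayes', 'A-102'), ('Johnson', 'A-101'), ('Jones', 'A-201');
""")

amount = 450  # plays the role of the host variable :amount
cur = conn.cursor()
# "open": evaluate the query with the host value bound to :amount
cur.execute("""
    SELECT depositor.customer_name, customer_city
    FROM depositor, customer, account
    WHERE depositor.customer_name = customer.customer_name
      AND depositor.account_number = account.account_number
      AND account.balance > :amount
    ORDER BY depositor.customer_name""", {"amount": amount})

results = []
while True:
    row = cur.fetchone()  # "fetch": one tuple per call
    if row is None:       # no more data (cf. SQLSTATE '02000')
        break
    cn, cc = row          # like fetching into :cn, :cc
    results.append((cn, cc))
cur.close()               # "close": release the result set

print(results)
```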

Updates Through Cursors
• Tuples fetched by a cursor can be updated by declaring that the cursor is for update:
declare c cursor for
select *
from account
where branch_name = 'Perryridge'
for update
• To update the tuple at the current location of cursor c:
update account
set balance = balance + 100
where current of c
Static Vs Dynamic SQL:
Static SQL
• The source form of a static SQL statement is embedded within an application program written in a host language such as COBOL.
• The statement is prepared before the program is executed, and the operational form of the statement persists beyond the execution of the program.
• Static SQL statements in a source program must be processed before the program is compiled.
• This processing can be accomplished through the DB2 precompiler or the SQL statement coprocessor.
• The DB2 precompiler or the coprocessor checks the syntax of the SQL statements, turns them into host language comments, and generates host language statements to invoke DB2.
• The preparation of an SQL application program includes precompilation, the preparation of its static SQL statements, and compilation of the modified source program.

Dynamic SQL:
• Programs that contain embedded dynamic SQL statements must be precompiled like those that contain static SQL, but unlike static SQL, the dynamic statements are constructed and prepared at run time.
• The source form of a dynamic statement is a character string that is passed to DB2 by the program using the static SQL statement PREPARE or EXECUTE IMMEDIATE.
• Dynamic SQL allows programs to construct and submit SQL queries at run time.
• Example of the use of dynamic SQL from within a C program:

char *sqlprog = "update account "
                "set balance = balance * 1.05 "
                "where account_number = ?";
EXEC SQL prepare dynprog from :sqlprog;
char account[10] = "A-101";
EXEC SQL execute dynprog using :account;
• The dynamic SQL program contains a ?, which is a placeholder for a value that is provided when the SQL program is executed.
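The same prepare-then-execute pattern, with a ? placeholder bound at run time, can be sketched in Python with SQLite's sqlite3 module, which also uses ? as its placeholder. This is an analogue of the C example, not embedded SQL; the account rows are hypothetical. Binding the value through the placeholder, rather than pasting it into the statement string, also keeps the value from being interpreted as SQL (SQL injection).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(account_number TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 500.0), ("A-102", 400.0)])

# The statement text is an ordinary string, so it could be assembled at run time
sqlprog = ("update account "
           "set balance = balance * 1.05 "
           "where account_number = ?")

account = "A-101"                  # value supplied for the ? placeholder
conn.execute(sqlprog, (account,))  # prepared and executed at run time

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(balances)
```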
