
1) What is a Database?

A database is a collection of inter-related data organized so that the data can be retrieved,
inserted, and deleted efficiently. It is also used to organize the data in the form of tables,
schemas, views, reports, etc.

2) Database Management System

o A database management system (DBMS) is software which is used to manage a database. For
example, MySQL and Oracle are very popular database systems which are used in many
different applications.
o DBMS provides an interface to perform various operations like database creation, storing
data in it, updating data, creating a table in the database and a lot more.
o It provides protection and security to the database. In the case of multiple users, it also
maintains data consistency.

DBMS allows users the following tasks:

o Data Definition: It is used for creation, modification, and removal of definition that
defines the organization of data in the database.
o Data Updating: It is used for the insertion, modification, and deletion of the actual data
in the database.
o Data Retrieval: It is used to retrieve the data from the database which can be used by
applications for various purposes.
o User Administration: It is used for registering and monitoring users, maintaining data
integrity, enforcing data security, dealing with concurrency control, monitoring
performance, and recovering information corrupted by unexpected failure.
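
The first three tasks above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration; the student table, its columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Data definition: create the structure that organizes the data.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Data updating: insert, modify, and delete actual data.
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ben", 22), (3, "Chen", 21)],
)
conn.execute("UPDATE student SET age = 23 WHERE id = 2")
conn.execute("DELETE FROM student WHERE id = 3")
conn.commit()

# Data retrieval: read the data back for use by an application.
rows = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
print(rows)
```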

3) Characteristics of DBMS?

o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It supports ACID properties, which keep data in a consistent state in case of failure.
o It can reduce the complexity of the relationships between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It lets users view the database from different viewpoints according to their
requirements.

4) Advantages of DBMS?

o Controls database redundancy: It can control data redundancy because all the data is
stored in one single, centrally managed database rather than in scattered copies.
o Data sharing: In DBMS, the authorized users of an organization can share the data
with one another.
o Easy maintenance: It is easy to maintain due to the centralized nature of the
database system.
o Reduces time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems which automatically back up the
data and restore it after hardware or software failures if required.
o Multiple user interfaces: It provides different types of user interfaces, like graphical
user interfaces and application program interfaces.

5) Disadvantages of DBMS?

o Cost of Hardware and Software: It requires a high-speed data processor and a large
memory size to run DBMS software.
o Size: It occupies a large amount of disk space and memory in order to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a high impact on the database because in most
organizations all the data is stored in a single database, and if the database is damaged
due to an electrical failure or database corruption then the data may be lost forever.

6) Types of Databases?

There are various types of databases used for storing different varieties of data:

1) Centralized Database

It is the type of database that stores data in a centralized database system. It allows users
to access the stored data from different locations through several applications. These
applications include an authentication process to let users access data securely. An example of
a centralized database is a central library that holds a central database of each library in a
college/university.

Advantages of Centralized Database


o It decreases the risk in data management, i.e., manipulation of data will not affect
the core data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database


o The size of the centralized database is large, which increases the response time for
fetching the data.
o It is not easy to update such an extensive database system.
o If a server failure occurs, the entire data set may be lost, which could be a huge loss.

2) Distributed Database

Unlike a centralized database system, in distributed systems, data is distributed among different
database systems of an organization. These database systems are connected via communication
links. Such links help the end-users to access the data easily. Examples of the Distributed
database are Apache Cassandra, HBase, Ignite, etc.

We can further divide a distributed database system into:

o Homogeneous DDB: Those database systems which execute on the same operating
system, use the same application process, and run on the same hardware devices.
o Heterogeneous DDB: Those database systems which execute on different operating
systems under different application procedures, and run on different hardware devices.

Advantages of Distributed Database

o Modular development is possible in a distributed database, i.e., the system can be
expanded by including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.

3) Relational Database

This database is based on the relational data model, which stores data in the form of rows
(tuples) and columns (attributes), which together form a table (relation). A relational database
uses SQL for storing, manipulating, and maintaining the data. E.F. Codd introduced the relational
model in 1970. Each table in the database carries a key that makes each row unique from the
others. Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.

Properties of Relational Database

There are four commonly known properties of a relational database, known as the ACID
properties, where:

A means Atomicity: This ensures the data operation will complete either with success or with
failure. It follows the 'all or nothing' strategy. For example, a transaction will either be committed
or will abort.

C means Consistency: If we perform any operation over the data, the database should move from
one valid state to another. For example, in a transfer between two accounts, the total of the
account balances before and after the transaction should remain conserved.

I means Isolation: There can be concurrent users accessing data at the same time from the
database. Thus, their operations should remain isolated from one another. For example, when
multiple transactions occur at the same time, the effects of one transaction should not be visible
to the other transactions until it is committed.

D means Durability: It ensures that once the operation completes and the data is committed, the
data changes remain permanent.

4) NoSQL Database

Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets. It
is not a relational database as it stores data not only in tabular form but in several different
ways. It came into existence when the demand for building modern applications increased. Thus,
NoSQL presented a wide variety of database technologies in response to the demands. We can
further divide a NoSQL database into the following four types:

a. Key-value storage: It is the simplest type of database storage, where every
single item is stored as a key (or attribute name) together with its value.
b. Document-oriented Database: A type of database used to store data as JSON-like
documents. It helps developers in storing data by using the same document-model
format as used in the application code.
c. Graph Databases: They are used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use graph databases.
d. Wide-column stores: They resemble the tables of a relational database, but the data
is stored together by column rather than by row.

Advantages of NoSQL Database


o It enables good productivity in the application development as it is not required to store
data in a structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.

5) Cloud Database

A type of database where data is stored in a virtual environment and runs on a cloud
computing platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS,
etc.) for accessing the database. There are numerous cloud platforms; well-known options include:

o Amazon Web Services (AWS)
o Microsoft Azure
o Google Cloud SQL, etc.

6) Object-oriented Databases

The type of database that uses the object-based data model approach for storing data in the
database system. The data is represented and stored as objects which are similar to the objects
used in the object-oriented programming language.

7) Hierarchical Databases

It is the type of database that stores data in the form of parent-children relationship nodes.
Here, it organizes data in a tree-like structure.

Data is stored in the form of records that are connected via links. Each child record in the tree
has only one parent. On the other hand, each parent record can have multiple child
records.
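
The parent-child rule can be sketched in Python by giving each record a single parent link. The record names below are hypothetical, and a real hierarchical DBMS stores such links on disk rather than in a dictionary.

```python
# Each record stores a link to exactly one parent (None for the root).
parent_of = {
    "University": None,
    "Science": "University",
    "Arts": "University",
    "Physics": "Science",
    "Chemistry": "Science",
}

def children(record):
    """Return all child records of the given parent record."""
    return [child for child, parent in parent_of.items() if parent == record]

def path_to_root(record):
    """Follow parent links upward until the root is reached."""
    path = [record]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path

print(children("Science"))
print(path_to_root("Physics"))
```

Because every record has one parent, there is exactly one path from any record up to the root.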

8) Network Databases

It is the database that typically follows the network data model. Here, the representation of data
is in the form of nodes connected via links between them. Unlike the hierarchical database, it
allows each record to have multiple children and parent nodes to form a generalized graph
structure.

9) Personal Database

A personal database is one in which data is collected and stored on the user's own system.
This database is basically designed for a single user.

Advantage of Personal Database

o It is simple and easy to handle.
o It occupies less storage space as it is small in size.

7) What is RDBMS?

RDBMS stands for Relational Database Management System.

Many widely used database management systems, such as MS SQL Server, IBM DB2, Oracle, MySQL,
and Microsoft Access, are relational database management systems.

It is called a Relational Database Management System (RDBMS) because it is based on the
relational model introduced by E.F. Codd.

8) How it works

Data is represented in terms of tuples (rows) in RDBMS.

The relational database is the most commonly used type of database. It contains a number of
tables, and each table has its own primary key.

Because the data is organized as a collection of tables, it can be accessed easily in an RDBMS.

9) What is a table?

The RDBMS database uses tables to store data. A table is a collection of related data entries and
contains rows and columns to store data.

A table is the simplest example of data storage in RDBMS.

10) What is a field?

A field is a smaller entity of the table which contains specific information about every record
in the table.

11) What is a row or record?

A row of a table is also called a record. It contains the specific information of each individual
entry in the table. It is a horizontal entity in the table.

12) What is a column?

A column is a vertical entity in the table which contains all information associated with a specific
field in a table. For example, in a student table, "name" is a column which contains all
information about the students' names.

13) Data Integrity

The following categories of data integrity exist with each RDBMS:

Entity integrity: It specifies that there should be no duplicate rows in a table.

Domain integrity: It enforces valid entries for a given column by restricting the type, the
format, or the range of values.

Referential integrity: It specifies that rows which are used by other records cannot be deleted.

User-defined integrity: It enforces some specific business rules that are defined by users. These
rules are different from entity, domain or referential integrity.
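
As a rough sketch, the first three categories can be expressed as SQL constraints and tried out with Python's sqlite3 module. The course and student tables, the age range, and the is_rejected helper are hypothetical choices for illustration; note that SQLite only checks foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE student (
           id INTEGER PRIMARY KEY,                      -- entity integrity
           age INTEGER CHECK (age BETWEEN 5 AND 120),   -- domain integrity
           course_id INTEGER REFERENCES course(id)      -- referential integrity
       )"""
)
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO student VALUES (1, 20, 10)")

def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_row = is_rejected("INSERT INTO student VALUES (1, 21, 10)")   # same primary key
bad_age = is_rejected("INSERT INTO student VALUES (2, -3, 10)")         # outside the domain
delete_referenced = is_rejected("DELETE FROM course WHERE id = 10")     # still referenced
print(duplicate_row, bad_age, delete_referenced)
```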

Difference between DBMS and RDBMS:

1) DBMS: DBMS applications store data as files.
   RDBMS: RDBMS applications store data in a tabular form.

2) DBMS: Data is generally stored in either a hierarchical form or a navigational form.
   RDBMS: The tables have an identifier called the primary key, and the data values are stored
   in the form of tables.

3) DBMS: Normalization is not present.
   RDBMS: Normalization is present.

4) DBMS: It does not apply any security with regard to data manipulation.
   RDBMS: It defines integrity constraints for the purpose of the ACID (Atomicity, Consistency,
   Isolation and Durability) properties.

5) DBMS: It uses a file system to store data, so there will be no relation between the tables.
   RDBMS: Data values are stored in the form of tables, so a relationship between these data
   values will be stored in the form of a table as well.

6) DBMS: It has to provide some uniform methods to access the stored information.
   RDBMS: It supports a tabular structure of the data and relationships between the tables to
   access the stored information.

7) DBMS: It does not support distributed databases.
   RDBMS: It supports distributed databases.

8) DBMS: It is meant for small organizations and deals with small amounts of data. It supports
   a single user.
   RDBMS: It is designed to handle large amounts of data. It supports multiple users.

9) DBMS: Examples are file systems, XML, etc.
   RDBMS: Examples are MySQL, PostgreSQL, SQL Server, Oracle, etc.

Difference between DBMS and File System:

1) DBMS: DBMS is a collection of data. In DBMS, the user is not required to write the procedures.
   File system: File system is a collection of data. In this system, the user has to write the
   procedures for managing the database.

2) DBMS: It gives an abstract view of data that hides the details.
   File system: It exposes the details of the data representation and storage of data.

3) DBMS: It provides a crash recovery mechanism, i.e., DBMS protects the user from system
   failure.
   File system: It doesn't have a crash recovery mechanism, i.e., if the system crashes while
   entering some data, then the content of the file will be lost.

4) DBMS: It provides a good protection mechanism.
   File system: It is very difficult to protect a file under the file system.

5) DBMS: It contains a wide variety of sophisticated techniques to store and retrieve the data.
   File system: It can't efficiently store and retrieve the data.

6) DBMS: It takes care of concurrent access to data using some form of locking.
   File system: Concurrent access causes many problems, such as one user reading the file while
   another is deleting or updating some information.

14) DBMS Architecture

o The DBMS design depends upon its architecture. The basic client/server architecture is
used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation which are
connected via the network.
o DBMS architecture depends upon how users are connected to the database to get their
request done.

15) Types of DBMS Architecture


Database architecture can be seen as single-tier or multi-tier. But logically, database
architecture is of two types: 2-tier architecture and 3-tier architecture.

1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can
work directly on the DBMS and use it.
o Any changes done here will directly be done on the database itself. It doesn't provide a
handy tool for end users.
o The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick response.

2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier
architecture, applications on the client end can directly communicate with the database at the
server side. For this interaction, APIs like ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality like query processing and
transaction management.
o To communicate with the DBMS, the client-side application establishes a connection with the
server side.

3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In this
architecture, the client can't directly communicate with the server.
o The application on the client end interacts with an application server, which in turn
communicates with the database system.
o The end user has no idea about the existence of the database beyond the application server.
The database also has no idea about any other user beyond the application.
o The 3-Tier architecture is used in the case of large web applications.
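
The three tiers can be imitated inside a single Python program to show which layer talks to which. This is only a sketch: in a real deployment each tier would run as a separate process or machine, and the product table and both function names are assumptions made up for the example.

```python
import sqlite3

# Database tier: only the application tier holds this connection.
_db = sqlite3.connect(":memory:")
_db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
_db.execute("INSERT INTO product VALUES (1, 'Pen', 1.5)")
_db.commit()

def get_product(product_id):
    """Application tier: validates the request, then queries the database."""
    if not isinstance(product_id, int) or product_id <= 0:
        raise ValueError("product_id must be a positive integer")
    row = _db.execute(
        "SELECT name, price FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    return None if row is None else {"name": row[0], "price": row[1]}

def client_view(product_id):
    """Client tier: talks only to the application tier, never to the database."""
    product = get_product(product_id)
    if product is None:
        return "not found"
    return f"{product['name']}: {product['price']:.2f}"

print(client_view(1))
print(client_view(2))
```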

16) Three schema Architecture

o The three schema architecture is also called ANSI/SPARC architecture or three-level
architecture.
o This framework is used to describe the structure of a specific database system.
o The three schema architecture is also used to separate the user applications and physical
database.
o The three schema architecture contains three levels. It breaks the database down into
three different categories.

The three-schema architecture is as follows:


In the above diagram:

o It shows the DBMS architecture.
o Mapping is used to transform the request and response between various database levels
of architecture.
o Mapping adds processing time, so it is less suitable for a small DBMS.
o In External / Conceptual mapping, it is necessary to transform the request from the external
level to the conceptual schema.
o In Conceptual / Internal mapping, the DBMS transforms the request from the conceptual to the
internal level.

1. Internal Level
o The internal level has an internal schema which describes the physical storage structure
of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It defines how the data will be stored in a
block.
o The physical level is used to describe complex low-level data structures in detail.

2. Conceptual Level
o The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also
describes what relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data structure
are hidden.
o Programmers and database administrators work at this level.

3. External Level
o At the external level, a database contains several schemas, sometimes called
subschemas. A subschema is used to describe a different view of the database.
o An external schema is also known as a view schema.
o Each view schema describes the part of the database that a particular user group is
interested in and hides the remaining database from that user group.
o The view schema describes the end user interaction with database systems.
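
One way to picture an external schema is as an SQL view defined over the full table. The sketch below uses Python's sqlite3 module; the employee table and the employee_directory view are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the whole table, including a column not every user should see.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "HR", 5000.0), (2, "Ben", "IT", 6000.0)],
)
# External level: a view schema for a user group that only needs names and departments.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Users of the view see only the name and dept columns; the salary column stays hidden from them.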

17) Data Models

Data Model is the modeling of the data description, data semantics, and consistency constraints
of the data. It provides the conceptual tools for describing the design of a database at each level
of data abstraction. Four data models are commonly used for understanding the
structure of the database.

18) Data model Schema and Instance:

o The data which is stored in the database at a particular moment of time is called an
instance of the database.
o The overall design of a database is called schema.
o A database schema is the skeleton structure of the database. It represents the logical
view of the entire database.
o A schema contains schema objects like table, foreign key, primary key, views, columns,
data types, stored procedure, etc.
o A database schema can be represented by using a visual diagram. That diagram shows
the database objects and their relationships with each other.
o A database schema is designed by the database designers to help programmers whose
software will interact with the database. The process of database creation is called data
modeling.

In the database, actual data changes quite frequently. For example, in a student database, the
database changes whenever we add a new grade or add a student. The data at a particular
moment of time is called the instance of the database.
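
The difference can be observed with Python's sqlite3 module, where SQLite records table definitions in its sqlite_master catalog. The student table and the two helper functions are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

def schema():
    """The schema: table definitions recorded by SQLite in sqlite_master."""
    return conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()

def instance():
    """The instance: the rows stored at this particular moment."""
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
schema_after, instance_after = schema(), instance()

print(schema_before == schema_after)    # the design did not change
print(instance_before, instance_after)  # the stored data did
```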

19) Data Independence

o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at one
level of the database system without altering the schema at the next higher level.

There are two types of data independence.

1. Logical Data Independence

o Logical data independence refers to the characteristic of being able to change the
conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual
view.
o If we make any changes in the conceptual view of the data, then the user view of the data
would not be affected.
o Logical data independence occurs at the user interface level.

2. Physical Data Independence

o Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
o If we make any changes in the storage size of the database system server, then the
conceptual structure of the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal levels.
o Physical data independence occurs at the logical interface level.
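
Adding an index is one common internal-level change, and it can serve as a small illustration of physical data independence. The sketch below uses Python's sqlite3 module; the student table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Delhi"), (3, "Chen", "Pune")],
)

query = "SELECT name FROM student WHERE city = 'Pune' ORDER BY name"
before = conn.execute(query).fetchall()

# Internal-level change: add an index, which alters how the data is stored and searched.
conn.execute("CREATE INDEX idx_student_city ON student (city)")

# The same query, written against the same conceptual schema, still works unchanged.
after = conn.execute(query).fetchall()
print(before == after)
```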

Fig: Data Independence


DBMS LANGUAGES
1) Atomicity: The term atomicity means that an operation on the data is indivisible. If any
operation is performed on the data, it should either be executed completely or not be executed
at all. The operation should not break in between or execute partially. In the case of a
transaction, all of its operations should be executed completely, not partially.

Example: Suppose Remo has account A with $30, from which he wishes to send $10 to Sheero's
account, which is B. Account B already holds $100. When $10 is transferred to account B, its
balance will become $110. Two operations take place: the $10 that Remo wants to transfer is
debited from his account A, and the same amount is credited to account B, i.e., Sheero's
account. Now suppose the debit operation executes successfully, but the credit operation
fails. Then Remo's account A holds $20, while Sheero's account B remains at $100, as it was
previously.

In this case, after $10 has been debited from account A, the amount in account B is still $100.
So, it is not an atomic transaction.

When both the debit and credit operations are done successfully, the transaction is atomic.

Thus, when a transaction loses atomicity in a bank system, this becomes a huge issue,
and so atomicity is the main focus in bank systems.
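
The transfer can be sketched with Python's sqlite3 module, using the balances from the example ($30 in A, $100 in B). The account table and the fail_on_credit flag, which simulates the failed credit step, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()

def transfer(amount, fail_on_credit=False):
    """Debit A and credit B as one transaction; undo everything on failure."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_on_credit:
            raise RuntimeError("simulated failure before the credit step")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # the debit is undone, so no partial transfer remains

def balances():
    """Return the current balances as a dictionary keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(10, fail_on_credit=True)
after_failure = balances()
transfer(10)
after_success = balances()
print(after_failure, after_success)
```

With the rollback in place, the failed attempt leaves both balances untouched instead of leaving A at $20 and B at $100.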

2) Consistency: The word consistency means that the value should remain preserved always.
In DBMS, the integrity of the data should be maintained, which means if a change in the
database is made, it should remain preserved always. In the case of transactions, the integrity of
the data is essential so that the database remains consistent before and after the
transaction. The data should always be correct.

Example: Consider three accounts, A, B, and C, where A makes transactions one by one to both
B and C. Two operations take place, i.e., debit and credit. Account A first transfers $50 to
account B, and the balance of account A read before this transaction is $300. After the
successful transaction T, the available amount in B becomes $150. Now, A transfers $20 to
account C, and this time the balance of A that is read is $250 (which is correct, as the debit
of $50 to B has been done successfully). The debit and credit operations from account A to C
are done successfully. We can see that the transactions are done successfully, and the values
are read correctly. Thus, the data is consistent. If the value read in both cases were $300,
the data would be inconsistent, because the second read would not reflect the debit that has
already been executed.

3) Isolation: The term 'isolation' means separation. In DBMS, isolation is the property by
which operations that occur concurrently do not affect one another. In short, the result
should be the same as if one operation began only after the other operation had completed. It
means that if two operations are being performed on two different data items, they may
not affect the value of one another. In the case of transactions, when two or more transactions
occur simultaneously, consistency should be maintained. Any changes that occur in a
particular transaction will not be seen by other transactions until the change is committed.

Example: If two operations are running concurrently on two different accounts, then the value
of both accounts should not be affected. The value should remain persistent. For example,
account A makes transactions T1 and T2 to accounts B and C, but both execute
independently without affecting each other. This is known as isolation.
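
A small experiment with two connections to the same SQLite file can illustrate that uncommitted changes stay invisible to other transactions. This is a sketch under SQLite's default settings with Python's sqlite3 module; the account table, the file name, and the $300 starting balance are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A file-based database so that two separate connections can share it.
    path = os.path.join(tmp, "bank.db")
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO account VALUES ('A', 300)")
    writer.commit()

    def read_balance():
        """Read A's balance through the second connection."""
        rows = reader.execute("SELECT balance FROM account WHERE name = 'A'").fetchall()
        return rows[0][0]

    # The writer changes the balance inside a transaction that is not yet committed.
    writer.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    seen_before_commit = read_balance()

    writer.commit()
    seen_after_commit = read_balance()

    writer.close()
    reader.close()

print(seen_before_commit, seen_after_commit)
```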


4) Durability: Durability ensures the permanency of something. In DBMS, the term durability
ensures that the data becomes permanent in the database after the successful execution of the
operation. The durability of the data should be such that even if the system fails or
crashes, the database still survives. However, if the data gets lost, it becomes the
responsibility of the recovery manager to ensure the durability of the database. For committing
the values, the COMMIT command must be used every time we make changes.
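
The role of COMMIT can be sketched with Python's sqlite3 module by closing a database file and reopening it, as a restarted application would. The grade table and the file name are hypothetical, and closing a connection is only a stand-in for a real crash.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grades.db")

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE grade (student TEXT, mark INTEGER)")
    conn.execute("INSERT INTO grade VALUES ('Asha', 91)")
    conn.commit()   # committed: survives closing the connection
    conn.execute("INSERT INTO grade VALUES ('Ben', 78)")
    conn.close()    # closed without COMMIT: this insert is discarded

    # Reopen the file, as a restarted application would.
    conn = sqlite3.connect(path)
    surviving = conn.execute("SELECT student, mark FROM grade").fetchall()
    conn.close()

print(surviving)
```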

20) ER model:

o ER model stands for an Entity-Relationship model. It is a high-level data model. This
model is used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy
to design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-
relationship diagram.

For example, suppose we design a school database. In this database, the student will be an
entity with attributes like address, name, id, age, etc. The address can be another entity with
attributes like city, street name, pin code, etc and there will be a relationship between them.

21) Components of ER Diagram

1. Entity:

An entity may be any object, class, person or place. In the ER diagram, an entity is
represented as a rectangle.

Consider an organization as an example: manager, product, employee, department, etc. can be
taken as entities.

a. Weak Entity

An entity that depends on another entity is called a weak entity. The weak entity doesn't contain
any key attribute of its own. The weak entity is represented by a double rectangle.

2. Attribute

The attribute is used to describe the property of an entity. An ellipse is used to represent an
attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and the ellipses of its component attributes
are connected to that ellipse.

c. Multivalued Attribute

An attribute that can have more than one value is known as a multivalued
attribute. A double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.

For example, a person's age changes over time and can be derived from another attribute like
date of birth.
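
A derived attribute is computed when needed rather than stored. A minimal Python sketch of the age example follows; the function name age_on and the sample dates are made up for illustration.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the current year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(2000, 6, 15), date(2024, 6, 14)))
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))
```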

3. Relationship

A relationship is used to describe the relation between entities. A diamond or rhombus is used
to represent the relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of each entity is associated with the relationship, it is known as a
one-to-one relationship.

For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship

When only one instance of the entity on the left and more than one instance of the entity on the
right are associated with the relationship, it is known as a one-to-many relationship.

For example, a scientist can invent many inventions, but each invention is made by only one
specific scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left and only one instance of the entity on the
right are associated with the relationship, it is known as a many-to-one relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left and more than one instance of the entity
on the right are associated with the relationship, it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.
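
In a relational database, a many-to-many relationship is commonly stored in a separate junction table. The sketch below uses Python's sqlite3 module for the employee and project example; the table names, column names, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (employee, project) assignment.
    CREATE TABLE assignment (
        employee_id INTEGER REFERENCES employee(id),
        project_id  INTEGER REFERENCES project(id),
        PRIMARY KEY (employee_id, project_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO project  VALUES (10, 'Website'), (20, 'Payroll');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One employee, many projects.
projects_of_asha = conn.execute(
    "SELECT p.title FROM project p JOIN assignment a ON a.project_id = p.id "
    "WHERE a.employee_id = 1 ORDER BY p.title"
).fetchall()
# One project, many employees.
employees_on_website = conn.execute(
    "SELECT e.name FROM employee e JOIN assignment a ON a.employee_id = e.id "
    "WHERE a.project_id = 10 ORDER BY e.name"
).fetchall()
print(projects_of_asha, employees_on_website)
```

A one-to-many or many-to-one relationship needs no junction table: a foreign key column on the "many" side is enough.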

22) Keys

o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a table. They are also used
to establish and identify relationships between tables.

For example: In Student table, ID is used as a key because it is unique for each student. In
PERSON table, passport_number, license_number, SSN are keys since they are unique for each
person.
Question no:- 15,16,19 IMP

23) Types of key:

1. Primary key
o It is the key which is chosen to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key which
is most suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we could even select License_Number or Passport_Number as the primary
key, since they are also unique.
o For each entity, the selection of the primary key is based on the requirements and the
developers.

2. Candidate key
o A candidate key is an attribute or set of attributes which can uniquely identify a tuple.
o Apart from the primary key, the other attributes or attribute sets that can uniquely
identify a tuple are also candidate keys. The candidate keys are as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key. The other unique
attributes, like SSN, Passport_Number, and License_Number, are considered candidate keys.

3. Super Key

A super key is a set of attributes which can uniquely identify a tuple. A super key is a superset
of a candidate key.

For example: In the EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of
two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this
combination can also be a key.

The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
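
The difference between super keys and candidate keys can be checked on sample data with a short Python sketch. The three rows below are hypothetical, and uniqueness in a sample only suggests a key; in a real design, keys follow from the rules of the data, not from the rows that happen to be stored.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows; two employees share the same name.
rows = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Asha", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Ben", "SSN": "333"},
]

def is_super_key(attrs):
    """True if no two rows agree on all the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys():
    """Super keys that contain no smaller super key (i.e., minimal ones)."""
    names = sorted(rows[0])
    found = []
    for size in range(1, len(names) + 1):
        for attrs in combinations(names, size):
            if is_super_key(attrs) and not any(set(k) <= set(attrs) for k in found):
                found.append(attrs)
    return found

print(is_super_key(("EMPLOYEE_ID", "EMPLOYEE_NAME")))
print(is_super_key(("EMPLOYEE_NAME",)))
print(candidate_keys())
```

Here (EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone already identifies each row.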

4. Foreign key
o A foreign key is a column of a table which is used to point to the primary key of
another table.
o In a company, every employee works in a specific department, and employee and
department are two different entities. So we can't store the information of the
department in the employee table. That's why we link these two tables through the
primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id as a new attribute in
the EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
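
The EMPLOYEE and DEPARTMENT link described above can be sketched with Python's built-in
sqlite3 module. This is only an illustration: the Name and Department_Name columns and the
sample rows are assumptions, not part of the notes. SQLite checks foreign keys only when
PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("""
    CREATE TABLE DEPARTMENT (
        Department_Id   INTEGER PRIMARY KEY,
        Department_Name TEXT NOT NULL            -- hypothetical column
    )""")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        ID              INTEGER PRIMARY KEY,     -- chosen primary key
        Name            TEXT NOT NULL,           -- hypothetical column
        Passport_Number TEXT UNIQUE,             -- candidate key not chosen as primary key
        Department_Id   INTEGER REFERENCES DEPARTMENT(Department_Id)  -- foreign key
    )""")

conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Testing')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Asha', 'P123', 10)")  # department 10 exists


def try_insert_bad_employee():
    """Return True if the DBMS rejects an employee whose department does not exist."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Ravi', 'P456', 99)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = try_insert_bad_employee()
employee_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```

Here the second insert is refused because department 99 is not present in DEPARTMENT, so
only the first employee is stored.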

24) Generalization

o Generalization is like a bottom-up approach in which two or more entities of lower level
combine to form a higher level entity if they have some attributes in common.
o In generalization, an entity of a higher level can also combine with the entities of the
lower level to form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only difference is the
approach. Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e., subclasses
are combined to make a superclass.

For example, Faculty and Student entities can be generalized and create a higher level entity
Person.

25) Specialization

o Specialization is a top-down approach, and it is opposite to Generalization. In
specialization, one higher level entity can be broken down into two lower level entities.
o Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are
defined next, and relationship set are then added.

For example: In an Employee management system, the EMPLOYEE entity can be specialized as
TESTER or DEVELOPER based on what role they play in the company.

26) Aggregation

In aggregation, the relation between two entities is treated as a single entity. A
relationship together with its corresponding entities is aggregated into a higher level entity.

For example: The Center entity, the Course entity, and the "offers" relationship between them
act as a single entity, which is in a relationship with another entity, Visitor. In the real
world, a visitor to a coaching center never enquires about the Course alone or the Center
alone; he enquires about both.

27) Relational Model concept

The relational model represents data as a table with columns and rows. Each row is known as a tuple.
Each column of the table has a name, or attribute.
Domain: It contains a set of atomic values that an attribute can take.

Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a
domain, dom(Ai)

Relational instance: In the relational database system, the relational instance is represented by
a finite set of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and name of all
columns or attributes.

Relational key: A relational key is a set of one or more attributes that can identify each row
in the relation uniquely.

Example: STUDENT Relation

NAME      ROLL_NO   PHONE_NO      ADDRESS     AGE
Ram       14795     7305758992    Noida       24
Shyam     12839     9026288936    Delhi       35
Laxman    33289     8583287182    Gurugram    20
Mahesh    27857     7086819134    Ghaziabad   27
Ganesh    17282     9028 9i3988   Delhi       40

o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>

Properties of Relations

o The name of the relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o A relation has no duplicate tuples.
o The order of tuples has no significance; tuples can appear in any sequence.

28) Integrity Constraints

o Integrity constraints are a set of rules. They are used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have
to be performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.

Types of Integrity Constraint

1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The
value of the attribute must be available in the corresponding domain.

2. Entity integrity constraints


o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation and if
the primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary
Key of Table 2, then every value of the Foreign Key in Table 1 must be null or be
available in Table 2.

4. Key constraints
o A key is an attribute, or set of attributes, that is used to identify an entity within its
entity set uniquely.
o An entity set can have multiple keys, out of which one key will be the primary key. A
primary key value must be unique and can't be null in the relational table.
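
The four kinds of integrity constraint above can be tried out with Python's built-in sqlite3
module. The following is a minimal sketch, not a definitive schema: the STUDENT and COURSE
tables, their columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check foreign keys

conn.execute("CREATE TABLE COURSE (Course_Id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE STUDENT (
        Student_Id TEXT PRIMARY KEY NOT NULL,                 -- entity integrity + key constraint
        Name       TEXT NOT NULL,
        Age        INTEGER CHECK (typeof(Age) = 'integer' AND Age > 0),  -- domain constraint
        Course_Id  INTEGER REFERENCES COURSE(Course_Id)       -- referential integrity
    )""")
conn.execute("INSERT INTO COURSE VALUES (1)")
conn.execute("INSERT INTO STUDENT VALUES ('S1000', 'Tom', 17, 1)")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    # Age must be an integer, so the text value 'A' is outside the domain.
    "domain": violates("INSERT INTO STUDENT VALUES ('S1001', 'Ann', 'A', 1)"),
    # The primary key can't be null.
    "entity": violates("INSERT INTO STUDENT VALUES (NULL, 'Bob', 20, 1)"),
    # Course 99 does not exist in COURSE.
    "referential": violates("INSERT INTO STUDENT VALUES ('S1002', 'Cid', 21, 99)"),
    # 'S1000' is already used as a primary key value.
    "key": violates("INSERT INTO STUDENT VALUES ('S1000', 'Dee', 22, 1)"),
    # A null foreign key is allowed, so this insert is accepted.
    "null_foreign_key": violates("INSERT INTO STUDENT VALUES ('S1003', 'Eve', 19, NULL)"),
}
```

In this sketch the first four statements are rejected and the last one is accepted, which
matches the rule that a foreign key value must either be null or be available in the referenced table.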

29) Functional Dependency

A functional dependency is a relationship that exists between two attributes or sets of
attributes. It typically exists between the primary key and a non-key attribute within a table.

X → Y

The left side of the FD is known as the determinant, and the right side is known as the
dependent.

For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
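
A small Python sketch can make the definition concrete: X → Y holds in a table if no two
rows agree on X but differ on Y. The helper name holds and the sample rows are assumptions
for illustration. Such a check can only show that a dependency is violated by the data at
hand; it cannot prove that the dependency is intended by the schema.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [  # hypothetical sample data
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Noida"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
]

id_determines_name = holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_address = holds(employees, ["Emp_Name"], ["Emp_Address"])
```

With this data, Emp_Id → Emp_Name holds, while Emp_Name → Emp_Address does not, because two
employees named Asha live at different addresses.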

30)Types of Functional dependency

1. Trivial functional dependency


o A → B has trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial like: A → A, B → B

Example:

Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.

2. Non-trivial functional dependency


o A → B has a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

ID → Name,
Name → DOB

31)Inference Rule (IR):

1. Reflexive Rule (IR1)

In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:

X = {a, b, c, d, e}
Y = {a, b, c}

Since Y is a subset of X, X → Y holds.

2. Augmentation Rule (IR2)

In augmentation, if X determines Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ

Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3)

In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.

If X → Y and Y → Z then X → Z

4. Union Rule (IR4)

The union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.

If X → Y and X → Z then X → YZ

5. Decomposition Rule (IR5)

The decomposition rule is also known as the project rule. It is the reverse of the union rule.

This rule says, if X determines Y and Z, then X determines Y and X determines Z separately.

If X → YZ then X → Y and X → Z

6. Pseudo transitive Rule (IR6)

In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ determines W.

If X → Y and YZ → W then XZ → W
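
One common way to apply these rules mechanically is to compute the closure of an attribute
set: start with the attributes themselves (reflexive rule) and keep adding the right side of
every dependency whose left side is already covered. The sketch below assumes this closure
technique; the function names closure and implies are hypothetical.

```python
def closure(attributes, fds):
    """Return the closure of a set of attributes under a list of FDs.

    fds is a list of (lhs, rhs) pairs, where each side is a set of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


# The premises of the pseudo transitive rule: X -> Y and YZ -> W.
fds = [({"X"}, {"Y"}), ({"Y", "Z"}, {"W"})]

xz_determines_w = implies(fds, {"X", "Z"}, {"W"})
x_determines_w = implies(fds, {"X"}, {"W"})
```

Here XZ → W follows from the two premises, as IR6 states, while X → W alone does not,
because Z is needed to reach W.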

32) Normalization

o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy in a relation or set of relations. It is
also used to eliminate undesirable characteristics like insertion, update and deletion
anomalies.
o Normalization divides a larger table into smaller tables and links them using
relationships.
o The normal forms are used to reduce redundancy in the database table.
o Denormalization is the reverse process of normalization: it combines the tables which
have been normalized into a single table so that data retrieval becomes faster. A JOIN
operation allows us to create a denormalized form of the data by reversing the
normalization.
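
The effect of splitting a table and joining it back can be sketched with Python's built-in
sqlite3 module. The table names, columns, and rows below are assumptions chosen for
illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized form: the department name is stored once, not once per employee.
    CREATE TABLE DEPARTMENT (Dept_Id INTEGER PRIMARY KEY, Dept_Name TEXT);
    CREATE TABLE EMPLOYEE   (Emp_Id INTEGER PRIMARY KEY, Emp_Name TEXT,
                             Dept_Id INTEGER REFERENCES DEPARTMENT(Dept_Id));
    INSERT INTO DEPARTMENT VALUES (10, 'Testing'), (20, 'Development');
    INSERT INTO EMPLOYEE VALUES (1, 'Asha', 10), (2, 'Ravi', 10), (3, 'Meena', 20);
""")

# A single UPDATE renames the department for every employee in it,
# so there is no update anomaly.
conn.execute("UPDATE DEPARTMENT SET Dept_Name = 'QA' WHERE Dept_Id = 10")

# A JOIN rebuilds the denormalized (single table) view of the data.
denormalized = conn.execute("""
    SELECT e.Emp_Id, e.Emp_Name, d.Dept_Name
    FROM EMPLOYEE e JOIN DEPARTMENT d ON d.Dept_Id = e.Dept_Id
    ORDER BY e.Emp_Id
""").fetchall()
```

The joined result lists each employee with the new department name, even though the name
was changed in only one row.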

33)TRANSACTION:
34) TRANSACTION PROPERTIES:
35)States of Transaction

In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is being
executed.
o For example: Insertion, deletion or updating of a record is done here. But the changes
are not yet saved to the database.

Partially committed
o In the partially committed state, a transaction executes its final operation, but the data is
still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed
in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In
this state, all the effects are now permanently saved on the database system.

Failed state
o If any of the checks made by the database recovery system fails, then the transaction is
said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.

Aborted
o If any of the checks fail and the transaction has reached a failed state, then the database
recovery system will make sure that the database is in its previous consistent state. If not,
then it will abort or roll back the transaction to bring the database into a consistent state.
o If the transaction fails in the middle of its execution, then all the operations it has
already executed are rolled back, so that the database returns to the consistent state it
was in before the transaction started.
o After aborting the transaction, the database recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction
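
The path from active to committed, and from failed to aborted, can be sketched with Python's
built-in sqlite3 module. The ACCOUNT table, the transfer function, and the amounts are
assumptions made for illustration; they do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (Id INTEGER PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))")
conn.execute("INSERT INTO ACCOUNT VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(amount):
    """Move amount from account 1 to account 2 as one transaction."""
    try:
        # Active state: the updates are executed but not yet permanent.
        conn.execute("UPDATE ACCOUNT SET Balance = Balance + ? WHERE Id = 2", (amount,))
        conn.execute("UPDATE ACCOUNT SET Balance = Balance - ? WHERE Id = 1", (amount,))
        conn.commit()        # committed state: the effects are saved permanently
        return "committed"
    except sqlite3.IntegrityError:
        conn.rollback()      # failed, then aborted: undo the partial work
        return "aborted"


def balances():
    return conn.execute("SELECT Balance FROM ACCOUNT ORDER BY Id").fetchall()


first = transfer(30)             # enough money: commits
after_first = balances()
second = transfer(500)           # would make account 1 negative: rolled back
after_second = balances()
```

In the second call the credit to account 2 has already been executed when the debit fails,
so the rollback is what returns the database to its previous consistent state.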

36)Failure Classification

To find where the problem has occurred, we classify failures into the following
categories:

1. Transaction failure
2. System crash
3. Disk failure

1. Transaction failure

A transaction failure occurs when a transaction fails to execute or when it reaches a point
from where it can't go any further. If only a few transactions or processes are affected,
then this is called a transaction failure.

Reasons for a transaction failure could be -

1. Logical errors: If a transaction cannot complete due to some code error or an
internal error condition, then a logical error occurs.
2. System error: It occurs where the DBMS itself terminates an active transaction
because the database system is not able to execute it. For example, the system
aborts an active transaction in case of deadlock or resource unavailability.

2. System Crash
o A system crash can occur due to power failure or other hardware or software
failure. Example: Operating system error.
o Fail-stop assumption: In a system crash, non-volatile storage is assumed not
to be corrupted.

3. Disk Failure
o Disk failure was a common problem in the early days of technology evolution, when
hard-disk drives or storage drives used to fail frequently.
o Disk failure occurs due to the formation of bad sectors, a disk head crash,
unreachability of the disk, or any other failure which destroys all or part of disk
storage.

37)Checkpoint

o The checkpoint is a mechanism where all the previous log records are removed from the
system and their effects are stored permanently on the storage disk.
o The checkpoint is like a bookmark. During the execution of transactions, such
checkpoints are marked, and as the steps of each transaction are executed, log records
are created.
o When the system reaches a checkpoint, the logged updates are written into the
database, and the log file up to that point is removed. The log file is then filled
with the new transaction steps till the next checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in a consistent
state, and all transactions were committed.

38)Deadlock in DBMS

A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is said to be one of the most feared complications in DBMS
as no task ever gets finished and is in waiting state forever.

For example: Transaction T1 holds a lock on some rows in the Student table and needs to
update some rows in the Grade table. Simultaneously, transaction T2 holds locks on some rows
in the Grade table and needs to update the rows in the Student table held by transaction T1.
Now the main problem arises: transaction T1 is waiting for T2 to release its lock and,
similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a halt
and remain at a standstill until the DBMS detects the deadlock and aborts one of the
transactions.

39)Deadlock Avoidance

o It is better to avoid a deadlock in the first place than to abort or restart
transactions once the database is stuck in a deadlock state, which is a waste of time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A
method like the "wait-for graph" is used for detecting the deadlock situation, but this method
is suitable only for smaller databases. For larger databases, a deadlock prevention
method can be used.

40)Deadlock Detection

In a database, when a transaction waits indefinitely to obtain a lock, then the DBMS should
detect whether the transaction is involved in a deadlock or not. The lock manager maintains a
wait-for graph to detect the deadlock cycle in the database.

Wait-for Graph
o This is a suitable method for deadlock detection. In this method, a graph is created
based on the transactions and their locks. If the created graph has a cycle or closed loop,
then there is a deadlock.
o The wait-for graph is maintained by the system for every transaction which is waiting
for some data held by the others. The system keeps checking the graph for any cycle.

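For the T1 and T2 scenario described under Deadlock in DBMS, the wait-for graph has two
edges: T1 → T2 (T1 waits for T2) and T2 → T1 (T2 waits for T1). Together they form a cycle,
so there is a deadlock.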
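
Checking a wait-for graph for a cycle can be sketched in Python with a depth-first search.
This is one possible approach, not necessarily what a particular DBMS does; the function name
has_deadlock and the dictionary representation of the graph are assumptions.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the set of transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current search path, finished
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:      # 'other' is already on the current path: a cycle
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))


# T1 waits for T2 and T2 waits for T1, as in the Student and Grade example.
deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
# T1 waits for T2, but T2 waits for nobody.
not_deadlocked = has_deadlock({"T1": {"T2"}, "T2": set()})
```

Once a cycle is found, the DBMS can abort one of the transactions on the cycle to break it.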

41) Deadlock Prevention

o Deadlock prevention method is suitable for a large database. If the resources are
allocated in such a way that deadlock never occurs, then the deadlock can be prevented.
o The database management system analyzes the operations of the transaction to see whether
they can create a deadlock situation or not. If they can, then the DBMS does not allow that
transaction to be executed.

42) File Organization

o The File is a collection of records. Using the primary key, we can access the records. The
type and frequency of access can be determined by the type of file organization which
was used for a given set of records.
o File organization is a logical relationship among various records. This method defines
how file records are mapped onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of
blocks, and the blocks are placed on the storage medium.
o The first approach to map the database to files is to use several files and store
records of only one fixed length in any given file. An alternative approach is to structure our
files so that they can accommodate multiple lengths for records.
o Files of fixed length records are easier to implement than the files of variable length
records.

43) Objective of file organization

o It provides an optimal selection of records, i.e., records can be selected as fast as
possible.
o Insert, delete or update transactions on the records should be quick and easy.
o Duplicate records should not be introduced as a result of insert, update or delete.
o For the minimal cost of storage, records should be stored efficiently.

44) Types of file organization:

File organization contains various methods. Each method has pros and cons on the
basis of access or selection. The programmer decides the best-suited file
organization method according to the requirement. The types of file organization covered
below are sequential, heap, hash, and B+ tree file organization.

45) Sequential File Organization

This method is the easiest method for file organization. In this method, records are stored
sequentially. This method can be implemented in two ways:

1. Pile File Method:

o It is a quite simple method. In this method, we store the record in a sequence, i.e., one
after another. Here, the record will be inserted in the order in which they are inserted
into tables.
o In case of updating or deleting of any record, the record will be searched in the memory
blocks. When it is found, then it will be marked for deleting, and the new record is
inserted.

Insertion of the new record:

Suppose we have records R1, R3 and so on up to R9 and R8 in a sequence. Here, a record
is nothing but a row in a table. Suppose we want to insert a new record R2 in the sequence,
then it will be placed at the end of the file.

2. Sorted File Method:

o In this method, the new record is always inserted at the file's end, and then it will sort the
sequence in ascending or descending order. Sorting of records is based on any primary
key or any other key.
o In the case of modification of any record, it will update the record and then sort the file,
and lastly, the updated record is placed in the right place.

Insertion of the new record:

Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and
R7. Suppose a new record R2 has to be inserted in the sequence, then it will be inserted at the
end of the file, and then the sequence will be sorted.

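The difference between the two insertion methods can be sketched in Python. The lists below
use only the records named in the text, and the function names are hypothetical.
bisect.insort places the record directly at its sorted position, which gives the same final
order as appending it and then sorting the file.

```python
import bisect


def pile_insert(records, new_record):
    """Pile file: append the record at the end, keeping arrival order."""
    records.append(new_record)


def sorted_insert(records, new_record):
    """Sorted file: put the record at its position in key order."""
    bisect.insort(records, new_record)


pile = ["R1", "R3", "R9", "R8"]
pile_insert(pile, "R2")

sorted_file = ["R1", "R3", "R6", "R7"]
sorted_insert(sorted_file, "R2")
```

After the inserts, R2 is the last record of the pile file but the second record of the sorted file.
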
Pros of sequential file organization

o It is a fast and efficient method for huge amounts of data.
o In this method, files can be easily stored in cheaper storage mechanisms like magnetic
tapes.
o It is simple in design. It does not require much effort to store the data.
o This method is used when most of the records have to be accessed, like grade calculation
of a student, generating the salary slip, etc.
o This method is used for report generation or statistical calculations.

Cons of sequential file organization

o It wastes time, as we cannot jump to a particular record that is required but have
to move through the file sequentially.
o The sorted file method takes more time and space for sorting the records.

Heap file organization

o It is the simplest and most basic type of organization. It works with data blocks. In heap
file organization, the records are inserted at the file's end. When the records are inserted,
it doesn't require the sorting and ordering of records.
o When the data block is full, the new record is stored in some other block. This new data
block need not be the very next data block; the DBMS can select any data block in the
memory to store new records. The heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the same size. It is
the DBMS responsibility to store and manage the new records.

Insertion of a new record

Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we want to insert a
new record R2 in the heap. If data block 3 is full, then it will be inserted in any of the data
blocks selected by the DBMS, let's say data block 1.

If we want to search, update or delete the data in heap file organization, then we need to
traverse the data from the start of the file till we get the requested record.

If the database is very large then searching, updating or deleting of record will be time-
consuming because there is no sorting or ordering of records. In the heap file organization, we
need to check all the data until we get the requested record.

Pros of Heap file organization

o It is a very good method of file organization for bulk insertion. If there is a large amount
of data which needs to be loaded into the database at a time, then this method is best suited.
o In case of a small database, fetching and retrieving of records is faster than in
sequential file organization.

Cons of Heap file organization

o This method is inefficient for large databases because it takes time to search or
modify a record.

Hash File Organization


o Hash File Organization uses the computation of a hash function on some fields of the
records. The hash function's output determines the location of the disk block where the
records are to be placed.
o When a record has to be retrieved using the hash key columns, the address is
generated, and the whole record is retrieved using that address. In the same way, when a
new record has to be inserted, the address is generated using the hash key and the
record is directly inserted. The same process is applied in the case of delete and update.
o In this method, there is no effort for searching and sorting the entire file. Each record
is stored at the address computed by the hash function, so records appear scattered in
memory rather than in any sorted order.
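
The idea can be sketched in Python with a simple modulo hash function. The number of
blocks, the keys, and the names below are assumptions for illustration; real systems use more
elaborate hash functions and must also handle blocks that overflow.

```python
NUM_BLOCKS = 4   # hypothetical number of data blocks


def block_address(key):
    """Hash function: map a key to a data block number."""
    return key % NUM_BLOCKS


blocks = [[] for _ in range(NUM_BLOCKS)]


def insert(key, record):
    """Store the record in the block chosen by the hash function."""
    blocks[block_address(key)].append((key, record))


def lookup(key):
    """Search only the one block the hash function points to."""
    for stored_key, record in blocks[block_address(key)]:
        if stored_key == key:
            return record
    return None


for key, name in [(103, "Ram"), (104, "Shyam"), (107, "Laxman")]:
    insert(key, name)

found = lookup(107)
missing = lookup(105)
```

Keys 103 and 107 both hash to block 3, so a lookup of 107 scans only that block and never
touches the rest of the file.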

46) B+ File Organization

o B+ tree file organization is an advanced form of the indexed sequential access
method. It uses a tree-like structure to store records in a file.
o It uses the same concept of key-index where the primary key is used to sort the records.
For each primary key, the value of the index is generated and mapped with the record.
o The B+ tree is similar to a binary search tree (BST), but a node can have more than two
children. In this method, all the records are stored only at the leaf nodes. Intermediate
nodes act as pointers to the leaf nodes. They do not contain any records.

The above B+ tree shows that:

o There is one root node of the tree, i.e., 25.
o There is an intermediary layer with nodes. They do not store the actual records. They have
only pointers to the leaf nodes.
o The node to the left of the root contains a value smaller than the root, and the node to
the right contains a larger value, i.e., 15 and 30 respectively.
o The leaf level holds the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easier as all the leaf nodes are at the same depth.
o In this method, any record can be reached by traversing a single path from the root and
accessed easily.
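
A search that follows a single path from the root to a leaf can be sketched in Python. The
root (25), the intermediate nodes (15 and 30), and the leaf values 10 to 29 come from the
description above. How the values are grouped into leaves, and the rightmost leaf [30, 35],
are assumptions, since the figure is not reproduced here. The class and function names are
hypothetical.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # None for a leaf node

    @property
    def is_leaf(self):
        return self.children is None


def search(node, key):
    """Follow one root-to-leaf path; return (found, number_of_nodes_visited)."""
    visited = 1
    while not node.is_leaf:
        # Keys smaller than keys[i] go to children[i]; all others go further right.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return key in node.keys, visited


tree = Node([25], [
    Node([15], [Node([10, 12]), Node([17, 20, 24])]),
    Node([30], [Node([27, 29]), Node([30, 35])]),   # leaf [30, 35] is hypothetical
])

found_20 = search(tree, 20)
found_27 = search(tree, 27)
found_11 = search(tree, 11)
```

Every search visits exactly three nodes in this tree, whether or not the key is present,
because all the leaf nodes are at the same depth.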

Pros of B+ tree file organization

o In this method, searching becomes very easy as all the records are stored only in the leaf
nodes and sorted in a sequential linked list.
o Traversing through the tree structure is easier and faster.
o The size of the B+ tree has no restrictions, so the number of records can increase or
decrease and the B+ tree structure can also grow or shrink.
o It is a balanced tree structure, and any insert/update/delete does not affect the
performance of tree.

Cons of B+ tree file organization

o This method is inefficient for static tables, where the data rarely changes.


47) RAID

RAID stands for Redundant Array of Independent Disks. It is a technology which is used to
connect multiple secondary storage devices for increased performance, data redundancy or
both. It gives you the ability to survive one or more drive failures, depending upon the RAID
level used.

It consists of an array of disks in which multiple disks are connected to achieve different goals.

RAID technology

There are 7 levels of RAID schemes. These schemes are RAID 0, RAID 1, ...., RAID 6.

These levels have the following characteristics:

o It contains a set of physical disk drives.
o In this technology, the operating system views these separate disks as a single logical
disk.
o In this technology, data is distributed across the physical drives of the array.
o Redundant disk capacity is used to store parity information.
o In case of disk failure, the parity information can be used to recover the data.
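
How parity information recovers lost data can be sketched in Python with XOR, the
operation commonly used for parity in levels such as RAID 5. The disk contents and the
function name below are assumptions for illustration. Not every level works this way: RAID 0
keeps no redundancy at all, and RAID 1 keeps a mirror copy instead of parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"DBMS", b"RAID", b"DISK"]   # hypothetical contents of three data disks
parity = xor_blocks(data_disks)            # stored as redundant information

# Disk 1 fails: rebuild its block from the surviving disks and the parity block.
rebuilt = xor_blocks([data_disks[0], data_disks[2], parity])
```

XOR-ing the surviving blocks with the parity block cancels them out and leaves exactly the
block that was lost.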

48) Concurrency Control

Concurrency control is the management procedure that is required for controlling
concurrent execution of the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent execution.

49) Concurrent Execution in DBMS

o In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that transactions of
different users run against the same database simultaneously.
o While working on database transactions, multiple users often need to use the database
for performing different operations, and in that case, concurrent execution of the
database is performed.
o The simultaneous execution should be done in an interleaved manner, and no operation
should affect the other executing operations, thus maintaining the consistency of the
database. Interleaving the operations of concurrent transactions therefore raises several
challenging problems that need to be solved.

Referential Integrity?

The Referential Integrity Rule in DBMS is based on primary and foreign keys. The rule states
that a foreign key must have a matching primary key. A reference from one table to another
table should be valid.

What is Indexing?

Indexing is a data structure technique which allows you to quickly retrieve records from a
database file. An index is a small table having only two columns. The first column comprises a
copy of the primary or candidate key of a table. The second column contains a set of pointers
holding the address of the disk block where that specific key value is stored.
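
The two-column index can be sketched in Python. The roll numbers and names are taken from
the STUDENT relation earlier in these notes; the split of records into data blocks and the
function name find are assumptions for illustration.

```python
from bisect import bisect_left

# Data file: block number -> records stored in that block (hypothetical layout).
data_blocks = {
    0: [(12839, "Shyam"), (14795, "Ram")],
    1: [(17282, "Ganesh"), (27857, "Mahesh")],
    2: [(33289, "Laxman")],
}

# Index: sorted (search key, block number) pairs, i.e. the two columns described above.
index = sorted((key, block_no)
               for block_no, records in data_blocks.items()
               for key, _ in records)
index_keys = [key for key, _ in index]


def find(key):
    """Binary-search the index, then read only the block it points to."""
    pos = bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        return None
    block_no = index[pos][1]
    return next(name for k, name in data_blocks[block_no] if k == key)


mahesh = find(27857)
nobody = find(11111)
```

The lookup reads one data block instead of scanning the whole file, which is the saving an
index is meant to provide.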

An index -

o Takes a search key as input
o Efficiently returns a collection of matching records.

Starvation in DBMS?
Starvation or Livelock is the situation when a transaction has to wait for an indefinite period
of time to acquire a lock.

Reasons for Starvation -

o The waiting scheme for locked items is unfair (for example, a priority queue).
o Victim selection (the same transaction is selected as a victim repeatedly).
o Resource leak.
o Denial-of-service attack.
