Distributed database systems (DDBS) are systems whose data is distributed and
replicated over several locations, unlike a centralized database system (CDBS),
where a single copy of the data is stored. Data may be distributed over a
network using horizontal and vertical fragmentation, which correspond to the
selection and projection operations, respectively, of Structured Query Language
(SQL). Both types of database share the same problems of access control and
transaction management, such as concurrent user access control and deadlock
detection and resolution.
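To make the two kinds of fragmentation concrete, here is a minimal Python sketch. The employee relation, its column names and the helper functions are hypothetical and chosen only for illustration; the sketch assumes every vertical fragment keeps the key column so that the fragments can be joined back together.

```python
# Hypothetical EMPLOYEE relation; names and values are illustrative assumptions.
employees = [
    {"id": 1, "name": "Ada", "city": "London", "salary": 5200},
    {"id": 2, "name": "Bao", "city": "Hanoi", "salary": 4100},
    {"id": 3, "name": "Cy", "city": "London", "salary": 4700},
]


def horizontal_fragment(rows, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, columns, key="id"):
    """Projection: keep a subset of columns, always including the key."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]


def join_on_key(left, right, key="id"):
    """Rebuild full rows from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal fragments: one per city, each could live at a different site.
london = horizontal_fragment(employees, lambda r: r["city"] == "London")
hanoi = horizontal_fragment(employees, lambda r: r["city"] == "Hanoi")

# Vertical fragments: public columns at one site, payroll columns at another.
public = vertical_fragment(employees, ["name", "city"])
payroll = vertical_fragment(employees, ["salary"])

# Horizontal fragments are recombined by union, vertical ones by a key join.
rebuilt_h = sorted(london + hanoi, key=lambda r: r["id"])
rebuilt_v = join_on_key(public, payroll)
```

In both cases the original relation can be reconstructed from its fragments, which is what lets a DDBS hide the fragmentation from the user.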
However, a DDBS must also cope with additional problems. Access control and
transaction management in a DDBS require different rules to monitor data
retrieval and updates across distributed and replicated databases. Oracle, a
leading database management system (DBMS), employs the two-phase commit
technique to keep the databases in a consistent state. The objective of this
paper is to explain transaction management in distributed DBMSs (DDBMS) and how
Oracle implements this technique.
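As a rough illustration of the two-phase commit idea, the following Python sketch shows a coordinator collecting votes and then broadcasting a single decision. It is a generic, textbook-style outline and not Oracle's implementation; the Participant class, its can_commit flag and the two_phase_commit function are hypothetical names, and logging, timeouts and recovery are deliberately left out.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """Hypothetical site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: stands in for local checks
        self.state = "active"

    def prepare(self):
        # Phase 1: vote, promising to commit later if the vote is COMMIT.
        if self.can_commit:
            self.state = "prepared"
            return Vote.COMMIT
        self.state = "aborted"
        return Vote.ABORT

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed at every site."""
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]
    decision = all(v is Vote.COMMIT for v in votes)
    # Phase 2 (decision): send the same outcome to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


all_ready = [Participant("A"), Participant("B")]
one_refuses = [Participant("A"), Participant("B", can_commit=False)]
committed = two_phase_commit(all_ready)
rolled_back = two_phase_commit(one_refuses)
```

The point of the sketch is that a single abort vote forces every site to abort, so all sites end in the same state.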
Advantages of Distributed DBS
Since organizations tend to be geographically dispersed, a DDBS fits the
organizational structure better than a traditional centralized DBS. Each
location holds its local data and can obtain needed data from other locations
via a communication network. Moreover, the failure of a server at one site does
not render the whole distributed database system inaccessible; only the site
with the failed server is directly affected. If data is required from a failed
site, it may be retrieved from other locations that hold replicated copies.
Performance can also improve, since the CPU and I/O load is spread over several
machines. Finally, expanding the distributed system is relatively easy, since
adding a new location does not affect the existing ones.
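The availability argument can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the Site class, the SiteDown exception and read_with_failover are hypothetical, every site is assumed to hold a full replica, and a failed site is modelled by a simple flag.

```python
class SiteDown(Exception):
    """Raised when a site cannot serve a request (hypothetical)."""


class Site:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data  # this site's replica of the data
        self.up = up      # assumption: a flag stands in for a real failure

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


def read_with_failover(sites, key):
    """Try each replica in turn and return (value, name of serving site)."""
    for site in sites:
        try:
            return site.read(key), site.name
        except SiteDown:
            continue  # this site failed; try the next replica
    raise SiteDown("no replica available for " + repr(key))


replica = {"account-17": 250}
sites = [
    Site("local", dict(replica), up=False),  # the local server has failed
    Site("remote-1", dict(replica)),
    Site("remote-2", dict(replica)),
]
value, served_by = read_with_failover(sites, "account-17")
```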
Disadvantages of Distributed DBS :
On the other hand, a DDBS has several disadvantages. A distributed system is
usually more complex and costs more than a centralized one, because the hardware
and software involved must keep the system reliable and efficient. All
replication and data retrieval across sites should be transparent to the user.
The cost of maintaining the system is considerable, since technicians and
experts are required at every site.
Another main disadvantage of distributed database systems is security. Handling
security across several locations is more complicated. In addition, the
communication between sites may be tapped.
Failures in Distributed DBS :
Several types of failures may occur in distributed database systems:
Transaction Failures: When a transaction fails, it aborts. The database must
then be restored to the state it was in before the transaction started.
Transactions may fail for several reasons; some failures are due to deadlock
situations or to concurrency control algorithms.
Site Failures: Site failures are usually due to software or hardware failures.
These failures result in the loss of the main memory contents. In a distributed
database, site failures are of two types:
1) Total failure, where all the sites of a distributed system fail.
2) Partial failure, where only some of the sites of a distributed system fail.
Media Failures: Such failures refer to the failure of secondary storage
devices, for example due to head crashes or controller failure. In these cases,
part or all of the database stored on the affected device becomes inaccessible.
Communication Failures: Communication failures, as the name implies, are
failures in the communication system between two or more sites. They can lead to
network partitioning, where each site, or each group of connected sites,
operates independently. Messages from one partition do not reach the sites in
the others and are therefore lost. Reliability protocols use a timeout
mechanism to detect undelivered messages: a message is considered undelivered if
the sender does not receive an acknowledgment within the timeout period. The
failure of a communication network to deliver messages is known as performance
failure.
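The timeout mechanism can be sketched as follows. This is a minimal illustration, not a real reliability protocol: the PendingMessages class and its method names are hypothetical, and time is passed in explicitly as a number so that the example is deterministic (a real system would read a clock and would typically retransmit).

```python
class PendingMessages:
    """Tracks sent messages until an acknowledgment arrives (hypothetical)."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds to wait for an acknowledgment
        self.sent_at = {}       # message id -> time the message was sent

    def send(self, msg_id, now):
        self.sent_at[msg_id] = now

    def acknowledge(self, msg_id):
        # An acknowledged message no longer needs to be watched.
        self.sent_at.pop(msg_id, None)

    def undelivered(self, now):
        """Ids of messages whose acknowledgment is overdue at time `now`."""
        return sorted(
            msg_id
            for msg_id, sent in self.sent_at.items()
            if now - sent >= self.timeout
        )


tracker = PendingMessages(timeout=5.0)
tracker.send("m1", now=0.0)
tracker.send("m2", now=1.0)
tracker.acknowledge("m1")               # the receiver confirmed m1
early = tracker.undelivered(now=4.0)    # m2 is still within its timeout
late = tracker.undelivered(now=6.0)     # m2 has now waited 5 seconds
```

Note that the sender cannot tell a lost message from a lost acknowledgment; in both cases the message is simply treated as undelivered once the timeout expires.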
In a distributed DBMS, a view can be derived from distributed relations, and
access to the view requires executing the distributed query that corresponds to
the view definition. An important issue in a distributed DBMS is to make view
materialization efficient.
View Management in Distributed DBMS
The definition of views in a DDBMS is similar to that in a centralized DBMS.
• However, a view in a DDBMS may be derived from fragmented relations stored at
different sites. Views are conceptually the same as base relations; therefore
their definitions are stored in the (possibly) distributed directory/catalogue.
• Thus, views might be centralized at one site, partially replicated, or fully
replicated.
• Queries on views are translated into queries on base relations, yielding
distributed queries due to possible fragmentation of the data.
Views derived from distributed relations
may be costly to evaluate. Since in a given organization it is likely that many
users access the same views, some proposals have been made to optimize view
derivation. View derivation is done by merging the view qualification with the
query qualification.
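Merging the two qualifications can be illustrated with a small Python sketch. It assumes that a view is defined by a single selection predicate over one base relation; the View class, merge_qualifications, run and the sample data are hypothetical, and a real DDBMS would perform this rewriting on the query expression rather than on Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class View:
    base: str                 # name of the base relation
    qualification: Predicate  # selection predicate that defines the view


def merge_qualifications(view: View, query_qual: Predicate) -> Tuple[str, Predicate]:
    """Rewrite a query on the view as a query on the base relation."""

    def merged(row: Row) -> bool:
        # View qualification AND query qualification.
        return view.qualification(row) and query_qual(row)

    return view.base, merged


def run(database: Dict[str, List[Row]], base: str, qual: Predicate) -> List[Row]:
    """Evaluate a selection directly against the base relation."""
    return [row for row in database[base] if qual(row)]


database = {
    "emp": [
        {"id": 1, "city": "London", "salary": 5200},
        {"id": 2, "city": "Hanoi", "salary": 6100},
        {"id": 3, "city": "London", "salary": 4700},
    ]
}

# View: employees in London. Query on the view: salary above 5000.
london_emp = View(base="emp", qualification=lambda r: r["city"] == "London")
base, qual = merge_qualifications(london_emp, lambda r: r["salary"] > 5000)
result = run(database, base, qual)
```

The query never touches a stored copy of the view: it is answered from the base relation using the combined predicate.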
An alternative solution is to avoid view derivation by maintaining actual
versions of the views, called snapshots. A snapshot represents a particular
state of the database and is therefore static, meaning that it does not reflect
updates to the base relations. Snapshots are useful when users are not
particularly interested in seeing the most recent version of the database. They
are managed as temporary relations in the sense that they have no access
methods other than sequential scanning.
A query expressed on a snapshot cannot exploit indices available on the base
relations from which the snapshot is derived. Access through snapshots is
therefore better suited to queries that have poor selectivity and scan the
entire snapshot. In this case a snapshot behaves more like a predefined answer
to a query. Snapshots must be recalculated periodically. For snapshots derived
by selection and projection, only the difference needs to be calculated.
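The idea of recalculating only the difference can be sketched in Python as follows. The sketch makes two assumptions that are not in the text: the projected columns include the key, so each snapshot tuple comes from exactly one base row, and the changes since the last refresh are available as lists of inserted and deleted rows (an update counts as a delete plus an insert). The function names derive and refresh are hypothetical.

```python
def derive(rows, predicate, columns):
    """Apply the selection and then the projection; return a set of tuples."""
    return {tuple(row[c] for c in columns) for row in rows if predicate(row)}


def refresh(snapshot, inserted, deleted, predicate, columns):
    """Update a snapshot from the changes only, without rescanning the base."""
    removed = derive(deleted, predicate, columns)
    added = derive(inserted, predicate, columns)
    return (snapshot - removed) | added


def in_london(row):
    return row["city"] == "London"


columns = ("id", "name")  # assumption: the key "id" is kept in the projection

base_old = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Bao", "city": "Hanoi"},
    {"id": 3, "name": "Cy", "city": "London"},
]
snapshot = derive(base_old, in_london, columns)

# Changes made to the base relation since the snapshot was taken.
deleted = [{"id": 1, "name": "Ada", "city": "London"}]
inserted = [
    {"id": 4, "name": "Dee", "city": "London"},
    {"id": 5, "name": "Eli", "city": "Hanoi"},
]

refreshed = refresh(snapshot, inserted, deleted, in_london, columns)

# For comparison only: a full recalculation over the new base relation.
base_new = [r for r in base_old if r["id"] != 1] + inserted
recomputed = derive(base_new, in_london, columns)
```

The refreshed snapshot matches the full recalculation, but only the changed rows had to be examined.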