Distributed database systems

Distributed database systems (DDBS) are systems whose data is distributed and replicated over several locations, unlike a centralized database system (CDBS), where a single copy of the data is stored at one site. Data may be distributed over a network using horizontal and vertical fragmentation, which correspond to the selection and projection operations, respectively, as expressed in Structured Query Language (SQL). Both types of database share the same problems of access control and transaction management, such as concurrent user access control and deadlock detection and resolution. A DDBS, however, must also cope with additional problems.
Access control and transaction management in a DDBS require different rules to monitor data retrieval and updates across distributed and replicated databases. Oracle, a leading database management system (DBMS), employs the two-phase commit technique to maintain a consistent state across the databases. The objective of this paper is to explain transaction management in a distributed DBMS (DDBMS) and how Oracle implements this technique.
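The correspondence between fragmentation and the relational operations mentioned above can be illustrated with a small sketch. The EMPLOYEE relation, its column names, and the helper functions below are hypothetical and chosen only for illustration; the sketch assumes every vertical fragment carries the key so that the relation can be rebuilt.

```python
# Hypothetical EMPLOYEE relation; column names and rows are illustrative only.
employees = [
    {"emp_id": 1, "name": "Ada", "site": "London", "salary": 5200},
    {"emp_id": 2, "name": "Ben", "site": "Paris", "salary": 4100},
    {"emp_id": 3, "name": "Cy", "site": "London", "salary": 3900},
]


def horizontal_fragment(rows, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, key, columns):
    """Projection: keep the key plus a subset of columns from every row."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


def join_on_key(left, right, key):
    """Rebuild rows from two vertical fragments that share the key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


# Horizontal fragments: one per site.
london = horizontal_fragment(employees, lambda r: r["site"] == "London")
paris = horizontal_fragment(employees, lambda r: r["site"] == "Paris")

# Vertical fragments: public columns and payroll columns, both keyed by emp_id.
public = vertical_fragment(employees, "emp_id", ["name", "site"])
payroll = vertical_fragment(employees, "emp_id", ["salary"])

# Horizontal fragments are recombined by union, vertical fragments by join.
rebuilt_h = sorted(london + paris, key=lambda r: r["emp_id"])
rebuilt_v = join_on_key(public, payroll, "emp_id")
```

In this sketch both reconstructions return the original relation, which is the property a fragmentation scheme is expected to preserve.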

Advantages of Distributed DBS
Since organizations tend to be geographically dispersed, a DDBS fits the organizational structure better than a traditional centralized DBS. Each location will have its local data as well as the ability to get needed data from other locations via a communication network. Moreover, the failure of a server at one site will not render the distributed database system inaccessible; only the affected site is directly involved with the failed server.
If any data is required from a site exhibiting a failure, such data may be retrieved from other locations containing the replicated data. The performance of the system will improve, since the CPU and I/O load is shared among several machines. Also, the expansion of the distributed system is relatively easy, since adding a new location does not affect the existing ones.
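The availability argument above, reading from another location when one site has failed, can be sketched as follows. The Site class, the SiteDown error, and the read_with_failover function are hypothetical names introduced for illustration; the sketch assumes the failed site is detected immediately and that all replicas hold the same value.

```python
class SiteDown(Exception):
    """Raised when a site cannot be reached (hypothetical error type)."""


class Site:
    """A site holding a replica of some data items."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


def read_with_failover(sites, key):
    """Try each site holding a replica, in order, until one answers."""
    for site in sites:
        try:
            return site.read(key), site.name
        except SiteDown:
            continue
    raise SiteDown("no reachable replica for " + repr(key))


replicas = [
    Site("site_a", {"cust:7": "Acme"}, up=False),  # simulated failed site
    Site("site_b", {"cust:7": "Acme"}),
]
value, served_by = read_with_failover(replicas, "cust:7")
```

Here the read is served by the second site because the first is marked as failed.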
Disadvantages of Distributed DBS
On the other hand, a DDBS has several disadvantages.
A distributed system usually exhibits more complexity and costs more than a centralized one. This is because the hardware and software involved must maintain a reliable and efficient system. All replication and data retrieval across sites should be transparent to the user. The cost of maintaining the system is considerable, since technicians and experts are required at every site.
Another main disadvantage of distributed database systems is security. Handling security across several locations is more complicated. In addition, the communication between sites may be tapped.

Failures in Distributed DBS
Several types of failures may occur in distributed database systems:
Transaction Failures: When a transaction fails, it aborts, and the database must be restored to the state it was in before the transaction started. Transactions may fail for several reasons; some failures are due to deadlock situations or concurrency control algorithms.

Site Failures: Site failures are usually due to software or hardware failures. These failures result in the loss of the main memory contents. In a distributed database, site failures are of two types:
1) Total failure, where all the sites of a distributed system fail.
2) Partial failure, where only some of the sites of a distributed system fail.
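Returning to transaction failures, the requirement that an aborted transaction leave the database in its pre-transaction state can be sketched with a simple undo log. The TinyStore class, the AbortError exception, and the transfer function are hypothetical; the sketch assumes a single-threaded, in-memory store and is not a description of how any particular DBMS implements recovery.

```python
class AbortError(Exception):
    """Raised to signal that the current transaction must abort."""


class TinyStore:
    """In-memory key-value store with an undo log (hypothetical, single-threaded)."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self._undo = None  # list of (key, existed, old_value) while a transaction is open

    def begin(self):
        self._undo = []

    def write(self, key, value):
        # Record the before-image first so the write can be undone.
        self._undo.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        self._undo = None

    def rollback(self):
        # Undo in reverse order to restore the pre-transaction state.
        for key, existed, old in reversed(self._undo):
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)
        self._undo = None


def transfer(store, src, dst, amount):
    """Move amount from src to dst; abort and roll back on overdraft."""
    store.begin()
    try:
        store.write(src, store.data[src] - amount)
        if store.data[src] < 0:
            raise AbortError("insufficient funds")
        store.write(dst, store.data.get(dst, 0) + amount)
        store.commit()
        return True
    except AbortError:
        store.rollback()
        return False


store = TinyStore({"a": 100, "b": 20})
ok = transfer(store, "a", "b", 30)       # commits
failed = transfer(store, "a", "b", 500)  # aborts and is rolled back
```

After the second call the store holds the same values as after the first, since the partial write made by the aborted transfer is undone.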

Media Failures: Such failures refer to the failure of secondary storage devices. The failure itself may be due to head crashes or controller failure. In these cases, the media failure makes part or all of the database stored on that secondary storage inaccessible.

Communication Failures: Communication failures, as the name implies, are failures in the communication system between two or more sites. This will lead to network partitioning where each site, or several sites grouped together, operates independently. As such, messages from one site won’t reach the other sites and will therefore be lost. The reliability protocols then utilize a timeout mechanism in order to detect undelivered messages. A message is undelivered if the sender doesn’t receive an acknowledgment. The failure of a communication network to deliver messages is known as performance failure.
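The timeout mechanism described above can be sketched as a sender that remembers when each message was sent and treats a message as undelivered once no acknowledgment has arrived within the timeout. The Sender class and its method names are hypothetical, and time is passed in explicitly as a plain number so that the example stays deterministic.

```python
class Sender:
    """Tracks unacknowledged messages; times are plain numbers (e.g. seconds)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}  # message id -> time sent

    def send(self, msg_id, now):
        self.pending[msg_id] = now

    def acknowledge(self, msg_id):
        # An acknowledgment removes the message from the pending set.
        self.pending.pop(msg_id, None)

    def undelivered(self, now):
        """Message ids whose acknowledgment has not arrived within the timeout."""
        return sorted(
            m for m, sent in self.pending.items() if now - sent >= self.timeout
        )


sender = Sender(timeout=5)
sender.send("m1", now=0)
sender.send("m2", now=1)
sender.acknowledge("m1")           # m1 was delivered and acknowledged
early = sender.undelivered(now=3)  # nothing has timed out yet
late = sender.undelivered(now=6)   # m2 has waited 5 time units without an ack
```

Note that a timeout alone cannot tell the sender whether the message was lost, the acknowledgment was lost, or the receiving site is merely slow; it only reports that no acknowledgment arrived in time.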

In a distributed DBMS, a view can be derived from distributed relations, and access to a view requires the execution of the distributed query corresponding to the view definition.
An important issue in a distributed DBMS is to make view materialization efficient.

View Management in Distributed DBMS
The definition of views in a DDBMS is similar to that in a centralized DBMS.
However, a view in a DDBMS may be derived from fragmented relations stored at different sites. Views are conceptually the same as base relations; therefore, their definitions are stored in the (possibly) distributed directory/catalogue.
Thus, views might be centralized at one site, partially replicated, or fully replicated.
Queries on views are translated into queries on base relations, yielding distributed queries due to possible fragmentation of data.

Views derived from distributed relations may be costly to evaluate. Since in a given organization it is likely that many users access the same views, some proposals have been made to optimize view derivation. View derivation is done by merging the view qualification with the query qualification. 
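The merging of the view qualification with the query qualification can be sketched as follows. The View class, the derive and run functions, and the EMP relation are hypothetical and used only for illustration; qualifications are modeled as Python predicates rather than SQL text, and the sketch assumes the view is a simple selection over one base relation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass(frozen=True)
class View:
    base: str                 # name of the base relation the view is defined on
    qualification: Predicate  # the view's selection condition


def derive(view: View, query_qualification: Predicate):
    """Rewrite a query on the view as a query on the base relation."""

    def merged(row: Row) -> bool:
        # Conjunction of the view qualification and the query qualification.
        return view.qualification(row) and query_qualification(row)

    return view.base, merged


def run(database: Dict[str, List[Row]], relation: str, qualification: Predicate):
    return [row for row in database[relation] if qualification(row)]


database = {
    "EMP": [
        {"eno": 1, "title": "Analyst", "salary": 4500},
        {"eno": 2, "title": "Engineer", "salary": 5000},
        {"eno": 3, "title": "Analyst", "salary": 3500},
    ]
}

# View: analysts only. Query on the view: salary above 4000.
analysts = View(base="EMP", qualification=lambda r: r["title"] == "Analyst")
relation, qualification = derive(analysts, lambda r: r["salary"] > 4000)
result = run(database, relation, qualification)
```

The query is evaluated directly against the base relation, so the view itself is never materialized in this sketch.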
An alternative solution is to avoid view derivation by maintaining actual versions of the views, called snapshots. A snapshot represents a particular state of the database and is therefore static, meaning that it does not reflect updates to the base relations. Snapshots are useful when users are not particularly interested in seeing the most recent version of the database. They are managed as temporary relations in the sense that they do not have access methods other than sequential scanning.
A query expressed on a snapshot will not exploit indices available on the base relations from which it is derived. Access through snapshots is therefore better suited to queries that have poor selectivity and scan the entire snapshot.
In this case, a snapshot behaves more like a predefined answer to a query. It is necessary to recalculate snapshots periodically. For snapshots derived by selection and projection, only the difference needs to be calculated.
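The idea of recalculating only the difference can be sketched for a snapshot defined by a selection and a projection. The Snapshot class, the change-log format ("upsert" and "delete" entries), and the sample rows are assumptions made for illustration; the sketch further assumes the projection keeps the key of the base relation so that changed rows can be matched to snapshot rows.

```python
def project(row, columns):
    return {c: row[c] for c in columns}


class Snapshot:
    """Snapshot defined by a selection predicate and a projection that keeps the key."""

    def __init__(self, key, columns, predicate):
        self.key, self.columns, self.predicate = key, columns, predicate
        self.rows = {}  # key value -> projected row

    def full_refresh(self, base_rows):
        """Recompute the whole snapshot from the base relation."""
        self.rows = {
            r[self.key]: project(r, self.columns)
            for r in base_rows
            if self.predicate(r)
        }

    def differential_refresh(self, changes):
        """Apply only the logged changes: ('upsert', row) or ('delete', key_value)."""
        for op, payload in changes:
            if op == "delete":
                self.rows.pop(payload, None)
            elif self.predicate(payload):
                self.rows[payload[self.key]] = project(payload, self.columns)
            else:
                # The row does not (or no longer does) qualify for the snapshot.
                self.rows.pop(payload[self.key], None)


base = [
    {"eno": 1, "name": "Ada", "site": "London"},
    {"eno": 2, "name": "Ben", "site": "Paris"},
]
snap = Snapshot("eno", ["eno", "name"], lambda r: r["site"] == "London")
snap.full_refresh(base)

# Changes made to the base relation since the last refresh.
changes = [
    ("upsert", {"eno": 3, "name": "Cy", "site": "London"}),  # new qualifying row
    ("upsert", {"eno": 1, "name": "Ada", "site": "Paris"}),  # row leaves the selection
    ("delete", 2),                                           # row was never in the snapshot
]
snap.differential_refresh(changes)
```

Only the three logged changes are examined during the refresh; the unchanged part of the base relation is not scanned again.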