Showing posts with label DBMS. Show all posts
Showing posts with label DBMS. Show all posts

Distributed database systems

Distributed database systems (DDBS) are systems that have their data distributed and replicated over several locations; unlike the centralized data base system (CDBS), where one copy of the data is stored. Data may be replicated over a network using horizontal and vertical fragmentation similar to projection and selection operations in Structured Query Language (SQL).Both types of database share the same problems of access control and transaction management, such as user concurrent access control and deadlock detection and resolution.  On the other hand, however, DDBS must also cope with different problems.
            Access control and transaction management in DDBS require different rules to monitor data retrieval and update to distributed and replicated databases. Oracle, as a leading Database Management Systems (DBMS) employs the two-phase commit technique to maintain a consistent state for the databases.  The objective of this paper is to explain transaction management in DDBMS and how Oracle implements this technique.

Advantages of Distributed DBS
Since organizations tend to be geographically dispersed, a DDBS fits the organizational structure better than traditional centralized DBS. Each location will have its local data as well as the ability to get needed data from other locations via a communication network. Moreover, the failure of one of the servers at one site won’t render the distributed database system inaccessible. The affected site will be the only one directly involved with that failed server.
If any data is required from a site exhibiting a failure, such data may be retrieved from other locations containing the replicated data. The performance of the system will improve, since several machines take care of distributing the load of the CPU and the I/O. Also, the expansion of the distributed system is relatively easy, since adding a new location doesn’t affect the existing ones.
 Disadvantages of Distributed DBS :
On the other hand, DDBS has several disadvantages.
A distributed system usually exhibits more complexity and cost more than a centralized one. This is true because the hardware and software involved need to maintain a reliable and an efficient system. All the replication and data retrieval from all sites should be transparent to the user. The cost of maintaining the system is considerable since technicians and experts are required at every site.
Another main disadvantage  of distributed database systems is the issue of security.Handling security across several locations is more complicated. In addition, the communication between sites may be
tapped to.

Failures in Distributed DBS :
Several types of failures may occur in distributed database systems:
Transaction Failures:When a transaction fails, it aborts. Thereby, the database must be restored to the state it was in before the transaction started.
Transactions may fail for several reasons. Some failures may be due to deadlock situations or concurrency control algorithms. Site Failures: Site failures are usually due to software or hardware failures. These failures result in the loss of the main memory contents. In distributeddatabase, site failures are of two types:
1).  Total Failure where all the sites of a distributed system fail,
2).  Partial Failure where only some of the sites of a distributed system fail.

Media Failures: Such failures refer to the failure of secondary storage devices. The failure itself may be due to head crashes, or controller failure. In these cases, the media failures result in the inaccessibility of part or the entire database stored on such secondary storage.

Communication Failures: Communication failures, as the name implies, are failures in the communication system between two or more sites. This will lead to network partitioning where each site, or several sites grouped together, operates independently. As such, messages from one site won’t reach the other sites and will therefore be lost. The reliability protocols then utilize a timeout mechanism in order to detect undelivered messages. A message is undelivered if the sender doesn’t receive an acknowledgment. The failure of a communication network to deliver messages is known as performance failure.

A distributed DBMS, a view can be derived from distributed relations, and the access to a view requires the execution of the distributed query corresponding to the view definition.
An important issue in a distributed DBMS is to make view materialization efficient.

View Management in Distributed DBMS
Definition of views in DDBMS is similar as in centralized DBMS
        However, a view in a DDBMS may be derived from fragmented relations stored at different sites Views are conceptually the same as the base relations; therefore we store them in the (possibly) distributed directory/catalogue
        Thus, views might be centralized at one site, partially replicated, fully replicated.
        Queries on views are translated into queries on base relations, yielding distributed queries due to possible fragmentation of data. 

Views derived from distributed relations may be costly to evaluate. Since in a given organization it is likely that many users access the same views, some proposals have been made to optimize view derivation. View derivation is done by merging the view qualification with the query qualification. 
An alternative solution proposed is to avoid view derivation by maintaining actual version of the views, called snapshots. A snapshot represent a particular state of the database and is therefore static, meaning that it does not reflect update to base
relation.Snapshots are useful when users are not particularly interested in seeing the most recent version of the database. They are managed as temporary relations in the sense that they do not have access methods other than sequential scanning.
Query expressed on a snapshot will not exploit indices available on the base relation from which it is derived. Access through snapshots seems more adequate for queries that have bad selectivity and scan the entire snapshot.
In this case a snapshot behaves more like a predefined answer to a query. It is necessary to recalculate snapshots periodically. Snapshots derived by selection andprojection, only the difference needs to be calculated. 


Database link in DBMS

Use the CREATE DATABASE LINK statement to create a database link. A database link is a schema object in one database that enables you to access objects on another database. The other database need not be an Oracle Database system. However, to access non-Oracle systems you must use Oracle Heterogeneous Services.
Once you have created a database link, you can use it to refer to tables and views on the other database. In SQL statements, you can refer to a table or view on the other database by appending @dblink to the table or view name. You can query a table or view on the other database with the SELECT statement. You can also access remote tables and views using any INSERT, UPDATE, DELETE, or LOCK TABLE statement.

To create a private database link, you must have the CREATE DATABASE LINK system privilege. To create a public database link, you must have the CREATE PUBLIC DATABASE LINK system privilege. Also, you must have the CREATE SESSION system privilege on the remote Oracle database.

Oracle Net must be installed on both the local and remote Oracle databases.

Syntax:

CREATE DATABASE LINK link_name
   CONNECT TO user_id IDENTIFIED BY pass_word
   USING 'query_string';

 

Types of Database links

 Private database link
This link is more secure than a public or global link, because only the owner of the private link, or subprograms within the same schema, can use the link to access the remote database.
Syntax:
sql> CREATE DATABASE LINK link_name CONNECT TO user_id IDENTIFIED BY pass_word USING 'query_string';
 Public database link
 When many users require an access path to a remote Oracle Database, you can create a single public database link for all users in a database.
Syntax: SQL> CREATE PUBLIC DATABASE LINK publink
        2  USING 'orcl1';
 Global database link
 When an Oracle network uses a directory server, an administrator can conveniently manage global database links for all databases in the system. Database link management is centralized and simple.


Normalization in DBMS

Normalization is the process of taking data from a problem and reducing it to a set of relations while ensuring data integrity and eliminating data redundancy
  • Data integrity - all of the data in the database are consistent, and satisfy all integrity constraints.
  • Data redundancy – if data in the database can be found in two different locations (direct redundancy) or if data can be calculated from other data items (indirect redundancy) then the data is said to contain redundancy.
Normalization Avoids
  • Duplication of Data  – The same data is listed in multiple lines of the database
  • Insert Anomaly  – A record about an entity cannot be inserted into the table without first inserting information about another entity – Cannot enter a customer without a sales order
  • Delete Anomaly – A record cannot be deleted without deleting a record about a related entity.  Cannot delete a sales order without deleting all of the customer’s information.
  • Update Anomaly – Cannot update information without changing information in many places.  To update customer information, it must be updated for each sales order the customer has placed
For the Normalization process E.F. Codd (1972) developed one technique.
It often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.

Functional Dependencies
Ø  Full Dependency
o   In a relation, the attribute(s) B is fully functional dependent on A if B is functionally dependent on A,
o   but not on any proper subset of A.

Ø  Partial Dependency
o   If there is some attribute that can be removed from A and the dependency still holds.
o   Eg. Staff_No, Sname -> Branch_No

Ø  Transitive Dependency
o   In a relation, if attribute(s) A->B and B->C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C)
o   Eg. Staff_No->Branch_No and Branch_No->BAddress



The Process of Normalization


Ø  Normalization: Formal technique for analyzing a relation based on its primary key and the functional dependencies between the attributes of that relation.Ø  Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

Ø  As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.

Here in below fig. there all normal form & relationship between them is shown.
Unnormalized Normal Form(UNF)

·       A table that contains one or more repeating groups.
To create an unnormalized table Transform the data from the information source (e.g. form) into table format with columns 
First Normal Form
1NF - A relation in which the intersection of each row and column contains one and only one value.

Converting from UNF to 1NF:

Ø  Select attribute(s) to act as the key.
Ø  Identify the repeating group(s) in the unnormalized table which repeats for the key attribute(s).
Ø  Remove the repeating group by
       Entering data into empty columns of rows which contain the repeating data.

Ø  Or by
       Placing the repeating data along with a copy of the original key attribute(s) into a separate relation.
 Second Normal Form

Based on the concept of full functional dependency.
A 2NF relation is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.

Converting from 1NF to 2NF:
Ø  Identify the primary key for the 1NF relation.
Ø  Identify the functional dependencies in the relation.
Ø  If partial dependencies exist on the primary key remove them by placing then in a new relation along with a copy of their determinant.
 Third Normal Form (3NF)

Based on the concept of transitive dependency.
A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.

Converting from 2NF to 3NF:
Ø  Identify the primary key in the 2NF relation.
Ø  Identify functional dependencies in the relation.
Ø  If transitive dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their dominant.
Fourth Normal Form

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form(BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X Y, X is a super-key—that is, X is either a candidate keyor a superset thereof.

Multivalued dependencies

If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependency X Y signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xcyz combinations that occur in the table, we will find that xc is associated with the same y entries regardless of z.
A trivial multivalued dependencyX Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation.
Fifth normal form (5NF)

Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.
A join dependency *{A, B, … Z} on R is implied by thecandidate key(s) of Rif and only if each of A, B, …, Z is a super-key for R.
  

Cartesion producat :how to use query with examples



what is Cartesian Product? How to use it in query ?. give example of any table.
SQL Joins are used to relate information in different tables. A Join condition is a part of the sql query that retrieves rows from two or more tables. A SQL Join condition is used in the SQL WHERE Clause of select, update, delete statements.
The Syntax for joining two tables is:
SELECT col1, col2, col3...
FROM table_name1, table_name2;

If a sql join condition is omitted or if it is invalid the join operation will result in a Cartesian product. The Cartesian product returns a number of rows equal to the product of all rows in all the tables being joined. For example, if the first table has 20 rows and the second table has 10 rows, the result will be 20 * 10, or 200 rows. This query takes a long time to execute.

EX: there is two table EMP & PAY , each has 4 record when we performed Cartesian product then it give 16 record in output.
Ex:  there is two table EMP & PAY , each has 4 record when we join this two table & get two column Title & Ename