ABOUT
Grid computing has a potentially large need for replication. For
example, a data grid is a grid computing system that deals with data: it
requires intensive computation and analysis of large shared databases,
on the order of millions of gigabytes, across widely distributed
scientific communities. Data replication is one of the most useful
strategies for achieving high availability and fault tolerance, as well
as minimal access time, in a data grid. A digital library is a
collection of documents, services, and infrastructure in organized
electronic form, available on a distributed system. Replication in a
digital library can take several forms: replication of (collections of)
digital documents, meaning the creation of identical copies of digital
documents at a separate server site; replication of index data, meaning
the re-creation of index data structures at the separate site;
replication of user interface data, meaning the synchronized
presentation of user interface data and events at several
user interfaces, etc.
Replication has been studied extensively for relational databases. Sybase's replication product, among the earliest, has been available since 1993, and today most relational databases, such as Oracle, DB2, MySQL, and Microsoft SQL Server, provide their own replication solutions. Replication designs in relational databases mainly address high availability and disaster recovery, data consolidation for central auditing and analysis, data distribution for balancing access load, offline querying, and security. Shaped by years of experience and study of real commercial requirements, their replication functionality is flexible and advanced, offering solutions for copying complete or partial data objects, synchronous and asynchronous copying, update-everywhere, data conflicts, mass deployment, and so on. However, with the new demands arising in grids, the replication capabilities of these products seem limited in addressing scalability, dynamic change, and fault tolerance.
Various grid replication projects have been undertaken in recent years, among them the EDG Replica Manager, the Globus Data Replication Service, and SRB. These systems are mainly designed to move large blocks of scientific data closer to the applications in order to improve access efficiency. They implement a similar replication mechanism: on a request, first search for an appropriate resource or existing replica, normally by querying a centralized metadata index; then use an efficient transfer tool, e.g. GridFTP, to move the data. Most existing grid replication tools deal only with (read-only) files. SRB can manage data movement among heterogeneous databases, yet it does not support transaction-based replication, and its metadata search seems rigid and unsuitable for commercial use.
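The lookup-then-transfer mechanism shared by these grid replication systems can be illustrated with a minimal sketch. It is not the API of EDG, Globus, or SRB: the names `ReplicaCatalog`, `transfer`, and `fetch`, the site names, and the cost table are all hypothetical, and the transfer step is a placeholder standing in for a tool such as GridFTP. Registering the new copy in the index after the transfer is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ReplicaCatalog:
    """Hypothetical in-memory stand-in for a centralized metadata index."""

    # Maps a logical file name to the set of sites that hold a copy.
    locations: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, logical_name: str, site: str) -> None:
        self.locations.setdefault(logical_name, set()).add(site)

    def lookup(self, logical_name: str) -> Set[str]:
        # Return a copy so callers cannot mutate the index by accident.
        return set(self.locations.get(logical_name, set()))


def transfer(logical_name: str, source: str, destination: str) -> str:
    """Placeholder for an efficient transfer tool (assumption: no real I/O)."""
    return f"copied {logical_name}: {source} -> {destination}"


def fetch(
    catalog: ReplicaCatalog,
    logical_name: str,
    local_site: str,
    cost: Dict[Tuple[str, str], int],
) -> str:
    """Step 1: query the index. Step 2: move data from the cheapest replica."""
    replicas = catalog.lookup(logical_name)
    if not replicas:
        raise LookupError(f"no replica registered for {logical_name}")
    if local_site in replicas:
        return f"{logical_name} already at {local_site}"
    # Pick the source site with the lowest (hypothetical) transfer cost.
    source = min(replicas, key=lambda site: cost[(site, local_site)])
    result = transfer(logical_name, source, local_site)
    # Assumption: the new copy is recorded so later requests can reuse it.
    catalog.register(logical_name, local_site)
    return result


# Example use with made-up sites and costs.
catalog = ReplicaCatalog()
catalog.register("data.db", "site-a")
catalog.register("data.db", "site-b")
costs = {("site-a", "site-c"): 5, ("site-b", "site-c"): 20}
print(fetch(catalog, "data.db", "site-c", costs))
```

Because every request goes through the single `lookup` call, the sketch also makes visible why a centralized index can become a scalability and fault-tolerance concern as the number of sites grows.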
Unlike existing replication technologies, we focus on industry and business demand for grid computing capabilities, which enable users to address more business challenges and maximize their commercial benefit. We start by restructuring the conventional database replication mechanism. This may help the large number of commercial users understand grid concepts easily and move onto grids smoothly. For example, many business users would not want to abandon software they already use, in which they have invested heavily and which they know to be familiar and correct. This strategy will help them grid-enable their database access at minimal cost. In addition, software reuse can greatly reduce development and testing costs, preserving the features of the existing approach while delivering new, high-quality products more quickly.