The Language Archive
The REPLIX project is studying and implementing the next level of grid-based replication and synchronization, carried out at a logical level using iRODS.
Proper data replication is one of the most important tasks when preserving digital data for the future. Currently the MPI, like many other institutions, carries out data replication at a physical level, using tools such as AFS (Andrew File System) and rsync to preserve the stored data about cultures and languages. Such schemes operate purely at the physical level and are too limited to be used in future data service infrastructures.
Within a distributed data management system it is important to consider topics such as data authenticity and integrity, storage resource management, data replication, and access rights. Data authenticity and integrity can be controlled, for example, by attaching a reliable checksum to the persistent identifier (PID) that serves as the logical incarnation of an object within a repository, and then performing an integrity check upon any operation on the object.
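The integrity check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the REPLIX implementation: the PidRecord class, the in-memory registry and the example identifier are hypothetical, and SHA-256 is assumed as the checksum algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PidRecord:
    """Hypothetical PID record: an identifier plus the checksum registered for the object."""
    pid: str
    sha256: str


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def register(registry: dict, pid: str, data: bytes) -> PidRecord:
    """Store the checksum of the object under its PID (in-memory stand-in for a PID service)."""
    record = PidRecord(pid=pid, sha256=sha256_of(data))
    registry[pid] = record
    return record


def verify(registry: dict, pid: str, data: bytes) -> bool:
    """Integrity check: does the object still match the checksum attached to its PID?"""
    return registry[pid].sha256 == sha256_of(data)


registry = {}
original = b"annotated recording, session 1"
register(registry, "hdl:example/0001", original)

intact = verify(registry, "hdl:example/0001", original)            # unchanged object
corrupted = verify(registry, "hdl:example/0001", original + b"x")  # altered object
```

In a real deployment the checksum would live in the PID record held by the resolution service rather than in a local dictionary, and the check would run before each operation on the object.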
Resources exist as part of large collections, which are defined by complex metadata that refers to the individual resources by means of PIDs. Data replication needs to preserve these complex relationships, and it must be possible to replicate whole or partial collections together with their metadata contexts.
Data replication implies that source collections will be placed in different contexts, i.e. in different hierarchies. Any replication process therefore needs to ensure that the relative consistency of the collections is preserved. The process of replication also implies that the PID records need to be updated in a controlled manner, and only once the correctness of the copy has been verified.
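The two requirements above (preserving relative structure and updating the PID record only after verification) can be sketched as follows. This is an illustrative sketch using local directories in place of replication sites; the replicate function, the pid_locations dictionary and the example identifier are hypothetical and do not reflect the iRODS or Handle System interfaces.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source_root: Path, target_root: Path, relative: Path,
              pid_locations: dict, pid: str) -> bool:
    """Copy one resource to another hierarchy, keeping its path relative to the
    collection root. The PID entry is extended only if the copy verifies."""
    source = source_root / relative
    target = target_root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    if file_sha256(source) != file_sha256(target):
        target.unlink()  # discard the bad copy; the PID entry stays untouched
        return False
    pid_locations.setdefault(pid, []).append(str(target))
    return True


with tempfile.TemporaryDirectory() as tmp:
    # The same collection sits under different hierarchies at the two sites.
    source_root = Path(tmp) / "site_a" / "corpus"
    target_root = Path(tmp) / "site_b" / "mirrors" / "corpus"
    relative = Path("session1") / "recording.txt"
    (source_root / relative).parent.mkdir(parents=True)
    (source_root / relative).write_text("example content")

    pid_locations = {"hdl:example/0001": [str(source_root / relative)]}
    ok = replicate(source_root, target_root, relative, pid_locations, "hdl:example/0001")
    replica_count = len(pid_locations["hdl:example/0001"])
    replica_exists = (target_root / relative).exists()
```

Because the relative path is carried over unchanged, references between resources inside the collection remain valid even though the collection root differs between sites.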
A flexible and fine-grained permissions framework that spans the whole data infrastructure is required. Rights to access resources are defined at the source repository. Moreover, it must be ensured that data replication is accompanied by the replication of the associated access rights.
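As a rough sketch of what replicating access rights together with the data might look like, consider the following. The Resource class, the dictionary-based access control list and the principal names are assumptions made purely for illustration; they are not the permission model of iRODS or of the archive.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    pid: str
    content: bytes
    # Hypothetical ACL: principal (user or group name) -> set of permitted actions.
    acl: dict = field(default_factory=dict)


def replicate_with_rights(resource: Resource) -> Resource:
    """Create a replica that carries a copy of the rights defined at the source."""
    copied_acl = {principal: set(actions) for principal, actions in resource.acl.items()}
    return Resource(pid=resource.pid, content=resource.content, acl=copied_acl)


def may(resource: Resource, principal: str, action: str) -> bool:
    """Check whether a principal is allowed to perform an action on a resource."""
    return action in resource.acl.get(principal, set())


source = Resource(
    pid="hdl:example/0001",
    content=b"example content",
    acl={"researchers": {"read"}, "archivist": {"read", "write"}},
)
replica = replicate_with_rights(source)

researcher_can_read = may(replica, "researchers", "read")
guest_can_read = may(replica, "guest", "read")
```

The replica here holds a snapshot of the rights. Since rights are defined at the source repository, a real system would also have to propagate later changes to every replica.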
REPLIX will install iRODS at several replication sites and use its rule-based functionality to test and implement features such as those detailed above. Since PIDs are crucial to the system, we will use the Handle System-based services of the EPIC consortium and the service currently installed at the MPI.