Project overview and goals

Computational Grids [1] integrate the services and resources of distributed, heterogeneous virtual organisations. The integration of such systems is technically challenging but highly advantageous. Use of Grid technology in e-Science facilitates: (i) harnessing large-scale distributed computing power to solve otherwise intractable problems; (ii) sharing access to very large scientific datasets which are expertly managed; and (iii) networked access to highly specialised scientific instruments. Facilitating long-distance collaboration between scientific research groups in this way has led to breakthroughs in our scientific understanding of the world with attendant benefits for society, nature, medicine and the quality of human life.

The organisation and coordination of Grid systems does not come for free. Grid services need to be managed so that the users of services can discover the availability of service providers, agree on resource allocation and usage policies, and share and re-use technology infrastructure at high levels of utilisation. This coordination, agreement and use must be scalable, cost-effective and robust enough to cope with changes in both application software and server hardware and operating system deployment. To this end, Grid services are offered on an Open Grid Services Architecture (OGSA) [2] which is itself defined in terms of the Globus Toolkit™ [3] and Web Services [4].

The purpose of this project is to develop methods, invent algorithms, and engineer software infrastructure to equip requests for Grid services with irrefutable, accurate certificates which specify the quantity and type of resources which will be consumed if this request is serviced. Both service providers and service consumers would immediately benefit from such resource certificates.

Service providers can optimise the scheduling of their service provision on the basis of accurate a priori information about processing costs. Applications with high resource requirements may be delayed or even refused permission to execute entirely.

Application developers can investigate the resource usage of their programs on the basis of statically acquired information. If the resource usage of their job seems relatively small they may choose to use a small Grid service provider. If the resource usage of their job seems large or even impractical they may rework and optimise their implementation before sending an improved version for execution later.
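
The provider-side use of a certificate described above can be illustrated with a minimal sketch. It is not part of the proposed infrastructure: the names `ResourceCertificate`, `ProviderCapacity` and `admit`, the choice of CPU time and memory as the certified quantities, and the three-way run/delay/refuse policy are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RUN = "run"
    DELAY = "delay"
    REFUSE = "refuse"


@dataclass(frozen=True)
class ResourceCertificate:
    """Hypothetical certificate: upper bounds on what a job will consume."""
    cpu_seconds: float
    memory_mb: float


@dataclass(frozen=True)
class ProviderCapacity:
    """Hypothetical provider state: hard limits and what is free right now."""
    max_cpu_seconds: float
    max_memory_mb: float
    free_cpu_seconds: float
    free_memory_mb: float


def admit(cert: ResourceCertificate, cap: ProviderCapacity) -> Decision:
    """Decide what to do with a request before any of it is executed."""
    # Refuse outright if the certified bounds exceed the provider's hard limits.
    if cert.cpu_seconds > cap.max_cpu_seconds or cert.memory_mb > cap.max_memory_mb:
        return Decision.REFUSE
    # Delay if the job fits the hard limits but not the currently free resources.
    if cert.cpu_seconds > cap.free_cpu_seconds or cert.memory_mb > cap.free_memory_mb:
        return Decision.DELAY
    return Decision.RUN


# Example provider: at most 3600 CPU-seconds and 4096 MB per job,
# of which 600 CPU-seconds and 1024 MB are free at the moment.
capacity = ProviderCapacity(3600.0, 4096.0, 600.0, 1024.0)
small_job = ResourceCertificate(cpu_seconds=120.0, memory_mb=256.0)
large_job = ResourceCertificate(cpu_seconds=1800.0, memory_mb=512.0)
huge_job = ResourceCertificate(cpu_seconds=7200.0, memory_mb=512.0)
```

Under these assumed figures `admit` would run `small_job`, delay `large_job` and refuse `huge_job`. An application developer could apply the same comparison to a statically derived certificate before submission, in order to choose a suitably sized provider or to decide that the implementation needs further optimisation.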

This model of Grid service usage is in stark contrast to the present ad hoc, speculative approach whereby Grid users submit jobs for execution with only a vague notion of the run-time and resource consumption which will be required. In this setting, malfunction of user jobs due to violation of resource bounds is seen as an unfortunate and unforeseeable accident. In reality, such malfunctions are often entirely foreseeable, and this project will develop and bring to the fore the methods and technology used in the prediction and prevention of resource-based software failures.
