
Project overview and goals

Computational Grids [1] integrate the services and resources of distributed, heterogeneous virtual organisations. Integrating such systems is technically challenging but highly advantageous. Use of Grid technology in e-Science facilitates: (i) harnessing large-scale distributed computing power to solve otherwise intractable problems; (ii) sharing access to very large, expertly managed scientific datasets; and (iii) networked access to highly specialised scientific instruments. Facilitating long-distance collaboration between scientific research groups in this way has led to breakthroughs in our scientific understanding of the world, with attendant benefits for society, nature, medicine and the quality of human life.

The organisation and coordination of Grid systems does not come for free. Grid services need to be managed so that service users can discover which service providers are available, agree on resource allocation and usage policies, and share and re-use technology infrastructure at high levels of utilisation. This coordination, agreement and use must be scalable, cost-effective and robust enough to cope with changes in application software, server hardware and operating system deployment. To this end, Grid services are offered through the Open Grid Services Architecture (OGSA) [2], which is itself defined in terms of the Globus Toolkit™ [3] and Web Services [4].

The purpose of this project is to develop methods, invent algorithms, and engineer software infrastructure to equip requests for Grid services with irrefutable, accurate certificates that specify the quantity and type of resources that will be consumed if the request is serviced. Both service providers and service consumers would immediately benefit from such resource certificates.

This model of Grid service usage is in stark contrast to the present ad hoc, speculative approach, whereby Grid users submit jobs for execution with only a vague notion of the run-time and resource consumption they will require. In this setting, the malfunction of a user job due to violation of resource bounds is seen as an unfortunate and unforeseeable accident. In reality, such malfunctions are often entirely foreseeable, and this project will develop and bring to the fore the methods and technology used to predict and prevent resource-based software failures.
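The form of a resource certificate is not fixed at this stage of the proposal. As a minimal illustrative sketch only, assuming a certificate is simply a set of upper bounds on CPU time, peak memory and scratch disk (hypothetical fields chosen for illustration, not part of OGSA or the Globus Toolkit), a provider could compare a certified request against its available capacity before accepting it, so that a resource-bound violation is detected up front rather than discovered mid-run:

```python
from dataclasses import dataclass

# Hypothetical resource certificate: upper bounds on what a job will consume
# if serviced. Field names and units are illustrative assumptions only.
@dataclass(frozen=True)
class ResourceCertificate:
    cpu_seconds: float  # bound on total CPU time
    memory_mb: int      # bound on peak memory
    disk_mb: int        # bound on scratch disk usage

# Hypothetical description of what a provider can currently offer.
@dataclass(frozen=True)
class ProviderCapacity:
    cpu_seconds: float
    memory_mb: int
    disk_mb: int

RESOURCES = ("cpu_seconds", "memory_mb", "disk_mb")

def violations(cert: ResourceCertificate, cap: ProviderCapacity) -> list[str]:
    """Return the resources whose certified bound exceeds the provider's capacity.

    An empty list means the request can be accepted without a foreseeable
    resource-bound violation, assuming the certificate is accurate.
    """
    return [name for name in RESOURCES if getattr(cert, name) > getattr(cap, name)]

# Usage: this request is certified to need more memory than is available.
cert = ResourceCertificate(cpu_seconds=3600, memory_mb=2048, disk_mb=500)
cap = ProviderCapacity(cpu_seconds=7200, memory_mb=1024, disk_mb=10_000)
print(violations(cert, cap))  # ['memory_mb']
```

The check itself is trivial; the value in this model would come from the certificate's bounds being trustworthy, which is where the methods and algorithms described above would be needed.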

