CoreGRID Network of Excellence - European Grid Research

European Research Network on Foundations, Software Infrastructures and Applications
for large scale distributed, GRID and Peer-to-Peer Technologies

Institute on Architectural Issues

Research Objectives

Institute leader: Paraskevi Fragopoulou (fragopou@ics.forth.gr), FORTH

The goal of the institute on architectural issues is to deliver paradigms, methods, and prototypes for the creation of future GRID architectures. Particular focus is given to three aspects: scalability, adaptability, and dependability of future GRIDs. Research within this institute will be conducted concurrently on these three aspects (and possibly others), with periodic integration of the approaches and interoperability testing.

Realizing the vision of next generation GRIDs poses many architectural challenges. Compared to classical distributed systems, the scale (up to millions of nodes), dynamicity, and node variety (sensors, mobile devices, supercomputers) of next generation GRIDs pose major challenges in system design. The systems must not only scale well, but also be robust in the face of both node volatility and malicious nodes or users. At the same time, the systems must stay manageable, which will require a large degree of self-organization. To face these challenges and evaluate the applicability of emerging methods, we have arranged the research issues in the system architecture institute into three key fields:

Scalability: Currently, GRID systems are based on the traditional client-server paradigm, which may not be able to scale to millions of nodes. Several researchers have pointed out this limit in scalability. Based on these observations, the goal of this research is to study the limitations on scalability of existing GRID architectures and to design, prototype, and test scalable approaches for key system components of a GRID infrastructure such as the resource discovery engine, scheduler, and security mechanisms.

Adaptability: The existing architecture of GRID systems features virtually no mechanisms for automatic adaptation of the systems or their parts to new internally or externally imposed conditions. To develop an “adaptive GRID” it is necessary to provide mechanisms for automated adaptation and reconfiguration of GRIDs at all hierarchy levels. Such mechanisms include monitoring the state of the GRID, analyzing it and making decisions, executing those decisions “intelligently”, and finally delegating control to human operators.
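The chain of mechanisms listed above (monitoring, analysis, decision, execution, delegation to operators) can be pictured as a simple control loop. The following is only a minimal sketch under stated assumptions: the `NodeState` record, the load threshold, and the action names are hypothetical illustrations, not part of any CoreGRID prototype.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Monitored snapshot of one node (hypothetical fields)."""
    name: str
    load: float       # utilization as a fraction between 0 and 1
    reachable: bool


def analyze(state, high_load=0.9):
    """Analysis step: turn raw monitoring data into a diagnosis."""
    if not state.reachable:
        return "unreachable"
    if state.load > high_load:
        return "overloaded"
    return "ok"


def decide(diagnosis):
    """Decision step: map a diagnosis to an action (names are assumptions)."""
    actions = {
        "ok": "none",
        "overloaded": "migrate_jobs",   # automated reconfiguration
        "unreachable": "escalate",      # delegate control to a human operator
    }
    return actions[diagnosis]


def adaptation_step(states):
    """One pass of the loop: monitor -> analyze -> decide, per node."""
    return {s.name: decide(analyze(s)) for s in states}


if __name__ == "__main__":
    snapshot = [
        NodeState("a", 0.20, True),
        NodeState("b", 0.95, True),
        NodeState("c", 0.00, False),
    ]
    print(adaptation_step(snapshot))
```

In this sketch the execution step is left out on purpose; a real system would act on the returned actions and feed the outcome back into the next monitoring round.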

Dependability: One of the main challenges for GRID computing is the ability to tolerate failures and recover from them (ideally in a transparent way). Current GRID middleware still lacks mature fault-tolerance features, and next-generation GRIDs need to solve this problem by providing a more dependable infrastructure for executing large-scale computations on remote clusters and HPC systems.

Roadmap version 3 on Architectural Issues: Scalability, Dependability, Adaptability

Publications related to the Institute on Architectural Issues

Research Groups

Research Group | Leader | Participants
SA-1a: P2P Techniques for Resource Discovery in Grids | FORTH | FORTH, KTH, SICS, UNICAL, CNR-ICAR
SA-1b: Scalable Resource Location in P2P Systems | FORTH | FORTH, KTH, SICS, UCY, UoS
SA-1c: Foundations of Distributed Hash tables | ZIB | KTH, SICS, UCL, ZIB
SA-2a: Building Scalable Self-Organizing Services using P2P Technologies | SICS | FORTH, INRIA, KTH, SICS
SA-2b: Self-management for Applications Built on Structured Overlay Networks | UCL | KTH, SICS, UCL, VTT
SA-2c: Scalability for Desktop Grids | SZTAKI | INRIA, MTA SZTAKI, UCO, UoW (Univ. Cardiff)
SA-3a: Dependability Mechanisms for Desktop Grid Computing | INRIA | INRIA, MTA SZTAKI, UCO
SA-3b: Sabotage Tolerance in Desktop Grid Computing | UCO | UCO, CCLRC
SA-4a: Fault-injection and Robustness Assessment for Grid Services | INRIA | INRIA, UCO
SA-4b: Failure Management in Grids | UCY | FORTH, UCY, INFN
SA-5a: Modelling and Prediction of Workloads and System Behaviour | ZIB | INRIA, UCO, ZIB, UPC
SA-5b: Self-healing SOA and Grid Architectures | UCO | INRIA, ZIB, UPC, UCO

Latest Research Highlights

Using Virtual Machines in Desktop Grid Clients for Application Sandboxing

CoreGRID Technical Report TR-0140:

Desktop Grids harvest the computing power of idle desktop computers, whether these are volunteered or deployed at an institution. Allowing foreign applications to run on these resources requires the sender of the application to be trusted, but trust in goodwill is never enough. An efficient solution is to provide a secure, isolated execution environment that does not impose any additional burden on either administrators or users. Currently, Desktop Grids do not provide such a facility. In this report we describe our approach to providing a platform-independent and transparent sandbox mechanism for Desktop Grids. We define the requirements for transparency and present a prototype that fulfills these criteria.

Managing Performance of Aging Applications via Synchronized Replica Rejuvenation

CoreGRID Technical Report TR-0143:

We investigate the problem of ensuring and maximizing performance guarantees for applications suffering from software aging. Our focus is the optimization of the minimum and average performance of such applications in virtualized and non-virtualized scenarios. The key technique is to use a set of simultaneously active application replicas and to optimize their rejuvenation schedules. We derive an analytical method for maximizing the minimum “any-time” performance for certain cases and propose a heuristic method for maximizing minimum and average performance for all others. To evaluate our method we perform extensive studies on two applications: aging profiles of Apache Axis 1.3 and the aging data of the TPC-W benchmark instrumented with a memory leak injector. The results show that our approach is a practical way to ensure uninterrupted availability and optimize performance even for strongly aging applications.
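To make the idea of coordinating rejuvenation schedules across replicas concrete, the sketch below compares two schedules by their minimum "any-time" aggregate performance. It is an illustration only, not the analytical or heuristic method of the report: the linear aging model, the parameter values, and all function names are assumptions.

```python
def replica_performance(t, offset, period, downtime, decay):
    """Performance of one replica at time t under an assumed linear aging model.

    The replica restarts every `period` time units (shifted by `offset`),
    is unavailable for `downtime`, and then degrades linearly with age.
    """
    phase = (t - offset) % period
    if phase < downtime:
        return 0.0                      # replica is being rejuvenated
    age = phase - downtime
    return max(0.0, 1.0 - decay * age)  # fresh replica performs at 1.0


def min_total_performance(offsets, period, downtime, decay, step=0.01):
    """Minimum summed performance of all replicas, sampled over one period."""
    n_steps = int(round(period / step))
    return min(
        sum(replica_performance(k * step, off, period, downtime, decay)
            for off in offsets)
        for k in range(n_steps)
    )


def staggered_offsets(n, period):
    """Spread the restarts of n replicas evenly over one period."""
    return [i * period / n for i in range(n)]


if __name__ == "__main__":
    period, downtime, decay, n = 10.0, 1.0, 0.05, 3
    all_at_once = [0.0] * n
    staggered = staggered_offsets(n, period)
    print("all at once:", min_total_performance(all_at_once, period, downtime, decay))
    print("staggered:  ", min_total_performance(staggered, period, downtime, decay))
```

Under these assumptions, restarting every replica at the same instant drives the minimum aggregate performance to zero, whereas evenly spaced restarts keep at least some replicas serving at all times. The report goes further and optimizes the schedule itself.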

Using Machine Learning for Non-Intrusive Modeling and Prediction of Software Aging

CoreGRID Technical Report TR-0142:

The widespread phenomenon of software (running image) aging is known to cause performance degradation, transient failures, or even crashes of applications. In this work we first describe a method for monitoring and modeling performance degradation in SOA applications, particularly application servers. This method works for a large class of aging processes caused by resource depletion (e.g. memory leaks). It can be deployed non-intrusively in a production environment, under arbitrary service request distributions. Based on this schema, we investigate in the second part of the paper how machine learning (classification) algorithms can be used for proactive detection of performance degradation or sudden drops caused by aging. We combine the predictive power of these algorithms with several techniques that make the measurement-based aging models more adaptive and more robust against transient failures. We evaluate several state-of-the-art classification methods for their accuracy and computational efficiency in this scenario. The studies are performed on a data set generated by a TPC-W benchmark instrumented with a memory leak injector. The results show that the probing method yields accurate aging models with low overhead and that the machine learning approach gives statistically significant short-term predictions of degrading application performance. Both approaches can be used directly to fight aging via adaptive software rejuvenation (restart of the application), for operator alerting, or for short-term capacity planning.
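As a rough illustration of measurement-based aging prediction for resource depletion, the sketch below fits a least-squares line to free-memory samples and extrapolates the time left until a floor is reached. This is a deliberately simpler stand-in for the classification algorithms evaluated in the report; the function names, the linear-trend assumption, and the sample data are all hypothetical.

```python
def linear_fit(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("need samples taken at two or more distinct times")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return slope, intercept


def time_to_exhaustion(times, free_memory, floor=0.0):
    """Estimated time from the last sample until free memory hits `floor`.

    Returns None when the fitted trend is flat or rising, i.e. when no
    depletion is predicted under the linear-trend assumption.
    """
    slope, intercept = linear_fit(times, free_memory)
    if slope >= 0:
        return None
    t_exhaust = (floor - intercept) / slope
    return max(0.0, t_exhaust - times[-1])


if __name__ == "__main__":
    # Hypothetical samples: free memory (MB) measured once per minute.
    minutes = [0, 1, 2, 3]
    free_mb = [100, 90, 80, 70]
    print(time_to_exhaustion(minutes, free_mb))
```

Such an estimate could, for example, trigger a restart or an operator alert once the predicted time drops below a chosen threshold. Real aging data is noisier than this example, which is one reason the report turns to more robust learning methods.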

© 2012 CoreGRID Network of Excellence - European Grid Research