Brain Meets Brawn: Why Grid and Agents Need Each Other

Ian Foster I., Nicholas R., Kesselman C.
3rd Int. Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS 2004), New York, USA, 2004.

http://www.semanticgrid.org/documents/003-foster_i_grid.pdf

Abstract

The Grid and agent communities both develop concepts and mechanisms for open distributed systems, albeit from different perspectives. The Grid community has historically focused on “brawn”: infrastructure, tools, and applications for reliable and secure resource sharing within dynamic and geographically distributed virtual organizations. In contrast, the agents community has focused on “brain”: autonomous problem solvers that can act flexibly in uncertain and dynamic environments. Yet as the scale and ambition of both Grid and agent deployments increase, we see a convergence of interests, with agent systems requiring robust infrastructure and Grid systems requiring autonomous, flexible behaviors. Motivated by this convergence of interests, we review the current state of the art in both areas, review the challenges that concern the two communities, and propose research and technology development activities that can allow for mutually supportive efforts.

1 Introduction

In open distributed systems, independent components cooperate to achieve individual and shared goals. Both individual components and the system as a whole are designed to cope with change and evolution in the number and nature of the participating entities. Such systems are important in many contexts, from large scientific collaborations to enterprise systems and sensor networks.

The Grid and agent communities are both pursuing the development of such open distributed systems, albeit from different perspectives. The Grid community [12] has historically focused on what we refer to here as “brawn”: interoperable infrastructure and tools for secure and reliable resource sharing within dynamic and geographically distributed virtual organizations (VOs) [14], and applications of the same to various resource federation scenarios. In contrast, those working on agents have focused on “brains,” i.e., on the development of concepts, methodologies, and algorithms for autonomous problem solvers that can act flexibly in uncertain and dynamic environments in order to achieve their aims and objectives [21]. A key component of this research is motivated by the fact that such agents are often required to form themselves into collectives (i.e., VOs) and act in a coordinated manner. This need to support aggregation has, in turn, led to much research into rich and flexible mechanisms for managing such interactions.

As these two communities mature and turn their attention to fundamental problems of scope, both are encountering challenging problems in terms of scale and application. This maturation process is causing an increasing overlap in the problems that they address. Specifically, current Grid systems are somewhat rigid and inflexible in terms of their interoperation and their interactions, while agent systems are typically not engineered as serious distributed systems that need to scale, that are robust, and that are secure [34]. Nevertheless, each is working its way towards the others’ territory, as Grids seek to become more flexible and agile, and agent systems seek to be more reliable and scaleable.

Given this background, it is fruitful to examine work in these two domains, first to communicate to each community what has been done by the other, and second to identify opportunities for cross fertilization. We seek to take a first step towards that goal in this paper. To this end, we first review the state of the art in Grids and agents (Sections 2 and 3), compare and contrast the two approaches (Section 4), present a common vision of service-oriented architecture (Section 5), and conclude with a list of significant research challenges (Section 6).

Limited time and space require that we restrict ourselves in this article to the work being performed within the Grid and agents communities. Thus, we do not cover the highly relevant and interesting work pertaining to open distributed systems that can be found in other domains, including robotics, peer-to-peer networking, semantic web, distributed systems, artificial intelligence, and autonomic systems.

2 Grids

Grids aim to enable “resource sharing and coordinated problem solving in dynamic, multi-institutional VOs” [12]. In other words, Grids provide an infrastructure for federated resource sharing across trust domains. Much like the Internet on which they build, current Grids define protocols and middleware that can mediate access provided by this layer to discover, aggregate, and harness resources. These applications span a wide spectrum. Moreover, the standardization of the protocols and interfaces used to construct systems is an important part of the overall research and development program.

2.1 Technologies

Grid technologies have evolved through at least three distinct generations: early ad hoc solutions, de facto standards based on the Globus Toolkit (GT), and the current emergence of more formal Web services (WS)- based standards within the context of the Open Grid Services Architecture (OGSA) [13].

OGSA adopts WS standards such as Web Services Description Language (WSDL) as a basis for a serviceoriented architecture within which arbitrary services can be defined, discovered, and invoked in terms of their interfaces rather than their implementations. This approach provides a basis for virtualization, interoperability, and composition.

The Grid community has participated in, and in some cases led, the development of WS specifications that address other Grid requirements. The WS-Resource Framework (WSRF) defines uniform mechanisms for defining, inspecting, and managing remote state, a crucial concern in many settings. WSRF mechanisms underlie work on service management (WSDM, in OASIS) and negotiation (WS-Agreement, in GGF), efforts that are crucial to the Grid vision of large-scale, reliable, and interoperable Grid applications and services. Other relevant efforts are aimed at standardizing interfaces to data, computers, and other classes of resources.

Work on Grid-related standards is driven by, and influences, the work of a vibrant open source community. GT (in its most recent instantiation, Web services-based and WSRF-compliant) provides basic middleware to create VOs, addressing such issues as specification and enforcement of VO wide policy, discovery, provisioning and management of services and resources, and federation, replication, discovery, and movement of data. At deployment, depending on available resources and planned applications, specific service implementations can be chosen and deployed, often in conjunction with other GT-based components.

Grid technology R&D has produced specifications and technologies for realizing service-oriented architectures according to robust distributed system principles. Global control mechanisms able to deal reliably with failure and adapt to changing environmental conditions and application concerns have been a lesser concern.

2.2 Applications

Early application drivers were largely from scientific computing [6, 10, 19], and included large-scale distributed computing [2, 15] (federation of computers), integration of large-scale data repositories (data grids [7]), collaboration [31], and tele-instrumentation [23, 26]. More recently, the technology has seen considerable uptake in industry as a means of addressing issues of virtualization and distributed system management [13].

GT is in production use across VOs integrating resources from 20-50 sites with thousands of computational and data resources, and is expected to scale to 100s of sites with 1000s of sites as a future goal. In the remainder of this section, we list a few examples to show the range and scope of Grid deployments.

The U.S. Network for Earthquake Engineering Simulation Grid (NEESgrid) connects experimental facilities (e.g., shake tables), data archives, computers, and a user community of earthquake engineers. Its service-oriented architecture defines standard interfaces for telepresence, monitoring, and control of remote scientific instruments, and for publishing, discovering, and accessing data produced by these instruments [26]. NEESgrid experiments have linked facilities at three sites and more than 50 remote participants.

Grid3 [15] links 28 sites with clusters totaling some 3000 processors. These resources are used by science communities from high energy physics, astronomy, biology, chemistry, and computer science for large-scale simulation and data analysis computations.

In contrast, Access Grid [31] is focused on interpersonal communication, via sharing of audio, video, and applications within collaborative spaces. Grid technologies are used in Access Grid for such purposes as security, discovery, and resource management.

Butterfly.net is creating a GT-based provisioning infrastructure for multiplayer online games, in which the demands for computation, storage, and network resources can vary dramatically as the popularity of games changes over time [24]. As a second example of a commercial Grid deployment, GlobeXplorer is using GT to support integration and processing of satellite image data [17].

Experiences with such applications reveal issues that must be addressed if Grids are to be scaled to larger communities, more diverse resources, and more complex applications. We review those challenges in Section 6.

3 Agent-Based Computing

An agent “is an encapsulated computer system that is situated in some environment, and that is capable of flexible, autonomous action in that environment in order to meet its design objectives” [33]. In more detail [21], agents are: (i) clearly identifiable problem solving entities with well-defined boundaries and interfaces; (ii) situated (embedded) in a particular environment—they receive inputs related to the state of their environment through sensors and they act on the environment through effectors; (iii) designed to fulfill a specific role—they have particular objectives to achieve and have particular problem solving capabilities (services) that they can bring to bear to this end; (iv) autonomous—they have control both over their internal state and over their own behavior; and (v) capable of exhibiting flexible problem solving behavior in pursuit of their design objectives—they need to be both reactive (able to respond in a timely fashion to changes that occur in their environment) and proactive (able to opportunistically adopt goals and take the initiative).

When adopting an agent-oriented view of the world, it soon becomes apparent that most problems require or involve multiple agents: to represent the decentralized nature of the problem, multiple loci of control, multiple perspectives, or competing interests. Moreover, these agents need to interact, either to achieve their individual objectives or to manage the dependencies that ensue from being situated in a common environment. Thus, in any given system there may be both cooperative and selfish agents whose aims are, respectively, to maximize the social welfare of the system and to maximize their own individual return. These interactions are built on some form of semantic integration (Section 2.3), may well involve trust relationships, and also include the traditional service discovery and invocation discussed above, as well as the more sophisticated social interactions related to the ability to cooperate, coordinate and negotiate about which services are performed by which agents at what time.

In the majority of cases, agents act to achieve objectives either on behalf of individuals (or companies) or as part of some wider problem solving initiative. (Note the similarity to the VO concept.) Thus, when agents interact there is typically some underpinning organizational context that defines the relationship among them. For example, agents may be peers working together in a team or one may be the manager of the other agents. To capture such links, agent systems typically have explicit constructs for modeling organizational relationships or roles such as peer, manager, or team member. In many cases, these relationships are subject to ongoing change: social interaction means existing relationships evolve (e.g., a team of peers may elect a leader) and new relations are created (e.g., a number of agents may form a VO to deliver a particular service that no one individual can offer). The temporal extent of these relationships can also vary enormously: from just long enough to deliver a particular service once, to a permanent bond.

Whatever the nature of the social process, there are two points that qualitatively differentiate agent interactions from those that occur in other computational models. First, agent-oriented interactions tend to be more sophisticated than in other contexts, dealing, for example, with notions of cooperation, coordination, and negotiation. Second, agents are flexible problem solvers, operating in an environment over which they have only partial control and observability. Thus, interactions need to be handled in a similarly flexible manner, and agents need the computational apparatus to make contextdependent decisions about the nature and scope of their interactions and to initiate (and respond to) interactions that were not foreseen at design time. The downside of this autonomy and flexibility, however, is that it is difficult to ensure that desirable global behaviors emerge. To this end, a range of techniques (such as reinforcement learning, mechanism design, and electronic institutions) are often deployed to try and impose greater order.

Drawing these points together, Figure 1 shows that adopting an agent-oriented approach to system engineering means decomposing the problem into multiple, interacting, autonomous components that have particular objectives to achieve and are capable of performing particular services. The key abstraction models that define the agent-oriented mindset are agents, interactions and organizations. Finally, explicit structures and mechanisms are often used to describe and manage the complex and changing web of organizational relationships that exist between the agents.

3.1 Technologies

In contrast to Grid computing, there is less focus on identifiable agent technologies that can be used off the shelf to build applications. Traditionally, more attention has been given to theories and models of how agents can be developed and how they can communicate, cooperate, and negotiate. This work has resulted in the development of a range of algorithms that can be used both to build individual agents and to manage their interactions. In the former case, algorithms and architectures have been developed that enable an agent to plan an effective course of action to achieve a goal in uncertain and unpredictable environments, to adapt its behavior to its prevailing circumstances, and to strike an effective balance between being too responsive (and continually changing its aim such that no task is ever completed) and too committed to its present course of action (such that more important activities are not dealt with in a timely fashion). In the latter case, algorithms have been developed that agents can use to achieve efficient negotiation outcomes, to form teams composed of the optimal set of parties, and to determine the degree of trust that should be placed in a particular agent, based upon its social and organizational relationships.

There has recently been an increasing trend towards making agent technology a serious basis for building complex, distributed systems. Several agent development environments support specific agent architectures and provide libraries of interaction protocols (e.g., JACK, JADE, Cougaar, and ZEUS), software engineering methodologies have been devised to analyze and design agent-based systems (e.g., Gaia, Tropos, and AUML), and there have been efforts to standardize various aspects of agent systems, such as inter-agent communication (e.g., FIPA, KQML). Moreover, as in the Grid community, there is an increasingly reliance on Web services and semantic web technologies for providing the computational infrastructure for such systems and an increasing acceptance of the importance of trust as a central issue in interaction.

3.2 Applications

Agent technology has been deployed in a number of isolated applications over the past ten years. However in the past few years the number and range of applications have increased significantly. In particular, many large companies are now interested in developing applications using agent technologies, and deployed applications exist for domains such as manufacturing, electronic commerce, process control, telecommunication systems, traffic and transportation management, information filtering and gathering, business process management, defense, entertainment and medical care [25].

4 Brains and Brawn

We see that a common thread underlies both agents and Grids, namely, the creation of communities or VOs bound together by a common goal or cause. Yet the two communities have focused on different aspects of this common problem. In the case of Grids, the primary concern has been the mechanisms by which communities form and operate. Thus, we see much effort devoted to how community standards are represented via explicit policy, how policy is enforced, how community members identify one another, how actions within the community are implemented, and how commitments by community members are specified, monitored and enforced. On the other hand, our understanding of how to use these mechanisms to create large-scale systems with stable collective behavior is less mature. For example, commonly used Grid tools provide uniform mechanisms for accessing data on different storage systems, but not for the semantic integration of that data; for accessing service and resource state, but not for anticipating, detecting, and diagnosing problems implied by changes to that state; and for securely authenticating users and services, but not for inferring whether or not specific users or services can be trusted to perform specific actions. To this extent, Grids are all brawn and no brain.

Agents also focus on creating community. Out of the flexible local decision making of system components, sensible community wide behaviors emerge through rich social interactions and explicit organizational structures. However in building all this flexibility and sophistication, scant attention has been paid to how these tasks should be performed in realistic distributed environments. For example, agent frameworks provide sophisticated internal reasoning capabilities, but offer no support for secure interaction or service discovery; cooperation algorithms produce socially optimal outcomes, but assume the agents have complete knowledge of all outcomes that any potential grouping can produce; and negotiation algorithms achieve optimal outcomes for the participating agents, but assume that all parties in the system are known at the outset of the negotiation and will not change during the system’s operation. Thus, one may say that agents are all brain and no brawn.

Clearly, neither situation is ideal: for Grids to be effective in their goals, they must be imbued with flexible, decentralized decision making capabilities. Likewise, agents need a robust distributed computing platform that allows them to discover, acquire, federate, and manage the capabilities necessary to execute their decisions. In other words, there are good opportunities for exploiting synergies between Grid and agents.

One approach to exploiting such synergies might be a simple layering of the technologies, i.e., to implement agent systems on top of Grid mechanisms. However, it seems more likely that the true benefits of an integrated Grid/agent approach will only be achieved via a more fine-grain intertwining of the two technologies, with Grid technologies becoming more agent-like and agent-based systems becoming more Grid-like.

As an early example of such a tighter coupling, we point to work on agent-based resource selection, in which re-enforcement-based learning is used to drive the assignment of tasks to resources [16]. In this case, the “agent” (i.e., the logic used to make the task assignment decisions) uses Grid functions for status monitoring, resource discovery, and task submission. The agent, in turn, provides a valuable Grid function, with the collection of agents implementing a robust global resource management behavior that might not otherwise be achieved. A second example is the use of automated negotiation techniques (specifically, various forms of auctions) to allocate resources in Grid systems [32]. Here, designers evaluate the effectiveness of both commodity market and Vickery auction protocols to the problem of allocating resources within a distributed system. This example also shows how techniques familiar to agents researchers can be integrated with other more standard components within a Grid architecture.

This level of integration will undoubtedly create new challenges for both agents and Grids. However, the result could be frameworks for constructing robust, large-scale, agile distributed systems that are qualitatively and quantitatively superior to the best current practice today.

5 Robust Agile Service-Oriented Systems

Having described key agent and Grid concepts, we now draw the two parallel lines of research together to highlight their commonalities and complementarities.

5.1 Autonomous Services

A core unifying concept that underlies Grids and agent systems is that of a service: an entity that provides a capability to a client via a well-defined message exchange [4]. Within third-generation Grids, service interactions are structured via Web service mechanisms, and thus all entities are services. However, while every agent can be considered a service (in that it interacts with other agents and its environment via message exchanges), we might reasonably state that not every Grid service is necessarily an agent (in that it may not participate in message exchanges that exhibit flexible autonomous actions).

This notion of autonomous action is thus central to the question of how agents and Grids can interoperate. To illustrate the issues, let us consider a service that encapsulates a database. In a local area network, we might find a version of this service that responds to requests to “read a record” or “write a record.” Such an implementation does not exhibit autonomous behavior.

On the other hand, in a more distributed, administratively heterogeneous, and failure-prone environment, the implementation of such a service might exhibit more sophisticated behavior. For example, the database might be replicated, with the number of replicas determined dynamically by knowledge-based models of system reliability and performance. Distributed negotiation protocols might be used to establish the query throughput achievable on individual copies, such that community throughput is optimized. Finally, distributed planning and scheduling algorithms might be used to map queries to specific database replicas so as to minimize the latency of user requests. In all these cases, a robust database service, designed to operate in an open distributed system, is exhibiting flexible autonomous actions (in the sense that its behaviors are not driven solely by a client request, but also by other considerations, including local policies and the outcomes of negotiations with the client). In short, such services will exhibit agent behavior.

5.2 Rich Service Models

Both agent and Grid systems consist of dynamic and stateful services. The underlying service model is dynamic in that new services can be created and destroyed over the lifetime of the system. Here an important contribution of Grid technologies is a robust lifetime and naming model for dynamic services [13]. Implicit in this model are the notion of service failure and the definition of a scalable distributed systems semantics. In contrast, agent-based systems rarely consider such issues, but they could clearly benefit from exploiting this approach to representing and managing dynamic services.

Statefulness is another important aspect of the service model. A stateful service (or, more-or-less equivalently, a resource [11]) has internal state that persists over multiple interactions. It can often be useful to make this state externally visible, so that, for example, another participant in a distributed system can determine the current load on a server, the policies that govern access to a service, and/or the schema(s) supported by a database. Again, Grid technologies have addressed this issue, defining a general model for representing and querying service state [11]. This model includes mechanisms for describing state “lifetime”, as well as a means of specifying and enforcing policy with respect to access and modification.

The Grid state model defines how state is represented and accessed, but does not speak to the structure or semantics of the state that is thus exposed. Typical practice is to define state in terms of fixed schema or attributes. In contrast, agent systems address semantics but do not provide a consistent state model. An integrated approach can allow for the publication of richer semantic information within the Grid state model, thus enhancing the ability of applications to discover, configure, and manage services in an interoperable manner [18].

5.3 Negotiation and Service Contracts

Negotiation is emblematic of the brain/brawn schism between current Grid and agent systems. In general, it cannot be assumed that a service will actually provide a particular capability to a user: a provider may be unable or unwilling to provide the service to a putative consumer. Hence, if the system is to have any type of predictable behavior, it becomes necessary to obtain commitments (contracts) about the willingness to provide a service and the characteristics, or quality, of its provision.

Given the ability to provision a resource to provide a desired level of service, we are faced with the question of exactly what levels of service can and should be obtained. The process by which this is determined will necessarily be some form of negotiation, since the autonomous entities involved need to come to a mutually acceptable agreement on the matter. If this negotiation is successful (i.e., both parties come to an agreement) then the outcome of the procurement is a contract (service level agreement) between the service provider and the service consumer.

This negotiation can be arranged in many different ways; there are millions of protocols, with varying properties, and agent researchers have invested significant effort in determining which protocols are appropriate in which circumstances [9]. In this context, the negotiation is driven by the operational policy of both the service provider and the service consumer. Specifically, policy terms to be considered may involve aspects such as the current load, the identity and reputation of the requestor, and the requestor’s ability to pay.

The use of negotiation as a means of establishing service contracts is a topic of considerable interest in both the agent [22] and Grid [8] communities. One promising approach within Grids has been to represent agreement as the creation of a shared policy statement and to define robust extensible protocols for exchanging and agreeing to policy terms. Creating these agreements in the face of a Byzantine failure model can be complex. Having designed such protocols, the next step is to determine the strategy that the system components should adopt to achieve their policy objectives. Strategies can vary from the simple (e.g., an agent bidding its true valuation for a service) to the complex (reasoning about the other participants and their likely strategies).

5.4 Virtual Organization Management

A common interaction modality in both Grid and agent systems occurs when several agents come together to form a new VO. Such VOs can be viewed as a form of dynamic service composition: a number of initially distinct entities come together, under a set of operating conditions, to form a new entity that offers a new service. In such cases, one of the key challenges is for the participating agents to determine who else should be involved in the coalition and what their various roles and responsibilities should be. Again, this activity typically involves negotiation among participants, in this case to determine a mutually acceptable agreement concerning the division of labor and responsibilities.

Dynamic creation also raises the issue of service discovery. Experience in the Grid community indicates that this discovery should not simply be on the basis of service type, but rather should incorporate notions of service state and should be based on an understanding of the capabilities of the service (i.e., semantics). While Grid technologies provide the means for describing and grouping services, these higher level matchmaking and discovery capabilities are not currently part of Grid infrastructure. Fortunately, this is an area where much work has been done in the space of agents, and thus incorporation of this technology would do much to improve matters. This integration may have an impact on how state is represented and how services are organized.

5.5 Authentication, Trust, and Policy

with dynamically created services has long been an integral part of Grid infrastructure. A common approach to this problem is to map identities into a global namespace and then apply delegation as a means for building federated namespaces for dynamically created entities. More recent work has focused on the application of richer policy statements and the creation of community based authorization and assertion authorities [27].

Also fundamental to the creation of collaboration and community, and building upon the aforementioned notions of authentication, are notions of trust. The effective management of trust and policy within a community, like VO formation, requires flexible, autonomous mechanisms able to consider, when organizing communities, not only the semantics of policy statements but also the ability to negotiate policy terms and to manage restricted delegation of rights.

As with other aspects of agents and Grids, we expect to see the adaptation of agent algorithms and technologies as they incorporate policy specification and enforcement into their basic operations and we expect to see Grid algorithms make use of some of the richness of the various agent trust and reputation models that have been developed [28]. We also expect that the types of policy statements made, along with how they are disseminated and applied, will evolve as agent-based techniques become more completely integrated into Grids. For example, reputation-based authentication mechanisms, which lend themselves to agent-based implementations, show great promise in the Grid environment.

6 Ten Research Problems

We conclude by outlining ten areas (in no particular order) in which research is needed to realize an integrated agent-Grid approach to open distributed systems.

Service architecture. The convergence of agent and Grid concepts and technologies will be accelerated if we can define an integrated service architecture providing a robust foundation for autonomous behaviors. This architecture would define baseline interfaces and behaviors supporting dynamic and stateful services, and a suite of higher-level interfaces and services codifying important negotiation, monitoring, and management patterns. The definition of an appropriate set of such architectural elements is an important research goal in its own right, and, in addition, can facilitate the creation, reuse, and composition of interoperable components.

Trust negotiation and management. All but the most trivial distributed systems involve interactions with entities (services) with whom one does not have perfect trust. Thus, authorization decisions must often be made in the absence of strong existing trust relationships. Grid middleware addresses secure authentication, but not the far harder problems of establishing, monitoring, and managing trust in a dynamic, open, multi-valent system. We need new techniques for expressing and reasoning about trust. Reputation mechanisms [29] and the ability to integrate assertions from multiple authorities (“A says M can do X, but B disagrees”) will be important in many contexts, with the identity and/or prior actions of an entity requesting some action or asserting some fact being as important as other metrics, such as location or willingness to pay. Trust issues can also impinge on data integration, in that our confidence in the “data” provided by an entity may depend on our trust in that entity, so that, for example, our confidence in an assertion “A says M is green” depends on our past experiences with A.

System management and troubleshooting. Grid technologies make it feasible to access large numbers of resources securely, reliably, and uniformly. However, the coordinated management of these resources requires new abstractions, mechanisms, and standards for the quasiautomated (“autonomic” [20]) management of the ensemble—despite multiple, perhaps competing, objectives from different parties, and complex failure scenarios. A closely related problem is troubleshooting, i.e., detecting, diagnosing, and ultimately responding to the unexpected behavior of an individual component in a distributed system, or indeed of the system as a whole. This requirement will motivate the development of robust and secure logging and auditing mechanisms. The registration, discovery, monitoring, and management of available logging points, and the development of techniques for detecting and responding to “trouble” (e.g., overload or fraud), remain open problems. We also require advances in the summarization and explanation (e.g., visualization) of large-scale distributed systems.

Negotiation. We have already discussed negotiation at some length; here we simply note that major open problems remain in this vital area.

Service composition. The realization of a specific user or VO requirement may require the dynamic composition of multiple services. Web service technologies define conventions for describing service interfaces and workflows, and WSRF provides mechanisms for inspecting service state and organizing service collections. Yet we need far more powerful techniques for describing, discovering, composing, monitoring, managing, and adapting such service collections.

VO formation and management. While the notion of a VO seems to be intuitive and natural, we still do not have clear definitions of what constitutes a VO or well-defined procedures for deciding when a new VO should be formed, who should be in that VO, what they should do, when the VO should be changed, and when the VO should ultimately be disbanded.

System predictability. While open distributed systems are inherently unpredictable, it can be important to provide guarantees about system performance (e.g., liveness or safety properties, or stochastic performance boundaries). However, such guarantees require a deeper understanding of emergent behavior in complex systems.

Human-computer collaboration. Many VOs will be hybrids in which some problem solving is undertaken by humans and some by programs. These components must interwork in a seamless fashion to achieve their aims. New collaboration models are necessary to capture the rich social interplay in such hybrid teams.

Evaluation. Meaningful comparison of new approaches and technologies requires the definition of appropriate benchmarks and challenge problems and the creation of environments in which realistic evaluation can occur. Perhaps the single most effective means of advancing agent-Grid integration might be the definition of appropriately attractive challenge problems. Such problems should demand both the brawn of Grid and the brains of agents, and define rigorous metrics that can be used to drive the development in both areas. Potential challenge problems might include the distributed monitoring and management of large-scale Grids, and robust and long-lived operation of agent applications.

Evaluation can occur in both simulated and physical environments. Rapid progress has been made in simulation systems for both agents and Grids (e.g., [30]). Production deployments such as Grid3 [15], TeraGrid [5], and NEESgrid [26], and testbeds such as PlanetLab [1], are potentially available as experimental platforms for the evaluation of converged systems, for example within the context of the challenge problems just mentioned.

Semantic integration. Open distributed systems involve multiple stakeholders that interact to procure and deliver services. Meaningful interactions are difficult to achieve in any open system because different entities typically have distinct information models. Advances are required in such interrelated areas as ontology definition, schema mediation, and semantic mediation [3]. Again, issues of trust and cost have vital roles to play.

Acknowledgments

The work of the first author was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. The second author acknowledges the support of the EPSRC project “Virtual organisations for e-Science” (GR/S62710/01).