To change the amount of computing resources allocated to clustered web applications requires a middleware with the ability of dynamically changing the size of each application cluster as well as deciding which server machine each application instance of each cluster will use. We assume that the middleware hosts multiple clustered applications. Because of resource limitation, a given server machine cannot run instances for all application clusters at the same time. There may also be hardware or other limitations that make a machine unsuitable for running some application clusters.
In this paper, we propose the design of and evaluate a middleware clustering technology capable of dynamically allocating resources to web applications through dynamic application instance placement. We define application instance placement as the problem of placing application instances on a given set of server machines to satisfy the resource demand of each application cluster. Our middleware uses a placement controller mechanisms that dynamically configures the size and placement of application instances while obeying user-supplied constraints.
The controller places applications based on the available server machine resources and the application demands. Each server machine has two resources, one load-dependent and one load-independent. The usage of the load-dependent resource depends primarily on the intensity of application load. The usage of the load-independent resource does not strongly depend on the current intensity of the application workload. Correspondingly, an application has a load-dependent and a load-independent demand. In the case of web applications, typical examples for load-dependent and load-independent resources are CPU and memory, respectively.
The problem of application placement has been studied before [1,2,3,4]. In [5] we have introduced a placement algorithm that improves on the prior art in several aspects. (1) It allows multiple types of resources to be managed, (2) it aims at maximizing satisfied application demand while minimizing placement changes compared to the previous placement, and (3) it is efficient enough to be applicable in online resource allocation. Although some of prior techniques also consider these factors, none of these techniques addresses all factors simultaneously.
In this paper, we extend our prior work in two ways. First, we modify the algorithm such that it produces application placement that allows application load to be better balanced across server machines. To the best of our knowledge this objective has not been investigated in prior solutions to the placement problem. However, placements that allow load to be balanced across servers allow applications to perform better.
Second, we investigate two variants of the algorithm that improve its effectiveness in maximizing the amount of satisfied demand and minimizing the placement churn.
The paper is organized as follows. In Section 2 we introduce the architecture of a workload management system in which our controller is implemented. In Section 3 we formulate and present the algorithm for the application placement problem. In Section 4 we evaluate placement techniques adopted by the controller. In Section 5 we compare our approach with prior work on this subject.
Web applications handle web requests and generate static or dynamic web content. In the case of J2EE, runtime components of web applications include HTTP servers, servlet engines, Enterprise Java Beans (EJB) containers and databases. Web administrators deploy these runtime components on several or more server machines, forming a distributed computing environment. Web application middleware presented in this paper manages applications running on this distributed computing environment and allocates resources to them in real-time to meet their demand.
Management system architecture in a sample Web infrastructure.
Figure 1 illustrates the typical
architecture of web infrastructure. We show three applications
,
and
deployed across two tiers
(Application Tier 1 and Application Tier 2). For example, in the
case of a J2EE application the first tier hosts servlets and EJBs
while the second tier hosts the database. In this example, the
first tier uses three different server machines while the second
uses two server machines. In this paper, we use the term
node when referring to a server machine. In a given
tier, a particular application may run on multiple nodes. We
refer to the presence of an application on a node as an
application instance and we use the term application
cluster to indicate the set of all instances of the same
application. By application we mean a set of software entities
that form a logical web application unit. Thus, an application
may involve multiple servlets, multiple EJB modules, etc.
In our experience, a large enterprise web datacenter hosts
many different web applications deployed in clusters with a few
tens to several hundred application instances spread across a few
to a hundred nodes. In almost every case datacenter nodes do not
have enough capacity to have all deployed applications be running
on the node at the same time (in the case of J2EE applications
memory is the bottleneck, with each application instance
requiring on average
of real memory to
handle requests). Therefore web administrator partitions
applications across the available nodes. In the example of Figure
1 we have the first tier hosting three
applications with each node capable of hosting at most two
instances of each application. We use a placement matrix
to show where application instances run. Each element
of this matrix
represents the number of instances of application
running on node
.
Web-application middleware environments typically use layer 7
routers to deliver web requests to applications. The routers map
web requests to web applications and send the web requests to a
specific instance of the target web application. The layer 7
routers use load balancing techniques and affinity rules to
select target instances. The router may also implement admission
and flow control. A router equipped with this functionality can
control the amount of backend resources used by the different
types of web requests. However, its ability to increase the
amount of resources used by a given type of web requests is
limited by the number of application instances targeted by this
type of web requests as well as by the capacity of the nodes
where the application instances run. For example, the placement
presented in Fig. 1 allows the request
router to handle a big amount of web request load for application
but much smaller load for
applications
and
.
When application
experiences a spike
of web requests while the workload of application
decreases, the web requests destined for application
cannot use more capacity in Tier 1
than that of the node on which application
is running. Depending on the control logic of the
request router, application
will
experience either high service time in Tier 1, long waiting time
in the routing layer, or high request rejection rate. At the same
time, spare capacity in the system will remain on the remaining
nodes in Tier 1.
To adapt to such workload changes, we must modify the
application placement dynamically in reaction to the changing
load. In the considered scenario, a desired system response to
the workload shift would be to start new instances of application
on nodes 2-1 and 3-1 and stop
instances of application
on these
nodes (to make room for the new instances). When we deal with a
large number of applications with multiple resource requirements
and a large number of nodes with heterogeneous capacities,
deciding on the best application placement for a given workload
is non-trivial.
In this paper we formulate the placement problem as an instance of a two-dimensional packing problem and we study several algorithms for solving it. We also describe a placement controller that runs on our middleware and dynamically changes the size of application clusters to respond to dynamic load variations.
An application instance may run on any node that matches application requirements. In J2EE domain, these properties concern features that cannot be changed using J2EE runtime environment, such as network connectivity, bandwidth, the presence of non-J2EE software libraries, etc. In our system, an application is deployed to all nodes that match its properties. Application deployment involves creating persistent configuration of an application server that hosts the application, which includes the definition and configuration of J2EE resources and parameters. We configure application placement by starting and stopping the application servers as needed. When an application instance is stopped on a node it does not consume any of the node's runtime resources.
Using server start and stop operations to control the number and placement of application instances provides a relatively light-weight resource allocation mechanism because it does not require a potentially time-consuming application deployment, server configuration, distribution of configuration files, etc. A possible disadvantage of such an approach is its somewhat limited flexibility as it does not allow arbitrary combinations of applications to be deployed and executed on application servers.
The controller described in this paper requires the managed system to provide certain functionality. When this functionality is present, the controller is also suitable for the management of other than Web applications. Similarly, it could use other control mechanisms than application instance starts and stops. For example, in the presence of virtualization technology [8,9], we could start and stop virtual partitions as well. The list of the required functions includes the following items.
Load-independent requirements describe resources needed to run an application on an application server, which are used regardless of the offered load. Examples of such requirements are memory, communication channels, and storage. In this paper, we focus on a single resource - memory. Although, J2EE applications also use other resources in a load-independent manner, we have observed that it is memory that is the most frequent bottleneck resource in J2EE applications. We have classified memory as a load-independent resource since a significant amount of it is consumed by a running application instance even if it receives no load. In addition, memory consumption is frequently related to prior application usage rather than to its current load. For example, even in the presence of low load, memory consumption may be high as a result of caching. Thus, although the actual memory consumption also depends on the application load, characterizing memory usage while taking the load-dependent factor into account is extremely difficult. It is more reasonable to estimate the upper limit on the overall memory utilization and treat it as a load-independent memory requirement. Our system includes a component that dynamically computes this upper limit based on a time-series of memory usage by an application. This component is not described in this paper. Consequently, in this paper we assume that an estimate of memory requirements is available.
Load-dependent requirements correspond to resources whose consumption depends on the offered load. Examples of such requirements are current or projected request rate, CPU cycles/sec, disk activity, and number of execution threads. In this paper, we focus on CPU as a resource that is consumed in a load-dependent manner as it is the most frequent bottleneck resource in J2EE applications. CPU demand is expressed as the number of CPU cycles per second that must be available to an application. Computing this estimate is a non-trivial problem, which is not discussed in this paper. However, our system computes such an estimate online. It includes a component, an application profiler [6] that helps us in this process. The application profiler computes the number of CPU cycles consumed by a request of an application. This per-request CPU demand of an application is then multiplied by the required request throughput for the application, whose estimation depends on admission or flow control mechanism used by the request router.
Correspondingly, node capacity is characterized using load-independent and load-dependent values that describe the amount of resources used to host an application on a node and the available capacity to process requests of applications. Similar to application requirements, we focus on the size of memory and the speed of CPU as load-independent and load-dependent capacity measures.

Architecture of the placement management component.
At the heart of the system there is Placement Controller which computes a new application placement based on load-independent and load-dependent resource demand estimates for all applications. These estimates are obtained online using Application Demand Estimators.
In each control cycle, the Placement Controller computes a new placement matrix with the objective to satisfy the entire load-dependent demand of applications without exceeding load-dependent and load-independent resource capacities of the managed nodes. In addition, the Placement Controller works with the following considerations:
Once placement is computed by the Placement Controller, it is effected by the Placement Executor, which implements the placement by performing individual start and stop operations on the application servers. As each start and stop operation disrupts the system, the Placement Executor orders the individual placement changes in a manner that minimizes the disruption to the running applications. Placement Executor ensures that:
The Placement Executor delays the start and stop actions that would violate the above policies given actions that are already in progress. It monitors the progress of the initiated start and stop actions and reevaluates the actions that have been delayed. As soon as it is allowed by the policies, the Placement Executor initiates the delayed actions.
The Placement Executor implements placement changes using Placement Effecter. It monitors the status of the initiated changes and keeps track of placement changes effected by other managers or a user using the Placement Sensor.
The system we describe above is highly dynamic allowing various types of configuration changes, such as adding or removing a node, creating and destroying an application cluster, or changing node or cluster properties to be accommodated at runtime. In addition, all components are highly available.
This paper does not attempt to present all components of the system described in this section, although all of them have been implemented as a part of our project. We focus on the functionality of Placement Controller and in the following sections we discuss it in detail.

Placement control loop.
We formulate the placement problem as follows. We are given a
set of nodes,
, and write
for a member of that set. We
are also given a set of applications
, and write
for a
member of that set. With each node
we associate its load-independent and load-dependent capacities,
and
, which correspond to
the node's memory and CPU power, respectively. With each
application, we associate its load-independent and load dependent
demand values,
and
, which correspond to
the application's memory and CPU requirements, respectively. We
assume that
is the amount of
memory consumed by an instance of application
, which is independent of offered load. The
CPU-speed requirements of the application,
, depend on offered
load.
With each application we also associate a boolean flag
indicating
whether instances of the application can be started or stopped by
the placement controller. Also, with each application and each
node we associate the allocation restriction flag,
indicating whether an instance of the application
may be started on node
.
The placement problem is to find matrix
, where
is 1
if an instance of application
is
running on node
and 0
otherwise, such that (1) allocation restrictions are observed,
(2) load-independent capacity limits are observed, and (3) it is
possible to allocate load of all applications to instances
without exceeding load-dependent capacity limits. If a previous
placement matrix
exists, we also want to minimize the number of changes between
the old and the new matrix. If it is not possible to satisfy the
entire demand of applications, we aim at maximizing the amount of
satisfied demand. Finally, among all placements that maximize the
amount of satisfied demand, we want to find one that allows the
load allocated to nodes to be balanced.
Since we are dealing with multiple optimization objectives, we
prioritize them in the formal statement of the problem as
follows. Let
be a matrix of
real valued load placement and
.
| (6) |
| (7) |
The optimization problem solved by the Placement Controller is a variant of the Class Constrained Multiple-Knapsack Problem [11] (CCMKP) and differs from prior formulations of this problem mainly in that it also minimizes placement changes. Since this problem is NP-hard, the Placement Controller adopts a heuristic, which is presented in Section 3.2.
The Placement Controller avoids solving the optimization
problem if the current placement can serve the new demand. This
is achieved by first testing whether there exists a
load-distribution matrix for placement matrix
that satisfies constraints expressed by Eqns.
(2)-(5) (function canSolve in Figure
3). This test forms a familiar
linear-programming problem, for which a number of efficient
solutions may be found in literature [12,13].
If the current placement can serve the new demand, after
waiting
, the Placement
Controller proceeds to the next control cycle. Otherwise, the new
placement is computed and implemented. After placement is
changed, the Placement Controller does not resume the new cycle
for
. The increased
timeout provides sufficient time for the system to adjust to the
new placement and reduces the number of placement changes. Both
and
are configurable parameters with 1 minute and 15
minutes as their default values.
The basic algorithm consists of three parts: the residual placement, incremental placement, and rebalancing placement. The residual placement is used at the system start-up, when there is no prior placement, and as a subroutine of the incremental placement. The rebalancing placement is invoked at the end of incremental placement, once the maximizing placement is found, to modify it in a way that allows for a better distribution of load-dependent demand.
Residual placement is based on the following intuitive rule. If we look at allocated memory as the cost of allocating an application's CPU demand, it is wise to first place an application with the highest memory requirement compared to its CPU demand. This way, we can maximize the chances that this application will be placed on the fewest possible number of nodes, and thus the cost of satisfying its CPU demand will be the lowest. When choosing a node on which to place an application, it is reasonable to first search for a node with density similar to the density of the application. It is not wise to load applications with high density on a low density server, since we would be likely to reach the processing capacity constraint and leave a lot of memory unused on that server. Similarly, if low density applications are loaded on high density servers, we would be likely to reach the server's memory constraint without using much of the processing capacity.
In the residual placement, nodes are ordered by the increasing
value of their densities,
,
and applications are ordered by increasing densities
.
The algorithm starts with the lowest density application
and looks for the lowest density
node that can fit the application, i.e., a node
such that
and
is
true. If the node satisfies the entire load-dependent capacity of
, i.e.,
then the application is removed from the application list and the
node is added back to the ordered node list after its
load-dependent capacity is decreased by
. Otherwise, the
entire node's available load-dependent capacity is allocated to
the application, the node is removed from the node list, and the
algorithm searches for another node on which to place the
residual demand of the application. The algorithm proceeds until
the demand of all applications is assigned to nodes, or no more
assignments are possible given nodes' capacities. The
computational complexity of residual placement is
.
The incremental placement combines the residual placement with
the maximum flow computation to solve the placement problem while
minimizing the number of placement changes. It derives the new
placement
from the previous
placement
incrementally as
follows.
In a given iteration of the algorithm, we first check of the
current placement,
can
satisfy the demand. This is done by solving the maximum bipartite
flow problem represented by Eqns. (1-i) and (3)-(5). If
a solution exists, i.e.,
, the algorithm completes. Otherwise, we are left with some
residual demand for each application that could not be satisfied
given the current placement,
. For some applications this residual demand may be zero. We are
also left with some residual load-dependent and load-independent
capacity on each server,
and
. We apply the algorithm from Section 3.2.1 to find an allocation of the residuals.
If the residual placement succeeds in finding a feasible
allocation, we set
to the
resulting allocation and exit. Otherwise, we remove the
assignment with the lowest density,
from
and proceed to the next
iteration.
Observe, that the number of iterations performed by the
incremental algorithms depends on how hard it is to find the
satisfying placement. The problem is hard to solve if the total
demand of applications compared to the total available capacity
is high, or the total memory requirement of applications
approaches total memory available on nodes. The more difficult
the problem is to solve, the longer it takes for the algorithm to
complete. The upper bound on the number of iterations is the
number of assignments in the current placement, i.e.,
.
When applications in unmanaged mode are present, in the first iteration of the algorithm we adopt a slightly modified procedure. We first find the maximum load allocation for applications in unmanaged mode. Then we subtract the capacity utilized by them from the total available capacity. If any residual demand of these applications remains, it is ignored; the unmanaged applications are removed from further consideration and the algorithm proceeds according to the above procedure.
| (9) |
| (10) |
The above problem may be easily transformed into a min-cost
flow problem in a bipartite flow network for which a number of
polynomial time algorithms exist [13]. Note that if there exists a
load-distribution that achieves the perfect balance of node
utilization (utilization of all nodes equal to
), then the above optimization will find such a
load-distribution. When perfect load-distribution cannot be
found, the load-distribution found by the optimization problem
includes some nodes loaded above
as
a utilization threshold and some nodes loaded below
. The point is to change assignments in such a way
so as to allow shifting some load from overloaded nodes to
underloaded nodes. Only assignments in
may be moved.
Assignments that overlap with prior placement cannot be moved as
this could increase placement churn.
The rebalancing placement proceeds from the most overloaded
node and attempts to move some of its instances to one or more
underloaded nodes. We always choose an instance whose removal
brings the node utilization closest to
. This procedure continues until all nodes are
balanced or no more useful changes can be done.
When the algorithm described in the previous section is applied to the system in which load-dependent demand of applications is high compared to the total available load-dependent capacity, its execution time increases, and the ability to find the optimal placement decreases. We have observed that the same algorithm applied to a demand that is reduced to a certain percentage of the total available capacity not only executes faster, but is also more likely to produce placement that satisfies the entire real demand.
Another factor that impacts the effectiveness of the algorithm is the content of the previous placement matrix; the same algorithm applied to different prior placement matrices produces different results.
Taking these observations into account, we have implemented three simple variants of the placement algorithm.
In the simulation study, we vary the size of the problem (by varying the number of nodes and dynamic clusters) and the hardness of the placement problem that is being solved.
The hardness of the placement problem is affected by three factors: memory load factor, CPU load factor, and workload variability factor. When we ignore minimum/maximum instances policies, vertical stacking, and allocation restrictions, we can define the hardness factors as follows.
, where
In the experimental design, we vary the number of nodes and
dynamic clusters. Load independent capacity and demand values are
uniformly distributed over sets
and
, respectively
with average values of
and
, respectively.
Load-dependent capacity is the same for all nodes and is equal to
100.
For each computed placement, the percentage of satisfied
demand is calculated by first finding the load-distribution
matrix
that maximizes the amount
of satisfied demand given the computed placement, as defined
through Eqns. (3)-(5). Then we calculate
.
In Figs. 4, 5, and 6, we present experimental results for the fraction of demand satisfied, number of placement changes, and execution time, respectively. The results are presented with 95% confidence intervals. When memory load factor is high, e.g., MLF=100%, the algorithm is memory-constrained and regardless of the CPU load factor it usually fails to place all applications and consequently does not meet CPU demand. As shown in Fig. 4, in a very memory- and CPU- constrained scenario of MLF=100% and CPU=99%, the Multiple-runs Algorithm (MRA) and Load-reduction Algorithm (LA) appear to perform somewhat better than the Basic Algorithm (BA), however, all algorithms fail to meet the entire load-dependent demand. Also, they generate roughly the same (huge) number of placement changes (Fig. 5). From the perspective of execution time, extremely resource-constrained problems are particularly harsh on the Multiple-runs Algorithm because they trigger multiple algorithm iterations thereby elevating the placement computation time (Fig. 6).
However, when the memory constraint is relaxed to MLF=60%, the placement problem becomes easier to solve and the differences between algorithms more obvious. As shown in Fig. 4, MRA satisfies more demand than LRA and BA. This result is achieved with the same number of placement changes as with BA. LRA performs worse than MRA in terms of satisfying the demand but it proves better in reducing the placement churn (Figs. 4 and 5). The improved effectiveness of MRA comes at the price of multiplied execution time compared to LRA and BA. Our experiments have revealed that it not necessary to reduce MLF to as low as 60%. We have observed similar results, though not as prominent, at MLF=80% as well.
When memory and CPU constraints are further reduced (to say MLF=60% and CLF=90%), the placement problem becomes easy to solve and all algorithms perform equally well. Figs. 4-6 show the results obtained with MLF=40% and CLF=90%.

Fraction of demand satisfied by algorithms.

Number of placement changes.

Placement computation time.
In this section we evaluate the effectiveness of rebalancing placement applied to MRA. For this purpose, we compare two versions of this algorithm: with and without the rebalancing placement phase. We compare the algorithms while looking at load distribution in the initial placement, which starts with the empty set of placement assignments, and at the average of ten subsequent placements, which start with a non-empty set of prior placement assignments. Distinguishing these two cases is important as in the initial placement we typically have greater freedom in replacing instances than in subsequent placements, where only a subset of all instances is eligible for being moved.
To assess the effectiveness of rebalancing placement we must
consider that load distribution is performed by a router. It is
the effectiveness of the router that ultimately affects the
evenness of the load distribution. In this section, we assume a
router that can perfectly balance load within limits imposed by
the placement in effect. To obtain load distribution that is
achieved by such an ideal router, we need to solve the
optimization problem expressed by Eqns. (1-iii) and (3)-(5).
This non-linear problem may be quite easily solved as a sequence
of at most
problems defined by Eqns.
(8)-(11), which are straightforward to
linearize.
Once the most balanced distribution is known, we must measure
the amount of inequality in the distribution. As a good candidate
of such a measure, we consider the Gini index [14] (
), which is defined as a ratio of the area
between the line of perfect (uniform) distribution (45 degree
line) and the Lorenz curve of the actual distribution to the
total area below the 45-degree line. This coefficient is most
frequently used as a measure of income inequality. The Gini index
of 0 indicates perfect equality. The Gini index approaches one to
represent extreme imbalance.
We evaluate the effectiveness of rebalancing placement in a
network composed of 10 nodes with load-independent demand
uniformly distributed over a set
.. We compute placement
for a system with 10, 30, and 50 applications with identical
memory requirements of 0.4. This gives us MLF values of 16%, 48%,
and 80%, respectively. We vary CLF from 5% to 95%.
Gini index of initial placement and the average Gini index for MLF=16%.
Gini index of initial placement and the average Gini index for MLF=48%.
Gini index of initial placement and the average Gini index for MLF=80%.
Figs. 7-9 show initial and average Gini indices for MLF values of 16%, 48%, and 80%, respectively.
Our study reveals that the potential improvement in load balance depends on the number of degrees of freedom the rebalancing placement has while moving instances. For low values of MLF, we deal with only a small number of instances that can be moved with relatively high fraction of total demand allocated to each of them. For high values of MLF, we deal with many instances with potentially small fraction of total load allocated to them but we have little extra space to move these instances to. In addition, high MLF implies that load allocated to each instance is also high. Oftentimes, we can only improve balance by creating an additional instance for an application and distributing the application load across the increased number of instances. However, doing so would result in increasing the disruption to the system and is not considered by the rebalancing algorithm even when applied to initial placement.
Our study also shows that initial placement typically has a much better potential for distributing load than subsequent placements. This observation concerns both the algorithm with rebalancing and the one without it. This difference results from the way we prioritized the objectives: we effect the fewest number of changes to the previous placement that are necessary to satisfy the entire demand even though the resultant load distribution may be unbalanced.
Our study also concerns the analysis of rebalancing cost, which may be quantified as an increase in placement time. However, our study shows that this cost is negligible. In a system of 10 nodes and 30 applications, the rebalancing phase takes less than 10 msec and is invoked only when placement changes are suggested by the algorithm. In configurations where placement changes are frequent (for example with high CLF), the rebalancing time is only a small fraction of the overall execution time.
Placement rebalancing affects not only the placement time, but also the number of placement changes. Even though our rebalancing algorithm does not directly introduce any additional placement changes, it modifies the input to the subsequent control cycle. This modified input may lead to a placement with a higher or lower number of placement changes than if no rebalancing was performed. However, in our study we could not find any statistically significant difference in the average number of placement changes between the two algorithms.
The problem of application placement in a shared cluster of servers has been investigated by Urgaonkar at el. [1]. In their formulation, instances of an application that execute on different hardware servers, which are called capsules, are managed independently of one another. Capsule resource requirements are estimated independently based on prior resource usage by a capsule. Then capsules are packed on servers in such a way so as not to exceed each server's capacity. Independent packing of capsules is a rather inflexible mechanism of application placement as it does not allow the number of capsules serving a given application to be changed or the load to be shifted from one capsule to another. Another issue with the approach presented in [1] is that capsule requirements are characterized using a single value that can represent either CPU or network bandwidth requirements. The approach introduced in this paper allows applications to be placed based also on their memory requirements. Finally, the technique in [1] ignores previous capsule placement while finding the new placement. It is therefore unsuitable for frequent online placement modifications, which is a goal of the technique described in this paper.
In the work of Ardagna et al. [4], the placement problem aims at minimizing the costs of running the server machines: each machine has a cost, and it is paid if any tier of any application is placed there. This objective is different from ours. Unlike Ardagna et al., we must be concerned with the possibility that some demands may not be met and therefore try to maximize the amount of demand that can be satisfied. On the other hand, our approach does not consider the cost of running the server machines. There are more differences between our problem statements: they accept co-residency prohibitions of a certain form; our formulation does not have this feature; our formulation includes a desire to minimize changes from an existing placement, while their formulation has no such concern; they are concerned with a larger resource allocation problem that also includes routing and scheduling, while we solve placement problem independently..
Wolf et al. in [2] propose an algorithm that maps Web sites to cluster servers allowing an overlap between Web-site clusters. The algorithm proposed in [2] attempts to minimize the number of placement changes by exploring new configurations starting from those that differ no more than one Web-site-to-server assignment from the current placement and limiting the total number of placement changes to a configured number. Unlike the technique proposed in this paper, Wolf et al. [2] focus on a single-resource capacity constraints. Moreover, their optimization technique is intended to be used infrequently (about once a week) and it may be executed on a dedicated server. Hence, there is no concern about its computational overhead. In the problem discussed in this paper, the application placement is re-evaluated frequently (in the order of minutes) and the computation is co-located with applications. Therefore, its computational overhead must be very small.
Our work can be also compared to the work on Cluster Reserves by Aron et al. [3], where an application and application instance considered in our paper correspond to a service class and a cluster reserve in [3]. The resource allocation problem solved in [3] is to compute the desired percentage of each node's capacity to be allocated to each service class based on overall cluster utilization ratios desired for each service class. This problem differs from the problem solved in our paper in several aspects. First, we aim at finding the best placement of application instances on nodes rather than an optimal allocation of each node's capacity. Our technique does compute such an allocation as a side-effect. However, this allocation is not enforced, meaning that we only compute it to determine if it is feasible to allocate applications' demand to nodes with a given placement. The actual resource allocation is determined by the request router at a much finer time scale, which improves the system responsiveness to workload changes. Second, we aim at satisfying the most of the offered demand rather than keeping the relative resource allocation as close to the goal as possible. Third, our problem takes into account the limited memory capacity of nodes, which constraints the number of application instances that may be placed on each of them. Finally, the objectives of our optimization problem include minimizing changes from the previous allocation, which is not the case in [3].
The problem of optimally placing replicas of objects on servers, constrained by object and server sizes as well as capacity to satisfy a fluctuating demand for objects, has appeared in a number of fields related to distributed computing. In managing video-on-demand systems, replicas of movies are placed on storage devices and streamed by video servers to a dynamic set of clients with a highly skewed movie selection distribution. The goal is to maximize the number of admitted video stream requests. Several movie placement and video stream migration policies have been studied. A disk load balancing criterion which combines a static component and a dynamic component is described in [17]. The static component decides the number of copies needed for each movie by first solving an apportionment problem and then solving the problem of heuristically assigning the copies onto storage groups to limit the number of assignment changes. The dynamic component solves a discrete class-constrained resource allocation problem for optimal load balancing, and then introduces an algorithm for dynamically shifting the load among servers (i.e. migrating existing video streams). A placement algorithm for balancing the load and storage in multimedia systems is described in [18]. The algorithm also minimizes the blocking probability of new requests.
In the area of parallel and grid computing, several object placement strategies (or, meta-scheduling strategies) have been investigated [19,20]. Communication overhead among objects placed on various machines in a heterogeneous distributed computing environment plays an important role in the object placement strategy. A related problem is that of replica placement in adaptive content distribution networks [21,20]. There the problem is to optimally replicate objects on nodes with finite storage capacities so that clients fetching objects traverse a minimum average number of nodes in such a network. The problem is shown to be NP-complete and several heuristics have been studied, especially distributed algorithms.
Similar problems have been studied in theoretical optimization literature. The special case of our problem with uniform memory requirements was studied in [22,23] where some approximation algorithms were suggested. Related optimization problems include bin packing, multiple knapsack and multi-dimensional knapsack problems [24].
Our work can be improved in a number of ways. First, our algorithm does not explicitly model a cost of a placement change. We only assume that this cost is high and therefore worth minimizing. A possibly better approach could model the impact of a placement change on the performance of applications. Instead of minimizing cost, it would attempt to maximize the overall system utility. This new approach would be particularly suitable when application start or stop mechanism is a light-weight one, e.g., with the usage of OS-level virtualization technology.
Second, we do not prioritize applications. If the entire demand cannot be satisfied, some applications will be affected either by their increased execution time, or increased waiting time, or increased rejection rate. We do not attempt to control which applications are affected this way.
Third, our rebalancing technique aims at equalizing CPU utilization across nodes. However, with heterogeneous nodes, better application performance may be achieved if response times of an application on different nodes are equalized instead. Our technique does not model or manage response times as a function of node utilization.