New Challenges in Scheduling Theory

March 29 - April 2, 2016
Centre CNRS "Paul-Langevin", Aussois, France


List of Abstracts

Finding Perfect Matchings in Bipartite Hypergraphs

Speaker: Chidambaram Annamalai

Haxell's condition is a natural hypergraph analog of Hall's condition, which is a well-known necessary and sufficient condition for a bipartite graph to admit a perfect matching. That is, when Haxell's condition holds it forces the existence of a perfect matching in the bipartite hypergraph. Unlike in graphs, however, there is no known polynomial time algorithm to find the hypergraph perfect matching that is guaranteed to exist when Haxell's condition is satisfied.
We prove the existence of an efficient algorithm to find perfect matchings in bipartite hypergraphs whenever a stronger version of Haxell's condition holds. Our algorithm can be seen as a generalization of the classical Hungarian algorithm for finding perfect matchings in bipartite graphs. The techniques we use to achieve this result could be of use more generally in other combinatorial problems on hypergraphs where disjointness structure is crucial, e.g. Set Packing.
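For ordinary bipartite graphs, the augmenting-path idea underlying the Hungarian method can be sketched as follows. This is an illustrative sketch of the graph case only, not the hypergraph algorithm of the talk; the function name is ours.

```python
def perfect_matching(adj, n_left, n_right):
    """adj[u] lists the right-vertices adjacent to left-vertex u.
    Returns match (match[v] = left vertex matched to right vertex v),
    or None if no perfect matching of the left side exists."""
    match = [None] * n_right

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be rematched elsewhere
            if match[v] is None or augment(match[v], seen):
                match[v] = u
                return True
        return False

    for u in range(n_left):
        if not augment(u, set()):
            return None  # Hall's condition fails for some set of left vertices
    return match
```

Each left vertex either extends the matching directly or triggers a chain of rematches along an augmenting path, which is the mechanism the talk generalizes to hypergraphs.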


Minimizing Makespan, Robustness, and Access Attempts in Randomized Backoff

Speaker: Michael Bender

In this talk we revisit the classical problem of randomized exponential backoff on a multiple-access channel by viewing it as a scheduling problem. We analyze randomized backoff with respect to throughput, robustness to jamming or failures, and the number of attempts to access the channel.
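As a rough illustration of the quantities studied, the following sketch simulates binary exponential backoff on a slotted channel and reports the makespan (slot of the last success) and the total number of access attempts. The slotted channel model and the transmission probabilities are our own simplifying assumptions, not the model analyzed in the talk.

```python
import random

def simulate_backoff(n, seed=0, max_slots=100000):
    """n stations share a slotted channel; in each slot every unfinished
    station transmits with probability 1/window. Exactly one transmitter
    means success; on a collision the colliding windows double.
    Returns (makespan_in_slots, total_access_attempts)."""
    rng = random.Random(seed)
    window = [2] * n
    done = [False] * n
    attempts = 0
    for slot in range(1, max_slots + 1):
        senders = [i for i in range(n)
                   if not done[i] and rng.random() < 1.0 / window[i]]
        attempts += len(senders)
        if len(senders) == 1:
            done[senders[0]] = True
        elif len(senders) > 1:
            for i in senders:      # collision: back off further
                window[i] *= 2
        if all(done):
            return slot, attempts
    return None, attempts
```

Varying n and the backoff rule in such a simulation makes the throughput/robustness/attempt-count trade-off concrete.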

slides(pdf) 

Convex Allocations under IO Constraints

Speaker: Raphaël Bleuse

The trend in supercomputers is to integrate a unique, multi-purpose interconnection network. Such a network carries both I/O traffic and the internal communications of jobs. Using convexity of allocations as a means to mitigate inter-job interactions, how can we take I/O traffic into account? In this talk, we will present preliminary complexity results for this problem.

slides(pdf) 

Scheduling Star Observations on a Telescope

Speaker: Nadia Brauner

The best telescopes are scarce and expensive resources: it is therefore essential to maximize the scientific potential of the observations that are to be scheduled on a telescope within some granted observation time. Observations must be chosen from a list of candidate targets with various observation constraints. We study the practical case in which the duration of an observation is at least half that of its observation window. We shall mainly concentrate on complexity proofs that help derive efficient solution methods, and then propose algorithms to solve the problem. Through experiments on real and realistic datasets, we show that the method provides optimal solutions very efficiently.


Controlling and Assessing Correlations of Cost Matrices in Heterogeneous Scheduling

Speaker: Louis-Claude Canon

Bias in the performance evaluation of scheduling heuristics has been shown to undermine the scope of existing studies. Improving the assessment step leads to stronger scientific soundness when designing new optimization strategies. This paper considers the problem of allocating independent tasks to unrelated machines so as to minimize the maximum completion time. Testing heuristics requires generating cost matrices, each of which specifies the execution duration of a set of tasks on a set of machines. Numerous studies have shown that task and machine heterogeneity are among the properties with the greatest impact on heuristic performance. This study focuses on orthogonal properties, the average correlations between each pair of rows and each pair of columns, as a proximity measure with uniform instances. Cost matrices generated with two distinct novel generation methods show the effect of these correlations on several heuristics from the literature. In particular, the performance of EFT depends on whether the tasks are more correlated than the machines, and HLPT performs better when both correlations are close to one. This is joint work with Laurent Philippe.

slides(pdf) 

Cyclic Day-on/Day-off Scheduling Problems

Speaker: Patrick De Causmaecker

We consider day-on/day-off scheduling problems with fixed lower and upper bounds on the numbers of days on and days off. We review some classical literature and complexity results. We then focus on the case where the only allowed schedules for any member of personnel are cyclic permutations of a given schedule. We give a characterization of feasible schedules and an analysis of the number of such schedules which, while rising exponentially with the length of the schedule, shows an oscillating behavior. We derive lower and upper bounds for the general case via a reduction to consecutive-ones subproblems. We prove that in specific cases the problem can be solved in polynomial time. We are reporting on work in progress and are open to discussion. This is joint work with Wieslaw Kubiak and Pieter Smet.
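One natural reading of the feasibility constraints is that every maximal cyclic run of days on and days off must respect its lower and upper bound. A small checker under that assumption (our own sketch, not necessarily the authors' exact formulation) might look like:

```python
def cyclic_runs(schedule):
    """Maximal runs of equal symbols in a cyclic 0/1 schedule.
    Returns a list of (symbol, run_length) pairs."""
    n = len(schedule)
    if len(set(schedule)) == 1:          # all-on or all-off schedule
        return [(schedule[0], n)]
    # rotate so a run boundary sits at position 0
    k = next(i for i in range(1, n) if schedule[i] != schedule[i - 1])
    s = schedule[k:] + schedule[:k]
    runs, cur, length = [], s[0], 1
    for x in s[1:]:
        if x == cur:
            length += 1
        else:
            runs.append((cur, length))
            cur, length = x, 1
    runs.append((cur, length))
    return runs

def feasible(schedule, on_lo, on_hi, off_lo, off_hi):
    """Check that every cyclic run of on-days (1) and off-days (0)
    lies within its bounds."""
    for sym, length in cyclic_runs(schedule):
        lo, hi = (on_lo, on_hi) if sym == 1 else (off_lo, off_hi)
        if not (lo <= length <= hi):
            return False
    return True
```

Enumerating all 0/1 strings of a given length against such a checker is one way to observe the oscillating growth of the count mentioned above.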


On the Online Machine Minimization Problem

Speaker: Lin Chen

We consider the online machine minimization problem in which jobs with hard deadlines arrive online over time at their release dates. The task is to determine a feasible preemptive schedule on a minimum number of machines. Our main result is a general O(\log m)-competitive algorithm for the online problem, where m is the optimal number of machines used in an offline solution. This is the first improvement on an intriguing problem in nearly two decades. To date, the best known result is an O(\log (p_{\max}/p_{\min}))-competitive algorithm by Phillips et al. (STOC 1997) that depends on the ratio of the maximum and minimum job sizes, p_{\max} and p_{\min}. Even for m=2, no better algorithm was known. In this case, our algorithm is constant-competitive. When applied to laminar or agreeable instances, our algorithm achieves a competitive ratio of O(1) even independently of m.
The following two key components lead to our new result. Firstly, we derive a new lower bound on the optimum value that relates the laxity and the number of jobs with intersecting time windows. Then, we design a new algorithm that is tailored to this lower bound and balances the delay of jobs by taking the number of currently running jobs into account.


Detecting Service Provider Alliances on the Choreography Enactment Pricing Game

Speaker: Daniel Cordeiro

In this work we present the choreography enactment pricing game, a cooperative game-theoretic model for studying the scheduling of jobs on competing service providers. A choreography (a peer-to-peer service composition model) needs a set of services to fulfill its jobs' requirements. Users must choose, for each requirement, which service providers will be used to enact the choreography at the lowest cost. Due to the lack of centralization, vendors can form alliances to control the market. We present a novel algorithm capable of detecting alliances among service providers, based on our study of the bargaining set of this game.
This is joint work with Johanne Cohen (CNRS) and Loubna Echabbi (INPT, Morocco).

slides(pdf) 

A Challenge of Packing CSS-sprites

Speaker: Maciej Drozdowski

In this talk we will present the CSS-sprite packing problem. CSS-sprites are constructed by combining many small graphical elements of a web page into one image; they are built to reduce network transfer times. The CSS-sprite packing problem involves geometric packing, image compression and communication performance modeling. We benchmarked the communication performance of real users' web browsers. A new method, called Spritepack, is proposed and evaluated. This is joint work with Jakub Marszałkowski, Jan Mizgajski and Dariusz Mokwa.

slides(pdf) 

Scheduling Models and Algorithms for the Orderly Colored Longest Paths

Speaker: Giovanni Felici

The Orderly Colored Longest Path (OCLP) problem amounts to finding the longest path in a graph whose edges are colored with a given number of colors, under the constraint that the path follows a given order of colors. To solve OCLP, integer programming algorithms have been formulated by means of max-flow models with packing constraints and cycle elimination.
The problem, which originated from a bioinformatics application, turns out to be well suited for modeling several optimization problems arising in the scheduling literature. These include problems where operations belonging to different classes have to be executed respecting a given order among the classes, and rostering problems, where feasible rosters have to respect a given order of the shifts. Additionally, complex pick-up-and-delivery routing problems may use the sequencing constraints to represent feasible sequences of operations.
In this work we present alternative formulations of the problem inspired by its scheduling interpretation, and discuss these variants from both the modeling and the computational effectiveness standpoints, by means of extensive experiments on properly designed instances.
This is a joint work with Gaurav Singh and Marta Szachniuk.
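As a toy illustration of the color-order constraint, a brute-force search for the longest simple path whose edge colors follow a prescribed cyclic order might look like this (our own sketch, not one of the formulations in the work):

```python
def longest_ordered_colored_path(n, edges, color_order):
    """edges are undirected triples (u, v, color); the i-th edge of a
    path must have color color_order[i % len(color_order)].
    Returns the maximum number of edges on such a simple path."""
    adj = {}
    for u, v, c in edges:
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))

    best = 0
    def dfs(v, visited, depth):
        nonlocal best
        best = max(best, depth)
        want = color_order[depth % len(color_order)]
        for w, c in adj.get(v, []):
            if c == want and w not in visited:
                dfs(w, visited | {w}, depth + 1)

    for s in range(n):
        dfs(s, {s}, 0)
    return best
```

This exhaustive search is exponential in general, which is what motivates the integer programming and scheduling-inspired formulations discussed in the abstract.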

slides(pdf) 

Scheduling Non-Unit Jobs to Minimize Calibrations

Speaker: Jeremy Fineman

The Integrated Stockpile Evaluation (ISE) problem extends a classic offline scheduling problem in which n jobs, each with an arrival time and a deadline, must be scheduled non-preemptively on m machines. The additional constraint in the ISE problem is that a machine may only be used if it has been calibrated recently. The goal is to minimize the number of calibrations necessary to complete all the jobs before their deadlines.
This talk presents an approximation algorithm for the general case of the ISE problem where each job may have different processing times. (Prior work was restricted to unit processing times.) It turns out that the ISE problem generalizes the classic problem of finding the minimum number of machines necessary to schedule all jobs nonpreemptively by their deadlines. This work shows that the machine-minimization bounds are indeed attainable for the ISE problem. That is, suppose that we have a black-box s-speed \alpha-approximation for the machine-minimization problem. Then our ISE algorithm finds an O(\alpha)-approximation for the number of calibrations when using O(\alpha m) machines at s speed.


Recoverable Robustness in Scheduling Problems

Speaker: Han Hoogeveen

Optimization problems are normally solved with deterministic data. These data are, however, not always known with certainty. Recoverable robustness handles this uncertainty by modeling it through different scenarios that occur with given probabilities. A solution then consists of a solution for the undisturbed case that can be made feasible for each scenario by applying a given, simple recovery algorithm to it. The value of this solution is then equal to the expected value of the realized solution.
In my talk, I will apply the concept of recoverable robustness to the problem of minimizing the number of late jobs on one machine where the processing times of the jobs are uncertain. Furthermore, I will present some results for scheduling problems with stochastic processing times in which rejection is allowed.
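For context, the undisturbed core problem here, minimizing the number of late jobs on one machine with deterministic processing times, is solved exactly by the classical Moore-Hodgson algorithm, sketched below:

```python
import heapq

def moore_hodgson(jobs):
    """jobs: list of (processing_time, due_date) pairs.
    Returns the minimum number of late jobs on a single machine
    (classical Moore-Hodgson algorithm for the deterministic problem)."""
    scheduled = []           # max-heap of processing times (stored negated)
    t = 0                    # completion time of the on-time set
    late = 0
    for p, d in sorted(jobs, key=lambda j: j[1]):  # earliest due date first
        heapq.heappush(scheduled, -p)
        t += p
        if t > d:            # due date violated: drop the longest scheduled job
            t += heapq.heappop(scheduled)   # pops -p_max, so t decreases
            late += 1
    return late
```

A recoverable-robust variant would have to keep such a nominal solution repairable under each processing-time scenario, which is what the talk addresses.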


Extensions of Resource Allocation Problems Motivated by Smart Grids

Speaker: Johann Hurink

Resource allocation problems play a pivotal role in optimization in many fields. We consider variants of the problem motivated by applications from smart grids. While general solution techniques rely on the differentiability of the objective functions, we formulate necessary and sufficient optimality conditions without relying on differentiability. Furthermore, we introduce an extension of the general resource allocation problem by adding cumulative bounds on the variables. We show that this extension can be solved by a recursive method that solves an original resource allocation problem in each of its O(n) iterations. Finally, we consider discrete versions of both the original resource allocation problem and the newly introduced extension. We show NP-hardness of both discrete versions and consider a relaxation based on convex combinations, for which we formulate efficient algorithms.
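For the discrete separable convex case without the cumulative bounds, the classical greedy marginal-allocation rule is exact; a minimal sketch (illustrative background, not the recursive method of the talk):

```python
import heapq

def greedy_allocate(costs, budget):
    """Allocate `budget` discrete units over n activities to minimize
    sum_i f_i(x_i), where costs[i](x) evaluates f_i at integer x and
    each f_i is convex. Repeatedly give the next unit to the activity
    with the smallest marginal cost; for convex f_i this is optimal."""
    n = len(costs)
    x = [0] * n
    heap = [(costs[i](1) - costs[i](0), i) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(budget):
        delta, i = heapq.heappop(heap)
        x[i] += 1
        # push the marginal cost of this activity's next unit
        heapq.heappush(heap, (costs[i](x[i] + 1) - costs[i](x[i]), i))
    return x
```

The cumulative bounds studied in the talk break this simple exchange argument, which is why a more involved recursive method is needed there.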

slides(pdf) 

A Multi-Criteria Decision-Making Model for the Vehicle Routing Problem in Reverse Logistics

Speaker: Kingshuk Islam

Reverse logistics (RL) is a relatively new topic area in logistics and supply chain management. Key reasons for its gaining increasing attention include new and stricter governmental regulations on environmental issues, waste collection and disposal, the increasing demand for low-cost recycled products and the profitability of recycled materials. More and more research contributions reported in the literature highlight the importance of efficient planning and control of the RL process, including the optimisation of vehicle routing. On the other hand, fuel consumption and carbon emissions are seen as key features of a vehicle routing problem, as they directly affect the carbon emission certificates that are monitored by governments in many European countries. In this study, we investigate the vehicle routing problem in reverse logistics with simultaneous pickups and drop-offs. The vehicle routing problem is defined as a pollution routing problem that considers not only the travel distances but also the CO2 emissions, travel times and travel costs. A multi-criteria decision-making model is developed to suggest the best possible routes for the vehicles in a reverse logistics programme, with the aim of minimising the CO2 emissions and the relevant costs related to the vehicle routing. Simulations under different scenarios are carried out, and the effect of changing the values of relevant variables on the defined criteria is reported.

slides(pptx) 

On Bi-level Approach for Scheduling Problems

Speaker: Emmanuel Kieffer

Hierarchical optimization is concerned with several nested levels of optimization problems binding decision makers. Bi-level optimization is a particular case involving two nested problems representing two decision makers who each control their own set of decision variables. The first decision maker, referred to as the leader, takes the first decision, which restricts the second decision maker, referred to as the follower. In response, the follower tries to react optimally to the leader's decision. This modeling pattern may lead to collaboration or competition between them. Closely related to game theory (Stackelberg games), bi-level strategies are more realistic since they do not overestimate the objective fitness when decision makers may have an impact on each other. Bi-level modeling has been proposed for different kinds of problems (e.g. supply-chain management, network optimization, structural optimization). One of the most studied bi-level problems is the toll-setting problem, which consists in finding optimal toll locations knowing that network users try to minimize their travel cost. By considering the possible reactions of the network users, the authority operating the tolls is able to maximize its revenue while avoiding situations that discourage network users from taking highways. Although the literature on scheduling is very rich, few scheduling problems have been modeled using bi-level representations. We propose here a survey on scheduling with bi-level models and show the necessity of developing new optimization tools to solve them. This is joint work with Gregoire Danoy and Pascal Bouvry.

slides(pdf) 

Structural Properties of an Open Problem in Preemptive Scheduling

Speaker: Wieslaw Kubiak

Structural properties of optimal preemptive schedules have been studied in a number of recent papers with a primary focus on two structural parameters: the minimum number of preemptions necessary, and a tight lower bound on shifts, i.e., the sizes of intervals bounded by the times created by preemptions, job starts, or completions. These two parameters have been investigated for a large class of preemptive scheduling problems, but so far only rough bounds for them have been derived for specific problems. This paper sharpens the bounds on these structural parameters for a well-known open problem in the theory of preemptive scheduling: instances consist of in-trees of n unit-execution-time jobs with release dates, and the objective is to minimize the total completion time on two processors. This is among the current, tantalizing "threshold" problems of scheduling theory: our literature survey reveals that any significant generalization leads to an NP-hard problem, but that any slight simplification leads to a tractable problem with a polynomial-time solution.
For the above problem, we show that the number of preemptions necessary for optimality need not exceed 2n-1; that the number must be of order \Omega(\log n) for some instances; and that the minimum shift need not be less than 2^{-2n+1}. These bounds are obtained by combinatorial analysis of optimal preemptive schedules rather than by the analysis of polytope corners for linear-program formulations of the problem, an approach to be found in earlier papers. The bounds immediately follow from a fundamental structural property called normality, by which the minimal shifts of a job decrease exponentially. In particular, the interval between a preempted job's start and its first preemption must be a multiple of 1/2, the second such interval must be a multiple of 1/4, and in general, the i-th preemption must occur at a multiple of 2^{-i}. We expect the new structural properties to play a prominent role in finally settling a vexing, still-open question of complexity.
This is joint work with Bo Chen, Ed Coffman, and Dariusz Dereniowski.

slides(pdf) 

A Comparison of Dragonfly Global Link Arrangements and their Implications for the Scheduling of Communication

Speaker: Vitus Leung

High-performance computing systems are shifting away from traditional interconnect topologies to exploit new technologies and to reduce interconnect power consumption. The Dragonfly topology is one promising candidate for new systems, with several variations already in production. It is hierarchical, with local links forming groups and global links joining the groups. At each level, the interconnect is a clique, with a link between each pair of switches in a group and a link between each pair of groups.
We show that the intergroup links can be arranged in meaningfully different ways. We evaluate three previously proposed approaches for link organization (called global link arrangements) in two ways. First, we use bisection bandwidth, an important and commonly used measure of the potential for communication bottlenecks. We show that the global link arrangements often give bisection bandwidths differing by tens of percent, with the specific separation varying with the relative bandwidths of local and global links; for the link bandwidths used in a current Dragonfly implementation, it is 50%. Second, we show that the choice of global link arrangement can greatly impact the regularity of task mappings for nearest-neighbor stencil communication patterns, an important pattern in scientific applications, which allows the scheduling of communication in phases to balance traffic and minimize contention.
This is joint work with Emily Hastings, David Rincon-Cruz, Marc Spehlmann, Sofia Meyers, Anda Xu, and David Bunde.

slides(pdf) 

Flow Shop for Dual CPUs with Dynamic Voltage Scaling

Speaker: Minming Li

We study the following flow shop scheduling problem on two processors. We are given n jobs with a common deadline D, where each job j has workload p_{i,j} on processor i, and the processors can vary their speed dynamically. Job j can be executed on the second processor only after its execution on the first processor is completed. Our objective is to find a feasible schedule such that all jobs are completed by the common deadline D with minimized energy consumption. We present a linear program for the discrete speed case, where the processors can only run at specific speeds in S = \{s_1,s_2,...,s_q\} and the job execution order is fixed. We also provide an m^{\alpha-1}-approximation algorithm for the arbitrary-order case, where m is the number of processors and \alpha is a parameter of the processors. We then introduce a new variant of the flow shop scheduling problem, called the sense-and-aggregate model, motivated by data aggregation in wireless sensor networks, where the base station needs to receive data from sensors and then compute a single aggregate result. In this model, the first processor receives unit-size data items from the sensors and the second processor is responsible for calculating the aggregate result. The second processor can decide when to aggregate; the workload needed to aggregate x data items is f(x), and another unit-size data item is generated as the result of the partial aggregation, which is then used in the next round of aggregation. Our objective is to find a schedule such that all data are received and aggregated by the deadline with minimum energy consumption. For this model, we present an O(n^5) dynamic programming algorithm for f(x)=x and a greedy algorithm for f(x)=x-1.
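The role of dynamic voltage scaling can be illustrated with the standard power model P = s^\alpha (an assumed model, with \alpha = 3 as a typical value): by convexity, running a fixed workload at one constant speed within a window never costs more energy than splitting the window between two speeds.

```python
def energy(workload, speed, alpha=3):
    """Energy = power * time = speed**alpha * (workload / speed)."""
    return speed ** alpha * (workload / speed)

def split_energy(w1, t1, w2, t2, alpha=3):
    """Run workload w1 in time t1 (at speed w1/t1), then w2 in t2."""
    return energy(w1, w1 / t1, alpha) + energy(w2, w2 / t2, alpha)

# Convexity of s**alpha implies the constant speed W/T over the whole
# window is never worse than any two-speed split of the same window.
```

This convexity argument is what makes the fixed-order case amenable to convex/linear programming formulations such as the one in the talk.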

slides(pdf) 

Online Non-preemptive Scheduling in a Resource Augmentation Model based on Duality

Speaker: Giorgio Lucarelli

Resource augmentation is a well-established model for analyzing algorithms, particularly in the online setting. It has been successfully used to provide theoretical evidence for several scheduling heuristics with good performance in practice. According to this model, the algorithm is applied to a more powerful environment than that of the adversary. Several types of resource augmentation for scheduling problems have been proposed up to now, including speed augmentation, machine augmentation and, more recently, rejection. In this paper, we present a framework that unifies the various types of resource augmentation. Moreover, it allows the notion of resource augmentation to be generalized to other types of resources. Our framework is based on mathematical programming and consists of extending the domain of feasible solutions for the algorithm with respect to the domain of the adversary. This, in turn, allows the natural concept of duality in mathematical programming to be used as a tool for analyzing the algorithm's performance. As an illustration of these ideas, we apply the framework and propose a primal-dual algorithm for the online problem of minimizing the total weighted flow time of jobs on unrelated machines when preemption of jobs is not allowed. This is a representative problem for which no meaningful online algorithm with a performance guarantee is known, while several strong lower bounds exist in the classical setting and even with speed augmentation. Specifically, we present a (1+\epsilon_s)-speed O(1/(\epsilon_s \epsilon_r))-competitive algorithm if we are allowed to reject jobs whose total weight is an \epsilon_r-fraction of the weight of all jobs, for any \epsilon_s>0 and \epsilon_r \in (0,1).
Moreover, we extend the analysis of the above problem and propose a (1+\epsilon_s)-speed \epsilon_r-rejection O(k^{(k+3)/k}/(\epsilon_{r}^{1/k} \epsilon_{s}^{(k+2)/k}))-competitive algorithm for the more general objective of minimizing the weighted \ell_k-norm of the flow times of the jobs.
This is joint work with Nguyen Kim Thang, Abhinav Srivastav and Denis Trystram.

slides(pdf) 

Evaluation of Data Locality and Communication Volume in MapReduce

Speaker: Loris Marchal

In this work, we focus on how MapReduce allocates tasks to processing resources in relation to the allocation of file chunks on data storage. We particularly focus on Map tasks and study the locality of tasks with and without chunk replication. We obtain simple formulas for the volume of communication and the total makespan when communications are not allowed at runtime. This allows parameters such as the replication factor to be better tuned to optimize both data locality and makespan.
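As a crude illustration of the locality question (our own toy model, not the analysis of the talk), one can simulate how often a naively placed Map task lands on a node that holds a replica of its chunk:

```python
import random

def locality_fraction(n_chunks, n_nodes, replication, seed=0):
    """Place each chunk's replicas on `replication` distinct random nodes,
    assign one Map task per chunk round-robin over the nodes, and return
    the fraction of tasks whose node holds a replica of their chunk.
    Deliberately naive: no locality-aware assignment or load balancing."""
    rng = random.Random(seed)
    local = 0
    for chunk in range(n_chunks):
        replica_nodes = rng.sample(range(n_nodes), replication)
        node_of_task = chunk % n_nodes      # naive round-robin placement
        if node_of_task in replica_nodes:
            local += 1
    return local / n_chunks
```

Even this toy model shows locality improving with the replication factor, the kind of trade-off the formulas in the work quantify exactly.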

slides(pdf) 

Online Scheduling with Calibrations

Speaker: Sam McCauley

Past work has examined scheduling on machines that require expensive calibrations. In this model, a machine can only process jobs once it has been calibrated at a large cost. After a fixed period of time, the machine becomes uncalibrated, and must be calibrated again before it can resume processing. This model attempts to reflect maintenance cost for high-precision machines in sensitive scientific and industrial applications. We study classical scheduling on these machines: in short, the algorithm is given a series of jobs with release times, which must be scheduled to minimize an objective function.
We study the online case, where the algorithm does not know of jobs until their actual release time. Our goal is to minimize the total flow time, plus the number of calibrations (each of which has a large constant cost). We give a constant-competitive algorithm in this model. On a single machine, this analysis comes from a charging argument; if multiple machines are used, we use an LP relaxation to prove the bounds. Our results stem in part from structural lemmas used in an efficient dynamic program for the offline case. This is joint work with Vincent Chau, Minming Li, and Kai Wang.


Optimal Replenishment under Price Uncertainty

Speaker: Esther Mohr

We aim to find optimal replenishment decisions without having the entire price information available at the outset. Although it exists, the underlying price distribution is neither known nor given as part of the input. Under the competitive-ratio optimality criterion, we design and analyze online algorithms for two related problems. Besides the reservation-price-based decision of how much to buy, we additionally consider the optimal scheduling of orders. We suggest an online algorithm that decides how much to buy at the optimal point in time. Results show that the problem of finding a replenishment strategy with the best possible worst-case performance guarantees can be considered an extension of the online time series search problem.


Scheduling Parallel DAG Jobs Online to Minimize Average Flow Time

Speaker: Benjamin Moseley

In this talk, we discuss the problem of scheduling parallelizable jobs online with the objective of minimizing average flow time. Each parallel job is modeled as a DAG where each node is a sequential task and each edge represents a dependence between tasks. Previous work has focused on a model of parallelizability known as the arbitrary speed-up curves setting, where a scalable algorithm is known. The DAG model, however, is more widely used by practitioners, since many jobs generated by parallel programming languages and libraries can be represented in it, yet little is known for this model in the online setting with multiple jobs. The DAG and speed-up curve models are incomparable, and algorithmic results for one do not immediately imply results for the other. Previous work has left open the question of whether an online algorithm can be O(1)-competitive with O(1) speed for average flow time in the DAG setting. This talk will give the first scalable algorithm, which is (1+\varepsilon)-speed O(1/\varepsilon^3)-competitive for any \varepsilon > 0. We further introduce the first greedy algorithm for scheduling parallelizable jobs. Greedy algorithms are among the most useful in practice due to their simplicity. We show that this algorithm is (2+\varepsilon)-speed O(1/\varepsilon^4)-competitive for any \varepsilon > 0.
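For intuition on the DAG model, classical greedy list scheduling (start any ready task whenever a machine is free) can be sketched as follows; this is an offline makespan illustration of DAG execution, not the online flow-time algorithm of the talk:

```python
import heapq

def list_schedule(n_tasks, work, edges, m):
    """Greedy list scheduling of a task DAG on m identical machines.
    work[i] = processing time of task i; edges = precedence pairs (u, v).
    Returns the resulting makespan."""
    succ = [[] for _ in range(n_tasks)]
    indeg = [0] * n_tasks
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [i for i in range(n_tasks) if indeg[i] == 0]
    running = []              # heap of (finish_time, task)
    t, free, finished = 0, m, 0
    while finished < n_tasks:
        while ready and free > 0:   # start ready tasks on free machines
            i = ready.pop()
            heapq.heappush(running, (t + work[i], i))
            free -= 1
        t, i = heapq.heappop(running)   # advance to the next completion
        free += 1
        finished += 1
        for v in succ[i]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return t
```

Dependencies along the DAG's edges are what limit parallelism here, which is exactly the structure the online algorithms in the talk must cope with.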

slides(ppt) 

Minimizing Rental Cost for Multiple Recipe Applications in the Cloud

Speaker: Jean-Marc Nicod

Clouds are more and more becoming a credible alternative to dedicated parallel resources. The pay-per-use pricing policy, however, highlights the real cost of computing applications. This new criterion, the cost, must then be assessed when scheduling an application, in addition to more traditional criteria such as the completion time or the execution flow. In this paper, we tackle the problem of optimizing the cost of renting computing instances to execute an application on the cloud while maintaining a desired performance (throughput). The target application is a stream application based on a DAG pattern, i.e., composed of several tasks with dependencies, and instances of the same task graph are executed continuously on the rented machines. We provide some theoretical results on the problem of optimizing the renting cost for a given throughput, then propose some heuristics to solve the more complex parts of the problem, and compare them to optimal solutions found by linear programming.
This is joint work with Fouad Hana, Loris Marchal, Laurent Philippe, Veronika Sonigo and Hala Sabbah.

slides(pdf) 

Minimizing Energy Costs in Data Centers by Applying Automated Demand Response and Renewable Energy Sources

Speaker: Ariel Oleksiak

Data centers consume large amounts of energy. In parallel, power grid operators are struggling with the reduction of peak energy demands. To cope with this problem, automated demand response (ADR) techniques are being developed. As data centers are precisely monitored and controlled, they are well suited to participation in ADR programmes. However, the shapes of loads in data centers may be variable and may not fit the energy supply.
We propose to apply ADR to data centers by adequate scheduling of workloads and the partial use of renewable energy sources (RES) during peak hours. We assign priorities to jobs to limit the impact of ADR actions on required performance. As the energy produced by renewable sources may be very variable, we investigate the use of energy storage. We present a model, heuristics that minimise overall energy cost, and experimental results. We discuss the profits for data centers that may come from participation in the demand response programme and the use of renewable energy sources.


A Logistic Model for the Multi-Delivery Distribution of Perishable Goods

Speaker: Grzegorz Pawlak

A logistic model for the multi-delivery distribution of perishable goods requires meeting many exacting conditions and is still a subject of research within the logistics business. The purpose is to serve a set of customers in various locations while maintaining constraints such as a limited number of vehicles and their capacity, and planning loading and unloading, with the requirement of minimizing costs for a single business day.
For the purposes of this research, the proposed solution arbitrarily defines groups of distribution points, where one point may belong to several groups. Each group has certain types of vehicles assigned and its own priority. Each type of vehicle is described by a specific set of characteristics.
Output parameters include the list of deliveries, which is optimized in terms of the number of kilometers. A single delivery is expressed as an assignment of pallets to a vehicle and a list of distribution points. Pallets for a single delivery are arranged in the reverse order of the distribution points.
The research compares the results of the proposed methods with historical data from the practical application.
This is joint work with Gaurav Singh.


A Multi-objective Genetic Algorithm for Energy-aware Production Scheduling

Speaker: Sanja Petrovic

The increasing interest in the sustainability of manufacturing and higher energy prices have created new pressure on modern manufacturing companies. Their aim becomes not just to have an efficient production schedule in terms of the standard performance measures, but also to minimize energy consumption. This can be achieved by changing the production parameters, typically the speed of a machine. Another way is intelligent scheduling, which considers idle times on the machines. Idle machines also consume energy, and generally energy savings can be achieved by minimizing the idle times on the machines. However, the problem can be addressed better if a machine is turned off when it is idle. In this research study, we develop a multi-objective model for job shop scheduling which considers the processing and non-processing electricity consumption of machines. The non-processing electricity consumption is incurred with the machine start-up, while the machine is idle, and with the machine shut-down. It is economically justified to turn off a machine if the idle time is "long enough" to save on energy consumption. The novelty in the developed multi-objective genetic algorithm for job shop scheduling is in creating more opportunities for turning off idle machines while maintaining the production performance, i.e. the total weighted tardiness of jobs. The genetic algorithm aims to generate operation sequences on each machine in such a way as to integrate fragmented short idle periods into longer ones, thus creating opportunities for the machine to be turned off. The schedule builder uses semi-active schedules. The genetic algorithm is based on NSGA-II and introduces two new steps to expand the solution pool and then select the elite solutions. The algorithm is validated on job shop problem instances and demonstrates its effectiveness. This is joint work with Ying Liu, Haibo Dong and Niels Lohse.
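The "long enough" criterion for turning a machine off can be made concrete with a simple break-even rule (an assumed energy model, with idle power p_idle and shut-down/start-up energies e_off and e_on as hypothetical parameters, not values from the study):

```python
def worth_turning_off(idle_time, p_idle, e_off, e_on):
    """Turn the machine off during an idle period if idling would cost
    more energy than the shut-down plus start-up overhead."""
    return idle_time * p_idle > e_off + e_on

def break_even_time(p_idle, e_off, e_on):
    """Minimum idle length ('long enough') that justifies a shut-down."""
    return (e_off + e_on) / p_idle
```

Merging fragmented short idle periods into ones longer than this break-even time is precisely what creates the turn-off opportunities the genetic algorithm exploits.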


A New Scheduling Problem Motivated by Moving-target Cyberdefense

SpeakerCynthia Phillips

We introduce a new scheduling problem motivated by cybersecurity. We briefly describe PLADD (Probabilistic Learning Attacker, Dynamic Defender), an abstract game that approximates the competition for a resource between the resource owner (defender) and an attacker. PLADD is an extension of the FlipIt resource-competition game, incorporating elements of moving-target defenses in cyber systems. The goal of these models is to develop optimal defender strategies. One way to study defense strategies is to sample attack scenarios from the probability distributions representing the attacker. We can then compute the optimal sequence of defender actions for that finite set of attack scenarios using stochastic programming (SP). One natural representation of the defender scheduling problem is as a generalized disjunctive program (GDP). We describe ways in which the Pyomo modeling language can represent both GDP and SP problems, automatically convert GDP to integer programs, and either expand the SP to its deterministic equivalent or solve it iteratively through automatic decomposition. We give simple computational results for the FlipIt stochastic programming model; the stochastic program for the PLADD model is currently not tractable. Alternatively, we can express the PLADD version of the problem as a combinatorial scheduling problem: one must schedule global start times for a set of parallel machines to minimize total idle time. This problem is currently wide open.
This is joint work with Stephen Jones, Alexander Outkin, Jared Gearhart, Jacob Hobbs, John Siirola, Stephen Verzi, Daniel Tauritz, Samuel Mulder, and Asmeret Naugle.
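To illustrate the scenario-sampling idea on a much simpler FlipIt-style abstraction (not the PLADD model itself), one can estimate by Monte Carlo how long an attacker holds a resource when the defender retakes it periodically; the exponential time-to-compromise and all parameter names below are our assumptions:

```python
import random

def attacker_control_fraction(period, rate, samples=100_000, seed=1):
    """Estimate the fraction of time the attacker controls the resource when
    the defender retakes it every `period` time units and the attacker's
    time-to-compromise within each period is exponential with rate `rate`.
    A toy FlipIt-style abstraction; the real PLADD game is richer."""
    rng = random.Random(seed)
    controlled = 0.0
    for _ in range(samples):
        time_to_compromise = rng.expovariate(rate)
        # attacker holds the resource from compromise until the next defender move
        controlled += max(0.0, period - time_to_compromise)
    return controlled / (samples * period)
```

For period 1 and rate 1 the exact expectation is 1/e, roughly 0.368; finite sets of sampled scenarios like these are what the stochastic program optimizes over.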


Approximation Schemes for Machine Scheduling with Resource (In-)Dependent Processing Times

SpeakerMalin Rau

We consider two related scheduling problems: resource-constrained scheduling on identical parallel machines and a generalization with resource-dependent processing times. In both problems, jobs require a certain amount of an additional resource and have to be scheduled on machines minimizing the makespan, while at every point in time a given resource capacity must not be exceeded. In the first variant of the problem the processing times and resource amounts are fixed, while in the second the former depend on the latter. We present asymptotic fully polynomial time approximation schemes (AFPTAS) for these problems: for any \epsilon > 0, a schedule of length at most (1+\epsilon) times the optimum plus an additive term of O(1/\epsilon^2) is provided, and the running time is polynomially bounded in 1/\epsilon and the input length. Up to now, only approximation algorithms with constant approximation ratios were known. This is joint work with Klaus Jansen and Marten Maack (Kiel).

slides(pdf) 

Partition with Side Effects

SpeakerKrzysztof Rzadca

In data centers, many tasks (services, virtual machines or computational jobs) share a single physical machine. We propose a new resource management model for such colocation. Our model uses two parameters of a task, its size and its type, to characterize how a task influences the performance of the other tasks allocated on the same machine. As a data center typically hosts many similar, recurring tasks (e.g. a web server, a database, a CPU-intensive computation), the resource manager should be able to construct these types and their performance interactions. Moreover, realistic variants of our model are polynomially solvable, in contrast to the NP-hard vector packing used previously. In particular, we minimize the total cost in a model in which each task's cost is a function of the total sizes of the tasks allocated on the same machine (each type counted separately). We show that for a linear cost function the problem is strongly NP-hard, but polynomially solvable in some particular cases. We propose an algorithm polynomial in the number of tasks (but exponential in the number of types and machines), and another algorithm polynomial in the number of tasks and machines (but exponential in the number of types and admissible sizes of tasks). When there is a single type, we give a polynomial-time algorithm. We also prove that, even for a single type, the problem becomes NP-hard for convex costs. This is joint work with Fanny Pascual.
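As a concrete linear instance of such a cost model (the exact cost structure below is our assumption, for illustration only), each task pays a type-dependent rate for every unit of each type's load colocated on its machine:

```python
def total_cost(tasks, assignment, m, alpha):
    """Evaluate one hypothetical linear cost model. tasks: list of
    (size, type); assignment[i]: machine of task i (0..m-1);
    alpha[t][u]: cost to a type-t task per unit of type-u size on its
    machine. Illustrative, not the talk's exact formulation."""
    types = {t for _, t in tasks}
    load = {(j, t): 0.0 for j in range(m) for t in types}
    for (size, t), j in zip(tasks, assignment):
        load[(j, t)] += size  # per-machine, per-type total sizes
    return sum(alpha[t][u] * load[(assignment[i], u)]
               for i, (_, t) in enumerate(tasks)
               for u in types)
```

Because each task's cost depends only on per-type totals on its machine, the number of distinct machine "profiles" is limited, which is what makes some variants polynomially solvable.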

slides(pdf) 

Time-Cost Trade-offs of Pipelined Dataflow Applications

SpeakerErik Saule

In the era of cloud computing and exascale machines, high performance computing is no longer only about finishing an application as fast as possible. Time is certainly of the essence, but other objectives, such as the cost of a computation, the energy it uses or the maximum amount of power it can draw, are just as much of a concern. The pipelined dataflow programming model has proved popular for analyzing large or streaming datasets. We argue that the pipelined dataflow model is well suited to adapting to these different objectives.
We use one particular histopathology application as a case study. The simple model of pipelined applications assumes a predictable work volume and a smooth execution of that work. We show that in our application it is easy to predict the amount of work, but its execution is not smooth, as work is released from one stage to the next in bursts. However, simple changes in the application allow us to smooth the release of work over time, which makes the execution faster and easier to model. With an accurate model of the application's execution for a particular configuration, we are able to express the trade-off between the execution cost of our analysis on a cloud platform and the time it takes to perform the analysis.

slides(pdf) 

Efficient Heuristics for Placing Large-Scale Distributed Applications on Multiple Clouds

SpeakerPedro Silva

In order to place an application on the cloud, a designer must choose, among the Virtual Machine (VM) types offered by private and public cloud providers, those capable of hosting her application or its parts, using the application requirements, VM prices and VM resources as criteria.
This procedure becomes more complicated when the objective is to place large component-based applications on the cloud, where possibly hundreds of cloud providers and, consequently, thousands of different VM types may be available. In this scenario, the scalability of placement algorithms is a major concern, since the problem is a generalization of the NP-hard multi-dimensional bin packing problem.
In this talk I will present efficient greedy heuristics based on the first-fit-decreasing and best-fit algorithms. We consider the objective of minimizing placement costs while meeting application performance constraints. These greedy heuristics are the result of adapting state-of-the-art heuristics to large-scale scenarios, taking into account the heterogeneity of costs and resources across VM types.
Finally, I will show that the proposed greedy heuristics are capable of computing near-optimal solutions for the placement of very large applications on multiple clouds with many cloud providers. We observed that they took a few seconds to compute near-optimal solutions for placements that would require hours or even days with exact algorithms or meta-heuristics.
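A minimal sketch of the first-fit-decreasing flavour of such a heuristic, with two resource dimensions (CPU, memory) and priced VM types; the data layout and tie-breaking are our assumptions, not the talk's exact algorithms:

```python
def place_components(components, vm_types):
    """First-fit-decreasing sketch: sort components (cpu, mem) by decreasing
    total demand, place each on the first already-open VM where it fits,
    otherwise open the cheapest VM type that can host it. Returns total price.
    Assumes every component fits in at least one VM type."""
    open_vms = []  # each entry: [free_cpu, free_mem, price]
    for cpu, mem in sorted(components, key=lambda c: -(c[0] + c[1])):
        for vm in open_vms:
            if vm[0] >= cpu and vm[1] >= mem:  # first fit
                vm[0] -= cpu
                vm[1] -= mem
                break
        else:
            fitting = [v for v in vm_types if v[0] >= cpu and v[1] >= mem]
            best = min(fitting, key=lambda v: v[2])  # cheapest fitting type
            open_vms.append([best[0] - cpu, best[1] - mem, best[2]])
    return sum(vm[2] for vm in open_vms)
```

Each placement step touches only the open VMs and the list of VM types, which is what keeps such heuristics fast at the scale of thousands of VM types.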


Malleable Task-graph Scheduling with a Practical Speed-up Model

SpeakerBertrand Simon

Scientific workloads are often described by directed acyclic task graphs: DAGs are both a model frequently studied in the theoretical literature and the structure on which several dynamic runtime schedulers rely to handle HPC applications. A natural problem is then to compute a schedule minimizing the makespan of a given graph. In this work, we are motivated by task graphs arising from multifrontal factorizations of sparse matrices and therefore work under the following model. We focus on malleable tasks (i.e., a single task can be allotted a time-varying number of processors) and specifically on a simple yet realistic speedup model: the parallelism is perfect but bounded, each task having a limit on the number of processors it can be allotted. We first prove that the problem of minimizing the makespan is NP-complete. Then, we study a widely used algorithm, ProportionalScheduling, and propose a new strategy, GreedyFilling. Even though both strategies are 2-approximations, experiments on real and synthetic data sets show that GreedyFilling achieves significantly lower makespans.
This is a joint work with Loris Marchal, Oliver Sinnen and Frédéric Vivien.
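The speed-up model of the talk is easy to state: parallelism is perfect up to a per-task processor bound. A one-line sketch:

```python
def exec_time(work, cap, procs):
    """Perfect-but-bounded parallelism: a task with total `work` and processor
    bound `cap` run on `procs` processors proceeds at speed min(procs, cap)."""
    return work / min(procs, cap)
```

Allotting more than `cap` processors to a task wastes them, which is exactly the trade-off that strategies such as ProportionalScheduling and GreedyFilling must negotiate when sharing processors along the DAG.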

slides(pdf) 

Stope Schedule Optimization for an Underground Mining Operation

SpeakerGaurav Singh

Many underground mines use the method known as stoping, where the deposit is divided into columns of underground material known as stopes. A stope is extracted in such a way that its material is mixed together in the process of extraction. Once a stope is started it has a fixed life cycle that cannot be altered. The stopes are partitioned into districts, and for safety reasons the number of stopes being mined within each district is limited. In order to access the stopes and move the material to the processing plant on the surface, several pieces of equipment are required. There are primarily four phases that a stope goes through in its development, each with its own cost. The first phase is preparation, which includes all activities necessary to get the stope ready for extraction. This phase is followed by mining, where the production of ore occurs. After mining has finished, the stope enters a backfill preparation phase, where it is prepared to be filled. After backfill preparation is complete, the filling phase commences. Depending on the choice of fill, neighboring stopes may become inaccessible, which makes this a difficult combinatorial optimization problem. The talk will present a mathematical optimization model that helps to generate a minimum-cost life-of-mine stope extraction schedule.


Run Generation Revisited: What Goes Up May or May Not Come Down

SpeakerShikha Singh

In this talk, we revisit run generation, which is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible.
We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the online setting, both with and without resource augmentation, and in the offline setting.
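For background, the classic run-generation baseline (sorted runs only) is replacement selection with a buffer of M elements; the setting above generalizes it by also allowing reverse-sorted runs. A sketch of the baseline, not of the talk's algorithms:

```python
import heapq
from itertools import islice

def replacement_selection(stream, M):
    """Classic replacement selection: scan the input once with a buffer of
    size M and emit sorted runs (average length about 2M on random input)."""
    it = iter(stream)
    heap = list(islice(it, M))  # fill the buffer
    heapq.heapify(heap)
    runs, run, pending = [], [], []
    while heap:
        x = heapq.heappop(heap)
        run.append(x)
        nxt = next(it, None)
        if nxt is not None:
            if nxt >= x:
                heapq.heappush(heap, nxt)  # can still join the current run
            else:
                pending.append(nxt)        # must wait for the next run
        if not heap:                       # current run is exhausted
            runs.append(run)
            run, heap, pending = [], pending, []
            heapq.heapify(heap)
    return runs
```

Allowing a run to be emitted in reverse-sorted order as well, as in the talk, gives the algorithm a second way to absorb elements smaller than the last output.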

slides(pdf) 

Balanced Optimization with Vector Costs

SpeakerFrits Spieksma

We consider so-called balanced optimization problems with vector costs. We propose a framework containing such problems; this framework allows us to investigate the complexity and approximability of these problems in a general setting. More concretely, each problem in the framework admits a 2-approximation, and for many problems within the framework this result is best possible, in the sense that a polynomial-time algorithm with a performance ratio better than 2 would imply P=NP. Due to its practical relevance, we pay special attention to the balanced assignment problem with vector costs: we show that the problem remains NP-hard even in the case of sum costs.
This is joint work with Annette Ficker and Gerhard Woeginger.


Beating the Harmonic Lower Bound for Online Bin Packing

SpeakerRob van Stee

In the online bin packing problem, items of sizes in (0,1] arrive online to be packed into bins of size 1. The goal is to minimize the number of used bins. In this paper, we present an online bin packing algorithm with asymptotic performance ratio of 1.5817, which constitutes the first improvement over the algorithm Harmonic++ in fifteen years. That algorithm achieved a competitive ratio of 1.58889 and is one instance of the Super Harmonic framework; a lower bound of Ramanan et al. shows that within this framework, no competitive ratio below 1.58333 can be achieved.
We give a lower bound of 1.5766 for any interval classification algorithm, including ones that pack medium and large items like our algorithm. This shows that fundamentally different ideas will be required to make further improvements.
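For intuition about the starting point of the Super Harmonic framework, the basic Harmonic algorithm classifies each item by the largest k with size at most 1/k and packs k items of class k per bin; a sketch of that classic algorithm (our simplification, not the new algorithm of the talk):

```python
def harmonic(items, k=6):
    """Basic Harmonic(k) online bin packing: an item of size s in (0, 1] gets
    class c = min(k, floor(1/s)); a bin of class c holds exactly c such items.
    Returns the number of bins opened."""
    bins_used = 0
    free_slots = {}  # class -> slots left in its currently open bin
    for s in items:
        c = min(k, int(1 / s))
        if free_slots.get(c, 0) == 0:
            bins_used += 1   # open a fresh bin dedicated to this class
            free_slots[c] = c
        free_slots[c] -= 1
    return bins_used
```

Because each bin serves a single class, items of size just above 1/2 and just above 1/3 are never combined; packing medium and large items together is precisely where interval classification algorithms try to do better.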

slides(pptx) 

Late Work Scheduling in Online and Offline Mode

SpeakerMalgorzata Sterna

We investigate scheduling problems on parallel identical machines with a common due date and the total late work criterion. In offline mode all jobs are known in advance, while in online mode they appear in the system one by one. We determine the complexity of the offline case by providing a binary NP-hardness proof and a pseudopolynomial-time dynamic programming algorithm. Moreover, we propose an online algorithm for an arbitrary number of machines and prove its competitive ratio, which bounds the distance between the optimal offline solution and any online solution. The theoretical results are illustrated with results of computational experiments.
This is a joint work with Xin Chen, Kateryna Czerniachowska, Xin Han and Jacek Blazewicz.
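The late work of a job is the part of its processing done after the common due date, capped at its processing time. A sketch that evaluates this criterion for a simple list schedule (the greedy scheduling rule here is ours, for illustration only):

```python
def total_late_work(processing_times, m, due_date):
    """Schedule jobs greedily on the least-loaded of m identical machines and
    return the total late work: a job finishing at time C with processing
    time p contributes min(p, max(0, C - due_date))."""
    loads = [0.0] * m
    late_work = 0.0
    for p in processing_times:
        i = loads.index(min(loads))   # least-loaded machine
        completion = loads[i] + p
        loads[i] = completion
        late_work += min(p, max(0.0, completion - due_date))
    return late_work
```

The cap at p is what distinguishes late work from tardiness: a job started entirely after the due date cannot contribute more than its own length.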

slides(pdf) 

On Birkhoff-von Neumann Decomposition of Doubly Stochastic Matrices

SpeakerBora Uçar

The Birkhoff-von Neumann (BvN) decomposition expresses a doubly stochastic matrix as a convex combination of permutation matrices. There are known upper and lower bounds on the number of permutation matrices that take part in the BvN decomposition of a given doubly stochastic matrix. We investigate the problem of computing a decomposition with the minimum number of permutation matrices and show that the associated decision problem is strongly NP-complete. We propose a heuristic and investigate it theoretically and experimentally on a set of real-world sparse matrices and random matrices. We also discuss the use of such decompositions in solving sparse linear systems.
The talk will cover joint work with Fanny Dufossé, Michele Benzi and Alex Pothen.
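A greedy Birkhoff-style peeling, here with brute-force permutation search and therefore only for tiny matrices, illustrates what a BvN decomposition computes (the talk's heuristic is more sophisticated):

```python
import itertools

def bvn_decompose(A):
    """Greedily peel permutation matrices off a doubly stochastic matrix A:
    at each step, pick the permutation with the largest bottleneck value on
    the current entries, and subtract that value along it. By Birkhoff's
    theorem, a positive-bottleneck permutation exists while A is nonzero.
    Brute-force over permutations: tiny matrices only."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    decomposition = []         # list of (coefficient, permutation) pairs
    while any(A[i][j] > 1e-12 for i in range(n) for j in range(n)):
        best, perm = 0.0, None
        for p in itertools.permutations(range(n)):
            bottleneck = min(A[i][p[i]] for i in range(n))
            if bottleneck > best:
                best, perm = bottleneck, p
        if perm is None:       # numerical residue only
            break
        for i in range(n):
            A[i][perm[i]] -= best
        decomposition.append((best, perm))
    return decomposition
```

Minimizing the number of pairs returned is exactly the problem shown strongly NP-complete in the talk; the greedy bottleneck rule is just one heuristic choice.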


Personnel Rostering - Local and Global Constraint Consistency

SpeakerGreet Vanden Berghe

Personnel rostering optimizes the assignment of shifts to employees over a scheduling period, subject to organisational and legislative constraints. Academic models for personnel rostering abstractly conceptualise complex real-world problem characteristics. Often only one isolated scheduling period is considered, contradicting common practice, where personnel rostering inherently spans multiple dependent periods. The academic literature offers no systematic approach to this modelling challenge, and consequently few models capture the requirements imposed by practice.
A case study concerning nurse rostering in a hospital ward reveals the impact of inconsistent constraint evaluation. The presentation introduces the concepts of local and global consistency in constraint evaluation processes and proposes a systematic approach to the modelling challenges. Experimental results demonstrate that the proposed methodology approximates the optimal solution. This is joint work with Pieter Smet and Fabio Salassa.

slides(pdf) 

Mechanism Design for Scheduling with Uncertain Execution Time

SpeakerAngelina Vidali

Crowdsourcing tasks to agents, in the context of processing big data, creates a lot of uncertainty about the execution times. We study the problem where a task (or multiple unrelated tasks) must be executed, there are multiple machines/agents that can potentially perform the task, and our objective is to minimize the expected sum of the agents' processing times. Each agent does not know exactly how long it will take him to finish the task; he only knows the distribution from which this time is drawn. Agents are selfish and will lie about their distributions if this increases their expected utility. We construct a mechanism that makes it an ex-post equilibrium for the agents to both truthfully report their distributions and exert full effort. We would further like to characterize all possible mechanisms.
This is joint work with Vincent Conitzer (Duke University, USA), presented at AAAI'14, and ongoing work with Carmine Ventre (Teesside University).

slides(pdf) 

Topology-aware Resource Management for HPC Applications

SpeakerAdèle Villiermet

The Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments. Its main intelligence lies in resource selection techniques that find the resources best suited to the users' jobs. Improper resource selection may lead to poor application performance and global system utilization, along with increased system fragmentation and job starvation. These phenomena contribute to the platform's total cost of ownership and should be minimized. I will present my work on a new topology-aware resource selection algorithm that determines the best choice among the available nodes of the platform, based on their position within the network and the application's communication matrix. This algorithm is integrated as a plugin in SLURM, and we validated it with different optimization schemes by comparing it with the default SLURM algorithm.


Worst Case Bound of the LRF Schedule for Fully Parallel Jobs

SpeakerKai Wang

We study the problem of scheduling a set of n independent jobs on m identical machines so as to minimize the total weighted completion time. Here, we consider fully parallel jobs, which can be executed on any machine at any moment. The problem is known to be strongly NP-hard when m is part of the input. The greedy Largest-Ratio-First (LRF) algorithm gives a 2-approximate schedule. We show that this approximation ratio can be improved: it is a function of n and m for any instance of jobs with equal density, and it is 1+(m-1)/(m+2) in general. This is joint work with Vincent Chau and Minming Li.

slides(pdf) 

Flow Shop Problem F2 \rightarrow D | v = 1; c \ge 1 | C_\max Revisited

SpeakerYinling Wang

We study a flow shop problem F2 \rightarrow D | v = 1; c \ge 1 | C_\max with two machines and one transporter: machines A and B, and a transporter V initially located at machine B. A set of jobs must be processed on machine A first, then on machine B, and finally transported to the destination by V. The transporter can carry at most c jobs in one batch, where c \ge 1. The objective is to minimize the time at which all jobs have been delivered to the destination. The problem F2 \rightarrow D | v = 1, c = 2 | C_\max was proved to be binary NP-hard (EJOR, 2007). We settle the open question for c = 2 and prove that the problem is strongly NP-hard, which implies that it admits no FPTAS unless P = NP. We also study three special cases of the problem and give optimal algorithms for each: (i) when the processing order on machines A and B is fixed, we propose a linear-time optimal algorithm; (ii) when the processing times of the jobs on machine B are identical, we give an O(n \log n) optimal algorithm; (iii) when the shortest processing time of the jobs on machine B is no less than the longest processing time on machine A, we give an O(n^2) optimal algorithm.

slides(pdf)