
Schedule
Monday, June 5th
Workshops
Breakfast and registration opens at 7:30.
Breaks 10:3011:00 and 15:3016:00. Lunch on your own, 12:3014:00. Workshop schedules:
Tuesday, June 6th
07:30  08:30 
Breakfast + Registration 
08:30  08:45 
Opening Remarks/
Presentation of the Test of Time Award 
08:45  10:00 
Sigmetrics Achievement Award (Prof. Sem Borst) 
10:00  10:30 
Break 
10:30  11:45 
Session 1 
11:45  13:30 
Lunch 
13:30  14:45 
Session 2 
14:45  15:15 
Break 
15:15  16:30 
Session 3 
16:45  19:00 
Reception / Poster Session 


Wednesday, June 7th
07:30  08:30 
Breakfast + Registration 
08:30  10:00 
Keynote (Dr. Vahab Mirrokni) 
10:00  10:30 
Break 
10:30  11:45 
Session 4 
11:45  13:30 
Lunch 
13:30  14:45 
Session 5 
14:45  15:15 
Break 
15:15  16:30 
Session 6 
16:40  17:15 
Sigmetric Board Update  New Journal 
17:45  18:45 
Tour of
Blue Waters Supercomputer

19:00  21:45 
Banquet at
Papa Del's
See map for transportation on your own or bus, for Blue Waters tour and/or
banquet. Bus departs from NCSA building at 17:30 for tour and at 18:30 for Papa Del's. 
Thursday, June 8th
<
Friday, June 9th
Tutorials Breakfast and registration opens at 8:30.
Sessions run 9:0012:00 and 13:3016:30 in Rooms 1030 and 2100. Twenty
minute coffee breaks. Lunch on your own.
Program
Session 1: Load Balancing Among Switches and CachesTuesday, 10:3011:45 Session Chair: Yi Lu (University of Illinois)
A LowComplexity Approach to Distributed Cooperative Caching with Geographic Constraints Konstantin Avrachenkov (INRIA Sophia Antipolis), Jasper Goseling (University of Twente), Berksan Serbetci (University of Twente)
Abstract:We consider caching in cellular networks in which each base station is equipped with a cache that can store a limited number of files. The popularity of the files is known and the goal is to place files in the caches such that the probability that a user at an arbitrary location in the plane will find the file that she requires in one of the covering caches is maximized.
We develop distributed asynchronous algorithms for deciding which contents to store in which cache. Such cooperative algorithms require communication only between caches with overlapping coverage areas and can operate in asynchronous manner. The development of the algorithms is
principally based on an observation that the problem can be viewed as a potential game. Our basic algorithm is derived from the best response
dynamics. We demonstrate that the complexity of each best response step is independent of the number of files, linear in the cache capacity and linear in the maximum number of base stations that cover a certain area. Then, we show that the overall algorithm complexity for a
discrete cache placement is polynomial in both network size and catalog size. In practical examples, the algorithm converges in just a
few iterations. Also, in most cases of interest, the basic algorithm finds the best Nash equilibrium corresponding to the global optimum.
We provide two extensions of our basic algorithm based on stochastic and deterministic simulated annealing which find the global optimum.
Finally, we demonstrate the hit probability evolution on real and synthetic networks numerically and show that our distributed caching algorithm performs significantly better than storing the most popular content, probabilistic content placement policy and MultiLRU caching policies.
Optimal Service Elasticity in LargeScale Distributed Systems Debankur Mukherjee (Eindhoven University of Technology), Souvik Dhara (Eindhoven University of Technology), Sem Borst (Eindhoven University of Technology, Nokia Bell Labs), Johan S.H. van Leeuwaarden (Eindhoven University of Technology)
Abstract:A fundamental challenge in largescale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent userperceived performance in the presence of uncertain and timevarying demand patterns. Autoscaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queuedriven autoscaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queuedriven autoscaling techniques involve a substantial communication overhead and major implementation burden, or may not even be viable at all.
Motivated by the above issues, we propose a joint autoscaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably nearoptimal service elasticity. We establish the fluidlevel dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluidlimit results show that the proposed scheme achieves asymptotic optimality in terms of userperceived delay performance as well as energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations. Extensive simulation experiments corroborate the fluidlimit results, and demonstrate that the proposed scheme can match the user performance and energy consumption of stateoftheart approaches that do take full advantage of a centralized queue.
QueueProportional Sampling: A Better Approach to Crossbar
Scheduling for InputQueued Switches Long Gong (Georgia Institute of Technology), Paul Tune (), Liang Liu (Georgia Institute of Technology), Sen Yang (Georgia Institute of Technology), Jun (Jim) Xu (Georgia Institute of Technology)
Abstract:Most present day switching systems, in Internet routers and datacenter switches, employ a single inputqueued crossbar to interconnect input ports with output ports. Such switches need to compute a matching, between input and output ports, for each switching cycle (time slot). The main challenge in designing such matching algorithms is to deal with the unfortunate tradeoff between the quality of the computed matching and the computational complexity of the algorithm. In this paper, we propose a general approach that can significantly boost the performance of both SERENA and iSLIP, yet incurs only $O(1)$ additional computational complexity at each input/output port. Our approach is a novel proposing strategy, called \textit{QueueProportional Sampling (QPS)}, that generates an excellent {\it starter matching}. We show, through rigorous simulations, that when starting with this starter matching, iSLIP and SERENA can output much better final matching decisions, as measured by the resulting throughput and delay performance, than they otherwise can.
Session 2: Algorithms for Massive Processing ApplicationsTuesday, 13:3014:45 Session Chair: Y.C. Tay (National University of Singapore)
Hieroglyph: LocallySufficient Graph Processing via ComputeSyncMerge Xiaoen Ju (University of Michigan), Hani Jamjoom (IBM T.J. Watson Research Center), Kang G. Shin (University of Michigan)
Abstract:Despite their widespread adoption, largescale graph processing systems do not fully decouple computation and communication, often yielding suboptimal performance. Locallysufficient computation—computation that relies only on the graph state local to a computing host—can mitigate the effects of this coupling. In this paper, we present ComputeSyncMerge (CSM), a new programming abstraction that achieves efficient locallysufficient computation. CSM enforces local sufficiency at the programming abstraction level and enables the activation of vertexcentric computation on all vertex replicas, thus supporting vertexcut partitioning. We demonstrate the simplicity of expressing several fundamental graph algorithms in CSM. Hieroglyph—our implementation of a graph processing system with CSM support—outperforms state of the art by up to 53x, with a median speedup of 3.5x and an average speedup of 6x across a wide range of datasets.
A Simple Yet Effective Balanced Edge Partition Model for Parallel Computing Lingda Li (Rutgers University), Robel Geda (Rutgers University), Ari B. Hayes (Rutgers University), Pranav Chaudhari (Rutgers University), Eddy Z. Zhang (Rutgers University), Mario Szegedy (Rutgers University)
Abstract:Graph {\bf edge} partition models have recently become an appealing alternative to graph {\bf vertex} partition models for distributed computing due to their flexibility in balancing loads and their performance in reducing communication cost \cite{Bourse+:KDD14,gonzalez2012powergraph}.
In this paper, we propose a simple yet effective graph edge partitioning algorithm. In practice, our algorithm provides good partition quality (and better than similar stateoftheart edge partition approaches, at least for powerlaw graphs) while maintaining low partition overhead.
In theory, previous work \cite{Bourse+:KDD14} showed that an approximation guarantee of $\mathcal{O}(d_{max}\sqrt{\log{}n\log{}k})$ apply to the graphs with $m=\Omega(k^2)$ edges ($k$ is the number of partitions). We further rigorously proved that this approximation guarantee hold for all graphs.
We show how our edge partition model can be applied to parallel computing. We draw our example from GPU program locality enhancement and demonstrate that the graph edge partition model does not only apply to distributed computing with many computer nodes, but also to parallel computing in a single computer node with a manycore processor.
Overcommitment in Cloud Services  Bin Packing with Chance Constraints Maxime C. Cohen (New York University (NYU)  Leonard N. Stern School of Business), Philipp Keller (Google), Vahab Mirrokni (Google), Morteza Zadimoghaddam (Google)
Abstract:
This paper considers a traditional problem of resource allocation,
scheduling jobs on machines. One such recent application is cloud
computing, where jobs arrive in an online fashion with capacity
requirements and need to be immediately scheduled on physical
machines in data centers. It is often observed that the requested
capacities are not fully utilized, hence offering an opportunity to
employ an overcommitment policy, i.e., selling resources beyond
capacity. Serving the right overcommitment level can induce a signifcant cost reduction for the cloud provider, while only inducing
a very low risk of violating capacity constraints. We introduce
and study a model that quantizes the value of overcommitment
by modeling the problem as a bin packing with chance constraints.
We then propose an alternative formulation that transforms each
chance constraint into a submodular function. We show that our
model captures the risk pooling effect and can guide scheduling
and overcommitment decisions. We also develop a family of online
algorithms that are intuitive, easy to implement and provide a
constant factor guarantee from optimal. Finally, we calibrate our
model using realistic workload data, and test our approach in a
practical setting. Our analysis and experiments illustrate the benenfit
of overcommitment in cloud services, and suggest a cost reduction
of 1.5% to 17% depending on the provider’s risk tolerance.
Session 3: Assessing Vulnerability of Large NetworksTuesday, 15:1516:30 Session Chair: Adam Wierman (California Institute of Technology)
Investigation of the 2016 Linux TCP Stack Vulnerability at Scale Alan Quach (University of California, Riverside), Zhongjie Wang (University of California, Riverside), Zhiyun Qian (University of California, Riverside)
Abstract:To combat blind inwindow attacks against TCP, changes proposed in RFC 5961 have been implemented by Linux since late 2012. While successfully eliminating the old vulnerabilities, the new TCP implementation was reported in August 2016 to have introduced a subtle yet serious security flaw. Assigned CVE20165696, the flaw exploits the challenge ACK rate limiting feature that could allow an offpath attacker to infer the presence/absence of a TCP connection between two arbitrary hosts, terminate such a connection, and even inject payload into an unsecured TCP connection.
In this work, we perform a comprehensive measurement of the impact of the new vulnerability. This includes (1) tracking the vulnerable Internet servers, (2) monitoring the patch behavior over time, (3) picturing the overall security status of TCP stacks at scale. Towards this goal, we design a scalable measurement methodology to scan the Alexa top 1 million websites for almost 6 months. We also present how notifications impact the patching behavior, and compare the result with the Heartbleed and the Debian PRNG vulnerability. The measurement represents a valuable data point in understanding how Internet servers react to serious security flaws in the operating system kernel.
Characterizing and Modeling Patching Practices of Industrial Control Systems Brandon Wang (University of Iowa), Xiaoye Li (University of Iowa), Leandro P. de Aguiar (Siemens), Daniel S Menasche (UFRJ), Zubair Shafiq (University of Iowa)
Abstract:Industrial Control Systems (ICS) are widely deployed in mission critical infrastructures such as manufacturing, energy, and transportation. The mission critical nature of ICS devices poses important security challenges for ICS vendors and asset owners. In particular, the patching of ICS devices is usually deferred to scheduled production outages so as to prevent potential operational disruption of critical systems. Unfortunately, anecdotal evidence suggests that ICS devices are riddled with security vulnerabilities that are not patched in a timely manner, which leaves them vulnerable to exploitation by hackers, nation states, and hacktivist organizations.
In this paper, we present the results from our longitudinal measurement and characterization study of ICS patching behavior. Our study is based on IP scan data collected from Shodan over the duration of three years for more than 500 known industrial ICS protocols and products. Our longitudinal measurements reveal the impact of vulnerability disclosures on ICS patching. Our analysis of more than 100 thousand Internetexposed ICS devices reveals that about 50% upgrade to newer patched versions within 60 days of a vulnerability disclosure. Based on our measurement and analysis, we further propose a variation of the Bass model to forecast the patching behavior of ICS devices. The evaluation shows that our proposed models have comparable prediction accuracy when contrasted against traditional ARIMA timeseries forecasting models, while requiring less parameters and being amenable to direct physical interpretation.
Security Game with Nonadditive Utilities and Multiple Attacker Resources Sinong Wang (The Ohio State University), Ness Shroff (The Ohio State University)
Abstract:There has been significant interest in studying security games for modeling the interplay of attacks and defenses on various systems involving critical infrastructure, financial system security, political campaigns, and civil safeguarding. However, existing security game models typically either assume additive utility functions, or that the attacker can attack only one target. Such assumptions lead to tractable analysis, but miss key inherent dependencies that exist among different targets in current complex networks. In this paper, we generalize the classical security game models to allow for nonadditive utility functions. We also allow attackers to be able to attack multiple targets. We examine such a general security game from a theoretical perspective and provide a unified view. In particular, we show that each security game is equivalent to a combinatorial optimization problem over a set system $\varepsilon$, which consists of defender's pure strategy space. The key technique we use is based on the transformation, projection of a polytope, and the ellipsoid method. This work settles several open questions in security game domain and significantly extends the stateoftheart of both the polynomial solvable and NPhard class of the security game.
Session 4: Performance Analysis of Very Large SystemsWednesday, 10:3011:45 Session Chair: Mor HarcholBalter (Carnegie Mellon University)
Stein's Method for Mean Field Approximations in Light and Heavy Traffic Regimes Lei Ying (Arizona State University)
Abstract:Meanfield analysis is an analytical method for understanding largescale stochastic systems such as largescale data centers and communication networks. The idea is to approximate the stationary distribution of a largescale stochastic system using the equilibrium point (called the meanfield limit) of a dynamical system (called the meanfield model). This approximation is often justified by proving the weak convergence of stationary distributions to its meanfield limit. Most existing meanfield models concerned the lighttraffic regime where the load of the system, denote by $\rho$, is strictly less than one and is independent of the size of the system. This is because a traditional meanfield model represents the limit of the corresponding stochastic system. Therefore, the load of the meanfield model is $\rho=\lim_{N\rightarrow \infty} \rho^{(N)},$ where $\rho^{(N)}$ is the load of the stochastic system of size $N.$ Now if $\rho^{(N)}\rightarrow 1$ as $N\rightarrow \infty$ (i.e., in the heavytraffic regime), then $\rho=1.$ For most systems, the meanfield limits when $\rho=1$ are trivial and meaningless. To overcome this difficulty of traditional meanfield models, this paper takes a different point of view on meanfield models. Instead of regarding a meanfield model as the limiting system of largescale stochastic system, it views the equilibrium point of the meanfield model, called a meanfield solution, simply as an approximation of the stationary distribution of the finitesize system. Therefore both meanfield models and solutions can be functions of $N.$ This paper first outlines an analytical method to bound the approximation error based on Stein's method and the perturbation theory. We further present two examples: the $M/M/N$ queueing system and the supermarket model under the poweroftwochoices algorithm. For both applications, the method enables us to characterize the system performance under a broad range of traffic loads. For the supermarket model, this is the first paper that rigorously quantifies the steadystate performance of thepoweroftwochoices in the heavytraffic regime. These results in the heavytraffic regime cannot be obtained using the traditional meanfield analysis and the interchange of the limits.
Expected Values Estimated via MeanField Approximation are 1/NAccurate Nicolas Gast (Inria)
Abstract:Meanfield approximation is a powerful tool to study largescale stochastic systems such as datacenters  one example being the famous power of twochoice paradigm. It is shown in the literature that under quite general conditions, the empirical measure of a system of $N$ interacting objects converges at rate O(1/sqrt(N)) to a deterministic dynamical system, called its meanfield approximation.
In this paper, we revisit the accuracy of meanfield approximation by focusing on expected values. We show that, under almost the same general conditions, the expectation of any performance functional converges at rate O(1/N) to its meanfield approximation. Our result applies for finite and infinitedimensional meanfield models. We also develop a new perturbation theory argument that shows that the result holds for the stationary regime if the dynamical system is asymptotically exponentially stable. We provide numerical experiments that demonstrate that this rate of convergence is tight and that illustrate the necessity of our conditions. As an example, we apply our result to the classical twochoice model. By combining our theory with numerical experiments, we claim that, as the load rho goes to 1, the average queue length of a twochoice system with N servers is log_2 1/(1rho) + 1/(2N(1rho)) +O(1/N^2).
Analysis of a Stochastic Model of Replication in Large Distributed Storage Systems: A MeanField Approach Wen Sun (INRIA), Veronique Simon (UPMC Paris), Sebastien Monnet (Universite Savoie Mont Blanc), Philippe Robert (INRIA), Pierre Sens (UPMC Paris)
Abstract:Distributed storage systems such as Hadoop File System or Google File System (GFS) ensure data availability and durability using replication. Persistence is achieved by replicating the same data block on several nodes, and ensuring that a minimum number of copies are available on the system at any time. Whenever the contents of a node are lost, for instance due to a hard disk crash, the system regenerates the data blocks stored before the failure by transferring them from the remaining replicas. This paper is focused on the analysis of the efficiency of replication mechanism that determines the location of the copies of a given file at some server. The variability of the loads of the nodes of the network is investigated for several policies. Three replication mechanisms are tested against simulations in the context of a real implementation of a such a system: Random, Least Loaded and Power of Choice.
The simulations show that some of these policies may lead to quite unbalanced situations: if $\beta$ is the average number of copies per node it turns out that, at equilibrium, the load of the nodes may exhibit a high variability. It is shown in this paper that a simple variant of a power of choice type algorithm has a striking effect on the loads of the nodes: at equilibrium, the distribution of the load of a node has a bounded support, most of nodes have a load less than $2\beta$ which is an interesting property for the design of the storage space of these systems. Stochastic models are introduced and investigated to explain this interesting phenomenon.
Session 5: Towards Efficient and Durable StorageWednesday, 13:3014:45 Session Chair: Bhuvan Urgaonkar (Pennsylvania State University)
Understanding ReducedVoltage Operation in Modern DRAM
Devices: Experimental Characterization, Analysis, and Mechanisms Donghyuk Lee (NVIDIA), Samira Khan (University of Virginia), Lavanya Subramanian (Carnegie Mellon University), Saugata Ghose (Carnegie Mellon University), Rachata Ausavarungnirun (Carnegie Mellon University), Gennady Pekhimenko (Microsoft Research), Vivek Seshadri (Microsoft Research), Onur Mutlu (ETH Zurich)
Abstract:The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM energy consumption. We would like to reduce the DRAM supply voltage more aggressively, to further reduce energy. Aggressive supply voltage reduction requires a thorough understanding of the effect voltage scaling has on DRAM access latency and DRAM reliability.
In this paper, we take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the supply voltage is lowered below the nominal voltage level specified by DRAM standards. Using an FPGAbased testing platform, we perform an experimental study of 124 real DDR3L (lowvoltage) DRAM chips manufactured recently by three major DRAM vendors. We find that reducing the supply voltage below a certain point introduces bit errors in the data, and we comprehensively characterize the behavior of these errors. We discover that these errors can be avoided by increasing the latency of three major DRAM operations (activation, restoration, and precharge). We perform detailed DRAM circuit simulations to validate and explain our experimental findings. We also characterize the various relationships between reduced supply voltage and error locations, stored data patterns, DRAM temperature, and data retention.
Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron. The key idea of Voltron is to use a performance model to determine by how much we can reduce the supply voltage without introducing errors and without exceeding a userspecified threshold for performance loss. Our evaluations show that Voltron reduces the average DRAM and system energy consumption by 10.5% and 7.3%, respectively, while limiting the average system performance loss to only 1.8%, for a variety of memoryintensive quadcore workloads. We also show that Voltron significantly outperforms prior dynamic voltage and frequency scaling mechanisms for DRAM.
Exploiting Data Longevity for Enhancing the Lifetime of Flashbased Storage Class Memory Wonil Choi (The Pennsylvania State University), Mohammad Arjomand (The Pennsylvania State University), Myoungsoo Jung (Yonsei University), Mahmut Kandemir (The Pennsylvania State University)
Abstract:Storageclass memory (SCM) combines the benefits of a solidstate memory, such as highperformance and robustness, with the archival capabilities and low cost of conventional harddisk magnetic storage. Among candidate solidstate nonvolatile memory technologies that could potentially be used to construct SCM, flash memory is a wellestablished technology and have been widely used in commercially available SCM incarnations. Flashbased SCM enables much better tradeoffs between performance, space and power than diskbased systems. However, write endurance is a significant challenge for a flashbased SCM (each act of writing a bit may slightly damage a cell, so one flash cell can be written 10^410^5 times, depending on the flash technology, before it becomes unusable). This is a welldocumented problem and has received a lot of attention by manufactures that are using some combination of write reduction and wearleveling techniques for achieving longer lifetime. In an effort to improve flash lifetime, first, by quantifying data longevity in an SCM, we show that a majority of the data stored in a solidstate SCM do not require long retention times provided by flash memory (i.e., up to 10 years in modern devices); second, by exploiting retention time relaxation, we propose a novel mechanism, called DenseSLC (DSLC), which enables us perform multiple writes into a cell during each erase cycle for lifetime extension; and finally, we discuss the required changes in the flash management software (FTL) in order to use DSLC mechanism for extending the lifetime of the solidstate part of an SCM. Using an extensive simulationbased analysis of an SLC flashbased SCM, we demonstrate that DSLC is able to significantly improve device lifetime (between 5.1X and 8.6X) with no performance overhead and also very small changes at the FTL software.
DesignInduced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms Kevin K. Chang (Carnegie Mellon University), Abdullah Giray Yaglikci (Carnegie Mellon University), Saugata Ghose (Carnegie Mellon University), Abhijith Kashyap (NVIDIA), Hasan Hassan (TOBB University), Aditya Agrawal (NVIDIA), Niladrish Chatterjee (NVIDIA), Donghyuk Lee (NVIDIA), Mike O'Connor (NVIDIA / UTAustin), Onur Mutlu (ETH)
Abstract:Variation has been shown to exist across the cells within a modern DRAM chip. Prior work has studied and exploited several forms of variation, such as manufacturingprocess or temperatureinduced variation. We empirically demonstrate a new form of variation that exists within a real DRAM chip, induced by the design and placement of different components in the DRAM chip: different regions in DRAM, based on their relative distances from the peripheral structures, require different minimum access latencies for reliable operation. In particular, we show that in most real DRAM chips, cells closer to the peripheral structures can be accessed much faster than cells that are farther. We call this phenomenon designinduced variation in DRAM. Our goals are to i) understand designinduced variation that exists in real, stateoftheart DRAM chips, ii) exploit it to develop lowcost mechanisms that can dynamically find and use the lowest latency at which to operate a DRAM chip reliably, and, thus, iii) improve overall system performance while ensuring reliable system operation.
To this end, we first experimentally demonstrate and analyze designedinduced variation in modern DRAM devices by testing and characterizing 96 DIMMs (768 DRAM chips). Our characterization identifies DRAM regions that are vulnerable to errors, if operated at lower latency, and finds consistency in their locations across a given DRAM chip generation, due to designinduced variation. Based on our extensive experimental analysis, we develop two mechanisms that reliably reduce DRAM latency. First, DIVA Profiling uses runtime profiling to dynamically identify the lowest DRAM latency that does not introduce failures. DIVA Profiling exploits designinduced variation and periodically profiles only the vulnerable regions to determine the lowest DRAM latency at low cost. It is the first mechanism to dynamically determine the lowest latency that
can be used to operate DRAM reliably. DIVA Profiling reduces the latency of read/write requests by 35.1%/57.8%, respectively, at 55°C. Our second mechanism, DIVA Shuffling, shuffles data such that values stored in vulnerable regions are mapped to multiple errorcorrecting code (ECC) codewords. As a result, DIVA Shuffling can correct 26% more multibit errors than conventional ECC. Combined together, our two mechanisms reduce read/write latency by 40.0%/60.5%, which translates to an overall system performance improvement of 14.7%/13.7%/13.8% (in 2/4/8core systems) across a variety of workloads, while ensuring reliable operation.
Session 6: New Design for Large Network ApplicationsWednesday, 15:1516:30 Session Chair: Anshul Gandhi (Stony Brook University)
Hadoop on Named Data Networking: Experience and Results Mathias Gibbens (University of Arizona), Chris Gniady (University of Arizona), Beichuan Zhang (University of Arizona), Lei Ye (University of Arizona)
Abstract:The Named Data Networking (NDN) architecture retrieves content by names rather than connecting to specific hosts. It provides benefits such as highly efficient and resilient content distribution, which fit well to dataintensive distributed computing. This paper presents and discusses our experience in modifying Apache Hadoop, a popular MapReduce framework, to operate on an NDN network. Through this firstofitskind implementation process, we demonstrate the feasibility of running an existing, large, and complex piece of distributed software commonly seen in data centers over NDN. We show advantages such as simplified network code and reduced network traffic which are beneficial in a data center environment. There are also challenges faced by NDN, that are being addressed by the community, which can be magnified under data center traffic. Through detailed evaluation, we show a reduction of 16% for overall data transmission between Hadoop nodes while writing data with default replication settings. Preliminary results also show promise for innetwork caching of repeated reads in distributed applications. We also show that overall performance is currently slower under NDN, and we identify challenges and opportunities for further NDN improvements.
Using Burstable Instances in the Public Cloud: Why, When and How? Cheng Wang (Penn State University), Bhuvan Urgaonkar (Penn State University), Neda Nasiriani (Penn State University), George Kesidis (Penn State University)
Abstract:Amazon EC2 and Google Compute Engine (GCE) have recently introduced a new class of virtual machines called "burstable" instances that are cheaper than even the smallest traditional/regular instances. These lower prices come with reduced average capacity and increased variance. Using measurements from both EC2 and GCE, we identify key idiosyncrasies of resource capacity dynamism for burstable instances that set them apart from other instance types. Most importantly, certain resources for these instances appear to be regulated by deterministic token bucket like mechanisms. We find widely different types of disclosures by providers of the parameters governing these regulation mechanisms: full disclosure (e.g., CPU capacity for EC2 t2 instances), partial disclosure (e.g., CPU capacity and remote disk IO bandwidth for GCE sharedcore instances), or no disclosure (network bandwidth for EC2 t2 instances). A tenant modeling these variations as random phenomena (as some recent work suggests) might make suboptimal procurement and operation decisions. We present modeling techniques for a tenant to infer the properties of these regulation mechanisms via simple offline measurements. We also present two case studies of how certain memcached workloads might benefit from our modeling when operating on EC2 by: (i) augmenting cheap but low availability inmemory storage offered by spot instances with backup of popular content on burstable instances, and (ii) temporal multiplexing of multiple burstable instances to achieve the CPU or network bandwidth (and thereby throughput) equivalent of a more expensive regular EC2 instance.
Dandelion: Redesigning the Bitcoin Network for Anonymity Shaileshh Bojja Venkatakrishnan (University of Illinois at UrbanaChampaign), Giulia Fanti (University of Illinois at UrbanaChampaign), Pramod Viswanath (University of Illinois at UrbanaChampaign)
Abstract:Bitcoin and other cryptocurrencies have surged in popularity over the last decade.
Although Bitcoin does not claim to provide anonymity for its users, it enjoys a public perception of being a privacypreserving financial system.
In reality, cryptocurrencies publish users' entire transaction histories in plaintext, albeit under a pseudonym; this is required for transaction validation.
Therefore, if a user's pseudonym can be linked to their human identity, the privacy fallout can be significant.
Recently, researchers have demonstrated deanonymization attacks that exploit weaknesses in the Bitcoin network's peertopeer (P2P) networking protocols.
In particular, the P2P network currently forwards content in a structured way that allows observers to deanonymize users.
In this work, we redesign the P2P network from first principles with the goal of providing strong, provable anonymity guarantees.
We propose a simple networking policy called Dandelion which provides quasioptimal, networkwide anonymity, with minimal cost to the network's utility.
We also discuss practical implementation challenges and propose heuristic solutions.
Session 7: Resource Allocation & EconomicsThursday, 10:3011:45 Session Chair: Lei Ying (Arizona State University)
Portfoliodriven Resource Management for Transient Cloud Servers Prateek Sharma (University of Massachusetts Amherst), David Irwin (University of Massachusetts Amherst), Prashant Shenoy (University of Massachusetts Amherst)
Abstract:Cloud providers have begun to offer their surplus capacity in the form of lowcost transient servers, which can be revoked unilaterally at any time. While the low cost of transient servers makes them attractive for a wide range of applications, such as data processing and scientific computing, failures due to server revocation can severely degrade application performance. Since different transient server types offer different cost and availability tradeoffs, we present the notion of server portfolios that is based on financial portfolio modeling. Server portfolios enable construction of an ``optimal'' mix of severs to meet an application's sensitivity to cost and revocation risk. We implement modeldriven portfolios in a system called ExoSphere, and show how diverse applications can use portfolios and applicationspecific policies to gracefully handle transient servers. We show that ExoSphere enables widelyused parallel applications such as Spark, MPI, and BOINC to be made transiencyaware with modest effort. Our experiments show that allowing the applications to use suitable transiencyaware policies, ExoSphere is able to achieve 80\% cost savings when compared to ondemand servers and greatly reduces revocation risk compared to existing approaches.
Optimal Posted Prices for Online Cloud Resource Allocation Zijun Zhang (University of Calgary), Zongpeng Li (University of Calgary), Chuan Wu (The University of Hong Kong)
Abstract:We study online resource allocation in a cloud computing platform through posted pricing: The cloud provider publishes a unit price for each resource type, which may vary over time; upon arrival at the cloud system, a cloud user either takes the current prices, renting resources to execute its job, or refuses the prices without running its job there. We design pricing functions based on current resource utilization ratios, in a wide array of demandsupply relationships and resource occupation durations, and prove worstcase competitive ratios in social welfare. In the basic case of a singletype, nonrecycled resource (allocated resources are not later released for reuse), we prove that our pricing function design is optimal, in that it achieves the smallest competitive ratio among all possible pricing functions. Insights obtained from the basic case are then used to generalize the pricing functions to more realistic cloud systems with multiple types of resources, where a job occupies allocated resources for a number of time slots till completion, upon which time the resources are returned to the cloud resource pool.
On Optimal TwoSided Pricing of Congested Networks Xin Wang (University of Science and Technology of China), Richard T. B. Ma (School of Computing, National University of Singapore), Yinlong Xu (University of Science and Technology of China)
Abstract:Traditionally, Internet Access Providers (APs) only charge endusers for Internet access services; however, to recoup infrastructure costs and increase revenues, some APs have recently adopted twosided pricing schemes under which both endusers and content providers are charged. Meanwhile, with the rapid growth of traffic, network congestion could seriously degrade user experiences and influence providers' utility. To optimize profit and social welfare, APs and regulators need to design appropriate pricing strategies and regulatory policies that take the effects of network congestion into consideration. In this paper, we model twosided networks under which users' traffic demands are influenced by exogenous pricing and endogenous congestion parameters and derive the system congestion under an equilibrium. We characterize the structures and sensitivities of profit and welfareoptimal twosided pricing schemes and reveal that 1) the elasticity of system throughput plays a crucial role in determining the structures of optimal pricing, 2) the changes of optimal pricing under varying AP's capacity and users' congestion sensitivity are largely driven by the type of data traffic, e.g., text or video, and 3) APs and regulators will be incentivized to shift from onesided to twosided pricing when APs' capacities and user demand for video traffic grow. Our results can help APs design optimal twosided pricing and guide regulators to legislate desirable policies.
Session 8: Analyzing and Controlling Network InteractionThursday, 14:1515:30 Session Chair: Nicolas Gast (INRIA)
Outward Influence and Cascade Size Estimation in Billionscale Networks Hung T. Nguyen (Virginia Commonwealth University), Tri P. Nguyen (Virginia Commonwealth University), Tam Vu (University of Colorado, Denver), Thang N. Dinh (Virginia Commonwealth Unviersity)
Abstract:Estimating cascade size and nodes' influence is a fundamental task in social, technological, and biological networks. Yet this task is extremely challenging due to the sheer size and the structural heterogeneity of networks. We investigate a new influence measure, termed \emph{outward influence} (OI), defined as the (expected) number of nodes that a subset of nodes $S$ will activate, \emph{excluding the nodes in $S$}. Thus, OI equals, the de facto standard measure, \emph{influence spread} of $S$ minus $S$. OI is not only more informative for nodes with small influence, but also, critical in designing new effective sampling and statistical estimation methods.
Based on OI, we propose SIEA/SOIEA, novel methods to estimate influence spread/outward influence \emph{at scale and with rigorous theoretical guarantees}. The proposed methods are built on two novel components 1) IICP an important sampling method for outward influence; and 2) RSA, a robust mean estimation method that minimize the number of samples through analyzing variance and range of random variables. Compared to the stateofthe art for influence estimation, SIEA is $\Omega(\log^4 n)$ times faster in theory and up to \emph{several orders of magnitude faster} in practice. For the first time, influence of nodes in the networks of billions of edges can be estimated with high accuracy within a few minutes. Our comprehensive experiments on realworld networks also give evidence against the popular practice of using a fixed number, e.g. 10K or 20K, of samples to compute the ``ground truth'' for influence spread.
Accelerating Performance Inference over Closed Systems by Asymptotic Methods Giuliano Casale (Imperial College London)
Abstract:Recent years have seen a rapid growth of interest in exploiting monitoring data collected from enterprise applications for automated management and performance analysis. In spite of this trend, even simple performance inference problems involving queueing theoretic formulas often incur computational bottlenecks, for example upon computing likelihoods in models of batch systems. Motivated by this issue, we revisit the solution of multiclass closed queueing networks, which are popular models used to describe batch and distributed applications with parallelism constraints. We first prove that the normalizing constant of the equilibrium state probabilities of a closed model can be reformulated exactly as a multidimensional integral over the unit simplex. This gives as a byproduct novel explicit expressions for the multiclass normalizing constant. We then derive a method based on cubature rules to efficiently evaluate the proposed integral form in small and mediumsized models. For large models, we propose novel asymptotic expansions and Monte Carlo sampling methods to efficiently and accurately approximate normalizing constants and likelihoods. We illustrate the resulting accuracy gains in problems involving optimizationbased inference.
Quality and Cost of Deterministic Network Calculus – Design and Evaluation of an Accurate and Fast Analysis Steffen Bondorf (Distributed Computer Systems (DISCO) Lab, University of Kaiserslautern, Germany), Paul Nikolaus (Distributed Computer Systems (DISCO) Lab, University of Kaiserslautern, Germany), Jens B. Schmitt (Distributed Computer Systems (DISCO) Lab, University of Kaiserslautern, Germany)
Abstract:Networks are integral parts of modern safetycritical systems and certification demands the provision of guarantees for data transmissions. Deterministic Network Calculus (DNC) can compute a worstcase bound on a data flow's endtoend delay. Accuracy of DNC results has been improved steadily, resulting in two DNC branches: the classical algebraic analysis and the more recent optimizationbased analysis. The optimizationbased branch provides a theoretical solution for tight bounds. Its computational cost grows, however, (possibly super)exponentially with the network size. Consequently, a heuristic optimization formulation trading accuracy against computational costs was proposed. In this article, we challenge optimizationbased DNC with a new algebraic DNC algorithm.
We show that:
1. no current optimization formulation scales well with the network size and
2. algebraic DNC can be considerably improved in both aspects, accuracy and computational cost.
To that end, we contribute a novel DNC algorithm that transfers the optimization's search for best attainable delay bounds to algebraic DNC. It achieves a high degree of accuracy and our novel efficiency improvements reduce the cost of the analysis dramatically. In extensive numerical experiments, we observe that our delay bounds deviate from the optimizationbased ones by only 1.142% on average while computation times simultaneously decrease by several orders of magnitude.
Session 9: Accurate and Efficient Performance MeasurementThursday, 16:0017:15 Session Chair: Giuliano Casale (Imperial College London)
A Case Study in Power Substation Network Dynamics David Formby (Georgia Institute of Technology), Anwar Walid (Bell Laboratories), Raheem Beyah (Georgia Institute of Technology)
Abstract:The modern world is becoming increasingly dependent on computing and communication technology to function, but unfortunately its application and impact on areas such as critical infrastructure and industrial control system (ICS) networks remains to be thoroughly studied. Significant research has been conducted to address the myriad security concerns in these areas, but they are virtually all based on artificial testbeds or simulations designed on assumptions about their behavior either from knowledge of traditional IT networking or from basic principles of ICS operation. In this work, we provide the most detailed characterization of an example ICS to date in order to determine if these common assumptions hold true. A live power distribution substation is observed over the course of two and a half years to measure its behavior and evolution over time. Then, a horizontal study is conducted that compared this behavior with three other substations from the same company. Although most predictions were found to be correct, some unexpected behavior was observed that highlights the fundamental differences between ICS and IT networks including round trip times dominated by processing speed as opposed to network delay, several well known TCP features being largely irrelevant, and surprisingly large jitter from devices running realtime operating systems. The impact of these observations is discussed in terms of generality to other embedded networks, network security applications, and the suitability of the TCP protocol for this environment.
Persistent Spread Measurement for Big Network Data Based on
Register Intersection You Zhou (Department of Computer & Information Science & Engineering, University of Florida), Yian Zhou (Department of Computer & Information Science & Engineering, University of Florida), Min Chen (Department of Computer & Information Science & Engineering, University of Florida), Shigang Chen (Department of Computer & Information Science & Engineering, University of Florida)
Abstract:Persistent spread measurement is to count the number of distinct elements that persist in each network flow for predefined time periods. It has many practical applications, including detecting longterm stealthy network activities in the background of normaluser activities, such as stealthy DDoS attack, stealthy network scan, or faked network trend, which cannot be detected by traditional flow cardinality measurement. With big network data, one challenge is to measure the persistent spreads of a massive number of flows without incurring too much memory overhead as such measurement may be performed at the line speed by network processors with fast but small onchip memory. We propose a highly compact Virtual Intersection HyperLogLog (VIHLL) architecture for this purpose. It achieves far better memory efficiency than the best prior work of VBitmap, and in the meantime drastically extends the measurement range. Theoretical analysis and extensive experiments demonstrate that VIHLL provides good measurement accuracy even in very tight memory space of less than 1 bit per flow.
Deconstructing the Energy Consumption of the Mobile Page Load Yi Cao (Stony Brook University), Javad Nejati (Stony Brook University), Muhammad Wajahat (Stony Brook University), Aruna Balasubramanian (Stony Brook University), Anshul Gandhi (Stony Brook University)
Abstract:Modeling the energy consumption of applications on mobile devices is an important topic that has received much attention in recent years. However, there has been very little research on modeling the energy consumption of the mobile Web. This is primarily due to the shortlived yet complex page load process that makes it infeasible to rely on coarsegrained resource monitoring for accurate power estimation.
We present RECON, a modeling approach that accurately estimates the energy consumption of any Web page load and deconstructs it into the energy contributions of individual page load activities. Our key intuition is to leverage lowlevel application semantics in addition to coarsegrained resource utilizations for modeling the page load energy consumption. By exploiting finegrained information about the individual activities that make up the page load, RECON enables fast and accurate energy estimations without requiring complex models. Experiments across 80 Web pages and under four different optimizations show that RECON can estimate the energy consumption for a Web page load with an average error of less than 7%. Importantly, RECON helps to analyze and explain the energy effects of an optimization on the individual components of Web page loads.
Posters
FluidModelBased Car Routing for Modern Ridesharing Systems Anton Braverman (ORIE, Cornell University), J. G. Dai (ORIE, Cornell University), Xin Liu (ECE, Arizona State University), Lei Ying (ECE, Arizona State University)
On the Capacity Requirement for Arbitrary EndtoEnd Deadline and Reliability Guarantees in Multihop Networks Han Deng (Texas A&M University), IHong Hou (Texas A&M University), Derya Cansever (Army CERDEC)
Why "Some" Like It Hot Too: Thermal Attack on Data Centers Xing Gao (College of William and Mary), Zhang Xu (College of William and Mary), Haining Wang (University of Delaware), Li Li (Ohio State University), Xiaorui Wang (Ohio State University)
Incentivizing Reliable Demand Response with Customers' Uncertainties and Capacity Planning Joshua Comden (Stony Brook University), Zhenhua Liu (Stony Brook University), Yue Zhao (Stony Brook University)
A Study on Performance and Power Efficiency of Dense NonVolatile Caches in MultiCore Systems Amin Jadidi (Pennsylvania State University), Mohammad Arjomand (Pennsylvania State University), Mahmut T. Kandemir (Pennsylvania State University), Chita R. Das (Pennsylvania State University)
PseudoSeparation for Assessment of Structural Vulnerability of a Network Alan Kuhnle (University of Florida), Tianyi Pan (University of Florida), Victoria G. Crawford (University of Florida), Md Abdul Alim (University of Florida), My T. Thai (University of Florida)
Optimizing Speculative Execution of DeadlineSensitive Jobs in Cloud Maotong Xu (George Washington University), Sultan Alamro (George Washington University), Tian Lan (George Washington University), Suresh Subramaniam (George Washington University)
A Spot Capacity Market to Increase Power Infrastructure Utilization in MultiTenant Data Centers Mohammad A. Islam (UC Riverside), Xiaoqi Ren (Caltech), Shaolei Ren (UC Riverside), Adam Wierman (Caltech)
HourAhead Offering Strategies in Electricity Market for Power Producers with Storage and Intermittent Supply Lin Yang (The Chinese University of Hong Kong), Mohammad Hassan Hajiesmaili (Institute of Network Coding, The Chinese University of Hong Kong), Hanling Yi (The Chinese University of Hong Kong), Minghua Chen (The Chinese University of Hong Kong)
Scheduling Coflows in Datacenter Networks: Improved Bound for Total Weighted Completion Time Mehrnoosh Shafiee (Columbia University), Javad Ghaderi (Columbia University)
Characterizing 3D Floating Gate NAND Flash Qin Xiong (Huazhong University of Science and Technology), Fei Wu (Huazhong University of Science and Technology), Zhonghai Lu (KTH Royal Institute of Technology), Yue Zhu (Huazhong University of Science and Technology), You Zhou (Huazhong University of Science and Technology), Yibing Chu (Renice Technology Co., Limited), Changsheng Xie (Huazhong University of Science and Technology), Ping Huang (Temple University)
ECF: An MPTCP Path Scheduler to Manage Heterogeneous Paths Yeonsup Lim (University of Massachusetts Amherst), Erich M. Nahum (IBM Research), Don Towsley (University of Massachusetts Amherst), Richard J. Gibbens (University of Cambridge)
Simplex Queues for HotData Download Mehmet Fatih Aktas (Rutgers University), Elie Najm (École Polytechnique Fédérale de Lausanne), Emina Soljanin (Rutgers University)
An Empirical Analysis of Facebook's Free Basics Rijurekha Sen (MPISWS), Siddharth Singh (IIITDelhi), Vedant Nanda (IIITDelhi), Sohaib Ahmad (LUMS), Satadal Sengupta (IITKharagpur), Amreesh Phokeer (University of Cape Town), Zaid Ahmed Farooq (LUMS), Taslim Arefin Khan (BUET), Ponnurangam Kumaraguru (IIITDelhi), Ihsan Ayyub Qazi (LUMS), David Choffnes (Northeastern University), Krishna P. Gummadi (MPISWS)
Multipath TCP on a VANET: A Performance Study Jorge Mena (UCLA), Peter Bankole (UCLA), Mario Gerla (UCLA)
A Fast, Small, and Dynamic Forwarding Information Base Ye Yu (University of Kentucky), Djamal Belazzougui (CERIST), Chen Qian (University of California, Santa Cruz), Qin Zhang (Indiana University Bloomington)
HFTraC: HighFrequency Traffic Control Ning Wu (Cornell University), Yingjie Bi (Cornell University), Nithin Michael (Waltz Networks, Inc.), Ao Tang (Cornell University), John Doyle (Caltech), Nikolai Matni (Caltech)
Adaptive TTLBased Caching for Content Delivery Soumya Basu (The University of Texas at Austin), Aditya Sundarrajan (University of Massachusetts Amherst), Javad Ghaderi (Columbia University), Sanjay Shakkottai (The University of Texas at Austin), Ramesh Sitaraman (University of Massachusetts Amherst and Akamai Technologies)
