We are pleased to announce that we will have three tutorials this year. All three tutorials will be on Monday, the 17th of June.
Monday, June 17th, 9:00am - 10:30am
Geo-Replication in Data Center Applications
Speaker: Marcos K. Aguilera, Microsoft Research Silicon Valley
Abstract: Data center applications increasingly require a storage system that is geo-replicated, that is, replicated across many geographic locations. Geo-replication can reduce access latency, improve availability, and provide disaster tolerance. It turns out there are many techniques for geo-replication with different trade-offs. In this talk, we give an overview of these techniques, organized according to two orthogonal dimensions: level of synchrony (synchronous and asynchronous) and type of storage service (read-write, state machine, transaction). We explain the basic idea of these techniques, together with their applicability and trade-offs.
Bio: Marcos received a Ph.D. in Computer Science from Cornell University in 2000. He has worked as a researcher at Compaq's Systems Research Center and HP Labs. He is now a senior researcher at Microsoft Research Silicon Valley. His interests include distributed systems, distributed algorithms, fault tolerance, and storage systems.
Monday, June 17th, 11:00am - 12:30pm
The Fundamentals of Heavy-tails: Properties, Emergence, and Identification
Speakers: Adam Wierman, Caltech; Jayakrishnan Nair, Caltech; Bert Zwart, CWI
Abstract: Heavy-tails are a continual source of excitement and confusion across disciplines as they are repeatedly "discovered" in new contexts. This is especially true within computer systems, where heavy-tails seemingly pop up everywhere -- from degree distributions in the internet and social networks to file sizes and interarrival times of workloads. However, despite nearly a decade of work on heavy-tails they are still treated as mysterious, surprising, and even controversial.
The goal of this tutorial is to show that heavy-tailed distributions need not be mysterious and should not be surprising or controversial. In particular, we will demystify heavy-tailed distributions by showing how to reason formally about their counter-intuitive properties; we will highlight that their emergence should be expected (not surprising) by showing that a wide variety of general processes lead to heavy-tailed distributions; and we will highlight that most of the controversy surrounding heavy-tails is the result of bad statistics, and can be avoided by using the proper tools.
Adam Wierman is a Professor in the Department of Computing and Mathematical Sciences at the California Institute of Technology, where he is a member of the Rigorous Systems Research Group (RSRG). He received his Ph.D., M.Sc. and B.Sc. in Computer Science from Carnegie Mellon University in 2007, 2004, and 2001, respectively. His research interests center around resource allocation and scheduling decisions in computer systems and services. More specifically, his work focuses both on developing analytic techniques in stochastic modeling, queueing theory, scheduling theory, and game theory, and on applying these techniques to application domains such as energy-efficient computing, data centers, social networks, and electricity markets. He received the 2011 ACM SIGMETRICS Rising Star award, and has been co-recipient of best paper awards at ACM SIGMETRICS, IEEE INFOCOM, IFIP Performance, the IEEE Green Computing Conference, and ACM GREENMETRICS. He was named a Siebel Scholar, received an Okawa Foundation grant, and received an NSF CAREER grant. Additionally, his dissertation received the CMU School of Computer Science Distinguished Dissertation Award and was given an honorable mention for the INFORMS Doctoral Dissertation Award for Operations Research in Telecommunications. He has also received multiple teaching awards, including the Associated Students of the California Institute of Technology (ASCIT) Teaching Award. Dr. Wierman has more than 60 refereed publications and serves as an Associate Editor for the Operations Research journal and on the editorial boards of the Performance Evaluation journal and the IEEE Transactions on Cloud Computing.
Bert Zwart is currently a senior researcher at CWI, where he leads the Probability and Stochastic Networks group. He also holds a full professor position at VU University Amsterdam, is a senior fellow at Eurandom, and holds an adjunct professor position at the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology, where he held a Coca-Cola Chair until 2008. Bert Zwart received the 2008 Erlang Prize, awarded for outstanding contributions to applied probability by a researcher not older than 35, as well as an IBM Faculty Award. His research is concerned with the application of analytic and probabilistic asymptotic methods to applied probability models in computer systems, communication networks, customer contact centers, and manufacturing systems. Dr. Zwart has more than 70 refereed publications and is a council member of the Applied Probability Society of INFORMS. From 2009 to 2011 he was area editor for Stochastic Models of Operations Research, the flagship journal of his profession. In addition, Dr. Zwart is editor-in-chief (with J.K. Lenstra and M. Trick) of the journal Surveys in Operations Research and Management Science, and serves on the editorial boards of Mathematics of Operations Research, Mathematical Methods of Operations Research, Operations Research, Queueing Systems, and Stochastic Systems. He is a recipient of Veni and Vidi research grants from NWO.
Jayakrishnan Nair received his PhD from the California Institute of Technology (Caltech) in 2012. His PhD thesis focused on scheduling for heavy-tailed and light-tailed workloads in queueing systems. He is currently a post-doctoral scholar at Caltech and will join CWI as a post-doctoral scholar in May 2013. His research interests include modeling, performance evaluation, and design issues in queueing systems and communication networks. Jayakrishnan was a recipient of the best paper award at IFIP Performance 2010.
Monday, June 17th, 11:00am - 12:30pm
Profiling and Analyzing the I/O Performance of NoSQL DBs
Speaker: Jiri Schindler, NetApp
Abstract: The advent of the so-called NoSQL databases has brought about a new model of using storage systems. While traditional relational database systems took advantage of features offered by centrally-managed, enterprise-class storage arrays, the new generation of database systems with weaker data consistency models is content with using and managing locally attached individual storage devices and providing data reliability and availability through high-level software features and protocols.
This tutorial aims to review the architecture of selected NoSQL DBs to lay the foundations for understanding how these new DB systems behave. In particular, it focuses on how (in)efficiently these new systems use I/O and other resources to accomplish their work. The tutorial examines the behavior of several NoSQL DBs with an emphasis on Cassandra, a popular NoSQL DB system. It uses I/O traces and resource utilization profiles captured in private cloud deployments that use both dedicated, directly attached storage and shared storage.
The material is geared specifically towards SIGMETRICS attendees who are familiar with system profiling and analysis, both in theory and through hands-on experience as systems administrators. It does not assume any prior experience with NoSQL or relational DB systems, nor does it require a deep understanding of storage systems architecture. The necessary concepts are reviewed to establish a common ground and to relate them to NoSQL DBs. Participants will learn that NoSQL DB systems are not much different in their fundamentals from other systems for storing (semi)structured data, even though their architecture (a scale-out, clustered, shared-nothing model) and their use cases (with eventually consistent data models) are quite different.
Bio: Jiri Schindler is a Member of Technical Staff at the NetApp Advanced Technology Group, where he works on storage architectures integrating flash memory and disk drives in support of applications for management of (semi)structured data. Recently, he has been investigating the I/O profiles of columnar databases and designed a system for efficient de-staging of small updates to disk drives with the help of flash memory. Jiri has over a decade of systems experience ranging from device-level request scheduling, through file systems and data layouts, to whole-system performance analysis. Previously, Jiri worked at EMC on Centera, the shared-nothing clustered content-addressable storage system. While getting his PhD at Carnegie Mellon University, he and his colleagues designed and built the Fates (Clotho, Atropos, and Lachesis) system for efficient execution of mixed database workloads with different I/O profiles. Jiri has been an adjunct professor at Northeastern University, where he taught storage systems classes.