All tutorials will be held on June 20, 2014. Details are given below.
Algorithm Design for MapReduce and Beyond
Friday, 09:00am - 10:30am
Speaker: Sergei Vassilvitskii, Google
Abstract: Traditionally we think of algorithms as running on data available in a single location, typically in main memory or on disk. In many modern applications, the data is often too large to reside in a single location (tera- and petabyte sized datasets are common). Processing such data requires new algorithms and new models of computation. In recent years, practitioners have turned to MapReduce-based systems, such as Hadoop, for large scale data analysis. The goal of this tutorial is to provide an overview of algorithmic advances in MapReduce algorithms.
We begin with the question of modeling MapReduce and then turn our attention to graph algorithms, where we show how fundamental algorithms, such as those for counting connected components, finding spanning trees and computing matchings can be implemented in this setting. We then tackle the large class of greedy algorithms and show how to adapt these seemingly sequential algorithms to MapReduce. Finally, we turn our attention to data mining algorithms such as those for clustering and social network analysis. We present new parallel-friendly approaches for these problems, and discuss how to handle other important practical considerations such as data skew.
Friday, 11:00am - 12:30pm
Speaker: Sebastien Bubeck, Princeton
Abstract: In the recent years the multi-armed bandit problem has attracted a lot of attention in the theoretical learning community. This growing interest is a consequence of the large number of problems that can be modeled as a multi-armed bandit: ad placement, website optimization, packet routing, ect. Furthermore the bandit methodology is also used as a building block for more complicated scenarios such as reinforcement learning, model selection in statistics, or computer game-playing. While the basic stochastic multi-armed bandit can be traced back to Thompson (1933) and Robbins (1952), it is only very recently that we obtained an (almost) complete understanding of this simple model. Moreover many extensions of the original problem have been proposed in the past fifteen years, such as bandits without a stochastic assumption (the so-called adversarial model), or bandits with a very large (but structured) set of arms.
The tutorial will be divided into three parts:
In the first part we discuss the state-of-the-art results on the basic multi-armed bandit problem (both stochastic and adversarial).
In the second part the focus will be on continuously-armed stochastic bandits, with a Lipschitz assumption on the mean-payoff.
Finally in the third part we consider the case of adversarial bandits, with a linear loss and a very large set of arms with some combinatorial structure.
Big Data Privacy: Threats, Countermeasures, and Incentives
Friday, 1:45pm - 05:30pm (3 hours + breaks)
Speakers: Ioannidis Stratis, Technicolor and Udi Weinsberg, Technicolor
Abstract: Big data analytics are the cornerstone of numerous online services, underlying tasks such as targeted advertising, personalized recommendations and user profiling. The proliferation of user data collection practices that have enabled such services, along with several recent high-profile privacy breaches, have raised privacy concerns from consumer advocacy groups, regulatory bodies and the public at large. In turn, this has recently spurred an extensive research effort on both theoretical and practical aspects of privacy in big data analytics.
The purpose of this tutorial is to overview this nascent field along three different axes: threats, countermeasures, and user incentives. After a brief introduction of big data analytic techniques, we will cover privacy threats arising from their unfettered application on user data, focusing on case studies of private information leaks. We will then discuss methods for mitigating privacy risks, while still preserving the utility of deployed services. We will cover statistical methods, such as differential privacy, as well as cryptographic techniques applied to inference tasks such as regression and matrix factorization. Finally, we will conclude by discussing how to incentivize strategic users to relax their privacy constraints in return for monetary compensation. We will cover the formal setting of privacy auctions, and overview recent results for such auctions over different regression tasks.