ACM SIGMETRICS 2021
Beijing, China
June 14-18, 2021
ChonLam Lao
Tsinghua University
ATP: In-network Aggregation for Multi-tenant Learning
Abstract
Distributed deep neural network training (DT) systems are widely deployed in clusters
where the network is shared across multiple tenants, i.e., multiple DT jobs.
Each DT job computes and aggregates gradients. Recent advances in hardware accelerators
have shifted the performance bottleneck of training from computation to communication.
To speed up DT jobs' communication, we propose ATP, a service for in-network aggregation
aimed at modern multi-rack, multi-job DT settings.
ATP uses emerging programmable switch hardware to support in-network aggregation
at multiple rack switches in a cluster to speed up DT jobs. ATP performs decentralized, dynamic,
best-effort aggregation, enables efficient and equitable sharing of limited switch resources
across simultaneously running DT jobs, and gracefully accommodates heavy contention for switch resources.
ATP outperforms existing systems, accelerating training throughput by 38%-66% in a cluster
shared by multiple DT jobs.
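For intuition, the following is a minimal Python sketch of the best-effort aggregation idea: gradient
chunks are summed in a bounded pool of switch slots, and packets that find the pool full are simply
forwarded for end-host aggregation. All names and semantics here are illustrative assumptions, not
ATP's data-plane implementation (which runs on programmable switch ASICs).

    # Illustrative model only: this sketch shows best-effort aggregation
    # of gradient chunks in a bounded pool of slots shared across jobs.
    class BestEffortAggregator:
        def __init__(self, num_slots, workers_per_job):
            self.num_slots = num_slots              # limited on-switch memory
            self.workers_per_job = workers_per_job  # job_id -> worker count
            self.slots = {}                         # (job_id, chunk_id) -> [sums, count]

        def on_packet(self, job_id, chunk_id, values):
            key = (job_id, chunk_id)
            slot = self.slots.get(key)
            if slot is None:
                if len(self.slots) >= self.num_slots:
                    # Contention: no free slot, so forward the chunk unaggregated
                    # and let the end host aggregate it (best effort, never blocks).
                    return ("forward", values)
                self.slots[key] = [list(values), 1]
                return ("hold", None)
            slot[0] = [s + v for s, v in zip(slot[0], values)]
            slot[1] += 1
            if slot[1] == self.workers_per_job[job_id]:
                del self.slots[key]                 # free the slot for other jobs
                return ("send_aggregate", slot[0])
            return ("hold", None)

    agg = BestEffortAggregator(num_slots=2, workers_per_job={"job0": 2})
    agg.on_packet("job0", 7, [1.0, 2.0])            # -> ("hold", None)
    print(agg.on_packet("job0", 7, [3.0, 4.0]))     # -> ("send_aggregate", [4.0, 6.0])

With two workers per job, the second chunk arrival completes the sum and frees the slot, which is
how a small pool of switch slots can be shared across simultaneously running jobs.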
Biography
ChonLam Lao is a master's student in Computer Science at IIIS, Tsinghua University, advised by Professor Wenfei Wu.
Next year, he will begin doctoral studies at Harvard University, co-advised by Professor Minlan Yu and Professor Aditya Akella.
His research primarily focuses on programmable networks and machine learning systems.
His paper "ATP: In-network Aggregation for Multi-tenant Learning" was recently accepted to NSDI 2021, where it received a Best Paper Award.
Michael Lingzhi Li
MIT
Forecasting Covid-19 With Application To Vaccine Trial Design and Distribution
Abstract
To help combat the COVID-19 pandemic and understand the impact of government interventions, we develop DELPHI,
a novel epidemiological model. We have applied DELPHI to over 200 regions since early April 2020
with consistently high predictive power, and it is a key contributor to the core CDC ensemble forecast.
DELPHI compares favorably with other models and predicted large-scale epidemics in areas such as
South Africa and Russia weeks before realization.
Furthermore, using DELPHI, we can quantify the impact of interventions and provide insights
on future virus incidence under different policies. We illustrate how Janssen Pharmaceuticals (J&J)
effectively used DELPHI's analysis to select the Phase III trial locations of
Ad26.Cov2.S, the first single-dose vaccine, accelerating the trial by 8 weeks while reducing the number
of participants needed by 25%. We also demonstrate how DELPHI informed FEMA on optimizing vaccine
distribution under constrained supply to minimize the number of pandemic deaths.
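To make the modeling concrete, here is a minimal compartmental sketch in Python. The compartments,
parameter values, and the step-shaped intervention multiplier gamma(t) are simplifying assumptions
for illustration only; DELPHI's full formulation has additional compartments and fits its
intervention curve to data.

    # Simplified sketch: an SEIR model whose infection rate is scaled by a
    # time-varying intervention factor gamma(t). Parameters are illustrative.
    import numpy as np
    from scipy.integrate import odeint

    def gamma_t(t):
        # Assumed intervention curve: transmission halves after a policy at t=30.
        return 1.0 if t < 30 else 0.5

    def seir(y, t, beta, sigma, rho):
        S, E, I, R = y
        lam = beta * gamma_t(t) * S * I   # interventions scale new infections
        return [-lam, lam - sigma * E, sigma * E - rho * I, rho * I]

    y0 = [0.999, 0.001, 0.0, 0.0]         # fractions of the population
    t = np.linspace(0, 120, 121)
    S, E, I, R = odeint(seir, y0, t, args=(0.4, 0.2, 0.1)).T
    print(f"peak infectious fraction: {I.max():.3f}")

Quantifying an intervention then amounts to re-running the model with a different gamma(t) and
comparing the resulting incidence curves.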
Biography
Michael Lingzhi Li is a doctoral candidate at the MIT Operations Research Center,
advised by Prof. Dimitris Bertsimas. His research interests primarily focus on scalable algorithms
that combine machine learning and optimization, with emphasis on real-world applications
in both healthcare and supply chain management. He has worked on problems in interpretable machine learning,
personalized risk predictions, medical therapy prescription, infectious disease epidemiology,
warehouse optimization, and labor scheduling. He is the recipient of awards including the 2021
Innovative Applications in Analytics Award and the 2020 INFORMS Pierskalla Award, and he was
a finalist for the 2019 MSOM Best Student Paper Award.
Ayush Sekhari
Cornell University
Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations
Abstract
We design an algorithm which finds an ϵ-approximate stationary point (with ‖∇F(x)‖ ≤ ϵ) using O(ϵ⁻³)
stochastic gradient and Hessian-vector products, matching guarantees that were previously available
only under a stronger assumption of access to multiple queries with the same random seed.
We prove a lower bound which establishes that this rate is optimal and, surprisingly, that
it cannot be improved using stochastic p-th order methods for any p ≥ 2,
even when the first p derivatives of the objective are Lipschitz.
Together, these results characterize the complexity of non-convex stochastic optimization with
second-order methods and beyond. Expanding our scope to the oracle complexity
of finding (ϵ,γ)-approximate second-order stationary points, we establish nearly matching upper
and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
This is joint work with Yossi Arjevani, Yair Carmon, John Duchi, Dylan J. Foster and Karthik Sridharan.
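To illustrate the role of Hessian-vector products, here is a toy Python sketch of a recursively
variance-reduced gradient estimator: the running estimate is corrected with a stochastic
Hessian-vector product along each step, rather than by querying the gradient oracle at two points
with the same random seed. The toy objective, step size, and restart schedule are assumptions for
illustration, not the paper's algorithm.

    # Toy sketch: variance reduction via stochastic Hessian-vector products.
    # Instead of g_t = g_{t-1} + grad(x_t; xi) - grad(x_{t-1}; xi) (same seed
    # queried twice), use g_t = g_{t-1} + H(x_t; xi) (x_t - x_{t-1}).
    import numpy as np

    rng = np.random.default_rng(0)

    def grad(x, xi):                       # stochastic gradient of toy F(x) = x^4 - x^2
        return 4 * x**3 - 2 * x + xi

    def hvp(x, v, xi):                     # stochastic Hessian-vector product oracle
        return (12 * x**2 - 2 + xi) * v

    x = np.array([1.5])
    g = grad(x, rng.normal(scale=0.1))     # initialize the gradient estimate
    eta = 0.01
    for t in range(1, 2001):
        x_next = x - eta * g
        xi = rng.normal(scale=0.1)
        g = g + hvp(x_next, x_next - x, xi)    # correct the estimate along the step
        x = x_next
        if t % 500 == 0:
            g = grad(x, rng.normal(scale=0.1)) # occasional fresh gradient query
    print(f"x = {x[0]:.3f}, |grad F(x)| = {abs(4*x[0]**3 - 2*x[0]):.4f}")

The iterate settles near a stationary point of the toy objective, and each update needs only one
fresh sample, which is the kind of single-query access the result above shows is sufficient.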
Biography
Ayush is a PhD student in the Computer Science department at Cornell University,
advised by Professor Karthik Sridharan and Professor Robert D. Kleinberg.
His research interests span optimization, online learning, reinforcement learning and control,
and the interplay between them. Before coming to Cornell, he spent a year at Google as part of the
Brain Residency program. Before Google, he completed his undergraduate studies in computer science
at IIT Kanpur in India, where he was awarded the President's Gold Medal.