# 2019-2020

## 2020/06/18: Clifford Stein, "Parallel Approximate Undirected Shortest Paths Via Low Hop Emulators"

Clifford Stein (Columbia University)

**Abstract:**

Although sequential algorithms with (nearly) optimal running time for finding shortest paths in undirected graphs with non-negative edge weights have been known for several decades, near-optimal parallel algorithms have turned out to be a much tougher challenge. In this talk, we present a (1+epsilon)-approximate parallel algorithm for computing shortest paths in undirected graphs, achieving polylog depth and near-linear work. All prior (1+epsilon)-algorithms with polylog depth perform at least superlinear work. Improving this long-standing upper bound obtained by Cohen (STOC'94) has been open for 25 years.

Our algorithm uses several new tools. Prior work uses hopsets to introduce shortcuts in the graph. We introduce a new notion that we call low hop emulators. We also introduce compressible preconditioners, which we use in conjunction with Serman's framework (SODA '17) for the uncapacitated minimum cost flow problem.

Joint work with Alex Andoni and Peilin Zhong.

## 2020/06/03: Michael P. Kim, "Learning from Outcomes: Evidence-Based Rankings"

Michael P. Kim (Stanford University)

*Slides**: **please contact the speaker**.*

**Abstract:**

In this work, we address the task of ranking members of a population according to their qualifications based on a training set of binary outcome data. A natural approach for ranking is to reduce to prediction: first learn to predict individuals’ “probability” of success; then rank individuals in the order specified by the predictions. A concern with this approach is that such rankings may be vulnerable to manipulation. The rank of an individual depends not only on their own qualification but also on every other individuals’ qualifications, so small inaccuracies in prediction may result in a highly inaccurate and unfair induced ranking. We show how to obtain rankings that satisfy a number of desirable accuracy and fairness criteria, despite the coarseness of binary outcome data. We develop two parallel definitions of evidence-based rankings. First, we study a semantic notion of domination-compatibility: if the training data suggest that members of a set S are on-average more qualified than the members of T, then a ranking that favors T over S (i.e., where T dominates S) is blatantly inconsistent with the data, and likely to be discriminatory. Our definition asks for domination-compatibility, not just for a pair of sets (e.g., majority and minority populations), but rather for every pair of sets from a rich collection \mathcal{C} of subpopulations. The second notion—evidence-consistency—aims at precluding even more general forms of discrimination: the ranking must be justified on the basis of consistency with the expectations for every set in the collection \mathcal{C}. Somewhat surprisingly, while evidence-consistency is a strictly stronger notion than domination-compatibility when the collection \mathcal{C} is predefined, the two notions are equivalent when the collection \mathcal{C} may depend on the ranking itself. Finally, we show a tight connection between evidence-based rankings and multi-calibrated predictors [HKRR’18]. This connection establishes a way to reduce the task of ranking to prediction that ensures strong guarantees of fairness in the resulting ranking.

Joint work with Cynthia Dwork, Omer Reingold, Guy N. Rothblum, and Gal Yona. Appeared at FOCS 2019.

## 2020/05/27: Rahul Ilango, "The Story of the Minimum Circuit Size Problem (MCSP) So Far, and Hardness for Multi-output Circuits"

Rahul Ilango (MIT)

**Abstract:**

The Minimum Circuit Size Problem (MCSP) roughly asks what the "complexity" of a given string is. Informally, one can think of this as determining the degree of "computational order" a string has.

In the past several years, there has been a resurgence of interest in MCSP. A series of exciting results have begun unraveling what looks to be a fascinating story. This story already reveals deep connections between MCSP and a growing list of fields, including cryptography, learning theory, structural complexity theory, average-case complexity, and circuit complexity. As an example, Santhanam recently proved a conditional equivalence between the complexity of MCSP and the existence of one-way functions.

This talk is split into two parts. The first part is a broad introduction to MCSP, answering the following questions: What is this problem? Why is it interesting? What do we know so far, and where might the story go next? The second part discusses recent joint work with Bruno Loff and Igor Oliveira showing that the "multi-output version" of MCSP is NP-hard.

## 2020/05/20: Mark Bun, "An Equivalence Between Private Classification and Online Prediction"

Mark Bun (Boston University)

**Abstract:**

We prove that every concept class with finite Littlestone dimension can be learned by an (approximate) differentially-private algorithm. The converse direction was shown in recent work of Alon, Livni, Malliaris, and Moran, STOC '19. Together these two results show that a class of functions is privately learnable if and only if it is learnable in the mistake-bound model of online learning. To establish our result, we introduce "global stability," a new notion of algorithmic stability for learning algorithms that we show can always be satisfied when learning classes of finite Littlestone dimension.

## 2020/05/13: Sahil Singla, "Online Vector Balancing and Geometric Discrepancy"

Sahil Singla (Princeton University and IAS)

**Abstract:**

We consider an online vector balancing question where T vectors, chosen from an arbitrary distribution over [-1,1]^n, arrive one-by-one and must be immediately given a {+1,-1} sign. The goal is to keep the discrepancy---the \ell_{\infty}-norm of any signed prefix-sum---as small as possible. A concrete example of this question is the online interval discrepancy problem where T points are sampled one-by-one uniformly in the unit interval [0,1], and the goal is to immediately color them {+1,-1} such that every sub-interval remains always nearly balanced. As random coloring incurs \Omega(T^{1/2}) discrepancy, while the worst-case offline bounds are \Theta(\sqrt{n \log (T/n)}) for vector balancing and 1 for interval balancing, a natural question is whether one can (nearly) match the offline bounds in the online setting for these problems. One must utilize the stochasticity as in the worst-case scenario it is known that discrepancy is \Omega(T^{1/2}) for any online algorithm.

In this work, we introduce a new framework that allows us to handle online vector balancing even when the input distribution has dependencies across coordinates. In particular, this lets us obtain a \poly(n, \log T) bound for online vector balancing under arbitrary input distributions, and a \polylog (T) bound for online interval discrepancy. Our framework is powerful enough to capture other well-studied geometric discrepancy problems; e.g., we obtain a \poly(\log^d (T)) bound for the online d-dimensional Tusnády's problem. All our bounds are tight up to polynomial factors.

A key new technical ingredient in our work is an anti-concentration inequality for sums of pairwise uncorrelated random variables, which might also be of independent interest.

Based on joint works with Nikhil Bansal, Haotian Jiang, Janardhan Kulkarni, and Makrand Sinha. Part of this work appears in STOC 2020 and is available at https://arxiv.org/abs/1912.03350

## 2020/05/06: Nathan Klein, "An improved approximation algorithm for TSP in the half integral case"

Nathan Klein (University of Washington)

**Abstract:**

A classic result from Christofides in the 70s tells us that a fast algorithm for the traveling salesperson problem (TSP) exists which returns a solution at most 3/2 times worse than the optimal. Since then, however, no better approximation algorithm has been found. In this talk, I will give an overview of research towards the goal of beating 3/2 and will present the first sub-3/2 approximation algorithm for the special case of "half integral" TSP instances. These instances have received significant attention in part due to a conjecture from Schalekamp, Williamson and van Zuylen that they attain the integrality gap of the subtour polytope. If this conjecture is true, our work shows that the integrality gap of the polytope is bounded away from 3/2, giving hope for an improved approximation for the general case.

This presentation is of joint work with Anna Karlin and Shayan Oveis Gharan.

## 2020/04/29: Sepideh Mahabadi, "Non-Adaptive Adaptive Sampling in Turnstile Streams"

Sepideh Mahabadi (TTIC)

**Abstract:**

Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation. Adaptive sampling repeatedly selects rows of an underlying n by d matrix A, where n is much greater than d, with probabilities proportional to their distances to the subspace of the previously selected rows. Intuitively, adaptive sampling seems to be limited to trivial multi-pass algorithms in the streaming model of computation due to its inherently sequential nature of assigning sampling probabilities to each row only after the previous iteration is completed. Surprisingly, we show this is not the case by giving the first one-pass algorithms for adaptive sampling on turnstile streams and using space poly(d,k,log n), where k is the number of adaptive sampling rounds to be performed.

Our adaptive sampling procedure has a number of applications to various data summarization problems on turnstile streams that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model. This includes column subset selection, subspace approximation, projective clustering, and volume maximization. We complement our volume maximization algorithmic results with lower bounds that are tight up to lower order terms, even for multi-pass algorithms. By a similar construction, we also obtain lower bounds for volume maximization in the row-arrival model, which we match with competitive upper bounds.

This is a joint work with Ilya Razenshteyn, David Woodruff, and Samson Zhou.

## 2020/04/22: Huacheng Yu, "Nearly Optimal Static Las Vegas Succinct Dictionary"

Huacheng Yu (Princeton University)

**Abstract:**

Given a set S of n (distinct) keys from key space [U], each associated with a value from Σ, the *static dictionary* problem asks to preprocess these (key, value) pairs into a data structure, supporting value-retrieval queries: for any given x in [U], valRet(x) must return the value associated with x if x is in S, or return "N/A" if x is not in S. The special case where |Σ|=1 is called the membership problem. The "textbook" solution is to use a hash table, which occupies linear space and answers each query in constant time. On the other hand, the minimum possible space to encode all (key, value) pairs is only OPT:= lg (U choose n) + n lg |Σ| bits, which could be much less.

In this talk, we will talk about a randomized dictionary data structure using OPT+poly lg n+O(lglg...lg U) bits of space and expected constant query time, assuming the query algorithm have access to an external data-independent lookup table of size n^0.001. Previously, even for membership queries and when U \leq n^O(1), the best known data structure with constant query time requires OPT+n/poly lg n bits of space (due to Pagh [Pagh'01] and Pătraşcu [Pat'08]). It has O(lg n) query time when the space is at most OPT+n^{0.999}.

## 2020/04/08: Ramon van Handel, "Rademacher type and Enflo type coincide"

Ramon van Handel (Princeton University)

*(No slides)*

**Abstract:**

In Banach space theory, Rademacher type is an important invariant that controls many geometric and probabilistic properties of normed spaces. It is of considerable interest in various settings to understand to what extent such powerful tools extend to general metric spaces. A natural metric analogue of Rademacher type was proposed by Enflo in the 1960s-70s, and has found a number of interesting applications. Despite much work in the intervening years, however, the relationship between Rademacher type and Enflo type has remained unclear. This basic question is settled in joint work with Paata Ivanisvili and Sasha Volberg: in the setting of Banach spaces, Rademacher type and Enflo type coincide. The proof is based on a very simple but apparently novel insight on how to prove dimension-free inequalities on the Boolean cube. I will not assume any prior background in Banach space theory in the talk.

## 2020/04/01: Venkat Guruswami, "Arıkan meets Shannon: Polar codes with near-optimal convergence to channel capacity"

Venkat Guruswami (CMU)

**Abstract:**

We establish a constructive version of Shannon’s noisy coding theorem for binary codes, with information rate converging near-optimally fast to channel capacity as a function of code length. Specifically, let W be an arbitrary binary-input memoryless symmetric channel with Shannon capacity I(W), and fix any delta greater than 0. We construct, for any sufficiently small epsilon greater than 0, binary linear codes of block length O(1/epsilon^{2+delta}) and rate I(W)−epsilon that enable reliable communication on W with quasi-linear time encoding and decoding. Shannon's theorem implies the existence of such codes (without efficient constructions or decoding) with block length O(1/epsilon^2). This quadratic dependence on the gap epsilon to capacity is known to be best possible. Previously such a construction was only known for the case of the erasure channel.

Our codes are a variant of Arıkan's polar codes based on multiple carefully constructed local kernels, one for each intermediate channel that arises in the decoding. A key technical ingredient in the analysis is a strong converse to the noisy coding theorem that shows extreme unpredictability of even a single message bit when communicating via random linear codes at rates slightly above capacity.

Joint work with Andrii Riazanov and Min Ye.

## 2020/03/25: Dana Moshkovitz, "Nearly Optimal Pseudorandomness From Hardness"

Dana Moshkovitz (UT Austin)

**Abstract:**

Existing proofs that deduce BPP=P from circuit lower bounds convert randomized algorithms into deterministic algorithms with a large polynomial slowdown. We convert randomized algorithms into deterministic ones with little slowdown. Specifically, assuming exponential lower bounds against randomized single-valued nondeterministic (SVN) circuits, we convert any randomized algorithm over inputs of length n running in time t \geq n to a deterministic one running in time t^{2+alpha} for an arbitrarily small constant alpha. Such a slowdown is nearly optimal, as, under complexity-theoretic assumptions, there are problems with an inherent quadratic derandomization slowdown. We also convert any randomized algorithm that errs rarely into a deterministic algorithm having a similar running time (with pre-processing). The latter derandomization result holds under weaker assumptions, of exponential lower bounds against deterministic SVN circuits. Our results follow from a new, nearly optimal, explicit pseudorandom generator. The construction uses, among other ideas, a new connection between pseudoentropy generators and locally list recoverable codes.

## 2020/03/11: Thomas Steinke, "Reasoning About Generalization via Conditional Mutual Information"

Thomas Steinke (IBM Research)

*Slides (PPTX)* [The bottom of the slides were apparently cut off in the recording, you can follow along with the slides using this link]

**Abstract:**

We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. Our framework ties together existing approaches, including uniform convergence bounds and recent methods for adaptive data analysis. Specifically, we use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output (i.e., the trained model) of the learning algorithm. We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods. We then show that bounded CMI implies various forms of generalization.

Based on joint work with Lydia Zakynthinou.

**Abstract:**

MIP* denotes the class of problems that admit interactive proofs with quantum entangled provers. It has been an outstanding question to characterize the complexity of MIP*. Most notably, there was no known computable upper bound on this class.We show that MIP* is equal to the class RE, the set of recursively enumerable languages. In particular, this shows that MIP* contains uncomputable problems. Through a series of known connections, this also yields a negative answer to Connes’ Embedding Problem from the theory of operator algebras. In this talk, I will explain the connection between Connes’ Embedding Problem, quantum information theory, and complexity theory. I will then give an overview of our approach, which involves reducing the Halting Problem to the problem of approximating the entangled value of nonlocal games.

Joint work with Zhengfeng Ji, Anand Natarajan, Thomas Vidick, and John Wright.

## 2020/02/12: Albert Atserias, "Automating Resolution is NP-Hard"

Albert Atserias (Universitat Politècnica de Catalunya)

**Abstract:**

We show that it is NP-hard to distinguish CNF formulas that have Resolution refutations of almost linear length from CNF formulas that do not even have weakly exponentially long ones. It follows from this that Resolution is not automatable in polynomial time unless P = NP, or in weakly exponential time unless ETH fails. The proof of this is simple enough that all its ideas can be explained in a talk. Along the way, I will try to explain the process of discovery that led us to the result.

This is joint work with Moritz Müller.

**Abstract:**

Peer review is the backbone of scholarly research. It is however faced with a number of challenges (or "demons") which cause unfairness to authors, and degrade the overall quality of the process. This talk will present principled and practical approaches to battle these demons in peer review:

Subjectivity: How to ensure that all papers are judged by the same yardstick?

Mis-calibration: How to use ratings in presence of arbitrary or adversarial mis-calibration?

Bias: How to rigorously test for existence of (gender/fame/race/...) biases in peer review?

Strategic behavior: How to insulate peer review from strategic behavior of author-reviewers?

Noise: How to assign reviewers to papers to simultaneously ensure fair and accurate evaluations in the presence of review noise?

The work uses tools from social choice theory, statistics and learning theory, information theory, game theory and decision theory. No prior knowledge on these topics will be assumed.

## 2019/11/20: Jason Li, "The Karger-Stein Algorithm is Optimal for k-cut"

Jason Li (CMU)

**Abstract:**

In the $k$-cut problem, we are given an edge-weighted graph and want to find the least-weight set of edges whose deletion breaks the graph into $k$ connected components. Algorithms due to Karger-Stein and Thorup showed how to find such a minimum $k$-cut in time approximately $O(n^{2k-2})$. The best lower bounds come from conjectures about the solvability of the $k$-clique problem and a reduction from $k$-clique to $k$-cut, and show that solving $k$-cut is likely to require time $\Omega(n^k)$. Our recent results have given special-purpose algorithms that solve the problem in time $n^{1.98k + O(1)}$, and ones that have better performance for special classes of graphs (e.g., for small integer weights).

In this work, we resolve the problem for general graphs, by showing that for any fixed $k \geq 2$, the Karger-Stein algorithm outputs any fixed minimum $k$-cut with probability at least $\widehat{O}(n^{-k})$, where $\widehat{O}(\cdot)$ hides a $2^{O(\ln \ln n)^2}$ factor. This also gives an extremal bound of $\widehat{O}(n^k)$ on the number of minimum $k$-cuts in an $n$-vertex graph and an algorithm to compute a minimum $k$-cut in similar runtime. Both are tight up to $\widehat{O}(1)$ factors.

The first main ingredient in our result is a fine-grained analysis of how the graph shrinks---and how the average degree evolves---under the Karger-Stein process. The second ingredient is an extremal result bounding the number of cuts of size at most $(2-\delta) OPT/k$, using the Sunflower lemma.

## 2019/11/06: Ryan O'Donnell, "Explicit near-Ramanujan graphs of every degree"

Ryan O'Donnell (CMU)

**Abstract:**

For every constant d and epsilon, we give a deterministic poly(n)-time algorithm that outputs a d-regular graph on ~n vertices that is epsilon-near-Ramanujan; i.e., its eigenvalues are bounded in magnitude by 2sqrt(d-1) + epsilon (excluding the single trivial eigenvalue of d).

Joint work with Sidhanth Mohanty (Berkeley) and Pedro Paredes (CMU).

## 2019/10/22: Hao Huang, "A Proof of the Sensitivity Conjecture"

Hao Huang (Emory University)

**Abstract:**

In the n-dimensional hypercube graph, one can easily choose half of the vertices such that they induce an empty graph. However, having even just one more vertex would cause the induced subgraph to contain a vertex of degree at least \sqrt{n}. This result is best possible, and improves a logarithmic lower bound shown by Chung, Furedi, Graham and Seymour in 1988. In this talk we will discuss a very short algebraic proof of it.

As a direct corollary of this purely combinatorial result, the sensitivity and degree of every boolean function are polynomially related. This solves an outstanding foundational problem in theoretical computer science, the Sensitivity Conjecture of Nisan and Szegedy.

**Abstract:**

A sunflower with r petals is a collection of r sets so that the intersection of each pair is equal to the intersection of all. Erdos and Rado in 1960 proved the sunflower lemma: for any fixed r, any family of sets of size w, with at least about w^w sets, must contain a sunflower. The famous sunflower conjecture is that the bound on the number of sets can be improved to c^w for some constant c. Despite much research, the best bounds until recently were all of the order of w^{cw} for some constant c. In this work, we improve the bounds to about (\log w)^w.

There are two main ideas that underlie our result. The first is a structure vs pseudo-randomness paradigm, a commonly used paradigm in combinatorics. This allows us to either exploit structure in the given family of sets, or otherwise to assume that it is pseudo-random in a certain way. The second is a duality between families of sets and DNFs (Disjunctive Normal Forms). DNFs are widely studied in theoretical computer science. One of the central results about them is the switching lemma, which shows that DNFs simplify under random restriction. We show that when restricted to pseudo-random DNFs, much milder random restrictions are sufficient to simplify their structure.

Joint work with Ryan Alweiss, Kewen Wu and Jiapeng Zhang.

**Abstract:**

I will explain our recent understanding of the chasing convex bodies problem posed by Friedman and Linial in 1991. In this problem, an online player receives a request sequence K_1,\dots,K_T of convex sets in d dimensional space and moves his position online into each requested set. The player’s movement cost is the length of the resulting path. Chasing convex bodies asks if there exists an online algorithm with cost competitive against the offline optimal path. This is both a challenging metrical task system and (equivalent to) a competitive analysis view on online convex optimization.

This problem was open for d>2 until last year but has recently been solved twice. The first solution gives a 2^{O(d)} competitive algorithm while the second gives a nearly optimal \min(d,\sqrt(d\log(T))) competitive algorithm for T requests. The latter result is based on the Steiner point, which is the exact optimal solution to a related geometric problem called Lipschitz selection and dates from 1840. In the talk, I will briefly outline the first solution and fully explain the second.

Partially based on joint works with Sébastien Bubeck, Bo’az Klartag, Yin Tat Lee, and Yuanzhi Li.

**Abstract:**

A long-standing puzzle in quantum complexity theory is to understand the power of the class \textsf{MIP*} of multiprover interactive proofs with shared entanglement. This question is closely related to the study of entanglement through non-local games, which dates back to the pioneering work of Bell. In this work we show that \textsf{MIP*} contains \textsf{NEEXP} (non-deterministic doubly-exponential time), exponentially improving the prior lower bound of \textsf{NEXP} due to Ito and Vidick. Our result shows that shared entanglement exponentially increases the power of these proof systems, as the class \textsf{MIP} of multiprover interactive proofs without shared entanglement is known to be equal to \textsf{NEXP}.

## 2019/05/29: Lior Kamma, "Lower Bounds for Multiplication via Network Coding"

Lior Kamma (Aarhus)

**Abstract:**

Multiplication is one of the most fundamental computational problems, yet its true complexity remains elusive. The best known upper bound, very recently proved by Harvey and Van Der Hoven shows that two n-bit numbers can be multiplied via a boolean circuit of size O(n \lg n).

We prove that if a central conjecture in the area of network coding is true, then any constant degree Boolean circuit for multiplication must have size \Omega(n \lg n), thus (conditioned on the conjecture) completely settling the complexity of multiplication circuits. We additionally revisit classic conjectures in circuit complexity, due to Valiant, and show that the network coding conjecture also implies one of Valiant’s conjectures.

Joint work with Peyman Afshani, Casper Freksen and Kasper Green Larsen

## 2019/05/15: Ewin Tang, "Quantum-inspired classical linear algebra algorithms: why and how?"

Ewin Tang (University of Washington)

**Abstract:**

Over the past ten years, the field of quantum machine learning (QML) has produced many polylogarithmic-time procedures for linear algebra routines, assuming certain “state preparation” assumptions. Though such algorithms are formally incomparable with classical computing, a recent line of work uses an analogous classical model of computation as an effective point of comparison to reveal speedups (or lack thereof) gained by QML. The resulting “dequantized” algorithms assume sampling access to input to speed up runtimes to polylogarithmic in input size.

In this talk, we will discuss the motivation behind this model and its relation to existing randomized linear algebra literature. Then, we will delve into an example quantum-inspired algorithm: Gilyen, Lloyd, and Tang’s algorithm for low-rank matrix inversion. This dequantizes a variant of Harrow, Hassidim, and Lloyd’s matrix inversion algorithm, a seminal work in QML. Finally, we will consider the implications of this work on exponential speedups in QML. No background of quantum computing is assumed for this talk.