2021/06/09: Pravesh Kothari and Ankur Moitra, "Robustly Learning Mixtures of Gaussians"

Pravesh Kothari (CMU) and Ankur Moitra (MIT)

Slides (PDF): Part 1 and Part 2


For a while now the problem of robustly learning a high-dimensional mixture of Gaussians has had a target on its back. The first works in algorithmic robust statistics gave provably robust algorithms for learning a single Gaussian. Since then there has been steady progress, including algorithms for robustly learning mixtures of spherical Gaussians, mixtures of Gaussians under separation conditions, and arbitrary mixtures of two Gaussians. In this talk we will discuss two recent works that essentially resolve the general problem. There are important differences in their techniques, setup, and overall quantitative guarantees, which we will discuss.

The talk will cover the following independent works:

  • Liu, Moitra, "Settling the Robust Learnability of Mixtures of Gaussians" [arXiv:2011.03622]

  • Bakshi, Diakonikolas, Jia, Kane, Kothari, Vempala, "Robustly Learning Mixtures of k Arbitrary Gaussians" [arXiv:2012.02119]

2021/05/26: Kira Goldner, "An Overview of Using Mechanism Design for Social Good"

Kira Goldner (Columbia University)


In order to accurately predict an algorithm's outcome and quality when it interacts with participants who have a stake in the outcome, we must design it to be robust to strategic manipulation. This is the subject of algorithmic mechanism design, which borrows ideas from game theory and economics to design robust algorithms. In this talk, I will show how results from the theoretical foundations of algorithmic mechanism design can be used to solve problems of societal concern.

I will overview recent work in this area in many different applications — housing, labor markets, carbon license allocations, health insurance markets, and more — as well as discuss open problems and directions ripe for tools from both mechanism design and general TCS.

2021/05/12: Santhoshini Velusamy, "Classification of the approximability of all finite Max-CSPs in the dynamic streaming setting"

Santhoshini Velusamy (Harvard University)


A maximum constraint satisfaction problem, Max-CSP(F), is specified by a finite family of constraints F, where each constraint is of arity k. An instance of the problem on n variables is given by m applications of constraints from F to length-k subsequences of the n variables, and the goal is to find an assignment to the n variables that satisfies the maximum number of constraints. The class of Max-CSP(F) includes optimization problems such as Max-CUT, Max-DICUT, Max-3SAT, Max-q-Coloring, Unique Games, etc.

In this talk, I will present our recent dichotomy theorem on the approximability of Max-CSP(F) for every finite family F, in the single-pass dynamic streaming setting. In this setting, at each time step, a constraint is either added to or deleted from the stream. In the end, the streaming algorithm must estimate the maximum number of constraints that can be satisfied using space that is only polylogarithmic in n. No background in streaming algorithms or constraint satisfaction problems will be needed to enjoy this talk!

The talk will be based on this paper, and this paper with Chi-Ning Chou, Alexander Golovnev, and Madhu Sudan.

2021/04/28: Ronen Eldan, "Localization, Stochastic Localization and Yuansi Chen's Recent Breakthrough on the Kannan–Lovasz–Simonovitz Conjecture"

Ronen Eldan (Weizmann Institute)


The Kannan-Lovasz and Simonovits (KLS) conjecture considers the following isoperimetric problem on high-dimensional convex bodies: Given a convex body $K$, consider the optimal way to partition it into two pieces of equal volume so as to minimize their interface. Is it true that up to a universal constant, the minimal partition is attained via a hyperplane cut? Roughly speaking, this question can be thought of as asking "to what extent is a convex set a good expander"?

In analogy to expander graphs, such lower bounds on the capacity would imply bounds on mixing times of Markov chains associated with the convex set, and so this question has direct implications on the complexity of many computational problems on convex sets. Moreover, it was shown that a positive answer would imply Bourgain's slicing conjecture.

Very recently, Yuansi Chen obtained a striking breakthrough, nearly solving this conjecture. In this talk, we will overview some of the central ideas used in the proof. We will start with the classical concept of "localization" (a very useful tool to prove concentration inequalities) and its extension, stochastic localization - the main technique used in the proof.

2021/04/14: Andrea Lincoln, "New Techniques for Proving Fine-Grained Average-Case Hardness"

Andrea Lincoln (Simons Institute)


In this talk I will cover a new technique for worst-case to average-case reductions. There are two primary concepts introduced in this talk: "factored" problems and a framework for worst-case to average-case fine-grained (WCtoACFG) self reductions.

We will define new versions of OV, kSUM and zero-k-clique that are both worst-case and average-case fine-grained hard assuming the core hypotheses of fine-grained complexity. We then use these as a basis for fine-grained hardness and average-case hardness of other problems. Our hard factored problems are also simple enough that we can reduce them to many other problems, e.g. to edit distance, k-LCS and versions of Max-Flow. We further consider counting variants of the factored problems and give WCtoACFG reductions for them for a natural distribution.

To show hardness for these factored problems we formalize the framework of [Boix-Adsera et al. 2019] that was used to give a WCtoACFG reduction for counting k-cliques. We define an explicit property of problems such that if a problem has that property one can use the framework on the problem to get a WCtoACFG self reduction.

In total these factored problems and the framework together give tight fine-grained average-case hardness for various problems including the counting variant of regular expression matching.

Based on joint work with Mina Dalirrooyfard and Virginia Vassilevska Williams.

2021/03/31: Jasper Lee, "Optimal Sub-Gaussian Mean Estimation in ℝ"

Jasper Lee (Brown University)


We revisit and settle a fundamental problem in statistics: given access to independent samples from a 1D random variable (with finite but unknown mean and variance), what is the best way to estimate the mean in the high probability regime, in terms of error convergence with respect to sample size? The conventional wisdom is to use the empirical mean as our estimate. However, it is known that the empirical mean can in fact have exponentially sub-optimal convergence for certain heavy-tailed distributions. On the other hand, the median-of-means estimator (invented and reinvented in various literature) does have sub-Gaussian convergence for all finite-variance distributions, albeit only in the big-O sense with a sub-optimal multiplicative constant. The natural remaining question then, is whether it is possible to bridge the gap, and have an estimator that has optimal convergence with the right constant for all finite-variance distributions.

In this talk, we answer the question affirmatively by giving an estimator that converges with the optimal constant inside the big-O, up to a 1+o(1) multiplicative factor. The estimator is also easy to compute. The convergence analysis involves deriving tail bounds using linear and convex-concave programming dualities, which may be of independent interest.

Based on joint work with Paul Valiant.

2021/03/17: Avishay Tal, "Junta Distance Approximation with Sub-Exponential Queries"

Avishay Tal (UC Berkeley)


Joint Work with Vishnu Iyer and Michael Whitmeyer

A Boolean function f:{0,1}^n \to {0,1} is called a k-junta if it depends only on k out of the n input bits. Junta testing is the task of distinguishing between k-juntas and functions that are far from k-juntas. A long line of work settled the query complexity of testing k-juntas, which is O(k log(k)) [Blais, STOC 2009; Saglam, FOCS 2018]. Suppose, however, that f is not a perfect k-junta but rather correlated with a k-junta. How well can we estimate this correlation? This question was asked by De, Mossel, and Neeman [FOCS 2019], who gave an algorithm for the task making ~exp(k) queries. We present an improved algorithm that makes ~exp(sqrt{k}) many queries.

Along the way, we also give an algorithm, making poly(k) queries, that provides "implicit oracle access" to the underlying k-junta. Our techniques are Fourier analytical and introduce the notion of "normalized influences" that might be of independent interest.

Paper: https://eccc.weizmann.ac.il/report/2021/004/

2021/03/3: Steve Hanneke, "A Theory of Universal Learning"

Steve Hanneke (TTIC)


How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy.

In this work, we study an alternative learning model that better captures such practical aspects of machine learning, but still gives rise to a complete theory of the learnable in the spirit of the PAC model. More precisely, we consider the problem of universal learning, which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution. The main result of this work is a remarkable trichotomy: there are only three possible rates of universal learning. More precisely, we show that the learning curves of any given concept class decay either at an exponential, linear, or arbitrarily slow rates. Moreover, each of these cases is completely characterized by appropriate combinatorial parameters, and we exhibit optimal learning algorithms that achieve the best possible rate in each case.

Joint work with Olivier Bousquet, Shay Moran, Ramon van Handel, and Amir Yehudayoff.

2021/02/17: William Hoza, "Fooling Constant-Depth Threshold Circuits"

William Hoza (UT Austin)


We present the first non-trivial pseudorandom generator (PRG) for linear threshold (LTF) circuits of arbitrary constant depth and super-linear size. This PRG fools circuits with depth d and n^{1 + delta} wires, where delta = exp(-O(d)), using seed length O(n^{1 - delta}) and with error exp(-n^{delta}). This tightly matches the best known lower bounds for this circuit class. As a consequence of our result, all the known hardness for LTF circuits has now effectively been translated into pseudorandomness. This brings the extensive effort in the last decade to construct PRGs and deterministic circuit-analysis algorithms for this class to the point where any subsequent improvement would yield breakthrough lower bounds.

A key ingredient in our construction is a pseudorandom restriction procedure that has tiny failure probability, but simplifies the function to a non-natural "hybrid computational model" that combines decision trees and LTFs. As part of our proof we also construct an "extremely low-error" PRG for the class of functions computable by an arbitrary function of s linear threshold functions that can handle even the extreme setting of parameters s = n/polylog(n) and epsilon = exp(-n/polylog(n)).

Joint work with Pooya Hatami, Avishay Tal, and Roei Tell. ECCC: TR21-002

2020/12/02: Yang Liu, "Faster Algorithms for Unit Maximum Flow"

Yang Liu (Stanford)


The maximum flow problem is one of the most well-studied problems in combinatorial optimization, encompassing a broad range of cut, matching, and scheduling problems. Here we present a recent line of work obtaining provably faster algorithms for solving the maximum flow problem using interior point methods. In particular, we show how to solve the maximum flow problem in m-edge unit capacity graphs in time almost m^(4/3), improving over the breakthrough m^(10/7) time algorithm of Mądry.

This is based on joint work with Aaron Sidford (abs/1910.14276, abs/2003.08929).

2020/11/25: Sumegha Garg, "The Coin Problem with Applications to Data Streams"

Sumegha Garg (Harvard)


Consider the problem of computing the majority of a stream of n i.i.d. uniformly random bits. This problem, known as the coin problem, is central to a number of counting problems in different data stream models. We show that any streaming algorithm for solving this problem with large constant advantage (over the uniform distribution) must use Ω(log n) bits of space. Previously, it was known that computing the majority on every input with a constant probability takes Ω(log n) space. We extend our lower bound to proving tight lower bounds for solving multiple, randomly interleaved copies of the coin problem, as well as for solving the OR of multiple copies of a variant of the coin problem. Our proofs involve new measures of information complexity that are well-suited for data streams. We use these lower bounds to obtain a number of new results for data streams. In each case there is an underlying d-dimensional vector x with additive updates to its coordinates given in a stream of length m. The input streams arising from our coin lower bound have nice distributional properties, and consequently for many problems for which we only had lower bounds in general turnstile streams, we now obtain the same lower bounds in more natural models, such as the bounded deletion model, in which ||x||_2 never drops by a constant fraction of what it was earlier, or in the random order model, in which the updates are ordered randomly.

Based on joint work with Mark Braverman and David P. Woodruff.

2020/11/11: Shuai Shao, "A Dichotomy for Real Boolean Holant Problems"

Shuai Shao (UW Madison)


In this talk, we present a complexity dichotomy for Holant problems on the boolean domain with arbitrary sets of real-valued constraint functions. These constraint functions need not be symmetric nor do we assume any auxiliary functions as in previous results. It is proved that for every set F of real-valued constraint functions, Holant(F) is either P-time computable or #P​-hard. The classification has an explicit criterion. This is a culmination of much research on a decade-long classification program for Holant problems, and it uses previous results and techniques from many researchers. However, as it turned out, the journey to the present theorem has been arduous. Some particularly intriguing concrete functions f6, f8 and their associated families with extraordinary closure properties related to Bell states in quantum information theory play an important role in this proof.

Based on joint work with Jin-Yi Cai.

2020/11/04: Shalev Ben-David, "Forecasting Algorithms, Minimax Theorems, and Randomized Lower Bounds"

Shalev Ben-David (Waterloo)


Abstract: I will present a new approach to randomized lower bounds, particularly in the setting where we wish to give a fine-grained analysis of randomized algorithms that achieve small bias. The approach is as follows: instead of considering ordinary randomized algorithms which give an output in {0,1} and may err, we switch models to look at "forecasting" randomized algorithms which output a confidence in [0,1] for whether they think the answer is 1. When scored by a proper scoring rule, the performance of the best forecasting algorithm is closely related to the bias of the best (ordinary) randomized algorithm, but is more amenable to analysis. As an application, I'll present a new minimax theorem for randomized algorithms, which can be viewed as a strengthening of Yao's minimax theorem. Yao's minimax theorem guarantees the existence of a hard distribution for a function f such that solving f against this distribution (to a desired error level) is as hard as solving f in the worst case (to that same error level). However, the hard distribution provided by Yao's theorem depends on the chosen error level. Our minimax theorem removes this dependence, giving a distribution which certifies the hardness of f against all bias levels at once. In recent work, we used this minimax theorem to give a tight composition theorem for randomized query complexity.

Based on joint work with Eric Blais.

2020/10/28: Omar Montasser, "Adversarially Robust Learnability: Characterization and Reductions"

Omar Montasser (TTIC)


Abstract: We study the question of learning an adversarially robust predictor from uncorrupted samples. We show that any VC class H is robustly PAC learnable, but we also show that such learning must sometimes be improper (i.e. use predictors from outside the class), as some VC classes are not robustly properly learnable. In particular, the popular robust empirical risk minimization approach (also known as adversarial training), which is proper, cannot robustly learn all VC classes. After establishing learnability, we turn to ask whether having a tractable non-robust learning algorithm is sufficient for tractable robust learnability and give a reduction algorithm for robustly learning any hypothesis class H using a non-robust PAC learner for H, with nearly-optimal oracle complexity.
This is based on joint work with Steve Hanneke and Nati Srebro, available at https://arxiv.org/abs/1902.04217​ and https://arxiv.org/abs/2010.12039​.

2020/10/21: Aayush Jain, "Indistinguishability Obfuscation from Well-Founded Assumptions"

Aayush Jain (UCLA)


We present a construction of an indistinguishability obfuscation scheme, whose security rests on the subexponential hardness of four well-founded assumptions. We show the existence of an indistinguishability Obfuscation scheme for all circuits assuming sub-exponential security of the following assumptions: - The Learning with Errors (LWE) assumption with arbitrarily small subexponential modulus-to-noise ratio, - The SXDH assumption with respect to bilinear groups of prime order p, Existence of a Boolean Pseudorandom Generator (PRG) in \mathsf{NC}^0 with arbitrary polynomial stretch, that is, mapping n bits to n^{1+\tau} bits, for any positive constant \tau. - The Learning Parity with Noise (LPN) assumption over \mathbb{Z}_p with error-rate \ell^{-\delta}, where \ell is the dimension of the secret and \delta is an arbitrarily small positive constant. Further, assuming only polynomial security of these assumptions, there exists a compact public-key functional encryption scheme for all circuits. The main technical novelty is the introduction and construction of a cryptographic pseudorandom generator that we call a Structured-Seed PRG (sPRG), assuming LPN over \mathbb{Z}_p and PRGs in \mathsf{NC}^0. During the talk, I will discuss how structured seed PRGs have evolved from different notions of novel pseudorandom generators proposed in the past few years, and how an interplay between different areas of theoretical computer science played a major role in providing valuable insights leading to this work. Time permitting, I will go into the details of how to construct sPRGs.

Joint work with Huijia (Rachel) Lin and Amit Sahai

2020/10/14: Jayadev Acharya, "Distributed Statistical Inference under Local Information Constraints"

Jayadev Acharya (Cornell)


We consider statistical inference tasks in a distributed setting where access to data samples is subjected to strict “local constraints,” through a unified framework that captures communication limitations and (local) privacy constraints as special cases. We study estimation (learning) and goodness-of-fit (testing) for both discrete and high-dimensional distributions. Our goal is to understand how the sample complexity increases under the information constraints.

In this talk we will provide an overview of this field and a sample of some of our results. We will discuss the role of (public) randomness and interactivity in information-constrained inference, and make a case for thinking about randomness and interactivity as resources.

The work is part of a long-term ongoing collaboration with Clément Canonne (IBM Research) and Himanshu Tyagi (IISc), and includes works done with Cody Freitag (Cornell), Yanjun Han (Stanford), Yuhan Liu (Cornell), and Ziteng Sun (Cornell).

2020/10/07: Susanna F. de Rezende, "Lifting with Simple Gadgets and Applications to Circuit and Proof Complexity"

Susanna F. de Rezende (Czech Academy of Sciences)


Lifting theorems in complexity theory are a method of transferring lower bounds in a weak computational model into lower bounds for a more powerful computational model, via function composition. There has been an explosion of lifting theorems in the last ten years, essentially reducing communication lower bounds to query complexity lower bounds. These theorems only hold for composition with very specific "gadgets" such as indexing and inner product.

In this talk, we will present a generalization of the theorem lifting Nullstellensatz degree to monotone span program size by Pitassi and Robere (2018) so that it works for any gadget with high enough rank, in particular, for useful gadgets such as equality and greater-than. We will then explain how to apply our generalized theorem to solve three open problems:

  • We present the first result that demonstrates a separation in proof power for cutting planes with unbounded versus polynomially bounded coefficients. Specifically, we exhibit CNF formulas that can be refuted in quadratic length and constant line space in cutting planes with unbounded coefficients, but for which there are no refutations in subexponential length and subpolynomial line space if coefficients are restricted to be of polynomial magnitude.

  • We give the strongest separation to-date between monotone Boolean formulas and monotone Boolean circuits. Namely, we show that the classical GEN problem, which has polynomial-size monotone Boolean circuits, requires monotone Boolean formulas of size 2^{Omega(n / polylog n)}.

  • We give the first explicit separation between monotone Boolean formulas and monotone real formulas. Namely, we give an explicit family of functions that can be computed with monotone real formulas of nearly linear size but require monotone Boolean formulas of exponential size. Previously only a non-explicit separation was known.

This talk is based on joint work with Or Meir, Jakob Nordström, Toniann Pitassi, Robert Robere, and Marc Vinyals, available at https://arxiv.org/abs/2001.02144

2020/09/30: Alex Wein, "Low-degree Hardness of Random Optimization Problems"

Alex Wein (NYU)


In high-dimensional statistical problems (including planted clique, sparse PCA, community detection, etc.), the class of "low-degree polynomial algorithms" captures many leading algorithmic paradigms such as spectral methods, approximate message passing, and local algorithms on sparse graphs. As such, lower bounds against low-degree algorithms constitute concrete evidence for average-case hardness of statistical problems. This method has been widely successful at explaining and predicting statistical-to-computational gaps in these settings.

While prior work has understood the power of low-degree algorithms for problems with a "planted" signal, we consider here the setting of "random optimization problems" (with no planted signal), including the problem of finding a large independent set in a random graph, as well as the problem of optimizing the Hamiltonian of mean-field spin glass models. I will define low-degree algorithms in this setting, argue that they capture the best known algorithms, and explain new proof techniques for giving lower bounds against low-degree algorithms in this setting. The proof involves a variant of the so-called "overlap gap property", which is a structural property of the solution space.

Based on joint work with David Gamarnik and Aukosh Jagannath, available at: https://arxiv.org/abs/2004.12063

2020/09/23: Fotis Iliopoulos, "Stochastic Local Search and the Lovász Local Lemma"

Fotis Iliopoulos (Princeton and IAS)


The Lovász Local Lemma (LLL) is a powerful tool in probabilistic combinatorics which can be used to establish the existence of objects that satisfy certain properties. The breakthrough of Moser and Tardos (who recently received the Gödel Prize for their work) and follow-up works revealed that the LLL has intimate connections with a class of stochastic local search algorithms for finding such desirable objects.

In this talk, I will survey this line of work through the perspective of recent unifying results, and also talk about recent applications to solving pseudo-random constraint satisfaction problems.

2020/09/17: Richard Peng, "Solving Sparse Linear Systems Faster than Matrix Multiplication"

Richard Peng (Georgia Tech)


Can linear systems be solved faster than matrix multiplication? While there has been remarkable progress for the special cases of graph structured linear systems, in the general setting, the bit complexity of solving an n-by-n linear system Ax=b is n^omega, where omega 2.372864 is the matrix multiplication exponent. Improving on this has been an open problem even for sparse linear systems with poly(n) condition number.

We present an algorithm that solves linear systems in sparse matrices asymptotically faster than matrix multiplication for any omega ≥ 2. This speedup holds for any input matrix A with o(n^omega-1/log(kappa(A))) non-zeros, where kappa(A) is the condition number of A. For poly(n)-conditioned matrices with O(n) nonzeros, and the current value of \omega, the bit complexity of our algorithm to solve to within any 1/poly(n) error is O(n^2.331645).

Our algorithm can be viewed as an efficient randomized implementation of the block Krylov method via recursive low displacement rank factorizations. It is inspired by the algorithm of [Eberly-Giesbrecht-Giorgi-Storjohann-Villard ISSAC `06 `07] for inverting matrices over finite fields. In our analysis of numerical stability, we develop matrix anti-concentration techniques to bound the smallest eigenvalue and the smallest gap in eigenvalues of semi-random matrices.

Joint work with Santosh Vempala, manuscript at https://arxiv.org/abs/2007.10254​.