2021/12/8: Guy Blanc, "Properly learning decision trees in almost polynomial time"

Guy Blanc (Stanford)


We give an n^{O(loglog n)}-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over {-1,1}^n. Even in the realizable setting, the previous fastest runtime was n^{O(log n)}, a consequence of a classic algorithm of Ehrenfeucht and Haussler.

Our algorithm shares similarities with practical heuristics for learning decision trees, which we augment with additional ideas to circumvent known lower bounds against these heuristics. To analyze our algorithm, we prove a new structural result for decision trees that strengthens a theorem of O'Donnell, Saks, Schramm, and Servedio. While the OSSS theorem says that every decision tree has an influential variable, we show how every decision tree can be "pruned" so that every variable in the resulting tree is influential.

Joint work with Jane Lange, Mingda Qiao, and Li-Yang Tan. To appear in FOCS 2021

2021/12/1: William Kuszmaul, "Linear Probing Revisited: Tombstones Mark the Demise of Primary Clustering"

William Kuszmaul (MIT)


The linear-probing hash table is one of the oldest and most widely used data structures in computer science. However, linear probing also famously comes with a major drawback: as soon as the hash table reaches a high memory utilization, elements within the hash table begin to cluster together, causing insertions to become slow. This phenomenon, now known as "primary clustering", was first captured by Donald Knuth in 1963; at a load factor of $1 - 1/x$, the expected time per insertion becomes $\Theta(x^2)$, rather than the more desirable $\Theta(x)$.

We show that there is more to the story than the classic analysis would seem to suggest. It turns out that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions. If these design decisions are made correctly, then even a hash table that is continuously at a load factor $1 - \Theta(1/x)$ can achieve average insertion time $\tilde{O}(x)$. A key insight is that the tombstones left behind by deletions cause a surprisingly strong "anti-clustering" effect, and that when insertions and deletions are one-for-one, the anti-clustering effects of deletions actually overpower the clustering effects of insertions.

We also present a new variant of linear probing, which we call "graveyard hashing", that completely eliminates primary clustering on any sequence of operations. If, when an operation is performed, the current load factor is $1 - 1/x$ for some $x$, then the expected cost of the operation is $O(x)$. One corollary is that, in the external-memory model with a data block size of $B$, graveyard hashing offers the following remarkable guarantee: at any load factor $1-1/x$ satisfying $x = o(B)$, graveyard hashing achieves $1 + o(1)$ expected block transfers per operation. Past external-memory hash tables have only been able to offer a $1+o(1)$ guarantee when the block size $B$ is at least $\Omega(x^2)$.

Based on joint work with Michael A. Bender and Bradley C. Kuszmaul. To appear in FOCS 2021.

2021/11/10: Kuikui Liu, "Spectral Independence: A New Tool to Analyze Markov Chains"

Kuikui Liu (University of Washington)


Markov chain Monte Carlo is a widely used class of algorithms for sampling from high-dimensional probability distributions, both in theory and in practice. While simple to implement, analyzing the rate of convergence to stationarity, i.e. the "mixing time", remains a challenging problem in many settings. We introduce a new technique to bound mixing times called "spectral independence"", which says that certain pairwise correlation matrices all have bounded spectral norm. This surprisingly powerful technique originates in the emerging study of high-dimensional expanders, and has allowed us to "unify" nearly all existing approaches to approximate counting and sampling by building new connections with other areas, including statistical physics, geometry of polynomials, functional analysis, and more. Through these connections, several long-standing open problems have recently been answered, including counting bases of matroids and optimal mixing of the Glauber dynamics/Gibbs sampler up to the algorithmic phase transition threshold.

Based on several joint works with Dorna Abdolazimi, Nima Anari, Zongchen Chen, Shayan Oveis Gharan, Eric Vigoda, Cynthia Vinzant, and June Vuong.

2021/10/27: Shravas Rao, "Degree vs. Approximate Degree and Quantum Implications of Huang's Sensitivity Theorem"

Shravas Rao (Northwestern University)


Based on the recent breakthrough of Huang (2019), we show that for any total Boolean function f,

  • The degree of f is at most quadratic in the approximate degree of f. This is optimal as witnessed by the OR function.

  • The deterministic query complexity of f is at most quartic in the quantum query complexity of f. This matches the known separation (up to log factors) due to Ambainis, Balodis, Belovs, Lee, Santha, and Smotrovs (2017).

We apply these results to resolve the quantum analogue of the Aanderaa--Karp--Rosenberg conjecture. We show that if f is a nontrivial monotone graph property of an n-vertex graph specified by its adjacency matrix, then Q(f)=Ω(n), which is also optimal. We also show that the approximate degree of any read-once formula on n variables is Θ(√n).

Based on joint work with Scott Aaronson, Shalev Ben-David, Robin Kothari, and Avishay Tal.

2021/10/13: Nutan Limaye, "Superpolynomial Lower Bounds Against Low-Depth Algebraic Circuits"

Nutan Limaye (IT University of Copenhagen)


Every multivariate polynomial P(X) can be written as a sum of monomials, i.e. a sum of products of variables and field constants. In general, the size of such an expression is the number of monomials that have a non-zero coefficient in P.

What happens if we add another layer of complexity, and consider sums of products of sums (of variables and field constants) expressions? Now, it becomes unclear how to prove that a given polynomial P(X) does not have small expressions. In this result, we solve exactly this problem.

More precisely, we prove that certain explicit polynomials have no polynomial-sized "Sigma-Pi-Sigma" (sums of products of sums) representations. We can also show similar results for Sigma-Pi-Sigma-Pi, Sigma-Pi-Sigma-Pi-Sigma and so on for all "constant-depth" expressions.

The talk is based on a joint work of Nutan Limaye, Srikanth Srinivasan, and Sébastien Tavenas.

2021/09/29: Audra McMillan, "Hiding among the clones: a simple and nearly optimal analysis of privacy amplification by shuffling"

Audra McMillan (Apple)


Differential privacy (DP) is a model of privacy-preserving machine learning that has garnered significant interest in recent years due to its rigorous privacy guarantees. An algorithm differentially private if the output is stable under small changes in the input database. While DP has been adopted in a variety of applications, most applications of DP in industry actually satisfy a stronger notion called local differential privacy. In local differential privacy data subjects perturb their data before it reaches the data analyst. While this requires less trust, it comes a substantial cost to accuracy. Recent work of Erlingsson, Feldman, Mironov, Raghunathan, Talwar, and Thakurta [EFMRTT19] demonstrated that random shuffling amplifies differential privacy guarantees of locally randomized data. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously [BEMMRLRKTS17] and has led to significant interest in the shuffle model of privacy [CSUZZ19, EFMRTT19]. In this talk, we will discuss a new result on privacy amplification by shuffling, which achieves the asymptotically optimal dependence in the local privacy parameter. Our result is based on a new proof strategy which is simpler than previous approaches, and extends to a lightly weaker notion known as approximate differential privacy with nearly the same guarantees.

Based on joint work with Vitaly Feldman and Kunal Talwar.