Home | People | Events | Links | Positions

Mathematics of Information
Seminar    (CS286)

4:00-5:00 pm in Annenberg 314
(except where otherwise noted)

October 2, 2018
Ben Lee Volk
Algebraic Complexity Theory: Lower bounds and Derandomization (1st of 2 parts)
    Algebraic Complexity theory is a subfield of complexity theory, studying computational problems of algebraic flavor such as multiplying matrices, computing the determinant, and more generally, computing polynomials over a field, where the complexity is measured by the number of algebraic operations needed to compute it.
    Two basic open problems in the area are (1) proving superpolynomial circuit lower bounds for explicit polynomials (which is the algebraic analog to the P vs. NP problem), and (2) designing deterministic efficient algorithms for the Polynomial Identity Testing problem, that asks to determine, given a circuit computing a polynomial, whether it computes the zero polynomial. Studying these questions often leads to beautiful mathematical questions of combinatorial, algebraic and geometric nature.
    In this talk, we will discuss several old and new results in the area, explore the mysterious, bidirectional and intricate connections between the two problems above, and give some open problems.

October 9, 2018
Ben Lee Volk
Algebraic Complexity Theory: Lower bounds and Derandomization (2nd of 2 parts)

October 16, 2018
Alessandro Zocca
Hard-core interactions on graphs
    In this talk I will introduce the hard-core model with Metropolis transition probabilities on finite graphs and its multi-state generalization. Motivated by the study of random-access networks performance, I will focus on the low-temperature regime and study the asymptotic behavior of these interacting particle systems by looking at first hitting times between stable states and mixing times. In particular, I will show how the order-of-magnitude of these hitting and mixing times depends on the underlying graph structure and derive precise asymptotics for various regular lattices. These results have been obtained extending the so-called "pathwise approach" developed in the statistical physics literature to study metastability phenomena and yielded a rigorous understanding of the root cause for poor delay performance of random-access networks.

October 23, 2018
Alessandro Achille (UCLA)
Complexity, Noise, and Emergent Properties of Learning Deep Representations
    I will show that Information Theoretic quantities control and describe a large part of the training process of Deep Neural Networks, and can be used to explain how properties, such as invariance to nuisance variability and disentanglement of semantic factors, emerge naturally in the learned representation. The resulting theory has connections with several fields ranging from algorithmic complexity to variational inference. This framework not only predicts the asymptotic behavior of deep networks, but also shows that the initial learning transient has a large irreversible effect on the outcome of the training, which gives rise to critical learning periods akin to those observed in biological systems. This urges us to study the complex, and so far neglected, initial phase of learning.

October 30, 2018
Omer Tamuz
Combining and Comparing Experiments
    A classical statistical model of hypothesis testing is the Blackwell Le Cam experiment, in which an observer makes inferences regarding the veracity of an hypothesis by observing a random outcome whose distribution depends on whether or not the hypothesis holds. We address a number of natural, classical questions: when does one experiment have more information than another? How can we quantify the amount of information an experiment holds? Our results include an answer to an old question of Blackwell, as well as a novel axiomatization of Kullback-Leibler divergence.
    Joint with Xiaosheng Mu, Luciano Pomatto and Philipp Strack

November 27, 2018
Rasmus Kyng (Harvard)
Approximate Gaussian Elimination for Laplacians
    We show how to perform sparse approximate Gaussian elimination for Laplacian matrices. We present a simple, nearly linear time algorithm that approximates a Laplacian matrix by a sparse LU factorization. We compute this factorization by subsampling standard Gaussian elimination. This gives the simplest known nearly linear time solver for Laplacian equations. The crux of our proof is the use of matrix martingales to analyze the algorithm. Finally, we will take a look at ongoing efforts to implement robust and fast Laplacian solvers based on these ideas.
    Based on joint work with Sushant Sachdeva and Daniel Spielman.

December 4, 2018
Jan Hazla (MIT)
Reasoning in Bayesian Opinion Exchange Networks is Computationally Hard
    Bayesian models of opinion exchange are extensively studied in economics, dating back to the work of Aumann on the agreement theorem. An important class of such models features agents arranged on a network (representing, e.g., social interactions), with the network structure determining which agents communicate with each other. It is often argued that the Bayesian computations needed by agents in such models are difficult, but prior to our work there were no rigorous arguments for such hardness.
    We consider a well-studied model where fully rational agents receive private signals indicative of an unknown state of the world. Then, they repeatedly announce the state of the world they consider most likely to their neighbors, at the same time updating their beliefs based on their neighbors' announcements.
    I will discuss our complexity-theoretic results establishing hardness of agents' computations in this model. Specifically, we show that these computations are NP-hard and extend this result to PSPACE-hardness. We show hardness not only for exact computations, but also that it is computationally difficult even to approximate the rational opinion in any meaningful way.
    Joint work with Ali Jadbababie, Elchanan Mossel and Amin Rahimian.

January 22, 2019
Gautam Kamath
Privately Learning High-Dimensional Distributions
    We present novel, computationally efficient, and differentially private algorithms for two fundamental high-dimensional learning problems: learning a multivariate Gaussian in R^d and learning a product distribution in {0,1}^d in total variation distance. The sample complexity of our algorithms nearly matches the sample complexity of the optimal non-private learners for these tasks in a wide range of parameters. Thus, our results show that private comes essentially for free for these problems, providing a counterpoint to the many negative results showing that privacy is often costly in high dimensions. Our algorithms introduce a novel technical approach to reducing the sensitivity of the estimation procedure that we call recursive private preconditioning, which may find additional applications.
    Based on joint work with Jerry Li, Vikrant Singhal, and Jonathan Ullman.

February 26, 2019
    CMI Special Lunch Seminar
            ★ 12 noon ★
            ★ Room 213 ★

Brent Waters (UT Austin)
Cryptographic Code Obfuscation
    A cryptographic code obfuscator takes as input the description of a program P and outputs a program P' that is functionally equivalent to the original, but should hide the inner workings of the original program to the maximal extent possible. Until very recently no general purpose obfuscators existed that were based on cryptographically hard problems; however, in 2013 researchers proposed the first candidate obfuscator for "indistinguishability obfuscation". Since then there has been a tremendous interest in the subject from the cryptography community.
    In this talk I will first introduce the concept and define indistinguishability obfuscation. Then I will show techniques for building cryptographic applications from it. Finally, I will conclude with discussing the challenging open problems in the area.

February 26, 2019
Miles Lopes (UC Davis)
Two New Bootstrap Methods for High-Dimensional and Large-Scale Data
    Bootstrap methods are among the most broadly applicable tools for statistical inference and uncertainty quantification. Although these methods have an extensive literature, much remains to be understood about their applicability in modern settings, where observations are high-dimensional, or where the quantity of data outstrips computational resources. In this talk, I will present a couple of new bootstrap methods that are tailored to these settings. First, I will discuss the topic of "spectral statistics" arising from high-dimensional sample covariance matrices, and describe a method for approximating the distributions of such statistics. Second, in the context of large-scale data, I will discuss a more unconventional application of the bootstrap -- dealing with the tradeoff between accuracy and computational cost for randomized numerical linear algebra. This will include joint work from a paper with Alexander Aue and Andrew Blandino; https://arxiv.org/abs/1709.08251 (to appear at Biometrika), and a paper with Michael Mahoney, and Shusen Wang; https://arxiv.org/abs/1708.01945 (to appear at JMLR).

March 5, 2019
★ NOTE: will be held in
Hall Auditorium (135 Gates Thomas) ★
Gitta Kutyniok (TU Berlin)
Approximation Theory meets Deep Learning
    Despite the outstanding success of deep neural networks in real-world applications, most of the related research is empirically driven and a mathematical foundation is almost completely missing. One central task of a neural network is to approximate a function, which for instance encodes a classification task. In this talk, we will be concerned with the question, how well a function can be approximated by a neural network with sparse connectivity. Using methods from approximation theory and applied harmonic analysis, we will derive a fundamental lower bound on the sparsity of a neural network. By explicitly constructing neural networks based on certain representation systems, so-called $\alpha$-shearlets, we will then demonstrate that this lower bound can in fact be attained. Finally, we present numerical experiments, which surprisingly show that already the standard backpropagation algorithm generates deep neural networks obeying those optimal approximation rates.

March 12, 2019
Nisheeth Vishnoi (Yale)
Towards Controlling Bias in AI Systems
    Powerful AI systems, which are driven by machine learning tools, are increasingly controlling various aspects of modern society: from social interactions (e.g., Facebook, Twitter, Google, YouTube), economics (e.g., Uber, Airbnb, Banking), learning (e.g., Wikipedia, MOOCs), to governance (Judgements, Policing, Voting). These systems have a tremendous potential to change our lives for the better, but, via the ability to mimic and nudge human behavior, they also have the potential to be discriminatory, reinforce societal prejudices, and polarize opinions. Indeed, recent studies have demonstrated that these systems can be quite brittle and generally lack the required qualities to be deployed in various human-centric/societal contexts. The reason being that considerations such as fairness, explainability, accountability etc. have largely been an afterthought in the development of such AI systems.
    In this talk, I will outline our efforts towards incorporating some of the above-mentioned issues in a principled manner for core machine learning tasks such as classification, data summarization, ranking, personalization, and online advertisement. Our work leads to new algorithms that have the ability to control and alleviate bias from their outputs, comes with provable guarantees, and often has low "price of fairness".
    Based on several joint works with Elisa Celis.

May 7, 2019
         ★ Room 213 ★
Alistair Sinclair (UC Berkeley)
The Lee-Yang Theorem and the Ising Model
    The celebrated Lee-Yang Theorem of the 1950s says that the zeros of the partition function of the ferromagnetic Ising model (viewed as a polynomial in the field parameter) lie on the unit circle in the complex plane. In this talk I will discuss a recent revival of interest in this result, inspired by computational questions. I will discuss three developments. First, I will explain how a generalization of the Lee-Yang theorem to rule out repeated zeros leads to hardness results for computing averages of observables associated with the Ising model. Then I will show how to combine the theorem with recent technology of Barvinok and others to obtain the first polynomial time deterministic approximation algorithms for the partition function. And finally I will discuss the relationship of the theorem to decay of correlations. The talk will be self-contained.
    This is joint work with Jingcheng Liu and Piyush Srivastava.

May 14, 2019
         ★ Room 213 ★
Adam Smith (Boston University)
The Structure of Optimal Private Tests for Simple Hypotheses
    Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about testing simple hypotheses subject to a stability, or "privacy", constraint: given two distributions P and Q, and a privacy level e, how many i.i.d. samples are needed to distinguish P from Q such that changing one of the samples changes the test's acceptance probability by at most e. What sort of tests have optimal sample complexity? We characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level e, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman-Pearson lemma in the setting of private hypothesis testing. Our analysis also sheds light on classical hypothesis testing, giving a new interpretation of the Hellinger distance between distributions.
    Joint work with Clément Canonne, Gautam Kamath, Audra McMillan, and Jonathan Ullman. To appear as STOC 2019. Available as https://arxiv.org/abs/1811.11148

May 21, 2019
         ★ Room 213 ★
Vijay Vazirani (UCI)
Matching is as Easy as the Decision Problem, in the NC Model
    Is matching in NC, i.e., is there a deterministic fast parallel algorithm for it? This has been an outstanding open question in TCS for over three decades, ever since the discovery of Random NC matching algorithms. Over the last five years, the TCS community has launched a relentless attack on this question, leading to the discovery of numerous powerful ideas. We give what appears to be the culmination of this line of work:  An NC algorithm for finding a minimum weight perfect matching in a general graph with polynomially bounded edge weights, provided it is given an oracle for the decision problem. Consequently, for settling the main open problem, it suffices to obtain an NC algorithm for the decision problem.
    We believe this new fact has qualitatively changed the nature of this open problem.
    Our result builds on the work of Anari and Vazirani (2018), which used planarity of the input graph critically; in fact, in three different ways. Our main challenge was to adapt these steps to general graphs by appropriately trading planarity with the use of the decision oracle. The latter was made possible by the use of several of the idea discovered over the last five years.
    The difficulty of obtaining an NC perfect matching algorithm led researchers to study matching vis-a-vis clever relaxations of the class NC. In this vein, Goldwasser and Grossman (2015) gave a pseudo-deterministic RNC algorithm for finding a perfect matching in a bipartite graph, i.e., an RNC algorithm with the additional requirement that on the same graph, it should return the same (i.e., unique) perfect matching for almost all choices of random bits. A corollary of our reduction is an analogous algorithm for general graphs.
    This talk is fully self-contained.
    Based on the following joint paper with Nima Anari: https://arxiv.org/pdf/1901.10387.pdf

CMI Faculty Lunches

by invitation

November 8, 2018
Venkat Chandrasekaran
Newton Polytopes and Relative Entropy Optimization
    We discuss how relative entropy is uniquely suited to optimization problems involving sparse polynomials. Our results connect to the work of Descartes (the rule of signs) and Khovanskii (the theory of fewnomials). The Newton polytope associated to a polynomial plays a central role in our development. This is joint work with Riley Murray and Adam Wierman.

December 13, 2018
Omer Tamuz
Equitable Voting Rules
    Joint with Laurent Bartholdi, Wade Hann-Caruthers, Maya Josyula and Leeat Yariv.
    Given n voters and two candidates A and B, a voting rule is a map from {A,B,abstain}^n to {A,B,tie}. An important example is majority, which has a desirable symmetry property: permuting the voters leaves the result unchanged. We study a weakening of this symmetry assumption, which allows for a far richer set of rules that still treat voters equally. We show that these rules can have very small---but not too small---winning coalitions (i.e., a set of voters that can control the election outcome).
    I will discuss the relation to various known results and open problems in the theory of finite groups. No previous knowledge required.

March 14, 2019
Peter Schröder
Shape from Metric
    Joint work with Albert Chern, Felix Knöppel and Ulrich Pinkall (all of TU Berlin Math)
    We study the isometric immersion problem for orientable surface triangle meshes endowed with only a metric: given the combinatorics of the mesh together with edge lengths, approximate an isometric immersion into R^3.
    To address this challenge we develop a discrete theory for surface immersions into R^3. It precisely characterizes a discrete immersion, up to subdivision and small perturbations. In particular our discrete theory correctly represents the topology of the space of immersions, i.e., the regular homotopy classes which represent its connected components. Our approach relies on unit quaternions to represent triangle orientations and to encode, in their parallel transport, the topology of the immersion. In unison with this theory we develop a computational apparatus based on a variational principle. Minimizing a non-linear Dirichlet energy optimally finds extrinsic geometry for the given intrinsic geometry and ensures low metric approximation error.
    We demonstrate our algorithm with a number of applications from mathematical visualization and art directed isometric shape deformation, which mimics the behavior of thin materials with high membrane stiffness.

May 9, 2019
Adam Wierman
Transparency and Control in Platforms & Networked Markets
    Platforms have emerged as a powerful economic force, driving both traditional markets, like the electricity market, and emerging markets, like the sharing economy. The power of platforms comes from their ability to tame the complexities of networked marketplaces -- marketplaces where there is not a single centralized market, but instead a network of interconnected markets loosely defined by a graph of feasible exchanges. Despite the power and prominence of platforms, the workings of platforms are often guarded secrets. Further, many competing platforms make very different design choices, but little is understood about the impact of these differing choices. In this talk, I will overview recent work from our group that focuses on reverse engineering the design of platforms and understanding the consequences of design choices underlying transparency in modern platforms. I will use electricity markets and ridesharing services as motivating examples throughout.

June 13, 2019
Babak Hassibi
Deep Learning and the Blessing of Dimensionality
    Stochastic descent methods have recently gained tremendous popularity as the workhorse for deep learning. So much so that, it is now widely recognized that the success of deep networks is not only due to their special deep architecture, but also due to the behavior of the stochastic descent methods used, which plays a key role in reaching "good" solutions that generalize well to unseen data. In an attempt to shed some light on why this is the case, we revisit some minimax properties of stochastic gradient descent (SGD)--originally developed for quadratic loss and linear models in the context of H-infinity control in the 1990's--and extend them to general stochastic mirror descent (SMD) algorithms for general loss functions and nonlinear models. These minimax properties can be used to explain the convergence and implicit-regularization of stochastic descent methods in highly over-parametrized settings, exemplified by training a deep neural network. This observation gives some insight into why deep networks exhibit such powerful generalization abilities. It is also a further example of what is increasingly referred to as the "blessing of dimensionality". We also show how different variations of the algorithms can lead to different generalization performances and note some very counter-intuitive phenomena.