Boaz Barak and David Steurer
UCSD winter school on Sum-of-Squares, January 2017
Better question: when is the mathematical theory of probability applicable?
“The probability of winning a battle” has no place in our theory because it does not belong to any collective. Probability cannot be applied to this problem any more than the physical concept of work can be applied to the “work” done by an actor reciting his part. Richard Von Mises, 1928 (paraphrased)
The theory of inverse probability is founded upon an error, and must be wholly rejected. Ronald Fisher, 1925
If anyone wishes to study frequencies he is free to do so and we wish him every success. But why does he insist on appropriating the word “probability”? E.T. Jaynes, 1978
I am unable to see why ‘objectivity’ requires us to interpret every probability as a frequency in some random experiment; particularly when in most problems probabilities are frequencies only in an imaginary universe invented just for the purpose of allowing a frequency interpretation. E.T. Jaynes, 1976
Bayesian: Probability of \(A\) is the odds I would give that \(A\) happened.
\(A\) can even be in the past, if I don’t know about it.
Frequentist: Probability of \(A\) is the fraction of times it would happen in independent experiments.
\(A\) is some future event that depends on our random choices.
If \(A\) already happened then \({\mathbb{P}}[A] \in \{0,1\}\).
Bayesian: Distribution \(\Theta\) over \(\theta\) encoding prior+observations; make bet \(b\) minimizing \(\color{green}{{\mathbb{E}}_{\theta \sim \Theta}} {\mathbb{E}}_{x \sim p(x|\theta)} loss_b(x)\).
Frequentist: \(\theta\) already happened. Make bet \(b\) that minimizes \(\color{red}{\max_\theta} {\mathbb{E}}_{x\sim p(x|\theta)} loss_b(x)\).
Bayesian could do better in a single race, if the prior is accurate. (average case over both \(\theta\) and \(x\))
Frequentist is calibrated: minimizes loss in the long run if \(\theta\) is fixed over many races. (worst case over \(\theta\), average case over \(x\))
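To make the contrast concrete, here is a minimal numerical sketch (the horse-race setup, the squared loss, and all numbers are invented for illustration): a horse wins with unknown probability \(\theta \in \{0.2, 0.8\}\), the bet is \(b \in [0,1]\) with loss \((b-x)^2\), and we compare the Bayes-optimal bet under a prior against the minimax bet.

```python
import numpy as np

# Toy setting (all numbers invented): a horse wins (x = 1) with unknown
# probability theta; a bet b in [0, 1] suffers squared loss (b - x)^2.
thetas = np.array([0.2, 0.8])      # possible values of theta
prior = np.array([0.9, 0.1])       # Bayesian prior over theta
bets = np.linspace(0, 1, 1001)     # candidate bets b

def risk(b, theta):
    """Expected loss E_{x ~ Bernoulli(theta)} (b - x)^2."""
    return theta * (b - 1) ** 2 + (1 - theta) * b ** 2

R = np.array([[risk(b, t) for t in thetas] for b in bets])

bayes_bet = bets[np.argmin(R @ prior)]        # average case over theta
minimax_bet = bets[np.argmin(R.max(axis=1))]  # worst case over theta

print("Bayes bet  :", bayes_bet)    # 0.26 = E[theta]: better if prior is right
print("Minimax bet:", minimax_bet)  # 0.50: equalizes risk across both thetas
print("Prior-avg risk :", risk(bayes_bet, thetas) @ prior, "(Bayes)",
      risk(minimax_bet, thetas) @ prior, "(minimax)")
print("Worst-case risk:", risk(bayes_bet, thetas).max(), "(Bayes)",
      risk(minimax_bet, thetas).max(), "(minimax)")
```

The Bayes bet wins on prior-averaged risk, while the minimax bet wins on worst-case risk over \(\theta\), exactly the trade-off above.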
Frequentist might be incoherent: Bets could be “arbitraged” if they don’t correspond to a model (Cox 1946).
A set of estimates \(\{ {\tilde{\mathbb{E}}}f \}_f\) is coherent if \({\tilde{\mathbb{E}}}f \geq {\tilde{\mathbb{E}}}g\) whenever \(\forall_x f(x) \geq g(x)\).
You may contradict reality, but at least you don’t contradict yourself
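A minimal sketch of that arbitrage (payoffs and prices invented): if an agent posts incoherent prices \({\tilde{\mathbb{E}}}f < {\tilde{\mathbb{E}}}g\) even though \(f(x) \geq g(x)\) for every \(x\), then buying \(f\) and selling \(g\) locks in a profit at every outcome.

```python
# Invented payoffs with f(x) >= g(x) at every outcome x:
f = {0: 3.0, 1: 5.0}
g = {0: 2.0, 1: 4.0}
# Incoherent posted prices: the agent values f below g.
price_f, price_g = 1.0, 1.5

# Arbitrage: buy the f-contract, sell the g-contract. We pocket
# price_g - price_f > 0 up front, and f(x) - g(x) >= 0 at resolution.
for x in (0, 1):
    print(f"x = {x}: guaranteed profit = {(price_g - price_f) + (f[x] - g[x])}")
```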
Fundamental unpleasantness: Need to choose between coherence and efficiency
e.g., if \(\varphi\) is unsatisfiable then \(\varphi(x) < 1\) for all \(x\), but efficient algorithms can’t know this.
Bayesian: Always coherent, sometimes at a steep price.
SOS: pseudo-coherent, via the “diving board principle”:
If \(\vdash_d f \geq g\) then \({\tilde{\mathbb{E}}}f \geq {\tilde{\mathbb{E}}}g\) for every degree-\(d\) pseudo-distribution.
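Why the implication holds (using the standard facts that \(\vdash_d f \geq g\) means \(f - g\) has a degree-\(d\) sum-of-squares certificate, and that pseudo-expectations are linear with \({\tilde{\mathbb{E}}}\, s^2 \geq 0\)):

\[
f - g = \sum_i s_i^2 \quad\Longrightarrow\quad {\tilde{\mathbb{E}}}f - {\tilde{\mathbb{E}}}g = \sum_i {\tilde{\mathbb{E}}}\, s_i^2 \;\geq\; 0 .
\]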
If a computationally bounded agent can’t tell that there isn’t a real distribution, we can pretend that there is.
Algorithms: Pretend that we get the moments of an actual distribution, and worry about SOS-ing later.
Lower bounds: Don’t construct pseudo-distributions that fail to respect observable correlations (see the sketch below).
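A minimal numpy sketch of the classic triangle example (here with \(\pm 1\)-valued variables): a degree-2 pseudo-distribution whose moment matrix is PSD, so it is pseudo-coherent and passes every degree-2 “observable correlation” test, yet it pretends all three edges of a triangle are cut, which no real distribution can do.

```python
import numpy as np

# Degree-2 pseudo-moments over x in {-1, +1}^3 for the triangle graph:
# Etilde[x_i] = 0 and Etilde[x_i x_j] = -1/2 for every pair i != j.
n = 3
M = np.full((n, n), -0.5)       # M[i, j] = Etilde[x_i x_j]
np.fill_diagonal(M, 1.0)        # Etilde[x_i^2] = 1

# PSD moment matrix: Etilde[(c . x)^2] = c^T M c >= 0 for every linear form,
# so every degree-2 square gets a nonnegative pseudo-expectation.
print("eigenvalues:", np.linalg.eigvalsh(M))   # ~[0, 1.5, 1.5]: PSD

# Pretended cut value: each edge contributes (1 - Etilde[x_i x_j]) / 2.
edges = [(0, 1), (1, 2), (0, 2)]
cut = sum((1 - M[i, j]) / 2 for i, j in edges)
print("pretended cut value:", cut)   # 2.25, but a triangle's true max cut is 2
```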
Input: Graph \(G\) s.t. \(\exists\) \(x\in {\{0,1\}}^n\) with \(\sum x_i = n/2\) and \(f_G(x) = \sum_{(i,j)\in E_G} (x_i-x_j)^2 \leq \varphi |E| \;\;(*)\)
Goal: Find balanced \(x'\) s.t. \(f_G(x') \leq O(\sqrt{\log n}\,\varphi)|E|\).
(in fact, assume \({\mathbb{E}}(x_i-x_j)^2 \leq \varphi\) for all \((i,j) \in E_G\))
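Concretely, \(f_G(x)\) just counts the edges cut by \(x \in {\{0,1\}}^n\). A toy instance (graph and numbers invented for illustration): two cliques joined by a single edge, where the balanced clique-vs-clique cut witnesses the promise \((*)\) with small \(\varphi\).

```python
from itertools import combinations

# Toy instance: two 4-cliques joined by one edge.
n = 8
left, right = range(n // 2), range(n // 2, n)
edges = (list(combinations(left, 2)) + list(combinations(right, 2))
         + [(0, n // 2)])

def f_G(x):
    """f_G(x) = sum over edges (i, j) of (x_i - x_j)^2 = number of cut edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

x = [1] * (n // 2) + [0] * (n // 2)   # balanced: sum x_i = n/2
assert sum(x) == n // 2
phi = f_G(x) / len(edges)
print(f"f_G(x) = {f_G(x)} of {len(edges)} edges; phi = {phi:.3f}")  # 1 of 13
```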
Reduced partitioning an arbitrary graph to an isoperimetric question on the cube.
(including highly nontrivial isoperimetric results)