Boaz Barak and David Steurer
UCSD winter school on Sum-of-Squares, January 2017
recall: sparsest cut value \(\displaystyle \min_{x\in {\{0,1\}}^n} \frac {\color{red}{f_G(x)}}{\color{green}{\frac d n {\lvert x \rvert} (n-{\lvert x \rvert})}}\)
claim: if sparsest cut value \(\ge \color{#d33682}{{\varepsilon}}\), then
\[ \vdash_2{\left\{ \color{red}{f_G(x)} \ge \color{#d33682}{{\varepsilon}^2/2} \cdot \color{green}{\frac d n {\lvert x \rvert}(n-{\lvert x \rvert})} \right\}} \]
\(\leadsto\) poly-time algorithm to approximate sparsest cut
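a minimal brute-force sketch of this objective (not from the lecture; assumes Python with the standard library, and the helper name `sparsest_cut_value` is ours):

```python
# Illustrative brute force: enumerate all cuts x in {0,1}^n and minimize
# f_G(x) / ((d/n)|x|(n-|x|)), where f_G counts the edges cut by x.
import itertools

def sparsest_cut_value(n, d, edges):
    best = float("inf")
    for x in itertools.product([0, 1], repeat=n):
        k = sum(x)
        if 0 < k < n:  # skip trivial cuts, where the denominator vanishes
            f = sum((x[i] - x[j]) ** 2 for i, j in edges)
            best = min(best, f / ((d / n) * k * (n - k)))
    return best

# example: the 4-cycle (2-regular); the best cut splits it into two paths
print(sparsest_cut_value(4, 2, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 1.0
```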
later:
Laplacian matrix \(L_G = \frac 1 d \sum_{\{i,j\}\in E_G} {(e_i-e_j) {(e_i-e_j)}^\intercal}\)
eigenvalues \(0=\lambda_1\le \lambda_2 \le \cdots \le \lambda_n\le 2\)
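a small sketch of these definitions (assumes numpy; the 4-cycle is just an example graph):

```python
# Assemble L_G = (1/d) * sum over edges of (e_i - e_j)(e_i - e_j)^T
# and inspect its spectrum, which lies in [0, 2] for a d-regular graph.
import numpy as np

def laplacian(n, d, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        e = np.zeros(n)
        e[i], e[j] = 1.0, -1.0
        L += np.outer(e, e)    # (e_i - e_j)(e_i - e_j)^T
    return L / d

L = laplacian(4, 2, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(np.round(np.linalg.eigvalsh(L), 6))  # ascending: [0., 1., 1., 2.]
```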
Cheeger bound: \(\lambda_2 \le\) sparsest cut value \(\le \sqrt{2\lambda_2}\)
SOS captures this bound
claim: \(\vdash_2 \{ f_G \ge \lambda_2 \cdot \frac d n {\lvert x \rvert}(n-{\lvert x \rvert})\}\)
good approximation because \(\lambda_2\ge {\varepsilon}^2/2\) whenever sparsest cut value \(\ge {\varepsilon}\) (by the hard direction of Cheeger)
idea: prove easy direction of Cheeger in SOS
proof: let \(L_K\) be the projector onto the space orthogonal to
the vector \((1,\ldots,1)\), the eigenvalue-\(0\) eigenvector of \(L_G\). then,\[ \begin{aligned} {\langle x,L_G x \rangle}& = \tfrac 1 d \cdot f_G(x) \\ {\langle x,L_K x \rangle}& = \tfrac 1 n \cdot {\lvert x \rvert}(n-{\lvert x \rvert}) \end{aligned} \]to show: \(\vdash_2 \{ {\langle x,(L_G - \lambda_2 \cdot L_K)x \rangle} \ge 0\}\)
follows from \(L_G-\lambda_2 \cdot L_K \succeq 0\) (true since both matrices vanish on \((1,\ldots,1)\) and \(L_G \succeq \lambda_2\) on the orthogonal complement); any \(A\succeq 0\) gives \({\langle x,Ax \rangle} = {\lVert A^{1/2}x \rVert}^2\), a sum of squares
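a numerical sanity check of this matrix inequality (a sketch; numpy assumed, 4-cycle as example):

```python
# Check L_G - lambda_2 * L_K >= 0 on a small regular graph.
import numpy as np

n, d, edges = 4, 2, [(0, 1), (1, 2), (2, 3), (3, 0)]
L_G = np.zeros((n, n))
for i, j in edges:
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    L_G += np.outer(e, e) / d

L_K = np.eye(n) - np.ones((n, n)) / n        # projector orthogonal to all-ones
lam2 = np.sort(np.linalg.eigvalsh(L_G))[1]   # second-smallest eigenvalue

# both matrices vanish on (1,...,1); elsewhere L_G >= lam2 * I = lam2 * L_K
print(np.linalg.eigvalsh(L_G - lam2 * L_K).min() >= -1e-9)  # True
```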
useful way to reason about SOS
dual to SOS certificates
generalization of classical probability
uncertainty arises from complexity
(as opposed to lack of information)
formal expectation \({\tilde{\mathbb{E}}}_\mu f = \sum_{x\in{\{0,1\}}^n} \mu(x)\, f(x)\) with respect to \(\mu{\colon}{\{0,1\}}^n\to{\mathbb{R}}\)
def’n: \(\mu{\colon}{\{0,1\}}^n\to{\mathbb{R}}\) is level-\(\ell\) pseudo-distribution if
- normalization \({\tilde{\mathbb{E}}}_\mu 1 = 1\)
- positivity \({\tilde{\mathbb{E}}}_\mu g^2 \ge 0\) whenever \(\deg g\le \ell/2\)
level-\(2n\) pseudo-distributions are pointwise nonnegative and thus actual distributions
th’m: optimize over level-\(\ell\) pseudo-distributions in time \(n^{O(\ell)}\)
[Parrilo’00, Lasserre’00]
idea: characterization in terms of positive semidefinite matrices
claim: \(\mu{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) with \({\tilde{\mathbb{E}}}_\mu 1=1\) is level-\(\ell\) pseudo-distr’n
iff following matrix is positive semidefinite\[ {\tilde{\mathbb{E}}}_{\mu(x)} {v_{\ell/2}(x) {v_{\ell/2}(x)}^\intercal} \succeq 0 \](where \(v_k(x)=(1,x)^{\otimes k}\) is the Veronese map)
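a sketch of this criterion in code (not from the lecture; numpy assumed, and the helper names are ours):

```python
# Form the moment matrix E~_mu[ v(x) v(x)^T ] with the Veronese map
# v_k(x) = (1, x)^{tensor k}, k = level/2, and test it for PSD-ness.
import numpy as np

def veronese(x, k):
    v = np.array([1.0])
    u = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    for _ in range(k):
        v = np.kron(v, u)      # (1, x)^{tensor k}
    return v

def moment_matrix(mu, n, level):
    """mu: dict mapping x in {0,1}^n (as tuples) to real weights."""
    dim = (n + 1) ** (level // 2)
    M = np.zeros((dim, dim))
    for x, weight in mu.items():
        v = veronese(x, level // 2)
        M += weight * np.outer(v, v)
    return M

# an actual distribution passes the test at every level, as it must
mu = {(0, 1): 0.5, (1, 0): 0.5}
M = moment_matrix(mu, n=2, level=4)
print(np.linalg.eigvalsh(M).min() >= -1e-9)  # True
```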
more formally: set of moments \({\tilde{\mathbb{E}}}_{\mu(x)}v_\ell(x)\) has \(n^{O(\ell)}\)-time separation oracle
th’m: either \(\vdash_\ell \{f\ge 0\}\) or \(\exists\) level-\(\ell\) pseudo-distr’n \(\mu\) with
\[ {\tilde{\mathbb{E}}}_\mu f\lt 0 \]
most classical algorithms that use semidefinite prog’ing (SDP) are captured by deg-2 SOS
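for concreteness, a sketch of such a degree-2 SDP in code (an assumption-laden illustration: it presumes `cvxpy` with an SDP-capable solver such as SCS is installed, and the objective \(f\) is just an example):

```python
# Minimize E~_mu f over level-2 pseudo-distributions, encoded by the
# moment matrix Y = E~ (1,x)(1,x)^T with Y >= 0, Y_00 = 1, diag(M) = m
# (the diagonal constraint encodes x_i^2 = x_i on {0,1}^n).
import cvxpy as cp

n = 3
Y = cp.Variable((n + 1, n + 1), symmetric=True)
m, M = Y[0, 1:], Y[1:, 1:]
constraints = [Y >> 0, Y[0, 0] == 1, cp.diag(M) == m]

# example objective: f(x) = sum_{i<j} (x_i - x_j)^2; since f is quadratic,
# E~_mu f is linear in the entries of Y
obj = sum(M[i, i] + M[j, j] - 2 * M[i, j]
          for i in range(n) for j in range(i + 1, n))
prob = cp.Problem(cp.Minimize(obj), constraints)
prob.solve()
print(prob.value)  # ~0: by the theorem, f >= 0 has a degree-2 SOS proof
```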
claim: \(\exists\) level-2 pseudo-distr’n with mean \(m\) and 2nd moment \(M\)
iff \({\mathop{\mathrm{diag}}}M = m\) and \(M-{m {m}^\intercal}\succeq 0\)
characterization useful for developing algorithms based on deg-2 SOS and showing limitations of deg-2 SOS
proof idea: consider linear system of equations in \(\mu\)
\[ {\left\{ {\tilde{\mathbb{E}}}_{\mu(x)} x=m,~ {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal}=M \right\}} \]
satisfiable iff \({\mathop{\mathrm{diag}}}M =m\) (by linear indep’nce of multilinear monomials)
for every linear polynomial \(g(x)={\langle a,x \rangle} + b\),
\[ \begin{aligned}[t] {\tilde{\mathbb{E}}}_\mu g^2 & = {\langle a,M a \rangle} + 2b{\langle a,m \rangle} + b^2\\ & \ge {\langle a,Ma \rangle} - {\langle a,m \rangle}^2\\ & = {\langle a,(M-{m {m}^\intercal})a \rangle} \ge 0 \end{aligned} \](second step: \(b^2+2b{\langle a,m \rangle}\ge -{\langle a,m \rangle}^2\))
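a direct check of this level-2 criterion (a small sketch with explicit moments; numpy assumed):

```python
# Accept (m, M) iff diag(M) = m and M - m m^T is positive semidefinite.
import numpy as np

def is_level2_moments(m, M, tol=1e-9):
    if not np.allclose(np.diag(M), m, atol=tol):  # x_i^2 = x_i on {0,1}^n
        return False
    cov = M - np.outer(m, m)                      # pseudo-covariance
    return bool(np.linalg.eigvalsh(cov).min() >= -tol)

m = np.array([0.5, 0.5])
M = np.array([[0.5, 0.0],
              [0.0, 0.5]])      # anti-correlated: E~ x_1 x_2 = 0
print(is_level2_moments(m, M))  # True (realized by uniform on {(0,1),(1,0)})
```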
lemma: for every level-2 pseudo-distr’n \(\mu\) over \({\{0,1\}}^n\), there exists a Gaussian vector \(X=(X_1,\ldots,X_n)\) with matching first two moments, so that
\[ {\tilde{\mathbb{E}}}_{\mu(x)} x = {\mathbb{E}}X \text{ and } {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal} = {\mathbb{E}}{X {X}^\intercal} \]
proof: let \(m={\tilde{\mathbb{E}}}_\mu x\) be the mean of \(\mu\) and \(\Sigma={\tilde{\mathbb{E}}}_\mu {x {x}^\intercal} - {m {m}^\intercal}\) be the covariance of \(\mu\); by the claim above, \(\Sigma\succeq 0\), so \(\Sigma^{1/2}\) exists. let \(g\) be a standard \(n\)-dimensional Gaussian vector. choose the Gaussian vector \(X\) as
\[ X = m + \Sigma^{1/2} g\,. \]
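a sketch of this construction, checked empirically (numpy assumed; the moments match the anti-correlated example above):

```python
# Draw X = m + Sigma^{1/2} g and verify the first two moments empirically.
import numpy as np

rng = np.random.default_rng(0)
m = np.array([0.5, 0.5])
Sigma = np.array([[0.25, -0.25],
                  [-0.25, 0.25]])                # pseudo-covariance, PSD

w, U = np.linalg.eigh(Sigma)                     # Sigma^{1/2} via eigendecomposition
root = U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T

X = m + rng.standard_normal((200000, 2)) @ root  # root is symmetric
print(X.mean(axis=0).round(2))                   # ~ [0.5, 0.5]
print(np.cov(X.T, bias=True).round(2))           # ~ Sigma
```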
will see: deg-2 SOS approximation for Sparsest Cut no better than what follows from Cheeger’s bound
th’m: for every \({\varepsilon}\gt0\), exists graph \(G\) and level-2 pseudo-distr’n \(\mu\) such that \(G\) has sparsest cut value \(\ge {\varepsilon}\) but
\[ {\tilde{\mathbb{E}}}_{\mu(x)} \color{red}{f_G(x)} \le O({\varepsilon}^2)\cdot {\tilde{\mathbb{E}}}_{\mu(x)} \color{green}{\tfrac d n {\lvert x \rvert}(n-{\lvert x \rvert})} \]
will choose \(G\) to be the \(n\)-vertex path with \(n=1/{\varepsilon}\), plus self-loops for regul’ty
\(\leadsto\) sparsest cut \(S={\{ 1,\ldots,n/2 \}}\) (first half of path), value \(\approx 1/n\)
to construct: level-2 pseudo-distr’n that believes \(G\) has sparsest cut value \(O(1/n^2)\)
claim: \(\exists\) level-2 pseudo-distr’n \(\mu\) over \({\{0,1\}}^n\) with
\[ \textstyle {\tilde{\mathbb{E}}}_{\mu (x)} \color{red}{\sum_{i=1}^{n-1} (x_i-x_{i+1})^2} \le O{\left( \frac{1}{n^2} \right)} \cdot {\tilde{\mathbb{E}}}_{\mu(x)}\color{green}{\tfrac 1n \sum_{i\lt j} (x_i-x_j)^2} \]
intuition: choose covariance of pseudo-distr’n proportional to the projector onto the low-eigenvalue eigenspace of the cycle
proof:
let \(\omega=e^{2\pi i/n}\) be a primitive \(n\)-th root of unity. let \(u=(\omega^1,\ldots,\omega^n)\).
let \(v,w\) be the real and imaginary parts of \(u\), so that \(u=v+i\cdot w\).
let \(\mu\) be the level-2 pseudo-distr’n with mean \(m=\tfrac12\mathbf{1}\) and second moment \(M={m {m}^\intercal} + \tfrac 14 ({v {v}^\intercal}+{w {w}^\intercal})\) (valid by the claim: \({\mathop{\mathrm{diag}}}M = m\) since \(v_j^2+w_j^2={\lvert \omega^j \rvert}^2=1\), and \(M-{m {m}^\intercal}\succeq 0\)). then \({\tilde{\mathbb{E}}}_\mu (x_i-x_j)^2 = \tfrac 14 {\lvert \omega^i-\omega^j \rvert}^2\), so
\({\tilde{\mathbb{E}}}_\mu \color{green}{\tfrac 1n \sum_{i\lt j}(x_i-x_{j})^2 } = \tfrac 1{4n} \sum_{i\lt j} {\lvert \omega^i-\omega^j \rvert}^2 = \tfrac 18 \sum_{j=1}^{n-1} {\lvert 1-\omega^j \rvert}^2 = \tfrac n4 \ge n\cdot \Omega(1)\)
\({\tilde{\mathbb{E}}}_\mu \color{red}{\sum_{i=1}^{n-1}(x_i-x_{i+1})^2} = (n-1) \cdot \tfrac 14 {\lvert 1-\omega \rvert}^2 \le n \cdot O(1/n^2)\)
∎
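a numerical sketch of this proof (numpy assumed; the explicit second moment \(M\) below is our reading of the construction, consistent with the pseudo-expectations above):

```python
# The level-2 pseudo-distribution with mean (1/2)*1 and second moment
# M = m m^T + (1/4)(v v^T + w w^T), where u = v + i*w and u_j = omega^j,
# believes the n-vertex path has sparsest cut value O(1/n^2).
import numpy as np

n = 200
omega = np.exp(2j * np.pi / n)
u = omega ** np.arange(1, n + 1)                 # u = (omega^1, ..., omega^n)
v, w = u.real, u.imag

m = np.full(n, 0.5)
M = np.outer(m, m) + 0.25 * (np.outer(v, v) + np.outer(w, w))

# validity via the level-2 claim: diag(M) = m (since v_j^2 + w_j^2 = 1)
# and M - m m^T >= 0 (a sum of two outer products, hence PSD)
assert np.allclose(np.diag(M), m)
assert np.linalg.eigvalsh(M - np.outer(m, m)).min() >= -1e-9

E2 = lambda i, j: M[i, i] + M[j, j] - 2 * M[i, j]     # E~ (x_i - x_j)^2
red = sum(E2(i, i + 1) for i in range(n - 1))         # path edges
green = sum(E2(i, j) for i in range(n)
            for j in range(i + 1, n)) / n             # balance term
print(red / green * n**2)   # ~4*pi^2, i.e. the ratio is O(1/n^2)
```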