sum-of-squares and sparsest cut

Boaz Barak and David Steurer

UCSD winter school on Sum-of-Squares, January 2017

Cheeger’s bound and degree-2 sos

sparsest cut with deg-2 sos

recall: sparsest cut value \(\displaystyle \min_{x\in {\{0,1\}}^n} \frac {\color{red}{f_G(x)}}{\color{green}{\frac d n {\lvert x \rvert} (n-{\lvert x \rvert})}}\)
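The definition can be checked by brute force on a tiny graph. The sketch below assumes \(f_G(x)=\sum_{\{i,j\}\in E_G}(x_i-x_j)^2\) (the number of cut edges, consistent with the Laplacian identity used later) and uses the 8-cycle as a hypothetical test graph; the function name `sparsest_cut_value` is illustrative only.

```python
import itertools

def sparsest_cut_value(edges, n, d):
    """Brute-force min over nontrivial x in {0,1}^n of f_G(x) / (d/n * |x| * (n - |x|))."""
    best = float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        size = sum(bits)
        if size == 0 or size == n:
            continue  # denominator vanishes for the two trivial cuts
        # f_G(x): number of edges with exactly one endpoint in the set
        cut = sum((bits[i] - bits[j]) ** 2 for i, j in edges)
        best = min(best, cut / (d / n * size * (n - size)))
    return best

n = 8
cycle_edges = [(i, (i + 1) % n) for i in range(n)]  # 2-regular cycle (assumed example)
value = sparsest_cut_value(cycle_edges, n, d=2)
print(value)  # 0.5, attained by cutting the cycle into two arcs of 4 vertices
```

The enumeration takes time \(2^n\), which is why the efficiently checkable degree-2 certificate in the claim below is of interest.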

claim: if sparsest cut value \(\ge \color{#d33682}{{\varepsilon}}\), then
\[ \vdash_2{\left\{ \color{red}{f_G(x)} \ge \color{#d33682}{{\varepsilon}^2/2} \cdot \color{green}{\frac d n {\lvert x \rvert}(n-{\lvert x \rvert})} \right\}} \]

\(\leadsto\) poly-time algorithm to approximate sparsest cut

later:

this guarantee is tight for degree-2 sos
better guarantee with degree-4 sos

Cheeger bound as SOS certificate

Laplacian matrix \(L_G = \frac 1 d \sum_{\{i,j\}\in E_G} {(e_i-e_j) {(e_i-e_j)}^\intercal}\)

eigenvalues \(0=\lambda_1\le \lambda_2 \le \cdots \le \lambda_n\le 2\)

Cheeger bound: \(\lambda_2 \le\) sparsest cut value \(\le \sqrt{2\lambda_2}\)

SOS captures this bound

claim: \(\vdash_2 \{ f_G \ge \lambda_2 \cdot \frac d n {\lvert x \rvert}(n-{\lvert x \rvert})\}\)

good approximation because \(\lambda_2\ge {\varepsilon}^2/2\) if sparsest cut \(\ge {\varepsilon}\)

idea: prove easy direction of Cheeger in SOS

SOS proof of Cheeger

proof: let \(L_K\) be projector to space orthogonal to
all-ones vector \((1,\ldots,1)\), the eigenvector of \(L_G\) with eigenvalue \(\lambda_1=0\). then, for \(x\in{\{0,1\}}^n\),
\[ \begin{aligned} {\langle x,L_G x \rangle}& = \tfrac 1 d \cdot f_G(x) \\ {\langle x,L_K x \rangle}& = \tfrac 1 n \cdot {\lvert x \rvert}(n-{\lvert x \rvert}) \end{aligned} \]
to show: \(\vdash_2 \{ {\langle x,(L_G - \lambda_2 \cdot L_K)x \rangle} \ge 0\}\)
follows from \(L_G-\lambda_2 \cdot L_K \succeq 0\) (quadratic form of psd matrix is sum of squares of linear forms)
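A numerical sanity check of this proof, as a minimal sketch: it assumes the 8-cycle (\(d=2\)) as test graph and one arbitrary Boolean vector; neither choice comes from the slides.

```python
import numpy as np

n, d = 8, 2
edges = [(i, (i + 1) % n) for i in range(n)]  # assumed example graph: 8-cycle

# Normalized Laplacian L_G = (1/d) * sum over edges of (e_i - e_j)(e_i - e_j)^T
L_G = np.zeros((n, n))
for i, j in edges:
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    L_G += np.outer(e, e) / d

# Projector onto the space orthogonal to the all-ones vector
L_K = np.eye(n) - np.ones((n, n)) / n

lam2 = np.linalg.eigvalsh(L_G)[1]  # eigvalsh returns eigenvalues in ascending order
min_eig = np.linalg.eigvalsh(L_G - lam2 * L_K).min()

# The two quadratic-form identities on a Boolean vector
x = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
size = x.sum()
f_G = sum((x[i] - x[j]) ** 2 for i, j in edges)

print(min_eig >= -1e-9)                                 # L_G - lam2 * L_K is psd
print(np.isclose(x @ L_G @ x, f_G / d))                 # <x, L_G x> = f_G(x) / d
print(np.isclose(x @ L_K @ x, size * (n - size) / n))   # <x, L_K x> = |x|(n-|x|)/n
```

The psd check passes because \(L_G\) and \(L_K\) share the all-ones eigenvector with eigenvalue 0, and every other eigenvalue of \(L_G\) is at least \(\lambda_2\).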

SOS via pseudo-probability

pseudo-probability

useful way to reason about SOS
dual to SOS certificates
generalization of classical probability
uncertainty arises from complexity
(as opposed to lack of information)

formalization

formal expectation with respect to \(\mu{\colon}{\{0,1\}}^n\to{\mathbb{R}}\)

\[ {\tilde{\mathbb{E}}}_{\mu} f = \sum_{x} \mu(x)\cdot f(x) \]

(values of \(f\) weighted by \(\mu\))

def’n: \(\mu{\colon}{\{0,1\}}^n\to{\mathbb{R}}\) is level-\(\ell\) pseudo-distribution if
normalization \({\tilde{\mathbb{E}}}_\mu 1 = 1\)
positivity \({\tilde{\mathbb{E}}}_\mu g^2 \ge 0\) whenever \(\deg g\le \ell/2\)

level-\(2n\) pseudo-distributions are pointwise nonnegative and thus actual distributions

efficient algorithm

th’m: optimize over level-\(\ell\) pseudo-distributions in time \(n^{O(\ell)}\)
[Parrilo’00, Lasserre’00]

idea: characterization in terms of positive semidefinite matrices

claim: \(\mu{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) with \({\tilde{\mathbb{E}}}_\mu 1=1\) is level-\(\ell\) pseudo-distr’n
iff following matrix is positive semidefinite
\[ {\tilde{\mathbb{E}}}_{\mu(x)} {v_{\ell/2}(x) {v_{\ell/2}(x)}^\intercal} \succeq 0 \]
(where \(v_k(x)=(1,x)^{\otimes k}\) is the Veronese map)

more formally: set of moments \({\tilde{\mathbb{E}}}_{\mu(x)}v_\ell(x)\) has \(n^{O(\ell)}\)-time separation oracle
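For level 2 the matrix in the claim is \({\tilde{\mathbb{E}}}_\mu (1,x){(1,x)}^\intercal\). The sketch below builds it by explicit summation over \({\{0,1\}}^n\) (exponential time, for illustration only; the efficient algorithm works with the moments directly). The helper names and the two example functions \(\mu\) are assumptions made for this example.

```python
import itertools
import numpy as np

def pseudo_expectation(mu, f, n):
    """Formal expectation: sum over x in {0,1}^n of mu(x) * f(x)."""
    return sum(mu[x] * f(x) for x in itertools.product([0, 1], repeat=n))

def moment_matrix_level2(mu, n):
    """E~_mu v_1(x) v_1(x)^T with v_1(x) = (1, x)."""
    M = np.zeros((n + 1, n + 1))
    for x in itertools.product([0, 1], repeat=n):
        v = np.array((1,) + x, dtype=float)
        M += mu[x] * np.outer(v, v)
    return M

n = 2
uniform = {x: 0.25 for x in itertools.product([0, 1], repeat=n)}
# Normalized, but assigns negative weight to the point (1, 0)
signed = {(0, 0): 1.5, (1, 0): -0.5, (0, 1): 0.0, (1, 1): 0.0}

for name, mu in [("uniform", uniform), ("signed", signed)]:
    total = pseudo_expectation(mu, lambda x: 1.0, n)
    min_eig = np.linalg.eigvalsh(moment_matrix_level2(mu, n)).min()
    print(name, total, min_eig >= -1e-12)
```

Both functions satisfy normalization, but only the uniform one has a psd moment matrix; for `signed`, the square \(g(x)^2=x_1^2\) already has pseudo-expectation \(-1/2\).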

duality of pseudo-distr’s and sos cert’s

th’m: either \(\vdash_\ell \{f\ge 0\}\) or \(\exists\) level-\(\ell\) pseudo-distr’n \(\mu\) with
\[ {\tilde{\mathbb{E}}}_\mu f\lt 0 \]

character’n of level 2 pseudo-distr’s

most classical algorithms that use semidefinite prog’ing (SDP) are captured by deg-2 SOS

claim: \(\exists\) level-2 pseudo-distr’n with mean \(m\) and 2nd moment \(M\)
iff \({\mathop{\mathrm{diag}}}M = m\) and \(M-{m {m}^\intercal}\succeq 0\)

characterization useful for developing algorithms based on deg-2 SOS and showing limitations of deg-2 SOS

proof idea: consider linear system of equations in \(\mu\)
\[ {\left\{ {\tilde{\mathbb{E}}}_{\mu(x)} x=m,~ {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal}=M \right\}} \]
satisfiable iff \({\mathop{\mathrm{diag}}}M =m\) (by linear indep’nce of multilinear monomials)
for every linear polynomial \(g(x)={\langle a,x \rangle} + b\),
\[ \begin{aligned}[t] {\tilde{\mathbb{E}}}_\mu g^2 & = {\langle a,M a \rangle} + 2b{\langle a,m \rangle} + b^2\\ & \ge {\langle a,Ma \rangle} - {\langle a,m \rangle}^2 \quad \text{(as } (b+{\langle a,m \rangle})^2\ge 0\text{)}\\ & = {\langle a,(M-{m {m}^\intercal})a \rangle} \ge 0 \end{aligned} \]
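The two conditions of the claim are easy to test numerically. A minimal sketch, with hypothetical helper names and two hand-picked moment pairs: one valid, one violating \(M-{m {m}^\intercal}\succeq 0\).

```python
import numpy as np

def is_level2_moments(m, M, tol=1e-9):
    """Check diag M = m and M - m m^T psd (the two conditions of the claim)."""
    m, M = np.asarray(m, float), np.asarray(M, float)
    diag_ok = np.allclose(np.diag(M), m, atol=tol)
    cov = M - np.outer(m, m)
    psd_ok = np.linalg.eigvalsh((cov + cov.T) / 2).min() >= -tol
    return bool(diag_ok and psd_ok)

def pseudo_expect_square(a, b, m, M):
    """E~ g^2 for g(x) = <a,x> + b, computed from the first two moments alone."""
    return a @ M @ a + 2 * b * (a @ m) + b ** 2

m = np.array([0.5, 0.5])
M_good = np.array([[0.5, 0.0], [0.0, 0.5]])  # moments of uniform distribution on {(0,1), (1,0)}
M_bad = np.array([[0.5, 0.6], [0.6, 0.5]])   # same diagonal, but covariance is indefinite

a = np.array([1.0, -1.0])
b_star = -(a @ m)  # the b that minimizes E~ g^2 for fixed a

print(is_level2_moments(m, M_good), pseudo_expect_square(a, b_star, m, M_good))
print(is_level2_moments(m, M_bad), pseudo_expect_square(a, b_star, m, M_bad))
```

For `M_bad` the linear polynomial \(g(x)=x_1-x_2\) witnesses the failure: its square has negative formal expectation, so no level-2 pseudo-distribution has these moments.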

quadratic sampling

lemma: for every level-2 pseudo-distr’n \(\mu\) over \({\{0,1\}}^n\), there exists a Gaussian vector \(X=(X_1,\ldots,X_n)\) with matching first two moments, so that
\[ {\tilde{\mathbb{E}}}_{\mu(x)} x = {\mathbb{E}}X \text{ and } {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal} = {\mathbb{E}}{X {X}^\intercal} \]

proof: let \(m={\tilde{\mathbb{E}}}_\mu x\) be mean of \(\mu\) and \(\Sigma={\tilde{\mathbb{E}}}_\mu {x {x}^\intercal} - {m {m}^\intercal}\) be covariance of \(\mu\). let \(g\) be standard \(n\)-dimensional Gaussian vector. choose Gaussian vector \(X\) as
\[ X = m + \Sigma^{1/2} g\,. \]
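The construction in the proof can be sketched as follows. The psd square root is taken via an eigendecomposition (one of several valid choices), and the moments `m`, `M` are an assumed toy example; the function names are illustrative.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a psd matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # clear tiny negative eigenvalues caused by rounding
    return (V * np.sqrt(w)) @ V.T

def quadratic_sample(m, M, num_samples, rng):
    """Draw rows X = m + Sigma^{1/2} g with Sigma = M - m m^T and g standard Gaussian."""
    Sigma = M - np.outer(m, m)
    root = psd_sqrt(Sigma)
    g = rng.standard_normal((num_samples, len(m)))
    return m + g @ root  # root is symmetric, so each row equals m + root @ g_row

m = np.array([0.5, 0.5])
M = np.array([[0.5, 0.0], [0.0, 0.5]])  # assumed level-2 moments (diag M = m, M - m m^T psd)

rng = np.random.default_rng(0)
X = quadratic_sample(m, M, 100_000, rng)
print(X.mean(axis=0))    # close to m
print(X.T @ X / len(X))  # close to M
```

The samples are real-valued rather than Boolean; only the first two moments agree with \(\mu\), so a rounding step is still needed to obtain a cut.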

SOS limitation at degree 2 for sparsest cut

Cheeger bound is tight (even for deg-2 SOS)

will see: deg-2 SOS approximation for Sparsest Cut no better than what follows from Cheeger’s bound

th’m: for every \({\varepsilon}\gt0\), exists graph \(G\) and level-2 pseudo-distr’n \(\mu\) such that \(G\) has sparsest cut value \(\ge {\varepsilon}\) but

\[ {\tilde{\mathbb{E}}}_{\mu(x)} \underbrace{\color{red}{f_G}}_{\color{red}{\text{sparsity numerator}}} \le O({\varepsilon}^2)\cdot {\tilde{\mathbb{E}}}_{\mu(x)} \underbrace{\color{green}{\tfrac d n{\lvert x \rvert}(n-{\lvert x \rvert})}}_{\color{green}{\text{sparsity denominator}}} \]

will choose \(G\) to be \(n\)-vertex path with \(n=1/{\varepsilon}\), plus self-loops for regul’ty

\(\leadsto\) sparsest cut \(S={\{ 1,\ldots,n/2 \}}\) (first half of path), value \(\approx 1/n\)

to construct: level-2 pseudo-distr’n that believes \(G\) has sparsest cut value \(O(1/n^2)\)

claim: \(\exists\) level-2 pseudo-distr’n \(\mu\) over \({\{0,1\}}^n\) with
\[ \textstyle {\tilde{\mathbb{E}}}_{\mu (x)} \color{red}{\sum_{i=1}^{n-1} (x_i-x_{i+1})^2} \le O{\left( \frac{1}{n^2} \right)} \cdot {\tilde{\mathbb{E}}}_{\mu(x)}\color{green}{\tfrac 1n \sum_{i\lt j} (x_i-x_j)^2} \]

intuition: choose covariance of pseudo-distr’n as projector into space of low eigenvalues of cycle

proof: let \(\omega=e^{2\pi i/n}\) be primitive \(n\)-th root of unity. let \(u=(\omega^1,\ldots,\omega^n)\).
let \(v,w\) be real and imaginary part of \(u\), so that \(u=v+i\cdot w\).
let \(\mu\) be level-2 pseudo-distr’n such that

\[ {\tilde{\mathbb{E}}}_{\mu(x)} x = \tfrac 12 \cdot {\mathbf 1}\text{ and } {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal} = \tfrac 14 \cdot {\left( {{\mathbf 1}{{\mathbf 1}}^\intercal} + {v {v}^\intercal} + {w {w}^\intercal} \right)} \]

(using character’n of level-2 pseudo-distr’s)
then,

\({\tilde{\mathbb{E}}}_\mu \color{green}{\tfrac 1n \sum_{i\lt j}(x_i-x_{j})^2 } = \tfrac 18 \sum_{j=1}^n {\lvert 1-\omega^j \rvert}^2 = \tfrac n4 \ge n\cdot \Omega(1)\)

\({\tilde{\mathbb{E}}}_\mu \color{red}{\sum_{i=1}^{n-1}(x_i-x_{i+1})^2} = (n-1) \cdot \tfrac 14 {\lvert 1-\omega \rvert}^2 \le n \cdot O(1/n^2)\)

🞏
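The construction can be verified numerically from the moments alone. A minimal sketch, assuming \(n=64\) (any \(n\ge 3\) should behave the same way); it uses \({\tilde{\mathbb{E}}}(x_i-x_j)^2 = M_{ii}+M_{jj}-2M_{ij}\) and the identity \(\tfrac 1n\sum_{i\lt j}(x_i-x_j)^2=\sum_i x_i^2-\tfrac 1n(\sum_i x_i)^2\).

```python
import numpy as np

n = 64  # assumed size for the check
idx = np.arange(1, n + 1)
u = np.exp(2j * np.pi * idx / n)  # u = (omega^1, ..., omega^n)
v, w = u.real, u.imag

ones = np.ones(n)
m = 0.5 * ones
M = 0.25 * (np.outer(ones, ones) + np.outer(v, v) + np.outer(w, w))

# Conditions of the level-2 characterization
diag_ok = np.allclose(np.diag(M), m)
psd_ok = np.linalg.eigvalsh(M - np.outer(m, m)).min() >= -1e-9

# Red term: path edges (i, i+1), each contributing M_ii + M_jj - 2 M_ij
numerator = sum(M[i, i] + M[i + 1, i + 1] - 2 * M[i, i + 1] for i in range(n - 1))
# Green term: (1/n) * sum_{i<j} E~ (x_i - x_j)^2 = tr(M) - <1, M 1> / n
denominator = np.trace(M) - ones @ M @ ones / n

ratio = numerator / denominator
print(diag_ok, psd_ok)
print(denominator)    # equals n/4 for this choice of moments
print(ratio * n**2)   # stays below 4 * pi^2, i.e. ratio = O(1/n^2)
```

The bound \(4\pi^2/n^2\) on the ratio follows from \({\lvert 1-\omega \rvert}\le 2\pi/n\) together with the denominator being \(n/4\).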