sum-of-squares and sparsest cut

Boaz Barak and David Steurer

UCSD winter school on Sum-of-Squares, January 2017

Cheeger’s bound
and degree-2 sos

sparsest cut with deg-2 sos

recall: sparsest cut value \(\displaystyle \min_{x\in {\{0,1\}}^n} \frac {\color{red}{f_G(x)}}{\color{green}{\frac d n {\lvert x \rvert} (n-{\lvert x \rvert})}}\)

claim: if sparsest cut value \(\ge \color{#d33682}{{\varepsilon}}\), then

\[ \vdash_2{\left\{ \color{red}{f_G(x)} \ge \color{#d33682}{{\varepsilon}^2/2} \cdot \color{green}{\frac d n {\lvert x \rvert}(n-{\lvert x \rvert})} \right\}} \]

\(\leadsto\) poly-time algorithm to approximate sparsest cut

later:

  • this guarantee is tight for degree-2 sos
  • better guarantee with degree-4 sos

Cheeger bound as SOS certificate

Laplacian matrix \(L_G = \frac 1 d \sum_{\{i,j\}\in E_G} {(e_i-e_j) {(e_i-e_j)}^\intercal}\)

eigenvalues \(0=\lambda_1\le \lambda_2 \le \cdots \le \lambda_n\le 2\)

Cheeger bound: \(\lambda_2 \le\) sparsest cut value \(\le \sqrt{2\lambda_2}\)
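
numerical sanity check (a sketch, not part of the lecture): the code below brute-forces the sparsest cut value of a small graph and compares it with \(\lambda_2\) and \(\sqrt{2\lambda_2}\). assumptions: \(f_G(x)=\sum_{\{i,j\}\in E_G}(x_i-x_j)^2\) counts the edges cut by \(x\); the 8-cycle, `n`, `d`, and all helper names are illustrative choices.

```python
import itertools
import numpy as np

def cycle_edges(n):
    """Edges of the n-cycle (a 2-regular test graph, chosen for illustration)."""
    return [(i, (i + 1) % n) for i in range(n)]

def normalized_laplacian(n, d, edges):
    """L_G = (1/d) * sum over edges {i,j} of (e_i - e_j)(e_i - e_j)^T."""
    L = np.zeros((n, n))
    for i, j in edges:
        e = np.zeros(n)
        e[i], e[j] = 1.0, -1.0
        L += np.outer(e, e)
    return L / d

def sparsest_cut_value(n, d, edges):
    """Brute-force minimum of f_G(x) / ((d/n) |x| (n - |x|)) over nontrivial x."""
    best = float("inf")
    for x in itertools.product([0, 1], repeat=n):
        k = sum(x)
        if k == 0 or k == n:
            continue  # denominator vanishes for the trivial cuts
        cut = sum((x[i] - x[j]) ** 2 for i, j in edges)
        best = min(best, cut / (d / n * k * (n - k)))
    return best

n, d = 8, 2
edges = cycle_edges(n)
# eigvalsh returns eigenvalues in ascending order, so index 1 is lambda_2
lam2 = np.linalg.eigvalsh(normalized_laplacian(n, d, edges))[1]
phi = sparsest_cut_value(n, d, edges)
print("lambda_2 =", lam2, " sparsest cut =", phi, " sqrt(2 lambda_2) =", np.sqrt(2 * lam2))
```

the brute force is exponential in `n` and only meant for tiny examples.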

SOS captures this bound

claim: \(\vdash_2 \{ f_G \ge \lambda_2 \cdot \frac d n {\lvert x \rvert}(n-{\lvert x \rvert})\}\)

good approximation because \(\lambda_2\ge {\varepsilon}^2/2\) if sparsest cut \(\ge {\varepsilon}\)

idea: prove easy direction of Cheeger in SOS

SOS proof of Cheeger

proof: let \(L_K = \mathrm{Id} - \tfrac 1n {{\mathbf 1}{{\mathbf 1}}^\intercal}\) be projector to space orthogonal to
eigenvector \((1,\ldots,1)\) of \(L_G\) (eigenvalue \(\lambda_1=0\)). then,

\[ \begin{aligned} {\langle x,L_G x \rangle}& = \tfrac 1 d \cdot f_G(x) \\ {\langle x,L_K x \rangle}& = \tfrac 1 n \cdot {\lvert x \rvert}(n-{\lvert x \rvert}) \end{aligned} \]

to show: \(\vdash_2 \{ {\langle x,(L_G - \lambda_2 \cdot L_K)x \rangle} \ge 0\}\)

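follows from \(L_G-\lambda_2 \cdot L_K \succeq 0\): both matrices vanish on \((1,\ldots,1)\), and \(L_G\) has eigenvalues \(\ge \lambda_2\) on the orthogonal complement (a psd quadratic form is a sum of squares of linear forms)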
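
the certificate can be checked numerically on a small example. this is a sketch under assumptions: the 8-cycle is an arbitrary 2-regular test graph, \(f_G(x)\) is taken to be the number of cut edges, and the variable names are illustrative.

```python
import itertools
import numpy as np

n, d = 8, 2
edges = [(i, (i + 1) % n) for i in range(n)]  # 8-cycle, illustrative choice

# L_G = (1/d) * sum over edges of (e_i - e_j)(e_i - e_j)^T
L_G = np.zeros((n, n))
for i, j in edges:
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    L_G += np.outer(e, e) / d

# L_K projects onto the space orthogonal to the all-ones vector
L_K = np.eye(n) - np.ones((n, n)) / n
lam2 = np.linalg.eigvalsh(L_G)[1]  # second-smallest eigenvalue

# smallest eigenvalue of L_G - lambda_2 * L_K (should be 0 up to rounding)
min_eig = np.linalg.eigvalsh(L_G - lam2 * L_K).min()

# check both quadratic-form identities on every point of the cube
identities_hold = True
for point in itertools.product([0, 1], repeat=n):
    x = np.array(point, dtype=float)
    k = x.sum()
    f = sum((x[i] - x[j]) ** 2 for i, j in edges)
    identities_hold &= bool(np.isclose(x @ L_G @ x, f / d))
    identities_hold &= bool(np.isclose(x @ L_K @ x, k * (n - k) / n))

print("min eigenvalue:", min_eig, " identities hold:", identities_hold)
```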

SOS via pseudo-probability

pseudo-probability

  • useful way to reason about SOS

  • dual to SOS certificates

  • generalization of classical probability

  • uncertainty arises from complexity
    (as opposed to lack of information)

formalization

formal expectation with respect to \(\mu{\colon}{\{0,1\}}^n\to{\mathbb{R}}\)

\[ {\tilde{\mathbb{E}}}_{\mu} f = \sum_{x} \mu(x)\cdot f(x) \]
(values of \(f\) weighted by \(\mu\))

def’n: \(\mu{\colon}{\{0,1\}}^n\to{\mathbb{R}}\) is level-\(\ell\) pseudo-distribution if

  • normalization \({\tilde{\mathbb{E}}}_\mu 1 = 1\)
  • positivity \({\tilde{\mathbb{E}}}_\mu g^2 \ge 0\) whenever \(\deg g\le \ell/2\)

level-\(2n\) pseudo-distributions are pointwise nonnegative and thus actual distributions

efficient algorithm

th’m: optimize over level-\(\ell\) pseudo-distributions in time \(n^{O(\ell)}\)
 [Parrilo’00, Lasserre’00]

idea: characterization in terms of positive semidefinite matrices

claim: \(\mu{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) with \({\tilde{\mathbb{E}}}_\mu 1=1\) is level-\(\ell\) pseudo-distr’n
iff following matrix is positive semidefinite

\[ {\tilde{\mathbb{E}}}_{\mu(x)} {v_{\ell/2}(x) {v_{\ell/2}(x)}^\intercal} \succeq 0 \]
(where \(v_k(x)=(1,x)^{\otimes k}\) is the Veronese map)

more formally: set of moments \({\tilde{\mathbb{E}}}_{\mu(x)}v_\ell(x)\) has \(n^{O(\ell)}\)-time separation oracle
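
for \(\ell=2\) the matrix in the claim is \({\tilde{\mathbb{E}}}_\mu\) of \((1,x)(1,x)^\intercal\). the sketch below builds it from an explicit \(\mu\) and tests positive semidefiniteness. assumptions: \(\mu\) is stored as a dictionary from points of \({\{0,1\}}^n\) to real weights, and the function names and tolerance are hypothetical; the explicit sum over the cube is exponential in \(n\) and only for illustration (the theorem works with moments, not with \(\mu\) itself).

```python
import itertools
import numpy as np

def level2_moment_matrix(mu, n):
    """E~_mu v(x) v(x)^T for v(x) = (1, x); mu maps points of {0,1}^n to reals."""
    M = np.zeros((n + 1, n + 1))
    for x in itertools.product([0, 1], repeat=n):
        v = np.array((1,) + x, dtype=float)
        M += mu.get(x, 0.0) * np.outer(v, v)
    return M

def is_level2_pseudo_distribution(mu, n, tol=1e-9):
    """Normalization (top-left entry is E~ 1) plus PSD of the moment matrix."""
    M = level2_moment_matrix(mu, n)
    normalized = abs(M[0, 0] - 1.0) <= tol
    psd = np.linalg.eigvalsh(M).min() >= -tol
    return bool(normalized and psd)

# an actual distribution: uniform over {0,1}^2
uniform = {x: 0.25 for x in itertools.product([0, 1], repeat=2)}
# a signed function on {0,1}^1 that sums to 1 but has E~ x^2 = -1
signed = {(0,): 2.0, (1,): -1.0}

print(is_level2_pseudo_distribution(uniform, 2))
print(is_level2_pseudo_distribution(signed, 1))
```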

duality of pseudo-distr’s and sos cert’s

th’m: either \(\vdash_\ell \{f\ge 0\}\) or \(\exists\) level-\(\ell\) pseudo-distr’n \(\mu\) with

\[ {\tilde{\mathbb{E}}}_\mu f\lt 0 \]

character’n of level 2 pseudo-distr’s

most classical algorithms that use semidefinite prog’ing (SDP) are captured by deg-2 SOS

claim: \(\exists\) level-2 pseudo-distr’n with mean \(m\) and 2nd moment \(M\)
iff \({\mathop{\mathrm{diag}}}M = m\) and \(M-{m {m}^\intercal}\succeq 0\)

characterization useful for developing algorithms based on deg-2 SOS and showing limitations of deg-2 SOS

proof idea: consider linear system of equations in \(\mu\)

\[ {\left\{ {\tilde{\mathbb{E}}}_{\mu(x)} x=m,~ {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal}=M \right\}} \]
satisfiable iff \({\mathop{\mathrm{diag}}}M =m\) (by linear indep’nce of multilinear monomials)

for every linear polynomial \(g(x)={\langle a,x \rangle} + b\),

\[ \begin{aligned}[t] {\tilde{\mathbb{E}}}_\mu g^2 & = {\langle a,M a \rangle} + 2b{\langle a,m \rangle} + b^2\\ & \ge {\langle a,Ma \rangle} - {\langle a,m \rangle}^2 \quad \text{(minimum over } b \text{ at } b=-{\langle a,m \rangle}\text{)}\\ & = {\langle a,(M-{m {m}^\intercal})a \rangle} \ge 0 \end{aligned} \]
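
the two conditions of the characterization are easy to test on given moments. a minimal sketch, with a hypothetical function name and tolerance; the triangle example is an illustration added here, not taken from the lecture.

```python
import numpy as np

def has_level2_pseudo_distribution(m, M, tol=1e-9):
    """Check diag(M) = m and M - m m^T PSD (the two conditions of the claim)."""
    m = np.asarray(m, dtype=float)
    M = np.asarray(M, dtype=float)
    diag_ok = np.allclose(np.diag(M), m, atol=tol)
    symmetric = np.allclose(M, M.T, atol=tol)
    cov = M - np.outer(m, m)
    psd_ok = symmetric and np.linalg.eigvalsh(cov).min() >= -tol
    return bool(diag_ok and psd_ok)

# moments on {0,1}^3 with E~ x_i = 1/2 and E~ x_i x_j = 1/8 for i != j
m_tri = np.full(3, 0.5)
M_tri = np.full((3, 3), 0.125) + np.diag(np.full(3, 0.375))
print(has_level2_pseudo_distribution(m_tri, M_tri))

# pseudo-expected number of cut triangle edges: sum over pairs of E~ (x_i - x_j)^2
pairs = [(0, 1), (0, 2), (1, 2)]
cut = sum(M_tri[i, i] + M_tri[j, j] - 2 * M_tri[i, j] for i, j in pairs)
# no x in {0,1}^3 cuts more than 2 triangle edges, so a value above 2
# means these moments match no actual distribution
print(cut)

# a non-example: covariance has a negative eigenvalue
m_bad = np.array([0.5, 0.5])
M_bad = np.array([[0.5, 0.75], [0.75, 0.5]])
print(has_level2_pseudo_distribution(m_bad, M_bad))
```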

quadratic sampling

lemma: for every level-2 pseudo-distr’n \(\mu\) over \({\{0,1\}}^n\), there exists a Gaussian vector \(X=(X_1,\ldots,X_n)\) with matching first two moments, so that

\[ {\tilde{\mathbb{E}}}_{\mu(x)} x = {\mathbb{E}}X \text{ and } {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal} = {\mathbb{E}}{X {X}^\intercal} \]

proof: let \(m={\tilde{\mathbb{E}}}_\mu x\) be mean of \(\mu\) and \(\Sigma={\tilde{\mathbb{E}}}_\mu {x {x}^\intercal} - {m {m}^\intercal}\) be covariance of \(\mu\). let \(g\) be standard \(n\)-dimensional Gaussian vector. choose Gaussian vector \(X\) as

\[ X = m + \Sigma^{1/2} g\,. \]
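
the construction \(X = m + \Sigma^{1/2} g\) can be sketched as follows. assumptions: the moments `m`, `M` below are an arbitrary valid level-2 example, \(\Sigma^{1/2}\) is computed by eigendecomposition (one of several possible choices), and the helper name, seed, and sample count are illustrative.

```python
import numpy as np

def psd_sqrt(Sigma):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(Sigma)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues from rounding
    return (V * np.sqrt(w)) @ V.T

# example level-2 pseudo-moments: diag(M) = m and M - m m^T is PSD
m = np.array([0.5, 0.5])
M = np.array([[0.5, 0.4], [0.4, 0.5]])

Sigma = M - np.outer(m, m)   # covariance of the pseudo-distribution
root = psd_sqrt(Sigma)

rng = np.random.default_rng(0)
g = rng.standard_normal((200_000, len(m)))  # rows are standard Gaussian vectors
X = m + g @ root  # root is symmetric, so each row equals m + root @ g_row

print("empirical mean:", X.mean(axis=0))
print("empirical second moment:\n", X.T @ X / len(X))
```

the samples are real-valued rather than in \({\{0,1\}}^n\); only the first two moments agree with \(\mu\).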

SOS limitation
at degree 2
for sparsest cut

Cheeger bound is tight (even for deg-2 SOS)

will see: deg-2 SOS approximation for Sparsest Cut no better than what follows from Cheeger’s bound

th’m: for every \({\varepsilon}\gt0\), exists graph \(G\) and level-2 pseudo-distr’n \(\mu\) such that \(G\) has sparsest cut value \(\ge {\varepsilon}\) but

\[ {\tilde{\mathbb{E}}}_{\mu(x)} \underbrace{\color{red}{f_G(x)}}_{\color{red}{\text{sparsity numerator}}} \le O({\varepsilon}^2)\cdot {\tilde{\mathbb{E}}}_{\mu(x)} \underbrace{\color{green}{\tfrac d n{\lvert x \rvert}(n-{\lvert x \rvert})}}_{\color{green}{\text{sparsity denominator}}} \]

will choose \(G\) to be \(n\)-vertex path with \(n=1/{\varepsilon}\), plus self-loops for regul’ty

\(\leadsto\) sparsest cut \(S={\{ 1,\ldots,n/2 \}}\) (first half of path), value \(\approx 1/n\)

to construct: level-2 pseudo-distr’n that believes \(G\) has sparsest cut value \(O(1/n^2)\)

claim: \(\exists\) level-2 pseudo-distr’n \(\mu\) over \({\{0,1\}}^n\) with

\[ \textstyle {\tilde{\mathbb{E}}}_{\mu (x)} \color{red}{\sum_{i=1}^{n-1} (x_i-x_{i+1})^2} \le O{\left( \frac{1}{n^2} \right)} \cdot {\tilde{\mathbb{E}}}_{\mu(x)}\color{green}{\tfrac 1n \sum_{i\lt j} (x_i-x_j)^2} \]

intuition: choose covariance of pseudo-distr’n as projector into space of low eigenvalues of cycle

proof: let \(\omega=e^{2\pi i/n}\) be primitive \(n\)-th root of unity. let \(u=(\omega^1,\ldots,\omega^n)\).
let \(v,w\) be real and imaginary part of \(u\), so that \(u=v+i\cdot w\).
let \(\mu\) be level-2 pseudo-distr’n such that

\[ {\tilde{\mathbb{E}}}_{\mu(x)} x = \tfrac 12 \cdot {\mathbf 1}\text{ and } {\tilde{\mathbb{E}}}_{\mu(x)} {x {x}^\intercal} = \tfrac 14 \cdot {\left( {{\mathbf 1}{{\mathbf 1}}^\intercal} + {v {v}^\intercal} + {w {w}^\intercal} \right)} \]
(using character’n of level-2 pseudo-distr’s)
then,

\({\tilde{\mathbb{E}}}_\mu \color{green}{\tfrac 1n \sum_{i\lt j}(x_i-x_{j})^2 } = \tfrac 18 \sum_{j=1}^n {\lvert 1-\omega^j \rvert}^2 = \tfrac n4 \ge n\cdot \Omega(1)\)

\({\tilde{\mathbb{E}}}_\mu \color{red}{\sum_{i=1}^{n-1}(x_i-x_{i+1})^2} = (n-1) \cdot \tfrac 14 {\lvert 1-\omega \rvert}^2 \le n \cdot O(1/n^2)\)

🞏
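
the construction can be checked numerically. a sketch under assumptions: `n = 64` is an arbitrary choice, the variable names are illustrative, and the pseudo-expectations are computed from the second moments via \({\tilde{\mathbb{E}}}(x_i-x_j)^2 = M_{ii}+M_{jj}-2M_{ij}\).

```python
import numpy as np

n = 64  # number of path vertices; illustrative choice
idx = np.arange(1, n + 1)
omega = np.exp(2j * np.pi / n)   # primitive n-th root of unity
u = omega ** idx
v, w = u.real, u.imag

ones = np.ones(n)
m = 0.5 * ones
M = 0.25 * (np.outer(ones, ones) + np.outer(v, v) + np.outer(w, w))

# conditions from the characterization of level-2 pseudo-distributions
diag_ok = np.allclose(np.diag(M), m)
psd_ok = np.linalg.eigvalsh(M - np.outer(m, m)).min() >= -1e-9

# D[i, j] = pseudo-expectation of (x_i - x_j)^2
diag = np.diag(M)
D = diag[:, None] + diag[None, :] - 2 * M

numerator = sum(D[i, i + 1] for i in range(n - 1))  # path edges
denominator = np.triu(D, k=1).sum() / n             # (1/n) * sum over pairs i < j
ratio = numerator / denominator

print("valid level-2 moments:", diag_ok and psd_ok)
print("ratio * n^2 =", ratio * n**2)  # stays bounded as n grows
```

rerunning with larger `n` shows `ratio * n**2` staying bounded, in line with the \(O(1/n^2)\) claim.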
