sum-of-squares and sparsest cut

Boaz Barak and David Steurer

UCSD winter school on Sum-of-Squares, January 2017

Beyond degree 2: Arora–Rao–Vazirani Approximation for Sparsest Cut

recall: sparsest cut value \(\displaystyle \min_{x\in {\{0,1\}}^n} \frac {\color{red}{f_G(x)}}{\color{green}{\frac d n {\lvert x \rvert} (n-{\lvert x \rvert})}}\)
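
(here \(\color{red}{f_G(x)}\) is, as in the earlier lectures, the number of edges of \(G\) cut by \(x\), i.e. \(\color{red}{f_G(x)}=\sum_{{\{ i,j \}}\in E(G)}(x_i-x_j)^2\))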

better approx. ratio for sparsest cut

deg-2 SOS has bad approx. ratio for Sparsest Cut in terms of \(n\) — approx. ratio \(\Omega(n)\) for \(n\)-vertex path

will see: higher-degree SOS achieve much better approx. ratios

also improves over previous \(O(\log n)\) approx. ratio based on linear programming [Leighton-Rao]

th’m: deg-4 SOS has approx. ratio \(O(\sqrt{\log n})\) for Sparsest Cut
[Arora-Rao-Vazirani]

bird’s eye view of algorithm

(\(d\)-regular) undirected graph \(G\) with vertex set \([n]\)

algorithm outline:
1. find level-4 pseudo-distribution \(\mu\) with min Sparsest Cut value
2. find “\(\Delta\)-good” sets \(A,B\subseteq [n]\) for \(\mu\) (goodness to be defined)
3. find min cut \(S\) between \(A\) and \(B\) (so that \(A\subseteq S \subseteq [n]\setminus B\))

claim 1: if \(A,B\) are “\(\Delta\)-good” for \(\mu\), then min cut \(S\) with \(A\subseteq S \subseteq [n]\setminus B\) has value at most \(O(1/\Delta)\) times SC value of \(\mu\)

claim 2: “\(\Delta\)-good” sets always exist for some \(\Delta\ge \Omega(1/\sqrt{\log n})\)
(and can be found efficiently)
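
combining the claims: the cut from step 3 has SC value at most \(O(1/\Delta)\cdot(\text{SC value of }\mu)\le O(\sqrt{\log n})\cdot(\text{SC value of }\mu)\); since every actual distribution over \({\{0,1\}}^n\) is in particular a level-4 pseudo-distribution, the SC value of the minimizing \(\mu\) is at most the true sparsest cut value, which gives the \(O(\sqrt{\log n})\) approximation ratio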

claim 2 is ❤ of analysis; easier proof for \(\Delta=1/\log n\)

well-spread assumption

assumption: pseudo-distribution is well-spread, so that

\[ \textstyle {\tilde{\mathbb{E}}}_{\mu(x)} \underbrace{\sum_{i\lt j} (x_i-x_j)^2}_{={\lvert x \rvert} \cdot (n-{\lvert x \rvert})} \ge 0.01 n^2 \]

being “well-spread” roughly means that pseudo-distribution is supported mostly on “balanced cuts” say \({\lvert x \rvert}\in [0.3n,0.7n]\)
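
(for instance, an actual distribution supported only on cuts with \({\lvert x \rvert}\in [0.3n,0.7n]\) satisfies this with room to spare, since each such cut has \({\lvert x \rvert}(n-{\lvert x \rvert})\ge 0.21 n^2\))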

assumption essentially without loss of generality (see lecture note)

good sets

notation: \(d(i,j) = {\tilde{\mathbb{E}}}_{\mu(x)} (x_i-x_j)^2\) (probability that \(i,j\) on different sides)

def’n: sets \(A,B\subseteq [n]\) are \(\Delta\)-good if

  1. \(\Delta\)-separated: \(d(a,b)\ge \Delta\) for all \(a\in A\) and \(b\in B\)
  2. large: \({\lvert A \rvert},{\lvert B \rvert}\ge 0.0001 \cdot n\)

for well-spread pseudo-distribution typical vertex pair \(i,j\) has \(d(i,j)\ge \Omega(1)\)
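
(indeed, well-spreadness says \(\sum_{i\lt j} d(i,j)\ge 0.01 n^2\), and there are fewer than \(n^2/2\) pairs, so the average pair has \(d(i,j)\ge 0.02\))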

from good sets to sparse cuts

lemma: if \(A,B\) are \(\Delta\)-good for \(\mu\) with SC value \(\phi\), then min cut between \(A,B\) has SC value \(O(1/\Delta)\cdot \phi\)

intuition: good sets close to sparse cut; min cut procedure allows us to clean up mistakes

proof: let \(d(A,i)=\min_{a\in A} d(a,i)\). choose \(t\in [0,\Delta]\) uniformly at random. choose \(S={\{ i \mid d(A,i)\lt t \}}\) and \(y={\mathbf 1}_S\).

claim 1: \({\mathbb{E}}\color{red}{f_G(y)} \le 1/\Delta \cdot {\tilde{\mathbb{E}}}_\mu \color{red}{f_G}\)

proof of claim 1: sum up following inequality over all edges \(ij\)

\[ \begin{aligned}[t] {\mathbb{P}}{\left\{ y_i\neq y_j \right\}} &= {\mathbb{P}}{\{ d(A,i)\lt t \le d(A,j) \}}\\ &\le 1/\Delta \cdot {\lvert d(A,i)-d(A,j) \rvert}\\ & \le 1/\Delta \cdot d(i,j)=1/\Delta \cdot {\tilde{\mathbb{E}}}_{\mu(x)}(x_i-x_j)^2 \end{aligned} \]
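
(two remarks on this chain, wlog \(d(A,i)\le d(A,j)\): the first inequality holds because \(t\) is uniform in \([0,\Delta]\); the second, \({\lvert d(A,i)-d(A,j) \rvert}\le d(i,j)\), uses the triangle inequality for \(d\), which level-4 pseudo-distributions over \({\{0,1\}}^n\) satisfy and which is where degree 4 enters)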

\(\leadsto\) min cut \(y^*\) between \(A,B\) has \(\color{red}{f_G(y^*)}\le 1/\Delta \cdot {\tilde{\mathbb{E}}}_\mu \color{red}{f_G}\)

claim 2: \(\color{green}{{\lvert y^\ast \rvert}(n-{\lvert y^\ast \rvert})} \ge \Omega(1) \cdot {\tilde{\mathbb{E}}}_{\mu(x)} \color{green}{{\lvert x \rvert}(n-{\lvert x \rvert})}\)

proof of claim 2: \({\lvert y^\ast \rvert}(n-{\lvert y^\ast \rvert})\ge {\lvert A \rvert}{\lvert B \rvert}\ge \Omega(1)\cdot n^2\)
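
a minimal numpy sketch of the random threshold cut used in the proof (the interface, with the pseudo-distances \(d\) given as an \(n\times n\) array, is an illustrative assumption; the lemma itself takes the minimum cut between \(A\) and \(B\), which is only sparser):

```python
import numpy as np

def threshold_cut(d, A, Delta, seed=None):
    """Random threshold cut from the proof (sketch): d[i, j] = pseudo-distance
    between i and j, A = Delta-separated set.  Returns S = { i : d(A, i) < t }
    for a uniformly random threshold t in [0, Delta]."""
    rng = np.random.default_rng(seed)
    dA = d[list(A), :].min(axis=0)     # d(A, i) = min_{a in A} d(a, i)
    t = rng.uniform(0.0, Delta)        # uniform threshold in [0, Delta]
    return {i for i in range(d.shape[0]) if dA[i] < t}
```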

existence of good sets

structure theorem: every well-spread level-4 pseudo-distribution over \({\{0,1\}}^n\) has \(\Delta\)-good sets for \(\Delta=1/\sqrt{\log n}\)
[Arora-Rao-Vazirani]

tightness of structure theorem

for the “hypercube graph”, there exists an actual probability distribution over cuts without \(\Delta\)-good sets for \(\Delta\gg 1/\sqrt{\log n}\)

let \(\mu=\) uniform distribution over coordinate cuts for \(k\)-dim hypercube graph (\(k=\log n\))

\(d(i,j)\) is relative Hamming distance between \(i\) and \(j\) (as \(k\)-bit strings)

best sets \(A,B\): Hamming balls around \(0\) and \({\mathbf 1}\) of radius \(\tfrac12 k - \sqrt{k}\)

separation for these sets: \(d(a,b)\ge 1/\sqrt k=1/\sqrt{\log n}\)
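
(to check these numbers: any \(a\) in the ball around \(0\) and \(b\) in the ball around \({\mathbf 1}\) differ in at least \(2\sqrt k\) coordinates, which gives the stated separation; and by the central limit theorem each ball contains an \(\Omega(1)\) fraction of the vertices, so the sets are also large)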

bird’s eye view of structure theorem

proof structure: give randomized algorithm to construct \(\Delta\)-separated sets \(A,B\) and matching \(M\) of \([n]\) such that

  1. with probability \(\Omega(1)\), \({\lvert A \rvert},{\lvert B \rvert}\ge 0.001 n - {\lvert M \rvert}\)
  2. expected matching size \({\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}\)

conclude: the algorithm constructs large separated sets when \(\Delta\le c/\sqrt{\log n}\) for a sufficiently small constant \(c\gt0\)
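
(one way to see this, constants not optimized: for \(\Delta\le c/\sqrt{\log n}\), property 2 gives \({\mathbb{E}}{\lvert M \rvert}\le n\cdot c^{\Omega(1)}\); by Markov, \({\lvert M \rvert}\le 10^{-4}n\) except with probability that can be made an arbitrarily small constant by taking \(c\) small, so together with property 1 the pruned sets have size at least \(0.001n-10^{-4}n\ge 0.0001n\) with probability \(\Omega(1)\), as the definition of good sets requires)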

upper bound on \({\mathbb{E}}{\lvert M \rvert}\) will be ♡ of analysis

interlude: maxima of Gaussian processes

let \(Z=(Z_1,\ldots,Z_t)\) be a centered Gaussian vector with variances \(\mathbb V[ Z_i] = {\mathbb{E}}[ Z_i^2]\)

expectation bound (crude version)

\({\mathbb{E}}[ \max_{i} Z_i] \le O(\sqrt {\log t})\cdot \max_i (\mathbb V[ Z_i])^{1/2}\)

variance bound [Sudakov, Cirel’son ’74; Borell’75]

\(\mathbb V[ \max_{i} Z_i ]\le \color{#d33682}{\underbrace{O(1)}_{\text{independent of }t\text{!}}}\cdot \max_i \mathbb V[ Z_i]\)

(equivalent to concentration for Lipschitz functions over Gaussian distribution)
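
a tiny numerical sanity check of both bounds, for the special case of i.i.d. standard Gaussians (the code and parameters are just an illustration):

```python
import numpy as np

# empirical E[max] and Var[max] of t i.i.d. standard Gaussians
rng = np.random.default_rng(0)
t, trials = 1024, 4000
maxima = rng.standard_normal((trials, t)).max(axis=1)   # one max per trial
print("E[max_i Z_i]   ~", maxima.mean(), " vs sqrt(log t) =", np.sqrt(np.log(t)))
print("Var[max_i Z_i] ~", maxima.var(), " (stays O(1) as t grows)")
```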

construction of separated sets

wlog \({\tilde{\mathbb{E}}}_{\mu(x)} x=\tfrac 12 {\mathbf 1}\) (by symmetry of \(x\) and \({\mathbf 1}-x\))

apply quadratic sampling lemma for \(x-\tfrac 12 {\mathbf 1}\) \(\leadsto\) Gaussian \(X=(X_1,\ldots,X_n)\) with \({\mathbb{E}}X=0\) and \(d(i,j)={\mathbb{E}}(X_i-X_j)^2\)

  1. candidate sets: \(A_0={\{ i \mid X_i \le -1 \}}\) and \(B_0={\{ i \mid X_i \ge 1 \}}\)
  2. find maximal \(M\subseteq A_0\times B_0\) of disjoint pairs not \(\Delta\)-separated
  3. output pruned sets: \(A=A_0\setminus V(M)\) and \(B=B_0\setminus V(M)\)

(without pruning, construction gives good sets only for \(\Delta\le 1/\log n\))
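
a numpy sketch of this construction (the input \(\Sigma\), assumed here to be the degree-2 pseudo-covariance of \(x-\tfrac12{\mathbf 1}\) so that \(d(i,j)=\Sigma_{ii}+\Sigma_{jj}-2\Sigma_{ij}\), and the greedy way of extracting a maximal \(M\) are illustrative choices):

```python
import numpy as np

def separated_sets(Sigma, Delta, seed=None):
    """Candidate sets + pruning (sketch).  Sigma is assumed to be the degree-2
    pseudo-covariance matrix of x - (1/2)*1, so that
    d(i, j) = Sigma[i, i] + Sigma[j, j] - 2 * Sigma[i, j]."""
    rng = np.random.default_rng(seed)
    n = Sigma.shape[0]

    def d(i, j):
        return Sigma[i, i] + Sigma[j, j] - 2 * Sigma[i, j]

    # quadratic sampling: Gaussian X with E[X] = 0 and E[(X_i - X_j)^2] = d(i, j)
    X = rng.multivariate_normal(np.zeros(n), Sigma)
    A0 = {i for i in range(n) if X[i] <= -1}    # candidate sets
    B0 = {j for j in range(n) if X[j] >= +1}

    # greedily collect a maximal set M of disjoint pairs that are NOT Delta-separated
    matched, M = set(), []
    for i in A0:
        for j in B0:
            if j not in matched and d(i, j) < Delta:
                M.append((i, j))
                matched |= {i, j}
                break

    # pruning: drop all matched vertices
    return A0 - matched, B0 - matched, M
```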

\(\mu\) well-spread \(\leadsto\) \({\lvert A_0 \rvert},{\lvert B_0 \rvert}\ge 0.001n\) w. prob. \(\ge \Omega(1)\)

by construction, \({\lvert A \rvert}\ge {\lvert A_0 \rvert}-{\lvert M \rvert}\) (same for \(B\) and \(B_0\))

together: \({\lvert A \rvert},{\lvert B \rvert}\ge 0.001n - {\lvert M \rvert}\) w. prob. \(\ge \Omega(1)\) (as desired)

analysis of construction

• graph \(H\) with edge \(ij\) if \({\mathbb{E}}(X_i-X_j)^2\le \Delta\)
• Gaussian \(X=(X_1,\ldots,X_n)\) with \({\mathbb{E}}X=0\), \(d(i,j)={\mathbb{E}}(X_i-X_j)^2\)
• matching \(M\subseteq E(H)\) with \(X_j-X_i\ge 2\) whenever \((i,j)\in M(X)\)

remains to prove: \({\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}\)

will relate \(M\) to maxima of Gaussian processes defined by \(X\)

\[{{Y_i}^{(k)}} = \max_{j\in H^k(i)} X_j-X_i \quad\text{where } H^k(i)= \{\,k\text{-step neighbors of }i\text{ in }H\,\}\]

chaining inequality: let \(\Phi(k)=\sum_{i} {\mathbb{E}}{{Y_i}^{(k)}}\) (potential). then,
\(\Phi(k+1) \ge \Phi(k)+\color{#d33682}{{\mathbb{E}}{\lvert M \rvert}} - O(n)\cdot \max_{{\{ i,j \}}\in E(H^k)}\underbrace{({\mathbb{E}}(X_i-X_j)^2)^{1/2}}_{\color{#d33682}{\le \sqrt{k\cdot \Delta}~(\ast)} }\)

bound \(\color{#d33682}{(\ast)}\) by triangle ineq. for \(d\), valid for level-4 pseudo-distr’n over \({\{0,1\}}^n\): an edge \({\{ i,j \}}\in E(H^k)\) corresponds to a path of \(k\) edges of \(H\), each with \(d\le \Delta\), so \(d(i,j)\le k\cdot\Delta\)

upper bound on matching size follows by combining these inequalities and the upper bound \(\Phi(k)\le n \cdot O(\sqrt{ \log n})\)

assume: \(M\) is always a perfect matching (the general case is almost the same)

key inequalities: for every edge \(ij\) of \(H\) (using \(H^{k+1}(i)\supseteq H^{k}(j)\) whenever \(ij\in E(H)\)),

\[ {{Y_i}^{(k+1)}} \ge {{Y_j}^{(k)}} + X_j -X_i \]
in particular for every edge \((i,j)\in M\),
\[ {{Y_i}^{(k+1)}} \ge {{Y_j}^{(k)}} + 2 \]

summing over \(M\),

\[ \textstyle \sum_i L_i\cdot {{Y_i}^{(k+1)}} \ge \sum_j R_j\cdot {{Y_j}^{(k)}} + {\lvert M \rvert} \]
where \(L_i,R_j\in{\{0,1\}}\) indicate whether \(i\) is a left endpoint and \(j\) a right endpoint of \(M\)

🙁 no control over \(L_i\) and \(R_j\) except \({\mathbb{E}}L_i={\mathbb{E}}R_j = 1/2\) (by symmetry)

🙂 variance bound on \(Y\) variables \(\leadsto\) can treat \(L_i,R_j\) as constants

\[\mathbb V\, {{Y_{i}}^{(k+1)}} \le O(1)\cdot\max_{j\in H^{k+1}(i)} {\mathbb{E}}(X_i-X_j)^2\]

taking expectation and using variance bound

\[ \textstyle \sum_i {\mathbb{E}}{{Y_i}^{(k+1)}} \ge \sum_i {\mathbb{E}}{{Y_i}^{(k)}} + {\mathbb{E}}{\lvert M \rvert} - \sum_i (\mathbb V {{ Y_i}^{(k+1)}})^{1/2} \]
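
one way to finish the calculation (a sketch, constants suppressed): summing the chaining inequality over \(k=0,\ldots,K-1\), using \(\Phi(0)=0\) (the \(0\)-step neighborhood of \(i\) is just \({\{ i \}}\)), the bound \(\color{#d33682}{(\ast)}\), and \(\Phi(K)\le n\cdot O(\sqrt{\log n})\),

\[ \textstyle K\cdot {\mathbb{E}}{\lvert M \rvert} \le \Phi(K)+O(n)\sum_{k\le K}\sqrt{k\cdot\Delta} \le n\cdot O(\sqrt{\log n})+O(n)\cdot K^{3/2}\sqrt{\Delta} \]

dividing by \(K\) and choosing \(K=(\log n/\Delta)^{1/3}\) to balance the two terms gives \({\mathbb{E}}{\lvert M \rvert}\le O(n)\cdot(\Delta\cdot\sqrt{\log n})^{1/3}\), of the promised form \(n\cdot (\Delta\cdot\sqrt{\log n})^{\Omega(1)}\)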

summary

new bird’s eye view of algorithm

  1. quadratic sampling: initial partial assignment
  2. pruning step: refine partial assignment
  3. exact min-cut: complete partial assignment optimally

both 2nd and 3rd step use deg-4 SOS properties

key innovation: control pruning step by relating it to maxima of Gaussian processes and exploit deep properties of them
