# sum-of-squares and sparsest cut

UCSD winter school on Sum-of-Squares, January 2017

# Beyond degree 2: Arora–Rao–Vazirani Approximation for Sparsest Cut

recall: sparsest cut value $$\displaystyle \min_{x\in {\{0,1\}}^n} \frac {\color{red}{f_G(x)}}{\color{green}{\frac d n {\lvert x \rvert} (n-{\lvert x \rvert})}}$$
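
for intuition, the minimum above can be evaluated by brute force on a tiny graph. a minimal sketch, assuming $$f_G(x)=\sum_{ij\in E(G)}(x_i-x_j)^2$$ counts the cut edges (the definition from the earlier lecture, not restated here); the name `sparsest_cut_value` and the edge-list input are illustrative choices

```python
from itertools import product


def sparsest_cut_value(n, edges, d):
    """Brute-force min over nontrivial x in {0,1}^n of f_G(x) / ((d/n) |x| (n-|x|)).

    Assumption: f_G(x) is the number of edges cut by x.
    Exponential in n, so only meant for tiny examples.
    """
    best = float("inf")
    for x in product((0, 1), repeat=n):
        size = sum(x)
        if size == 0 or size == n:
            continue  # denominator vanishes for the two trivial cuts
        cut_edges = sum((x[i] - x[j]) ** 2 for i, j in edges)
        best = min(best, cut_edges / ((d / n) * size * (n - size)))
    return best


# 4-cycle (2-regular): the best cut separates two adjacent pairs of vertices
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(sparsest_cut_value(4, cycle4, d=2))
```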

### better approx. ratio for sparsest cut

deg-2 SOS has bad approx. ratio for Sparsest Cut in terms of $$n$$ — approx. ratio $$\Omega(n)$$ for $$n$$-vertex path

will see: higher-degree SOS achieve much better approx. ratios

also improves over previous $$O(\log n)$$ approx. ratio based on linear programming [Leighton-Rao]

th’m: deg-4 SOS has approx. ratio $$O(\sqrt{\log n})$$ for Sparsest Cut
[Arora-Rao-Vazirani]

### bird’s eye view of algorithm

($$d$$-regular) undirected graph $$G$$ with vertex set $$[n]$$

algorithm outline:
1. find level-4 pseudo-distribution $$\mu$$ with min Sparsest Cut value
2. find “$$\Delta$$-good” sets $$A,B\subseteq [n]$$ for $$\mu$$ (goodness to be defined)
3. find min cut $$S$$ between $$A$$ and $$B$$ (so that $$A\subseteq S \subseteq [n]\setminus B$$)
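
the last step is an ordinary min-cut computation. a minimal sketch, assuming $$A$$ and $$B$$ are disjoint and the graph is given as an edge list; the name `min_cut_between` and the choice of Edmonds-Karp are illustrative, not taken from the lecture

```python
from collections import deque


def min_cut_between(n, edges, A, B):
    """Return (value, S): a minimum edge cut with A inside S and B outside S.

    Edmonds-Karp on unit capacities, with a super-source attached to A and
    a super-sink attached to B by infinite-capacity arcs. Assumes A, B disjoint.
    """
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add_arc(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual arc

    for i, j in edges:  # undirected edge = two unit arcs
        add_arc(i, j, 1)
        add_arc(j, i, 1)
    for a in A:
        add_arc(s, a, float("inf"))
    for b in B:
        add_arc(b, t, float("inf"))

    def residual_bfs():
        """Parent map of all vertices reachable from s in the residual graph."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    value = 0
    while True:
        parent = residual_bfs()
        if t not in parent:
            break
        path, v = [], t  # walk back from t to s along BFS parents
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        value += push
    S = {v for v in residual_bfs() if v < n}  # source side of the min cut
    return value, S


# two triangles joined by the single edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(min_cut_between(6, edges, A={0}, B={5}))
```

note that this minimizes only the number of cut edges; the lemma in "from good sets to sparse cuts" is what relates it to the sparsest cut ratio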

claim 1: if $$A,B$$ are “$$\Delta$$-good” for $$\mu$$, then min cut $$S$$ with $$A\subseteq S \subseteq [n]\setminus B$$ has value at most $$O(1/\Delta)$$ times SC value of $$\mu$$

claim 2: “$$\Delta$$-good” sets always exist for some $$\Delta\ge \Omega(1/\sqrt{\log n})$$
(and can be found efficiently)

claim 2 is ❤ of analysis; easier proof for $$\Delta=1/\log n$$

assumption: pseudo-distribution is well-spread, so that

$\textstyle {\tilde{\mathbb{E}}}_{\mu(x)} \underbrace{\sum_{i\lt j} (x_i-x_j)^2}_{={\lvert x \rvert} \cdot (n-{\lvert x \rvert})} \ge 0.01 n^2$

being “well-spread” roughly means that pseudo-distribution is supported mostly on “balanced cuts” say $${\lvert x \rvert}\in [0.3n,0.7n]$$

assumption essentially without loss of generality (see lecture note)
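
the identity under the brace in the well-spread condition holds pointwise for every $$x\in{\{0,1\}}^n$$. a quick sanity check for small $$n$$ (the helper name `pairwise_disagreements` is made up for this sketch)

```python
from itertools import product


def pairwise_disagreements(x):
    """Sum over i<j of (x_i - x_j)^2 for a 0/1 vector x."""
    n = len(x)
    return sum((x[i] - x[j]) ** 2 for i in range(n) for j in range(i + 1, n))


n = 6
for x in product((0, 1), repeat=n):
    weight = sum(x)  # |x|
    assert pairwise_disagreements(x) == weight * (n - weight)
```

in particular a cut with $${\lvert x \rvert}=0.3n$$ contributes $$0.21 n^2$$, well above the $$0.01n^2$$ threshold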

### good sets

notation: $$d(i,j) = {\tilde{\mathbb{E}}}_{\mu(x)} (x_i-x_j)^2$$ (probability that $$i,j$$ on different sides)

def’n: sets $$A,B\subseteq [n]$$ are $$\Delta$$-good if

1. $$\Delta$$-separated: $$d(a,b)\ge \Delta$$ for all $$a\in A$$ and $$b\in B$$
2. large: $${\lvert A \rvert},{\lvert B \rvert}\ge 0.0001 \cdot n$$

for well-spread pseudo-distribution typical vertex pair $$i,j$$ has $$d(i,j)\ge \Omega(1)$$

### from good sets to sparse cuts

lemma: if $$A,B$$ are $$\Delta$$-good for $$\mu$$ with SC value $$\phi$$, then min cut between $$A,B$$ has SC value $$O(1/\Delta)\cdot \phi$$

intuition: good sets close to sparse cut; min cut procedure allows us to clean up mistakes

proof: let $$d(A,i)=\min_{a\in A} d(a,i)$$. choose $$t\in [0,\Delta]$$ uniformly at random. choose $$S={\{ i \mid d(A,i)\lt t \}}$$ and $$y={\mathbf 1}_S$$.

claim 1: $${\mathbb{E}}\color{red}{f_G(y)} \le 1/\Delta \cdot {\tilde{\mathbb{E}}}_\mu \color{red}{f_G}$$

proof of claim 1: sum up following inequality over all edges $$ij$$

$$\begin{aligned}[t] {\mathbb{P}}{\left\{ y_i\neq y_j \right\}} &= {\mathbb{P}}{\{ d(A,i)\lt t \le d(A,j) \}}\\ &\le 1/\Delta \cdot {\lvert d(A,i)-d(A,j) \rvert}\\ & \le 1/\Delta \cdot d(i,j)=1/\Delta \cdot {\tilde{\mathbb{E}}}_{\mu(x)}(x_i-x_j)^2 \end{aligned}$$

(first line assumes wlog $$d(A,i)\le d(A,j)$$; last inequality is the triangle inequality for $$d$$)

$$\leadsto$$ min cut $$y^*$$ between $$A,B$$ has $$\color{red}{f_G(y^*)}\le 1/\Delta \cdot {\tilde{\mathbb{E}}}_\mu \color{red}{f_G}$$

claim 2: $$\color{green}{{\lvert y^\ast \rvert}(n-{\lvert y^\ast \rvert})} \ge \Omega(1) \cdot {\tilde{\mathbb{E}}}_{\mu(x)} \color{green}{{\lvert x \rvert}(n-{\lvert x \rvert})}$$

proof of claim 2: $${\lvert y^\ast \rvert}(n-{\lvert y^\ast \rvert})\ge {\lvert A \rvert}{\lvert B \rvert}\ge \Omega(1)\cdot n^2$$
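
the random threshold cut from the proof can be sketched as follows, assuming $$d$$ is given as an $$n\times n$$ matrix with zero diagonal. the function name `threshold_round` and the toy line metric are illustrative; $$t$$ is drawn from $$(0,\Delta]$$ instead of $$[0,\Delta]$$ so that $$A\subseteq S$$ always holds

```python
import random


def threshold_round(d, A, delta, rng=random):
    """Sample S = {i : d(A, i) < t} for t uniform in (0, delta].

    d is an n x n matrix (list of lists) with d[i][i] == 0;
    A is a nonempty collection of vertex indices.
    """
    n = len(d)
    t = delta * (1.0 - rng.random())  # rng.random() is in [0, 1), so t is in (0, delta]
    dist_to_A = [min(d[a][i] for a in A) for i in range(n)]
    return {i for i in range(n) if dist_to_A[i] < t}


# toy example: 5 points on a line, d(i, j) = |i - j| / 4, so d(0, 4) = 1
d = [[abs(i - j) / 4 for j in range(5)] for i in range(5)]
S = threshold_round(d, A=[0], delta=1.0)
print(sorted(S))  # always a prefix containing 0 and excluding 4
```

every vertex $$b$$ with $$d(A,b)\ge\Delta$$ stays outside $$S$$, so each sampled $$S$$ is a feasible cut between $$A$$ and $$B$$ and the min cut is at least as good as the average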

### existence of good sets

structure theorem: every well-spread level-4 pseudo-distribution over $${\{0,1\}}^n$$ has $$\Delta$$-good sets for $$\Delta=1/\sqrt{\log n}$$
[Arora-Rao-Vazirani]

### tightness of structure theorem

for the “hypercube graph”, there exists an actual probability distribution over cuts without $$\Delta$$-good sets for $$\Delta\gg 1/\sqrt{\log n}$$

let $$\mu=$$ uniform distribution over coordinate cuts for $$k$$-dim hypercube graph ($$k=\log n$$)

$$d(i,j)$$ is relative Hamming distance between $$i$$ and $$j$$ (as $$k$$-bit strings)

best sets $$A,B$$: Hamming balls around $$0$$ and $${\mathbf 1}$$ of radius $$\tfrac12 k - \sqrt{k}$$

separation for these sets: $$d(a,b)\ge 1/\sqrt k=1/\sqrt{\log n}$$
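
a small numerical check of this example, assuming vertices are encoded as $$k$$-bit integers and the radius is rounded to the integer `k // 2 - isqrt(k)`. function names are made up for this sketch, and for such small $$k$$ the bound is far from tight

```python
from itertools import combinations
from math import isqrt, sqrt


def coordinate_cut_distance(i, j, k):
    """d(i, j) = E (x_i - x_j)^2 over a uniformly random coordinate cut.

    The c-th coordinate cut assigns x_i = bit c of i, so this equals the
    relative Hamming distance between i and j.
    """
    return sum((((i >> c) & 1) - ((j >> c) & 1)) ** 2 for c in range(k)) / k


def hamming_ball(center, radius, k):
    """All k-bit integers within Hamming distance `radius` of `center`."""
    ball = []
    for r in range(radius + 1):
        for flips in combinations(range(k), r):
            v = center
            for c in flips:
                v ^= 1 << c
            ball.append(v)
    return ball


k = 12
radius = k // 2 - isqrt(k)  # integer stand-in for k/2 - sqrt(k)
A = hamming_ball(0, radius, k)
B = hamming_ball((1 << k) - 1, radius, k)
separation = min(coordinate_cut_distance(a, b, k) for a in A for b in B)
print(len(A), separation, 1 / sqrt(k))
```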

### bird’s eye view of structure theorem

proof structure: give randomized algorithm to construct $$\Delta$$-separated sets $$A,B$$ and matching $$M$$ of $$[n]$$ such that

1. with high probability, $${\lvert A \rvert},{\lvert B \rvert}\ge 0.001 n - {\lvert M \rvert}$$
2. expected matching size $${\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}$$

conclude: that algorithm constructs large separated sets when $$\Delta\le c/\sqrt{\log n}$$ for sufficiently small $$c\gt0$$

upper bound on $${\mathbb{E}}{\lvert M \rvert}$$ will be ♡ of analysis

### interlude: maxima of Gaussian processes

let $$Z=(Z_1,\ldots,Z_t)$$ be a centered Gaussian vector
variances $$\mathbb V[ Z_i] = {\mathbb{E}}[ Z_i^2]$$

expectation bound (crude version)

$${\mathbb{E}}[ \max_{i} Z_i] \le O(\sqrt {\log t})\cdot \max_i (\mathbb V[ Z_i])^{1/2}$$

variance bound [Sudakov, Cirel’son ’74; Borell ’75]

$$\mathbb V[ \max_{i} Z_i ]\le \color{#d33682}{\underbrace{O(1)}_{\text{independent of } t!}}\cdot \max_i \mathbb V[ Z_i]$$

(equivalent to concentration for Lipschitz functions over Gaussian distribution)
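
both bounds can be seen empirically in the simplest special case of independent standard Gaussians (an assumption of this sketch; the bounds above hold for arbitrary centered Gaussian vectors). the function name and parameters are illustrative

```python
import math
import random
import statistics


def max_of_gaussians_stats(t, trials=300, seed=0):
    """Empirical mean and variance of max(Z_1..Z_t) for independent N(0,1) Z_i."""
    rng = random.Random(seed)
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(t)) for _ in range(trials)]
    return statistics.mean(maxima), statistics.variance(maxima)


# columns: t, empirical mean of max, sqrt(2 log t), empirical variance of max
for t in (10, 100, 1000):
    mean, var = max_of_gaussians_stats(t)
    print(t, round(mean, 2), round(math.sqrt(2 * math.log(t)), 2), round(var, 2))
```

the mean grows like $$\sqrt{\log t}$$ while the variance stays bounded as $$t$$ grows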

### construction of separated sets

wlog $${\tilde{\mathbb{E}}}_{\mu(x)} x=\tfrac 12 {\mathbf 1}$$ (by symmetry of $$x$$ and $${\mathbf 1}-x$$)

apply quadratic sampling lemma for $$x-\tfrac 12 {\mathbf 1}$$ $$\leadsto$$ Gaussian $$X=(X_1,\ldots,X_n)$$ with $${\mathbb{E}}X=0$$ and $$d(i,j)={\mathbb{E}}(X_i-X_j)^2$$

1. candidate sets: $$A_0={\{ i \mid X_i \le -1 \}}$$ and $$B_0={\{ i \mid X_i \ge 1 \}}$$
2. find maximal matching $$M\subseteq A_0\times B_0$$ of disjoint pairs $$(a,b)$$ that are not $$\Delta$$-separated, i.e., $$d(a,b)\lt\Delta$$
3. output pruned sets: $$A=A_0\setminus V(M)$$ and $$B=B_0\setminus V(M)$$

(without pruning, construction gives good sets only for $$\Delta\le 1/\log n$$)

$$\mu$$ well-spread $$\leadsto$$ $${\lvert A_0 \rvert},{\lvert B_0 \rvert}\ge 0.001n$$ w. prob. $$\ge \Omega(1)$$

by construction, $${\lvert A \rvert}\ge {\lvert A_0 \rvert}-{\lvert M \rvert}$$ (same for $$B$$ and $$B_0$$)

together: $${\lvert A \rvert},{\lvert B \rvert}\ge 0.001 n - {\lvert M \rvert}$$ w. prob. $$\ge \Omega(1)$$ (as desired)
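
the three steps of the construction can be sketched as follows, taking one sample `X` of the Gaussian vector and the matrix `d` as given inputs (how to sample `X` is not shown). the function name and the greedy matching order are illustrative choices; any maximal matching works

```python
def prune_candidate_sets(X, d, delta):
    """Threshold X, greedily match close pairs, delete matched vertices.

    X is one sample of the Gaussian vector (list of floats);
    d is an n x n matrix with d[i][j] = E (X_i - X_j)^2.
    Returns (A, B, M) where every pair in A x B has d >= delta.
    """
    n = len(X)
    A0 = [i for i in range(n) if X[i] <= -1]
    B0 = [i for i in range(n) if X[i] >= 1]
    M, matched = [], set()
    for a in A0:
        for b in B0:
            if b not in matched and d[a][b] < delta:
                M.append((a, b))  # close pair: remove both endpoints
                matched.update((a, b))
                break  # a is matched now, move on to the next a
    A = [a for a in A0 if a not in matched]
    B = [b for b in B0 if b not in matched]
    return A, B, M


# toy input (hypothetical numbers): all distances 1 except the pair (1, 3)
X = [-1.5, -1.2, 0.0, 1.1, 2.0]
d = [[0.0 if i == j else 1.0 for j in range(5)] for i in range(5)]
d[1][3] = d[3][1] = 0.1
print(prune_candidate_sets(X, d, delta=0.5))
```

greedy matching is maximal: an unmatched $$a$$ has no unmatched close partner left, so the pruned sets are $$\Delta$$-separated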

### analysis of construction

• graph $$H$$ with edge $$ij$$ if $${\mathbb{E}}(X_i-X_j)^2\le \Delta$$
• Gaussian $$X=(X_1,\ldots,X_n)$$ with $${\mathbb{E}}X=0$$, $$d(i,j)={\mathbb{E}}(X_i-X_j)^2$$
• matching $$M\subseteq E(H)$$ with $$X_j-X_i\ge 2$$ whenever $$(i,j)\in M(X)$$

remains to prove: $${\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}$$

will relate $$M$$ to maxima of Gaussian processes defined by $$X$$

${{Y_i}^{(k)}} = \max_{j\in H^k(i)} X_j-X_i \quad \text{ where } H^k(i)= k\text{-step neighbors of }i$

chaining inequality: let $$\Phi(k)=\sum_{i} {\mathbb{E}}{{Y_i}^{(k)}}$$ (potential). then,
$$\Phi(k+1) \ge \Phi(k)+\color{#d33682}{{\mathbb{E}}{\lvert M \rvert}} - O(n)\cdot \max_{{\{ i,j \}}\in E(H^k)}\underbrace{({\mathbb{E}}(X_i-X_j)^2)^{1/2}}_{\color{#d33682}{\le \sqrt{k\cdot \Delta}~(\ast)} }$$

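bound $$\color{#d33682}{(\ast)}$$ by triangle inequality for $$d$$, which holds for level-4 pseudo-distr’n over $${\{0,1\}}^n$$ (a $$k$$-step path in $$H$$ has total length at most $$k\cdot\Delta$$)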

upper bound on matching size follows by combining these inequalities and the upper bound $$\Phi(k)\le n \cdot O(\sqrt{ \log n})$$
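
to see how these combine, here is a sketch that takes the chaining inequality at face value for every step and assumes $$H^0(i)={\{i\}}$$, so that $$\Phi(0)=0$$ (constants are not tracked; the number of steps $$K$$ is a free parameter introduced here). telescoping over $$k=0,\ldots,K-1$$ and using $$\sqrt{k\Delta}\le\sqrt{K\Delta}$$:

$$\begin{aligned} K\cdot {\mathbb{E}}{\lvert M \rvert} &\le \Phi(K)-\Phi(0) + O(n)\cdot K\sqrt{K\Delta}\\ &\le n\cdot O(\sqrt{\log n}) + O(n)\cdot K\sqrt{K\Delta} \end{aligned}$$

dividing by $$K$$ gives $${\mathbb{E}}{\lvert M \rvert}\le O(n)\cdot \left(\sqrt{\log n}/K + \sqrt{K\Delta}\right)$$. choosing $$K=(\log n/\Delta)^{1/3}$$ (rounded to an integer) balances the two terms, each becoming $$(\log n)^{1/6}\Delta^{1/3}$$:

$${\mathbb{E}}{\lvert M \rvert}\le O(n)\cdot (\Delta\cdot\sqrt{\log n})^{1/3}$$

which is of the claimed form $$n\cdot(\Delta\cdot\sqrt{\log n})^{\Omega(1)}$$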

assume: $$M$$ is always perfect matching (general case almost the same)

key inequalities: for every edge $$ij$$ of $$H$$,

${{Y_i}^{(k+1)}} \ge {{Y_j}^{(k)}} + X_j -X_i$
in particular for every edge $$(i,j)\in M$$,
${{Y_i}^{(k+1)}} \ge {{Y_j}^{(k)}} + 2$

summing over $$M$$,

$\textstyle \sum_i L_i\cdot {{Y_i}^{(k+1)}} \ge \sum_j R_j\cdot {{Y_j}^{(k)}} + {\lvert M \rvert}$
where $$L_i,R_j$$ indicate whether $$i$$ is on the left side and $$j$$ on the right side of $$M$$

🙁 no control over $$L_i$$ and $$R_j$$ except $${\mathbb{E}}L_i={\mathbb{E}}R_j = 1/2$$ (by symmetry)

🙂 variance bound on $$Y$$ variables $$\leadsto$$ can treat $$L_i,R_j$$ as constants

$\mathbb V {{Y_{i}}^{(k+1)}} \le O(1)\max_{{\{ i,j \}}\in E(H^{k+1})} {\mathbb{E}}(X_i-X_j)^2$

taking expectation and using variance bound

$\textstyle \sum_i {\mathbb{E}}{{Y_i}^{(k+1)}} \ge \sum_i {\mathbb{E}}{{Y_i}^{(k)}} + {\mathbb{E}}{\lvert M \rvert} - \sum_i (\mathbb V {{ Y_i}^{(k+1)}})^{1/2}$

### summary

new bird’s eye view of algorithm

1. quadratic sampling: initial partial assignment
2. pruning step: refine partial assignment
3. exact min-cut: complete partial assignment optimally

both 2nd and 3rd step use deg-4 SOS properties

key innovation: control pruning step by relating it to maxima of Gaussian processes and exploit deep properties of them
