Boaz Barak and David Steurer

UCSD winter school on Sum-of-Squares, January 2017

**recall:** sparsest cut value
\(\displaystyle \min_{x\in {\{0,1\}}^n} \frac {\color{red}{f_G(x)}}{\color{green}{\frac d n {\lvert x \rvert} (n-{\lvert x \rvert})}}\)
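
For a tiny graph the definition above can be evaluated by brute force. The sketch below assumes \(f_G(x)\) counts the edges cut by \(x\) (consistent with summing \((x_i-x_j)^2\) over edges later in these slides); the function name `sparsest_cut_value` and the 6-cycle example are illustrative choices, not part of the lecture.

```python
from itertools import product

def sparsest_cut_value(n, edges, d):
    """Brute-force min over nontrivial x in {0,1}^n of f_G(x) / ((d/n) |x| (n-|x|))."""
    best = float("inf")
    for x in product((0, 1), repeat=n):
        size = sum(x)
        if size == 0 or size == n:
            continue  # denominator vanishes for the two trivial cuts
        cut_edges = sum(1 for i, j in edges if x[i] != x[j])  # assumed f_G(x)
        best = min(best, cut_edges / (d / n * size * (n - size)))
    return best

n = 6
cycle = [(i, (i + 1) % n) for i in range(n)]  # 2-regular graph on 6 vertices
value = sparsest_cut_value(n, cycle, d=2)
print(value)  # attained by cutting the cycle into two arcs of 3 vertices each
```

Enumeration takes \(2^n\) steps, so this is only a sanity check of the definition for very small \(n\).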

deg-2 SOS has *bad approx. ratio* for Sparsest Cut in terms of \(n\) — approx. ratio \(\Omega(n)\) for \(n\)-vertex path

**will see:**
higher-degree SOS achieves much better approx. ratios

also improves over previous \(O(\log n)\) approx. ratio based on linear programming [Leighton-Rao]

th’m: deg-4 SOS has approx. ratio \(O(\sqrt{\log n})\) for Sparsest Cut

[Arora-Rao-Vazirani]

(\(d\)-regular) undirected graph \(G\) with vertex set \([n]\)

algorithm outline:

1. find level-4 pseudo-distribution \(\mu\) with min Sparsest Cut value

2. find “\(\Delta\)-good” sets \(A,B\subseteq [n]\) for \(\mu\) (goodness to be defined)

3. find min cut \(S\) between \(A\) and \(B\) (so that \(A\subseteq S \subseteq [n]\setminus B\))

claim 1: if \(A,B\) are “\(\Delta\)-good” for \(\mu\), then min cut \(S\) with \(A\subseteq S \subseteq [n]\setminus B\) has value at most \(O(1/\Delta)\) times SC value of \(\mu\)

claim 2: “\(\Delta\)-good” sets always exist for some \(\Delta\ge \Omega(1/\sqrt{\log n})\)

(and can be found efficiently)

claim 2 is ❤ of analysis; easier proof for \(\Delta=1/\log n\)

**assumption:** pseudo-distribution is *well-spread*, so that

\[
\textstyle {\tilde{\mathbb{E}}}_{\mu(x)} \underbrace{\sum_{i\lt j} (x_i-x_j)^2}_{={\lvert x \rvert} \cdot (n-{\lvert x \rvert})} \ge 0.01 n^2
\]

being “well-spread” roughly means that pseudo-distribution is supported mostly on “balanced cuts” say \({\lvert x \rvert}\in [0.3n,0.7n]\)

assumption essentially without loss of generality (see lecture note)

*notation:*
\(d(i,j) = {\tilde{\mathbb{E}}}_{\mu(x)} (x_i-x_j)^2\) (probability that \(i,j\) on different sides)

def’n: sets \(A,B\subseteq [n]\) are \(\Delta\)-good if

- \(\Delta\)-separated: \(d(a,b)\ge \Delta\) for all \(a\in A\) and \(b\in B\)
- large: \({\lvert A \rvert},{\lvert B \rvert}\ge 0.0001 \cdot n\)

for well-spread pseudo-distribution typical vertex pair \(i,j\) has \(d(i,j)\ge \Omega(1)\)
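
The definition can be checked mechanically when \(\mu\) is an actual distribution. The sketch below assumes \(\mu\) is uniform over an explicit list of cuts; the helper names `pair_distance` and `is_delta_good` and the toy distribution (threshold cuts of a path) are illustrative assumptions.

```python
def pair_distance(cuts, i, j):
    """d(i,j) = E (x_i - x_j)^2 under the uniform distribution on `cuts`."""
    return sum((x[i] - x[j]) ** 2 for x in cuts) / len(cuts)

def is_delta_good(cuts, A, B, delta, min_fraction=0.0001):
    """Check both conditions of the definition: Delta-separated and large."""
    n = len(cuts[0])
    large = len(A) >= min_fraction * n and len(B) >= min_fraction * n
    separated = all(pair_distance(cuts, a, b) >= delta for a in A for b in B)
    return large and separated

# Toy distribution: the 7 threshold cuts {0,...,s-1} of a path on 8 vertices.
n = 8
cuts = [tuple(1 if i < s else 0 for i in range(n)) for s in range(1, n)]
A, B = [0, 1], [6, 7]

print(pair_distance(cuts, 1, 6))         # closest pair across A and B
print(is_delta_good(cuts, A, B, 0.7))
print(is_delta_good(cuts, A, B, 0.8))
```

For this toy distribution \(d(i,j)=\lvert i-j\rvert/7\), so the closest pair across \(A\) and \(B\) is at distance \(5/7\).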

**lemma:** if \(A,B\) are \(\Delta\)-good for \(\mu\) with SC value \(\phi\), then min cut between \(A,B\) has SC value \(O(1/\Delta)\cdot \phi\)

*intuition:* good sets close to sparse cut; min cut procedure allows us to clean up mistakes

*proof:*
let \(d(A,i)=\min_{a\in A} d(a,i)\).
choose \(t\in [0,\Delta]\) uniformly at random.
choose \(S={\{ i \mid d(A,i)\lt t \}}\) and \(y={\mathbf 1}_S\).

proof of claim 1: sum up the following inequality over all edges \(ij\) (wlog \(d(A,i)\le d(A,j)\))

\[
\begin{aligned}[t]
{\mathbb{P}}{\left\{ y_i\neq y_j \right\}}
&= {\mathbb{P}}{\{ d(A,i)\lt t \le d(A,j) \}}\\
&\le 1/\Delta \cdot {\lvert d(A,i)-d(A,j) \rvert}\\
& \le 1/\Delta \cdot d(i,j)=1/\Delta \cdot {\tilde{\mathbb{E}}}_{\mu(x)}(x_i-x_j)^2
\end{aligned}
\]

\(\leadsto\) min cut \(y^*\) between \(A,B\) has \(\color{red}{f_G(y^*)}\le 1/\Delta \cdot {\tilde{\mathbb{E}}}_\mu \color{red}{f_G}\)

proof of claim 2: \({\lvert y^\ast \rvert}(n-{\lvert y^\ast \rvert})\ge {\lvert A \rvert}{\lvert B \rvert}\ge \Omega(1)\cdot n^2\)
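
The random-threshold argument can be simulated directly. The sketch below is an illustration under stated assumptions: \(d\) comes from an actual distribution (uniform threshold cuts of a 10-vertex path, so \(d(i,j)=\lvert i-j\rvert/9\)), and the names `threshold_cut` and `cut_size` are hypothetical helpers. It checks that every threshold cut contains \(A\), avoids \(B\), and that the average cut size stays below \(1/\Delta\cdot\sum_{ij\in E} d(i,j)\).

```python
import random

def threshold_cut(d, A, t):
    """S = { i : d(A,i) < t } with d(A,i) = min over a in A of d[a][i]."""
    return {i for i in range(len(d)) if min(d[a][i] for a in A) < t}

def cut_size(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for i, j in edges if (i in S) != (j in S))

n = 10
d = [[abs(i - j) / (n - 1) for j in range(n)] for i in range(n)]
edges = [(i, i + 1) for i in range(n - 1)]
A, B = {0, 1}, {8, 9}
delta = min(d[a][b] for a in A for b in B)  # separation of A and B

rng = random.Random(0)
cut_sizes = []
for _ in range(2000):
    t = delta * (1.0 - rng.random())        # uniform in (0, delta]
    S = threshold_cut(d, A, t)
    assert A <= S and not (S & B)           # S separates A from B
    cut_sizes.append(cut_size(edges, S))

avg_cut = sum(cut_sizes) / len(cut_sizes)
bound = sum(d[i][j] for i, j in edges) / delta
print(avg_cut, bound)
```

Since every sampled \(S\) is a valid cut between \(A\) and \(B\), the min cut can only be smaller than the average, which is the step marked \(\leadsto\) above.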

structure theorem: every well-spread level-4 pseudo-distribution over \({\{0,1\}}^n\) has \(\Delta\)-good sets for \(\Delta\ge \Omega(1/\sqrt{\log n})\)

[Arora-Rao-Vazirani]

for the “hypercube graph”, there exists an *actual probability distribution* over cuts without \(\Delta\)-good sets for \(\Delta\gg 1/\sqrt{\log n}\)

let \(\mu=\) uniform distribution over *coordinate cuts* for \(k\)-dim hypercube graph (\(k=\log n\))

\(d(i,j)\) is *relative Hamming distance* between \(i\) and \(j\) (as \(k\)-bit strings)

best sets \(A,B\): Hamming balls around \(0\) and \({\mathbf 1}\) of radius \(\tfrac12 k - \sqrt{k}\)

separation for these sets: \(d(a,b)\ge 1/\sqrt k=1/\sqrt{\log n}\)
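
The hypercube example is small enough to verify by enumeration. The sketch below takes \(k=12\) (an arbitrary illustrative choice) and rounds the radius \(\tfrac12 k-\sqrt k\) down to an integer; it computes \(d(i,j)\) directly from the uniform distribution over coordinate cuts and then the minimum separation between the two Hamming balls.

```python
from math import floor, sqrt

k = 12
n = 2 ** k

def d(i, j):
    """E (x_i - x_j)^2 for a uniformly random coordinate cut x_v = (bit c of v)."""
    return sum(((i >> c) & 1) != ((j >> c) & 1) for c in range(k)) / k

radius = floor(k / 2 - sqrt(k))
A = [v for v in range(n) if bin(v).count("1") <= radius]       # ball around 0
B = [v for v in range(n) if bin(v).count("1") >= k - radius]   # ball around 1

separation = min(d(a, b) for a in A for b in B)
print(len(A) / n, separation, 1 / sqrt(k))
```

With the radius rounded down, the two balls are at Hamming distance at least \(k-2\cdot\text{radius}\ge 2\sqrt k\), so the relative distance is at least \(2/\sqrt k\), comfortably above the \(1/\sqrt k\) stated above.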

proof structure: give randomized algorithm to construct \(\Delta\)-separated sets \(A,B\) and matching \(M\) of \([n]\) such that

- with high probability, \({\lvert A \rvert},{\lvert B \rvert}\ge 0.001 n - {\lvert M \rvert}\)
- expected matching size \({\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}\)

*conclude:* the algorithm constructs large separated sets when \(\Delta\le c/\sqrt{\log n}\) for sufficiently small \(c\gt0\)

upper bound on \({\mathbb{E}}{\lvert M \rvert}\) will be *♡ of analysis*

let \(Z=(Z_1,\ldots,Z_t)\) be a centered Gaussian vector

variances \(\mathbb V[ Z_i] = {\mathbb{E}}[ Z_i^2]\)

**expectation bound** (crude version)

\({\mathbb{E}}[ \max_{i} Z_i] \le O(\sqrt {\log t})\cdot \max_i (\mathbb V[ Z_i])^{1/2}\)

**variance bound** [Sudakov, Cirel’son ’74; Borell ’75]

\(\mathbb V[ \max_{i} Z_i ]\le \color{#d33682}{\underbrace{O(1)}_{\text{independent of t!}}}\cdot \max_i \mathbb V[ Z_i]\)

(equivalent to concentration for Lipschitz functions over Gaussian distribution)
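
A quick simulation illustrates both bounds in the simplest case. The sketch below assumes independent standard Gaussian coordinates (the bounds above hold for arbitrary centered Gaussian vectors; independence is only a convenient test case) and uses a fixed seed and a modest number of trials, so the printed numbers are rough estimates.

```python
import math
import random
import statistics

def max_stats(t, trials, rng):
    """Empirical mean and variance of max(Z_1..Z_t) for iid standard Gaussians."""
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(t)) for _ in range(trials)]
    return statistics.mean(maxima), statistics.pvariance(maxima)

rng = random.Random(1)
results = {t: max_stats(t, 500, rng) for t in (10, 100, 1000)}

for t, (mean, var) in results.items():
    # the mean grows like sqrt(log t); the variance stays bounded as t grows
    print(t, round(mean, 3), round(math.sqrt(2 * math.log(t)), 3), round(var, 3))
```

The second printed column is the estimate of \({\mathbb{E}}\max_i Z_i\), the third is \(\sqrt{2\ln t}\), and the last is the estimated variance of the maximum.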

wlog \({\tilde{\mathbb{E}}}_{\mu(x)} x=\tfrac 12 {\mathbf 1}\) (by symmetry of \(x\) and \({\mathbf 1}-x\))

apply quadratic sampling lemma for \(x-\tfrac 12 {\mathbf 1}\) \(\leadsto\) Gaussian \(X=(X_1,\ldots,X_n)\) with \({\mathbb{E}}X=0\) and \(d(i,j)={\mathbb{E}}(X_i-X_j)^2\)
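
For an actual distribution the Gaussian can be written down explicitly, which makes the identity \(d(i,j)={\mathbb{E}}(X_i-X_j)^2\) easy to check. The sketch below is a shortcut that only works when \(\mu\) is uniform over an explicit list of cuts (for a pseudo-distribution one would sample from the second-moment matrix instead); `second_moments` and `sample_gaussian` are hypothetical helper names.

```python
import random

def second_moments(cuts):
    """M[i][j] = E (x_i - 1/2)(x_j - 1/2) under the uniform distribution on `cuts`."""
    n, m = len(cuts[0]), len(cuts)
    return [[sum((x[i] - 0.5) * (x[j] - 0.5) for x in cuts) / m
             for j in range(n)] for i in range(n)]

def sample_gaussian(cuts, rng):
    """X = m^(-1/2) * sum_c g_c (x^(c) - 1/2), g_c iid N(0,1): E X = 0, E X X^T = M."""
    n, m = len(cuts[0]), len(cuts)
    g = [rng.gauss(0.0, 1.0) for _ in cuts]
    return [sum(gc * (x[i] - 0.5) for gc, x in zip(g, cuts)) / m ** 0.5
            for i in range(n)]

n = 6
cuts = [tuple(1 if i < s else 0 for i in range(n)) for s in range(1, n)]
M = second_moments(cuts)
d = [[sum((x[i] - x[j]) ** 2 for x in cuts) / len(cuts) for j in range(n)]
     for i in range(n)]

# E (X_i - X_j)^2 = M_ii + M_jj - 2 M_ij should equal d(i,j) exactly.
gap = max(abs(M[i][i] + M[j][j] - 2 * M[i][j] - d[i][j])
          for i in range(n) for j in range(n))
X = sample_gaussian(cuts, random.Random(0))
print(gap, X)
```

The identity holds because the shift by \(\tfrac12\) cancels in differences \(x_i-x_j\).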

- candidate sets: \(A_0={\{ i \mid X_i \le -1 \}}\) and \(B_0={\{ i \mid X_i \ge 1 \}}\)
- find maximal \(M\subseteq A_0\times B_0\) of disjoint pairs not \(\Delta\)-separated
- output pruned sets: \(A=A_0\setminus V(M)\) and \(B=B_0\setminus V(M)\)

(without pruning, construction gives good sets only for \(\Delta\le 1/\log n\))
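
The three steps above can be sketched as follows, taking the Gaussian sample \(X\) and the distances \(d\) as given inputs. The greedy loop is one simple way to obtain a maximal matching (an assumption; the slides only require maximality), and the function name `prune` and the toy data are illustrative.

```python
def prune(X, d, delta):
    """Return pruned sets A, B and a maximal matching M of non-separated pairs."""
    n = len(X)
    A0 = [i for i in range(n) if X[i] <= -1]   # candidate set A_0
    B0 = [i for i in range(n) if X[i] >= 1]    # candidate set B_0
    matched, M = set(), []
    for a in A0:
        for b in B0:
            # greedily match disjoint pairs that are NOT delta-separated
            if a not in matched and b not in matched and d[a][b] < delta:
                M.append((a, b))
                matched.update((a, b))
    A = {a for a in A0 if a not in matched}
    B = {b for b in B0 if b not in matched}
    return A, B, M

# Toy input: distances induced by points on a line.
p = [0.0, 0.4, 0.5, 0.45, 1.0]
d = [[abs(u - v) for v in p] for u in p]
X = [-1.5, -1.2, 0.0, 1.3, 2.0]
A, B, M = prune(X, d, delta=0.1)
print(A, B, M)
```

Because the matching is maximal, every surviving pair \((a,b)\in A\times B\) satisfies \(d(a,b)\ge\Delta\); otherwise the pair could have been added to \(M\).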

\(\mu\) well-spread \(\leadsto\) \({\lvert A_0 \rvert},{\lvert B_0 \rvert}\ge 0.001n\) w. prob. \(\ge \Omega(1)\)

by construction, \({\lvert A \rvert}\ge {\lvert A_0 \rvert}-{\lvert M \rvert}\) (same for \(B\) and \(B_0\))

*together:* \({\lvert A \rvert},{\lvert B \rvert}\ge 0.001 n - {\lvert M \rvert}\) w. prob. \(\ge \Omega(1)\) (as desired)
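
One way to turn this into the size requirement of the definition, given the bound \({\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}\), is Markov's inequality. This is a sketch; the constant \(0.0009\) and the name \(p\) are chosen here for illustration. Let \(p\ge\Omega(1)\) be the probability that \({\lvert A_0 \rvert},{\lvert B_0 \rvert}\ge 0.001n\). Then

\[
{\mathbb{P}}{\{ {\lvert M \rvert}\ge 0.0009\, n \}}
\le \frac{{\mathbb{E}}{\lvert M \rvert}}{0.0009\, n}
\le \frac{(\Delta\cdot\sqrt{\log n})^{\Omega(1)}}{0.0009}
\le \frac p2
\]

whenever \(\Delta\le c/\sqrt{\log n}\) for a sufficiently small constant \(c\gt 0\). Hence with probability at least \(p/2\), both \({\lvert A \rvert}\) and \({\lvert B \rvert}\) are at least \(0.001n-0.0009n=0.0001n\).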

• graph \(H\) with edge \(ij\) if \({\mathbb{E}}(X_i-X_j)^2\le \Delta\)

• Gaussian \(X=(X_1,\ldots,X_n)\) with \({\mathbb{E}}X=0\), \(d(i,j)={\mathbb{E}}(X_i-X_j)^2\)

• matching \(M\subseteq E(H)\) with \(X_j-X_i\ge 2\) whenever \((i,j)\in M(X)\)

**remains to prove:** \({\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}\)

will relate \(M\) to maxima of *Gaussian processes* defined by \(X\)

\[{{Y_i}^{(k)}} = \max_{j\in H^k(i)} X_j-X_i \text{ where } H^k(i)= k\text{-step neighbors of }i\]

**chaining inequality:**
let \(\Phi(k)=\sum_{i} {\mathbb{E}}{{Y_i}^{(k)}}\) (potential). then,

\(\Phi(k+1) \ge \Phi(k)+\color{#d33682}{{\mathbb{E}}{\lvert M \rvert}} - O(n)\cdot \max_{{\{ i,j \}}\in E(H^k)}\underbrace{({\mathbb{E}}(X_i-X_j)^2)^{1/2}}_{\color{#d33682}{\le \sqrt{k\cdot \Delta}~(\ast)} }\)

bound \(\color{#d33682}{(\ast)}\) by triangle inequality for \(d\) (holds for level-4 pseudo-distr’n over \({\{0,1\}}^n\))

upper bound on matching size follows by combining these inequalities and the upper bound \(\Phi(k)\le n \cdot O(\sqrt{ \log n})\)
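
Spelled out as a sketch with all constants suppressed (the choice of \(k\) below is one convenient option, not necessarily the one in the lecture notes): since \({{Y_i}^{(0)}}=0\) we have \(\Phi(0)=0\), and applying the chaining inequality \(k\) times gives

\[
\Phi(k) \ge k\cdot {\mathbb{E}}{\lvert M \rvert} - O(n)\cdot \sum_{\ell\le k}\sqrt{\ell\cdot \Delta}
\ge k\cdot {\mathbb{E}}{\lvert M \rvert} - O(n)\cdot k^{3/2}\sqrt{\Delta}
\]

Together with \(\Phi(k)\le n\cdot O(\sqrt{\log n})\) this yields

\[
{\mathbb{E}}{\lvert M \rvert} \le O(n)\cdot {\left( \frac{\sqrt{\log n}}{k} + \sqrt{k\cdot \Delta} \right)}
\]

Choosing \(k\approx (\log n/\Delta)^{1/3}\) balances the two terms, and both become \((\Delta\cdot\sqrt{\log n})^{1/3}\), so \({\mathbb{E}}{\lvert M \rvert}\le O(n)\cdot(\Delta\cdot\sqrt{\log n})^{1/3}\).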

*assume:* \(M\) is always perfect matching (general case almost the same)

*key inequalities:* for every edge \(ij\) of \(H\),

\[
{{Y_i}^{(k+1)}} \ge {{Y_j}^{(k)}} + X_j -X_i
\]

in particular for every edge \((i,j)\in M\),
\[
{{Y_i}^{(k+1)}} \ge {{Y_j}^{(k)}} + 2
\]

summing over \(M\),

\[
\textstyle \sum_i L_i\cdot {{Y_i}^{(k+1)}} \ge \sum_j R_j\cdot {{Y_j}^{(k)}} + {\lvert M \rvert}
\]

where \(L_i,R_j\) indicate left and right side of \(M\)

**🙁** no control over \(L_i\) and \(R_j\) except \({\mathbb{E}}L_i={\mathbb{E}}R_j = 1/2\) (by symmetry)

**🙂** variance bound on \(Y\) variables \(\leadsto\) can treat \(L_i,R_j\) as constants

\[\mathbb V {{Y_{i}}^{(k+1)}} \le O(1)\max_{{\{ i,j \}}\in E(H^{k+1})} {\mathbb{E}}(X_i-X_j)^2\]

taking expectation and using variance bound

\[
\textstyle \sum_i {\mathbb{E}}{{Y_i}^{(k+1)}} \ge \sum_i {\mathbb{E}}{{Y_i}^{(k)}} + {\mathbb{E}}{\lvert M \rvert} - \sum_i (\mathbb V {{ Y_i}^{(k+1)}})^{1/2}
\]
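
The key inequality \({{Y_i}^{(k+1)}} \ge {{Y_j}^{(k)}} + X_j -X_i\) used in this derivation is deterministic: for an edge \(ij\), everything within \(k\) steps of \(j\) is within \(k+1\) steps of \(i\). The sketch below checks it numerically under two assumptions made here for illustration: \(H^k(i)\) is taken to be the set of vertices reachable from \(i\) in at most \(k\) steps, and \(H\) and \(X\) are random test data (a sparse random graph and iid Gaussians) rather than the objects from the algorithm.

```python
import random

def within_k_steps(adj, i, k):
    """H^k(i): vertices reachable from i in at most k steps (assumed reading)."""
    reached, frontier = {i}, {i}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj[u]} - reached
        reached |= frontier
    return reached

def Y(adj, X, i, k):
    """Y_i^(k) = max over j in H^k(i) of X_j - X_i."""
    return max(X[j] for j in within_k_steps(adj, i, k)) - X[i]

rng = random.Random(0)
n = 30
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.1]
adj = {i: set() for i in range(n)}
for i, j in edges:
    adj[i].add(j)
    adj[j].add(i)
X = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Check the inequality for both orientations of every edge and k = 0..3.
directed = edges + [(j, i) for i, j in edges]
violations = sum(
    1
    for i, j in directed
    for k in range(4)
    if Y(adj, X, i, k + 1) < Y(adj, X, j, k) + X[j] - X[i] - 1e-12
)
print(violations)
```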

**new bird’s eye view of algorithm**

1. quadratic sampling: initial partial assignment

2. pruning step: refine partial assignment

3. exact min-cut: complete partial assignment optimally

both 2nd and 3rd steps use deg-4 SOS properties

*key innovation:*
control pruning step by relating it to maxima of Gaussian processes and exploiting their deep properties