Boaz Barak and David Steurer
UCSD winter school on Sum-of-Squares, January 2017
recall: sparsest cut value \(\displaystyle \min_{x\in {\{0,1\}}^n} \frac {\color{red}{f_G(x)}}{\color{green}{\frac d n {\lvert x \rvert} (n-{\lvert x \rvert})}}\)
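(as in the earlier lectures, \(\color{red}{f_G(x)}=\sum_{\{i,j\}\in E(G)}(x_i-x_j)^2\) counts the edges cut by \(x\in{\{0,1\}}^n\), while the \(\color{green}{\text{green}}\) denominator is \(\tfrac dn\) times the number of separated vertex pairs)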
deg-2 SOS has bad approx. ratio for Sparsest Cut in terms of \(n\) — approx. ratio \(\Omega(n)\) for \(n\)-vertex path
will see: higher-degree SOS achieves much better approx. ratios
also improves over previous \(O(\log n)\) approx. ratio based on linear programming [Leighton-Rao]
th’m: deg-4 SOS has approx. ratio \(O(\sqrt{\log n})\) for Sparsest Cut
[Arora-Rao-Vazirani]
(\(d\)-regular) undirected graph \(G\) with vertex set \([n]\)
algorithm outline:
1. find level-4 pseudo-distribution \(\mu\) with min Sparsest Cut value
2. find “\(\Delta\)-good” sets \(A,B\subseteq [n]\) for \(\mu\) (goodness to be defined)
3. find min cut \(S\) between \(A\) and \(B\) (so that \(A\subseteq S \subseteq [n]\setminus B\)); see the sketch below
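a minimal sketch of step 3, assuming the standard reduction to \(s\)-\(t\) max-flow; the networkx usage and the helper name `min_cut_between` are illustrative, not the lecture's code:

```python
# Sketch: min cut S with A ⊆ S ⊆ [n]∖B, by contracting A into a source
# "s" and B into a sink "t", then running max-flow / min-cut.
import networkx as nx

def min_cut_between(G, A, B):
    """G: undirected nx.Graph; returns a side S with A ⊆ S and S ∩ B = ∅."""
    A, B = set(A), set(B)
    node = lambda v: "s" if v in A else ("t" if v in B else v)
    D = nx.DiGraph()
    for u, v in G.edges():
        nu, nv = node(u), node(v)
        if nu == nv:                        # edge inside A or inside B disappears
            continue
        for a, b in ((nu, nv), (nv, nu)):   # undirected edge -> both arcs
            cap = D[a][b]["capacity"] + 1 if D.has_edge(a, b) else 1
            D.add_edge(a, b, capacity=cap)  # parallel edges accumulate capacity
    _, (source_side, _) = nx.minimum_cut(D, "s", "t")
    return A | {v for v in source_side if v not in ("s", "t")}
```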
claim 1: if \(A,B\) are “\(\Delta\)-good” for \(\mu\), then min cut \(S\) with \(A\subseteq S \subseteq [n]\setminus B\) has value at most \(O(1/\Delta)\) times SC value of \(\mu\)
claim 2: “\(\Delta\)-good” sets always exist for some \(\Delta\ge \Omega(1/\sqrt{\log n})\)
(and can be found efficiently)
claim 2 is ❤ of analysis; easier proof for \(\Delta=1/\log n\)
assumption: pseudo-distribution is well-spread
being “well-spread” roughly means that pseudo-distribution is supported mostly on “balanced cuts” say \({\lvert x \rvert}\in [0.3n,0.7n]\)
assumption essentially without loss of generality (see lecture note)
notation: \(d(i,j) = {\tilde{\mathbb{E}}}_{\mu(x)} (x_i-x_j)^2\) (probability that \(i,j\) on different sides)
def’n: sets \(A,B\subseteq [n]\) are \(\Delta\)-good if
- \(\Delta\)-separated: \(d(a,b)\ge \Delta\) for all \(a\in A\) and \(b\in B\)
- large: \({\lvert A \rvert},{\lvert B \rvert}\ge 0.0001 \cdot n\)
for a well-spread pseudo-distribution, a typical vertex pair \(i,j\) has \(d(i,j)\ge \Omega(1)\)
lemma: if \(A,B\) are \(\Delta\)-good for \(\mu\) with SC value \(\phi\), then min cut between \(A,B\) has SC value \(O(1/\Delta)\cdot \phi\)
intuition: good sets close to sparse cut; min cut procedure allows us to clean up mistakes
proof: let \(d(A,i)=\min_{a\in A} d(a,i)\). choose \(t\in [0,\Delta]\) uniformly at random. choose \(S={\{ i \mid d(A,i)\lt t \}}\) and \(y={\mathbf 1}_S\).
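a sketch of this rounding step (numpy only; the input `d_A` holding the values \(d(A,i)\) is an assumed precomputation):

```python
import numpy as np

def threshold_round(d_A, Delta, rng=None):
    """Choose t uniform in [0, Delta]; return y = indicator of S = {i : d(A,i) < t}."""
    rng = rng or np.random.default_rng()
    t = rng.uniform(0.0, Delta)
    return (np.asarray(d_A) < t).astype(int)
```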
numerator bound: sum up the following inequality over all edges \(ij\): \(\Pr_t[\,y_i\neq y_j\,]\le {\lvert d(A,i)-d(A,j) \rvert}/\Delta\le d(i,j)/\Delta\) (the last step is the triangle inequality for \(d\), which level-4 pseudo-distributions over \({\{0,1\}}^n\) satisfy)
\(\leadsto\) min cut \(y^*\) between \(A,B\) has \(\color{red}{f_G(y^*)}\le 1/\Delta \cdot {\tilde{\mathbb{E}}}_\mu \color{red}{f_G}\)
denominator bound: \({\lvert y^\ast \rvert}(n-{\lvert y^\ast \rvert})\ge {\lvert A \rvert}{\lvert B \rvert}\ge \Omega(1)\cdot n^2\) (every candidate cut \(S\) satisfies \(A\subseteq S\) and \(S\cap B=\emptyset\), and \({\lvert A \rvert},{\lvert B \rvert}\ge 0.0001n\))
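combining the two bounds (and using \({\tilde{\mathbb{E}}}_\mu{\lvert x \rvert}(n-{\lvert x \rvert})\le n^2/4\), with the SC value \(\phi\) of \(\mu\) read as the ratio of pseudo-expectations):
\[\frac{\color{red}{f_G(y^\ast)}}{\frac dn{\lvert y^\ast \rvert}(n-{\lvert y^\ast \rvert})}\;\le\;\frac{\tfrac1\Delta\,{\tilde{\mathbb{E}}}_\mu\color{red}{f_G}}{\frac dn\,\Omega(n^2)}\;\le\;O(1/\Delta)\cdot\frac{{\tilde{\mathbb{E}}}_\mu\color{red}{f_G}}{\frac dn\,{\tilde{\mathbb{E}}}_\mu{\lvert x \rvert}(n-{\lvert x \rvert})}\;=\;O(1/\Delta)\cdot\phi\]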
structure theorem: every well-spread level-4 pseudo-distribution over \({\{0,1\}}^n\) has \(\Delta\)-good sets for \(\Delta=1/\sqrt{\log n}\)
[Arora-Rao-Vazirani]
for the “hypercube graph”, there exists an actual probability distribution over cuts without \(\Delta\)-good sets for \(\Delta\gg 1/\sqrt{\log n}\)
let \(\mu=\) uniform distribution over coordinate cuts for \(k\)-dim hypercube graph (\(k=\log n\))
\(d(i,j)\) is relative Hamming distance between \(i\) and \(j\) (as \(k\)-bit strings)
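in detail (a one-line check, writing vertices as \(k\)-bit strings and \(x^{(\ell)}\) for the \(\ell\)-th coordinate cut, \(x^{(\ell)}_i=i_\ell\)):
\[d(i,j)\;=\;{\mathbb{E}}_{\ell\sim[k]}\,(i_\ell-j_\ell)^2\;=\;\frac{{\lvert \{\ell \mid i_\ell\neq j_\ell\} \rvert}}{k}\]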
best sets \(A,B\): Hamming balls around \(0\) and \({\mathbf 1}\) of radius \(\tfrac12 k - \sqrt{k}\)
separation for these sets: points in opposite balls differ in \(\ge 2\sqrt k\) coordinates, so \(d(a,b)\ge 2/\sqrt k\ge 1/\sqrt{\log n}\)
proof structure: give randomized algorithm to construct \(\Delta\)-separated sets \(A,B\) and matching \(M\) of \([n]\) such that
- with probability \(\ge \Omega(1)\), \({\lvert A \rvert},{\lvert B \rvert}\ge 0.001 n - {\lvert M \rvert}\)
- expected matching size \({\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}\)
conclude: this algorithm constructs large separated sets when \(\Delta\le c/\sqrt{\log n}\) for sufficiently small \(c\gt0\) (then \({\mathbb{E}}{\lvert M \rvert}\le c^{\Omega(1)}\cdot n\), so \({\lvert M \rvert}\) is small with high probability by Markov)
upper bound on \({\mathbb{E}}{\lvert M \rvert}\) will be ♡ of analysis
let \(Z=(Z_1,\ldots,Z_t)\) be a centered Gaussian vector
variances \(\mathbb V[ Z_i] = {\mathbb{E}}[ Z_i^2]\)
expectation bound (crude version)
\({\mathbb{E}}[ \max_{i} Z_i] \le O(\sqrt {\log t})\cdot \max_i (\mathbb V[ Z_i])^{1/2}\)
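a standard justification of the crude bound (routine, not on the slide), with \(\sigma=\max_i (\mathbb V[Z_i])^{1/2}\): union bound over the \(t\) Gaussian tails and integrate,
\[{\mathbb{E}}\bigl[\max_i Z_i\bigr]\;\le\;\int_0^\infty \min\bigl(1,\; t\, e^{-u^2/2\sigma^2}\bigr)\,du\;\le\; O(\sqrt{\log t})\cdot \sigma\]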
variance bound [Sudakov, Cirel’son ’74; Borell’75]
\(\mathbb V[ \max_{i} Z_i ]\le \color{#d33682}{\underbrace{O(1)}_{\text{independent of t!}}}\cdot \max_i \mathbb V[ Z_i]\)
(equivalent to concentration for Lipschitz functions over Gaussian distribution)
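sketch of the equivalence (standard reasoning, notation introduced here): write \(Z=Ag\) for standard Gaussian \(g\) and rows \(a_1,\ldots,a_t\) of \(A\); then \(g\mapsto \max_i \langle a_i,g\rangle\) is Lipschitz with constant \(\max_i \lVert a_i\rVert = \max_i (\mathbb V[Z_i])^{1/2}\), so Gaussian concentration for Lipschitz functions (or the Gaussian Poincaré inequality) gives
\[\mathbb V\bigl[\max_i \langle a_i, g\rangle\bigr] \le O(1)\cdot \max_i \lVert a_i\rVert^2 = O(1)\cdot\max_i \mathbb V[Z_i]\]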
wlog \({\tilde{\mathbb{E}}}_{\mu(x)} x=\tfrac 12 {\mathbf 1}\) (by symmetry of \(x\) and \({\mathbf 1}-x\))
apply quadratic sampling lemma for \(x-\tfrac 12 {\mathbf 1}\) \(\leadsto\) Gaussian \(X=(X_1,\ldots,X_n)\) with \({\mathbb{E}}X=0\) and \(d(i,j)={\mathbb{E}}(X_i-X_j)^2\)
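a minimal sketch of this step, assuming the pseudo-covariance matrix \(\Sigma\) with \(\Sigma_{ij}={\tilde{\mathbb{E}}}_{\mu(x)}(x_i-\tfrac12)(x_j-\tfrac12)\) has been extracted from \(\mu\) (it is PSD for any level-\(\ge2\) pseudo-distribution); names are illustrative:

```python
import numpy as np

def quadratic_sample(Sigma, rng=None):
    """Return Gaussian X with E X = 0 and E (X_i - X_j)^2 = d(i, j)."""
    rng = rng or np.random.default_rng()
    n = Sigma.shape[0]
    # E (X_i - X_j)^2 = Sigma_ii + Sigma_jj - 2 Sigma_ij = pE (x_i - x_j)^2
    return rng.multivariate_normal(np.zeros(n), Sigma, method="eigh")
```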
(without pruning, construction gives good sets only for \(\Delta\le 1/\log n\))
recap of the construction:
• Gaussian \(X=(X_1,\ldots,X_n)\) with \({\mathbb{E}}X=0\), \(d(i,j)={\mathbb{E}}(X_i-X_j)^2\)
• graph \(H\) with edge \(ij\) if \({\mathbb{E}}(X_i-X_j)^2\le \Delta\)
• initial sets by thresholding, say \(A_0={\{ i \mid X_i\le -1 \}}\) and \(B_0={\{ j \mid X_j\ge 1 \}}\) (so \(X_j-X_i\ge 2\) for \(i\in A_0\), \(j\in B_0\))
• maximal matching \(M\subseteq E(H)\) between \(A_0\) and \(B_0\); \(A,B\) obtained from \(A_0,B_0\) by removing all matched vertices (\(\Delta\)-separated by maximality of \(M\))
\(\mu\) well-spread \(\leadsto\) \({\lvert A_0 \rvert},{\lvert B_0 \rvert}\ge 0.001n\) w. prob. \(\ge \Omega(1)\)
by construction, \({\lvert A \rvert}\ge {\lvert A_0 \rvert}-{\lvert M \rvert}\) (same for \(B\) and \(B_0\))
together: \({\lvert A \rvert},{\lvert B \rvert}\ge 0.001n - {\lvert M \rvert}\) w. prob. \(\ge \Omega(1)\) (as desired)
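a compact sketch of this pruning construction under the thresholds-at-\(\pm1\) reading above (an assumption on my part); the greedy maximal matching and all names are illustrative:

```python
import numpy as np

def prune(X, d, Delta):
    """X: sampled Gaussian (length n); d: n x n array with d[i, j] = E (X_i - X_j)^2."""
    n = len(X)
    A0 = [i for i in range(n) if X[i] <= -1.0]
    B0 = [j for j in range(n) if X[j] >= 1.0]
    matched, M = set(), []
    for i in A0:                      # any maximal matching works
        for j in B0:
            if i in matched or j in matched:
                continue
            if d[i, j] <= Delta:      # edge of H; automatically X[j] - X[i] >= 2
                M.append((i, j))
                matched.update((i, j))
    A = set(A0) - matched
    B = set(B0) - matched
    # maximality of M: every remaining a in A, b in B has d(a, b) > Delta
    return A, B, M
```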
remains to prove: \({\mathbb{E}}{\lvert M \rvert} \le n\cdot (\Delta \cdot \sqrt{\log n})^{\Omega(1)}\)
will relate \(M\) to maxima of Gaussian processes defined by \(X\)
chaining inequality:
let \(\Phi(k)=\sum_{i} {\mathbb{E}}{{Y_i}^{(k)}}\) (potential), where \(Y_i^{(k)}\) is, roughly, the maximum of \(X_j\) over the radius-\(k\) ball around \(i\) in \(H\). then,
\(\Phi(k+1) \ge \Phi(k)+\color{#d33682}{{\mathbb{E}}{\lvert M \rvert}} - O(n)\cdot \max_{{\{ i,j \}}\in E(H^k)}\underbrace{({\mathbb{E}}(X_i-X_j)^2)^{1/2}}_{\color{#d33682}{\le \sqrt{k\cdot \Delta}~(\ast)} }\)
bound \(\color{#d33682}{(\ast)}\) via the triangle inequality for \(d\), valid for level-4 pseudo-distributions over \({\{0,1\}}^n\): \(d(i,j)\le k\cdot\Delta\) along any path of \(\le k\) edges of \(H\)
upper bound on matching size follows by combining these inequalities with the upper bound \(\Phi(k)\le n \cdot O(\sqrt{ \log n})\), which is the expectation bound (each \(Y_i^{(k)}\) is a maximum of at most \(n\) Gaussians of variance \(\le 1/4\))
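spelling out the combination (routine; assuming \(\Phi(0)=0\), i.e. \(Y_i^{(0)}=X_i\)): telescoping the chaining inequality for \(k=0,\ldots,K-1\) gives
\[K\cdot{\mathbb{E}}{\lvert M \rvert}\;\le\;\Phi(K)-\Phi(0)+O(n)\sum_{k\lt K}\sqrt{k\Delta}\;\le\; n\cdot O(\sqrt{\log n})+O(n)\,K\sqrt{K\Delta}\]
choosing \(K=(\log n/\Delta)^{1/3}\) yields \({\mathbb{E}}{\lvert M \rvert}\le O(n)\cdot(\Delta\sqrt{\log n})^{1/3}\), matching the promised bound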
assume: \(M\) is always perfect matching (general case almost the same)
key inequalities: for every edge \(ij\) of \(H\),
summing over \(M\),
🙁 no control over \(L_i\) and \(R_j\) except \({\mathbb{E}}L_i={\mathbb{E}}R_j = 1/2\) (by symmetry)
🙂 variance bound on \(Y\) variables \(\leadsto\) can treat \(L_i,R_j\) as constants
taking expectation and using variance bound
new bird’s eye view of algorithm
- quadratic sampling: initial partial assignment
- pruning step: refine partial assignment
- exact min-cut: complete partial assignment optimally
both 2nd and 3rd step use deg-4 SOS properties
key innovation: control pruning step by relating it to maxima of Gaussian processes and exploiting deep properties of such maxima