Boaz Barak and David Steurer

UCSD winter school on Sum-of-Squares, January 2017

**unconditional computational lower bounds**

*for classical combinationatorial optimization problems*- examples: Max Cut, Traveling Salesman

*in restricted but powerful model of computational*- generalizes best known algorithms
- all possible linear and semidefinite relaxations
- first super-polynomial lower bounds in this model

general program goes back to Yannakakis’88 for refuting flawed P=NP proofs

**proof strategy:**

- show that
**sum-of-squares (SOS) algorithm**is an*“optimal algorithm”*in this model - use
*known lower bounds for SOS algorithm*to derive lower bounds for all algorithms in this model [Grigoriev, Schoenebeck, Tulsiani, Barak-Chan-Kothari]

concrete theorem (“optimality of SOS”):For every approx ratio \(\alpha\) and constraint satisfaction problem \(\Pi\):

Ifanypolynomial-size semidefinite relaxationshas approx. ratio \(\alpha\) for \(\Pi\), thenlevel-\(O(1)\) SOShas approx. ratio \(\alpha\)

connection to optimization / convex geometry

- explicit \(0/1\) polytopes in \({\mathbb{R}}^n\) that are not (close to) linear images of spectrahedra in \({\mathbb{R}}^{N\times N}\) for \(N\le \exp({n^{0.1}})\) (intersection of affine subspace and set of \(N\times N\) psd matrices)
- strong separation between rank and “positive-semidefinite rank” of matrices

Yannakakis’s model:[Yannakakis’88]

- formalizes intuitive notion of LP relaxations for problems
- restricted enough to allow for unconditional lower bounds (indep. of P vs NP) [Fiorini-Massar-Pokutta-Tiwary de Wolf’12, Braun-Pokutta-S., Braverman-Moitra, Chan-Lee-Raghavendra-S.,Rothvoß]
- extends to SDP relaxations [Fiorini-Massar-Pokutta-Tiwary-de Wolf, Gouveia-Parrilo-Thomas]

\[
\left\{
\begin{gathered}
\sum_i {\lvert x_i \rvert}\le 1\\
x\in {\mathbb{R}}^n
\end{gathered}
\right\}
\]

\(\ell_1\)-unit ball has \(2^n\) facets \(\leadsto\) no small direct LP formulation

**but:** \(\ell_1\)-unit ball is *projection of polytope with \(2n+1\) facets*

\[
\left\{\begin{gathered}
-y \le x \le y\\
\sum_i y_i \le 1 \\
x,y\in {\mathbb{R}}^n
\end{gathered}
\right\}
\]

maximize \(f_G(x)=\sum_{ij\in E(G)}(x_i-x_j)^2\) over \({\{0,1\}}^n\)

**Max Cut as linear optimization problem:**

maximize \(\sum_{ij\in E(G)}(X_{ii}+X_{jj}-2X_{ij})\)

over \(X\in \mathrm{CUT}_n=\mathrm{convex hull}{\left\{ {x {x}^\intercal} \mid x \in {\{0,1\}}^n \right\}}\)

\(\mathrm{CUT}_n\) has exponential number of facets

\(\leadsto\) no small direct LP formulation

**general size-\(S(n)\) LP formulation of Max Cut**

polytope \(P\subseteq {\mathbb{R}}^{S(n)}\) defined by \(\le S(n)\) linear inequalities that projects to \(\mathrm{CUT}_n\)

sometimes called: *extended LP formulation* for \(\mathrm{CUT}_n\)

often exponential savings by projection: \(\ell_1\)-norm unit ball, Held-Karp TSP relaxation, LP/SDP hierarchies

**question: can we prove size lower boundsfor LP formulations for Max Cut?**

(implied by NP\(\neq\)P/poly)

**lower bounds on general LP formulations for Max Cut**

exact formulation requires size \(S(n)\ge 2^{\Omega(n)}\) [Fiorini-Massar-Pokutta-Tiwary-de Wolf’12]

approx. ratio \(\gt 1/2\) requires size \(S(n)\ge 2^{n^{\Omega(1)}}\) [Chan-Lee-Raghavendra-S.’13,Kothari-Meka-Raghavendra’16]

*what about SDP relaxations?*

(best known algorithms for Max Cut based on SDP (sum-of-squares) relaxations)

**general size-\(S(n)\) SDP formulation of Max Cut***spectrahedron \(P\subseteq {\mathbb{R}}^{S(n)\times S(n)}\) defined by intersecting affine subspace with psd cone* that projects to \(\mathrm{CUT}_n\)

**lower bounds for SDP formul’s for Max Cut** [Lee-Raghavendra-S.’15]

exact formulation requires size \(S(n)\ge 2^{n^{\Omega(1)}}\)

approx. ratio \(\gt 16/17\) requires \(S(n)\ge n^{\omega(1)}\)

(matches best NP-hardness)

for \(S(n)=n^\ell\), approx ratio no better than for \(O(\ell)\)-deg SOS

\(\leadsto\) SOS is optimal poly-size SDP approx algorithm for Max Cut

certify nonnegativity of function by decompositing it into components that are obviously nonnegative

def’n:deg-\(\ell\) SOS certificatefor \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) consists of \(g_1,\ldots,g_r{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) with \(\deg g_i\le \ell/2\) such that\[f=g_1^2+\cdots+g_r^2\]

deg-2 SOS captures best known approximation alg’m for Max Cut

Goemans–Williamson approximation for Max Cut:

for every graph \(G\) and \(c\gt0\), if \(\max_{{\{0,1\}}^n} f_G\le c\), then \(c -0.878 f_G\) has deg-2 SOS certificate

let \(U\) be linear subspace of real-valued functions on \({\{0,1\}}^n\)

(instead of subspace of low-degree functions)

def’n\(U\)-SOS certificatefor \(f{\colon}{\{0,1\}}^n\to{\mathbb{R}}\) consists of \(g_1,\ldots,g_r\in U\) such that \(f=g_1^2+\cdots+g_r^2\)

given \(f\) and explicit basis for \(U\), in time \((\dim U)^{O(1)}\) can find \(U\)-SOS certificate for \(f\) if it exists

*connection to projections of spectrahedra:*

lemma:cut polytope \(\mathrm{CUT}_n\) is projection of \(N\times N\)-dim. spectrahedron iff exists \(N^2\)-dim subspace \(U\) such that every nonegative deg-2 \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) has \(U\)-SOS certificate

*do low-dim certificates imply low-deg certificates?*

*issue:* every nonneg. \(f\) has \(1\)-dim SOS certif’te (take \(U\) as span of \(\sqrt f\))

*solution:* need to consider families of nonnegative functions

(e.g., all nonnegative quadratic functions)

def’n:family of nonnegative functions \({\mathcal F}\) has\(S(n)\)-dim. subspace certificatesif for every \(n\in {\mathbb{N}}\), exists \(S(n)\)-dim. subspace \(U_n\) such that every \(f\in {\mathcal F}\cap {\mathbb{R}}^{{\{0,1\}}^n}\) has \(U\)-SOS certificate

let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative quadratic function

\(\mathrm{extensions}(f)=\) all \(g{\colon}{\{0,1\}}^n \to {\mathbb{R}}\) such that \(g\) computes \(f\) on subset \(S\subseteq [n]\) of coordinates (\(g\) same as \(f\) up to renaming variables)

SOS optimality theorem:for every \(f\in {\{0,1\}}^m\to {\mathbb{R}}\) and \(\ell\in {\mathbb{N}}\):

if \(\mathrm{extensions}(f)\) have \(n^\ell\)-dimensional subspace certificates, then \(f\) has deg-\(10\ell\) SOS certificate

contrapositive allows us to translate lower bounds for SOS to general SDP formulations

e.g.: \((\sum_i x_i -n/2)^2-1/4\) has no deg-\(n/100\) SOS certificate

\(\leadsto\) \(\mathrm{CUT}_n\) requires superpolynomial SDP formulations

*also preserve problem structure:* if \(f\) is “Max Cut function”, then \(\mathrm{extensions}(f)\) are also “Max Cut functions”

also get bounds for superconstant \(\ell\) but

*proof:* (direction “\(\Rightarrow\)”)
suppose \(\mathrm{CUT}_n\) is projection of spect’dron

\[\textstyle S={\left\{ A_0 + \sum_{i=1}^k z_i\cdot A_i \succeq 0 \mid z\in {\mathbb{R}}^k \right\}}\subseteq {\mathbb{R}}^{N\times N}\]

every vertex \({x {x}^\intercal}\) of \(\mathrm{CUT}_n\) is projection of some \(Q(x)\in S\)

for every quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\), exists \(Y_f\in {\mathbb{R}}^{N\times N}\) such that

\[{\mathop{\mathrm{Tr}}}Y_f Q(x)=f(x)\color{white}{\text{ for all }}x\in {\{0,1\}}^n\]

choose \(U\) as span of \(x\mapsto Q^{1/2}(x)_{ij}\)

*to show:* every nonneg quadratic \(f\) has \(U\)-SOS certificate

suppose quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) is nonnegative

\(\leadsto\) \(\min_{Q \in S}{\mathop{\mathrm{Tr}}}Y_f Q\ge 0\) (because \(S\) projects to \(\mathrm{CUT}\)_n)

\(\leadsto\) \(\exists P\succeq 0\) such that \({\mathop{\mathrm{Tr}}}Y_fQ={\mathop{\mathrm{Tr}}}PQ\) for all \(Q\in S\) (by SDP duality)

\(\leadsto\) \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)={\left\| P^{1/2} Q^{1/2}(x) \right\|}_F^2\) for all \(x\in {\{0,1\}}^n\)

\(\leadsto\) \(f\) has \(U\)-SOS certificate

\({\square}\)

matrix-valued function \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) is *positive*

if \(Q(x)\succeq 0\) for all \(x\in {\{0,1\}}^n\)

characterization of subspace certificates in terms of positive matrix functions

lemmafor every \(N\)-dim subspace \(U\) of \({\mathbb{R}}^{{\{0,1\}}^n}\), exists positive \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) such that \(f\) has \(U\)-SOS certificate if and only if \(\exists P\succeq 0\) with \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)\)

converse also holds (up to quadratic loss in \(N\))

let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative

let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function

let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\) be such that \(f(x_S)={\mathop{\mathrm{Tr}}}P_S Q(x)\)

*to show:*
\(f+{\varepsilon}\) has deg-\(10\ell\) SOS certificate

*enough:*
every level-\(10\ell\) pseudo-distr’n \(\mu\) has \({\tilde{\mathbb{E}}}_\mu f\ge -{\varepsilon}\)

relative density \(\mu'= \mu/2^{-m}\) satisfies

\[
{\mathbb{E}}_\mu f
= {\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) f(x_S)
= \color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}
\]

where \(x\in {\{0,1\}}^n\) and \(S\subseteq[n]\) with \({\lvert S \rvert} = m\) are random*enough:* \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\)

(\(f\) disappeared)

let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function

let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\)

let \(\mu\) be level-\(\ell\) pseuod-distr’n on \({\{0,1\}}^m\) and \(\mu'=\mu\cdot 2^{n}\)

*enough:* \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\quad (\ast)\)

*claim:* \((\ast)\) holds if \(x\mapsto Q^{1/2}(x)\) has degree at most \(\ell\)

proof: \(x\mapsto {\mathop{\mathrm{Tr}}}P_S Q(x)={\| P^{1/2}_SQ^{1/2}(x) \|}_F^2\) is deg-\(\ell\) SOS for all \(S\)

and \(\mu\) is level-\(\ell\) pseudo-distr’n

*wishful thinking:* approximate \(Q\) by \(\tilde Q\) in \((\ast)\) such that \(x\mapsto \tilde Q^{1/2}(x)\) has “small degree”?

theorem:\(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\)

\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\)

**theme:**
*approximator need not be more complex than test*

theorem:\(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\),

\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\) \((*)\)

*intuition:*

- start with “simplest possible” choice \(\tilde Q(x)={\mathrm{Id}}\)
- as long as \((*)\) not satisfied update \(\tilde Q\) in “simplest possible” way toward improving \((*)\)

**simplicity for \(\tilde Q\):** version of entropy *(from quantum information)*

*formalizing intuition:* let \(F(x)={\mathbb{E}}_S \mu'(x_S) P_S\). then, \((*)\) is same as

\[
\color{red}{{\langle F,Q \rangle}}\ge\color{green}{ {\langle F,\tilde Q \rangle}} -{\varepsilon}/2\quad (*)
\]

**simplicity of test:** \(F\) has degree \(O_\mu(1)\) *(independent of \(n\)!)*

**new goal:** find maximum entropy \(\tilde Q\) satisfying \((*)\)

*closed form sol’n:* \(\tilde Q(x)\propto e^{-\lambda F(x)/{\varepsilon}}\), where \(\lambda\)=entropy-defect(\(Q\))

(entropy defect \(O(\log n)\) because \(Q(x)\) only \(n^\ell\)-dimensional)

*related:* (matrix) multiplicative weights, exponential families, gradient / mirror descent

**remember actual goal:** \(\deg \tilde Q ^{1/2} \le O_{\mu,{\varepsilon}}(\log n)\)

SOS algorithm can simulate general SDP formulation

can intepret general SDP formulations as

**quantum state**with*small entropy defect*learn simplest quantum state via

**matrix multiplicative weights***(entropy maximization)*show that this quantum state admits approximation as

**low-degree sum-of-squares***(matrix exponentials)*