Lowerbounds on the size of semidefinite relaxations

Boaz Barak and David Steurer

UCSD winter school on Sum-of-Squares, January 2017

overview

results

unconditional computational lower bounds

for classical combinationatorial optimization problems
- examples: Max Cut, Traveling Salesman
in restricted but powerful model of computational
- generalizes best known algorithms
- all possible linear and semidefinite relaxations
- first super-polynomial lower bounds in this model

general program goes back to Yannakakis’88 for refuting flawed P=NP proofs

proof strategy:

show that sum-of-squares (SOS) algorithm is an “optimal algorithm” in this model
use known lower bounds for SOS algorithm to derive lower bounds for all algorithms in this model [Grigoriev, Schoenebeck, Tulsiani, Barak-Chan-Kothari]

concrete theorem (“optimality of SOS”): For every approx ratio \(\alpha\) and constraint satisfaction problem \(\Pi\):
If any polynomial-size semidefinite relaxations has approx. ratio \(\alpha\) for \(\Pi\), then level-\(O(1)\) SOS has approx. ratio \(\alpha\)

connection to optimization / convex geometry
explicit \(0/1\) polytopes in \({\mathbb{R}}^n\) that are not (close to) linear images of spectrahedra in \({\mathbb{R}}^{N\times N}\) for \(N\le \exp({n^{0.1}})\) (intersection of affine subspace and set of \(N\times N\) psd matrices)
strong separation between rank and “positive-semidefinite rank” of matrices

Yannakakis’s LP model

Yannakakis’s model: [Yannakakis’88]
formalizes intuitive notion of LP relaxations for problems
restricted enough to allow for unconditional lower bounds (indep. of P vs NP) [Fiorini-Massar-Pokutta-Tiwary de Wolf’12, Braun-Pokutta-S., Braverman-Moitra, Chan-Lee-Raghavendra-S.,Rothvoß]
extends to SDP relaxations [Fiorini-Massar-Pokutta-Tiwary-de Wolf, Gouveia-Parrilo-Thomas]

example: \(\ell_1\)-unit ball

\[ \left\{ \begin{gathered} \sum_i {\lvert x_i \rvert}\le 1\\ x\in {\mathbb{R}}^n \end{gathered} \right\} \]

\(\ell_1\)-unit ball has \(2^n\) facets \(\leadsto\) no small direct LP formulation

but: \(\ell_1\)-unit ball is projection of polytope with \(2n+1\) facets

\[ \left\{\begin{gathered} -y \le x \le y\\ \sum_i y_i \le 1 \\ x,y\in {\mathbb{R}}^n \end{gathered} \right\} \]

LP formulation of Max Cut

maximize \(f_G(x)=\sum_{ij\in E(G)}(x_i-x_j)^2\) over \({\{0,1\}}^n\)

Max Cut as linear optimization problem:
maximize \(\sum_{ij\in E(G)}(X_{ii}+X_{jj}-2X_{ij})\)
over \(X\in \mathrm{CUT}_n=\mathrm{convex hull}{\left\{ {x {x}^\intercal} \mid x \in {\{0,1\}}^n \right\}}\)

\(\mathrm{CUT}_n\) has exponential number of facets
\(\leadsto\) no small direct LP formulation

general size-\(S(n)\) LP formulation of Max Cut
polytope \(P\subseteq {\mathbb{R}}^{S(n)}\) defined by \(\le S(n)\) linear inequalities that projects to \(\mathrm{CUT}_n\)

sometimes called: extended LP formulation for \(\mathrm{CUT}_n\)

often exponential savings by projection: \(\ell_1\)-norm unit ball, Held-Karp TSP relaxation, LP/SDP hierarchies

question: can we prove size lower bounds
for LP formulations for Max Cut?

(implied by NP\(\neq\)P/poly)

lower bounds on general LP formulations for Max Cut

exact formulation requires size \(S(n)\ge 2^{\Omega(n)}\) [Fiorini-Massar-Pokutta-Tiwary-de Wolf’12]

approx. ratio \(\gt 1/2\) requires size \(S(n)\ge 2^{n^{\Omega(1)}}\) [Chan-Lee-Raghavendra-S.’13,Kothari-Meka-Raghavendra’16]

what about SDP relaxations?
(best known algorithms for Max Cut based on SDP (sum-of-squares) relaxations)

SDP formulations of Max Cut

general size-\(S(n)\) SDP formulation of Max Cut
spectrahedron \(P\subseteq {\mathbb{R}}^{S(n)\times S(n)}\) defined by intersecting affine subspace with psd cone that projects to \(\mathrm{CUT}_n\)

lower bounds for SDP formul’s for Max Cut [Lee-Raghavendra-S.’15]

exact formulation requires size \(S(n)\ge 2^{n^{\Omega(1)}}\)

approx. ratio \(\gt 16/17\) requires \(S(n)\ge n^{\omega(1)}\)
(matches best NP-hardness)

for \(S(n)=n^\ell\), approx ratio no better than for \(O(\ell)\)-deg SOS
\(\leadsto\) SOS is optimal poly-size SDP approx algorithm for Max Cut

proofs

recall: SOS certificates

certify nonnegativity of function by decompositing it into components that are obviously nonnegative

def’n: deg-\(\ell\) SOS certificate for \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) consists of \(g_1,\ldots,g_r{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) with \(\deg g_i\le \ell/2\) such that
\[f=g_1^2+\cdots+g_r^2\]

deg-2 SOS captures best known approximation alg’m for Max Cut

Goemans–Williamson approximation for Max Cut:
for every graph \(G\) and \(c\gt0\), if \(\max_{{\{0,1\}}^n} f_G\le c\), then \(c -0.878 f_G\) has deg-2 SOS certificate

general subspace certificates

let \(U\) be linear subspace of real-valued functions on \({\{0,1\}}^n\)
(instead of subspace of low-degree functions)

def’n \(U\)-SOS certificate for \(f{\colon}{\{0,1\}}^n\to{\mathbb{R}}\) consists of \(g_1,\ldots,g_r\in U\) such that \(f=g_1^2+\cdots+g_r^2\)

given \(f\) and explicit basis for \(U\), in time \((\dim U)^{O(1)}\) can find \(U\)-SOS certificate for \(f\) if it exists

connection to projections of spectrahedra:

lemma: cut polytope \(\mathrm{CUT}_n\) is projection of \(N\times N\)-dim. spectrahedron iff exists \(N^2\)-dim subspace \(U\) such that every nonegative deg-2 \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) has \(U\)-SOS certificate

efficient subspace certificates

do low-dim certificates imply low-deg certificates?

issue: every nonneg. \(f\) has \(1\)-dim SOS certif’te (take \(U\) as span of \(\sqrt f\))

solution: need to consider families of nonnegative functions
(e.g., all nonnegative quadratic functions)

def’n: family of nonnegative functions \({\mathcal F}\) has \(S(n)\)-dim. subspace certificates if for every \(n\in {\mathbb{N}}\), exists \(S(n)\)-dim. subspace \(U_n\) such that every \(f\in {\mathcal F}\cap {\mathbb{R}}^{{\{0,1\}}^n}\) has \(U\)-SOS certificate

general lower bound theorem

let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative quadratic function

\(\mathrm{extensions}(f)=\) all \(g{\colon}{\{0,1\}}^n \to {\mathbb{R}}\) such that \(g\) computes \(f\) on subset \(S\subseteq [n]\) of coordinates (\(g\) same as \(f\) up to renaming variables)

SOS optimality theorem: for every \(f\in {\{0,1\}}^m\to {\mathbb{R}}\) and \(\ell\in {\mathbb{N}}\):
if \(\mathrm{extensions}(f)\) have \(n^\ell\)-dimensional subspace certificates, then \(f\) has deg-\(10\ell\) SOS certificate

contrapositive allows us to translate lower bounds for SOS to general SDP formulations

e.g.: \((\sum_i x_i -n/2)^2-1/4\) has no deg-\(n/100\) SOS certificate
\(\leadsto\) \(\mathrm{CUT}_n\) requires superpolynomial SDP formulations

also preserve problem structure: if \(f\) is “Max Cut function”, then \(\mathrm{extensions}(f)\) are also “Max Cut functions”

also get bounds for superconstant \(\ell\) but

proofs

proof: (direction “\(\Rightarrow\)”) suppose \(\mathrm{CUT}_n\) is projection of spect’dron

\[\textstyle S={\left\{ A_0 + \sum_{i=1}^k z_i\cdot A_i \succeq 0 \mid z\in {\mathbb{R}}^k \right\}}\subseteq {\mathbb{R}}^{N\times N}\]

every vertex \({x {x}^\intercal}\) of \(\mathrm{CUT}_n\) is projection of some \(Q(x)\in S\)

for every quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\), exists \(Y_f\in {\mathbb{R}}^{N\times N}\) such that

\[{\mathop{\mathrm{Tr}}}Y_f Q(x)=f(x)\color{white}{\text{ for all }}x\in {\{0,1\}}^n\]

choose \(U\) as span of \(x\mapsto Q^{1/2}(x)_{ij}\)

to show: every nonneg quadratic \(f\) has \(U\)-SOS certificate

suppose quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) is nonnegative

\(\leadsto\) \(\min_{Q \in S}{\mathop{\mathrm{Tr}}}Y_f Q\ge 0\) (because \(S\) projects to \(\mathrm{CUT}\)_n)

\(\leadsto\) \(\exists P\succeq 0\) such that \({\mathop{\mathrm{Tr}}}Y_fQ={\mathop{\mathrm{Tr}}}PQ\) for all \(Q\in S\) (by SDP duality)

\(\leadsto\) \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)={\left\| P^{1/2} Q^{1/2}(x) \right\|}_F^2\) for all \(x\in {\{0,1\}}^n\)

\(\leadsto\) \(f\) has \(U\)-SOS certificate

\({\square}\)

positive matrix functions

matrix-valued function \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) is positive
if \(Q(x)\succeq 0\) for all \(x\in {\{0,1\}}^n\)

characterization of subspace certificates in terms of positive matrix functions

lemma for every \(N\)-dim subspace \(U\) of \({\mathbb{R}}^{{\{0,1\}}^n}\), exists positive \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) such that \(f\) has \(U\)-SOS certificate if and only if \(\exists P\succeq 0\) with \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)\)

converse also holds (up to quadratic loss in \(N\))

proof strategy

let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative

let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function
let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\) be such that \(f(x_S)={\mathop{\mathrm{Tr}}}P_S Q(x)\)

to show: \(f+{\varepsilon}\) has deg-\(10\ell\) SOS certificate

enough: every level-\(10\ell\) pseudo-distr’n \(\mu\) has \({\tilde{\mathbb{E}}}_\mu f\ge -{\varepsilon}\)

relative density \(\mu'= \mu/2^{-m}\) satisfies

\[ {\mathbb{E}}_\mu f = {\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) f(x_S) = \color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \]

where \(x\in {\{0,1\}}^n\) and \(S\subseteq[n]\) with \({\lvert S \rvert} = m\) are random

enough: \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\)

(\(f\) disappeared)

let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function
let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\)
let \(\mu\) be level-\(\ell\) pseuod-distr’n on \({\{0,1\}}^m\) and \(\mu'=\mu\cdot 2^{n}\)

enough: \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\quad (\ast)\)

claim: \((\ast)\) holds if \(x\mapsto Q^{1/2}(x)\) has degree at most \(\ell\)

proof: \(x\mapsto {\mathop{\mathrm{Tr}}}P_S Q(x)={\| P^{1/2}_SQ^{1/2}(x) \|}_F^2\) is deg-\(\ell\) SOS for all \(S\)
and \(\mu\) is level-\(\ell\) pseudo-distr’n

wishful thinking: approximate \(Q\) by \(\tilde Q\) in \((\ast)\) such that \(x\mapsto \tilde Q^{1/2}(x)\) has “small degree”?

theorem: \(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\)
\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\)

theme: approximator need not be more complex than test

learning simple approximators

theorem: \(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\),
\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\) \((*)\)

intuition:

start with “simplest possible” choice \(\tilde Q(x)={\mathrm{Id}}\)
as long as \((*)\) not satisfied update \(\tilde Q\) in “simplest possible” way toward improving \((*)\)

simplicity for \(\tilde Q\): version of entropy (from quantum information)

formalizing intuition: let \(F(x)={\mathbb{E}}_S \mu'(x_S) P_S\). then, \((*)\) is same as

\[ \color{red}{{\langle F,Q \rangle}}\ge\color{green}{ {\langle F,\tilde Q \rangle}} -{\varepsilon}/2\quad (*) \]

simplicity of test: \(F\) has degree \(O_\mu(1)\) (independent of \(n\)!)

new goal: find maximum entropy \(\tilde Q\) satisfying \((*)\)

closed form sol’n: \(\tilde Q(x)\propto e^{-\lambda F(x)/{\varepsilon}}\), where \(\lambda\)=entropy-defect(\(Q\))
(entropy defect \(O(\log n)\) because \(Q(x)\) only \(n^\ell\)-dimensional)

related: (matrix) multiplicative weights, exponential families, gradient / mirror descent

remember actual goal: \(\deg \tilde Q ^{1/2} \le O_{\mu,{\varepsilon}}(\log n)\)

conclusion

SOS algorithm can simulate general SDP formulation

can intepret general SDP formulations as quantum state with small entropy defect
learn simplest quantum state via matrix multiplicative weights (entropy maximization)
show that this quantum state admits approximation as low-degree sum-of-squares (matrix exponentials)