Lowerbounds on the size of semidefinite relaxations

Boaz Barak and David Steurer

UCSD winter school on Sum-of-Squares, January 2017

overview

results

unconditional computational lower bounds

  • for classical combinationatorial optimization problems
    • examples: Max Cut, Traveling Salesman
  • in restricted but powerful model of computational
    • generalizes best known algorithms
    • all possible linear and semidefinite relaxations
    • first super-polynomial lower bounds in this model

general program goes back to Yannakakis’88 for refuting flawed P=NP proofs

proof strategy:

  • show that sum-of-squares (SOS) algorithm is an “optimal algorithm” in this model
  • use known lower bounds for SOS algorithm to derive lower bounds for all algorithms in this model [Grigoriev, Schoenebeck, Tulsiani, Barak-Chan-Kothari]

concrete theorem (“optimality of SOS”): For every approx ratio \(\alpha\) and constraint satisfaction problem \(\Pi\):
If any polynomial-size semidefinite relaxations has approx. ratio \(\alpha\) for \(\Pi\), then level-\(O(1)\) SOS has approx. ratio \(\alpha\)

connection to optimization / convex geometry

  • explicit \(0/1\) polytopes in \({\mathbb{R}}^n\) that are not (close to) linear images of spectrahedra in \({\mathbb{R}}^{N\times N}\) for \(N\le \exp({n^{0.1}})\) (intersection of affine subspace and set of \(N\times N\) psd matrices)
  • strong separation between rank and “positive-semidefinite rank” of matrices

Yannakakis’s LP model

Yannakakis’s model: [Yannakakis’88]

  • formalizes intuitive notion of LP relaxations for problems
  • restricted enough to allow for unconditional lower bounds (indep. of P vs NP) [Fiorini-Massar-Pokutta-Tiwary de Wolf’12, Braun-Pokutta-S., Braverman-Moitra, Chan-Lee-Raghavendra-S.,Rothvoß]
  • extends to SDP relaxations [Fiorini-Massar-Pokutta-Tiwary-de Wolf, Gouveia-Parrilo-Thomas]

example: \(\ell_1\)-unit ball

\[ \left\{ \begin{gathered} \sum_i {\lvert x_i \rvert}\le 1\\ x\in {\mathbb{R}}^n \end{gathered} \right\} \]

\(\ell_1\)-unit ball has \(2^n\) facets \(\leadsto\) no small direct LP formulation

but: \(\ell_1\)-unit ball is projection of polytope with \(2n+1\) facets

\[ \left\{\begin{gathered} -y \le x \le y\\ \sum_i y_i \le 1 \\ x,y\in {\mathbb{R}}^n \end{gathered} \right\} \]

LP formulation of Max Cut

maximize \(f_G(x)=\sum_{ij\in E(G)}(x_i-x_j)^2\) over \({\{0,1\}}^n\)

Max Cut as linear optimization problem:
maximize \(\sum_{ij\in E(G)}(X_{ii}+X_{jj}-2X_{ij})\)
over \(X\in \mathrm{CUT}_n=\mathrm{convex hull}{\left\{ {x {x}^\intercal} \mid x \in {\{0,1\}}^n \right\}}\)

\(\mathrm{CUT}_n\) has exponential number of facets
\(\leadsto\) no small direct LP formulation

general size-\(S(n)\) LP formulation of Max Cut
polytope \(P\subseteq {\mathbb{R}}^{S(n)}\) defined by \(\le S(n)\) linear inequalities that projects to \(\mathrm{CUT}_n\)

sometimes called: extended LP formulation for \(\mathrm{CUT}_n\)

often exponential savings by projection: \(\ell_1\)-norm unit ball, Held-Karp TSP relaxation, LP/SDP hierarchies

question: can we prove size lower bounds
for LP formulations for Max Cut?

(implied by NP\(\neq\)P/poly)

lower bounds on general LP formulations for Max Cut

exact formulation requires size \(S(n)\ge 2^{\Omega(n)}\) [Fiorini-Massar-Pokutta-Tiwary-de Wolf’12]

approx. ratio \(\gt 1/2\) requires size \(S(n)\ge 2^{n^{\Omega(1)}}\) [Chan-Lee-Raghavendra-S.’13,Kothari-Meka-Raghavendra’16]

what about SDP relaxations?
(best known algorithms for Max Cut based on SDP (sum-of-squares) relaxations)

SDP formulations of Max Cut

general size-\(S(n)\) SDP formulation of Max Cut
spectrahedron \(P\subseteq {\mathbb{R}}^{S(n)\times S(n)}\) defined by intersecting affine subspace with psd cone that projects to \(\mathrm{CUT}_n\)

lower bounds for SDP formul’s for Max Cut [Lee-Raghavendra-S.’15]

exact formulation requires size \(S(n)\ge 2^{n^{\Omega(1)}}\)

approx. ratio \(\gt 16/17\) requires \(S(n)\ge n^{\omega(1)}\)
(matches best NP-hardness)

for \(S(n)=n^\ell\), approx ratio no better than for \(O(\ell)\)-deg SOS
\(\leadsto\) SOS is optimal poly-size SDP approx algorithm for Max Cut

proofs

recall: SOS certificates

certify nonnegativity of function by decompositing it into components that are obviously nonnegative

def’n: deg-\(\ell\) SOS certificate for \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) consists of \(g_1,\ldots,g_r{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) with \(\deg g_i\le \ell/2\) such that

\[f=g_1^2+\cdots+g_r^2\]

deg-2 SOS captures best known approximation alg’m for Max Cut

Goemans–Williamson approximation for Max Cut:
for every graph \(G\) and \(c\gt0\), if \(\max_{{\{0,1\}}^n} f_G\le c\), then \(c -0.878 f_G\) has deg-2 SOS certificate

general subspace certificates

let \(U\) be linear subspace of real-valued functions on \({\{0,1\}}^n\)
(instead of subspace of low-degree functions)

def’n \(U\)-SOS certificate for \(f{\colon}{\{0,1\}}^n\to{\mathbb{R}}\) consists of \(g_1,\ldots,g_r\in U\) such that \(f=g_1^2+\cdots+g_r^2\)

given \(f\) and explicit basis for \(U\), in time \((\dim U)^{O(1)}\) can find \(U\)-SOS certificate for \(f\) if it exists

connection to projections of spectrahedra:

lemma: cut polytope \(\mathrm{CUT}_n\) is projection of \(N\times N\)-dim. spectrahedron iff exists \(N^2\)-dim subspace \(U\) such that every nonegative deg-2 \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) has \(U\)-SOS certificate

efficient subspace certificates

do low-dim certificates imply low-deg certificates?

issue: every nonneg. \(f\) has \(1\)-dim SOS certif’te (take \(U\) as span of \(\sqrt f\))

solution: need to consider families of nonnegative functions
(e.g., all nonnegative quadratic functions)

def’n: family of nonnegative functions \({\mathcal F}\) has \(S(n)\)-dim. subspace certificates if for every \(n\in {\mathbb{N}}\), exists \(S(n)\)-dim. subspace \(U_n\) such that every \(f\in {\mathcal F}\cap {\mathbb{R}}^{{\{0,1\}}^n}\) has \(U\)-SOS certificate

general lower bound theorem

let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative quadratic function

\(\mathrm{extensions}(f)=\) all \(g{\colon}{\{0,1\}}^n \to {\mathbb{R}}\) such that \(g\) computes \(f\) on subset \(S\subseteq [n]\) of coordinates (\(g\) same as \(f\) up to renaming variables)

SOS optimality theorem: for every \(f\in {\{0,1\}}^m\to {\mathbb{R}}\) and \(\ell\in {\mathbb{N}}\):
if \(\mathrm{extensions}(f)\) have \(n^\ell\)-dimensional subspace certificates, then \(f\) has deg-\(10\ell\) SOS certificate

contrapositive allows us to translate lower bounds for SOS to general SDP formulations

e.g.: \((\sum_i x_i -n/2)^2-1/4\) has no deg-\(n/100\) SOS certificate
\(\leadsto\) \(\mathrm{CUT}_n\) requires superpolynomial SDP formulations

also preserve problem structure: if \(f\) is “Max Cut function”, then \(\mathrm{extensions}(f)\) are also “Max Cut functions”

also get bounds for superconstant \(\ell\) but

proofs

proof: (direction \(\Rightarrow\)) suppose \(\mathrm{CUT}_n\) is projection of spect’dron

\[\textstyle S={\left\{ A_0 + \sum_{i=1}^k z_i\cdot A_i \succeq 0 \mid z\in {\mathbb{R}}^k \right\}}\subseteq {\mathbb{R}}^{N\times N}\]

every vertex \({x {x}^\intercal}\) of \(\mathrm{CUT}_n\) is projection of some \(Q(x)\in S\)

for every quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\), exists \(Y_f\in {\mathbb{R}}^{N\times N}\) such that

\[{\mathop{\mathrm{Tr}}}Y_f Q(x)=f(x)\color{white}{\text{ for all }}x\in {\{0,1\}}^n\]

choose \(U\) as span of \(x\mapsto Q^{1/2}(x)_{ij}\)

to show: every nonneg quadratic \(f\) has \(U\)-SOS certificate

suppose quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) is nonnegative

\(\leadsto\) \(\min_{Q \in S}{\mathop{\mathrm{Tr}}}Y_f Q\ge 0\) (because \(S\) projects to \(\mathrm{CUT}\)_n)

\(\leadsto\) \(\exists P\succeq 0\) such that \({\mathop{\mathrm{Tr}}}Y_fQ={\mathop{\mathrm{Tr}}}PQ\) for all \(Q\in S\) (by SDP duality)

\(\leadsto\) \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)={\left\| P^{1/2} Q^{1/2}(x) \right\|}_F^2\) for all \(x\in {\{0,1\}}^n\)

\(\leadsto\) \(f\) has \(U\)-SOS certificate

\({\square}\)

positive matrix functions

matrix-valued function \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) is positive
if \(Q(x)\succeq 0\) for all \(x\in {\{0,1\}}^n\)

characterization of subspace certificates in terms of positive matrix functions

lemma for every \(N\)-dim subspace \(U\) of \({\mathbb{R}}^{{\{0,1\}}^n}\), exists positive \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) such that \(f\) has \(U\)-SOS certificate if and only if \(\exists P\succeq 0\) with \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)\)

converse also holds (up to quadratic loss in \(N\))

proof strategy

let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative

let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function
let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\) be such that \(f(x_S)={\mathop{\mathrm{Tr}}}P_S Q(x)\)

to show: \(f+{\varepsilon}\) has deg-\(10\ell\) SOS certificate

enough: every level-\(10\ell\) pseudo-distr’n \(\mu\) has \({\tilde{\mathbb{E}}}_\mu f\ge -{\varepsilon}\)

relative density \(\mu'= \mu/2^{-m}\) satisfies

\[ {\mathbb{E}}_\mu f = {\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) f(x_S) = \color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \]
where \(x\in {\{0,1\}}^n\) and \(S\subseteq[n]\) with \({\lvert S \rvert} = m\) are random

enough: \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\)

(\(f\) disappeared)

let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function
let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\)
let \(\mu\) be level-\(\ell\) pseuod-distr’n on \({\{0,1\}}^m\) and \(\mu'=\mu\cdot 2^{n}\)

enough: \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\quad (\ast)\)

claim: \((\ast)\) holds if \(x\mapsto Q^{1/2}(x)\) has degree at most \(\ell\)

proof: \(x\mapsto {\mathop{\mathrm{Tr}}}P_S Q(x)={\| P^{1/2}_SQ^{1/2}(x) \|}_F^2\) is deg-\(\ell\) SOS for all \(S\)
and \(\mu\) is level-\(\ell\) pseudo-distr’n

wishful thinking: approximate \(Q\) by \(\tilde Q\) in \((\ast)\) such that \(x\mapsto \tilde Q^{1/2}(x)\) has “small degree”?

theorem: \(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\)
\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\)

theme: approximator need not be more complex than test

learning simple approximators

theorem: \(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\),
\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\) \((*)\)

intuition:

  • start with “simplest possible” choice \(\tilde Q(x)={\mathrm{Id}}\)
  • as long as \((*)\) not satisfied update \(\tilde Q\) in “simplest possible” way toward improving \((*)\)

simplicity for \(\tilde Q\): version of entropy (from quantum information)

formalizing intuition: let \(F(x)={\mathbb{E}}_S \mu'(x_S) P_S\). then, \((*)\) is same as

\[ \color{red}{{\langle F,Q \rangle}}\ge\color{green}{ {\langle F,\tilde Q \rangle}} -{\varepsilon}/2\quad (*) \]

simplicity of test: \(F\) has degree \(O_\mu(1)\) (independent of \(n\)!)

new goal: find maximum entropy \(\tilde Q\) satisfying \((*)\)

closed form sol’n: \(\tilde Q(x)\propto e^{-\lambda F(x)/{\varepsilon}}\), where \(\lambda\)=entropy-defect(\(Q\))
(entropy defect \(O(\log n)\) because \(Q(x)\) only \(n^\ell\)-dimensional)

related: (matrix) multiplicative weights, exponential families, gradient / mirror descent

remember actual goal: \(\deg \tilde Q ^{1/2} \le O_{\mu,{\varepsilon}}(\log n)\)

conclusion

SOS algorithm can simulate general SDP formulation

  • can intepret general SDP formulations as quantum state with small entropy defect

  • learn simplest quantum state via matrix multiplicative weights (entropy maximization)

  • show that this quantum state admits approximation as low-degree sum-of-squares (matrix exponentials)

Opps, you cannot play draw N guess with this browser!