Boaz Barak and David Steurer
UCSD winter school on Sum-of-Squares, January 2017
unconditional computational lower bounds
general program goes back to Yannakakis’88 for refuting flawed P=NP proofs
proof strategy:
concrete theorem (“optimality of SOS”): For every approx ratio \(\alpha\) and constraint satisfaction problem \(\Pi\):
If any polynomial-size semidefinite relaxations has approx. ratio \(\alpha\) for \(\Pi\), then level-\(O(1)\) SOS has approx. ratio \(\alpha\)
connection to optimization / convex geometry
- explicit \(0/1\) polytopes in \({\mathbb{R}}^n\) that are not (close to) linear images of spectrahedra in \({\mathbb{R}}^{N\times N}\) for \(N\le \exp({n^{0.1}})\) (intersection of affine subspace and set of \(N\times N\) psd matrices)
- strong separation between rank and “positive-semidefinite rank” of matrices
Yannakakis’s model: [Yannakakis’88]
- formalizes intuitive notion of LP relaxations for problems
- restricted enough to allow for unconditional lower bounds (indep. of P vs NP) [Fiorini-Massar-Pokutta-Tiwary de Wolf’12, Braun-Pokutta-S., Braverman-Moitra, Chan-Lee-Raghavendra-S.,Rothvoß]
- extends to SDP relaxations [Fiorini-Massar-Pokutta-Tiwary-de Wolf, Gouveia-Parrilo-Thomas]
\(\ell_1\)-unit ball has \(2^n\) facets \(\leadsto\) no small direct LP formulation
but: \(\ell_1\)-unit ball is projection of polytope with \(2n+1\) facets
maximize \(f_G(x)=\sum_{ij\in E(G)}(x_i-x_j)^2\) over \({\{0,1\}}^n\)
Max Cut as linear optimization problem:
maximize \(\sum_{ij\in E(G)}(X_{ii}+X_{jj}-2X_{ij})\)
over \(X\in \mathrm{CUT}_n=\mathrm{convex hull}{\left\{ {x {x}^\intercal} \mid x \in {\{0,1\}}^n \right\}}\)
\(\mathrm{CUT}_n\) has exponential number of facets
\(\leadsto\) no small direct LP formulation
general size-\(S(n)\) LP formulation of Max Cut
polytope \(P\subseteq {\mathbb{R}}^{S(n)}\) defined by \(\le S(n)\) linear inequalities that projects to \(\mathrm{CUT}_n\)
sometimes called: extended LP formulation for \(\mathrm{CUT}_n\)
often exponential savings by projection: \(\ell_1\)-norm unit ball, Held-Karp TSP relaxation, LP/SDP hierarchies
question: can we prove size lower bounds
for LP formulations for Max Cut?
(implied by NP\(\neq\)P/poly)
lower bounds on general LP formulations for Max Cut
exact formulation requires size \(S(n)\ge 2^{\Omega(n)}\) [Fiorini-Massar-Pokutta-Tiwary-de Wolf’12]
approx. ratio \(\gt 1/2\) requires size \(S(n)\ge 2^{n^{\Omega(1)}}\) [Chan-Lee-Raghavendra-S.’13,Kothari-Meka-Raghavendra’16]
what about SDP relaxations?
(best known algorithms for Max Cut based on SDP (sum-of-squares) relaxations)
general size-\(S(n)\) SDP formulation of Max Cut
spectrahedron \(P\subseteq {\mathbb{R}}^{S(n)\times S(n)}\) defined by intersecting affine subspace with psd cone that projects to \(\mathrm{CUT}_n\)
lower bounds for SDP formul’s for Max Cut [Lee-Raghavendra-S.’15]
exact formulation requires size \(S(n)\ge 2^{n^{\Omega(1)}}\)
approx. ratio \(\gt 16/17\) requires \(S(n)\ge n^{\omega(1)}\)
(matches best NP-hardness)
for \(S(n)=n^\ell\), approx ratio no better than for \(O(\ell)\)-deg SOS
\(\leadsto\) SOS is optimal poly-size SDP approx algorithm for Max Cut
certify nonnegativity of function by decompositing it into components that are obviously nonnegative
def’n: deg-\(\ell\) SOS certificate for \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) consists of \(g_1,\ldots,g_r{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) with \(\deg g_i\le \ell/2\) such that
\[f=g_1^2+\cdots+g_r^2\]
deg-2 SOS captures best known approximation alg’m for Max Cut
Goemans–Williamson approximation for Max Cut:
for every graph \(G\) and \(c\gt0\), if \(\max_{{\{0,1\}}^n} f_G\le c\), then \(c -0.878 f_G\) has deg-2 SOS certificate
let \(U\) be linear subspace of real-valued functions on \({\{0,1\}}^n\)
(instead of subspace of low-degree functions)
def’n \(U\)-SOS certificate for \(f{\colon}{\{0,1\}}^n\to{\mathbb{R}}\) consists of \(g_1,\ldots,g_r\in U\) such that \(f=g_1^2+\cdots+g_r^2\)
given \(f\) and explicit basis for \(U\), in time \((\dim U)^{O(1)}\) can find \(U\)-SOS certificate for \(f\) if it exists
connection to projections of spectrahedra:
lemma: cut polytope \(\mathrm{CUT}_n\) is projection of \(N\times N\)-dim. spectrahedron iff exists \(N^2\)-dim subspace \(U\) such that every nonegative deg-2 \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) has \(U\)-SOS certificate
do low-dim certificates imply low-deg certificates?
issue: every nonneg. \(f\) has \(1\)-dim SOS certif’te (take \(U\) as span of \(\sqrt f\))
solution: need to consider families of nonnegative functions
(e.g., all nonnegative quadratic functions)
def’n: family of nonnegative functions \({\mathcal F}\) has \(S(n)\)-dim. subspace certificates if for every \(n\in {\mathbb{N}}\), exists \(S(n)\)-dim. subspace \(U_n\) such that every \(f\in {\mathcal F}\cap {\mathbb{R}}^{{\{0,1\}}^n}\) has \(U\)-SOS certificate
let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative quadratic function
\(\mathrm{extensions}(f)=\) all \(g{\colon}{\{0,1\}}^n \to {\mathbb{R}}\) such that \(g\) computes \(f\) on subset \(S\subseteq [n]\) of coordinates (\(g\) same as \(f\) up to renaming variables)
SOS optimality theorem: for every \(f\in {\{0,1\}}^m\to {\mathbb{R}}\) and \(\ell\in {\mathbb{N}}\):
if \(\mathrm{extensions}(f)\) have \(n^\ell\)-dimensional subspace certificates, then \(f\) has deg-\(10\ell\) SOS certificate
contrapositive allows us to translate lower bounds for SOS to general SDP formulations
e.g.: \((\sum_i x_i -n/2)^2-1/4\) has no deg-\(n/100\) SOS certificate
\(\leadsto\) \(\mathrm{CUT}_n\) requires superpolynomial SDP formulations
also preserve problem structure: if \(f\) is “Max Cut function”, then \(\mathrm{extensions}(f)\) are also “Max Cut functions”
also get bounds for superconstant \(\ell\) but
proof: (direction “\(\Rightarrow\)”) suppose \(\mathrm{CUT}_n\) is projection of spect’dron
every vertex \({x {x}^\intercal}\) of \(\mathrm{CUT}_n\) is projection of some \(Q(x)\in S\)
for every quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\), exists \(Y_f\in {\mathbb{R}}^{N\times N}\) such that
choose \(U\) as span of \(x\mapsto Q^{1/2}(x)_{ij}\)
to show: every nonneg quadratic \(f\) has \(U\)-SOS certificate
suppose quadratic \(f{\colon}{\{0,1\}}^n\to {\mathbb{R}}\) is nonnegative
\(\leadsto\) \(\min_{Q \in S}{\mathop{\mathrm{Tr}}}Y_f Q\ge 0\) (because \(S\) projects to \(\mathrm{CUT}\)_n)
\(\leadsto\) \(\exists P\succeq 0\) such that \({\mathop{\mathrm{Tr}}}Y_fQ={\mathop{\mathrm{Tr}}}PQ\) for all \(Q\in S\) (by SDP duality)
\(\leadsto\) \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)={\left\| P^{1/2} Q^{1/2}(x) \right\|}_F^2\) for all \(x\in {\{0,1\}}^n\)
\(\leadsto\) \(f\) has \(U\)-SOS certificate
\({\square}\)
matrix-valued function \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) is positive
if \(Q(x)\succeq 0\) for all \(x\in {\{0,1\}}^n\)
characterization of subspace certificates in terms of positive matrix functions
lemma for every \(N\)-dim subspace \(U\) of \({\mathbb{R}}^{{\{0,1\}}^n}\), exists positive \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{N\times N}\) such that \(f\) has \(U\)-SOS certificate if and only if \(\exists P\succeq 0\) with \(f(x)={\mathop{\mathrm{Tr}}}P Q(x)\)
converse also holds (up to quadratic loss in \(N\))
let \(f{\colon}{\{0,1\}}^m\to {\mathbb{R}}\) be nonnegative
let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function
let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\) be such that \(f(x_S)={\mathop{\mathrm{Tr}}}P_S Q(x)\)
to show: \(f+{\varepsilon}\) has deg-\(10\ell\) SOS certificate
enough: every level-\(10\ell\) pseudo-distr’n \(\mu\) has \({\tilde{\mathbb{E}}}_\mu f\ge -{\varepsilon}\)
relative density \(\mu'= \mu/2^{-m}\) satisfies
enough: \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\)
(\(f\) disappeared)
let \(Q{\colon}{\{0,1\}}^n\to {\mathbb{R}}^{n^\ell\times n^\ell}\) be positive matrix function
let \({\{ P_S\succeq 0\mid S\subseteq [n], {\lvert S \rvert}=m \}}\)
let \(\mu\) be level-\(\ell\) pseuod-distr’n on \({\{0,1\}}^m\) and \(\mu'=\mu\cdot 2^{n}\)
enough: \(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)}\ge -{\varepsilon}\quad (\ast)\)
claim: \((\ast)\) holds if \(x\mapsto Q^{1/2}(x)\) has degree at most \(\ell\)
proof: \(x\mapsto {\mathop{\mathrm{Tr}}}P_S Q(x)={\| P^{1/2}_SQ^{1/2}(x) \|}_F^2\) is deg-\(\ell\) SOS for all \(S\)
and \(\mu\) is level-\(\ell\) pseudo-distr’n
wishful thinking: approximate \(Q\) by \(\tilde Q\) in \((\ast)\) such that \(x\mapsto \tilde Q^{1/2}(x)\) has “small degree”?
theorem: \(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\)
\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\)
theme: approximator need not be more complex than test
theorem: \(\exists\) \(\tilde Q.\) \({\mathbb{E}}{\mathop{\mathrm{Tr}}}\tilde Q = {\mathbb{E}}{\mathop{\mathrm{Tr}}}Q\), \(\deg \tilde Q^{1/2}\le O_{\mu,{\varepsilon}}(\log n)\),
\(\color{red}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S Q(x)} \ge \color{green}{{\mathbb{E}}_S {\mathbb{E}}_x \mu'(x_S) \cdot {\mathop{\mathrm{Tr}}}P_S \tilde Q(x)} -{\varepsilon}/2\) \((*)\)
intuition:
simplicity for \(\tilde Q\): version of entropy (from quantum information)
formalizing intuition: let \(F(x)={\mathbb{E}}_S \mu'(x_S) P_S\). then, \((*)\) is same as
simplicity of test: \(F\) has degree \(O_\mu(1)\) (independent of \(n\)!)
new goal: find maximum entropy \(\tilde Q\) satisfying \((*)\)
closed form sol’n: \(\tilde Q(x)\propto e^{-\lambda F(x)/{\varepsilon}}\), where \(\lambda\)=entropy-defect(\(Q\))
(entropy defect \(O(\log n)\) because \(Q(x)\) only \(n^\ell\)-dimensional)
related: (matrix) multiplicative weights, exponential families, gradient / mirror descent
remember actual goal: \(\deg \tilde Q ^{1/2} \le O_{\mu,{\varepsilon}}(\log n)\)
SOS algorithm can simulate general SDP formulation
can intepret general SDP formulations as quantum state with small entropy defect
learn simplest quantum state via matrix multiplicative weights (entropy maximization)
show that this quantum state admits approximation as low-degree sum-of-squares (matrix exponentials)