# Lower bound for planted clique problem

UCSD winter school on Sum-of-Squares, January 2017

### Computational Bayesian probabilities.

Last time: Grigoriev pdist for 3XOR: $${\tilde{\mathbb{E}}}\chi_S \in \{ \pm 1 , 1/2 \}$$.

Is there zero/one/half law for computational probabilities?
(Either prove, refute or random?)

$${\mathbb{P}}[ \color{red}{ P \neq NP} ] = 1/2$$?

$${\mathbb{P}}[ \color{red}{P\neq NP \;|\; 3SAT \not\in Dtime(n^2)} ] = 1/2$$?

Today: Natural example of pdist where $${\tilde{\mathbb{E}}}\chi_S \not\in \{ \pm 1, 1/2 \}$$.

As degree $$d$$ grows, expectations tend to $$\pm 1$$.

### Planted clique problem

Distinguish between
* $$G \sim G(n,1/2)$$: each edge w.p. $$1/2$$
* $$G' \sim G(n,1/2,k)$$: above + planted random $$k$$-clique

$${\mathbb{E}}|E_{G'}| = {\mathbb{E}}|E_G| + \Theta(k^2)$$

Cor: If $$k^2 \gg \sqrt{n^2}$$ can distinguish w.h.p.

Can we do better?

If guess $$d$$ vertices, can look at $$n/2^d$$ sized common neighborhood

Best known tradeoff: $$k=O(\sqrt{n/2^d})$$ in $$n^{O(d)}$$ time.

### SOS and planted clique

Distinguish $$G\sim G(n,1/2)$$ from $$G'\sim G(n,1/2,k)$$.

Thm: (B-Hopkins-Kelner-Kothari-Moitra-Potechin) If $$d=O(1)$$ and $$k \lt n^{1/2-\epsilon}$$, w.h.p. over $$G$$, $$\exists$$ $$d$$-pdist $$\mu$$ over $${\{0,1\}}^n$$ s.t. $${\tilde{\mathbb{E}}}_\mu |x| =k$$ but $${\tilde{\mathbb{E}}}_\mu x_ix_j = 0$$ for all $$(i,j)\not\in E_G$$.

Improving on (Meka-Potechin-Wigderson’14),(Deshpande-Montanari’15),(Hopkins-Kothari-Potechin-Raghavendra-Schramm’16)

Proof: If problem hard, same pdist for $$G$$ and $$G'$$.

Need to come up with bets for $${\mathbb{E}}_{G'} x_A = {\mathbb{P}}_{G'}[ A \subseteq CLIQUE ]$$

1st attempt: Naive pdist
$${\tilde{\mathbb{E}}}x_A = 0$$ if $$A$$ not a clique.
$${\tilde{\mathbb{E}}}x_A = c\left(\tfrac{k}{n}\right)^{|A|}$$ otherwise.

$$c=c(|A|)$$ chosen s.t. $${\tilde{\mathbb{E}}}(\sum x_i)^t = k^t$$ for $$t=1..d$$

### Naive pdist is “Bayesianially incorrect”

Naive pdist: $${\tilde{\mathbb{E}}}x_A = 0$$ if $$A$$ not clique, $${\tilde{\mathbb{E}}}x_A = c\left(\tfrac{k}{n}\right)^{|A|}$$ o/w

Suppose $$k = \epsilon\sqrt{n}$$ and $$deg(1)=\tfrac{n}{2}+t\sqrt{n}$$ for $$t\gg 1$$.

Naive bet: $${\mathbb{P}}[ 1 \in CLIQUE ] = {\tilde{\mathbb{E}}}x_1 = \tfrac{k}{n}$$.

Bayes condition: $${\mathbb{P}}[ 1 \in CLIQUE | deg(1) = \tfrac{n}{2} + t\sqrt{n}] \sim \tfrac{k}{n}e^{\epsilon t}$$

$$x_1=0 \color{red}{\Rightarrow} deg(1) \sim N(n/2,n/2)$$
$$x_1=1 \color{red}{\Rightarrow} deg(1) \sim N(n/2 + \epsilon\sqrt{k},n/2)$$

$${\mathbb{P}}[ deg(1) = \tfrac{n}{2}+t\sqrt{n} | x_1 = 0] \sim \exp(-t^2)$$
$${\mathbb{P}}[ deg(1) = \tfrac{n}{2}+t\sqrt{n} | x_1 = 0] \sim \exp(-(t-\epsilon)^2)$$

ratio is $$\exp(2\epsilon t)$$!

### Is Bayesian incorrect $$=$$ non SOS?

NO!

Lemma: If $$k\lt\sqrt{n}/10$$ then w.h.p. over $$G\sim G(n,1/2)$$ Naive pdist is valid degree $$2$$ pdist.

Proof: For naive $${\tilde{\mathbb{E}}}x_i^2 = {\tilde{\mathbb{E}}}x_i = \tfrac{n}{k}$$ for all $$i$$, $${\tilde{\mathbb{E}}}x_ix_j$$ eqs $$0$$ if $$(i,j) \not\in E_G$$ and $$2\tfrac{k^2}{n^2}$$ o/w, and so:

$\left( {\tilde{\mathbb{E}}}x_i x_j \right)_{i,j} = \tfrac{k}{n}I + 2\tfrac{k^2}{n^2}A = \tfrac{k}{n}(I + \tfrac{k}{n}A)$

where $$A=A(G)$$ is adjacency matrix.

Theorem: w.h.p. $$\| A_G \|_{spectral} \leq 2\sqrt{n}$$.

### Deg 4 helps

In contrast:

Lemma: If $$k \gg n^{1/3}$$ then w.h.p. over $$G\sim G(n,1/2)$$ Naive pdist is not valid degree $$4$$ pdist.

### Naive deg 4 is not pdist

Proof: Let $$r(x)= \sum R_{1,i}x_i$$ where $$R_{i,i}=0$$ and $$R_{i,j}$$ eqs $$+1$$ if $$i\sim j$$ and $$-1$$ otherwise, and let $$p(x)=(x_1-\alpha r(x)^2)$$.

We will show $${\mathbb{E}}_G {\tilde{\mathbb{E}}}p^2 \lt 0$$.

${\tilde{\mathbb{E}}}p(x)^2 = \color{purple}{{\tilde{\mathbb{E}}}x_1^2} - 2\alpha \color{red}{{\tilde{\mathbb{E}}}x_1 r(x)^2} + \alpha^2\color{green}{{\tilde{\mathbb{E}}}r(x)^4}$

Claim 0: $$\color{purple}{{\tilde{\mathbb{E}}}_{\mu(G)} x_1^1} = \tfrac{k}{n}$$

Claim 1: $$\color{red}{{\tilde{\mathbb{E}}}x_1 r(x)^2} \gtrsim \tfrac{k^3}{n}$$

Proof: $${\tilde{\mathbb{E}}}x_1 r(x)^2 = \sum_{i,j} R_{1,i}R_{1,j} {\tilde{\mathbb{E}}}x_1x_ix_j \sim n^2\tfrac{k^3}{n^3} = \tfrac{k^3}{n}$$

Since $${\tilde{\mathbb{E}}}x_1x_ix_j \neq 0 \;\Rightarrow\; R_{1,i}=R_{1,j}=1$$

### Naive deg 4 is not pdist

Proof: Let $$r(x)= \sum R_{1,i}x_i$$ where $$R_{i,i}=0$$ and $$R_{i,j}$$ eqs $$+1$$ if $$i\sim j$$ and $$-1$$ otherwise, and let $$p(x)=(x_1-\alpha r(x)^2)$$.

${\tilde{\mathbb{E}}}p(x)^2 = \color{purple}{{\tilde{\mathbb{E}}}x_1^2} - 2\alpha \color{red}{{\tilde{\mathbb{E}}}x_1 r(x)^2} + \alpha^2\color{green}{{\tilde{\mathbb{E}}}r(x)^4}$

Claim 0: $$\color{purple}{{\tilde{\mathbb{E}}}_{\mu(G)} x_1^1} = \tfrac{k}{n}$$

Claim 1: $$\color{red}{{\tilde{\mathbb{E}}}x_1 r(x)^2} \gtrsim \tfrac{k^3}{n}$$

Claim 2: $${\mathbb{E}}_G \color{green}{{\tilde{\mathbb{E}}}_{\mu (G)} r(x)^4} \lesssim k^2$$

Proof: $${\mathbb{E}}_G {\tilde{\mathbb{E}}}_{\mu(G)} r(x)^4 = {\mathbb{E}}_G \sum_{i_1,i_2,i_3,i_4\gt1} R_{1,i_1}R_{1,i_2} R_{1,i_3} R_{1,i_4} {\tilde{\mathbb{E}}}_{\mu(G)} x_{i_1}x_{i_2}x_{i_3}x_{i_4}$$

Key observation: Event $$\{i_1,i_2,i_3,i_4\}$$ is clique is independent of $$R_{1,i_1},\ldots,R_{1,i_4}$$.

$$\Rightarrow {\mathbb{E}}_G {\tilde{\mathbb{E}}}_{mu(G)} r(x)^4 \sim \sum_{i,j} {\tilde{\mathbb{E}}}_{mu(G)} x_i^2x_j^2 = \sim n^2\tfrac{k^2}{n^2} = k^2 \; ∎$$

### Putting it all together

Proof: Let $$r(x)= \sum R_{1,i}x_i$$ where $$R_{i,i}=0$$ and $$R_{i,j}$$ eqs $$+1$$ if $$i\sim j$$ and $$-1$$ otherwise, and let $$p(x)=(x_1-\alpha r(x)^2)$$.

${\tilde{\mathbb{E}}}p(x)^2 = \color{purple}{{\tilde{\mathbb{E}}}x_1^2} - 2\alpha \color{red}{{\tilde{\mathbb{E}}}x_1 r(x)^2} + \alpha^2\color{green}{{\tilde{\mathbb{E}}}r(x)^4}$

Claim 0: $$\color{purple}{{\tilde{\mathbb{E}}}_{\mu(G)} x_1^1} = \tfrac{k}{n}$$

Claim 1: $$\color{red}{{\tilde{\mathbb{E}}}x_1 r(x)^2} \gtrsim \tfrac{k^3}{n}$$

Claim 2: $${\mathbb{E}}_G \color{green}{{\tilde{\mathbb{E}}}_{\mu (G)} r(x)^4} \lesssim k^2$$

To ensure $${\mathbb{E}}_G {\tilde{\mathbb{E}}}_{\mu(G)} p^2 \lt0$$, want $$\alpha \color{red}{\tfrac{k^3}{n}} \gg \color{purple}{\tfrac{k}{n}} + \alpha^2\color{green}{k^2}$$

Set $$\alpha = 10/k^2$$ and then need $$\tfrac{10}{k^2}\color{red}{\tfrac{k^3}{n}} \gg \tfrac{10^2}{k^4}\color{green}{k^2}$$. . . .

Works whenever $$k^{1/3} \gg n$$.

### Recap

Naive pdist is Bayesianally incorrect and not valid for deg $$\geq 4$$.

Problem: Only “Bayeisanally correct” dist is actual clique.

Our approach: Match only “simple” correlations.

Construct pdist $$\mu$$ such that if $$F:{\mathbb{R}}^{n^2}\rightarrow {\mathbb{R}}$$ is low degree in the graph and $$p:{\mathbb{R}}^n\rightarrow{\mathbb{R}}$$ is low degree in the clique then:

${\mathbb{E}}_{(G',x)\sim G(n,1/2,k)} F(G)p(x) = {\mathbb{E}}_G F(G) {\tilde{\mathbb{E}}}_{\mu (G)} p(x) \;\;(*)$

Example: $$F(G) = deg_G(1)$$, $$p(x)=x_1$$.

Call this condition pseudo calibration

### Pseudo calibrated pdist

$$\mu$$ needs to satisfy:

${\mathbb{E}}_{(G',x)\sim G(n,1/2,k)} F(G)p(x) = {\mathbb{E}}_G F(G) {\tilde{\mathbb{E}}}_{\mu (G)} p(x) \;\;(*)$

for every edge-poly $$F$$ of degree $$\leq \tau$$ and vertex-poly $$p$$ of deg $$\leq d$$.

For every $$F,p$$ we get a linear constraint on the map $$\mu:{\mathbb{R}}^{n^2+n^d}\rightarrow R$$ mapping $$G,p$$ to $${\tilde{\mathbb{E}}}_{\mu(G)} p$$.

If we restrict $$\mu$$ to degree at most $$\tau$$, then have $$n^{\tau +d}$$ constraints on $$n^{\tau + d}$$ degrees of freedom.

$$\Rightarrow$$ pseudo-distribution is fully determined!

### Pseudo calibrated pdist

We use unique $$\mu$$ s.t., map $$G \mapsto \mu(G)$$ has edge-degree $$\leq \tau$$ and

${\mathbb{E}}_{(G',x)\sim G(n,1/2,k)} F(G)p(x) = {\mathbb{E}}_G F(G) {\tilde{\mathbb{E}}}_{\mu (G)} p(x) \;\;(*)$

for every edge-poly $$F$$ of degree $$\leq \tau$$ and vertex-poly $$p$$ of deg $$\leq d$$.

Why restrict degree of $$\mu$$?
1. So it is efficiently computable.
2. Low degree map = higher entropy. (max entropy principle)
3. Because it works.

Hard part: Show that $$\mu$$ is a valid $$d$$-pdist.

### Proving psd’ness

Compute $$\mu$$ based on pseudo-calibration

Need: $$M={\tilde{\mathbb{E}}}_\mu x^{\otimes d}$$ is p.s.d

Naive approach: Show that $${\mathbb{E}}M \succeq \lambda I$$, w.h.p $${\| M \|}_{spectral} \preceq \lambda$$.

Not true.

Our approach: “Symbolic diagonalization” of $$M$$.

### Summary

SOS can’t do much better than $$\sqrt{n}$$ for planted clique.

New “creativity free” approach to designing pseudo-distributions.

Open question: “Creativity free” approach for analyzing pseudo-distributions? Universal rounding algorithm?

Opps, you cannot play draw N guess with this browser!