A Robust Version of Heged\H{u}s's Lemma, with Applications

Heged\H{u}s's lemma is the following combinatorial statement regarding polynomials over finite fields. Over a field $\mathbb{F}$ of characteristic $p>0$ and for $q$ a power of $p$, the lemma says that any multilinear polynomial $P\in \mathbb{F}[x_1,\ldots,x_n]$ of degree less than $q$ that vanishes at all points in $\{0,1\}^n$ of some fixed Hamming weight $k\in [q,n-q]$ must also vanish at all points in $\{0,1\}^n$ of weight $k + q$. This lemma was used by Heged\H{u}s (2009) to give a solution to \emph{Galvin's problem}, an extremal problem about set systems; by Alon, Kumar and Volk (2018) to improve the best-known multilinear circuit lower bounds; and by Hrube\v{s}, Ramamoorthy, Rao and Yehudayoff (2019) to prove optimal lower bounds against depth-$2$ threshold circuits for computing some symmetric functions. In this paper, we formulate a robust version of Heged\H{u}s's lemma. Informally, this version says that if a polynomial of degree $o(q)$ vanishes at most points of weight $k$, then it vanishes at many points of weight $k+q$. We prove this lemma and give three different applications.

The engine that drives the proofs of many of these results is our understanding of combinatorial and algebraic properties of polynomials. In this paper, we investigate another such naturally stated property of polynomials defined over the Boolean cube $\{0,1\}^n$ and strengthen known results in this direction. We then apply this result to sharpen known results in theoretical computer science and combinatorics.
The question we address is related to how well low-degree polynomials can 'distinguish' However, if the field F has positive characteristic and more specifically if − is divisible by , then this simple polynomial no longer works and the answer is not so clear.
In this setting, a classical theorem of Lucas tells us that if is the largest power of dividing − , then there is a polynomial of degree that distinguishes between {0, 1} and {0, 1} . A very interesting lemma of Hegedűs [23] shows that this is tight even if we only require to be non-zero at some point of {0, 1} . More precisely, Hegedűs's lemma shows the following.

Lemma 1.1 (Hegedűs's lemma). Let $\mathbb{F}$ be a field of characteristic $p > 0$. Fix positive integers $n, k, q$ such that $k \in [q, n-q]$ and $q$ is a power of $p$. If $P \in \mathbb{F}[x_1,\ldots,x_n]$ is any polynomial that vanishes at all points of $\{0,1\}^n$ of Hamming weight $k$ but does not vanish at some point of weight $k+q$, then $\deg(P) \geq q$.

1 The lemma is usually stated [23,5,25] for a more restricted choice of parameters. However, the known proofs extend to yield the stronger statement given here. A proof of a more general statement can be found in [44, Theorem 1.5].
This lemma was first proved in [23] using Gröbner basis techniques. An elementary proof of this was recently given by the author and independently by Alon (see [25]) using the Combinatorial Nullstellensatz.
Hegedűs's lemma has been used to resolve various questions in both combinatorics and theoretical computer science.
Hegedűs used this lemma to give an alternate solution to a problem of Galvin, which is stated as follows. Given a positive integer divisible by 4, what is the smallest size = ( ) of a family F of ( /2)-sized subsets of [ ] such that for any ⊆ [ ] of size /2, there is a ∈ F with | ∩ | = /4? It is easy to see that ( ) ≤ /2 for any . A matching lower bound was given by Enomoto, Frankl, Ito and Nomura [19] in the case that := ( /4) is odd. Hegedűs used the above lemma to give an alternate proof of a lower bound of in the case that is an odd prime. His proof was subsequently strengthened to a linear lower bound for all by Alon et al. [5] and more recently to a near-tight lower bound of ( /2) − ( ) for all by Hrubeš et al. [25]. Both these results used the lemma above.
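The easy upper bound of $n/2$ in Galvin's problem can be checked by brute force for small $n$. The sketch below assumes one natural construction that the text does not spell out: the $n/2$ "sliding windows" of length $n/2$. The function names `interval_family` and `is_balancing` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations

def interval_family(n):
    """Assumed construction: the n/2 windows {i, ..., i + n/2 - 1} for i = 0, ..., n/2 - 1."""
    half = n // 2
    return [frozenset(range(i, i + half)) for i in range(half)]

def is_balancing(family, n):
    """True iff every (n/2)-subset T of {0, ..., n-1} meets some S in the family in exactly n/4 points."""
    half, quarter = n // 2, n // 4
    for T in combinations(range(n), half):
        T = set(T)
        if not any(len(S & T) == quarter for S in family):
            return False
    return True

n = 8  # must be divisible by 4
family = interval_family(n)
print(len(family), is_balancing(family, n))
```

The reason this family works is a discrete intermediate-value argument: sliding the window by one position changes the intersection size by at most one, and the first window together with its complement split any set of size $n/2$ into two parts whose sizes sum to $n/2$.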
Alon et al. [5] also used Hegedűs's lemma to prove bounds for generalizations of Galvin's problem. Using this, they were able to prove improved lower bounds against syntactically multilinear algebraic circuits. These are algebraic circuits that compute multilinear polynomials in a "transparently multilinear" way (see e.g. [40] for more). Alon et al. used Hegedűs's lemma to prove near-quadratic lower bounds against syntactically multilinear algebraic circuits computing certain explicitly defined multilinear polynomials, improving on an earlier $\tilde{\Omega}(n^{4/3})$ lower bound of Raz, Shpilka and Yehudayoff [37].
Hrubeš et al. [25] also used Hegedűs's lemma to answer the following question of Ku-

Main Result.
Our main result in this paper is a 'robust' strengthening of Hegedűs's lemma.
Proving 'robust' or 'stability' versions of known results is a standard research direction in combinatorics. Such questions are usually drawn from the following template. Given the fact that objects that satisfy a certain property have some fixed structure, we ask if a similar structure is shared by objects that 'almost' or 'somewhat' satisfy the property.
In our setting, we ask if we can recover the degree lower bound in Hegedűs's lemma even if we have a polynomial that 'approximately' distinguishes between the points of weight $k$ and the points of weight $k+q$: this means that the polynomial vanishes at 'most' points of weight $k$ but is non-zero at 'many' points of weight $k+q$. Our main lemma is that under suitable definitions of 'most' and 'many', we can recover (up to constant factors) the same degree lower bound as in Lemma 1.1 above.

Lemma 1.2 (Main Result (Informal)). Assume that F is a field of characteristic $p$. Let be

1. To keep the exposition informal, we have not specified exactly what $\varepsilon$ is in the above lemma. However, we note below that the chosen $\varepsilon$ is nearly the best possible in the sense that if $\varepsilon$ is appreciably increased, then there is a sampling-based construction of a polynomial of degree $o(q)$ satisfying the hypothesis of the above lemma (see Section 3.3).

2. The reader might wonder why the lemma above is a strengthening of Hegedűs's lemma, given that we require the polynomial to be non-zero at many points of weight $k+q$, which is a seemingly stronger condition than required in Lemma 1.1. However, this is in fact a weaker condition. This is because of the following simple algebraic fact: if there is a polynomial of degree at most $d$ satisfying the hypothesis of Lemma 1.1 (i.e. vanishing at all points of weight $k$ but not at some point of weight $k+q$), then there is also a polynomial of degree at most $d$ that vanishes at all points of weight $k$ but does not vanish at a significant fraction (at least a $(1 - 1/p)$ fraction) of points of weight $k+q$. We give a short proof of this in Appendix A. Hence, the above lemma is indeed a generalization of Lemma 1.1 (up to the constant-factor losses in the degree lower bound).

2 The Majority function is the Boolean function which accepts exactly those inputs that have more 1s than 0s.
Applications. Our investigations into robust versions of Hegedűs's lemma were motivated by questions in computational complexity theory. Using our main result, we are able to sharpen and strengthen known results in complexity as well as combinatorics.
1. Degree bounds for the Coin Problem: For a parameter ∈ [0, 1/2], we define the -coin problem as follows. We are given independent tosses of a coin, which is promised to either be of bias 1/2 (i.e. unbiased) or (1/2) − , and we are required to guess which of these is the case with a high degree of accuracy, say with error probability at most . (See Definition 4.1 for the formal definition.) The coin problem has been studied in a variety of settings in complexity theory (see, e.g. [3,46,47,39,12,15]) and for various reasons such as understanding the power of randomness in bounded-depth circuits, the limitations of blackbox hardness amplification, and devising pseudorandom generators for bounded-width branching programs. More recently, Limaye et al. [31] proved optimal lower bounds on the size of $\mathrm{AC}^0[\oplus]$ circuits solving the -coin problem with constant error, strengthening an earlier lower bound of Shaltiel and Viola [39]. This led to the first class of explicit functions for which we have tight (up to polynomial factors) $\mathrm{AC}^0[\oplus]$ lower bounds. These bounds were in turn used by Golovnev, Ilango, Impagliazzo, Kabanets, Kolokolova and Tal [20] to resolve a long-standing open problem regarding the complexity of MCSP in the $\mathrm{AC}^0[\oplus]$ model, and by Potukuchi [36] to prove lower bounds for Andreev's problem.
A key result in the lower bound of Limaye et al. [31] was a tight lower bound on the degree of any polynomial ∈ F[ 1 , . . . , ] that solves the -coin problem with constant error: they showed that any such polynomial must have degree at least Ω(1/ ). As noted by Agrawal [2], this is essentially equivalent to a recent result of Chattopadhyay, Hatami, Lovett and Tal [13] on the level-1 Fourier coefficients of low-degree polynomials over finite fields, which in turn is connected to an intriguing new approach [13] toward constructing pseudorandom generators secure against $\mathrm{AC}^0[\oplus]$.
Using the robust Hegedűs lemma, we are able to strengthen the degree lower bound of [31] to a tight degree lower bound for all errors. Specifically, we show that over any field F of fixed positive characteristic , any polynomial that solves the -coin problem with error must have degree Ω( 1 log(1/ )), which is tight for all and .
2. Probabilistic degrees of symmetric functions: In a landmark paper [38], Razborov showed how to use polynomial approximations to prove lower bounds against $\mathrm{AC}^0[\oplus]$. The notion of polynomial approximation introduced (implicitly) in his result goes by the name of probabilistic polynomials, and is defined as follows. An -error probabilistic polynomial of degree for a Boolean function : {0, 1} → {0, 1} is a random polynomial of degree at most that agrees with at each point with probability at least 1 − . The -error probabilistic degree of is the least for which this holds. (Roughly speaking, a low-degree probabilistic polynomial for is an efficient randomized algorithm for , where we think of polynomials as algorithms and degree as a measure of efficiency.) Many applications of polynomial approximation in complexity theory [8] and algorithm design [50] use probabilistic polynomials and specifically bounds on the probabilistic degrees of various symmetric Boolean functions. 4 Motivated by this, in a recent result with Tripathi and Venkitesh [43], we gave a near-tight characterization of the probabilistic degree of every symmetric Boolean function. Unfortunately, however, our upper and lower bounds were separated by logarithmic factors. This can be crucial: in certain algorithmic applications (see, e.g., [4, Footnote, Page 138]), the appearance or non-appearance of an additional logarithmic factor in the degree can be the difference between (say) a truly subquadratic running time of 2− and a running time of 2− (1) , which might be less interesting.

4 Recall that a Boolean function : {0, 1} → {0, 1} is said to be symmetric if its output depends only on the Hamming weight of its input.
In the case of characteristic 0 (or growing with ), such gaps look hard to close since we don't even understand completely the probabilistic degree of simple functions like the OR function [34,22,10]. However, in positive (fixed) characteristic, there are no obvious barriers. Yet, even in this case, the probabilistic degree of very simple symmetric Boolean functions like the Exact Threshold functions (functions that accept inputs of exactly one Hamming weight) remained unresolved until this paper.
In this paper, we resolve this question and more. We are able to give a tight (up to constants) lower bound (matching the upper bounds in [43]) on the probabilistic degree of every symmetric function over fields of positive (fixed) characteristic.
3. Robust version of Galvin's problem: Given that Hegedűs's lemma was used to solve Galvin's problem, it is only natural that we consider the question of using the robust version to solve a robust version of Galvin's problem. More precisely, we define = ( , ) to be the minimum size of a family F of ( /2)-sized subsets of [ ] such that for all but an -fraction of sets of size /2, there is a set ∈ F such that | ∩ | = /4.

Proof Outline. We observe that the main lemma (Lemma 1.2) is quite similar to classical polynomial approximation results of Razborov [38] and Smolensky [41,42] (see also [45]). The main difference is that while these results hold for polynomials approximating some function on the whole cube {0, 1} , the lemma deals with polynomial approximations that are more 'local' in that they are restricted to just two layers of the cube. Nevertheless, we can show that the basic proof strategy of Smolensky (or more specifically a variant as in [6,29]) can be used to prove our lemma as well.
The main point of difference from these standard proofs is the employment of a result from discrete geometry due to Nie and Wang [35], that allows us to bound the size of the closure 5 of a small set of points in the cube. This is a well-studied object in coding theory [48] and combinatorics [14,26,35], and turns out to be a crucial ingredient in our proof.
For the application to the coin problem, we show that if a polynomial solves the coin problem (see Definition 4.1 for the formal definition of this), then it can be used to distinguish between Hamming weights and + for and as in Lemma 1.2. This reduction is done by a simple sampling argument. The degree lower bound in Lemma 1.2 then implies the desired lower bound on the degree of .

5 The degree-closure cl ( ) of a set is the set of points where any degree-polynomial vanishing throughout is forced to vanish.
In the other applications to probabilistic degree and the robust version of Galvin's problem, the idea is to follow the proofs of the previous best results in this direction and apply the main lemma at suitable points. We defer more details to the actual proofs.

Preliminaries
We use the notation [ , ] to denote an interval in R as well as an interval in Z. The distinction will be clear from context. (1 − ) + 2 /3 .

Symmetric Boolean functions
Let be a growing integer parameter which will always be the number of input variables. We use B to denote the set of all symmetric Boolean functions on variables. Note that each symmetric Standard decomposition of a symmetric Boolean function [33]. inputs such that | | ≡ (mod ). In the special case that = 0, we also use MOD . We define the -error probabilistic degree of , denoted pdeg F ( ), to be the least such that has an -error probabilistic polynomial of degree at most .

Probabilistic polynomials
When the field F is clear from context, we use pdeg ( ) instead of pdeg F ( ).
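To make the notion of a probabilistic polynomial concrete, here is a minimal sketch of the classical random-parity construction for the OR function over $\mathbb{F}_2$, usually attributed to Razborov. It is not taken from this paper's proofs, and the names `or_polynomial` and `worst_case_error` are hypothetical. The random polynomial is $1 - \prod_{i=1}^{t} (1 - \sum_{j \in S_i} x_j)$ for independent uniformly random subsets $S_1, \ldots, S_t$, which has degree at most $t$. The code computes its exact worst-case error by enumerating all choices of the subsets.

```python
from fractions import Fraction
from itertools import product

def or_polynomial(subsets, x):
    """Evaluate 1 - prod_i (1 - sum_{j in S_i} x_j) over F_2; each S_i is a 0/1 mask."""
    value = 1
    for mask in subsets:
        parity = sum(x[j] for j in range(len(x)) if mask[j]) % 2
        value = value * (1 - parity) % 2
    return (1 - value) % 2

def worst_case_error(n, t):
    """Exact max over inputs x of Pr[polynomial(x) != OR(x)] over t independent uniform subsets."""
    all_masks = list(product((0, 1), repeat=n))
    worst = Fraction(0)
    for x in all_masks:
        wrong = sum(or_polynomial(choice, x) != int(any(x))
                    for choice in product(all_masks, repeat=t))
        worst = max(worst, Fraction(wrong, len(all_masks) ** t))
    return worst

print(worst_case_error(3, 2))
```

For a non-zero input, each random parity vanishes with probability exactly 1/2, so $t$ independent parities all vanish with probability $2^{-t}$; on the all-zero input the polynomial is always correct. In this sketch, degree $t$ therefore buys error $2^{-t}$ for OR over $\mathbb{F}_2$.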

(Composition) For any Boolean function on variables and any Boolean functions
The first item above is not entirely obvious, as the polynomial is not necessarily Boolean-valued at points when ( ) ≠ ( ). Hence, it is not clear that composing with a polynomial that computes the Boolean Majority function achieves error-reduction. The second and third items above are trivial.
Building on work of Alman and Williams [4] and Lu [33], Tripathi, Venkitesh and the author [43] gave upper bounds on the probabilistic degree of any symmetric function. We recall below the statement in the case of fixed positive characteristic.

Theorem 2.5 (Known upper bounds on probabilistic degree of symmetric functions [43]).
Let F be a field of constant characteristic > 0 and ∈ N be a growing parameter. Let ∈ B be arbitrary and let ( , ℎ) be a standard decomposition of . Then we have the following for any > 0.
If per( ) is a power of , then can be exactly represented 6 as a polynomial of degree at most per( ), and hence pdeg F ( ) ≤ per( ),

A string lemma
Given a function : → {0, 1} where ⊆ N is an interval, we think of as a string from the set {0, 1} | | in the natural way. For an interval ⊆ , we denote by | the substring of obtained by restriction to .
The following simple lemma can be found, e.g. as a special case of [9, Theorem 3.1]. For completeness, we give a short proof in Appendix B.
6 While this is not part of the formal theorem statement from [43], it follows readily from the proof.
Then there exists a string ∈ {0, 1} + such that is a power of (i.e. = for some ≥ 2).
Then = and the assumption = implies = . By Lemma 2.6, there exists a string such that = for ≥ 2 and therefore per( ) < . This contradicts our assumption on .

Theorem 2.8 (Lucas's theorem). Let , be any non-negative integers and any prime. Then
The following is a standard application of Lucas's theorem, essentially observed by Lu [33] and Hegedűs [23], showing that Hegedűs's lemma is tight.

7 Recall that, for any alphabet Σ, the notation Σ^+ denotes the set of non-empty strings over this alphabet.
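As a quick sanity check, the sketch below evaluates binomial coefficients modulo a prime $p$ digit by digit, which is the usual form of Lucas's theorem: $\binom{a}{b} \equiv \prod_i \binom{a_i}{b_i} \pmod p$ for the base-$p$ digits $a_i, b_i$. It then illustrates one standard way this gives tightness, which is an assumption about the construction rather than a quote from the text. On Boolean inputs, the degree-$q$ elementary symmetric polynomial takes the value $\binom{|x|}{q}$, and when $q$ is a power of $p$ this value changes by exactly 1 modulo $p$ between weights $k$ and $k+q$. The function names are hypothetical.

```python
from math import comb

def lucas_binom_mod(a, b, p):
    """C(a, b) mod p via Lucas's theorem: multiply C(a_i, b_i) over the base-p digits."""
    result = 1
    while a or b:
        a, a_digit = divmod(a, p)
        b, b_digit = divmod(b, p)
        result = result * comb(a_digit, b_digit) % p  # comb is 0 when b_digit > a_digit
    return result

def layer_gap(k, q, p):
    """Value of C(k + q, q) - C(k, q) modulo p."""
    return (lucas_binom_mod(k + q, q, p) - lucas_binom_mod(k, q, p)) % p

for p, q, k in [(2, 4, 9), (3, 9, 30), (5, 25, 140)]:
    assert lucas_binom_mod(k, q, p) == comb(k, q) % p
    print(p, q, k, layer_gap(k, q, p))
```

Since $q = p^j$ has a single non-zero base-$p$ digit, $\binom{w}{q} \bmod p$ is just the $j$-th base-$p$ digit of $w$, and adding $q$ increments that digit modulo $p$. Hence the polynomial $\binom{|x|}{q} - \binom{k}{q}$ has degree $q$, vanishes on weight $k$, and is non-zero on weight $k+q$.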

The Main Lemma
In this section, we prove the main lemma, which is a robust version of Lemma 1.1.

Lemma 3.1 (A Robust Version of Hegedűs's Lemma).
Assume that F is a field of characteristic $p$. Let $n$ be a growing parameter and assume we have positive integer parameters $k, q$ such that $100q < k < n - 100q$ and $q$ is a power of $p$.
One can ask if the above lemma can be proved under weaker assumptions: specifically, if the upper bound in (1a) can be relaxed. It turns out that it cannot (up to changing the constant in the exponent) because for larger error parameters, there is a sampling-based construction of a polynomial with smaller degree that is zero on most of {0, 1} and non-zero on most of {0, 1} . We discuss this construction in Section 3.3.
We first prove a special case of the lemma which corresponds to the case when = + = /2 and sufficiently larger than √ . This case suffices for most of our applications. The general case is a straightforward reduction to this special case.

Remark 3.3.
By negating inputs (i.e. replacing with 1 − for each ), the above lemma also implies the analogous statements where /2 − and /2 are replaced by /2 + and /2 respectively.
Before we prove this lemma, we need to collect some technical facts and lemmas.
The following is standard. See, e.g., [ which implies the right inequality in the statement of the claim. We have used the inequality 1 − ≤ to deduce the final inequality above.
For the left inequality, we similarly have where the final inequality follows from the fact that

Given a set ⊆ {0, 1} , and a parameter ≤ , we define I ( ) to be the set of all multilinear polynomials of degree at most that vanish at all points of . Further, we define the degree-closure of , denoted cl ( ) as follows.
Note that cl ( ) ⊇ but could be much bigger than . The following result of Nie and Wang [35] gives a bound on |cl ( )| in terms of | |. (This particular form is noted and essentially proved in [35], and is explicitly stated and proved in [29, The inequality stated in the lemma is tight for certain sets of size (a good example of such a set is any Hamming ball of radius ). However, when | | is much smaller than , the parameters can be tightened. A tight form of this lemma, that gives the best possible parameters depending on | |, was proved in earlier work of Keevash and Sudakov [26] (see also the works of Clements and Lindström [14], Wei [48], Heijnen and Pellikaan [24], and Beelen and Dutta [7] that prove similar results). However, we don't need this general form of the lemma here.
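For small parameters the degree-closure can be computed directly by linear algebra. The sketch below assumes the field is $\mathbb{F}_p$ for a prime $p$ and uses the observation that a point lies in the closure exactly when its vector of monomial evaluations lies in the span of the corresponding vectors for the given set. The names `monomial_vector`, `row_reduce` and `degree_closure` are hypothetical, and the code needs Python 3.8 or later for modular inverses via `pow`.

```python
from itertools import combinations, product

def monomial_vector(x, d):
    """Evaluations at x of all multilinear monomials of degree <= d (the empty monomial is 1)."""
    n = len(x)
    return [int(all(x[i] for i in T)) for r in range(d + 1) for T in combinations(range(n), r)]

def row_reduce(rows, p):
    """Return a list of (pivot_column, row) pairs forming a basis of the row space over F_p."""
    basis = []
    for row in rows:
        row = [v % p for v in row]
        for col, b in basis:
            if row[col]:
                c = row[col]
                row = [(v - c * w) % p for v, w in zip(row, b)]
        pivot = next((j for j, v in enumerate(row) if v), None)
        if pivot is not None:
            inv = pow(row[pivot], -1, p)  # modular inverse of the pivot entry
            basis.append((pivot, [v * inv % p for v in row]))
    return basis

def degree_closure(S, n, d, p):
    """Points of {0,1}^n where every multilinear polynomial of degree <= d over F_p vanishing on S vanishes."""
    basis_rows = [b for _, b in row_reduce([monomial_vector(s, d) for s in S], p)]
    closure = []
    for x in product((0, 1), repeat=n):
        # x is in the closure iff adding its monomial vector does not increase the rank
        if len(row_reduce(basis_rows + [monomial_vector(x, d)], p)) == len(basis_rows):
            closure.append(x)
    return closure

S = [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
print(degree_closure(S, 3, 1, 2))
```

In this example over $\mathbb{F}_2$, the only degree-1 polynomials vanishing on the three points are multiples of $x_1 + x_2 + x_3$, so the closure picks up the fourth even-weight point $(0,1,1)$. Over $\mathbb{F}_3$ the same set is already closed.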
We now begin the proof of Lemma 3.2.
Given polynomials 1 , 2 as above, we construct the polynomial to be the multilinear polynomial obtained by computing the formal product · 1 · 2 and replacing by for each We observe that ( ) = 0 for all | | < . This is based on a case analysis of whether | | ≡ (mod ) or not. In the latter case, we see that 1 ( ) = 0 and hence ( ) = 0. In the former case, we have either ∈ {0, 1} − \ 0 , in which case ( ) = 0, or not, in which case On the other hand, we note that is a non-zero polynomial. This is because by (Q2.3), we know that there is some ∈ {0, 1} \ 1 where 2 ( ) ≠ 0. Further, 1 ( ) ≠ 0 and ( ) ≠ 0 by (Q1.1) and the definition of 1 respectively. Hence, ( ) ≠ 0, implying that is a non-zero multilinear polynomial.
By Fact 3.4, we thus know that has degree at least . In particular, we obtain Hence, to finish the proof of the lemma, it suffices to prove the following claims.
since by hypothesis we have To do this, we use Theorem 3.6. Note that we have where the third inequality is a consequence of Lemma 3.5 (with = and = ( + 1) for various ) and the nal inequality uses ≤ −2 .
On the other hand, the parameter from the statement of Theorem 3.6 can be lower bounded as follows.
where the second inequality follows from Lemma 3.5 (with = and = + 2 1 ) and the final inequality uses the fact that 1 > /30 = Putting the above together with (4) immediately yields Using Theorem 3.6, we thus obtain where the last inequality follows from Stirling's approximation. Having shown (3), the claim now follows.

The General Case
We start with some preliminaries.
We first show a simple 'error-reduction' procedure for polynomials. In particular, the above holds for a uniformly random chosen from {0, 1} . Hence, we have We are now ready to prove the main lemma in its full generality. We consider now two cases. Let be a large constant that will be fixed below. By Lemma 3.10, we know that there is a probabilistic polynomial ( ) of degree at most · deg( ) such that for each ∈ { , }, we The proof will proceed by another restriction to variables, where is defined to be the largest even integer such that 100 ≤ 2 . We assume that is greater than a large enough absolute constant, since otherwise is upper bounded by a fixed constant, in which case the degree bound to be proved is trivial. Note that := 2 / ≥ 100 by definition. We also have = ( 2 /100) − 2, which implies that ≤ 100 + (1)/ 2 ≤ 101, as long as is greater than a large enough absolute constant.
Relabel the variables so that is a polynomial in 1 , . . . , . Let be a uniformly random where the first inequality uses (5).
By Markov's inequality as above, there is a fixed choice of ( ) , , and such that the corresponding polynomial is a polynomial on variables satisfying ( ) ≤

Tightness of the Main Lemma (Lemma 3.1)
In this section, we discuss the near-optimality of Lemma 3.1 w.r.t. the various parameters.
First of all, we note that the degree lower bound obtained cannot be larger than , because by Corollary 2.9, it follows that there is a degree-polynomial that vanishes at all points of weight but no points of weight .
So, the statement of Lemma 3.1 proves a lower bound on the degree that nearly (up to constant factors) matches this trivial upper bound, under the weaker assumption that the polynomial is forced to be zero only on most (say a 1 − fraction) of {0, 1} and non-zero on most (say a 1 − fraction) of {0, 1} . (Lemma 3.1 is a stronger statement, but we will show that even this weaker statement is tight.) In this section, we show that the value of cannot be increased beyond = exp(− ( 2 / )), if we want to prove a lower bound of Ω( ) on the degree. More precisely, we show the following.
Reducing the coefficients modulo , we obtain a polynomial ˜ ∈ F[ 1 , . . . , ] with the same property. Fix this ˜.
We define ( 1 , . . . , ) to be the polynomial ˜( 1 , . . . , ). Note that

8 This lemma has a trivial proof via univariate polynomial interpolation if we only want the polynomial to have rational coefficients. However, here it is important that has integer coefficients.

in [44].

Remark 3.15.
As in the case of the main lemma, the degree lower bound obtained above is tight, using the same reasoning as in Section 3.3.

Proof. W.l.o.g. assume = + .

Let
= / and = / − 1. Our aim will be to show using the polynomial that there is a polynomial on variables that distinguishes between Hamming weights and := + . We will then appeal to Lemma 3.1 to get the degree lower bound.
It is easy to check that 100 < < − 100 as 100 where we used the hypotheses that 200 < < − 200 .
Each co-ordinate of is repeated times to get an ∈ {0, 1} .
A uniformly random permutation is applied to the coordinates of to get .
Finally, we define the probabilistic polynomial ( ) := ( ). For a fixed permutation , each coordinate of is a polynomial of degree at most 1 in the variables 1 , . . . , , and hence, deg( ) ≤ deg( ). We will show that there is some polynomial in the support of that has the desired properties.
To find a suitable fixing of , we consider two cases.
Putting (7), (8), (10) and (9) together gives us that in both cases we have This is a special case of the Boole-Bonferroni inequalities, which are closely related to the Principle of Inclusion-Exclusion.
To apply Lemma 3.1 to , we need to relate the above bounds to quantities defined in terms of := / and := / . We claim that Assuming these inequalities, we observe that satisfies the hypotheses of Lemma 3.1. Applying this lemma gives us finishing the proof of Lemma 3.13.
It remains to prove (12), which is a simple calculation.
where the final inequality uses the fact that ≤ ≤ 0.01.

Tight Degree Lower Bounds for the Coin Problem
We start with a definition. In earlier work [31], we showed that this was tight for constant . That is, we showed that any polynomial that solves the -coin problem with error at most 1/10 (say) must have degree Ω(1/ ). This was also implied by an independent result of Chattopadhyay, Hatami, Lovett and Tal [13] (see [2]). Both proofs relied on slight strengthenings of Smolensky's [41] lower bound on polynomials approximating the Majority function. It is not clear from these proofs, however, if this continues to be true for subconstant . The main lemma (Lemma 3.1), or even its simpler version Lemma 3.2, shows that this is indeed true.
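As a small numerical illustration of the coin problem itself (not of the degree lower bound), the sketch below computes the exact error of a simple threshold test on the number of heads. Everything in it is an assumption made for illustration: the names `N` and `delta`, the choice of cutoff halfway between the two expected head counts, and the function name `threshold_test_errors`. The test is not claimed to be a low-degree polynomial.

```python
from fractions import Fraction
from math import comb

def threshold_test_errors(N, delta):
    """Exact errors of the test that answers 'biased' iff #heads < (1/2 - delta/2) * N.

    delta is a Fraction; the coin shows heads with probability 1/2 or 1/2 - delta.
    Returns (error on the fair coin, error on the biased coin).
    """
    cutoff = (Fraction(1, 2) - delta / 2) * N

    def prob_heads_below_cutoff(p):
        q = 1 - p
        return sum(comb(N, h) * p**h * q**(N - h) for h in range(N + 1) if h < cutoff)

    err_fair = prob_heads_below_cutoff(Fraction(1, 2))                 # says 'biased' on a fair coin
    err_biased = 1 - prob_heads_below_cutoff(Fraction(1, 2) - delta)   # says 'fair' on a biased coin
    return err_fair, err_biased

err_fair, err_biased = threshold_test_errors(1000, Fraction(1, 10))
print(float(err_fair), float(err_biased))
```

By Hoeffding's inequality both errors in this example are below $e^{-5}$. This only shows that enough tosses make the problem solvable; how low the degree of a polynomial solving it can be is the question addressed by the theorem that follows.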

Theorem 4.2 (Tight Degree Lower Bound for the -coin problem for all errors). Assume
F has characteristic and , are parameters going to 0. Let ≥ 1 be any positive integer.

Proof.
We assume that is smaller than some small enough constant 0 (for larger , we can just appeal to the lower bound of [31]).
Assume for now that = 1/ for some integer ≥ 1. Fix to be the least even integer such that ≥ 2 log(1/ ) for a large constant and := is a power of the characteristic .

Tight Probabilistic Degree Lower bounds for Positive Characteristic
We start with some basic notation and definitions and then state our result.
Throughout this section, let F be a field of fixed (i.e. independent of ) characteristic > 0.
The main theorem of this section characterizes (up to constant factors) the -error probabilistic degree of every symmetric function and for almost all interesting values of .

Theorem 4.3 (Probabilistic Degree lower bounds over positive characteristic).
Let ∈ N be a growing parameter. Let ∈ B be arbitrary and let ( , ℎ) be a standard decomposition of (see Section 2 for the definition). Then for any ∈ [1/2 , 1/3], we have Here the Ω(·) notation hides constants depending on the characteristic of the field F.
Note that this matches the upper bound construction from Theorem 2.5.

Definition 4.4 (Restrictions).
Given functions ∈ B and ∈ B where ≤ , we say that is a restriction of if there is some ∈ [0, − ] such that the identity holds for every ∈ {0, 1} . Or equivalently, that can be obtained from by setting some inputs to 0 and 1 respectively. 10 We will use the following obvious fact freely.

Observation 4.5.
If is a restriction of , then for any > 0, pdeg ( ) ≤ pdeg ( ).
In earlier work with Tripathi and Venkitesh [43], we showed the following near-optimal lower bound on the probabilistic degrees of Threshold functions. (The corresponding lemma in [43] is only stated for ≤ /2. However, as Thr +1− ( ) = 1 − Thr (1 − 1 , . . . , 1 − ), the above lower bound holds for > /2 also.)

10 Note that exactly which inputs are set to 0 or 1 is not important, since we are dealing with symmetric Boolean functions.
The following classical results of Smolensky prove optimal lower bounds on the probabilistic degrees of some interesting classes of symmetric functions.

Lemma 4.8 (Smolensky's lower bound for MOD functions [41]). For 2 ≤ ≤ /2, any F such that char(F) is either zero or coprime to , any ∈ (1/2 , 1/(3 )), there exists an ∈ [0, − 1] such that We now show how to use our robust version of Hegedűs's lemma to prove Theorem 4.3. In fact, Lemma 3.2 will suffice for this application.

Strategy and two simple examples
The probabilistic degree lower bounds below will use the following corollary of Lemma 3.2.

Corollary 4.9.
Let be a growing parameter and assume ∈ [2 − /100 , −200 ]. Assume is an integer such that is a power of and furthermore, = √ for some ∈ R such that 100 ≤ ≤ 1 2 · ln(1/ ). Let ℎ ∈ B be any function such that Spec ℎ( /2 ) ≠ Spec ℎ( /2 − ). Then, pdeg (ℎ) = Ω( ). To illustrate the usefulness of Corollary 4.9, we prove optimal lower bounds on the probabilistic degrees for two interesting classes of functions (both of which will be subsumed by Known lower bounds (Lemmas 4.7 and 4.8) can be used to prove similar lower bounds to the one given above, but with additional log-factor losses (see Lemma 4.8, which requires the error to be subconstant, and [43]). However, we do not know how to prove the above tight (up to constants) lower bound without appealing to Lemma 3.2. In particular, we do not know how to prove the above in characteristic 0.

Proof.
We use Corollary 4.9. We will use EThr /2 and MOD to construct functions that distinguish between weights /2 and /2 − for suitable = Ω( √ ). Corollary 4.9 then implies the required lower bound.

Proof of Theorem 4.3
The proof of this theorem closely follows our probabilistic degree lower bounds in [43] with careful modifications to avoid the log-factor losses therein.
Let ∈ B be arbitrary and let ( , ℎ) be a standard decomposition of .
We start with a lemma that proves lower bounds on pdeg ( ) as long as per( ) is large.  Note that by the bounds on assumed above Using Corollary 4.9, we hence get On the other hand, if > −10000 2 , we proceed as follows. We construct as above, but we may no longer have ≥ 20 √ as implied by (14). However, for By error reduction (Fact 2.4 item 1), the same lower bound holds for pdeg ( ) as well.
The next lemma allows us to prove a weak lower bound on pdeg ( ) depending only on its periodic part .
is a power of . In this case, we first choose parameters , with the following properties. (P1) ∈ [ ] with ≥ 20 and ≡ (mod 2).
(P2) 1/3 ≥ ≥ max{ , 1/2 }. We will show below how to find , satisfying these properties. Assuming this for now, we first prove the lower bound on pdeg ( ). )) and is a restriction of , the same lower bound holds for pdeg ( ) as well. This proves the lemma modulo the existence of , as above. We justify this now.
The parameter is set to 1/3.
Note that as observed above, we have ≤ /100, and hence, the above analysis subsumes all cases.
In each case, the verification of properties (P1)-(P4) is a routine computation. (We assume here that is greater than a suitably large constant, since otherwise the statement of the lemma is trivial.) This concludes the proof.
We do this by a case analysis based on the relative magnitudes of log(1/ ) and .

Note that as
By Lemma 4.13, it su ces to show a lower bound of Ω(per( ) + pdeg (ℎ)).
The analysis splits into two simple cases.
This finishes the proof.

A Robust Version of Galvin's Problem
We recall here a combinatorial theorem of Hegedűs [23] regarding set systems. The theorem (and also our robust generalization given below) is easier to prove in the language of indicator vectors, so we state it in this language.
Using the robust version of Hegedűs's lemma, we can prove tight robust versions of the above statement.

Remark 4.15.
We can prove a robust generalization (stated below) in a slightly more general setting where the th inner product ( ) , is supposed to take a value (which is not necessarily ). Similar to Theorem 4.14 above, it is easy to note that our robust version is tight up to constant factors.
However, if we consider the robust version of the original statement of Theorem 4.14 (where all the inner products take value ), then while our lower bound continues to hold, it is not clear whether it is tight (except in the settings where is either a constant or 2 −Ω( ) ). We conjecture that it is.
We now prove a robust version of Theorem 4.14.
We need the following standard bound on binomial coefficients. For completeness, we include the proof in Appendix C.
Given the above, we can prove Theorem 4.16 as follows.
Hence, we may assume that is smaller than any fixed constant. We can also assume that ≥ 2 − for a small enough constant . Assume that ≤ √︁ log(1/ ).
for a large enough constant . Informally speaking, the reason for this inequality is as follows: the expected value of ( ) , is ( /4) − /2 and any number ≡ (mod ) is far from this expectation. To prove this, let = /2 − . Note that = Ω( ) as long as is small enough in relation to , which happens if is assumed to be a small enough constant. Using the fact that  | | = ⇒ ( ) = 0 ( ) = 1.
As the above linear system is over F ⊆ F, we note that we may assume that ∈ F [ 1 , . . . , ].
From now on, we assume that F = F .
Let , denote the vector space of all multilinear polynomials of degree at most that vanish at all points in {0, 1} . Let be a uniformly random element of , . For any ∈ {0, 1} \ , standard linear algebra implies that ( ) is a uniformly random element of F = F . In particular, for any ∈ {0, 1} + , we see that In particular, there is a ∈ , that is non-zero at at least a (1 − 1/ ) fraction of points in {0, 1} + . This yields the statement of the claim.

B. Proof of Lemma 2.6 (the string lemma)
We begin by recalling the statement of the lemma.
Then there exists a string ∈ {0, 1} + such that is a power of (i.e. = for some ≥ 2).

Proof.
Assume that | | = , | | = and | | = + = . We will show in fact that both and are powers of the same non-empty string . This will clearly imply the lemma.
The proof is by induction on the length of . The base case of the induction corresponds to = 2, which is obvious.
We now proceed with the inductive case. Assume w.l.o.g. that ≤ . As = , we see that the first symbols in match those of , and hence we have = for some ∈ {0, 1} − . If = , this implies that = and we are immediately done. Otherwise, we see that = = for a non-empty string . Hence, we have = . By the induction hypothesis, we know that both and are powers of some non-empty . Hence, so is . This concludes the proof.
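The induction above is effectively a Euclid-style procedure on strings, and it can be run directly. The sketch below assumes the standard reading of the lemma: two non-empty strings that commute under concatenation are both powers of a common string. The names `u`, `v` and `common_root` are hypothetical, since the symbols did not survive in the text. At each step the shorter string is stripped off the front of the longer one, exactly as in the inductive case.

```python
def common_root(u, v):
    """Given non-empty strings with u + v == v + u, return w such that u and v are powers of w."""
    if not u or not v or u + v != v + u:
        raise ValueError("expected non-empty strings with u + v == v + u")
    while u != v:
        if len(u) > len(v):
            u, v = v, u
        # Here len(u) < len(v) and v starts with u; the remainder still commutes with u.
        v = v[len(u):]
    return u

w = common_root("abab", "ababab")
print(w, "abab" == w * 2, "ababab" == w * 3)
```

Commuting strings of equal length must be equal, so the loop only continues while the lengths differ, and the total length strictly decreases in each iteration. This mirrors the termination of the induction on the length of the concatenation.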

C. Proof of Claim 4.17
We first restate the claim.
The claim then follows by a simple induction on − .
To prove (16), we proceed as follows. By an expansion of binomial coefficients in terms of factorials, we see that