Testing Distributions of Huge Objects

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings. Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings). Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings) as well as for the distance between objects (resp., strings), and the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings. We study the query complexity of testing in this new model, focusing on three directions. First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model. Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings). Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model: Two such cases are uniform distributions and pairs of identical distributions.


Introduction
In the last couple of decades, the area of property testing has attracted much attention (see, e.g., a recent textbook [13]). Loosely speaking, property testing typically refers to sub-linear time probabilistic algorithms for deciding whether a given object has a predetermined property or is far from any object having this property. Such algorithms, called testers, obtain local views of the object by making adequate queries; that is, the object is modeled as a function and testers get oracle access to this function (and thus may be expected to work in time that is sub-linear in the size of the object).
The foregoing description fits much of the research in the area (see [13]), but not the part that deals with testing properties of distributions (aka distribution testing; see [13, Chap. 11] and [7]). In this context, a tester gets samples from the tested distribution, and sub-linearity means sub-linearity in the size of the distribution's domain. Each element in the domain is considered to be small, and is assumed to be processed in unit time.
In this work we consider distributions over sets of huge (or very large) objects, and aim at complexities that are sublinear in the size of these objects. As an illustrative example, think of the distribution of DNA-sequences in a large population. We wish to sample this distribution and query each sampled sequence at locations of our choice rather than read the entire sample.
One key issue is the definition of the distance between such distributions (i.e., distributions of huge objects). A natural choice, which we use, is the earth mover's distance under the (relative) Hamming measure. Under this measure, the distance between distributions reflects the probability mass of the difference when weighted according to the Hamming distance between strings (see Definition 1.1).

The new model
We consider properties of distributions over sets of objects that are represented by n-bit long strings (or possibly n-symbol long sequences); that is, each object has size n. (In Section 5 this will be extended to properties of tuples of distributions.) Each of these objects is considered huge, and so we do not read it in full but rather probe (or query) it at locations of our choice. Hence, the tester is an algorithm that may ask for few samples, and queries each sample at locations of its choice. This is modeled as getting oracle access to several oracles, where each of these oracles is selected independently according to the tested distribution (see Definition 1.2). We shall be mainly interested in the total number of queries (made into these samples), whereas the number of samples will be a secondary consideration.
The distance between such distributions, P and Q (over the same domain Ω = {0,1}^n), is defined as the earth mover's distance under the Hamming measure; that is, the cost of transforming the distribution P to the distribution Q, where the cost of transforming a string x to a string y equals their relative Hamming distance.

Definition 1.1 (distance between distributions over huge objects): For two strings x, y ∈ {0,1}^n, let ∆_H(x,y) denote the relative Hamming distance between them; that is,

  ∆_H(x,y) = (1/n) · |{i ∈ [n] : x_i ≠ y_i}|.   (1)

The distance between the distributions P and Q is the minimum, over all joint distributions (X,Y) whose marginal distributions equal P and Q respectively, of the expected value of ∆_H(X,Y). Note that this distance is always upper-bounded by the total variation distance between P and Q (since ∆_H(x,y) ≤ 1 for all x, y).

Definition 1.2 (testing distributions over huge objects, the DoHO model): A tester of sample complexity s for a property D of distributions over {0,1}^n is a probabilistic machine T that, on input parameters n and ε, and oracle access to s = s(n,ε) strings drawn independently from an unknown distribution P over {0,1}^n, accepts with probability at least 2/3 if P is in D, and rejects with probability at least 2/3 if P is ε-far from D (according to Definition 1.1). We say that q : N × (0,1] → N is the query complexity of T if q(n,ε) is the maximum number of queries that T makes on input parameters n and ε. If the tester accepts every distribution in D with probability 1, then we say that it has one-sided error.
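For intuition, the distance of Definition 1.1 can be computed by hand on tiny examples. The following sketch (our own illustration, not part of the paper's formalism) handles the special case of two distributions that are uniform over equal-size supports, where the earth mover's distance reduces to a minimum-cost perfect matching, found here by brute force:

```python
from itertools import permutations

def rel_hamming(x, y):
    # Relative Hamming distance between equal-length strings (Eq. (1)).
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y)) / len(x)

def emd_uniform(support_p, support_q):
    # Earth mover's distance of Definition 1.1, specialized to distributions
    # that are uniform over equal-size supports; there it equals the cost of
    # a minimum-cost perfect matching, searched here by brute force.
    assert len(support_p) == len(support_q)
    m = len(support_p)
    best = min(
        sum(rel_hamming(x, y) for x, y in zip(support_p, perm))
        for perm in permutations(support_q)
    )
    return best / m  # each support element carries probability mass 1/m
```

For example, the uniform distributions over {00, 11} and over {01, 11} are at distance 1/4: the optimal transformation matches 11 to itself and moves the mass of 00 to 01 at relative Hamming cost 1/2.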
We may assume, without loss of generality, that the tester queries each of its samples, and that it never makes the same query twice. Hence, q(n,ε) ∈ [s(n,ε), s(n,ε) · n].
The sample (resp., query) complexity of testing the property D (in the DoHO model) is the minimal sample (resp., query) complexity of a tester for D (in the DoHO model). Note that the tester achieving the minimal sample complexity is not necessarily the one achieving the minimal query complexity. As stated before, we shall focus on minimizing the query complexity, while using the sample complexity as a yardstick.
Generalization. The entire definitional treatment can be extended to n-long sequences over an alphabet Σ, where above (in Definitions 1.1 and 1.2) we used Σ = {0, 1}.

The standard notions of testing as special cases (and other observations)
We first observe that both the standard model of property testing (of strings) and the standard model of distribution testing are special cases of Definition 1.2.
Standard property testing (of strings): Specifically, we refer to testing properties of n-bit strings (equiv., Boolean functions over [n]).
This special case corresponds to trivial distributions, where each distribution is concentrated on a single n-bit long string. Hence, a standard tester of query complexity q can be viewed as a tester in the sense of Definition 1.2 that has sample complexity 1 and query complexity q.
Standard distribution testing: Specifically, we refer to testing distributions over Σ.
This special case corresponds to the case of n = 1, where each distribution is over Σ. Hence, a standard distribution tester of sample complexity s can be viewed as a tester in the sense of Definition 1.2 that has sample complexity s and query complexity q = s. Indeed, here we used the generalization of the definitional treatment to sequences over Σ. The basic version, which refers to bit sequences, can be used too (with a small overhead).

Needless to say, the point of this paper is going beyond these standard notions. In particular, we seek testers (for the DoHO model) with query complexity q(n,ε) = o(n) · s(n,ε), where s(n,ε) > 1 is the sample complexity in the DoHO model. Furthermore, our focus is on cases in which s(n,ε) is relatively small (e.g., s(n,ε) = poly(n/ε) and even s(n,ε) = o(n) · poly(1/ε)), since in these cases a factor of n matters more.
We mention that the sample complexity in the DoHO model is upper-bounded by the sample complexity in the standard distribution testing model. This is the case because the distance between pairs of distributions according to Definition 1.1 is upper-bounded by the total variation distance between them (see the discussion following Definition 1.1).
Observation 1.3 (on the sample complexity of testing distributions in two models): The sample complexity of testing a property D of distributions over {0, 1} n in the DoHO model is upper-bounded by the sample complexity of testing D in the standard distribution testing model.
We mention that for some properties D the sample complexity in the DoHO model may be much lower than in the standard distribution testing model, because in these cases the distance measure in the DoHO model is much smaller than the total variation distance. Needless to say, this is not true in general, and we shall focus on cases in which the two sample complexities are closely related. In other words, we are not interested in the possible gap between the sample complexities (in the two models), but rather in the query complexity in the DoHO model. Furthermore, we are willing to increase the sample complexity of a tester towards reducing its query complexity in the DoHO model (e.g., see our tester for uniformity).

Our Results
We present three types of results. The first type consists of general results that relate the query complexity of testing in the DoHO model to the query and/or sample complexity of related properties in the standard (distribution and/or string) testing models. The second type consists of results for properties that have been studied (some extensively) in the standard distribution testing model. The third type consists of results for new properties that arise naturally in the DoHO model.

Some general bounds on the query complexity of testing in the DoHO model
A natural class of properties of distributions over huge objects is the class of all distributions that are supported by strings that have a specific property (of strings). That is, for a property of bit strings Π = {Π_n}_{n∈N} such that Π_n ⊆ {0,1}^n, let D_Π = {D_n}_{n∈N} such that D_n denotes the set of all distributions whose support is a subset of Π_n. We observe that the query complexity of testing the set of distributions D_Π (in the DoHO model) is related to the query complexity of testing the set of strings Π (in the standard model of testing properties of strings).
Theorem 1.4 (from testing strings for membership in Π to testing distributions for membership in D_Π): If the query complexity of testing Π is q, then the query complexity of testing D_Π in the DoHO model is at most q′ such that q′(n,ε) = O(1/ε) · q(n, ε/2).
While the proof of Theorem 1.4 is simple, we believe it is instructive towards getting familiar with the DoHO model. We thus include it here, while mentioning that some ramifications of it appear in Appendix A.2.

Proof:
The main observation is that if the tested distribution P (whose domain is {0,1}^n) is ε-far from D_n (according to Definition 1.1), then, with probability at least ε/2, an object x selected according to P is ε/2-H-far from Π_n (i.e., at relative Hamming distance at least ε/2 from every string in Π_n). Hence, with high constant probability, a sample of size O(1/ε) will contain at least one string that is ε/2-H-far from Π_n. If we have a one-sided error tester T for Π, then we can detect this event (and reject) by running T (with proximity parameter ε/2) on each sampled string. If we only have a two-sided error tester for Π, then we invoke it O(log(1/ε)) times on each sample, and reject if the majority rule regarding any of these samples is rejecting. Hence, in total we make O(ε⁻¹ log(1/ε)) · q(n, ε/2) queries.
An opposite extreme. Theorem 1.4 applies to any property Π of strings and concerns the set of all distributions that are supported by Π (i.e., all distributions P that satisfy {x : P(x) > 0} ⊆ Π). Hence, Theorem 1.4 focuses on the support of the distributions and pays no attention to all other aspects of the distributions. The other extreme is to focus on properties of distributions that are invariant under relabeling of the strings (i.e., label-invariant properties of distributions).⁴ We consider several such specific properties in Section 1.3.2, but in the current section we seek more general results. Our guiding question is the following.
Open Problem 1.5 (a key challenge, relaxed formulation):⁵ For which label-invariant properties of distributions does it hold that testing them in the DoHO model has query complexity poly(1/ε) · O(s(n, ε/2)), where s is the sample complexity of testing them in the DoHO model?
Jumping ahead, we mention that in Section 1.3.2 we identify two label-invariant properties for which the relation between the query complexity and the sample complexity is as stated in Problem 1.5, and one for which this relation does not hold. More generally, we show that a relaxed form of such a relation (in which s is the sample complexity in the standard model) is satisfied for any property that is closed under mapping, where a property of distributions D is closed under mapping if, for every distribution P in D and every mapping F : {0,1}^n → {0,1}^n, the distribution F(P) is also in D.

Theorem 1.6 (testing distributions that are closed under mapping (see Theorem 2.2)): Suppose that D = {D_n} is testable with sample complexity s(n,ε) in the standard model, and that each D_n is closed under mapping. Then, D is testable in the DoHO model with query complexity O(ε⁻¹ · s(n, ε/2)).
Recall that a tester of sample complexity s in the standard distribution testing model constitutes a tester of sample complexity s in the DoHO model, alas this tester has query complexity n · s (whereas our focus is on the case that n ≫ poly(ε⁻¹ log s(n, ε/2))). We wonder whether a result similar to Theorem 1.6 holds when s is the sample complexity in the DoHO model.⁶

A middle ground between properties that contain all distributions that are supported by a specific set of strings and label-invariant properties of distributions is provided by properties of distributions that are label-invariant only on their support, where the support of a property of distributions is the union of the supports of all distributions in this property. That is, for a property D_n of distributions over n-bit strings, we say that D_n is label-invariant over its support if, for every bijection π : {0,1}^n → {0,1}^n that preserves the support of D_n (i.e., x is in the support if and only if π(x) is in the support), it holds that the distribution P : {0,1}^n → [0,1] is in D_n if and only if π(P) is in D_n.

⁴ Recall that a property of distributions over {0,1}^n is called label-invariant if, for every bijection π : {0,1}^n → {0,1}^n and every distribution P, it holds that P is in the property if and only if π(P) is in the property, where Q = π(P) is the distribution defined by Q(y) = P(π⁻¹(y)). We mention that label-invariant properties of distributions are often called symmetric properties.

⁵ Less relaxed formulations may require query complexity O(s(n, ε/2)/ε) or even O(s(n, ε)). On the other hand, one may ease the requirement by comparing the query complexity in the DoHO model to the sample complexity in the standard model.

⁶ Such a result was wrongly claimed in Revision 1 of our ECCC TR21-133. Partial progress towards such a result is presented in Appendix A.3.
Indeed, generalizing Problem 1.5, one may ask the following.

Open Problem 1.7 (a more general challenge): For which properties of distributions that are label-invariant over their support does it hold that testing them in the DoHO model has query complexity poly(1/ε) · O(s(n, ε/2) · q(n, ε/2)), where s is the sample complexity of testing them in the DoHO model and q is the query complexity of testing their support?
The next theorem identifies a sufficient condition for a positive answer. Specifically, it requires that the support of the property, denoted S, has a (relaxed) self-correction procedure of query complexity q. We mention that such procedures may exist only in case the strings in S are pairwise far apart. Loosely speaking, on input i ∈ [n] and oracle access to an n-bit string x, the self-correction procedure is required to return x_i if x ∈ S, to reject if x is far from S, and otherwise it should either reject or return the i-th bit of the string in S that is closest to x.

Theorem 1.8 (self-correction-based testers in the DoHO model, loosely stated (see Theorem 3.1)): Let D be a property of distributions over bit strings that is label-invariant over its support. Then, ignoring polylogarithmic factors, the query complexity of testing D in the DoHO model is upper-bounded by the product of the sample complexity of testing D in the standard model and the query complexity of testing and self-correcting the support of D.
One natural example to which Theorem 1.8 is applicable is the set of all distributions each of which has a support that consists of a few low-degree multivariate polynomials; for a size bound s(n) and a degree bound d(n), we get query complexity poly(d(n)/ε) · O(s(n)).
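To illustrate the kind of self-correction procedure that Theorem 1.8 assumes, here is the classical self-corrector for the Hadamard code (the truth table of a linear function), given as our own illustrative sketch; it recovers any bit by a majority vote over random shifts, and a full relaxed self-corrector would additionally reject strings that are far from all codewords:

```python
import random

def hadamard_self_correct(oracle, i, n, trials=9):
    # Self-correction for the Hadamard code: if `oracle` is (a mildly
    # corrupted copy of) the truth table of a linear function f over indices
    # 0..n-1 (with n a power of two), then f(i) = f(r) xor f(r xor i) for
    # every shift r, so a majority over random shifts recovers bit i.
    votes = sum(oracle(r) ^ oracle(r ^ i)
                for r in (random.randrange(n) for _ in range(trials)))
    return int(votes > trials // 2)
```

Each recovered bit costs two queries per trial, so the self-correction query complexity here is O(1) per bit, independently of n.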

Testing previously studied properties of distributions
Turning back to label-invariant properties of distributions, we consider several such properties that were studied previously in the context of the standard distribution testing model. Specifically, we consider the properties of having bounded support size (see, e.g., [18]), being uniform over a subset of specified size (see, e.g., [2]), and being m-grained (see, e.g., [15]).

Theorem 1.9 (testers for support size, uniformity, and m-grained in the DoHO model (see Corollary 2.3)): For any m, the following properties of distributions over {0,1}^n can be tested in the DoHO model using poly(1/ε) · O(m) queries:

1. All distributions having support size at most m.
2. All distributions that are uniform over some set of size m.
3. All distributions that are m-grained.
Theorem 1.9 is proved by using Theorem 1.6. The foregoing upper bounds are quite tight. They also provide positive and negative cases regarding Problem 1.5 (see discussion following Theorem 1.10).
Theorem 1.10 (lower bounds on testing support size, uniformity, and m-grained in the DoHO model (see Propositions 2.8, 2.10 and 2.9)):

1. For every m ≤ 2^{n−Ω(n)}, testing whether a distribution over {0,1}^n has support size at most m requires Ω(m/log m) samples.

2. For every constant c < 1 and m ≤ n, testing whether a distribution over {0,1}^n is uniform over some subset of size m requires Ω(m^c) queries.

3. For every constant c < 1 and m ≤ 2^{n−Ω(n)}, testing whether a distribution over {0,1}^n is m-grained requires Ω(m^c) samples.
Note that Parts 1 and 3 assert lower bounds on the sample complexity in the DoHO model, which imply the same lower bounds on the query complexity in this model. Combining the first part of Theorems 1.9 and 1.10 yields a property that satisfies the requirement of Problem 1.5; that is, the query complexity in the DoHO model is closely related to the sample complexity (in this model).
On the other hand, combining Part 2 of Theorem 1.10 with the tester of [2,9] yields a property that does not satisfy the requirement in Problem 1.5, since this tester uses O(m^{2/3}/ε²) samples (even in the standard distribution testing model).

Tuples of distributions. In Section 5 we extend the DoHO model to testing tuples (e.g., pairs) of distributions, and consider the archetypical problem of testing equality of distributions (cf. [4,5]).
In this case, we obtain another natural property that satisfies the requirement of Problem 1.5.
Theorem 1.11 (a tester for equality of distributions (see Theorem 5.2)): For any m, n ∈ N and ε > 0, given a pair of distributions over {0,1}^n that have support size at most m, we can distinguish between the case that the distributions are identical and the case that they are ε-far from one another (according to Definition 1.1) using O(m^{2/3}/ε³) queries and O(m^{2/3}/ε²) samples.
We note that m^{2/3}/ε² is a proxy for max(m^{2/3}/ε^{4/3}, m^{1/2}/ε²), which is a lower bound on the sample complexity of testing this property in the standard distribution testing model [21]. This lower bound can be extended to the DoHO model. Hence, in this case, the query complexity in the DoHO model is quite close to the sample complexity in this model.

Distributions as variations of an ideal object
A natural type of distributions over huge objects arises by considering random variations of some ideal objects. Here we assume that we have no access to the ideal object, but do have access to a sample of random variations of this object, and we may be interested both in properties of the ideal object and in properties of the distribution of variations. In Section 4, we consider three types of such variations, and provide testers for the corresponding properties.
1. Noisy versions of a string, where we bound the noise level.
In this case it is easy to recover bits of the original string, and test that the noisy versions respect the predetermined noise level.
2. Random cyclic-shifts of a string.
In this case we use a tester of cyclic-shifts (i.e., given two strings the tester checks whether one is a cyclic shift of the other).
3. Random isomorphic copies of a graph represented by its adjacency matrix.
In this case we use an isomorphism tester.
We stress that the testers employed in the last two cases have sublinear complexity; specifically, pairs of n-bit long strings are tested using n 0.5+o(1) queries.
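For the first case, bit recovery amounts to a simple majority vote across sampled copies. The following sketch (our own illustration, assuming the noise level is bounded strictly below 1/2) recovers a bit of the ideal string from a few sampled noisy variations:

```python
from collections import Counter

def recover_bit(sample_oracles, i):
    # Majority vote over sampled noisy copies: if each copy flips position i
    # independently with probability bounded below 1/2, the majority equals
    # the ideal string's bit i with probability growing in the sample count.
    votes = Counter(oracle(i) for oracle in sample_oracles)
    return votes.most_common(1)[0][0]
```

Having recovered (with high probability) the bits of the ideal string at positions of our choice, one can then test that the sampled variations respect the predetermined noise level by comparing them to the recovered bits at random positions.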

Orientation and organization
As stated upfront, we seek testers that sample the distribution but do not read any of the samples entirely (and rather probe some of their bits).
In general, our proofs build on first principles, and are not technically complicated. Rather, each proof is based on one or a few observations, which, once made, lead the way to obtaining the corresponding result. Hence, the essence of these proofs is finding the right point of view from which the observations arise.
Upper bounds. Some of our testers refer to label-invariant properties, and in this case it suffices to determine which samples are equal and which are different. Furthermore, viewing close samples as equal does not really create a problem, because we are working under Definition 1.1. Hence, testing equality between strings suffices, and it can be performed by probing few random locations in the strings. However, the analysis does not reduce to the foregoing comments, because we cannot afford to consider all strings in the (a priori unknown) support of the tested distribution. Instead, the analysis refers to the empirical distribution defined by the sequence of samples.
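The equality test alluded to above can be sketched as follows (our own illustration): probing O(1/ε) uniformly random positions distinguishes identical strings from pairs that are ε-far in relative Hamming distance, with constant error probability:

```python
import math
import random

def probably_equal(oracle_x, oracle_y, n, eps):
    # Probe O(1/eps) random positions; identical strings always pass, while
    # strings at relative Hamming distance >= eps are caught with probability
    # at least 1 - (1 - eps)^t >= 1 - e^{-2}.
    t = math.ceil(2 / eps)
    return all(oracle_x(i) == oracle_y(i)
               for i in (random.randrange(n) for _ in range(t)))
```

This one-sided test never separates equal strings, which is exactly why treating close samples as equal (rather than detecting closeness exactly) suffices under Definition 1.1.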
Lower bounds. Several of our lower bounds are obtained by transporting lower bounds from the standard distribution testing model. Typically, we transform distributions over an alphabet Σ to distributions over {0,1}^n by using an error-correcting code C : Σ → {0,1}^n that has constant relative distance (i.e., ∆_H(C(σ), C(τ)) = Ω(1) for every σ ≠ τ ∈ Σ). For example, when proving a lower bound on testing the support size we transform a random variable Z that ranges over Σ to the random variable Z′ = C(Z). Note that in such a case it does not suffice to observe that if Z is TV-far from having a support of size at most m, then C(Z) is far (under Definition 1.1) from being supported on (at most) m codewords. We have to argue that C(Z) is far from being supported on any (subset of at most) m strings.
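One concrete choice of such a code C (our own illustration) is the Hadamard encoding of the symbol's index, which has relative distance exactly 1/2 between any two distinct codewords:

```python
def hadamard_encode(sigma, k):
    # Encode a symbol sigma in {0, ..., 2^k - 1} as the length-2^k bit string
    # of inner products <sigma, x> mod 2 over all x; any two distinct symbols
    # yield codewords differing in exactly half the positions (distance 1/2).
    return [bin(sigma & x).count("1") % 2 for x in range(2 ** k)]
```

Asymptotically better rates are achievable with other codes, but any constant relative distance suffices for the transport argument.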
Conventions. As evident from the last paragraph, it is often convenient to treat distributions as random variables; that is, rather than referring to the distribution P : Ω → [0,1], we refer to the random variable X such that Pr[X = x] = P(x). We stress that ε always denotes the proximity parameter (for the testing task). Typically, the upper bounds specify the dependence on ε, whereas the lower bounds refer to some fixed ε = Ω(1).
Organization. We start, in Section 2, with results that refer to a few natural properties of distributions that were studied previously in the context of the standard distribution testing model. We then turn to the general result captured by Theorem 1.8, and present its proof in Section 3. In Section 4 we study several types of distributions that arise naturally in the context of the DoHO model; that is, we consider distributions that capture random variations of some ideal objects. Lastly, in Section 5, we extend our treatment to testing tuples of distributions, and present a tester for the set of pairs of identical distributions.

Support Size, Uniformity, and Being Grained
In this section we consider three natural types of label-invariant properties (of distributions). These properties refer to the support size, being uniform (over some subset), and being m-grained (i.e., each string appears with probability that is an integer multiple of 1/m). Recall that D is a label-invariant property of distributions over {0,1}^n if for every bijection π : {0,1}^n → {0,1}^n and every distribution X, it holds that X is in D if and only if π(X) is in D. Label-invariant properties of distributions are of general interest and are also natural in the DoHO model, in which we wish to avoid reading samples in full. In this section we explore the possibility of obtaining testers for such properties.
We first present testers for these properties (in the DoHO model), and later discuss related "triviality results" and lower bounds. Our testers (for the DoHO model) are derived by emulating testers for the standard (distribution testing) model. The lower bounds justify this choice retroactively.

Testers
Our (DoHO-model) testers for support size, being uniform (over some subset), and being m-grained are obtained from a general result that refers to arbitrary properties (of distributions) that satisfy the following condition.
Definition 2.1 (closed under mapping): A property D_n of distributions over {0,1}^n is closed under mapping if, for every distribution P in D_n and every mapping F : {0,1}^n → {0,1}^n, the distribution F(P) is also in D_n.

Note that closure under mapping implies being label-invariant (i.e., for every bijection π : {0,1}^n → {0,1}^n, consider both the mappings π and π⁻¹).

Theorem 2.2 (testing distributions that are closed under mapping): Suppose that D = {D_n} is testable with sample complexity s(n,ε) in the standard model, and that each D_n is closed under mapping. Then, D is testable in the DoHO model with query complexity O(ε⁻¹ · s(n, ε/2)). Furthermore, the resulting tester uses 3 · s(n, ε/2) samples, makes O(ε⁻¹ log(s(n, ε/2)/ε)) uniformly distributed queries to each sample, and preserves one-sided error of the original tester.
The factor of 3 in the sample complexity is due to a modest error reduction that is used to compensate for the small error that is introduced by our main strategy. Recall that a tester of sample complexity s in the standard distribution testing model constitutes a tester of sample complexity s in the DoHO model, alas this tester has query complexity n · s.

Proof:
The key observation is that, since D is closed under mapping, for any ℓ-subset J ⊆ [n], it holds that if X is in D, then X_J 0^{n−ℓ} is in D, whereas we can test X_J 0^{n−ℓ} for membership in D with ℓ queries per sample. Furthermore, as shown below, if X is ε-far from D, then the original tester would reject X_J 0^{n−ℓ}, when invoked with proximity parameter ε/2. Specifically, in such a case, for a typical ℓ-subset J, we shall define a related random variable X′ such that (i) X′_J ≡ X_J, (ii) X′ is ε/2-close to X, and (iii) the collision pattern of s = s(n, ε/2) samples of X′_J is statistically close to the collision pattern of s samples of X′. Hence, if X is ε-far from D, then the collision pattern of s samples of X_J is statistically close to a collision pattern of s samples of a distribution that the original tester should reject (whp), when invoked with proximity parameter ε/2.
The actual tester. Let T be the guaranteed tester of sample complexity s : N × [0,1] → N. (Recall that T operates in the standard distribution testing model.) Hence, we may assume, without loss of generality, that T is label-invariant (see, e.g., [13, Thm. 11.12]), which means that it rules according to the collision pattern that it sees among its samples (i.e., the number of t-way collisions for each t ≥ 2). Using T, on input parameters n and ε, when given s = s(n, ε/2) samples, denoted x^{(1)}, ..., x^{(s)}, that are drawn independently from a tested distribution X, we proceed as follows.
1. We select a set J ⊆ [n] of size ℓ = O(ε⁻¹ log(s/ε)) uniformly at random, and query each of the samples at each location in J. Hence, we obtain x^{(1)}_J, ..., x^{(s)}_J.

2. We invoke T, with proximity parameter ε/2, providing it with the s samples x^{(1)}_J 0^{n−ℓ}, ..., x^{(s)}_J 0^{n−ℓ}, and rule accordingly. That is, we invoke T on s samples of the distribution X_J 0^{n−ℓ}, where these s samples are obtained by padding the strings x^{(i)}_J obtained in Step 1.

As observed upfront, if X is in D, then so is X_J 0^{n−ℓ}, for any choice of J. Hence, our tester accepts each distribution in D with probability that is lower-bounded by the corresponding lower bound of T. In particular, if T has one-sided error, then so does our tester.
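The tester just described can be sketched as follows; here `draw_sample_oracle` and `standard_tester` are hypothetical stand-ins (our own names) for the DoHO sampling interface and the assumed (label-invariant) standard-model tester:

```python
import math
import random

def doho_tester(draw_sample_oracle, standard_tester, n, eps, s):
    # Sketch of the Theorem 2.2 tester: project each of s samples to a random
    # l-subset J of [n], pad the projections with zeros, and feed them to the
    # standard-model tester with proximity parameter eps/2.
    l = min(n, max(1, math.ceil((1 / eps) * math.log(max(2.0, s / eps)))))
    J = random.sample(range(n), l)
    padded_samples = []
    for _ in range(s):
        oracle = draw_sample_oracle()
        bits = tuple(oracle(j) for j in J)            # only l queries per sample
        padded_samples.append(bits + (0,) * (n - l))  # the sample X_J 0^{n-l}
    return standard_tester(padded_samples, eps / 2)
```

Note that the total query count is s · ℓ = O(ε⁻¹ log(s/ε)) · s, matching the theorem's bound up to the logarithmic factor absorbed in the O-notation.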
We now turn to the analysis of the case that X is ε-far from D. In this case, we proceed with a mental experiment in which we define, for each choice of J, a random variable X′ = X′(J) such that (i) X′_J ≡ X_J, (ii) X′ is ε/2-close to X, and (iii) the collision pattern of s samples of X′_J is statistically close to the collision pattern of s samples of X′. Note that Condition (ii) implies that X′ is ε/2-far from D (since X is ε-far from D), which means that T should reject s samples of X′ (whp). Condition (iii) implies that T should also reject s samples of X′_J 0^{n−ℓ} (whp), whereas Condition (i) implies that the same holds for samples of X_J 0^{n−ℓ}, which in turn means that our tester rejects X (whp). In order to materialize the foregoing plan, we need a few definitions.

Definitions and initial observations. For integers ℓ ≤ n and s, and a generic random variable X that ranges over {0,1}^n, we consider a sufficiently large s′ = O(s² · ℓ), and use the following definitions.
• For an ℓ-subset J, we say that a string σ ∈ {0,1}^ℓ is J-heavy (w.r.t X) if Pr[X_J = σ] ≥ 0.01/s².

• For an ℓ-subset J, we say that a sequence of s′ strings (w^{(1)}, ..., w^{(s′)}) ∈ ({0,1}^n)^{s′} is J-good (for X) if every J-heavy string appears among the J-restrictions w^{(1)}_J, ..., w^{(s′)}_J. Note that a sequence of s′ independent samples of X is J-good with probability 1 − o(1), because the probability that some J-heavy string is not hit by any w^{(i)}_J is at most 100s² · (1 − 0.01/s²)^{s′} = o(1). (Here we used the fact that s′ = Ω(s² · ℓ).)

• We say that (w^{(1)}, ..., w^{(s′)}) is good (for X) if it is J-good for a 1 − o(1) fraction of the ℓ-subsets J. By an averaging argument, a sequence of s′ independent samples of X is good with probability 1 − o(1). Actually, we shall only use the fact that there exists a good sequence of w^{(i)}'s.
We fix an arbitrary good (for X) sequence (w^{(1)}, ..., w^{(s′)}) for the rest of the proof. Recall that, with probability 1 − o(1) over the choice of the ℓ-subset J ⊆ [n], it holds that (w^{(1)}, ..., w^{(s′)}) is J-good (for X), which means that all J-heavy strings (w.r.t X) appear among the J-restrictions of the w^{(i)}'s. Fixing such a (typical) set J, let I = I(J) be a maximal set of indices i ∈ [s′] such that the w^{(i)}_J's are distinct, and let R = {w^{(i)}_J : i ∈ I} ⊆ {0,1}^ℓ. We stress that R contains all J-heavy strings (w.r.t X), which means that for every σ ∉ R it holds that Pr[X_J = σ] < 0.01/s². We now define X′ by selecting x ∼ X, and outputting w^{(i)} if x_J = w^{(i)}_J for some i ∈ I, and outputting x itself otherwise (i.e., if x_J ∉ R). Note that X′_J ≡ X_J. We claim that, for a typical J, it holds that X′ is ε/2-close to X.

Claim 2.2.1 (typically, X′ is ε/2-close to X): With probability 1 − o(1) over the choice of J, the random variable X′ = X′(J) is ε/2-close to X.

Proof:
The key observation is that X′ differs from X only when X_J ∈ {w^{(i)}_J : i ∈ I(J)}, in which case X′ is set to the corresponding w^{(i)}. Strings that are ε/4-H-close to {w^{(i)} : i ∈ I(J)} contribute at most ε/4 units to the distance between X and X′ (as in Definition 1.1), and so we upper-bound the probability mass of strings that are ε/4-H-far from {w^{(i)} : i ∈ I(J)} and yet agree with some w^{(i)} on J. For each i ∈ I(J), the probability (over a random choice of J) that a string that is ε/4-H-far from w^{(i)} agrees with w^{(i)} on J is at most (1 − ε/4)^ℓ, which is o(ε/s′) by the definition of ℓ = O(ε⁻¹ log(s/ε)) (and s′ = O(s² · ℓ), where we actually use s′ = poly(s/ε)). Hence, with probability 1 − o(1) over the choice of J, the probability mass of such strings is o(ε), and X′ is ε/2-close to X.

Recalling that X is ε-far from D, for a typical J, Claim 2.2.1 implies that X′ is ε/2-far from D, which implies that (with probability at least 2/3) the tester T rejects X′ (i.e., rejects when fed with s samples selected according to X′). However, we are interested in the probability that our tester (rather than T) rejects X (rather than X′).

Claim 2.2.2 (typically, our tester rejects X): Suppose that (w^{(1)}, ..., w^{(s′)}) is J-good for X and that the corresponding X′ = X′(J) is ε/2-far from D. Then, our tester rejects X with probability at least 0.66.

Proof:
Recalling that X′_J ≡ X_J, while relying on the hypothesis that (w^{(1)}, ..., w^{(s′)}) is J-good (for X), we observe that the probability that our tester rejects X approximately equals the probability that T rejects s samples of X′, where the approximate equality is justified as follows (based on the definition of X′).
• On the one hand, the equality-relations between samples of X with a J-restriction in R are identical to those of their J-restrictions, because for each σ ∈ R there is a unique string in the support of X′ whose J-restriction equals σ (i.e., the corresponding w^{(i)}).

• On the other hand, the probability of collision among the J-restrictions of the other samples (i.e., those with a J-restriction in {0,1}^ℓ \ R) is upper-bounded by (s choose 2) · (0.01/s²) < 0.005, because these J-restrictions are all non-heavy. Needless to say, the collision probability between these (other) samples themselves can only be smaller.
It follows that our tester rejects X with probability at least 2/3 − 0.005 > 0.66, where the first term lower-bounds the probability that T rejects when presented with s samples of X′.
Using the hypothesis that (w^{(1)}, ..., w^{(s′)}) is good (for X), with probability 1 − o(1) over the choice of J, it holds that (w^{(1)}, ..., w^{(s′)}) is J-good (for X) and (by Claim 2.2.1) the corresponding X′ = X′(J) is ε/2-close to X. Using Claim 2.2.2, it follows that if X is ε-far from D, then our tester rejects X with probability at least 0.66 − o(1). Using mild error reduction (via three experiments), the theorem follows.

Moreover, all testers make the same uniformly distributed queries to each of their samples.
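The reduction underlying the proof above can be sketched as follows. Here `sample_oracle` and `standard_tester` are hypothetical stand-ins (not part of the original text) for sampling access to X and for the label-invariant standard-model tester T, which, being label-invariant, only needs the collision pattern of the restricted samples.

```python
import random

def restriction(x, J):
    """Project the string x (a tuple of bits) onto the index set J."""
    return tuple(x[j] for j in J)

def collision_pattern(samples):
    """Map each sample to the index of its first occurrence, so that only
    equality relations (not labels) are passed on to the standard tester."""
    first = {}
    return tuple(first.setdefault(x, len(first)) for x in samples)

def doho_tester(sample_oracle, standard_tester, n, s, ell):
    """Sketch of the reduction: pick a uniformly random ell-subset J of [n],
    query each of s samples only on J, and feed the collision pattern of the
    J-restrictions to the (hypothetical) label-invariant standard tester."""
    J = sorted(random.sample(range(n), ell))
    restricted = [restriction(sample_oracle(), J) for _ in range(s)]
    return standard_tester(collision_pattern(restricted))
```

Note that only s·ℓ coordinates are probed in total, which is what the query-complexity accounting in the DoHO model charges for.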
Proof: For Parts 1 and 3 we present testers for the standard model and apply Theorem 2.2, whereas for Part 2 we observe that the tester for m-grained distributions will do.
Let us start with Part 2. The key observation is that any distribution that is uniform over some m-subset is m-grained, whereas any distribution that is m-grained is ((log₂ m)/n)-close (under Definition 1.1) to being uniform over some set of m elements (e.g., by modifying the first log₂ m bits in each string in the support). 9 Hence, for ε > 2 · (log₂ m)/n, we test uniformity over m-subsets by testing for being m-grained (using proximity parameter ε/2). If ε ≤ 2 · (log₂ m)/n, then we can afford reading each sample entirely, since n = O(ε^{-1} log m). In the latter case we make O(ε^{-2} log₂ n) (rather than O(ε^{-1} log n)) queries to each sample.
Turning to Parts 1 and 3, it is tempting to use known (standard-model) testers of complexity O(ε^{-2} m/log m) for these properties (cf. [20]), while relying on the fact that these properties are label-invariant. However, these bounds hold only when the tested distribution ranges over a domain of size O(m), and so some additional argument is required. Furthermore, this may not allow us to argue that the tester for support size has one-sided error. Instead, we present direct (standard-model) testers of sample complexity O(m/ε) and O(m/ε²), respectively.
Testing support size. On input parameters n and ε, given s = O(m/ε) samples drawn from a tested distribution X, we accept if and only if the number of distinct strings among the samples is at most m. Suppose that X is ε-TV-far from having support size at most m, and note that for any set S of at most m strings it holds that Pr[X ∉ S] > ε. Then, with high probability, a sequence of O(m/ε) samples contains more than m distinct strings, and we reject.

Testing the set of m-grained distributions. On input parameters n and ε, we set s = O(m log m) and s′ = O(ε^{-2} m log m). Given s + s′ samples, denoted x^{(1)}, ..., x^{(s+s′)}, that are drawn independently from a tested distribution X, we proceed in two steps.
1. We construct W = {w^{(i)} : i ∈ [s]}, the set of strings seen in the first s samples.
(We may reject if |W| > m, but this is inessential.)

2. For each w ∈ W, we approximate Pr[X = w] by p_w def= |{i ∈ [s′] : x^{(s+i)} = w}|/s′. We reject if we either encountered a sample not in W or one of the p_w's is not within a 1 ± 0.1ε factor of a positive integer multiple of 1/m.

Note that if X is m-grained, then, with high probability, W equals the support of X, and (whp) each of the p_w's is within a 1 ± 0.1ε factor of a positive integer multiple of 1/m. On the other hand, suppose that X is accepted with high probability. Then, for any choice of W (as determined in Step 1), for each w ∈ W, it holds that Pr[X = w] = (1 ± 0.1ε) · p_w, since p_w is within a (1 ± 0.1ε) factor of a positive integer multiple of 1/m. Furthermore, Pr[X ∉ W] < 0.1ε. It follows that X is ε-TV-close to being m-grained.

9 Saying that X is m-grained means that it is uniform on a multiset {x^{(1)}, ..., x^{(m)}} of n-bit strings. We modify X by replacing each x^{(i)} by y^{(i)} such that y^{(i)} encodes the binary expansion of i − 1 in the first ℓ = log₂ m locations and equals x^{(i)} on the remaining locations. That is, we set y^{(i)}_j to equal the j-th bit in the binary expansion of i − 1 for j ∈ [ℓ], and set y^{(i)}_j = x^{(i)}_j for every j ∈ [n] \ [ℓ].
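A minimal sketch of the two-step m-grained tester above, with the sampling made explicit as two input lists (the first of length s, the second of length s′) and with the (1 ± 0.1ε) tolerance as in the description:

```python
from collections import Counter

def m_grained_tester(samples1, samples2, m, eps):
    """Sketch of the two-step tester: `samples1` is used to collect the
    support W, and `samples2` to estimate each Pr[X = w].  Accepts iff no
    fresh sample falls outside W and every estimate p_w is within a
    (1 +/- 0.1*eps) factor of a positive integer multiple of 1/m."""
    W = set(samples1)
    counts = Counter(samples2)
    if any(w not in W for w in counts):   # a sample not seen in Step 1
        return False
    for w in W:
        p_w = counts[w] / len(samples2)
        # For the small tolerances used here, it suffices to check the
        # integer multiple of 1/m that is nearest to p_w.
        k = round(p_w * m)
        if k == 0 or not ((1 - 0.1*eps) * k/m <= p_w <= (1 + 0.1*eps) * k/m):
            return False
    return True
```

For instance, a distribution that puts mass exactly 1/4 on each of four strings passes (it is 4-grained), whereas empirical masses 0.3 and 0.7 are rejected for m = 4.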

Triviality results
An obvious case in which testing is trivial is the property of all distributions (on n-bit strings) that have support size 2^n. In this case, each distribution is infinitesimally close (under Definition 1.1) to being supported on all 2^n strings. A less obvious result is stated next.
Proof: We first show that, for every ℓ ∈ ℕ, it holds that every distribution over {0,1}^n is (ℓ/n)-close to a distribution that is supported by {0,1}^{n−ℓ}0^ℓ. Next we show that each distribution of the latter type is 2^{−ℓ}-close to being 2^n-grained. Letting ℓ = log₂ n, the main claim follows.
In the first step, given an arbitrary distribution X, we consider the distribution X′ obtained by setting the last ℓ bits of X to zero; that is, X′ is obtained by selecting x ∼ X and outputting x_{[n−ℓ]}0^ℓ. Then, X′ is (ℓ/n)-close to X (according to Definition 1.1).
In the second step, we consider X′′ obtained by letting Pr[X′′ = x0^ℓ] equal 2^{−n} · ⌊2^n · Pr[X′ = x0^ℓ]⌋, and assigning the residual probability to (say) 1^n. Then, X′′ is 2^n-grained and is at total variation distance at most 2^{n−ℓ} · 2^{−n} = 2^{−ℓ} from X′, since the support size of X′ is at most 2^{n−ℓ}. Hence, X′′ is (ℓ/n + 2^{−ℓ})-close to X.
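The rounding in the second step can be sketched as follows, using exact rational arithmetic; here a distribution is a dict from n-bit strings (ending in 0^ℓ) to probabilities, and the string 1^n receives the residual mass:

```python
from fractions import Fraction

def round_to_grained(dist, n, ell):
    """Sketch of the second step above: each probability is rounded down to
    an integer multiple of 2^-n (the grain), and the residual mass is
    assigned to the all-ones string 1^n."""
    assert all(x.endswith('0' * ell) for x in dist)
    grain = Fraction(1, 2 ** n)
    out = {x: (Fraction(p) // grain) * grain for x, p in dist.items()}
    out['1' * n] = out.get('1' * n, Fraction(0)) + (1 - sum(out.values()))
    return out

def tv_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2
```

On a support of at most 2^{n−ℓ} strings, each point loses less than 2^{−n} mass, so the total variation distance to the rounded (2^n-grained) distribution is at most 2^{n−ℓ} · 2^{−n} = 2^{−ℓ}, as claimed.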
The furthermore claim follows by redefining X′′ such that Pr[X′′ = x0^ℓ] equals 2^{−(n−ℓ)} · ⌊2^{n−ℓ} · Pr[X′ = x0^ℓ]⌋. In this case X′′ is 2^{n−ℓ}-grained, and a corresponding bound on its total variation distance from X′ holds.

Non-triviality results. It is easy to see that any property of distributions that includes only distributions having a support of size 2^{n−Ω(n)} is non-trivial in the sense that not all distributions are close to it under Definition 1.1. This is the case because any such distribution is far from the uniform distribution over {0,1}^n (since, w.h.p., a uniformly distributed n-bit string is at Hamming distance Ω(n) from a set that contains 2^{n−Ω(n)} strings). Additional non-triviality results follow from the lower bounds presented in Section 2.3.

Lower bounds
We first consider three notions of uniformity: uniformity over the entire potential support (i.e., all n-bit strings), uniformity over the support of the distribution (where the size of the support is not specified), and uniformity over a support of a specified size. In all three cases (as well as in the results regarding testing support size and the set of grained distributions), we prove lower bounds on the sample (and query) complexity of testing the corresponding property in the DoHO model. As usual, the lower bounds refer to testing with ε = Ω(1); that is, to the case that the proximity parameter is set to some positive constant. Our proofs rely on the standard methodology by which a lower bound of L on the complexity of testing is proved by presenting two distributions X and Y that an algorithm of complexity L − 1 cannot distinguish (with constant positive gap) 10 such that X has the property and Y is Ω(1)-far from having the property (cf. [13, Thm. 7.2]). In fact, typically, at least one of the two distributions will be claimed to exist using a probabilistic argument; that is, we shall actually prove that there exist two distributions X₀ and Y₀ (over {0,1}^n) such that, for a random bijection π : {0,1}^n → {0,1}^n, setting X = π(X₀) and Y = π(Y₀) will do.

Observation 2.5 (lower bound on testing uniformity over {0,1}^n): For every c ∈ (0, 0.5) there exists ε > 0 such that testing with proximity parameter ε whether a distribution is uniform over {0,1}^n requires 2^{c·n} samples in the DoHO model.
Proof: Let S be a random 2^{2c·n}-subset of {0,1}^n, and X be uniform over S. Then, a sample of s = o(2^{cn}) strings does not allow for distinguishing between X and the uniform distribution over {0,1}^n; that is, every decision procedure D has a distinguishing gap of o(1) on these two distributions. (Intuitively, this is the case because s random samples from a random set S are distributed almost identically to s random samples from the uniform distribution over n-bit strings.) 11 On the other hand, for every S as above, it holds that X is Ω(1)-far from the uniform distribution over {0,1}^n (according to Definition 1.1). This is the case because the probability mass of each x in the support of X must be distributed among 2^n/2^{2cn} strings, whereas most of these strings are at relative Hamming distance at least ε = Ω(1) from the support of X (provided that ε is chosen such that H₂(ε) < 1 − 2c).
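Two quantitative ingredients of this proof can be checked directly: the s²/N bound on the distinguishing gap (cf. Footnote 11) and the choice of ε via the entropy condition H₂(ε) < 1 − 2c. The following sketch is illustrative only; the bisection routine is ours, not part of the proof:

```python
from math import log2

def binary_entropy(p):
    """H2(p), the binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gap_bound(s, N):
    """s^2/N upper-bounds the gap between s samples of a random N-subset
    and s uniform samples: it accounts for colliding index pairs."""
    return s * s / N

def farness_eps(c, tol=1e-6):
    """Largest eps (up to tol) with H2(eps) < 1 - 2c, so that a uniform
    n-bit string is whp eps-H-far from any fixed set of 2^{2cn} strings."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < 1 - 2 * c:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, with N = 2^{30} and s = 2^{10} the gap bound is 2^{−10}, and for c = 0.25 any ε up to roughly 0.11 satisfies H₂(ε) < 0.5.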
Observation 2.6 (lower bound on testing uniformity over an unspecified support size): For every c ∈ (0, 0.5) there exists ε > 0 such that testing with proximity parameter ε whether a distribution is uniform over some set requires 2^{c·n} samples in the DoHO model.

Proof:
We consider the following two families of distributions, where each of the distributions is parameterized by a 2^{2c·n}-subset of n-bit strings, denoted S.
1. X_S is uniform on S.

2. With probability half, Y_S is uniform on S, and otherwise it is uniform on {0,1}^n.
Now, on the one hand, for a random S, no algorithm can distinguish X_S from Y_S by using o(2^{cn}) samples (cf. Footnote 11). On the other hand, we prove that Y_S is far from being uniform on any set. Suppose that Y = Y_S is δ-close to a distribution that is uniform on the set S′ ⊆ {0,1}^n. We shall show that δ = Ω(1), by considering two cases regarding S′:

11 Formally, for every sequence ī = (i₁, ..., i_s) ∈ [N]^s, where N = 2^{2cn}, let ζ_ī(S) denote the output of D when fed with s_{i₁}, ..., s_{i_s}, where s_j denotes the j-th element of the N-set S. Then, the expectation of ζ_ī(S) is the same for every ī with distinct entries, whereas almost all pairs of ζ_ī(S)'s are pairwise independent, because N = ω(s²). Hence, the distinguishing gap is O(s²/N), where s²/N accounts for the fraction of non-disjoint pairs of ī's.
Case 1: |S′| ≤ 2^{(0.5+c)·n} (recall that c < 0.5). In this case, the probability mass assigned by Y to {0,1}^n \ S′ should be moved to S′, whereas the average relative Hamming distance between a random element of {0,1}^n \ S′ and the set S′ is Ω(1). Specifically, letting U_n denote the uniform distribution on {0,1}^n, we upper-bound the probability that U_n is H-close to S′ by noting that |{0,1}^n \ (S ∪ S′)| > 2^{n−1} (since |S| + |S′| = o(2^n)), whereas |S′| ≤ 2^{(0.5+c)·n} = 2^{n−Ω(n)}.
Case 2: |S′| > 2^{(0.5+c)·n}. In this case, almost all the probability assigned by Y to S should be distributed among more than 2^{(0.5+c)·n} strings such that each of these strings is assigned equal weight. This implies that almost all the weight assigned by Y to S must be moved to strings that are at Hamming distance Ω(n) from S, since |S| = 2^{2cn} = 2^{(0.5+c)·n−Ω(n)} < 2^{−Ω(n)} · |S′|.
Hence, in both cases, a significant probability weight of Y must be moved to strings that are Ω(1)-H-far from their origin. The claim follows.

Observation 2.7 (lower bounds for related properties): For every m ≤ 2^{n−Ω(n)}, testing each of the following properties requires Ω(√m) samples in the DoHO model:
• The set of distributions that are uniform over some m-subset;
• The set of m-grained distributions;
• The set of distributions with support size at most m.
Stronger results are presented in Propositions 2.8 and 2.9.
Proof: As in the proof of Observation 2.5, observe that no algorithm can distinguish the uniform distribution over {0,1}^n from a distribution that is uniform over a random m-subset unless it sees Ω(√m) samples. However, the uniform distribution over {0,1}^n is far from any of the foregoing properties (also under Definition 1.1), since m ≤ 2^{n−Ω(n)}.

Proposition 2.8 (lower bound on testing support size in the DoHO model): Testing whether a distribution over {0,1}^n has support size at most m requires Ω(m/log m) samples.

Proof: We use the Ω(m/log m) (sample-complexity) lower bound of [19] that refers to testing distributions over [O(m)] for support size at most m, in the standard testing model (that is, under the total variation distance). This lower bound is proved in [19] by presenting two distributions, X and Y, that cannot be distinguished by a label-invariant algorithm that gets s = o(m/log m) samples, where X has support size at most m and Y is far (in total variation distance) from having support size at most m. We use an error-correcting code C : [O(m)] → {0,1}^n of constant relative distance, and consider the distributions X′ = C(X) and Y′ = C(Y).
Evidently, a label-invariant algorithm that obtains s samples cannot distinguish X′ and Y′. Actually, as in the previous proofs, we need to consider any algorithm that takes s samples, and we identify for each such algorithm two such distributions X′ and Y′ (which are relabelings of the original X and Y) that are indistinguishable by it (cf. Footnote 11). On the other hand, X′ has support size at most m, whereas we claim that Y′ is far from having support size at most m, under Definition 1.1. Intuitively, this is the case because reducing the support size of Y′ requires moving a constant amount of probability weight from elements in the support of Y′, which resides on strings that are far away in Hamming distance, to fewer strings. Each such movement can be charged in proportion to the relative distance of the code C. The actual argument follows.
Let Z be a distribution that is closest to Y′, under Definition 1.1, among all distributions that are supported on at most m strings, and let γ denote the distance between Y′ and Z. By Definition 1.1, this means that there exists a "weight relocation" function W : {0,1}^{2n} → [0,1] that satisfies ∑_z W(y′, z) = Pr[Y′ = y′] for every y′, and ∑_{y′} W(y′, z) = Pr[Z = z] for every z. Furthermore, ∑_{y′} ∑_z W(y′, z) · ∆_H(y′, z) = γ, where we refer to this sum as the cost associated with W. Note that ∑_{y′} ∑_z W(y′, z) · InEq(y′, z) is lower-bounded by the total variation distance between Y′ and Z, where InEq(y′, z) = 1 if y′ ≠ z and InEq(y′, y′) = 0.
Let S denote the support of Z (so that W(y′, z) = 0 for every z ∉ S), and let S′ be the subset of S that contains those strings that are (0.4·δ)-H-close to the code C. Recall that the support of Y′ is a subset of C (so that W(y′, z) = 0 for every y′ ∉ C). The cost associated with W is the sum of three terms. The first is ∑_{y′} ∑_{z∈S\S′} W(y′, z) · ∆_H(y′, z), the second is ∑_{y′} ∑_{z∈S′\C} W(y′, z) · ∆_H(y′, z), and the third is ∑_{y′} ∑_{z∈S′∩C} W(y′, z) · ∆_H(y′, z). We analyze each separately, while letting R denote the support of Y′.
• By the definition of S′ (and since the support of Y′ is a subset of C), for each y′ in the support of Y′ and each z ∈ S \ S′, we have that ∆_H(y′, z) > 0.4·δ. Therefore, the first term is lower-bounded by ∑_{y′} ∑_{z∈S\S′} W(y′, z) · 0.4·δ.
• Turning to the second term, for each z ∈ S′ \ C, let cc(z) ∈ C be the codeword in C that is closest to z. By the definition of S′ we have that δ(z) := ∆_H(z, cc(z)) ≤ 0.4·δ. We claim that (for every z ∈ S′ \ C), at least half the probability mass that is relocated by W to z (from Y′) must come from codewords y′ (in the support of Y′) that are different from cc(z); that is, ∑_{y′∈R\{cc(z)}} W(y′, z) ≥ (1/2) · ∑_{y′} W(y′, z). We prove that ∑_{y′∈R\{cc(z)}} W(y′, z) ≥ W(cc(z), z), by showing that otherwise we could modify Z (and W) to obtain a distribution Z′ with support size at most m (and a corresponding weight relocation function W′) such that Z′ is closer to Y′ than Z (i.e., W′ has lower cost than W).
Specifically, Z′ is obtained by moving the probability mass that Z assigns z to the codeword cc(z); that is, Pr[Z′ = z] = 0 and Pr[Z′ = cc(z)] = Pr[Z = cc(z)] + Pr[Z = z] (and Pr[Z′ = z′] = Pr[Z = z′] for every z′ ∉ {z, cc(z)}), while noting that Z′ has support size at most m. The weight relocation function W′ is defined accordingly (i.e., for each y′, we set W′(y′, z) = 0 and W′(y′, cc(z)) = W(y′, cc(z)) + W(y′, z), leaving W′(y′, z′) = W(y′, z′) for every z′ ∉ {z, cc(z)}). Then, the cost of W′ (which upper-bounds the distance between Y′ and Z′) equals the cost of W minus ∑_{y′} W(y′, z) · ∆_H(y′, z) plus ∑_{y′} W(y′, z) · ∆_H(y′, cc(z)). Now, for y′ = cc(z) the cost decreases by W(cc(z), z) · δ(z), whereas for each y′ ∈ R \ {cc(z)} it increases by at most W(y′, z) · δ(z) (by the triangle inequality). Using the counter-hypothesis (i.e., W(cc(z), z) > ∑_{y′∈R\{cc(z)}} W(y′, z)), it follows that the cost of W′ is smaller than the cost of W, and we reach a contradiction to the optimality of W.
Hence, for each z ∈ S′ \ C we have that ∑_{y′} W(y′, z) · ∆_H(y′, z) ≥ (1/2) · ∑_{y′} W(y′, z) · 0.6·δ, implying that the second term in the cost of W is lower-bounded by ∑_{y′} ∑_{z∈S′\C} W(y′, z) · 0.3·δ.
• Lastly, for each y′ in the support of Y′ and each z ∈ S′ ∩ C such that z ≠ y′, we have that ∆_H(y′, z) ≥ δ. Therefore, the third term is lower-bounded by ∑_{y′} ∑_{z∈(S′∩C)\{y′}} W(y′, z) · δ, which we rewrite as ∑_{y′} ∑_{z∈S′∩C} W(y′, z) · InEq(y′, z) · δ.
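In the notation above, the three bullets combine to lower-bound the cost γ of W as follows:

```latex
\begin{align*}
\gamma &= \sum_{y'}\sum_{z \in S \setminus S'} W(y',z)\,\Delta_H(y',z)
        + \sum_{y'}\sum_{z \in S' \setminus C} W(y',z)\,\Delta_H(y',z)
        + \sum_{y'}\sum_{z \in S' \cap C} W(y',z)\,\Delta_H(y',z) \\
       &\ge 0.4\delta \sum_{y'}\sum_{z \in S \setminus S'} W(y',z)
        \;+\; 0.3\delta \sum_{y'}\sum_{z \in S' \setminus C} W(y',z)
        \;+\; \delta \sum_{y'}\sum_{z \in S' \cap C} W(y',z)\,\mathrm{InEq}(y',z) \\
       &\ge 0.3\delta \cdot \sum_{y'}\sum_{z} W(y',z)\,\mathrm{InEq}(y',z)
        \;\ge\; 0.3\delta \cdot d_{\mathrm{TV}}(Y',Z),
\end{align*}
```

where the middle inequality also uses the fact that InEq(y′, z) = 1 for every pair contributing to the first two sums (since there z ∉ C whereas y′ ∈ C, so z ≠ y′).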
To summarize, the distance γ between Y′ and Z, under Definition 1.1, is at least 0.3·δ times the total variation distance between these two distributions.

We comment that the foregoing lower bound (for the DoHO model) matches the best known lower bound for the standard distribution-testing model [15]. See Section 2.4 for further discussion.

Proposition 2.9 (lower bound on testing the set of m-grained distributions): For every constant c < 1, testing whether a distribution over {0,1}^n is m-grained requires Ω(m^c) samples in the DoHO model.
Proof: We use the Ω(m^c) lower bound of [15] that refers to testing whether a distribution over [O(m)] is m-grained, under the total variation distance. This lower bound is proved in [15] by presenting two (2m-grained) distributions, X and Y, that cannot be distinguished by a label-invariant algorithm that gets s = o(m^c) samples, where X is m-grained and Y is far (in total variation distance) from being m-grained.
As in the proof of Proposition 2.8, applying an error-correcting code C : [O(m)] → {0,1}^n to X and Y, we observe that X′ = C(X) is m-grained whereas Y′ = C(Y) is far from being m-grained (also under Definition 1.1). 12 To see that Y′ is far from any distribution Z that is m-grained and is supported by a set S, we (define S′ and) employ the same case analysis as in the proof of Proposition 2.8. (This shows that the distance (under Definition 1.1) between Y′ and Z is lower-bounded by a constant fraction of their total variation distance.) 13

Proposition 2.10 (lower bound on testing parameterized uniformity): For every constant c < 1 and m ≤ n, testing that a distribution over {0,1}^n is uniform over some m-subset requires Ω(m^c) queries in the DoHO model.
We stress that, unlike Proposition 2.9, which lower-bounds the sample complexity of testers, in Proposition 2.10 we only lower-bound their query complexity. 14

Proof: Let X′ and Y′ denote the distributions derived in the proof of Proposition 2.9. Recall that X′ is m-grained, whereas Y′ is far from being m-grained (under Definition 1.1). Note that Y′ is Ω(1)-far from being uniform over any set of size m, and observe that X′ is ((log₂ m)/n)-close to a distribution X′′ that is uniform over a set of size m. Specifically, we can transform X′ to X′′

12 In fact, as in the proof of Proposition 2.8, we actually consider adequate relabelings of X and Y. 13 Note that in the second case (i.e., probability mass relocated from Y′ to z ∈ S′ \ C), the potential replacement (of z by the codeword closest to it) preserves m-grained-ness. 14 We actually use m log m = o(n^{1/c}), which follows from m ≤ n.
by modifying only the bits that reside in log₂ m locations, where the choice of these locations is arbitrary. 15 Hence, a potential tester that makes o(n/log m) queries is unlikely to hit these locations, if we select these locations uniformly at random. Using m ≤ n, we conclude that a potential tester that makes min(o(m^c), o(n/log m)) = o(m^c) queries cannot distinguish between the distribution X′′ and the distribution Y′, which implies that it fails to test uniformity in the DoHO model.
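The "unlikely to hit" step can be quantified by a small sketch (independent uniform queries are assumed for illustration); with ℓ = log₂ m special locations and q = o(n/log m) queries, the miss probability is 1 − o(1):

```python
def miss_probability(q, n, num_special):
    """Probability that q independent uniform queries into [n] all avoid a
    fixed set of num_special locations.  By the union bound this is at
    least 1 - q * num_special / n."""
    return (1.0 - num_special / n) ** q
```

For example, 10 queries into n = 1000 coordinates miss 2 special locations with probability about 0.98.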

Conditional lower bounds
The lower bounds (for the DoHO model) presented in Propositions 2.9 and 2.10 build on the best known lower bound for testing the set of grained distributions in the standard distribution-testing model. The following lower bounds on the complexity of testing in the DoHO model rely on a conjecture regarding the sample complexity of testing grained distributions in the standard model.

Conjecture 2.11 (sample complexity of testing grained distributions in the standard model): Testing whether a distribution over [O(m)] is m-grained requires Ω(m/log m) samples.

Theorem 2.12 (conditional lower bound on testing grained distributions in the DoHO model): Assuming Conjecture 2.11, testing whether a distribution over {0,1}^n is m-grained requires Ω(m/log m) samples in the DoHO model.

Proof: We would have liked to argue that the proof is analogous to the proof of Proposition 2.9, except that here we assume the existence of two distributions, X and Y, over [O(m)] that cannot be distinguished by a label-invariant algorithm that gets o(m/log m) samples, where X is m-grained and Y is far (in total variation distance) from being m-grained. However, since Conjecture 2.11 does not quite imply the existence of such distributions X and Y, we apply a slightly more complex argument. Our starting point is the observation that Conjecture 2.11 implies the existence of multisets of distributions 16, denoted X and Y, such that the following holds:

1. Each distribution in X is m-grained.

2. Each distribution in Y is TV-far from being m-grained.
3. No algorithm can distinguish between s = o(m/log m) samples taken from a distribution X that is selected uniformly in X and s samples taken from a distribution Y that is selected uniformly in Y.

15 Saying that X′ is m-grained means that it is uniform on a multiset {x^{(1)}, ..., x^{(m)}} of n-bit strings. We modify X′ by replacing each x^{(i)} by y^{(i)} such that y^{(i)} encodes the binary expansion of i − 1 in the chosen locations and equals x^{(i)} otherwise. That is, letting ℓ₁ < ℓ₂ < ··· < ℓ_{log₂ m} denote the chosen locations, we set y^{(i)}_{ℓ_j} to equal the j-th bit in the binary expansion of i − 1, and set y^{(i)}_ℓ = x^{(i)}_ℓ for every ℓ ∈ [n] \ {ℓ₁, ℓ₂, ..., ℓ_{log₂ m}}. 16 Actually, X and Y are distributions of distributions. However, to avoid confusion, we prefer to present them as multisets and consider a uniformly selected element in them.
The foregoing observation is proved by applying the MiniMax Principle (cf. [12, Apdx. A.1]). Specifically, we consider deterministic algorithms that, given s samples from a distribution Z, try to distinguish between the case that Z is m-grained and the case that Z is TV-far from being m-grained, and denote by c(A, Z) the probability that algorithm A is correct on Z (i.e., it correctly identifies Z's type). Then, Conjecture 2.11 asserts that, for every distribution A of algorithms (i.e., a randomized algorithm) that get s samples, there exists a distribution Z (which is either m-grained or far from m-grained) such that A errs on Z with probability greater than 1/3 (i.e., E_{A∼A}[c(A, Z)] < 2/3). The minimax principle then implies that there exists a multiset Z of such distributions (each of which is either m-grained or far from m-grained) on which each algorithm A that takes s samples errs on the average with probability greater than 1/3 (i.e., E_{Z∈Z}[c(A, Z)] < 2/3). Analogously to [13, Exer. 7.3], we obtain X and Y as desired, where the indistinguishability gap is less than 1/2.
Consider the corresponding multisets X′ and Y′, which are obtained by applying a (constant-distance) error-correcting code C to the elements of each distribution in X and Y, respectively. We conclude that no algorithm that takes s samples can distinguish X′ from Y′, where X′ (resp., Y′) is a distribution selected uniformly in the multiset X′ (resp., Y′), where the indistinguishability gap is less than 1/2. (As in the proof of Proposition 2.9, we show that each Y′ is far (under Definition 1.1) from being m-grained.) Observing that a distinguishing gap of less than 1/2 means that no algorithm (of low complexity) constitutes a tester with error probability at most 1/4 (rather than at most 1/3), the claim follows (using error reduction).
Theorem 2.13 (on testing parameterized uniformity in the DoHO model): Assuming Conjecture 2.11, for every m ≤ n, testing that a distribution over {0,1}^n is uniform over some m-subset requires Ω(m/log m) queries in the DoHO model.
We stress that, unlike Theorem 2.12, which lower-bounds the sample complexity of testers, in Theorem 2.13 we only lower-bound their query complexity.
Proof: Let X′ and Y′ denote the multisets of distributions derived in the proof of Theorem 2.12. Recall that each distribution in X′ is m-grained, whereas each distribution in Y′ is far from being m-grained (under Definition 1.1). As in the proof of Proposition 2.10, note that each distribution in Y′ is Ω(1)-far from being uniform over a set of size m, and observe that each distribution in X′ is ((log₂ m)/n)-close to being uniform over a set of size m. Specifically, we can make each distribution in X′ uniform by modifying only the bits that reside in log₂ m locations, where the choice of these locations is arbitrary. Hence, a potential tester that makes o(n/log m) queries is unlikely to hit these locations, if we select these locations uniformly at random. Using m ≤ n, we conclude that a potential tester that makes o(m/log m) queries cannot distinguish between distributions in the modified multiset X′ and distributions in the multiset Y′, which implies that it fails to test uniformity in the DoHO model.

Distributions on self-correctable/testable sets
In this section we prove Theorem 1.8, which refers to properties of distributions that are supported on a set of strings Π ⊆ {0,1}^n that has an efficient self-correction/testing procedure. In this case, label-invariance actually means being label-invariant when restricted to Π; that is, for every bijection π : Π → Π and every distribution X, it holds that X is in the property if and only if π(X) is in the property.
Our starting point is a label-invariant property of distributions, denoted D, and a property of strings, denoted Π, that has a relatively efficient tester and local self-corrector. Actually, we use a relaxed definition of self-correction, which allows the corrector to output a special failure symbol in case the input (oracle) is not in Π (but close to Π). 17 Indeed, proper behavior of the self-corrector is required only up to a specified distance from the set Π. Combining D and Π, we get a property of distributions, denoted D_Π, that consists of all distributions in D that are supported by Π. We prove that the query complexity of testing D_Π in the DoHO model is related to the sample complexity of testing D (in the standard model) and to the query complexity of the two foregoing procedures.
Theorem 3.1 (from standard distribution testing of D to testing D_Π in the DoHO model, when Π is efficiently testable and self-correctable):

• Let D be a label-invariant property of distributions over {0,1}^n, and suppose that D is testable in the standard model using s(n, ε) = Ω(1/ε) samples. 18

• Let Π ⊆ {0,1}^n be a property of strings that is testable with query complexity q_T(n, ε) and self-correctable up to distance δ(n) with q_C(n) queries; that is, there exists an oracle machine C that makes at most q_C(n) queries such that for every x ∈ {0,1}^n and i ∈ [n] the following two conditions hold: (1) if x ∈ Π, then, with high probability, C^x(i) = x_i; and (2) if x is δ(n)-H-close to Π, then, with high probability, C^x(i) ∈ {corr(x)_i, ⊥}, where corr(x) denotes the string in Π that is closest to x.

• Suppose that every distribution in D is supported by a subset of size at most |Π|, and let D_Π denote the set of all distributions in D that have a support that is a subset of Π. 19

Then, D_Π is testable with query complexity q(n, ε) = O(s(n, ε/2)) · (q_T(n, δ(n)) + q_C(n))/δ(n) and sample complexity s(n, ε/2), where the proximity parameter used for Π is ε′ = min(ε, δ(n)).
Warm-up (or a first attempt). Using the fact that Π has relative distance δ = δ(n), let us consider a tester that, given the sample-sequence x^{(1)}, ..., x^{(s)}, selects a random set I of ℓ = O(δ^{-1} log s) coordinates and feeds the collision pattern of the restricted samples x^{(1)}_I, ..., x^{(s)}_I to the standard tester T. If X is supported by Π, then distinct samples are δ-H-far apart, and so (whp) distinct samples have distinct I-restrictions. Hence, in this case we correctly distinguish X in D_Π from X that is ε-far from D_Π although it is supported by Π. Of course, we can easily test that X is supported by Π (using the tester for Π; see Theorem 1.4), but the problem is that our samples may be close to Π and yet not reside in it. This is a problem because the foregoing analysis presupposed that the inequality between samples is reflected in their restrictions to a small subset I (i.e., that x^{(i)} ≠ x^{(j)} typically implies x^{(i)}_I ≠ x^{(j)}_I). We address this problem by using the hypothesis regarding Π; that is, not only is Π testable (with proximity parameter ε using q_T(n, ε) queries), but it is also self-correctable to distance δ by using q_C(n) queries. In particular, combining the tester for Π (applied with proximity parameter δ(n)) and the self-corrector (and employing error reduction), 22 we can obtain an oracle machine C′ that satisfies the following for every x ∈ {0,1}^n and i ∈ [n] (for any integer parameter s): if x ∈ Π, then Pr[C′^x(i) = x_i] ≥ 1 − o(1/s²); and if x is δ(n)-H-close to Π, then Pr[C′^x(i) ∈ {corr(x)_i, ⊥}] ≥ 1 − o(1/s²), where corr(x) denotes the string in Π that is closest to x. This combined machine has query complexity q_{C′}(n, s) = O(log s) · (q_T(n, δ) + q_C(n)). We are now ready to present and analyze our tester.
1. We test whether X is supported by Π with proximity parameter ε/2, where the distance here (and throughout the proof, unless stated explicitly otherwise) is according to Definition 1.1. If this test rejects, then we reject.
(Here and below, "supported by Π" means having a support that is a subset of Π.) We use the tester provided by Theorem A.2 (with proximity parameter ε/2), while noting that we can reuse some of the samples provided to T for this purpose (since s = Ω(1/ε)). As noted following the statement of Theorem 3.1, the query complexity of testing Π is q′_T(n, ε) = O(q_T(n, δ(n))) + O(1/ε) · q_C(n). We infer that the query complexity of the current step is max(O(q′_T(n, ε/4)), O(q_T(n, δ(n))/δ(n))). 23

2. For each i ∈ [s], we test whether x^{(i)} is in Π, setting the proximity parameter to δ = δ(n) and the error bound to o(1/s). If any of these checks rejects, then we reject. Otherwise, we may assume that each sample is δ(n)-H-close to Π.
Note that typically s ≫ O(1/ε) and δ ≪ ε, which implies that the current step is incomparable to Step 1. 24 The query complexity of the current step is O(s) · q_T(n, δ).

3. We select a random set I of ℓ coordinates and, for each i ∈ [s] and each j ∈ I, we let y^{(i)}_j ← C′^{x^{(i)}}(j). If any of these correction attempts fails (i.e., if any y^{(i)}_j equals ⊥), then we reject. Otherwise, we output the verdict of T(n, ε/2; CP(y^{(1)}, ..., y^{(s)})). We may assume, without loss of generality, that T (and so T′) errs with probability at most 0.1. Typically (i.e., when s ≫ 1/ε), the query complexity of T′ is dominated by the last step and equals s · ℓ · q_{C′}(n, s) = O(s log² s) · (q_T(n, δ(n)) + q_C(n))/δ(n).
(Recall that the complexity of Step 1 is max(O(q′_T(n, ε/4)), O(q_T(n, δ(n))/δ(n))), where q′_T(n, ε/4) = O(q_T(n, δ)) + O(1/ε) · q_C(n), whereas the complexity of Step 2 is O(s) · q_T(n, δ).)

Analysis of the proposed tester T′. We start with the case that X is in D_Π. In this case, the first two steps cause rejection with probability o(1), since all x^{(i)}'s are in Π. Furthermore, in this case, with probability 1 − o(1), each y^{(i)} equals the restriction of x^{(i)} to the locations in I (i.e., y^{(i)} = x^{(i)}_I). As argued in the motivational discussion, if x^{(i)} ≠ x^{(j)}, then x^{(i)} and x^{(j)} are δ-H-far apart from one another, and so Pr_{p∈[n]}[x^{(i)}_p ≠ x^{(j)}_p] ≥ δ, which implies that (whp) x^{(i)}_I ≠ x^{(j)}_I, by our choice of ℓ. We conclude that Pr_I[CP(y^{(1)}, ..., y^{(s)}) = CP(x^{(1)}, ..., x^{(s)})] = 1 − o(1), which implies that our tester accepts X with probability at least 0.9 − o(1).
We now turn to the case that X is ε-far from D_Π (according to Definition 1.1). The easy case is that X is ε/2-far from being supported by Π, and this case leads Step 1 to reject with very high probability. We thus assume that X is ε/2-close to being supported by Π, and let corr(x) denote the string in Π that is closest to x. Then, in expectation, corr(X) is ε/2-H-close to X, since the Hamming distance between x and corr(x) equals the Hamming distance between x and Π. Hence, corr(X) is ε/2-close to X, which implies that corr(X) is ε/2-far from D_Π. By the next claim, this implies that the total variation distance between corr(X) and D is greater than ε/2.

Claim 3.1.1 (distance to D_Π vs TV-distance to D): Let X be a distribution supported by Π such that X is ε-far from D_Π (according to Definition 1.1). Then, the total variation distance between X and D is greater than ε.
Proof: Assume, contrary to the claim, that X is ε-TV-close to some distribution Y in D. If Y is in D_Π, then we immediately reach a contradiction to the hypothesis of the claim by which X is at distance greater than ε from D_Π (according to Definition 1.1). This is the case because (as noted in the introduction) the total variation distance between distributions upper-bounds the distance according to Definition 1.1. Hence, Y is in D \ D_Π. We claim that in such a case, based on Y, we can define a distribution Y′ in D_Π such that X is ε-TV-close to Y′, resulting once again in a contradiction. Thus, it remains to establish the existence of such a distribution Y′.
Recall that by the premise of Theorem 3.1, the support size of Y is at most |Π|. Let S denote the support of Y, and S′ = S \ Π. Consider any subset Π′ of Π \ S such that |Π′| = |S′| (such a subset must exist because |S| ≤ |Π| and hence |S′| = |S \ Π| ≤ |Π \ S|). Selecting any bijection φ between S′ and Π′, we set Pr[Y′ = j] = Pr[Y = φ^{−1}(j)] for every j ∈ Π′, and Pr[Y′ = j] = Pr[Y = j] for every j ∈ Π ∩ S. Note that the total variation distance between X and Y′ is upper-bounded by the total variation distance between X and Y, because the probability mass assigned by Y′ to Π′ is already charged to the TV-distance between Y and X (since Pr[Y′ ∈ Π′] = Pr[Y ∈ S′] and Pr[X ∈ S′] = 0).

By applying Claim 3.1.1, we get that the total variation distance between corr(X) and D is greater than ε/2. It follows that, with probability at least 0.9, the (standard) tester for D (i.e., T) rejects when given s = s(n, ε/2) samples of corr(X). Hence, with probability at least 0.9, a sequence of s samples of corr(X) yields a collision pattern that leads T to reject. Recall, however, that we invoke T on s samples of X, not of corr(X). Nevertheless, we show that our tester (i.e., T′) will reject with high probability also in this case.

Claim 3.1.2 (typically, our tester rejects X): Under the foregoing hypotheses, T′ rejects X with probability at least 0.9 − o(1).

Proof: We consider s samples x^{(1)}, ..., x^{(s)} taken from X. On the one hand, if any of these x^{(i)}'s is δ-H-far from Π, then Step 2 rejects with very high probability. On the other hand, if x^{(i)} is δ-H-close to Π, then Pr[C′^{x^{(i)}}(j) ∈ {corr(x^{(i)})_j, ⊥}] = 1 − o(1/s²) for every j ∈ [n], which means that T either obtains s samples of corr(X)_I or rejects. Recall that, with very high probability, a sequence of s samples of corr(X)_I has the same collision pattern as a sequence of s samples of corr(X), since corr(X) is supported by strings that are pairwise δ-H-far apart. Lastly, recall that the collision pattern of a sequence of s samples of corr(X) causes T to reject (whp). To summarize, letting C′^x(p₁, ..., p_ℓ) = (C′^x(p₁), ..., C′^x(p_ℓ)), we have that the probability that T′ rejects X is lower-bounded, up to o(1) additive terms, by the probability that T rejects a sequence of s samples of corr(X), which is 0.9 − o(1). We stress that the foregoing inequalities hold since we have ignored cases that cause rejection (e.g., x^{(i)} being δ-H-far from Π and other cases in which C′ outputs ⊥).
Combining Claims 3.1.1 and 3.1.2, we infer that if corr(X) is ε/2-far from D_Π, then T′ rejects X with high probability. Recalling that if X is ε-far from D_Π, then either X is ε/2-far from being supported on Π (which causes Step 1 to reject (w.h.p.)) or corr(X) is ε/2-far from D_Π, it follows that T′ rejects (w.h.p.) in either case.

Distributions as Materialization of an Ideal Object
As stated in the introduction, we consider three types of random variations of an ideal object: random noise applied to the bits of a string (a.k.a. perturbations), random cyclic shifts of a string, and random isomorphic copies of a graph represented by a string. These types are studied in the following three subsections.

Perturbation
For two constant parameters η ∈ [0, 0.5) and δ ∈ [0, 1], and every string x* ∈ {0,1}^n, we consider all distributions in which each bit of x* is flipped with probability at most η and the outcome is at Hamming distance at most δ · n from x*. That is, D^per_{η,δ}(x*) contains the distribution X if:
1. For every i ∈ [n], it holds that Pr[X_i ≠ x*_i] ≤ η.
2. With probability 1, the Hamming distance between X and x* is at most δ · n.
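To make the two conditions concrete, the following toy sampler draws from one particular distribution of this kind (an illustration, not part of the paper's construction; note that the rejection-sampling step only approximately preserves the per-bit flipping rate of condition 1):

```python
import random

def sample_perturbation(x_star, eta, delta, rng):
    """Flip each bit of x_star independently with probability eta, and
    resample until the outcome is within Hamming distance delta * n of
    x_star (so condition 2 holds by construction; the conditioning only
    approximately preserves the per-bit marginals of condition 1)."""
    n = len(x_star)
    while True:
        y = [b ^ (rng.random() < eta) for b in x_star]
        if sum(yi != xi for yi, xi in zip(y, x_star)) <= delta * n:
            return y
```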

Proof:
The key observation is that if X is in D^per_{η,δ}(x*), for some string x* ∈ {0,1}^n, then each bit of x* can be recovered with probability 1 − 2^{−t} by querying O(t) samples of X (at the corresponding location). This allows us to estimate the flipping probability of individual bits in X, as well as the distribution of the Hamming distance between X and x*. In view of this observation, the tester proceeds as follows (assuming η + 0.1ε < 0.5, or else ε is set so that it satisfies this constraint; recall that η is a constant).
1. Select a random subset I ⊆ [n] of adequate (poly(1/ε)) size.
2. Query a first batch of samples of X at the locations in I. For each i ∈ I, if the empirical flipping rate at location i is too high, then reject; otherwise, let x̂_i be the majority value observed at location i.
3. Query m = O(1/ε) additional samples x^(1), . . . , x^(m) at the locations in I. If for some j ∈ [m] it holds that |{i ∈ I : x^(j)_i ≠ x̂_i}| > (δ + 0.1ε) · |I|, then the tester rejects. Otherwise, the tester accepts.
Suppose X belongs to D^per_{η,δ}(x*) for some x* ∈ {0,1}^n. First observe that, for any choice of the subset I (in the first step of the algorithm), the following holds by applying the additive Chernoff bound and a union bound: With high constant probability, taken over the choice of the sampled strings selected in the second step, the tester does not reject in this step, and furthermore x̂_i = x*_i for every i ∈ I. Next observe that, for any choice of x^(1), . . . , x^(m) (as selected in the third step of the algorithm), the following also holds by applying the additive Chernoff bound and a union bound: The probability, taken over the choice of I, that for some j we have |{i ∈ I : x^(j)_i ≠ x*_i}| > (δ + 0.1ε) · |I| is a small constant. (Note that here we are referring to x* and not to x̂.) By combining the two observations, we get that the tester accepts with high constant probability (taken both over the choice of I and over the choice of the sample selected in the second step).

Now suppose that X is ε-far from D^per_{η,δ}. For each i ∈ [n], let x′_i denote the more likely value of X_i. Then one of the following two conditions must hold (or else we get that X is ε-close to D^per_{η,δ}(x′)).
1. For an Ω(ε) fraction of the indices i ∈ [n], it holds that Pr[X_i ≠ x′_i] > η + ε/4.
2. The probability that X is (δ + 0.2ε)-H-far from x′ = x′_1 · · · x′_n is at least 0.3ε.
Suppose that the first condition holds. Then, with high constant probability over the choice of I, for at least one of the indices i ∈ I, we have Pr[X_i ≠ x′_i] > η + ε/4. Assuming this event holds, with high constant probability over the choice of the sample selected in the second step of the algorithm, the algorithm rejects in this step. Furthermore, for any choice of I, if I does not contain any i for which Pr[X_i ≠ x′_i] > η + ε/4, then with high constant probability x̂_i = x′_i for every i ∈ I. Now suppose that the second condition holds. Then, with high constant probability, at least one of the sample strings x^(j) selected in the third step of the algorithm is (δ + 0.2ε)-H-far from x′. Conditioned on this event, with high constant probability over the choice of I, we have |{i ∈ I : x^(j)_i ≠ x′_i}| > (δ + 0.1ε) · |I|. We hence conclude that if X is ε-far from D^per_{η,δ}, then, with high constant probability, the algorithm rejects (either in the second step or in the third step).
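The steps above can be sketched as follows (a schematic sketch; the concrete sample sizes `k`, `t`, `m` and the rejection threshold in the second step are illustrative assumptions, not the paper's exact choices; `samples_at(I, m)` is a hypothetical oracle returning m fresh samples of X restricted to the index set I):

```python
import random

def perturbation_tester(samples_at, n, eta, delta, eps, rng):
    """Sketch of the three-step tester. Rejects if some bit flips too
    often (step 2) or if a fresh sample disagrees with the recovered
    ideal string on too many positions of I (step 3)."""
    # Step 1: pick a random index subset I.
    k = min(n, int(100 / eps ** 2) + 1)
    I = rng.sample(range(n), k)
    # Step 2: estimate each bit of x* by a majority vote; reject if the
    # empirical flipping rate at some location exceeds eta + eps/4.
    t = 200
    votes = samples_at(I, t)          # t strings, each of length |I|
    x_hat = []
    for pos in range(k):
        ones = sum(s[pos] for s in votes)
        if min(ones, t - ones) / t > eta + eps / 4:
            return False              # reject: some bit flips too often
        x_hat.append(1 if 2 * ones > t else 0)
    # Step 3: fresh samples must be (delta + 0.1*eps)-close to x_hat on I.
    m = int(10 / eps) + 1
    for s in samples_at(I, m):
        mism = sum(s[pos] != x_hat[pos] for pos in range(k))
        if mism > (delta + 0.1 * eps) * k:
            return False
    return True
```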
Properties of the ideal object. For η and δ as above, and for a property Π of n-bit long strings, we let D^{per,Π}_{η,δ} = ∪_{x*∈Π} D^per_{η,δ}(x*). Building on the proof of Theorem 4.1, we get:

Theorem 4.2 (testing noisy versions of a string in a predetermined set): Let η ∈ [0, 0.5) and δ ∈ [0, 1] be constants, and let Π be a property of n-bit strings that can be tested using Q(n, ε) queries. Then, the property D^{per,Π}_{η,δ} can be tested using poly(1/ε) + O(Q(n, ε/2)) queries.
Proof: We combine the tester presented in the proof of Theorem 4.1 with an emulation of the tester for Π. Specifically, each query made by the latter tester is emulated by making corresponding queries to O(log Q(n, ε/2)) samples of the tested distribution (and taking a majority vote). Evidently, any distribution X in D^{per,Π}_{η,δ} is accepted with high probability, and in case X is ε/2-far from D^per_{η,δ} it is rejected with high probability (by the first step). Hence, we are left with the case that X is ε/2-close to D^per_{η,δ}(x*) for some x* that is ε/2-H-far from Π (since otherwise X is ε-close to D^{per,Π}_{η,δ}). 25 Consequently, the emulated tester of Π rejects with high probability.

Random cyclic shifts
For any string x* ∈ {0,1}^n, we consider all distributions that are obtained by random (cyclic) shifts of the string x*; that is, D^cyc(x*) contains the distribution X if there exists a (related) random variable J ∈ {0, 1, . . . , n−1} such that, for every j, with probability Pr[J = j] it holds that X_i = x*_{(i+j)_n} for every i ∈ [n], where (i+j)_n denotes i+j if i+j ∈ [n] and i+j−n otherwise (i.e., if i+j > n). Analogously to Theorem 4.2, we can also test the ideal string for a predetermined property, provided that this property is invariant under cyclic shifts.
Proof: For the sake of the presentation, we describe a slightly simpler tester that makes O(√n/ε²) queries; the claimed tester can be obtained by employing Levin's Economical Work Investment Strategy [13, Sec. 8.2.4].
The tester is given oracle access to t = O(1/ε) samples, denoted x^(1), . . . , x^(t), and consists of checking that each x^(i) is a cyclic shift of x^(1). Denoting the two strings by x and y, we check whether y is a cyclic shift of x by selecting m = O(√n · log t) random position indices, denoted p_1, . . . , p_m, and ℓ = O(ε^{−1} · log(n/ε)) offsets, denoted o_1, . . . , o_ℓ, querying both strings at locations (p_j + o_k)_n for every j ∈ [m] and k ∈ [ℓ], and accepting if and only if there exist j, j′ ∈ [m] such that x_{(p_j+o_k)_n} = y_{(p_{j′}+o_k)_n} for every k ∈ [ℓ].
We first consider the case that X is in D^cyc; that is, suppose that X is in D^cyc(x*) for some x* ∈ {0,1}^n. In this case, each of the samples (i.e., x^(i)) is a cyclic shift of x*; that is, for each i ∈ [t], there exists a shift σ_i such that x^(i)_j = x*_{(j+σ_i)_n} for every j ∈ [n]. We conclude that, in this case (regardless of the choice of the x^(i)'s and the o_k's), the tester accepts with probability at least 2/3.

Suppose, on the other hand, that X is ε-far from D^cyc. Fixing the first sample, denoted x^(1), it follows that with probability at least ε/2 it holds that (a sample of) X is (ε/2)-H-far from being a shift of x^(1). Hence, with probability at least 0.9 over the choice of the x^(i)'s, there exists an i ∈ [t] such that x^(i) is (ε/2)-H-far from being a shift of x^(1). It follows that, for each choice of p_1, . . . , p_m ∈ [n] and every j, j′ ∈ [m], it holds that |{k ∈ [n] : x^(1)_{(k+p_j)_n} ≠ x^(i)_{(k+p_{j′})_n}}| > εn/2. Recalling that m = O(√n · log t) and using a suitable ℓ = O(ε^{−1} · log(n/ε)), it follows that with probability at least 1 − m² · exp(−Ω(ε · ℓ)) > 0.9 (over the choice of o_1, . . . , o_ℓ) the tester detects that x^(i) is not a cyclic shift of x^(1). Therefore, in this case (i.e., X is ε-far from D^cyc), the tester rejects with probability at least 2/3.

This completes the analysis of the slightly simpler tester, which performs t · m · ℓ = O(√n/ε²) queries. The claimed tester follows by observing that if X is ε-far from D^cyc, then, for some r ∈ [log(1/ε)], with probability at least 2^r · ε/O(log(1/ε)) it holds that (a sample of) X is 2^{−r}-H-far from being a shift of x^(1). Hence, it suffices to have O(log(1/ε)) iterations such that in the r-th iteration we use t = O(1/ε)/2^r and ℓ = O(2^r · log(n/ε)).
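The pairwise check underlying this tester can be sketched as follows (a minimal sketch; the anchor and offset counts are illustrative stand-ins for the m and ℓ of the proof):

```python
import random

def looks_like_cyclic_shift(x, y, eps, rng):
    """Accept iff some pair of anchor positions (one in x, one in y) agrees
    on all sampled offsets. If y is a cyclic shift of x, a birthday-paradox
    argument over ~sqrt(n) random anchors finds an aligned pair w.h.p.;
    if y is far from every shift of x, each anchor pair is exposed by the
    random offsets with high probability."""
    n = len(x)
    m = max(2, int(2 * n ** 0.5) + 1)               # anchor positions
    ell = max(1, int((4 / eps) * n.bit_length()))   # offsets
    ps = [rng.randrange(n) for _ in range(m)]
    offs = [rng.randrange(n) for _ in range(ell)]
    for pj in ps:
        for pk in ps:
            if all(x[(pj + o) % n] == y[(pk + o) % n] for o in offs):
                return True
    return False
```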
The property D^cyc does not impose any constraint on the distribution over shifts. We next consider a natural variant, denoted D^Ucyc, in which this distribution is uniform; that is, D^Ucyc(x*) is the uniform distribution over the cyclic shifts of x*. The resulting property D^Ucyc can be tested using O(n^{2/3}/ε³) queries.
Theorem 4.4 is proved by a reduction to a more general problem, and it is indeed possible that a more efficient tester exists.
Proof: We reduce the current problem to testing equality between two distributions over {0,1}^n such that one of the distributions has support size at most n, while noting that a tester for the latter problem is provided in Theorem 5.2. Specifically, given s samples, denoted x^(1), . . . , x^(s), of a distribution X over n-bit strings, we consider the distribution Y def= D^Ucyc(x^(1)), and test equality between X and Y, where we emulate samples of X by using x^(2), . . . , x^(s), and emulate samples of Y by using (random shifts of) x^(1). Note that Y has support size at most n, which suffices when using the furthermore clause of Theorem 5.2.
The complexity of our tester equals the complexity of the tester of Theorem 5.2, and its analysis reduces to the latter. Specifically, if X is in D^Ucyc, then, for every possible x^(1) drawn from X, it holds that X ≡ D^Ucyc(x^(1)), and it follows that our tester accepts (w.h.p.). On the other hand, if X is ε-far from D^Ucyc, then for every x* it holds that X is ε-far from D^Ucyc(x*); in particular, X is ε-far from Y = D^Ucyc(x^(1)), and it follows that our tester rejects (w.h.p.).

Random isomorphic copies of a graph
Using a sublinear-query tester for graph isomorphism, we can adapt the ideas underlying the proof of Theorem 4.3 to test distributions of strings that describe the adjacency matrices of random isomorphic copies of a graph. That is, we consider n-bit long strings that describe the adjacency matrices of √n-vertex graphs. Specifically, for every string x* ∈ {0,1}^n, we consider the graph G_{x*} described by x* and any distribution on isomorphic copies of G_{x*}; that is, D^iso(x*) contains the distribution X if X is a distribution over strings that describe graphs that are isomorphic to G_{x*}.
Recall that testing isomorphism of k-vertex graphs in the dense graph model, which uses the adjacency matrix representation, has query complexity poly(1/ε) · O(k^{5/4}); see [11], where the dependence on ε is mentioned at the end of Section 1. In contrast, the query complexity of the tester of [17] is k^{1+o(1)}, provided that ε = ω((log log k)/(log k)^{1/2}). Note that testing isomorphism in the dense graph model is reducible to testing D^iso in the DoHO model. We also mention that, analogously to Theorem 4.2, one can also test the ideal string for a predetermined graph property (since a graph property is invariant under graph isomorphism).
Proof: Analogously to the proof of Theorem 4.3, the tester takes t = O(1/ε) samples, denoted x^(1), . . . , x^(t), and checks whether all x^(i)'s describe graphs that are isomorphic to the graph described by x^(1). Hence, for each i ∈ {2, . . . , t}, we check whether G_{x^(i)} is isomorphic to G_{x^(1)}, by invoking a graph isomorphism tester for the dense graph model. Specifically, we use the tester presented in [11], while setting the proximity parameter to ε/2 (and the error probability of the test to o(ε)). Note that if X is in D^iso, then each invocation of the isomorphism test accepts with probability 1 − o(ε). On the other hand, if X is ε-far from D^iso, then, for any choice of x^(1) and every i ∈ {2, . . . , t}, with probability at least ε/2 it holds that G_{x^(i)} is ε/2-far from being isomorphic to G_{x^(1)}, where the latter distance is in the dense graph model. Hence, the corresponding invocation of the graph isomorphism tester rejects (w.h.p.), and so does our tester.
What about the bounded-degree graph model? We could have adapted the proof strategy of Theorem 4.5 to bounded-degree graphs that are represented by their incidence functions. However, unfortunately, we do not know of a sublinear-query tester for graph isomorphism in that model (see [14]).

Tuples of Distributions
Our notion of testing properties of distributions over huge objects (as captured by Definition 1.2), extends easily to testing tuples of such distributions.

The definition
Following the convention stated in Section 1.4, we refer to distributions via the corresponding random variables.

Definition 5.1 (testing properties of t-tuples of huge distributions): Let D be a property of t-tuples of distributions, each as in Definition 1.2, and let s : N × (0,1] → N. A tester, denoted T, of sample complexity s for the property D is a probabilistic machine that, on input parameters n and ε, and oracle access to a sequence of s(n, ε) samples drawn from each of the t unknown distributions X^(1), . . . , X^(t) over {0,1}^n, satisfies the following two conditions.
1. The tester accepts tuples in D: If (X^(1), . . . , X^(t)) is in D, then Pr[T accepts] ≥ 2/3.
2. The tester rejects tuples that are far from D: If (X^(1), . . . , X^(t)) is ε-far from D (i.e., for every (Y^(1), . . . , Y^(t)) in D, the average, over j ∈ [t], of the distances (according to Definition 1.1) between X^(j) and Y^(j) is greater than ε), then Pr[T rejects] ≥ 2/3.

The query complexity of such a tester is defined as in the case of testing a single distribution (i.e., t = 1). Indeed, Definition 1.2 is the special case of Definition 5.1 in which t = 1.

Testing equality
This is indeed the archetypal example for the case of t = 2. Using any tester for the standard model, we obtain a tester for the DoHO model by querying all samples at a logarithmic (in the support size) number of locations. Hence, this tester requires an upper bound on the size of the supports of the tested distributions.

Proof:
The key observation is that if X is ε-far from Y (according to Definition 1.1), then, with high probability over the choice of a random O(ε^{−1} log m)-subset I ⊂ [n], the total variation distance between X_I and Y_I is at least 0.3ε. This observation is proved in a few steps.
We start by letting x^(1), . . . , x^(m′) (resp., y^(1), . . . , y^(m″)) denote the elements in the support of X (resp., Y), where m′ ≤ m (resp., m″ ≤ m). Next, we note that for every j ∈ [m′] and k ∈ [m″], when selecting uniformly an O(t/ε)-subset I, with probability at least 1 − 2^{−t}, the relative Hamming distance between x^(j)_I and y^(k)_I approximates the relative Hamming distance between x^(j) and y^(k). (We stress the order of quantifiers: With high probability over the choice of I, Eq. (7) holds for all pairs.) We observe that for any choice of µ that maps X to Y (i.e., Σ_j µ(j, k) = Pr[Y = y^(k)] for every k ∈ [m″]), the main sum in the r.h.s. of Eq. (10) is lower-bounded by the distance between X and Y (according to Definition 1.1; cf. Eq. (14)). Recalling that the latter distance is greater than ε, it follows that (for any µ that maps X to Y) the l.h.s. of Eq. (10) is greater than 0.5ε − 0.2ε = 0.3ε. On the other hand, we observe that the minimum, over µ's that map X_I to Y_I, of the l.h.s. of Eq. (10) captures the distance between X_I and Y_I (according to Definition 1.1), which lower-bounds the total variation distance between X_I and Y_I. Hence, with probability 1 − o(1) over the choice of I, the total variation distance between X_I and Y_I is greater than 0.3ε.

In light of the above, our tester proceeds as follows. For s = O(max(ε^{−4/3} m^{2/3}, ε^{−2} m^{1/2})), given oracle access to s samples, denoted x^(1), . . . , x^(s) and y^(1), . . . , y^(s), of each of the two distributions, the tester selects an O(ε^{−1} log m)-subset I ⊂ [n] uniformly at random, queries each sample at the bits in I, and invokes a standard equality tester on the resulting strings (i.e., the restrictions of the sampled strings to I). Note that if X ≡ Y, then X_I ≡ Y_I always holds, and the standard tester accepts (w.h.p.). On the other hand, by the foregoing observation, if X is ε-far from Y (according to Definition 1.1), then, with high probability over the choice of I, it holds that X_I is 0.3ε-far from Y_I (in total variation distance), and in this case the standard tester rejects (w.h.p.).
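The projection step of this tester can be sketched as follows (a minimal sketch; `tv_tester` stands for any standard-model equality tester operating on the projected samples, and the subset size is an illustrative assumption):

```python
import random
from collections import Counter

def project_and_compare(xs, ys, n, m, eps, rng, tv_tester):
    """Restrict every sample to a random index set I of size ~ (log m)/eps,
    then hand the projected samples to a standard (total-variation)
    equality tester supplied by the caller."""
    k = min(n, max(1, int(m.bit_length() / eps)))
    I = rng.sample(range(n), k)
    proj = lambda s: tuple(s[i] for i in I)
    return tv_tester([proj(x) for x in xs], [proj(y) for y in ys])
```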
The foregoing establishes the main claim. Turning to the furthermore claim, note that we cannot afford a union bound over all pairs; instead, we only consider pairs that carry noticeable probability mass, where the term 0.05ε accounts for the contribution of the j's that do not satisfy Eq. (11). That is, Eq. (8)&(9) are replaced by Eq. (12)&(13). Proceeding as in the proof of the main claim, we infer that if X is ε-far from Y (according to Definition 1.1), then, with high probability over the choice of I ∈ [n]^{O(t/ε)}, it holds that X_I is 0.25ε-far from Y_I (in total variation distance). The furthermore claim follows by (recalling that t = O(log(m/ε)) and) observing that the equality tester (for the standard model) of [10] works also when the support size of only one of the tested distributions is upper-bounded. 26 Specifically, using the presentation of [13], we observe that the support size is only used in the proof of [13, Cor. 11.21], when upper-bounding the total variation distance between two distributions by the norm-2 of their difference. But essentially the same upper bound (on the total variation distance) holds also if only the support of one of the distributions is upper-bounded. 27 The latter quantity equals the total variation distance. Hence, the earth mover's distance (with respect to the relative Hamming distance) is upper-bounded by the total variation distance; on the other hand, it is also lower-bounded in terms of the total variation distance, since the latter measures the probability mass that has to be moved from S = {z : . . .}.

A.2 Ramifications regarding Theorem 1.4

We first restate the basic claim of Theorem 1.4 and improve it in the special case of "nice" query complexity bounds. Specifically, we prove the following.

Proof: Recall that the proof of the main claim relied on the observation that if the tested distribution P is ε-far from D_n (according to Definition 1.1), then x ∼ P is ε/2-H-far from Π_n with probability at least ε/2.
(This is the case since, otherwise, letting f(x) be a string in Π_n that is closest to x in Hamming distance yields a distribution Q(y) = Σ_{x∈f^{−1}(y)} P(x) that is in D^Π and is ((ε/2) · 1 + (1 − ε/2) · (ε/2))-close, and hence ε-close, to P.) The furthermore claim is proved by employing Levin's Economical Work Investment Strategy [13, Sec. 8.2.4]. Specifically, the key observation is that there exists i ∈ [⌈log₂(16/ε)⌉] such that, with probability at least 2^{−i}/(i + 3)², it holds that x ∼ X is 2^{i−3} · ε-H-far from Π_n. In this case, the query complexity is Σ_{i≤ℓ} O(i² · 2^i) · q(n, 2^{i−3} · ε), where ℓ = ⌈log₂(16/ε)⌉. Using q(n, 2^{i−3} · ε) ≤ (2^{i−3})^{−c} · q(n, ε), the foregoing sum is upper-bounded by Σ_{i≤ℓ} i² · 2^{−(c−1)·i} · O(q(n, ε)), and the claim follows.
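The cost accounting behind Levin's strategy can be replayed numerically (a sketch under the assumption q(n, e) = C · e^{−c} with c > 1; the level sizes mirror the sum above):

```python
import math

def levin_levels(eps, c=2.0, C=1.0):
    """At level i we take ~ i^2 * 2^i samples, testing each for being
    (2^{i-3} * eps)-H-far from Pi at cost C * (2^{i-3} * eps)^{-c}.
    With c > 1, the per-level costs decay geometrically, so the total
    work is O(q(n, eps))."""
    ell = math.ceil(math.log2(16 / eps))
    levels = []
    for i in range(1, ell + 1):
        samples = i * i * 2 ** i
        dist = 2 ** (i - 3) * eps
        levels.append((samples, dist, samples * C * dist ** (-c)))
    return levels
```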
Generalization. Towards the following generalization of Theorem A.2, we consider a generalization of property testing of strings. In this generalization, the property Π_n is partitioned into m = m(n) parts and, when accepting, the tester also indicates the index of the part in which the object resides. For example, the set of low-degree multi-variate polynomials can be partitioned according to their value at a fixed point, and coupled with a generalized tester of low complexity. Generalizing Theorem A.2, we consider the property C of distributions X such that X consists of selecting an index i ∈ [m] according to some distribution in D and outputting an element selected according to an arbitrary distribution that is supported by a subset of Π^(i)_n. Then, the query complexity of testing C is at most q′ such that q′(n, ε) = O(s(m(n), 0.3ε)) · q(n, 0.3ε).
In particular, if Π_n = ∪_{i∈[m]} Π^(i)_n such that each Π^(i)_n is testable with q(n, ε) queries and the Π^(i)_n's are δ-H-far apart, then we can obtain a generalized tester of query complexity O(m(n)) · q(n, δ) + O(q(n, ε)) for Π_n.
Proof: We combine the tester for Π, denoted T, with the tester for D, while invoking both with proximity parameter ε/2, and reducing the error probability of T to o(1/s(m(n), 0.3ε)). Hence, when invoked on input (n, 0.3ε) and given oracle access to x ∈ {0,1}^n, with probability at least 1 − o(1/s(m(n), 0.3ε)), the tester T outputs i if x ∈ Π^(i)_n. The key observation is that if X is ε-far from C (according to Definition 1.1), then either X is 0.7ε-far from being distributed over Π_n (according to Definition 1.1) or χ(X) is 0.3ε-TV-far from D. Hence, we get an adequate tester that, on access to the samples x^(1), . . . , x^(s), where s = s(m(n), 0.3ε), invokes T on each of these samples, obtaining the answers a_1, . . . , a_s ∈ {0, 1, . . . , m(n)}, rejects if any of these a_i's equals 0, and otherwise outputs the verdict of the distribution tester (i.e., the D-tester) on (a_1, . . . , a_s) ∈ [m(n)]^s.
To see that the foregoing tester is correct, note that if X is in C, then X = Y_I such that I is distributed according to some distribution in D and each Y_i is supported by Π^(i)_n. It follows that, in this case, X is accepted with high probability. On the other hand, if X is accepted with high probability, then χ(X) is 0.3ε-TV-close to a distribution in D, and, with probability at least 1 − 0.3ε over the choice of x ∼ X, it holds that x is 0.3ε-H-close to Π^(χ(x))_n. It follows that X is ε-close to C.
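The combined tester described in this proof has a simple skeleton (a sketch; both subtesters are assumptions supplied by the caller, with the answer 0 encoding "far from Π_n"):

```python
def combined_tester(samples, string_tester, dist_tester):
    """Run the generalized string tester on every sampled string; reject
    on any 0-answer, and otherwise let the distribution tester judge the
    resulting sequence of part indices."""
    labels = [string_tester(x) for x in samples]
    if any(a == 0 for a in labels):
        return False
    return dist_tester(labels)
```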

A.3 Towards a stronger version of Theorem 1.6
Recall that, for any property D that is closed under mapping, Theorem 1.6 upper-bounds the query complexity of testing D in the DoHO model in terms of the sample complexity of testing D in the standard model. This leaves open the question of whether the query complexity of testing D in the DoHO model can be similarly upper-bounded in terms of the sample complexity of testing D in the DoHO model, which may be lower than the sample complexity of testing D in the standard model. A possible avenue towards establishing such a result is resolving positively the following open problem.
Open Problem A.4 (preservation of distances under a random relabeling): Suppose that D is a property of distributions over n-bit strings that is closed under mapping. Is it the case that if X is ε-far from D, then, with high probability over the choice of a random bijection π : {0,1}^n → {0,1}^n, it holds that π(X) is Ω(ε)-far from D? We stress that the distances here are according to Definition 1.1 and that the hidden constant in the Ω-notation is universal.
A positive answer to Problem A.4 would allow us to convert a tester for D in the DoHO model into one that only considers the collision pattern among the samples. Specifically, given a collision pattern among s samples, the latter tester will generate at random a sequence of s samples that fits the given collision pattern, and invoke the original tester on this sequence of samples. In such a case, we can apply the strategy used in the proof of Theorem 1.6 to the resulting tester.
We were able to establish a positive answer to Problem A.4 in the special case that the support of X has size at most 2^{(0.5−Ω(1))·n}. In fact, in that case, we prove a stronger result (where, for simplicity, 0.49 stands for 0.5 − Ω(1)).
Proposition A.5 (a partial answer to Problem A.4): Suppose that D is a property of distributions over n-bit strings that is closed under mapping, and that X has support size at most 2^{0.49n}. If X is ε-far from D in total variation distance, then, with high probability over the choice of a random bijection π : {0,1}^n → {0,1}^n, it holds that π(X) is Ω(ε)-far from D according to Definition 1.1.
The restriction on X is essential; see Section 2.2.

Proof:
The key observation is that, for some constant δ > 0, with high probability over the choice of a random bijection π : {0,1}^n → {0,1}^n, the elements in the support of π(X) are at pairwise relative Hamming distance at least δ. Fixing any such π, we let C denote the support of X′ = π(X) and note that min_{w≠w′∈C} {∆_H(w, w′)} ≥ δ. Assuming that X′ = π(X) is ε-close to D according to Definition 1.1, we shall show that X′ is (2/δ) · ε-close to D in total variation distance. (It follows that X is (2/δ) · ε-close to D in total variation distance, since D is closed under mapping and the total variation distance is preserved under bijections.) Specifically, we consider a distribution Y in D such that X′ is ε-close to Y according to Definition 1.1, and show that a related distribution Y′, which is also in D, is (2/δ) · ε-close to X′ in total variation distance. In particular, we shall replace Y by the distribution Y′ of the strings in C that are closest to Y.
Claim A.5.1 (the effect of correction to the closest element of C): Suppose that X is supported on a set C such that min_{w≠w′∈C} {∆_H(w, w′)} ≥ δ, and that Y is ε-close to X according to Definition 1.1. Then, Y′ = corr(Y) is (2/δ) · ε-close to X in total variation distance, where corr(y) denotes a string in C that is closest to y.
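The correction map of Claim A.5.1, and the factor-2 overhead invoked in its proof, can be checked on a toy example (a sketch; `corr` is just nearest-neighbor search in Hamming distance):

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(ai != bi for ai, bi in zip(a, b))

def corr(y, C):
    """Map y to some closest string in the support C."""
    return min(C, key=lambda w: hamming(w, y))
```

By the triangle inequality, for every x′ ∈ C one has ∆(x′, corr(y)) ≤ ∆(x′, y) + ∆(y, corr(y)) ≤ 2 · ∆(x′, y), since corr(y) is at least as close to y as x′ is; this is the factor-2 loss used in the proof.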
Recalling that in our application Y is in D, it follows that corr(Y) is in D, since D is closed under mapping. Hence, X is (2/δ) · ε-close to D.

Proof: Intuitively, replacing Y by corr(Y) may increase the distance from X according to Definition 1.1, but not by too much (i.e., for x′ ∈ C, it holds that ∆_H(x′, corr(y)) ≤ 2 · ∆_H(x′, y)). The key observation is that the distance of Y′ = corr(Y) from X (according to Definition 1.1) is due solely to pairs of strings that are at relative Hamming distance at least δ. This implies that the distance between Y′ and X according to Definition 1.1 is at least a δ fraction of the total variation distance between Y′ and X. Furthermore, we shall show that the distance between Y and X according to Definition 1.1 is at least a δ/2 fraction of the total variation distance between Y′ and X, which yields the claimed bound. The actual proof follows.
For w_{x′,y}'s as in Definition 1.1 (i.e., the minimizing sequence of non-negative numbers that satisfies Σ_y w_{x′,y} = Pr[X = x′] and Σ_{x′} w_{x′,y} = Pr[Y = y]), the hypothesis means that Σ_{x′,y∈{0,1}^n} w_{x′,y} · ∆_H(x′, y) ≤ ε.
Recall that w_{x′,y} > 0 only if x′ ∈ C, and that corr(y) denotes a string in C that is closest to y. Then, the foregoing sum equals
Σ_{y∈{0,1}^n} w_{corr(y),y} · ∆_H(corr(y), y) + Σ_{y∈{0,1}^n} Σ_{x′∈C\{corr(y)}} w_{x′,y} · ∆_H(x′, y).

Recall that the original argument upper-bounds the total variation distance between the two distributions by the norm-2 of their difference, whereas essentially the same upper bound holds also if only the support of one of the distributions is upper-bounded. Details follow. Our starting point is [13, Alg. 11.17], which is stated as referring to distributions over [n] but can be restated as referring to distributions over U. Recall that the actions of this algorithm depend only on the s samples it obtains from each distribution, where s is a free parameter. The same holds with respect to the analysis of this algorithm as an L₂-distance approximator, which is provided in [13, Thm. 11.20].
The key point is that the analysis of [13, Alg. 11.17] as a very crude L₁-distance approximator, provided in [13, Cor. 11.21], remains valid under the relaxed hypothesis (i.e., when only one of the two distributions is guaranteed to have support size at most n). This is because the upper bound on the support size is only used when upper-bounding the norm-1 (of the difference between the two distributions) by the norm-2 of the same difference. We observe that we lose only a factor of two when performing the argument on the smaller of the two supports, because at least half of the norm-1 of the difference is due to this smaller support. Specifically, let p : S → [0,1] be the probability function representing one distribution and q : U → [0,1] be the function representing the other distribution, where S ⊆ U. Then,
Σ_{i∈U} |p(i) − q(i)| = 2 · Σ_{i∈U : p(i)>q(i)} (p(i) − q(i)) ≤ 2 · Σ_{i∈S} |p(i) − q(i)| ≤ 2 · √|S| · ‖p − q‖₂,
where the first inequality is due to {i ∈ U : p(i) > q(i)} ⊆ {i ∈ U : p(i) > 0} = S. (Indeed, the first and last inequalities are the place where we go beyond the original proof of [13, Cor. 11.21].) Hence, ‖p − q‖₁ ≤ 2√|S| · ‖p − q‖₂, where |S| ≤ n by our hypothesis. (In the original proof of [13, Cor. 11.21], which refers to p, q : [n] → [0,1], one gets ‖p − q‖₁ ≤ √n · ‖p − q‖₂, but the difference is immaterial.) Next, we note that [13, Cor. 11.22(2)] remains valid under the relaxed hypothesis (i.e., when only one of the two distributions is guaranteed to have support size at most n). 32 We stress that this result will only be used when β ≥ n^{−1/2} (as presumed in the original text).
Lastly, we turn to [13, Alg. 11.24], which is stated as referring to distributions over [n] but can be restated as referring to distributions over U, while making n a free parameter (just as m is in the original text). When analyzing this algorithm, we let n denote an upper bound on the size of the support of one of the two distributions, and apply the revised [13, Cor. 11.22(2)] (which holds in this case). Using m = min(n^{2/3}/ε^{4/3}, n) (as in the original text), the current claim follows (analogously to establishing [13, Thm. 11.26]).