Random subset heuristics applied to pseudoprimes

Suppose there are N tickets in an urn of which m < N are winning tickets. You have n tries to get a winning ticket. The tickets are replaced
in the urn after each draw. Assume m and n are much smaller than N. Show that the probability of failure p_fail is approximated by:

p_fail ~= exp(-mn/N).

p_fail = (1-m/N)^n

log(p_fail) = nlog(1-m/N)

If m << N: log(1-m/N)~= -m/N therefore

log(p_fail) ~= -mn/N.

So p_fail ~= e^(-mn/N).


Let S be an N-element set and let A and B be random subsets of S with |A|=m and |B|=n.

Assume that m and n are much smaller than N. Then the probability that A and B are disjoint is approximated by
(1-m/N)^n which is approximately exp(-mn/N) by the above expression for p_fail.

So A and B are disjoint with probability approximately exp(-mn/N).

Application: likelihood that sets of pseudoprimes of two types intersect in [1, x]

Let x = 10^15 and consider on the one part the base 2 pseudoprimes of which there are 1801533 below 10^15
and on the other part Lucas pseudoprimes according to Selfridge’s Method A of which there are 2402549 per
Dana Jacobsen’s site: http://ntheory.org/pseudoprimes.html.

m=1801533, n = 2402549, N = 10^15. The likelihood that two random subsets of sizes m and n of [1,10^15) are disjoint
is approximated by exp(-mn/N) = exp(-0.004330)=0.9957 .

For x from 10^9 to 10^15, the number of base 2 pseudoprimes varies from x^0.42 to x^0.42 and
the number of Lucas pseudoprimes varies from 10^0.42 to 10^0.43 .

As long as the number of both types of pseudoprimes stays well below sqrt(x), it is unlikely (a priori) that
the two sets will intersect in the interval [1,x).

The authors of “Lucas Pseudoprimes”, Baillie and Wagstaff, write that their method of combining two tests “seems to do slightly better than mere independence”.
If that trend continues, then even when both sets of pseudoprimes have close to sqrt(x) elements in [1,x], it might not be enough to
make it likely that the two sets of pseudoprimes intersect.

Update: a follow-up question concerns what happens with three random subsets of S? Let’s say |S|=N, that A, B and C are subsets of S with |A|=m, |B|=n, |C|=p. Assume m, n, p are much smaller that N. What is the probability that the intersection of A, B, and C is the empty set?

The probability that an element `a’ of S belongs to A, B, and C is m/N n/N p/N = mnp/(N^3). If b is another element of S, then the events a in the intersection of A, B, C and b in the intersection of A, B, and C are not independent, but if we may assume independence to get an approximation, then we have a binomial distribution with parameters P = mnp/(N^3) and size = N. This in turn can be approximated by a Poisson distribution with parameter mu = PN = mnp/(N^2). Then the Poisson random variable is zero, corresponding to an empty intersection, with probability exp(-mu) = exp(-mnp/N^2).

As a consequence of this, if A, B, and C are integer random subsets of [1,x) of size x^(2/3) , then m=n=p=x^(2/3) and the Poisson approximation gives a probability of A, B and C having an empty intersection of exp(-mu) = exp(-1) ~= 0.368 . Thus, if independence is roughly true of three classes of pseudoprimes, then three independent tests are much better than two independent tests.

meditationatae's avatar

By meditationatae

Canadian

Discover more from meditationatae

Subscribe now to keep reading and get access to the full archive.

Continue reading