How many different numbers are on a list?

Task
Create a list of n numbers that are randomly chosen among the whole numbers from 1 to n. Guess how many different numbers are on your list. Check your guess. Repeat this action several times for different values of n. (Use as big an n as your calculator can handle without overflowing its memory. You should be able to handle lists up to 500 items.) Can you give a rule for making a “good guess”? Can such a rule be theoretically justified?

Plan of action and its execution

1. How to generate a list L1 of (for example) 200 numbers between 1 and 200?
randInt(1,200,200)→L1 ENTER

2. How to count the number of different items in L1?
SortA(L1):sum(0≠ΔList(L1))+1 ENTER

Remark
To understand this formula, create a short list L1, so you can see how many different numbers are on it. Then run your program in “piecemeal” fashion, looking at what happens during each step.

L1	ENTER
SortA(L1)	ENTER
ΔList(L1)	ENTER
0 ≠ΔList(L1)	ENTER
sum(0≠ΔList(L1))+1	ENTER

3. Gather some data (an example)
            N             the length of the list of numbers chosen randomly between 1 and N;
            D             the number of different items on the list;
            P = D/N   presented as a percentage.

            N:        D:        P:
            100      62        62%
            100      63        63%
            200      131      66%
            200      124      62%
            …        …        …
Are you ready to make a prediction?

4. The theory behind these results
(a) The probability of missing a specific number, when you choose randomly one out of the numbers between 1 and n, is (n – 1)/n = 1 – 1/n.
(b) The probability of missing a specific number when you repeat such a choice n times is
(1 – 1/n)ⁿ.
(c) For reasonably big n, (1 – 1/n)ⁿ ≈ e^-1 ≈ .3678794412
(d) Thus the expected value of the numbers missing from the list is approximately e^-1*n,
and the expected value of the numbers occurring on the list is
n – e^-1*n = (1 – e^-1)*n ≈ .63*n.
So the ratio of D/N is often close to 63%, and all predictions slightly above 60% are reasonable.
****************************************************************************
Here is an example from the list in the screenshot above:

131/200 = .655

Webpage Maintained by Owen Ramsey
Lesson Index