• solution 1 :

A relatively easy and correct solution is to assign a random number to every element as you see them in the stream, and then always keep the top 1,000 numbered elements at all times. This is similar to how mysql does "ORDER BY RAND()" calls. This strategy works well, and only requires additionally storing the randomly generated number for each element.

  • solution 2 : reservoir sampling

array R[k];    // result
integer i, j;

// fill the reservoir array
for each i in 1 to k do
    R[i] := S[i]

// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]


Fisher and Yates' original method

Modern version (Durstenfeld's version )

pick a random index [0..n], swap the a[index] with a[n-1], change n= n-1 and repeat.


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License