random
- solution 1 :
A relatively easy and correct solution is to assign a random number to every element as you see them in the stream, and then always keep the top 1,000 numbered elements at all times. This is similar to how mysql does "ORDER BY RAND()" calls. This strategy works well, and only requires additionally storing the randomly generated number for each element.
- solution 2 : reservoir sampling
http://en.wikipedia.org/wiki/Reservoir_sampling
array R[k]; // result
integer i, j;
// fill the reservoir array
for each i in 1 to k do
R[i] := S[i]
done;
// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
j := random(1, i); // important: inclusive range
if j <= k then
R[j] := S[i]
fi
done
http://gregable.com/2007/10/reservoir-sampling.html
http://alg-code.blogspot.com/2011/02/select-random-k-elements-from-linked.html
shuffle
Fisher and Yates' original method
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Pencil-and-paper_method
Modern version (Durstenfeld's version )
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Modern_method
pick a random index [0..n], swap the a[index] with a[n-1], change n= n-1 and repeat.