Data Structure And Algorithm

## 2. good websites

#### stanford cs library

http://cslibrary.stanford.edu/

## 3. classified

### hash table

resolve collision: chain or open address.

```http://en.wikipedia.org/wiki/Open_addressing
```

one example of hashing:
the Bible has only 29,131 distinct words. We'll follow the old lore of using a prime number near that for our hash table size, and the popular multiplier of 31. Our hash function maps a string to a positive integer less than NHASH:

```typedef struct node *nodeptr;
typedef struct node {
char *word;
int count;
nodeptr next;
} node;

#define NHASH 29989
#define MULT 31

nodeptr bin[NHASH];

unsigned int hash(char *p)
{
unsigned int h = 0
for ( ; *p; p++)
h = MULT * h + *p
return h % NHASH
}
```

Using unsigned integers ensures that h remains positive.

### random

http://stackoverflow.com/questions/136474/best-way-to-pick-a-random-subset-from-a-collection

``````void gensets(int m, int n)
{
initialize set S to be empty

size = 0;
while size < m do
t = bigrand() %n
if t is not in S
insert t into S
size ++
end

print the elements of S
}```
```
``````void genknuth(int m, int n)
{
for(int i=0; i<n; i++)
{
// select m of remaining n-i
if( (bigrand() % (n-i)) < m )
{
count << i << "\n";
m --;
}
}
}```
```

#### Random number

RAND_MAX is often 32767(16 bits). If you want to make use of the full 32 bits of mantissa then:
Code:

```const unsigned long RAND_MAX_PLUS_1 = RAND_MAX + 1;
unsigned long bigRand() {
return rand() * (RAND_MAX_PLUS_1) + rand();
}
```

the new RAND_MAX will be (RAND_MAX+1)2-1
Code:

```const unsigned long BIG_RAND_MAX = RAND_MAX_PLUS_1 * RAND_MAX_PLUS_1 - 1;
float rand0to1() {
return static_cast<float>(bigRand() ) / BIG_RAND_MAX;
}
```

(You can also make BIG_RAND_MAX a float or a double instead of an unsigned long).
Note if you want to make use of the precision of double, you could require 4 calls to rand().

```int rand_between_a_and_b(int a, int b)
{
return a+ bigrand() % (b-a+1);
}
```

void srand (unsigned int seed)
This function establishes seed as the seed for a new series of pseudo-random numbers. If you call rand before a seed has been established with srand, it uses the value 1 as a default seed.

To produce a different pseudo-random series each time your program is run, do srand (time (0)).

### heap

like BST, Insertion and deletion are still O(log(n)) , but lookup becomes O(n). It is not possible to find the next-higher node to a given node in 0(log(n)) time or to print out the nodes in sorted order in
O(n) time as in a BST.

the advantage is find max_value in O(1);

misc questions

## 5. trees

AVL vs Red-Black Tree

AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications

Both AVL trees and red-black trees are self-balancing binary search trees, so they are very similar mathematically. The operations to balance the trees are different, but both occur in constant time. The real difference between the two is the limiting height

The AVL tree is more rigidly balanced than Red-Black trees, leading to slower insertion and removal but faster retrieval.

http://attractivechaos.wordpress.com/2008/10/02/comparison-of-binary-search-trees/

http://stackoverflow.com/questions/5288320/why-is-stdmap-implemented-as-red-black-tree

paper : The Ubiquitous B-Tree

#### Trie vs Radix vs Suffix(Trie or Tree )

- trie(prefix tree): one edge is a single character.
- Radix tree(patricia tree): no single child, condensed.
- the difference between suffix tree and suffix trie
suffix trie. Instead of representing a word, every edge in suffix trie only represents a character. Thus suffix trie needs much more spaces. If we pack all nodes
which have only one child, the suffix trie is turned into a suffix tree.

#### b-tree b+tree

B-Tree is important data structure. It is widely used in modern file systems.
Some are implemented based on B+ tree, which is extended from B-tree. B-tree
is also widely used in database systems.

Some textbooks introduce B-tree with the the problem of how to access a
large block of data on magnetic disks or secondary storage devices. It is
also helpful to understand B-tree as a generalization of balanced binary search
tree.

- B+ trees don't store data pointer in interior nodes, they are ONLY stored in leaf nodes. This is not optional as in B-Tree. This means that interior nodes can fit more keys on block of memory.
- The leaf nodes of B+ trees are linked, so doing a linear scan of all keys will requires just one pass through all the leaf nodes. A B-tree, on the other hand, would require a traversal of every level in the tree. This property can be utilized for efficient search for B+ tree since data is stored only in leafs. ### distributed cache

http://blog.csdn.net/sparkliang/article/details/5279393

### parox

http://www.cnblogs.com/esingchan/p/3917718.html

### design MinuteHourCounter

chapter 15 of book The Art of Readable Code

``````  // TrailingBucketCounter  minute_counts(/* num_buckets = */ 60, /* secs_per_bucket = */ 1)
//  TrailingBucketCounter  hour_counts( /* num_buckets = */ 60, /* secs_per_bucket = */ 60)

class TrailingBucketCounter {
ConveyorQueue buckets;
const int secs_per_bucket;

time_t last_update_time; // the last time Update() was called

// Calculate how many buckets of time have passed and Shift() accordingly.
void Update(time_t now) {
int current_bucket = now / secs_per_bucket;
int last_update_bucket = last_update_time / secs_per_bucket;
buckets.Shift(current_bucket - last_update_bucket);
last_update_time = now;
}

public:
TrailingBucketCounter(int num_buckets, int secs_per_bucket) :
buckets(num_buckets),
secs_per_bucket(secs_per_bucket) {
}

void Add(int count, time_t now) {
Update(now);
}

int TrailingCount(time_t now) {
Update(now);
return buckets.TotalSum();
}
};

// A queue with a maximum number of slots, where old data gets shifted off the end.
class ConveyorQueue {
queue<int>q;
int max_items;
int total_sum; // sum of all items in q

public:
ConveyorQueue(int max_items) : max_items(max_items), total_sum(0) {
}

int TotalSum() {
}

void Shift(int num_shifted) {
//In case too many items shifted, just clear the queue.
if (num_shifted >= max_items) {
q = queue<int>(); // clear the queue
total_sum = 0;
return;
}

//Push all the needed zeros.
while (num_shifted > 0) {
q.push(0);
num_shifted--;
}

//Let all the excess items fall off.
while (q.size() > max_items) {
total_sum -= q.front();
q.pop();
}
}