Network

snmp

network programming

threads

linux-threads

socks

linux-sockets

IP Address

There are three major types of IP addressing: classful, subnetted classful, and classless.

sub-net calculation table

bits value
0000,0000 0
1000,0000 128
1100,0000 192
1110,0000 224
1111,0000 240
1111,1000 248
1111,1100 252

You can't use 1111,1110 (254) in the last octet: with only one host bit there are just two host addresses, and both are reserved (all zeros for the subnet itself, all ones for broadcast), so no usable host address remains.
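
The table above can be regenerated with a few lines of Python. This is only a sketch; the helper name mask_octet is made up for illustration.

```python
def mask_octet(bits: int) -> int:
    """Value of one mask octet whose top `bits` bits are set (0 <= bits <= 8)."""
    if not 0 <= bits <= 8:
        raise ValueError("bits must be between 0 and 8")
    return (0xFF << (8 - bits)) & 0xFF

# Print the same rows as the table: 0 to 6 mask bits in the octet.
for bits in range(7):
    value = mask_octet(bits)
    print(f"{value:08b} {value}")
```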

http://www.tcpipguide.com/free/t_IPSubnettingStep4DeterminingSubnetIdentifiersandSu-5.htm

Subnet Formula Calculations With less Than 8 Subnet Bits

Summary: suppose the number of subnet bits is n and the subnet number is N.
The formula is …N*2^(8-n)

In our Class C network with 3 subnet ID bits, the formula from the table is “x.y.z.N*2^5 = x.y.z.N*32”. (Why 32? 32 = 2^(8-3).) For this network, all subnets are of the form “211.77.20.N*32”, with N going from 0 to 7. (Why 7? 7 = 2^3 - 1.) So, subnet #5 is 211.77.20.(5*32), which is 211.77.20.160, as we saw before.

Similarly, in our Class B network with 5 subnet ID bits, the formula is x.y.N*2^3.0 = x.y.N*8.0. (Why 8? 8 = 2^(8-5).) In this case “x.y” is 166.113. Subnet #26 would have the address 166.113.(26*8).0, or 166.113.208.0.
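
A minimal Python sketch of the formula for 8 or fewer subnet bits, checked against the two examples above. The function name subnet_octet is an assumption made for this note.

```python
def subnet_octet(n_bits: int, subnet: int) -> int:
    """Subnetted octet for subnet number `subnet` with n_bits (<= 8) subnet ID bits."""
    if not 0 <= subnet < 2 ** n_bits:
        raise ValueError("subnet number does not fit in n_bits")
    return subnet * 2 ** (8 - n_bits)

# Class C, 3 subnet bits, subnet #5
print(f"211.77.20.{subnet_octet(3, 5)}")
# Class B, 5 subnet bits, subnet #26
print(f"166.113.{subnet_octet(5, 26)}.0")
```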

Subnet Formula Calculations With More Than 8 Subnet Bits

When the number of subnet bits is greater than 8, the subnet ID spills into a second octet, so we need integer division (/) and modulo (%).

Summary: suppose the number of subnet bits is n+8 and the subnet number is N.
The formula is x.y.N/2^n.(N%2^n)*2^(8-n)

Let's take as an example our Class B network and suppose that we decided to use 10 bits for the subnet ID instead of 5.
In this case, n=10-8=2, and the formula is “x.y.N/4.(N%4)*64”.
- Subnet #23 in this case (N=23) would have the address “166.113.23/4.(23%4)*64”. The 23/4 becomes just 5 (the fractional .75 is dropped). 23 modulo 4 is 3, which is multiplied by 64 to get 192. So the subnet address is “166.113.5.192”.
- Subnet #709 (N=709) would be “166.113.709/4.(709%4)*64”, which is 166.113.177.64.
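
The division and modulo steps can be sketched in Python as follows. The function name is hypothetical, and n is the number of subnet bits beyond the first 8, as in the summary above.

```python
def subnet_octets_over_8(n: int, subnet: int) -> tuple[int, int]:
    """Third and fourth octets when the subnet ID has n + 8 bits (0 < n <= 8)."""
    third = subnet // 2 ** n                   # integer division drops the fraction
    fourth = (subnet % 2 ** n) * 2 ** (8 - n)  # remainder shifted to the top of the octet
    return third, fourth

for number in (23, 709):
    third, fourth = subnet_octets_over_8(2, number)
    print(f"subnet #{number}: 166.113.{third}.{fourth}")
```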

Subnet Formula Calculations With More Than 16 Subnet Bits

Summary: suppose the number of subnet bits is n+16 and the subnet number is N.
The formula is x.N/2^(8+n).(N/2^n)%256.(N%2^n)*2^(8-n)

If you subnet a Class A address using 21 bits for the subnet ID, you are crossing two octet boundaries, so n=21-16=5.
The formula for subnet addresses in this case is “x.N/2^13.(N/2^5)%256.(N%2^5)*2^3 = x.N/8192.(N/32)%256.(N%32)*8”.

Let's take an example and see how it works for, say, subnet #987654 of network 21.0.0.0. The first octet is of course 21. The second octet is 987654/8192, integer division. This is 120. The third octet is (987654/32)%256. The result of the division is 30864 (we drop the fraction). Then, we take 30864%256, which yields a remainder of 144. The fourth octet is (987654%32)*8. This is 6*8 or 48. So subnet address #987654 is 21.120.144.48.
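
The same arithmetic as a Python sketch (function name made up for illustration). The cross-check at the end relies on the observation that the three octets are simply the subnet number shifted left by 8-n bits, which is an equivalent way to look at the formula rather than something stated in the source.

```python
def subnet_octets_over_16(n: int, subnet: int) -> tuple[int, int, int]:
    """Octets 2 to 4 when the subnet ID has n + 16 bits (0 < n <= 8)."""
    second = subnet // 2 ** (8 + n)
    third = (subnet // 2 ** n) % 256
    fourth = (subnet % 2 ** n) * 2 ** (8 - n)
    return second, third, fourth

octets = subnet_octets_over_16(5, 987654)
print("21." + ".".join(str(o) for o in octets))

# Cross-check: shifting the subnet number left by (8 - n) bits gives the same bytes.
assert bytes(octets) == (987654 << (8 - 5)).to_bytes(3, "big")
```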

host-id

If there are more than 8 bits in the host ID, simply adding the host number to the last octet only works for the first 255 hosts, after which you have to “wrap around” and increase the value of the third octet. Consider again subnet #13 in our Class B example, which has a base address of 166.113.104.0. Host #214 on this subnet has address 166.113.104.214,
but host #314 isn't 166.113.104.314. It is 166.113.105.58. (How do we get this? 314/256 = 1, added to 104 gives 105; 314%256 = 58, added to 0 gives 58.)

(Host #255 is 166.113.104.255, then host #256 is 166.113.105.0, and we count up 58 more (314-256) to get to #314, 166.113.105.58.)
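
The wrap-around can be sketched with divmod, and cross-checked against the standard ipaddress module, which allows adding an integer to an address. The helper name host_address is an assumption, and the sketch assumes the host number stays below 65536.

```python
import ipaddress

def host_address(base: str, host: int) -> str:
    """Add host number `host` to a subnet base address, carrying into the third octet."""
    a, b, c, d = (int(part) for part in base.split("."))
    carry, low = divmod(d + host, 256)   # 314 -> carry 1, low 58
    return f"{a}.{b}.{c + carry}.{low}"

print(host_address("166.113.104.0", 214))
print(host_address("166.113.104.0", 314))

# Same result from the standard library.
assert str(ipaddress.IPv4Address("166.113.104.0") + 314) == host_address("166.113.104.0", 314)
```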

Broadcast Address:
- The broadcast address for a subnet is always one less than the base address of the subsequent subnet.
- or all host-id bits =1 in the subnet
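
Both definitions can be checked against each other with the standard ipaddress module. The sketch assumes the Class B example with 5 subnet bits, i.e. a /21 prefix, and uses subnet #13 (166.113.104.0); the function names are made up.

```python
import ipaddress

def broadcast_by_next_subnet(net: ipaddress.IPv4Network) -> ipaddress.IPv4Address:
    """One less than the base address of the subsequent subnet."""
    next_base = net.network_address + net.num_addresses
    return next_base - 1

def broadcast_by_host_bits(net: ipaddress.IPv4Network) -> ipaddress.IPv4Address:
    """Base address with all host-id bits set to 1."""
    return ipaddress.IPv4Address(int(net.network_address) | int(net.hostmask))

subnet13 = ipaddress.ip_network("166.113.104.0/21")
print(broadcast_by_next_subnet(subnet13), broadcast_by_host_bits(subnet13))
```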

net-id

http://www.tcpipguide.com/free/t_IPAddressClassABandCNetworkandHostCapacities.htm
http://en.wikipedia.org/wiki/Classful_network

class   first octet   leading bits of first octet
class A 0~126 0xxx,xxxx
class B 128~191 10xx,xxxx
class C 192~223 110x,xxxx
class D 224~239 1110,xxxx
class E 240~255 1111,xxxx
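
A small sketch that classifies an address by the leading bits of its first octet. The function name is hypothetical, and the special cases inside the class A bit pattern (0 and 127, the latter being loopback) are reported as plain class A here.

```python
def address_class(first_octet: int) -> str:
    """Classful address class from the leading bits of the first octet."""
    if not 0 <= first_octet <= 255:
        raise ValueError("octet out of range")
    if first_octet < 128:    # 0xxx,xxxx
        return "A"
    if first_octet < 192:    # 10xx,xxxx
        return "B"
    if first_octet < 224:    # 110x,xxxx
        return "C"
    if first_octet < 240:    # 1110,xxxx
        return "D"
    return "E"               # 1111,xxxx

for octet in (21, 166, 211, 224, 240):
    print(octet, address_class(octet))
```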

key concepts:

- Variable Length Subnet Masking (VLSM) is a technique where subnetting is performed multiple times in iteration, to allow a network to be divided into a hierarchy of subnetworks that vary in size. This allows an organization to much better match the size of its subnets to the requirements of its networks.
example : http://www.tcpipguide.com/free/t_IPVariableLengthSubnetMaskingVLSM-3.htm

- CIDR . In essence, classless addressing means that instead of breaking a particular network into subnets, we can aggregate networks into larger “supernets”. CIDR is sometimes called supernetting for this reason: it applies the principles of subnetting to larger networks. It is this aggregation of networks into supernets that allowed CIDR to resolve the problem of growing Internet routing tables.

VLSM deals with subnets of a single network in a private organization. CIDR takes the concept we just saw in VLSM to the Internet as a whole, by changing how organizational networks are allocated by replacing the single-level “classful” hierarchy with a multiple-layer hierarchy.

private addresses

A: 10.0.0.0/8 - 10.255.255.255/8
B: 172.16.0.0/12 - 172.31.255.255/12
C: 192.168.0.0/16 - 192.168.255.255/16


routing

Core Architecture

In the early stage of the Internet, there were only two levels. A special routing protocol called the Gateway-to-Gateway Protocol (GGP) was used within the core of the internetwork, while another protocol called the Exterior Gateway Protocol (EGP) was used between non-core and core routers. The non-core routers were sometimes single, stand-alone routers that connected a single network to the core, or they could be sets of routers for an organization.

This architecture served for a while, but itself did not scale very well as the Internet grew. The problem was mainly due to the fact that there was only a single level to the architecture: every router in the core had to communicate with every other. Even with peripheral routers being kept outside the core, the amount of traffic in the core kept growing.

AS Architecture

TCP/IP Interior Routing Protocols (RIP, OSPF, GGP, HELLO, IGRP, EIGRP)

  • two of the most popular TCP/IP interior routing protocols: the Routing Information Protocol (RIP) and Open Shortest Path First (OSPF).
  • two obsolete historical interior routing protocols; ( the Gateway-to-Gateway Protocol (GGP) and the HELLO Protocol. )
  • and two proprietary ones developed by networking leader Cisco Systems. (IGRP and EIGRP)
RIP : distance-vector algorithm

RIP versions 1 and 2 for IP version 4 and RIPng (next generation) for IP version 6

On a regular basis, each router in the internetwork sends out its routing table in a special message on each of the networks to which it is connected, using UDP. RIP uses a simple hop-count metric.

RIP only supports a maximum of 15 hops between destinations, making it unsuitable for very large autonomous systems, and this cannot be changed.

techniques to prevent loops

count-to-infinity problem
http://technet.microsoft.com/en-us/library/cc940478.aspx

http://en.wikipedia.org/wiki/Split_horizon

- split horizon route advertisement rule: prohibits a router from advertising a route back out the interface from which it was learned.
- split horizon with poison reverse : advertises the route back to the router that is used to reach the destination, but marks the advertisement as unreachable.

OSPF : link-state, or shortest path first (SPF), routing algorithm

- AS <->AS router: boundary router that runs exterior protocol on external side and OSPF in internal side;
the official name is : ASBR: The Autonomous System Boundary Router

- within AS
1) all peers in a single group

2) two-level hierarchy

The two-level hierarchy consists of the lower level containing individual areas, and the higher level that connects them together, which is called the backbone and is designated as “Area 0”. The routers are no longer all peers, but in fact play different roles depending on where they are located and how they are connected. There are three different labels applied to routers in this configuration:

* Internal Routers: These are routers that are only connected to other routers or networks within a single area. They maintain an LSDB for only that area, and really have no knowledge of the topology of other areas.

* Area Border Routers(ABR): These are routers that connect to routers or networks in more than one area. They maintain an LSDB for each area of which they are a part. They also participate in the backbone.

* Backbone Routers: These are routers that are part of the OSPF backbone. By definition, this includes all area border routers, since those routers pass routing information between areas. However, a backbone router may also be a router that connects only to other backbone (or area border) routers, and is therefore not part of any area (other than Area 0).

To summarize: an area border router is always also a backbone router, but a backbone router is not necessarily an area border router

Unlike RIP, OSPF does not send its information using the User Datagram Protocol (UDP). Instead, OSPF forms IP datagrams directly, packaging them using protocol number 89 for the IP Protocol field. OSPF defines five different message types, for various types of communication:

http://www.tcpipguide.com/free/t_OSPFGeneralOperationandMessageTypes.htm

DR,BDR

DR, BDR: A Designated Router/Backup DR. The key idea with a DR and backup DR (BDR) is that they are the ones to generate LSAs(Link State Advertisements)

http://networkninja.co.za/cisco-systems/open-shortest-path-first-ospf-fundamentals-dr-and-bdr/
DRs and BDRs are only useful on multi-access links because they reduce adjacencies. The concept of a DR is neither used nor useful on point-to-point links because there can only be one adjacency.

What 2 types of network have DR and BDR assigned? -- Broadcast & NBMA (no Point-to-Point/Multipoint types)

This means that instead of exchanging routing information with all other routers the routers exchange information with the DR and BDR. Then in turn the DR and BDR relay the information(LSDB) to other routers.

The DR is selected based on priority, with ties broken by the highest router ID (usually derived from an IP address). NOTE: designated routers do not preempt. That means DRs are treated as stable entities once elected; even if a router joins the network with a “greater” priority, the DR will not change.

IS-IS

IS-IS is an Interior Gateway Protocol (IGP) . Both IS-IS and OSPF are link state protocols, and both use the same Dijkstra algorithm for computing the best path through the network. As a result, they are conceptually similar.

The protocol was defined in ISO/IEC 10589:2002 as an international standard within the Open Systems Interconnection (OSI) reference design. IS-IS is not an Internet standard; however, the IETF republished the standard in RFC 1142 for the Internet community. IS-IS therefore sits at the same stack level as IP, unlike OSPF, whose packets are encapsulated in IP packets, a fact that may have allowed OSPF to be more widely used.

OSPF achieved predominance as an IGP (Interior Gateway Protocol), particularly in medium-to-large-sized enterprise networks. IS-IS, in contrast, remained largely unknown to most network engineers and was used predominantly in the networks of certain very large service providers. Detailed analysis tends to show that OSPF has traffic tuning features that are especially suitable for enterprise networks, while IS-IS has stability features especially suitable for ISP infrastructure.

others

GGP : distance-vector algorithm
HELLO protocol : uses a distance-vector algorithm. However, unlike RIP and GGP, HELLO does not use hop count as a metric. Instead, it attempts to select the best route by assessing network delays and choosing the path with the shortest delay.
IGRP : distance-vector routing protocol. IGRP overcomes two key limitations of RIP: the use of only hop count as a routing metric, and the hop count limit of 15.
Whereas RIP only allows the cost to reach a network to be expressed in terms of hop count, IGRP provides a much more sophisticated metric. In IGRP, the overall cost to reach a network is computed based on several individual metrics, including internetwork delay, bandwidth, reliability and load.
EIGRP : still a distance-vector protocol.

IGRP/EIGRP forms IP datagrams directly, packaging them using protocol number 9/88 for the IP Protocol field.

TCP/IP Exterior Gateway/Routing Protocols (BGP and EGP)

BGP : a path-vector algorithm, or Hybrid Routing Protocol Algorithms

BGP uses TCP as a reliable transport protocol, so it can take advantage of the many connection setup and maintenance features of that protocol.

Routers in the AS that are connected only to other routers within the AS are usually called internal routers, while those that connect to other ASes are called border routers in BGP, or can be named as boundary routers in OSPF.

EGP

Exterior Gateway Protocol (EGP). This is an obsolete protocol that was used for communication between non-core routers and the router core in the early Internet, and is described briefly for both completeness and historical interest.


TCP/IP in details

TCP sliding window

sequence number

The initial sequence number is exchanged during connection setup; after that, the sequence number is incremented by the size of the data sent, in bytes.

http://www.tcpipguide.com/free/t_TCPSlidingWindowDataTransferandAcknowledgementMech-5.htm

sliding window

1. The window closes as the left edge advances to the right. This happens when data is sent and acknowledged, and it eventually leads to case 2.
2. The window opens when the right edge moves to the right, allowing more data to be sent. This happens when the receiving process on the other end reads acknowledged data, freeing up space in its TCP receive buffer.
3. The window shrinks when the right edge moves to the left. The Host Requirements RFC strongly discourages this, but TCP must be able to cope with a peer that does this.

http://www.uic.rsu.ru/doc/inet/tcp_stevens/tcp_bulk.htm#20_3

congestion window

Slow start adds another window to the sender's TCP: the congestion window, called cwnd. When a new connection is established with a host on another network, the congestion window is initialized to one segment (i.e., the segment size announced by the other end). [The client maintains a congestion window (cwnd).
Initially the window is set to the lower of the maximum TCP segment size and the receiver's allowed window size. In most cases the segment size is smaller than the receiver window, so cwnd is set to the maximum TCP segment size; for instance, MSS: 512, receiver's allowed window size: 65535.]

Each time an ACK is received, the congestion window is increased by one segment, (cwnd is maintained in bytes, but slow start always increments it by the segment size.) The sender can transmit up to the minimum of the congestion window and the advertised window. This behavior continues until the congestion window size (cwnd) reaches the size of the receiver's advertised window or until a loss occurs.

When a loss occurs, half of the current cwnd is saved as the Slow Start Threshold (SSThresh) and slow start begins again from its initial cwnd. Once the cwnd reaches the SSThresh, TCP goes into congestion avoidance mode, where each ACK increases the cwnd by SS*SS/cwnd (SS being the segment size). This results in a linear increase of the cwnd.
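
The two growth phases can be sketched as below. This is a simplification that follows only the rules written in these notes (real stacks add more conditions, such as a minimum threshold), and the names on_ack and on_loss are made up.

```python
MSS = 512  # segment size in bytes, the example value used above

def on_ack(cwnd: int, ssthresh: int) -> int:
    """New cwnd (bytes) after one ACK."""
    if cwnd < ssthresh:
        return cwnd + MSS              # slow start: one segment per ACK
    return cwnd + MSS * MSS // cwnd    # congestion avoidance: SS*SS/cwnd per ACK

def on_loss(cwnd: int) -> tuple[int, int]:
    """(new cwnd, new ssthresh) after a loss: restart from one segment."""
    return MSS, cwnd // 2

cwnd, ssthresh = MSS, 65535
for _ in range(4):
    cwnd = on_ack(cwnd, ssthresh)
print("after 4 ACKs in slow start:", cwnd)
cwnd, ssthresh = on_loss(cwnd)
print("after a loss:", cwnd, ssthresh)
```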

The congestion window is flow control imposed by the sender, while the advertised window is flow control imposed by the receiver.


layer 2

True original Ethernet vs 802.3

http://www.erg.abdn.ac.uk/users/gorry/eg3567/lan-pages/llc.html

The IEEE 802.3 standard for Ethernet defines an additional data link layer protocol called the Logical Link Control (LLC) protocol. This operates on top of the MAC protocol defined in the original Ethernet standard (the "Blue Book").

llc-arch.gif

- 802.3 covers an entire set of CSMA/CD networks,
- 802.4 covers token bus networks,
- 802.5 covers token ring networks.

Common to all three of these is the 802.2 standard that defines the logical link control (LLC) common to many of the 802 networks.

IEEE 802.2/802.3 encapsulation (RFC 1042) vs Ethernet encapsulation (RFC 894).

original true Ethernet :
llc.gif

802.3 encapsulation:
mac.gif

Fortunately none of the valid 802 length values is the same as the Ethernet type values, making
the two frame formats distinguishable.
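
A sketch of how a receiver could tell the two formats apart from the 2-byte field after the two 6-byte MAC addresses. The thresholds are the usual ones (a length is at most 1500, an Ethernet type is at least 0x0600) and are not stated in these notes; the function name is hypothetical.

```python
import struct

def frame_format(frame: bytes) -> str:
    """Classify a frame by the 16-bit field that follows the two MAC addresses."""
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length <= 1500:
        return "802.3"         # field is a length; an 802.2 LLC header follows
    if type_or_length >= 0x0600:
        return "Ethernet"      # field is a type, e.g. 0x0800 for IP
    return "undefined"

macs = bytes(12)  # placeholder destination and source addresses
print(frame_format(macs + b"\x08\x00" + bytes(46)))
print(frame_format(macs + struct.pack("!H", 46) + bytes(46)))
```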

http://www.uic.rsu.ru/doc/inet/tcp_stevens/link_lay.htm

HDLC, SLIP,PPP

http://www.lincoln.edu/math/rmyrick/ComputerNetworks/InetReference/64.htm

Most serial links use HDLC or some variant of it. Dialup modem lines first used SLIP, but PPP is now preferred. PPP framing is HDLC-based and more elaborate; SLIP uses a much simpler framing of its own. ISDN's D channel uses a slightly modified version of HDLC. Cisco routers' default serial link encapsulation is HDLC.

HDLC:
hdlc.png

SLIP:
slip.png

PPP:
ppp.png

In fact, using the link control protocol, most implementations negotiate to omit the constant address and control fields and to reduce the size of the protocol field from 2 bytes to 1 byte. If we then compare the framing overhead in a PPP frame with the 2-byte framing overhead in a SLIP frame, we see that PPP adds three additional bytes: 1 byte for the protocol field, and 2 bytes for the CRC.

Address Field—Contains the binary sequence 11111111. As PPP directly connects two nodes in a network, the
address field has no particular meaning.

PPP provides the following advantages over SLIP: (1) support for multiple protocols on a
single serial line, not just IP datagrams, (2) dynamic negotiation of the IP address for each end (using the IP network control protocol); that is, each end can announce or request its IP address when the link comes up rather than having it configured statically.

type      MTU (bytes)
PPP       296
Ethernet  1500
802.3     1492

STP

STP Flash animation

http://www.cisco.com/image/gif/paws/10556/spanning_tree1.swf

4 states.

A switch port on a 2960 comes up with a default configuration on VLAN 1. What happens from the perspective of spanning-tree?

* First, the port comes up in blocking mode. This is to make sure that loops aren't created without first listening to the network to see what's going on.
* Next, if the port may be a root or designated port, the port is moved to the listening state. In this state, the port can only receive BPDUs, so it can discover the other switches participating in STP.
* After the forwarding delay, the port goes into the learning state. In this state, the port can send and receive BPDUs.
* After the forwarding delay again, the port goes into the forwarding state. The port can now send and receive data.

http://www.cisco.com/univercd/cc/td/doc/product/rtrmgmt/sw_ntman/cwsimain/cwsi2/cwsiug2/vlan2/stpapp.htm
http://aconaway.com/2009/05/21/bcmsn-notes-stp-states/

misc

- Bridge ID is a field of 8 bytes, that includes a 2-byte priority and a 6-byte MAC address.
- The switch looks at three components of the BPDU to determine the root port:

  • Lowest path cost to root bridge
  • Lowest sender Bridge ID
  • Lowest port priority/port ID

  1. traffic is classified
  2. Once traffic has been classified the next step is to ensure that it receives special treatment in the routers. This brings into focus scheduling and queuing.(WFQ,)
  3. Traffic shaping becomes necessary when Layer 3 traffic must be shaped to a desired set of rate parameters to enforce a maximum traffic rate. The result will be a smooth traffic stream. Traffic shaping queues and forwards data streams (as opposed to dropping excess traffic) so as to conform to agreed upon Service Level Agreements (SLAs).
  4. Congestion avoidance could be defined as the ability to recognize and act upon congestion on the output direction of an interface so as to reduce or minimize the effects of that congestion.

flow control and QoS

flow control : manages the transmission of traffic between two devices. Flow control is concerned with pacing the rate at which frames or packets are transmitted. The ultimate goal of all flow-control mechanisms is to avoid receive buffer overruns, which improves the reliability of the delivery subsystem. By contrast, QoS is concerned with the treatment of frames or packets after they are received by a network device or end node, with respect to queue management and queue scheduling.

Ethernet level

Currently, no functionality is defined in the Ethernet specifications for the Pause Opcode to interact with the Priority field. So, the Pause Opcode affects all traffic classes simultaneously

flow control

Pause Opcode
sends a MAC control frame to a reserved multicast address (01-80-C2-00-00-01)
http://en.wikipedia.org/wiki/Ethernet_flow_control

Tail-drop and the Pause Opcode often are used in concert. For example, when a receive queue fills, a Pause Opcode may be sent to stem the flow of new frames. If additional frames are received after the Pause Opcode is sent and while the receive queue is still full, those frames are dropped

QoS

802.1Q: 7 CoS traffic types.

  1. Network control information
  2. Voice applications
  3. Video applications
  4. Controlled load applications
  5. Excellent effort applications
  6. Best effort applications
  7. Background applications

IP level

Flow control

IP employs several flow-control mechanisms. Some are explicit, and others are implicit. All are reactive. The supported mechanisms include the following:

  • Tail-drop
  • Internet Control Message Protocol (ICMP) Source-Quench

Despite the fact that ICMP Source-Quench packets can be sent before a queue overrun occurs, ICMP Source-Quench is considered a reactive mechanism because some indication of congestion or potential congestion must trigger the transmission of an ICMP Source-Quench message. Thus, additional packets can be transmitted by the source nodes while the ICMP Source-Quench packets are in transit, and tail-drop can occur even after ICMP Source-Quench packets are sent.

  • Active Queue Management (AQM)
    • RED
    • WRED
    • DiffServ Compliant WRED

RFC 2309 defines the concept of AQM, which is implicit and reactive. Rather than merely dropping packets from the tail of a full queue, AQM employs algorithms that attempt to proactively avoid queue overruns by selectively dropping packets prior to queue overrun. The first such algorithm is called Random Early Detection (RED). More advanced versions of RED have since been developed. The most well known are Weighted RED (WRED) and DiffServ Compliant WRED.

Note that in the most generic sense, sending an ICMP Source-Quench message before queue overrun occurs, based on threshold settings, could be considered a form of AQM. However, the most widely accepted definition of AQM does not include ICMP Source-Quench.

  • Explicit Congestion Notification (ECN)

When congestion is experienced by a packet in transit, the congested router sets the two ECN bits to 11. The destination node then notifies the source node. When the source node receives notification, the rate of transmission is slowed. However, ECN works only if the Transport Layer protocol supports ECN.

IP QoS

  • stateful model:Integrated Services Architecture (IntServ)

The IntServ model is characterized by application-based signaling that conveys a request for flow admission to the network. The signaling is typically accomplished via the Resource Reservation Protocol (RSVP).

  • stateless model is the Differentiated Services Architecture (DiffServ).

The DiffServ model does not require any signaling from the application prior to data transmission. Instead, the application "marks" each packet via the Differentiated Services Codepoint (DSCP) field to indicate the desired service level.

3 bits Precedence in ToS.
Routine Set routine precedence (0)
Priority Set priority precedence (1)
Immediate Set immediate precedence (2)
Flash Set Flash precedence (3)
Flash-override Set Flash override precedence (4)
Critical Set critical precedence (5)
Internet Set internetwork control precedence (6)
Network Set network control precedence (7)
IP Precedence 6 and 7 are reserved for network information (routing updates, hello packets, and so on). This leaves 6 remaining precedence settings for normal IP traffic flows
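
The precedence value is the top 3 bits of the ToS byte, so it can be read with a shift. A minimal sketch; the function name is made up, and the names are the ones from the list above.

```python
PRECEDENCE_NAMES = [
    "Routine", "Priority", "Immediate", "Flash",
    "Flash-override", "Critical", "Internet", "Network",
]

def ip_precedence(tos: int) -> tuple[int, str]:
    """Extract the 3-bit precedence (and its name) from an 8-bit ToS value."""
    value = (tos >> 5) & 0b111
    return value, PRECEDENCE_NAMES[value]

print(ip_precedence(0xC0))  # ToS byte 1100,0000
print(ip_precedence(0x00))
```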

TCP level

TCP flow control

Congestion can be detected implicitly via TCP's acknowledgement mechanisms or timeout mechanisms (as applies to dropped packets) or explicitly via ICMP Source-Quench messages or the ECE bit in the TCP header.

When ECN is implemented, 1)TCP nodes convey their support for ECN by setting the two ECN bits in the IP header to 10 or 01. 2)A router may then change these bits to 11 when congestion occurs. Upon receipt, the destination node recognizes that congestion was experienced. The destination node then notifies the source node by setting to 1 the ECE bit in the TCP header of the next transmitted packet.
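
The router's part of this exchange can be sketched as a bit operation on the ToS/traffic-class byte, whose low two bits are the ECN field. The constant and function names are assumptions made for this note; a router facing a packet that does not advertise ECN support would have to drop instead of mark, which this sketch only signals by returning the byte unchanged.

```python
NOT_ECT = 0b00   # sender does not support ECN
ECT_1 = 0b01     # ECN-capable transport
ECT_0 = 0b10     # ECN-capable transport
CE = 0b11        # congestion experienced

def mark_congestion(tos: int) -> int:
    """Set the ECN bits to 11 if the packet advertises ECN support; else leave it."""
    if tos & 0b11 in (ECT_0, ECT_1):
        return tos | CE
    return tos

print(bin(mark_congestion(0b10)))  # ECN-capable packet gets marked
print(bin(mark_congestion(0b00)))  # non-ECN packet is left unchanged
```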

the primary TCP flow-control algorithms include:

  • slow start,
  • congestion avoidance,
  • fast retransmit,
  • fast recovery.

TCP QoS

TCP interacts with the QoS mechanisms implemented by IP. Additionally, TCP provides two explicit QoS mechanisms of its own: the Urgent and Push flags in the TCP header. The Urgent flag indicates whether the Urgent Pointer field is valid. When valid, the Urgent Pointer field indicates the location of the last byte of urgent data in the packet's Data field. The Urgent Pointer field is expressed as an offset from the Sequence Number in the TCP header. No indication is provided for the location of the first byte of urgent data. Likewise, no guidance is provided regarding what constitutes urgent data. A ULP or application decides when to mark data as urgent. The receiving TCP node is not required to take any particular action upon receipt of urgent data, but the general expectation is that some effort will be made to process the urgent data sooner than would otherwise occur if the data were not marked urgent.

As previously discussed, TCP decides when to transmit data received from a ULP. However, a ULP occasionally needs to be sure that data submitted to the source node's TCP byte stream has actually been sent to the destination. This can be accomplished via the push function. A ULP informs TCP that all data previously submitted needs to be "pushed" to the destination ULP by requesting (via the TCP service provider interface) the push function. This causes TCP in the source node to immediately transmit all data in the byte stream and to set the Push flag to one in the final packet. Upon receiving a packet with the Push flag set to 1, TCP in the destination node immediately forwards all data in the byte stream to the required ULPs (subject to the rules for in-order delivery based on the Sequence Number field). For more information about TCP QoS, readers are encouraged to consult IETF RFCs 793 and 1122.

Traffic Policing/Shaping

The previous sections covered ways you can queue different flows of traffic and then prioritize those flows. That is an important part of QoS. Sometimes, however, it is necessary to actually regulate or limit the amount of traffic an application is allowed to send across various interfaces or networks.

These features come in two different flavors: rate-limiting tools such as CAR, and shaping tools such as GTS or FRTS.
The main difference between these two traffic-regulation tools is that rate-limiting tools drop traffic based upon policing, and shaping tools generally buffer the excess traffic while waiting for the next open interval to transmit the data.

Cisco IOS QoS software includes two types of traffic shaping: GTS and FRTS. Both traffic-shaping methods are similar in implementation, although their command-line interfaces differ somewhat and they use different types of queues to contain and shape traffic that is deferred.

If a packet is deferred, GTS uses a WFQ to hold the delayed traffic. FRTS uses either a CQ or a PQ to hold the delayed traffic, depending on what you configured. As of April 1999, FRTS also supports WFQ to hold delayed traffic.

Policing literally means to drop excess traffic, shaping on the other hand allows the excess traffic to be queued.

Congestion Avoidance

As discussed previously, WFQ, PQ(priority queue), and CQ(custom queue) mechanisms prioritize the traffic that is of highest importance for bandwidth management.

Congestion avoidance works on a similar problem from a completely different angle. You avoid further congestion by detecting congestion patterns and dropping packets from different flows, which causes applications to reduce the amount of traffic being sent. This avoids what is known as global synchronization, which occurs when many TCP flows begin transmitting and stop transmitting at the same time. This is caused by the lack of QoS in a service provider's backbone.

Random Early Detection (RED) is a congestion avoidance mechanism

interesting issues

TCP/UDP checksum

http://www.tcpipguide.com/free/t_TCPChecksumCalculationandtheTCPPseudoHeader-2.htm

NAT

compatibility issue

http://www.tcpipguide.com/free/t_IPNATCompatibilityIssuesandSpecialHandlingRequirem.htm

types

- Traditional NAT(unidirectional NAT) is designed to handle only outbound transactions; clients on the local network initiate requests and devices on the Internet send back responses.
- Bidirectional NAT, Two-Way NAT and Inbound NAT. However, in some circumstances, we may want to go in the opposite direction. That is, we may want to have a device on the outside network initiate a transaction with one on the inside. To permit this, we need a more capable type of NAT than the traditional version. This enhancement goes by various names, most commonly Bidirectional NAT, Two-Way NAT and Inbound NAT. All of these convey the concept that this kind of NAT allows both the type of transaction we saw in the previous topic and also transactions initiated from the outside network.

http://www.tcpipguide.com/free/t_IPNATBidirectionalTwoWayInboundOperation-2.htm
1) static mapping.
2) DNS

- Port-Based NAT, “overloaded” NAT, Network Address Port Translation (NAPT) and Port Address Translation (PAT).
http://www.tcpipguide.com/free/t_IPNATPortBasedOverloadedOperationNetworkAddressPor.htm

- Overlapping NAT or “Twice NAT”

http://books.google.com/books?id=Tvj5V_ypR2kC&pg=RA1-PA193&lpg=RA1-PA193&dq=twice+NAT&source=bl&ots=BhtPb-d7qp&sig=D-hehJ1eDyG-SGjn7bF1BYpuu5M&hl=en

TCP is connection oriented, does it mean packets travel in the same route?

http://books.google.com/books?id=HsCjH_V04tUC&pg=PA303&lpg=PA303&dq=tcp+connection+oriented&source=bl&ots=GEq23rT5fS&sig=ymo0rlbG_nOA2UtYpqBWijsNslk&hl=en

Connection Establishment
To establish a connection, TCP uses a 3-way handshake.

Note that this is a virtual connection, not a physical connection. Each TCP segment is encapsulated in an IP datagram and can arrive out of order, or be lost or corrupted and then resent. Each datagram may use a different path to reach the destination; there is no physical connection.

MSS vs MTU

http://www.tcpipguide.com/free/t_TCPMaximumSegmentSizeMSSandRelationshiptoIPDatagra-2.htm
http://www.tcpipguide.com/free/t_IPDatagramSizetheMaximumTransmissionUnitMTUandFrag.htm

The default MSS for TCP is 536, which results from taking the minimum IP MTU of 576 and subtracting 20 bytes each for the IP and TCP headers.
The MSS can be exchanged during SYN setup. Each device at an endpoint of a connection can have its own MSS independently.

MTU = IP payload + IP header.
Every device must handle a minimum MTU of at least 576 bytes. This value is specified in RFC 791, and was chosen to allow a “reasonable sized” data block of at least 512 bytes, plus room for the standard IP header (20) and options (40).
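
The MSS arithmetic as a short sketch, assuming option-free 20-byte IP and TCP headers. The function name is made up, and the 1500-byte case simply applies the same subtraction to the Ethernet MTU listed earlier in these notes.

```python
IP_HEADER = 20    # bytes, header without options
TCP_HEADER = 20   # bytes, header without options

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented datagram of size `mtu`."""
    return mtu - IP_HEADER - TCP_HEADER

print(mss_for_mtu(576))    # default MSS
print(mss_for_mtu(1500))   # Ethernet
```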

MTU Path Discovery

One of the message types defined in ICMPv4 is the Destination Unreachable message, which is returned under various conditions where an IP datagram cannot be delivered. One of these situations is when a datagram is sent that is too large to be forwarded by a router over a physical link but which has its Don’t Fragment (DF) flag set to prevent fragmentation. In this case, the datagram must be discarded and a Destination Unreachable message sent back to the source. A device can exploit this capability by testing the path with datagrams of different sizes, to see how large they must be before they are rejected.

fragment

Note that while intermediate routers may further fragment an already-fragmented IP message, intermediate devices do not reassemble fragments. Reassembly is done only by the recipient device. Perhaps the most important reason is that fragments can take different routes to get from the source to the destination, so any given router may not see all the fragments in a message.

http://www.tcpipguide.com/free/t_IPMessageFragmentationProcess-2.htm

- In IP fragmentation, only the payload is fragmented. In PPP, however, the whole packet is fragmented, which means the PPP header is also placed in the payload of the first fragment.
- The offset is specified in units of 8 bytes.
- Each message sent between the same source and destination that is being fragmented has a different identifier. The source can decide how it generates unique identifiers. Every fragment of a message is set to the same Identification value to mark them as part of the same original datagram.
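
A sketch of how the offsets come out when a payload is split for a given MTU. It assumes a 20-byte IP header on every fragment; the function name and the tuple layout (offset in 8-byte units, data length, more-fragments flag) are made up for illustration.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20) -> list[tuple[int, int, bool]]:
    """Split payload_len bytes into (offset_in_8_byte_units, data_length, more_fragments)."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small for any payload")
    fragments = []
    sent = 0
    while sent < payload_len:
        size = min(max_data, payload_len - sent)
        more = sent + size < payload_len
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments

for frag in fragment(4000, 1500):
    print(frag)
```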

books

http://www.digital-deception.net/books/stevens.pdf
http://www.cisco.com/en/US/products/hw/routers/ps221/products_white_paper09186a00800c69a8.shtml
http://www.sans.org/resources/tcpip.pdf
http://www.eventhelix.com/RealtimeMantra/Networking/

illustration pictures

http://www.tcpipguide.com/free/t_toc.htm

  • IP

Because options can be 0-40 bytes long (word[#5-#14]) and the minimum length of an IP header is 20 bytes, the IHL is between 5 and 15 words (20~60 bytes). data starting from word[#5-#16383]
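
Reading the IHL out of the first header byte, as a minimal sketch (the function name is an assumption). The first byte holds the version in its high 4 bits and the IHL in its low 4 bits.

```python
def ip_header_length(first_byte: int) -> tuple[int, int]:
    """Return (version, header length in bytes) from the first byte of an IP header."""
    version = first_byte >> 4
    ihl_words = first_byte & 0x0F
    if ihl_words < 5:
        raise ValueError("IHL below the 5-word minimum")
    return version, ihl_words * 4

print(ip_header_length(0x45))  # common case: IPv4, no options
print(ip_header_length(0x4F))  # maximum: 40 bytes of options
```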

* http://en.wikipedia.org/wiki/IPv4#Packet_structure
* http://www.tcpipguide.com/free/t_IPDatagramGeneralFormat.htm

  • TCP

Because options can be 0-40 bytes long (word[#5-#14]), valid values of the Data Offset field are 5 through 15. Thus, the minimum length of a TCP header is 20 bytes, and the maximum length is 60 bytes. data starting from word[#5-#16378]

The TCP header does not have a Length field, so the TCP length must be calculated for inclusion in the IP pseudo-header.
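
A sketch of that calculation and of the 12-byte pseudo-header it feeds into (source address, destination address, a zero byte, protocol number 6 for TCP, and the TCP length). The function names and the example addresses are made up.

```python
import socket
import struct

def tcp_length(ip_total_length: int, ihl_words: int) -> int:
    """TCP segment length = IP total length minus the IP header length."""
    return ip_total_length - ihl_words * 4

def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """Pseudo-header that is prepended to the segment for the checksum calculation."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 6, tcp_len))

length = tcp_length(60, 5)   # 60-byte datagram with a 20-byte IP header
print(length, pseudo_header("10.0.0.1", "10.0.0.2", length).hex())
```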

* http://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_structure
* http://www.tcpipguide.com/free/t_TCPMessageSegmentFormat-3.htm

  • UDP

data from word #16378

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License