RTP

http://www.netlab.tkk.fi/opetus/s38130/k99/presentations/4.pdf
http://suraj.lums.edu.pk/~cs584s06/slides/rtp.pdf

http://csperkins.org/research/index.html

SDP

SDP uses attributes to extend the core protocol. Attributes can appear within the Session or Media sections and are
scoped accordingly as “session-level” or “media-level”.
Attributes take two forms:

  • A property form: “a=<flag>” conveys a property of the session.
  • A value form: “a=<attribute>:<value>” provides a named parameter.
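For example, in this made-up session description, `a=recvonly` is a session-level property attribute, while `a=rtpmap:0 PCMU/8000` is a media-level value attribute:

      v=0
      o=alice 2890844526 2890844526 IN IP4 192.0.2.10
      s=Attribute example
      c=IN IP4 192.0.2.10
      t=0 0
      a=recvonly
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000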

RTP telephone events

SDP can describe DTMF transport in three formats:

- an event-only payload

      m=audio 12346 RTP/AVP 100
      a=rtpmap:100 telephone-event/8000
      a=fmtp:100 0-15
      a=ptime:50

- a tone-only payload

      m=audio 12346 RTP/AVP 101
      a=rtpmap:101 tone/8000
      a=ptime:50

- a combined payload, with tones primary and events as a single redundant layer

      m=audio 12346 RTP/AVP 102 101 100
      a=rtpmap:102 red/8000/1
      a=fmtp:102 101/100
      a=rtpmap:101 tone/8000
      a=rtpmap:100 telephone-event/8000
      a=fmtp:100 0-15
      a=ptime:50

A very good explanation from Stack Overflow:

I recommend you start with RFC 4733, for two reasons:

1. It obsoletes RFC 2833.
2. Chapter 5 is a great source for understanding how a DTMF digit is produced.

Here is my understanding of how a DTMF digit should be sent:

* A start packet is emitted. It has its M flag set and its E flag cleared. The timestamp for the event is set.
* One or more continuation packets are emitted (for as long as the user presses the digit). They have their M and E flags cleared. They reuse the timestamp from the start packet, but their sequence numbers and durations are incremented (see the RFC for the intervals).
* An end packet is sent (when the user stops pressing the digit). It has its M flag cleared and its E flag set.
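The telephone-event payload body carried by each of those packets is only four bytes: the event code, a byte holding the E bit and the volume, and a 16-bit duration. A minimal Python sketch (the function name is mine, not from the RFC):

```python
import struct

def encode_telephone_event(event, end, volume, duration):
    """Pack an RFC 4733 telephone-event payload body (4 bytes).

    event    -- event code (0-15 covers DTMF 0-9, *, #, A-D)
    end      -- True on the final packet(s) of the event (E bit)
    volume   -- tone power as a positive dBm0 value, 0..63
    duration -- event duration so far, in RTP timestamp units
    """
    byte2 = (0x80 if end else 0x00) | (volume & 0x3F)  # E bit, R bit = 0, 6-bit volume
    return struct.pack("!BBH", event, byte2, duration)

# Digit "5": end packet, -10 dBm0, 800 ticks = 100 ms at 8000 Hz
payload = encode_telephone_event(5, True, 10, 800)
```

This body follows the normal 12-byte RTP header; the M flag lives in the RTP header, while the E flag lives in this payload.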

Why should several packets be sent for one event? Because the network is not always perfect and some loss can occur:

* The RFC states (2.5.1.2. "Transmission of Event Packets") that:

For robustness, the sender SHOULD retransmit "state" events periodically.

* And (2.5.1.4. "Retransmission of Final Packet") that:

The final packet for each event and for each segment SHOULD be sent a
total of three times at the interval used by the source for updates.
This ensures that the duration of the event or segment can be recognized correctly even if an instance of the last packet is lost.
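The whole procedure can be sketched as a function that emits (marker, end, sequence, timestamp, duration) tuples for one key press. This is an illustrative model only: the function name, the 50 ms packet interval (400 ticks at 8000 Hz, matching the a=ptime:50 lines above), and the starting values are my assumptions, not normative.

```python
def dtmf_packet_stream(press_ticks, interval_ticks=400, start_seq=1, start_ts=16000):
    """Model the RTP packets for one key press as (marker, end, seq, ts, duration).

    The RTP timestamp stays fixed at the event start; only the sequence
    number and the reported duration advance.  The final packet is sent
    three times (RFC 4733 section 2.5.1.4) so its loss can be tolerated.
    """
    packets, seq, duration, first = [], start_seq, interval_ticks, True
    while duration < press_ticks:
        packets.append((first, False, seq, start_ts, duration))  # start/continuation
        first, seq, duration = False, seq + 1, duration + interval_ticks
    for _ in range(3):                                           # end packet, 3 copies
        packets.append((first, True, seq, start_ts, press_ticks))
        first, seq = False, seq + 1
    return packets

# A 100 ms press (800 ticks at 8 kHz) sent in 50 ms packets:
stream = dtmf_packet_stream(press_ticks=800)
```

Note that the three end-packet copies get fresh sequence numbers but keep the start timestamp and the final duration unchanged.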

RTP packets programming

http://www.csee.umbc.edu/~pmundur/courses/CMSC691C/lab5-kurose-ross.html

QoS

http://www.sipfoundry.org/web/jpatten/blogs/-/blogs/quality-of-service-setup-for-hp-procurve-switches-and-polycom-phones

6 Steps To Tune Your Voice Quality

multiplex

http://csperkins.org/research/rtcweb/

There are three fundamental points of multiplexing within the RTP framework:

  • Use of separate RTP Sessions: The first, and the most important, multiplexing point is the RTP session. This multiplexing point does not have an identifier within the RTP protocol itself, but instead relies on the lower layer to separate the different RTP sessions. This is most often done by separating different RTP sessions onto different UDP ports, or by sending to different IP multicast addresses. The distinguishing feature of an RTP session is that it has a separate SSRC identifier space; a single RTP session can span multiple transport connections, provided packets are gatewayed such that participants are known to each other. Different RTP sessions are used to separate different types of media within a multimedia session. For example, audio and video flows are sent on separate RTP sessions. Completely different usages of the same media type, e.g. the video of the presenter and the video of the slides, also benefit from being separated.
  • Multiplexing using the SSRC within an RTP session: The second multiplexing point is the SSRC that separates different sources of media within a single RTP session. An example might be different participants in a multiparty teleconference, or different camera views of a presentation. In most cases, each participant within an RTP session has a single SSRC, although this may change over time if collisions are detected. However, in some more complex scenarios participants may generate multiple media streams of the same type simultaneously (e.g., if they have two cameras, and so send two video streams at once) and so will have more than one SSRC in use at once. The RTCP CNAME can be used to distinguish between a single participant using two SSRC values (where the RTCP CNAME will be the same for each SSRC), and two participants (who will have different RTCP CNAMEs).
  • Multiplexing using the Payload Type within an RTP session: If different media encodings of the same media type (audio, video, text, etc.) are to be used at different times within an RTP session, for example a single participant that can switch between two different audio codecs, the payload type is used to identify how the media from that particular source is encoded. When changing media formats within an RTP session, the SSRC of the sender remains unchanged, but the RTP payload type changes to indicate the change in media format.
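These three levels can be mirrored directly in a receiver's demultiplexing logic. The sketch below is schematic (the dictionary layout and function name are mine, not from any real stack): the transport address selects the session, the SSRC selects the source, and the payload type selects the format.

```python
sessions = {}  # local UDP port -> {ssrc: [payload types seen]}

def demux(local_port, ssrc, payload_type):
    """Route one packet: session by port, source by SSRC, format by PT."""
    session = sessions.setdefault(local_port, {})
    formats = session.setdefault(ssrc, [])
    if payload_type not in formats:
        formats.append(payload_type)  # the source switched codecs mid-session
    return formats

# One audio session (port 49170) with two participants (two SSRCs); the
# first participant later changes payload type without changing its SSRC:
demux(49170, 0xAAAA, 0)   # PCMU
demux(49170, 0xBBBB, 0)
demux(49170, 0xAAAA, 8)   # same source, now PCMA
```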

contributing source (CSRC): A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the synchronization source (SSRC) identifiers of the sources that contributed to the generation of a particular packet into the RTP header of that packet. This list is called the CSRC list. An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer). See [RFC3550] section 3.

mixer: An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet. Because the timing among multiple input sources will not generally be synchronized, the mixer will make timing adjustments among the streams and generate its own timing for the combined stream. Thus, all data packets originating from a mixer will be identified as having the mixer as their synchronization source. See [RFC3550] section 3.
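Both definitions map directly onto the RTP fixed header: the low four bits of the first byte (CC) count the CSRC entries that a mixer inserted after its own SSRC. A parsing sketch, assuming a well-formed RFC 3550 packet with no header extension:

```python
import struct

def parse_rtp_header(packet):
    """Parse the RTP fixed header (RFC 3550 section 5.1) plus the CSRC list.

    Returns (ssrc, csrcs, payload_type, seq, timestamp).  A packet from a
    mixer carries the mixer's SSRC, and the contributing talkers' SSRCs
    appear in the CSRC list (CC = number of entries, 0-15).
    """
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    assert b0 >> 6 == 2, "not an RTP version 2 packet"
    cc = b0 & 0x0F                    # CSRC count
    payload_type = b1 & 0x7F          # low 7 bits; the high bit is the marker
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))
    return ssrc, csrcs, payload_type, seq, timestamp

# Mixer packet: SSRC 0x11111111, two contributing sources in the CSRC list
pkt = struct.pack("!BBHII", 0x82, 0x00, 1, 160, 0x11111111)
pkt += struct.pack("!II", 0x22222222, 0x33333333)
ssrc, csrcs, pt, seq, ts = parse_rtp_header(pkt)
```

A receiver can use the CSRC list to show the current talkers even though every mixed packet carries the mixer's SSRC.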

http://csperkins.org/research/rtcweb/2011-11-14-RTP-muxing-arch.pdf
http://diec.unizar.es/~jsaldana/personal/gamma_CCNC_2011_in_proc.pdf
http://msdn.microsoft.com/en-us/windows/hardware/gg463006.aspx

NAT

http://en.wikipedia.org/wiki/Network_address_translation

http://wiki.freeswitch.org/wiki/Natted_Softphone_ATA
http://www.asteriskguru.com/tutorials/sip_nat_oneway_or_no_audio_asterisk.html

Best practices for SIP NAT traversal: a good post plus attached files

http://yate.null.ro/pmwiki/index.php/Main/SIPNAT

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License