Session Initiation Protocol (SIP)
is a communications protocol for signaling and controlling multimedia
communication sessions. The most common applications of SIP are in Internet
telephony for voice and video calls, as well as instant messaging, over
Internet Protocol (IP) networks.
SIP is a simple but extensible signaling
protocol for setting up, modifying and shutting down communication sessions
between two or more participants. One or more media streams, or even no media at all,
can be transmitted in the session context. SIP is independent of the actual
media, and the route of the media can differ from the route of the signaling messages.
SIP can also invite participants to IP multicast sessions.
It is an application layer protocol
used to:
- establish
- modify
- terminate
sessions consisting of one or
several media streams.
By sessions, we understand a set of
senders and receivers that communicate and the state kept in those senders and
receivers during the communication. Examples of a session can include Internet
telephone calls, distribution of multimedia, multimedia conferences,
distributed computer games, etc.
It supports name mapping and
redirection services transparently:
- Personal Mobility: one single externally visible identifier regardless of the network location.
The basic scope of SIP is to exchange the:
- IP addresses
- Port numbers
at which systems can receive data.
SIP is an application layer protocol
designed to be independent of the underlying transport layer. It is a
text-based protocol, incorporating many elements of the Hypertext Transfer
Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP).
The specification is available in
the form of several RFCs. The most important one is RFC 3261, which contains the core
protocol specification.
SIP works in conjunction with
several other application layer protocols that identify and carry the session
media. Media identification and negotiation is achieved with the Session
Description Protocol (SDP). For the transmission of media streams (voice,
video) SIP typically employs the Real-time Transport Protocol (RTP) or Secure
Real-time Transport Protocol (SRTP). For secure transmissions of SIP messages,
the protocol may be encrypted with Transport Layer Security (TLS).
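As a rough illustration of the IP addresses and port numbers that SIP carries inside an SDP body, the following Python sketch pulls the connection address (the c= line) and the media port (the m= line) out of a session description. The function name media_endpoints and the sample values are made up for this example, and only the c= and m= lines are interpreted; a real SDP parser does considerably more.

```python
def media_endpoints(sdp):
    """Return a list of (media, address, port) tuples found in an SDP body.

    Illustrative only: a session-level c= line supplies the default address,
    and a c= line that follows an m= line overrides it for that media stream.
    """
    session_addr = None
    endpoints = []  # each entry is [media, address, port]
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            addr = line[2:].split()[2]
            if endpoints:
                endpoints[-1][1] = addr
            else:
                session_addr = addr
        elif line.startswith("m="):
            # m=<media> <port>[/<number of ports>] <proto> <fmt> ...
            media, port = line[2:].split()[:2]
            endpoints.append([media, session_addr, int(port.split("/")[0])])
    return [tuple(entry) for entry in endpoints]


# Hypothetical SDP offer; the address is from a documentation range.
EXAMPLE_SDP = """
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""

print(media_endpoints(EXAMPLE_SDP))
```

The receiver of this description learns that audio can be sent to 192.0.2.10 on port 49170, which is exactly the information SIP exists to exchange.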
SIP is not designed to transfer
audio, video, and so on. Rather, it just sets up the session. There are many other
protocols that are called into play to make this happen (such as TCP, UDP, RTP,
and so on), but they are seen as supporting protocols rather than part of the
SIP “package.” All SIP does is start, manage, and end the session; it passes
off the responsibility of the voice or video call to other protocols. This
characteristic is one of the primary differentiators between H.323 and SIP.
SIP is NOT:
- Transport protocol (like TCP, UDP)
- QoS reservation protocol (like RSVP)
- Gateway Control Protocol (like MEGACO)
- Used to send session capabilities (instead it makes use of SDP, Session Description Protocol)
- Designed for bulk transfer (like FTP)
- Limited to Internet telephony: it can be used by any application having a notion of session (e.g. peer-to-peer applications)
(Note: Most Internet telephony
service providers (ITSP), which allow businesses to use the Internet to make
outside telephone calls using VoIP, use SIP as their primary signalling protocol.)
History
SIP was originally designed by Mark
Handley, Henning Schulzrinne, Eve Schooler and Jonathan Rosenberg in 1996. The
protocol was standardized as RFC 2543 in 1999. In November 2000, SIP
was accepted as a 3GPP signaling protocol and permanent element of the IP
Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia
services in cellular systems. As of 2014, the latest version (SIP 2.0) of the
specification is RFC 3261, published in June 2002, with extensions and
clarifications since then.
The U.S. National Institute of
Standards and Technology (NIST), Advanced Networking Technologies Division
provides a public-domain Java implementation that serves as a reference
implementation for the standard. The implementation can work in proxy server or
user agent scenarios and has been used in numerous commercial and research
projects. It supports RFC 3261 in full and a number of extension RFCs including
RFC 6665 (event notification) and RFC 3262 (reliable provisional responses).
While originally developed based on
voice applications, the protocol was envisioned for, and supports, a diverse array of
applications, including video conferencing, streaming multimedia distribution,
instant messaging, presence information, file transfer, fax over IP and online
games.
SIP Characteristics
The basic features of SIP:
- Locating user: determination of the end system to be used for communication.
- Determining user capabilities: determination of the media and media parameters to be used.
- Determining user availability: determination of the willingness of the called party to engage in communications.
- Setting up the call: "ringing", establishment of call parameters at both called and calling party.
- Controlling the call: including transfer and termination of calls.
Main technical properties and some
implications of SIP:
- Text-based (ISO 10646 in UTF-8 encoding), similar to HTTP: Easy to learn, implement, debug and extend. Causes extra overhead, which is not a serious drawback for a signaling protocol. Header names can be abbreviated.
- Recommended transport protocol is UDP: It is not meant to send large amounts of data.
- Application level routing based on Request-URI: The signaling path through SIP proxies is controlled by the protocol itself not by the underlying network. Requires routing implementation in SIP proxies.
- Independence from the session it initiates and terminates (capability descriptions, transport protocol, etc.): Cooperates with different protocols, which can be developed independently. It is not a conference control protocol (floor control, voting, etc.) but it can be used to introduce one.
- Supports multicasting for signaling and media but no multicast address or any other network resource allocation.
- Support for stateless, efficient and "forward" compatible proxies (re-INVITE carries state, ignore the body, ignore extension methods).
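Because SIP is text-based, header handling needs nothing more than string operations. The Python sketch below expands abbreviated (compact) header names into their long forms. The table lists the compact forms defined in RFC 3261; the function name expand_headers and the sample header lines are assumptions made for this example.

```python
# Compact header forms defined in RFC 3261 (extension RFCs add a few more).
COMPACT_FORMS = {
    "i": "Call-ID",
    "m": "Contact",
    "e": "Content-Encoding",
    "l": "Content-Length",
    "c": "Content-Type",
    "f": "From",
    "s": "Subject",
    "k": "Supported",
    "t": "To",
    "v": "Via",
}


def expand_headers(header_lines):
    """Return (name, value) pairs with compact header names expanded."""
    result = []
    for line in header_lines:
        # Split at the first colon only; values such as SIP URIs contain colons.
        name, _, value = line.partition(":")
        name = name.strip()
        result.append((COMPACT_FORMS.get(name.lower(), name), value.strip()))
    return result


headers = expand_headers([
    "v: SIP/2.0/UDP pc33.example.com",
    "f: <sip:alice@example.com>;tag=1",
    "Max-Forwards: 70",
])
for name, value in headers:
    print(f"{name}: {value}")
```

Header names that are already in long form pass through unchanged, so the same function can be applied to any message.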
Network Elements
SIP has been designed for IP
networking. The protocol makes use of standard elements like DNS and DHCP servers,
firewalls, NATs and proxies. Special support in DNS and DHCP servers is not
needed but it makes the protocol operations more efficient. The SIP protocol is
implemented by the user agent client (UAC) and server (UAS), redirect servers,
proxies and registrars. Registrars and location servers maintain the mapping
between a user's permanent address and current physical addresses.
The SIP specification does not
actually define the network architecture. However, the logical elements and
their relationships can be determined based on the protocol specification. The
following figure demonstrates an example of inter-domain session setup. Both
UAC and UAS are located in their home domains. Thin lines represent SIP
signaling messages, thick lines represent media transmission, and dotted lines
represent non-SIP protocols.
Logical network elements involved in an inter-domain session setup
In this scenario UAC composes an
INVITE message in order to set up a call with UAS. The message contains the
session data in its headers and media descriptions in the body in SDP format.
INVITE is sent to Outbound Proxy whose address may have been configured in UAC using
DHCP. Outbound Proxy uses DNS to resolve the recipient's address. It also
controls Firewall/NAT to open the ports for media transmission. Domain B has
configured all the incoming requests to go to Proxy/Registrar that controls
Firewall/NAT of Domain B. Proxy/Registrar queries the current location of UAS
from Location Server and forwards the message to UAS. In an intra-domain call a
redirect server could be used instead of a proxy in Domain B to return the
current location of UAS who could then be contacted directly by UAC without
having any proxy involved in the communications.
Since the request carried the media
descriptions of UAC and since the corresponding ports were opened in firewalls,
media can immediately flow back from UAS to UAC. The signaling response is
routed along the same path as the request and it carries the media descriptions
of UAS. UAC can now send media to UAS. Finally, UAC has to send an ACK message to
UAS to acknowledge the successful session establishment.
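To make the shape of the INVITE in this scenario concrete, here is a minimal Python sketch that assembles one: session data in the headers, the media description as an SDP body. The function build_invite, its parameters and all sample values (addresses, Call-ID, tag, branch) are hypothetical; a real user agent would generate the identifiers randomly and add further headers.

```python
def build_invite(caller, callee, local_host, sdp_body, call_id, branch, tag):
    """Assemble a minimal INVITE request as bytes (illustrative sketch)."""
    body = sdp_body.encode("utf-8")
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",            # method, Request-URI, version
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body)}",            # length of the body in bytes
    ]
    # Header lines end with CRLF; an empty line separates headers from body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body


invite = build_invite(
    caller="alice@a.example.com",
    callee="bob@b.example.org",
    local_host="192.0.2.10",
    sdp_body="v=0\r\n",                  # placeholder; a real offer is longer
    call_id="a84b4c76e66710",
    branch="z9hG4bK776asdhds",           # RFC 3261 branches start with z9hG4bK
    tag="1928301774",
)
print(invite.decode("utf-8"))
```

The Request-URI names the callee's permanent address; the proxies along the path are what turn it into the current location of UAS.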
User Agent
A SIP user agent (UA) is a logical
network end-point used to create or receive SIP messages and thereby manage a
SIP session. A SIP UA can perform the role of a User Agent Client (UAC), which
sends SIP requests, and the User Agent Server (UAS), which receives the
requests and returns a SIP response. These roles of UAC and UAS only last for
the duration of a SIP transaction.
A SIP phone is an IP phone that
implements the client and server functions of a SIP user agent, which provide the traditional
call functions of a telephone, such as dial, answer, reject, hold/unhold, and
call transfer. SIP phones may be implemented as a hardware device or as a
softphone. As vendors increasingly implement SIP as a standard telephony
platform, often driven by 4G efforts, the distinction between hardware-based
and software-based SIP phones is being blurred and SIP elements are implemented
in the basic firmware functions of many IP-capable devices. Examples are
devices from Nokia and BlackBerry.
In SIP, as in HTTP, the user agent
may identify itself using a message header field 'User-Agent', containing a
text description of the software/hardware/product involved. The User-Agent
field is sent in request messages, which means that the receiving SIP server
can see this information. SIP network elements sometimes store this
information, and it can be useful in diagnosing SIP compatibility problems.
SIP User Agent registration on a SIP Registrar with authentication by login
Back-to-Back User Agent
A back-to-back user agent (B2BUA)
is a logical network element in Session Initiation Protocol (SIP) applications.
A back-to-back user agent operates between both end points of a phone call or
communications session and divides the communication channel into two call legs
and mediates all SIP signaling between both ends of the call, from call
establishment to termination. As all control messages for each call flow through
the B2BUA, a service provider may implement value-added features available
during the call.
A B2BUA may provide the following
functions:
- Call management (billing, automatic call disconnection, call transfer, etc.)
- Network interworking (perhaps with protocol adaptation)
- Hiding of network internals (private addresses, network topology, etc.)
Often, B2BUAs are implemented in
media gateways to also bridge the media streams for full control over the
session.
Establishing a connection with the B2BUA
Proxy Server
The proxy server is an intermediary
entity that acts as both a server and a client for the purpose of making
requests on behalf of other clients. A proxy server primarily plays the role of
routing, meaning that its job is to ensure that a request is sent to another
entity closer to the targeted user. Proxies are also useful for enforcing
policy (for example, making sure a user is allowed to make a call). A proxy
interprets, and, if necessary, rewrites specific parts of a request message
before forwarding it.
Call flow through Redirect Server and proxy
There are two basic types of SIP
Proxy Servers:
- Stateless Servers: Stateless servers are simple message forwarders. They forward messages independently of each other. Stateless proxies do not take care of transactions.
- Stateful servers: Stateful proxies are more complex. Upon reception of a request, stateful proxies create a state and keep the state until the transaction finishes. Some transactions, especially those created by INVITE, can last quite long (until the called party picks up or declines the call). Because stateful proxies must maintain the state for the duration of the transactions, their performance is limited.
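The difference between the two proxy types can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a proxy implementation: the class and method names are invented, a transaction is identified only by the Via branch value, and "forwarding" just appends to a list.

```python
class StatelessProxy:
    """Forwards every message independently and remembers nothing."""

    def __init__(self):
        self.forwarded = []

    def on_request(self, branch, request):
        self.forwarded.append(request)


class StatefulProxy:
    """Keeps per-transaction state until the transaction finishes."""

    def __init__(self):
        self.transactions = {}  # branch -> state
        self.forwarded = []

    def on_request(self, branch, request):
        if branch in self.transactions:
            # A retransmission of a known request is absorbed, not forwarded.
            return
        self.transactions[branch] = "proceeding"
        self.forwarded.append(request)

    def on_final_response(self, branch):
        # The state is released once the transaction completes.
        self.transactions.pop(branch, None)


stateless, stateful = StatelessProxy(), StatefulProxy()
for proxy in (stateless, stateful):
    proxy.on_request("z9hG4bK1", "INVITE sip:bob@example.org SIP/2.0")
    proxy.on_request("z9hG4bK1", "INVITE sip:bob@example.org SIP/2.0")  # retransmission

print(len(stateless.forwarded), len(stateful.forwarded))
```

The transaction table is what costs the stateful proxy memory and performance: an entry for an INVITE stays in it until the called party picks up or declines.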
Registrar
A registrar is a SIP endpoint that
accepts REGISTER requests and places the information it receives in those
requests into a location service for the domain it handles. The location
service links one or more IP addresses to the SIP URI of the registering agent.
The URI uses the sip: scheme, although other protocol schemes are possible,
such as tel:. More than one user agent can register at the same URI, with the
result that all registered user agents receive the calls to the URI.
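A location service of this kind is essentially a table from a SIP URI to a set of contact addresses. The Python sketch below is a minimal in-memory version; the class name LocationService, the explicit now parameter and the sample addresses are assumptions for illustration. The expiry handling follows the general REGISTER convention that each binding has a lifetime and that a lifetime of zero removes it.

```python
class LocationService:
    """Toy in-memory location service: SIP URI -> {contact: expiry time}."""

    def __init__(self):
        self._bindings = {}

    def register(self, uri, contact, expires, now):
        """Store (or refresh) a binding; expires == 0 removes it."""
        contacts = self._bindings.setdefault(uri, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires

    def lookup(self, uri, now):
        """Return every contact for the URI that has not yet expired."""
        contacts = self._bindings.get(uri, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)


location = LocationService()
# Two user agents register under the same URI (times are in seconds).
location.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", 3600, now=0)
location.register("sip:alice@example.com", "sip:alice@198.51.100.7:5060", 60, now=0)

print(location.lookup("sip:alice@example.com", now=30))   # both contacts
print(location.lookup("sip:alice@example.com", now=120))  # the short-lived one has expired
```

A proxy that receives a call for the URI would forward it to every contact returned by lookup, which is how all registered user agents come to ring.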
SIP registrars are logical
elements, and are commonly co-located with SIP proxies. But it is also possible
and often good for network scalability to place this location service with a
redirect server.
Redirect Server
A redirect server is a user agent server that generates
3xx (Redirection) responses to requests it receives, directing the client to
contact an alternate set of URIs. The redirect server allows proxy servers to
direct SIP session invitations to external domains.
Session Border Controller
Session border controllers serve as
middle boxes between UA and SIP servers for various types of functions, including
network topology hiding, and assistance in NAT traversal.
Gateway
Gateways can be used to interface a
SIP network to other networks, such as the public switched telephone network,
which use different protocols or technologies.
Operations
Protocol operations of SIP:
- INVITE initiates session establishment
- ACK confirms successful session establishment
- OPTIONS requests capabilities
- BYE terminates the session
- CANCEL cancels a pending session establishment
- REGISTER binds a permanent SIP URI to a temporary SIP URI for the current location.
The following diagram demonstrates
SIP protocol operations for user registration and session handling.
SIP protocol operations
SIP Messages
SIP is a text-based protocol with
syntax similar to that of HTTP. There are two different types of SIP messages:
requests and responses. The first line of a request has a method, defining the
nature of the request, and a Request-URI, indicating where the request should
be sent. The first line of a response has a response code.
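The two kinds of first line are easy to tell apart in code: a response starts with the protocol version, a request with the method. The following Python sketch shows one way to do it; the function name parse_start_line and the returned dictionary layout are choices made for this example, and it expects a non-empty reason phrase.

```python
def parse_start_line(line):
    """Classify and split the first line of a SIP message (sketch)."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed start line: {line!r}")
    if parts[0].startswith("SIP/"):
        # Status line: SIP-Version Status-Code Reason-Phrase
        return {
            "kind": "response",
            "version": parts[0],
            "status": int(parts[1]),
            "reason": parts[2],
        }
    # Request line: Method Request-URI SIP-Version
    return {
        "kind": "request",
        "method": parts[0],
        "uri": parts[1],
        "version": parts[2],
    }


print(parse_start_line("INVITE sip:bob@example.org SIP/2.0"))
print(parse_start_line("SIP/2.0 180 Ringing"))
```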
SIP Request
For SIP requests, RFC 3261 defines
the following methods:
- REGISTER: Used by a UA to register to the registrar.
- INVITE: Used to establish a media session between user agents.
- ACK: Confirms reliable message exchanges.
- CANCEL: Terminates a pending request.
- BYE: Terminates an existing session.
- OPTIONS: Requests information about the capabilities of a caller without the need to set up a session. Often used as keepalive messages.
Further methods have been introduced in extension RFCs:
- REFER (RFC 3515): Indicates that the recipient (identified by the Request-URI) should contact a third party using the contact information provided in the request. Used for call transfer.
- PRACK (Provisional Response Acknowledgement, RFC 3262): Improves reliability by adding an acknowledgement system for provisional responses (1xx). A PRACK is sent in response to a provisional response (1xx).
SIP Response
The SIP response types defined in
RFC 3261 fall in one of the following categories:
- Provisional (1xx): Request received and being processed.
- Success (2xx): The action was successfully received, understood, and accepted.
- Redirection (3xx): Further action needs to be taken (typically by sender) to complete the request.
- Client Error (4xx): The request contains bad syntax or cannot be fulfilled at the server.
- Server Error (5xx): The server failed to fulfill an apparently valid request.
- Global Failure (6xx): The request cannot be fulfilled at any server.
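Since the category is determined by the first digit of the response code, classifying a response takes one integer division. A minimal Python sketch, with function names chosen for this example:

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(code):
    """Map a SIP response code (100-699) to its category name."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code):
    """Everything except a provisional (1xx) response is final."""
    return response_class(code) != "Provisional"


for code in (180, 200, 302, 486, 503, 603):
    print(code, response_class(code), "final" if is_final(code) else "provisional")
```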
Transactions
SIP makes use of transactions to
control the exchanges between participants and deliver messages reliably. The
transactions maintain an internal state and make use of timers. Client
Transactions send requests and Server Transactions respond to those requests
with one-or-more responses. The responses may include zero-or-more Provisional
(1xx) responses and one-or-more final (2xx-6xx) responses.
Transactions are further
categorized as either Invite or Non-Invite. Invite transactions differ in that
they can establish a long-running conversation, referred to as a Dialog in SIP,
and so include an acknowledgment (ACK) of any non-failing final response (e.g.
200 OK).
Because of these transactional
mechanisms, SIP can make use of un-reliable transports such as User Datagram
Protocol (UDP).
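The timers are what make delivery over UDP reliable: a request is retransmitted until a response arrives or the transaction gives up. The Python sketch below computes the retransmission schedule of an INVITE client transaction, assuming the RFC 3261 defaults (a base interval T1 of 500 ms that doubles after each retransmission, and a timeout after 64 × T1). The function name invite_send_times is made up, and the sketch ignores what happens once a response is received.

```python
def invite_send_times(t1=0.5):
    """Return (send_times, timeout) in seconds for an unanswered INVITE.

    Assumes the RFC 3261 defaults for an unreliable transport: the first
    retransmission fires after T1, the interval doubles each time, and the
    transaction times out 64 * T1 after the first transmission.
    """
    timeout = 64 * t1
    send_times = []
    elapsed, interval = 0.0, t1
    while elapsed < timeout:
        send_times.append(elapsed)
        elapsed += interval
        interval *= 2
    return send_times, timeout


times, timeout = invite_send_times()
print("request sent at:", times)
print("transaction times out at:", timeout)
```

With the default T1, the request goes out seven times in total before the transaction reports a timeout after 32 seconds.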
Instant Messaging and Presence
The Session Initiation Protocol for
Instant Messaging and Presence Leveraging Extensions (SIMPLE) is the SIP-based
suite of standards for instant messaging and presence information. MSRP
(Message Session Relay Protocol) allows instant message sessions and file
transfer.
Applications
A SIP connection is a marketing
term for voice over Internet Protocol (VoIP) services offered by many Internet
telephony service providers (ITSPs). The service provides routing of telephone
calls from a client’s private branch exchange (PBX) telephone system to the public
switched telephone network (PSTN). Such services may simplify corporate
information system infrastructure by sharing Internet access for voice and
data, and removing the cost for Basic Rate Interface (BRI) or Primary Rate
Interface (PRI) telephone circuits.
Many VoIP phone companies allow
customers to use their own SIP devices, such as SIP-capable telephone sets, or
softphones.
SIP-enabled video surveillance
cameras can make calls to alert the owner or operator that an event has
occurred; for example, to notify that motion has been detected out-of-hours in
a protected area.
SIP is used in audio over IP for
broadcasting applications where it provides an interoperable means for audio
interfaces from different manufacturers to make connections with one another.
SIP Security
Security must be addressed at
several levels. At the network level the security is based on regular firewalls
and NATs since SIP is designed for IP networking. Controlling the firewall with
a SIP proxy is an essential enhancement for the standard IP security
mechanisms.
At the protocol level both the
media security and signaling security must be addressed. Media encryption is
specified in the message body with SDP.
Encryption
The increasing concerns about
the security of calls that run over the public Internet have made SIP encryption
more popular. Because VPN is not an option for most service providers, most
of those that offer secure SIP (SIPS) connections use TLS for securing
signaling. The relationship between SIP (port 5060) and SIPS (port 5061) is
similar to that between HTTP and HTTPS; SIPS uses URIs of the form
"sips:user@example.com". The media streams, which occur on different
connections to the signaling stream, can be encrypted with SRTP. The key
exchange for SRTP is performed with SDES (RFC 4568), or the newer and often
more user friendly ZRTP (RFC 6189), which can automatically upgrade RTP to SRTP
using dynamic key exchange (and a verification phrase). One can also add a
MIKEY (RFC 3830) exchange to SIP and in that way determine session keys for use
with SRTP.
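The SIP/SIPS distinction shows up directly when a client decides where to connect. The Python sketch below derives host, port and whether TLS is required from a URI, using the default ports mentioned above. The function name default_endpoint is an assumption for this example, and the parsing is deliberately simple: it does not handle IPv6 literals or DNS-based server selection.

```python
DEFAULT_PORTS = {"sip": 5060, "sips": 5061}


def default_endpoint(uri):
    """Return (host, port, use_tls) for a sip: or sips: URI (sketch)."""
    scheme, separator, rest = uri.partition(":")
    scheme = scheme.lower()
    if not separator or scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a SIP or SIPS URI: {uri!r}")
    # Drop the user part, then any URI parameters and headers.
    hostport = rest.split("@")[-1]
    for delimiter in (";", "?"):
        hostport = hostport.split(delimiter)[0]
    host, _, port = hostport.partition(":")
    port_number = int(port) if port else DEFAULT_PORTS[scheme]
    return host, port_number, scheme == "sips"


print(default_endpoint("sips:user@example.com"))
print(default_endpoint("sip:user@example.com:5080;transport=udp"))
```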
Comparing H.323 and SIP
- SIP is less complex than H.323
- SIP is better suited for the integration of presence, IM, and audio/video.
- H.323 ported the legacy world to the Internet, whereas SIP was designed for the Internet.
- H.323 has, in its latest versions, been introducing options that were part of SIP from the beginning.