Monday, January 20, 2014

All packets are not created equal: why DPI and policy vendors look at video encoding

As we are still contemplating the impact of last week's US ruling on net neutrality, I thought I would attempt today to settle a question I often get in my workshops: why is DPI insufficient when it comes to video policy enforcement?

Deep packet inspection (DPI) platforms have evolved from static rules-based filtering engines into sophisticated enforcement points that support packet and protocol classification, prioritization and shaping. Ubiquitous in enterprise and telco networks, they are the jack-of-all-trades of traffic management, enabling use cases as diverse as policy enforcement, adult content filtering, lawful interception, QoS management and peer-to-peer throttling or interdiction.
DPIs rely first on a robust classification engine. It snoops through data traffic and classifies each packet based on port, protocol, interface, origin, destination, etc. The more sophisticated engines go beyond layers 3 and 4 and are able to recognize classes of traffic from application-layer headers. This classification engine is sufficient for inspecting most traffic types, from web browsing to email, from VoIP to video conferencing or peer-to-peer sharing.
The premise here is that if you can recognize, classify and tag traffic accurately, then you can apply rules governing its delivery, ranging from interdiction to authorization, with many variants of shaping in between.
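
As a rough illustration of this classification step, here is a minimal Python sketch assuming a simplified packet record. The `Packet` fields, the port table and the `classify` function are hypothetical, not any vendor's engine: traffic is classified on protocol and port, then plain-HTTP traffic is refined with the Host header.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of what a classifier sees for one packet."""
    protocol: str                    # "tcp" or "udp"
    dst_port: int
    http_host: Optional[str] = None  # set only when an HTTP header was parsed

# Illustrative signatures only; production engines use far richer rule sets.
PORT_CLASSES = {
    ("tcp", 25): "email",
    ("tcp", 554): "rtsp-streaming",
    ("udp", 5060): "voip-signalling",
    ("tcp", 80): "web",
    ("tcp", 443): "tls",  # encrypted: HTTP headers are not visible
}

def classify(pkt: Packet) -> str:
    """Classify by (protocol, port), then refine plain HTTP using the Host header."""
    cls = PORT_CLASSES.get((pkt.protocol, pkt.dst_port), "unknown")
    if cls == "web" and pkt.http_host:
        # Header-based refinement tags traffic by origin, e.g. "web:www.youtube.com".
        return f"web:{pkt.http_host.lower()}"
    return cls
```

In this scheme an RTSP stream is recognized from its port alone, while a YouTube page view and a YouTube video both come out as `web:www.youtube.com`: origin alone does not say what the traffic is.
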
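
Shaping is commonly implemented with a token bucket. The sketch below pairs a hypothetical class-to-action policy table with a textbook token-bucket shaper; the class names, the 2 Mbit/s rate and the 100 ms burst allowance are illustrative assumptions, not values from any product.

```python
from typing import Dict, Tuple, Union

class TokenBucket:
    """Textbook token bucket: sustains `rate_bps` and allows bursts of up to `burst_bits`."""

    def __init__(self, rate_bps: float, burst_bits: float, now: float = 0.0):
        self.rate = rate_bps
        self.capacity = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last = now           # time of the last refill, in seconds

    def allow(self, packet_bits: int, now: float) -> bool:
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bits:
            self.tokens -= packet_bits
            return True   # conforming: forward now
        return False      # non-conforming: queue it (shaping) or drop it (policing)

# Hypothetical policy table: traffic class -> "block", "allow" or ("shape", rate in bit/s).
Action = Union[str, Tuple[str, int]]
POLICY: Dict[str, Action] = {
    "p2p": "block",
    "email": "allow",
    "web": ("shape", 2_000_000),
}

def enforce(traffic_class: str, packet_bits: int, now: float,
            buckets: Dict[str, TokenBucket]) -> bool:
    """Return True if the packet may be forwarded now under POLICY."""
    action = POLICY.get(traffic_class, "allow")  # unlisted classes pass through
    if action == "block":
        return False
    if action == "allow":
        return True
    _, rate = action
    if traffic_class not in buckets:
        # One bucket per class, created lazily with a 100 ms burst allowance.
        buckets[traffic_class] = TokenBucket(rate, rate / 10, now)
    return buckets[traffic_class].allow(packet_bits, now)
```

The whole policy reduces to a single number per class, which is exactly why the accuracy of the classification matters so much.
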

DPI falls short in many cases when it comes to video streaming. Until 2008 or so, most video streaming relied on specialized protocols such as RTSP. Classification was easy, as the videos were all encapsulated in a specific protocol, allowing rules to be instantiated and enforced in a pretty straightforward manner. The emergence and predominance of HTTP-based streaming video (progressive download, adaptive streaming and variants) has complicated the task for DPIs. The transport protocol is the same as for general web traffic, but the behaviour is quite different. As we have seen many times in this blog, video traffic must be measured differently from generic data traffic if policy enforcement is to be implemented. All packets are not created equal.


  • The first challenge is to recognize that a packet is video. DPIs generally infer the nature of an HTTP packet from its origin or destination. For instance, if they see that the traffic originates from YouTube, they assume that it is video. This is insufficient: not all YouTube traffic is video streaming (browsing between pages, reading or posting comments, uploading a video, liking or disliking...). Applying video rules to browsing traffic, or vice versa, can have adverse consequences on the user experience.
  • The second challenge is policy enforcement. The main tool in the DPI arsenal for traffic shaping is setting the delivery bit rate for a specific class of traffic. As we have seen, videos come in many definitions (4K, HD, SD, QCIF...), many containers and many formats, resulting in a wide range of encoding bit rates. If you want to shape your video traffic, it is crucial that you know all these elements, including the encoding bit rate, because if traffic is throttled below the encoding rate, the video stalls and buffers or times out. A one-size-fits-all policy for video is not reasonable (unless the policy is to forbid usage). To extract the video-specific attributes of a session, you need to decode it, which requires in-line transcoding capabilities, even if you do not intend to modify the video.
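
To see why throttling below the encoding rate is fatal, consider a simplified model (an assumption: constant encoding rate e, constant shaped delivery rate r < e, and B0 seconds of video already buffered). Each second of wall-clock time adds r/e seconds of video and plays one, so the buffer drains at 1 - r/e per second and playback stalls after t = B0 / (1 - r/e). With 10 s buffered, a 2 Mbit/s stream shaped to 1.5 Mbit/s stalls after 40 s. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
import math

def seconds_until_stall(buffer_s: float, delivery_bps: float, encoding_bps: float) -> float:
    """Seconds of playback before the buffer empties, under a constant-rate model.

    Assumes constant encoding and shaped delivery rates and that playback has
    already started with `buffer_s` seconds of video in the buffer.
    """
    if delivery_bps >= encoding_bps:
        return math.inf  # delivery keeps up: the buffer never drains
    # Each wall-clock second adds delivery/encoding seconds of video and plays 1 second.
    drain_per_second = 1.0 - delivery_bps / encoding_bps
    return buffer_s / drain_per_second
```

With the same illustrative numbers, a blanket 1.5 Mbit/s cap never stalls a 1 Mbit/s SD stream, but a 5 Mbit/s HD stream with 10 s buffered would stall after roughly 14 s (10 / 0.7).
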


Herein lies the difficulty. To implement intelligent, sophisticated traffic management rules today, you need to be able to handle video. To handle video, you need to recognize it (not infer or assume it) and measure it. To recognize and measure it, you need to decode it. This is one of the reasons why Allot bought Ortiva Wireless in 2012, Procera partnered with Skyfire, and ByteMobile more recently upgraded its video inspection to full-fledged DPI. We will see more generic traffic management vendors (PCRF, PCEF, DPI...) partner with and acquire video transcoding companies.

3 comments:

Don Bowman said...

I don't think this is technically accurate.
Identifying HTTP video is normally done by looking at the MIME type.
Specific video providers like YouTube can normally be found from the Host: field in the HTTP header.

You don't need to transcode video to understand the target resolution and bitrate: these are normally encoded in the metadata.
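
The header-based approach described in this comment can be sketched as follows. This is a minimal illustration assuming the request and response headers have already been parsed into dictionaries with lower-case keys; the function name and the returned labels are hypothetical. A missing or generic Content-Type is flagged for payload inspection rather than guessed.

```python
from typing import Mapping

# Manifest MIME types for HLS playlists (two common spellings) and MPEG-DASH MPDs.
MANIFEST_TYPES = {
    "application/vnd.apple.mpegurl",
    "application/x-mpegurl",
    "application/dash+xml",
}

def classify_http_response(request_headers: Mapping[str, str],
                           response_headers: Mapping[str, str]) -> str:
    """Label a response as video/manifest/other/unknown, tagged with its Host.

    Assumes header names were already lower-cased by the parser.
    """
    # Drop any ":port" suffix (simplified: hostnames and IPv4 only, not IPv6 literals).
    host = request_headers.get("host", "").split(":")[0].lower()
    ctype = response_headers.get("content-type", "").split(";")[0].strip().lower()
    if ctype.startswith("video/"):
        kind = "video"
    elif ctype in MANIFEST_TYPES:
        kind = "manifest"
    elif ctype in ("", "application/octet-stream"):
        kind = "unknown"  # missing or generic MIME type: the payload must be inspected
    else:
        kind = "other"
    return f"{kind}@{host}" if host else kind
```
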

Patrick Lopez said...

Thanks for your comments, Don.
You raise many good points.

While looking at the HTTP header can provide many indications about the content, such as the MIME type, I think you would agree that in many cases the MIME type is either inaccurate or missing. The MIME type describes the container but not the format of the video. Format is an important data point for codec management.
Additionally, recognizing flavours of HTTP-based delivery becomes somewhat difficult if one cannot discriminate download from adaptive streaming and its variants.

Target resolution and bit rate might be encoded in the metadata, depending on which metadata are accessible. In many cases, these metadata are encapsulated in a manifest file, for fragmented MP4 for instance. Being able to recognize and read that manifest is crucial to understanding and implementing video rules. Not every policy enforcement / DPI engine is able to detect and read video manifests or to extract relevant metadata from a video stream.
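
For fragmented MP4 delivered over MPEG-DASH, the manifest is an XML MPD in which each Representation advertises its bandwidth and, for video, typically its width, height and codec. Below is a minimal sketch of reading those attributes with Python's standard ElementTree, assuming the MPD has already been captured from the HTTP stream. The function names are hypothetical, and real MPDs have more inheritance and templating rules than are handled here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def _attr(rep: ET.Element, aset: ET.Element, name: str) -> Optional[str]:
    # Common attributes may sit on the Representation or be inherited from its AdaptationSet.
    return rep.get(name, aset.get(name))

def dash_representations(mpd_xml: str) -> List[Dict[str, object]]:
    """List the bit rate, resolution and codec advertised by each Representation."""
    root = ET.fromstring(mpd_xml)
    reps = []
    for aset in root.iterfind(".//mpd:AdaptationSet", NS):
        for rep in aset.iterfind("mpd:Representation", NS):
            width, height = _attr(rep, aset, "width"), _attr(rep, aset, "height")
            reps.append({
                "id": rep.get("id"),
                "mime_type": _attr(rep, aset, "mimeType"),
                "codecs": _attr(rep, aset, "codecs"),
                "bandwidth_bps": int(rep.get("bandwidth", "0")),
                "width": int(width) if width else None,
                "height": int(height) if height else None,
            })
    return reps
```

Knowing each Representation's bandwidth is what would let a shaping rule stay above the encoding rate of the rendition actually being requested.
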

Discussion to be continued at MWC...

Carlos Bazzarella said...

Recognizing an HTTP packet as video is trivial: you just need to look at the first 16 bytes of the file (container). The MIME type is sometimes wrong.
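
Container sniffing of this kind can be sketched as below. The signatures are well-known magic numbers: ISO BMFF box types (`ftyp`, `styp`, `moof`) at offset 4, the `FLV` header, the EBML header used by WebM/Matroska, and the 0x47 MPEG-TS sync byte repeated every 188 bytes. The function and its labels are hypothetical. Note that MPEG-TS cannot be confirmed reliably from only 16 bytes, so this sketch asks for more in that case.

```python
def sniff_container(head: bytes) -> str:
    """Guess the container from the first bytes of an HTTP response body (a heuristic)."""
    box_type = head[4:8]
    if box_type == b"ftyp":
        return "mp4"           # ISO BMFF file: 32-bit box size, then 'ftyp'
    if box_type in (b"styp", b"moof"):
        return "fmp4-segment"  # fragmented MP4 media segment (e.g. DASH)
    if head[:3] == b"FLV":
        return "flv"
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "webm/mkv"      # EBML header shared by WebM and Matroska
    # MPEG-TS packets are 188 bytes and start with sync byte 0x47 ('G'). A single
    # 0x47 is too weak a signal (GIF files also start with 'G'), so require a second
    # sync byte one packet later, which needs more than 16 bytes of payload.
    if len(head) >= 189 and head[0] == 0x47 and head[188] == 0x47:
        return "mpeg-ts"
    return "unknown"
```
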

Also, extracting video metadata from the container does not require an in-line transcoder; that would be overkill. All you need are incremental decoders (one per container type) that can zoom in on the required metadata while skipping most of the bytes in the file/container.
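
For MP4, an incremental parser of this kind can be sketched by walking the ISO BMFF box headers (a 32-bit size plus a four-character type, with a 64-bit size following when the size field is 1) and seeking past every box it does not need, such as the bulky `mdat`. These hypothetical functions read only the movie timescale and duration from `moov/mvhd`; a real inspector would also descend into the `trak` boxes for resolution and codec.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional, Tuple

def iter_boxes(f: BinaryIO, end: int) -> Iterator[Tuple[bytes, int]]:
    """Yield (box_type, box_end) for each ISO BMFF box from f.tell() up to `end`.

    After each yield the stream is positioned at the box payload; when the caller
    resumes iteration, the generator seeks past the rest of the box unread.
    """
    while f.tell() + 8 <= end:
        start = f.tell()
        size, box_type = struct.unpack(">I4s", f.read(8))
        if size == 1:    # a 64-bit 'largesize' follows the type
            size = struct.unpack(">Q", f.read(8))[0]
        elif size == 0:  # the box runs to the end of the enclosing space
            size = end - start
        yield box_type, start + size
        f.seek(start + size)  # skip the payload (e.g. 'mdat') without reading it

def movie_duration(f: BinaryIO, file_size: int) -> Optional[Tuple[int, int]]:
    """Return (timescale, duration) from moov/mvhd, or None if absent."""
    for box_type, box_end in iter_boxes(f, file_size):
        if box_type != b"moov":
            continue
        for inner_type, _ in iter_boxes(f, box_end):
            if inner_type == b"mvhd":
                version = f.read(1)[0]
                f.read(3)  # flags
                if version == 1:  # 64-bit creation/modification times and duration
                    _, _, timescale, duration = struct.unpack(">QQIQ", f.read(28))
                else:             # version 0: 32-bit fields
                    _, _, timescale, duration = struct.unpack(">IIII", f.read(16))
                return timescale, duration
    return None

def movie_duration_from_bytes(data: bytes) -> Optional[float]:
    """Convenience wrapper: duration in seconds from an in-memory MP4 buffer."""
    result = movie_duration(io.BytesIO(data), len(data))
    if result is None or result[0] == 0:
        return None
    return result[1] / result[0]
```

The media samples themselves are never read: the cost is a few header reads and seeks per box, which is the point of an incremental decoder compared with a full decode.
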