Tuesday, June 21, 2016

SDN / NFV: Enemy of the state

Extracted from my SDN and NFV in wireless workshop.

I want to talk today about an interesting subject that I have seen popping up over the last six months or so, including in many presentations in the stream I chaired at the NFV World Congress a couple of months ago.

In NFV, and to a certain extent in SDN as well, service availability is achieved through a combination of function redundancy and fast failover routing whenever a failure is detected in the physical or virtual fabric. Availability is a generic term, though, and covers different expectations depending on whether you are a consumer, an operator or an enterprise. The telecom industry has heralded the mythical 99.999% or five nines availability as the target to reach for telecoms equipment vendors.

This goal has led to networks and appliances that are super redundant at the silicon, server, rack and geographical levels, with complex routing, load balancing and clustering capabilities to guarantee that element failures do not catastrophically impact services. In today's cloud networks, one arrives at the conclusion that a single cloud, even tweaked, can't perform beyond three nines availability and that you need a multi-cloud strategy to attain five nines of service availability...
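To put numbers on the gap between three and five nines, here is a minimal Python sketch of the arithmetic. It assumes that redundant clouds fail independently and that failover between them is instantaneous, which is optimistic (real clouds can share failure modes), so treat the combined figure as an upper bound. The function names are illustrative only.

```python
from typing import Iterable

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime for a given availability (0.999 = three nines)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(availabilities: Iterable[float]) -> float:
    """Availability of redundant deployments serving the same service.

    Assumption: failures are independent and failover is instantaneous,
    so the service is down only when every deployment is down at once.
    """
    unavailability = 1.0
    for a in availabilities:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability


if __name__ == "__main__":
    single_cloud = 0.999
    two_clouds = combined_availability([single_cloud, single_cloud])
    print(f"three nines: {downtime_minutes_per_year(0.999):.1f} min/year of downtime")
    print(f"five nines:  {downtime_minutes_per_year(0.99999):.2f} min/year of downtime")
    print(f"two independent three-nines clouds: {two_clouds:.6f}")
```

Three nines allows roughly 525.6 minutes (close to nine hours) of downtime a year, against a little over five minutes for five nines. Under the independence assumption, two three-nines clouds clear the five nines bar on paper, which is the intuition behind the multi-cloud argument.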

Consumers, over the last ten years, have proven increasingly ready to accept a service that might not always be of the best quality if the price point is low enough. We all remember the start of Skype, when we would complain of failed and dropped calls or voice distortions, but we all put up with it, mostly because it was free-ish. As the service quality improved, new features and subscription schemes were added, allowing for new revenues as consumers adopted new services.
One could think from that example that maybe it is time to relax the five nines edict for telecoms networks, but there are two data points that run counter to that assumption.


  1. The first and most prominent reason to keep a high level of availability is actually a regulatory mandate. Network operators operate not only a commercial network but also critical infrastructure for emergency and government services. It is easy to think that 95 or 99% availability is sufficient until you have to deliver 911 calls, where that percentage difference means loss of life.
  2. The second reason is more innate to network operators themselves. Year after year, polls show that network operators believe that the way they will outcompete each other and OTTs in the future is quality of service, where service availability is one of the first table stakes.


As I am writing this post, SDN and NFV in wireless have struggled their way from demonstrating basic load balancing and static traffic routing to function virtualization and auto scaling over the last few years. What is left to get to commercial grade (and telco grade) offerings is resolving the orchestration bit (I'll write another post on the battles in this segment) and creating a service that is both scalable and portable.

The portable bit is important, as a large part of the value proposition is to be able to place functions and services closer to the user or the edge of the network. To do that, an orchestration system has to be able to detect what needs to be consumed where and to place and chain relevant functions there.
Many vendors can demonstrate that part. The difficulty arises when it becomes necessary to scale a function in or down, or when there is a failure.

Physical and virtual function failures are to be expected. When they arise in today's systems, there is a loss of service, at least for the users that were using these functions. In some cases, the loss is transient and a new request / call will be routed to another element the second time around. In other cases, it is permanent and the session / service cannot continue until another one is started.

In the case of scaling in or down, most vendors today will starve the virtual function and route all new requests to other VMs until this function can be shut down without impact to live traffic. It is not the fastest or the most efficient way to manage traffic. You essentially lose all the elasticity benefits on the scale down if you have to manage these moribund zombie-VNFs until they are ready to die.
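The "starve then shut down" behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's implementation: `VnfInstance`, `Pool` and their methods are hypothetical names, sessions are plain strings, and new sessions go to the least loaded instance that is not draining.

```python
from dataclasses import dataclass, field


@dataclass
class VnfInstance:
    name: str
    draining: bool = False
    sessions: set = field(default_factory=set)


class Pool:
    """Toy pool of VNF instances with drain-before-shutdown scale-in."""

    def __init__(self, instances):
        self.instances = list(instances)

    def route_new_session(self, session_id):
        # New sessions only go to instances that are not being drained.
        candidates = [i for i in self.instances if not i.draining]
        if not candidates:
            raise RuntimeError("no instance is accepting new sessions")
        target = min(candidates, key=lambda i: len(i.sessions))
        target.sessions.add(session_id)
        return target

    def end_session(self, session_id):
        for inst in self.instances:
            inst.sessions.discard(session_id)

    def start_drain(self, name):
        # Mark the instance for scale-in; its live sessions stay pinned to it.
        for inst in self.instances:
            if inst.name == name:
                inst.draining = True

    def reap(self):
        """Remove draining instances with no live sessions; return their names."""
        remaining, reaped = [], []
        for inst in self.instances:
            if inst.draining and not inst.sessions:
                reaped.append(inst.name)
            else:
                remaining.append(inst)
        self.instances = remaining
        return reaped


if __name__ == "__main__":
    pool = Pool([VnfInstance("vnf-a"), VnfInstance("vnf-b")])
    pool.route_new_session("s1")   # lands on vnf-a
    pool.route_new_session("s2")   # lands on vnf-b
    pool.start_drain("vnf-a")
    print(pool.reap())             # vnf-a still holds s1, so nothing is reaped
    pool.end_session("s1")
    print(pool.reap())             # only now can vnf-a be shut down
```

The point of the sketch is the waiting: the draining instance keeps consuming resources for as long as its longest-lived session, because that session's state exists nowhere else.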

Vendors and operators who have been looking at these issues have come to a conclusion. Beyond the separation of control and data plane, it is necessary to further separate the state of each machine, function and service, and to centralize it, in order to achieve consistent availability and true elasticity and to manage disaster recovery scenarios.
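As a rough illustration of what separating state from the execution environment means, here is a minimal Python sketch. Everything in it is an assumption made for the example: `StateStore` is an in-memory stand-in for whatever centralized or replicated store a real system would use, and `StatelessWorker` is a hypothetical function instance that keeps nothing between requests.

```python
class StateStore:
    """In-memory stand-in for a centralized session state store (assumption)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so workers never share mutable state by accident.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)


class StatelessWorker:
    """Hypothetical function instance that holds no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, payload_bytes):
        # Read state, update it, write it back: one store round trip per request.
        state = self.store.load(session_id)
        state["bytes"] = state.get("bytes", 0) + payload_bytes
        state["packets"] = state.get("packets", 0) + 1
        state["last_worker"] = self.name
        self.store.save(session_id, state)
        return state


if __name__ == "__main__":
    store = StateStore()
    worker_a = StatelessWorker("worker-a", store)
    worker_b = StatelessWorker("worker-b", store)

    worker_a.handle("call-1", 100)
    worker_a.handle("call-1", 200)
    del worker_a  # simulate a failure or an immediate scale-in of worker-a

    # worker-b picks up the same session mid-flight, counters intact.
    print(worker_b.handle("call-1", 50))
```

Because the session lives in the store rather than in the worker, an instance can fail or be removed at once and any other instance can continue the session. The cost is visible in `handle`: every request now involves a read and a write against the store.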

In most cases, this is a complete redesign for vendors. Many of them have already struggled to port their product to software, then port it to a hypervisor, then optimize it for performance... Separating state from the execution environment is not going to be just another port. It is going to require redesigning and re-architecting.

The cloud-native vendors who have designed their platform with microservices and modularity in mind have a better chance, but there is still a series of challenges to be addressed. Namely, collecting state information from every call in every function, centralizing it and then redistributing it is going to create a lot of signalling traffic. Some vendors are advocating inline signalling capabilities to convey the state information in a tokenized fashion; others are looking at more sophisticated approaches, including state controllers that will collect, transfer and synchronize relevant controllers across clouds.
In any case, it looks like there is still quite a lot of work to be done in creating truly elastic and highly available virtualized, software-defined networks.

Monday, June 13, 2016

Time to get out of consumer market for MNOs?

I was delivering a workshop on SDN / NFV in wireless last week at a major pan-European tier one operator group, and the questions of encryption and net neutrality were again put on the table.

How much clever, elastic, agile software-defined traffic management can we really expect when "best effort" dictates the extent of traffic management and encryption renders many efforts to just understand traffic composition and velocity difficult?

There is no easy answer. I have spoken at length on both subjects (here and here, for instance) and the challenges have not changed much. Encryption is still a large part of traffic and, although it is not growing as fast as initially anticipated after Google, Netflix, Snapchat or Facebook's announcements, it is still a dominant part of data traffic. Many start to think that HTTPS / SSL is a first world solution, as many small and medium scale content or service providers that live on freemium or ad-sponsored models can't afford the additional cost and latency unless they are forced to. Some think that encryption levels will hover around 50-60% of the total until mass adoption of HTTP/2, which could take 5+ years. We have seen, with T-Mobile's Binge On, a first service launch that actively manages traffic, even encrypted, to an agreed-upon quality level. The net neutrality activists cried foul at the launch of the service, but quickly retreated when they saw its popularity and the first tangible signs of collaboration between content providers, aggregators and operators for customers' benefit.

As mentioned in the past, the problem is not technical, moral or academic. Encryption and net neutrality are just symptoms of an evolving value chain where the players are attempting to position themselves for dominance. The solution will be commercial and will involve collaboration in the form of content metadata exchange, to monitor, control and manage traffic. Mobile Edge Computing can be a good enabler in this. Mobile advertising, which is still missing over $20b in investment in the US alone when compared to other media and to time spent / eyeball engagement, will likely be part of the equation as well.

...but what happens in the meantime, until the value chain realigns? We have seen consumer postpaid ARPU declining in most mature markets for the last few years, while we have seen engagement and usage of so-called OTT services explode. Many operators continue to keep their heads in the sand, thinking of "business as usual" while timidly investigating new potential "revenue streams".

I think that the time has come for many to wake up and take hard decisions. In many cases, operators are not equipped organizationally or culturally for the transition that is necessary to flourish in a fluid environment where consumers flock to services that are free, freemium or ad-sponsored. What operators know best, subscription services, is seeing its prices come under intense pressure because OTTs are looking at usage and penetration at global levels, rather than per country. For those operators who understand the situation and are changing their ways, the road is still long and has many obstacles, particularly on the regulatory front, where they are not playing by the same rules as their OTT competition.

I suggest here that for many operators, it is time to get out. You had a good run and made lots of money on consumer services through 2G, 3G and early 4G, but the next dollars or euros are going to be tremendously more expensive to get than the earlier ones.
At this point, I think there are emerging and underdeveloped verticals (such as enterprise and IoT) that are easier to penetrate (fewer regulatory barriers, more need for managed network capabilities and, at least in the case of enterprise, more investment possibilities).
I think that at this stage, any operator who derives most of its revenue from consumer services should assume that these will likely dwindle to nothing unless drastic operational, organizational and cultural changes occur.
Some operators see the writing on the wall and have started the effort. There is no guarantee that it will work, but having a software-defined, virtualized, elastic network will certainly help if they are betting the farm on service agility. Others are looking at new technologies, open source and standards as they have done in the past. Aligning little boxes from industry vendors in neat PowerPoint roadmap presentations, hiring a head of network transformation or virtualization... for them, the reality, I am afraid, will come hard and fast. You don't invest in technologies to build services. You build services first and then look at whether you need more or new technologies to enable them.