
Tuesday, June 21, 2016

SDN / NFV: Enemy of the state

Extracted from my SDN and NFV in wireless workshop.

I want to talk today about an interesting subject that has been popping up over the last six months or so, including in many presentations in the stream I chaired at the NFV World Congress a couple of months ago.

In NFV, and to a certain extent in SDN as well, service availability is achieved through a combination of function redundancy and fast failover routing whenever a failure is detected in the physical or virtual fabric. Availability is a generic term, though, and covers different expectations depending on whether you are a consumer, an operator or an enterprise. The telecom industry has heralded the mythical 99.999%, or five nines, availability as the target for telecoms equipment vendors.

This goal has led to networks and appliances that are super redundant at the silicon, server, rack and geographical levels, with complex routing, load balancing and clustering capabilities to guarantee that element failures do not catastrophically impact services. In today's cloud networks, one arrives at the conclusion that a single cloud, even tweaked, can't perform beyond three nines of availability, and that you need a multi-cloud strategy to attain five nines of service availability...
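As a rough back-of-the-envelope illustration (my own arithmetic, assuming independent failures across clouds, not a figure from any vendor), here is why running the same service active-active across two three-nines clouds can exceed five nines:

```c
/* Availability sketch: two independent clouds, each at three nines,
 * running the same service active-active. The service is only down
 * when both clouds are down at the same time (independence assumed). */
#include <stdio.h>

int main(void)
{
    double single_cloud = 0.999;                                /* three nines */
    double both_down    = (1.0 - single_cloud) * (1.0 - single_cloud);
    double combined     = 1.0 - both_down;                      /* ~0.999999 */

    printf("single cloud : %.6f\n", single_cloud);
    printf("two clouds   : %.6f\n", combined);                  /* beyond five nines */
    return 0;
}
```

The catch, of course, is that cloud failures are rarely fully independent, which is why the orchestration and state questions discussed below matter so much.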

Consumers, over the last ten years, have proven increasingly ready to accept a service that might not always be of the best quality if the price point is low enough. We all remember the early days of Skype, when we would complain about failed and dropped calls or voice distortions, but we all put up with it, mostly because it was free-ish. As the service quality improved, new features and subscription schemes were added, allowing for new revenues as consumers adopted new services.
One could conclude from that example that maybe it is time to relax the five nines edict for telecoms networks, but there are two data points that run counter to that assumption.


  1. The first and most prominent reason to keep a high level of availability is actually a regulatory mandate. Network operators operate not only a commercial network but also critical infrastructure for emergency and government services. It is easy to think that 95 or 99% availability is sufficient until you have to deliver 911 calls, where that percentage difference means loss of life.
  2. The second reason is more innate to network operators themselves. Year after year, polls show that network operators believe that the way they will outcompete each other and OTTs in the future is quality of service, where service availability is one of the first table stakes. 


As I am writing this blog, SDN and NFV in wireless have progressed over the last few years from demonstrating basic load balancing and static traffic routing to function virtualization and auto-scaling. What is left to get to commercial-grade (and telco-grade) offerings is resolving the orchestration bit (I'll write another post on the battles in this segment) and creating a service that is both scalable and portable.

The portable bit is important, as a large part of the value proposition is the ability to place functions and services closer to the user or the edge of the network. To do that, an orchestration system has to be able to detect what needs to be consumed where, and to place and chain the relevant functions there.
Many vendors can demonstrate that part. The difficulty arises when it becomes necessary to scale a function in or down, or when there is a failure.

Physical and virtual function failures are to be expected. When they arise in today's systems, there is a loss of service, at least for the users who were using these functions. In some cases, the loss is transient and a new request / call will be routed to another element the second time around; in other cases, it is permanent and the session / service cannot continue until another one is started.

In the case of scaling in or down, most vendors today will starve the virtual function and route all new requests to other VMs until the function can be shut down without impact to live traffic. It is neither the fastest nor the most efficient way to manage traffic. You essentially lose all the elasticity benefits on the scale-down if you have to manage these moribund zombie VNFs until they are ready to die.
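To make this concrete, here is a minimal sketch of that starve-then-terminate logic; the helper functions are hypothetical placeholders for whatever the VNFM / EMS actually exposes, not a real API:

```c
/* Sketch of the "starve then terminate" scale-in described above. */
#include <unistd.h>

void     stop_accepting_new_sessions(void); /* assumption: tell the load balancer to drain this VNF */
unsigned active_session_count(void);        /* assumption: sessions still pinned to this instance */
void     terminate_vm(void);                /* assumption: orchestrator call releasing the VM */

void scale_in(void)
{
    stop_accepting_new_sessions();

    /* The zombie VNF lingers until its last live session ends,
     * which is why this form of scale-in is slow and inefficient. */
    while (active_session_count() > 0)
        sleep(5);

    terminate_vm();
}
```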

Vendors and operators who have been looking at these issues have come to a conclusion. Beyond the separation of control and data plane, it is necessary to further separate the state of each machine, function and service, and to centralize it, in order to achieve consistent availability, true elasticity and manageable disaster recovery scenarios.

In most cases, this is a complete redesign for vendors. Many of them have already struggled to port their product to software, then port it to a hypervisor, then optimize it for performance... separating state from the execution environment is not going to be just another port. It is going to require redesign and re-architecting.

The cloud-native vendors who have designed their platforms with microservices and modularity in mind have a better chance, but there is still a series of challenges to be addressed. Namely, collecting state information from every call in every function, centralizing it and then redistributing it is going to create a lot of signalling traffic. Some vendors are advocating inline signalling capabilities to convey the state information in a tokenized fashion; others are looking at more sophisticated approaches, including state controllers that will collect, transfer and synchronize the relevant state across clouds.
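As a minimal sketch of that idea, assuming a hypothetical key-value client (kv_put / kv_get) in front of a shared state store or state controller, a VNF would checkpoint its session context externally so any peer instance can pick it up after a failure or a scale-in event:

```c
/* Sketch only: kv_put / kv_get are hypothetical stand-ins for whatever
 * shared state store or state controller is actually used. */
#include <stdint.h>
#include <stdio.h>

int kv_put(const char *key, const void *val, size_t len);   /* assumption */
int kv_get(const char *key, void *val, size_t len);         /* assumption */

/* Per-session context that would traditionally live only in the VM's memory. */
struct session_state {
    uint64_t session_id;
    uint64_t bytes_forwarded;
    uint32_t last_seq;
};

/* Checkpoint after each processed request so another instance can take over. */
static int checkpoint_session(const struct session_state *s)
{
    char key[64];
    snprintf(key, sizeof(key), "session:%llu", (unsigned long long)s->session_id);
    return kv_put(key, s, sizeof(*s));
}

/* Restore on a peer instance, e.g. after the original VM fails or is drained. */
static int restore_session(uint64_t session_id, struct session_state *s)
{
    char key[64];
    snprintf(key, sizeof(key), "session:%llu", (unsigned long long)session_id);
    return kv_get(key, s, sizeof(*s));
}
```

The hard part, as noted above, is that checkpointing every call in every function generates a lot of signalling traffic, which is exactly what the tokenized inline signalling and state controller approaches try to mitigate.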
In any case, it looks like there is still quite a lot of work to be done in creating truly elastic and highly available virtualized, software-defined networks.

Monday, October 19, 2015

SDN world 2015: unikernels, compromises and orchestrated obsolescence

Last week's Layer123 SDN and OpenFlow World Congress brought its usual slew of announcements and claims.

From my perspective, I came away from the show with mixed impressions. 

On one hand, it is clear that SDN has now transitioned from proof of concept to commercial trial, if not full commercial deployment, and operators increasingly understand the limits of open source initiatives such as OpenStack for carrier-grade deployments. The telling sign is the increasing number of companies specializing in high-performance, hardware-based switches for OpenFlow or other protocols.

It feels that Open vSwitch has not yet hit its stride, notably in terms of performance, and operators are left with a choice: go open source, which is cost-efficient but neither scalable nor performant, or compromise with best-of-breed, hardware-based, hardened switches that offer high performance and scalability but not yet the agility of a software-based implementation. What is new, however, is that operators seem ready to compromise for time to market rather than wait for a possibly more open solution that may or may not deliver on its promises.

On the NFV front, I feel that many vendors have been forced to tone down their silly claims in terms of performance, agility and elasticity. It is quite clear that many of them have been called to prove themselves in operators' labs and have failed to deliver. In many cases, vendors are able to demonstrate agility through VM porting / positioning, using either their VNFM or an orchestrator integration; they are even, in some cases, able to show some level of elasticity with auto-scaling powered by their own EMS; and many have put out press releases with Gbps or Tbps or millions of simultaneous sessions of capacity...
... but few are able to demonstrate all three at the same time, since their performance achievements have, in many cases, relied on SR-IOV to bypass the hypervisor layer, which ties the VM to the underlying hardware in a manner that makes agility and elasticity extremely difficult to achieve.
Operators, here again, seem bound to compromise between performance and agility if they want to accelerate their time to market.

Operators themselves came in droves to show their progress on the subject, but I felt a distinct change in tone regarding their capacity to actually get vendors to deliver on the promises of the successive NFV white papers. One issue lies squarely with the operators' own attitude. Many MNOs display unrealistic and naive expectations. They say that they are investing in NFV as a means to attain vendor independence, but they are unwilling to perform any integration themselves. It is very unlikely that large Telecom Equipment Manufacturers will willingly help deconstruct their value proposition by offering commoditized, plug-and-play, open-interfaced virtualized functions.

SDN and NFV integration is still dirty work. Nothing really performs at line rate without optimization, and no agility, flexibility or scalability is really attained without fine-tuned integration. Operators won't realize the benefits of the technology if they don't get in on the integration work themselves.

Lastly, what is still missing from my perspective is a service creation strategy that would make use of a virtualized network. Most network operators still mention service agility and time to market as key drivers, but when asked what they would launch if their network were fully virtualized and elastic today, they quote disappointing early examples such as virtual (!?) VPN, security or broadband on demand... timid translations of existing "services" into a virtualized world. I am not sure most of the MNOs realize their competition is not each other but Google, Netflix, Uber, Facebook and others...
By the time those players launch free and unlimited voice, data and messaging services underpinned by advertising or sponsored models, it will be quite late to start thinking of new services, even if the network is fully virtualized. It feels like MNOs are orchestrating their own obsolescence.

Finally, the latest buzzwords you must have in your presentation this quarter are:
the pets and cattle analogy, 
SD-WAN,
5G

...and if you haven't yet formulated a strategy with respect to containers (Docker, etc.), don't bother, they're dead and the next big thing is unikernels. This and more in my latest report and workshop on "SDN NFV in wireless networks 2015 / 2016".

Tuesday, May 5, 2015

NFV world congress: thoughts on OPNFV and MANO

I am this week in sunny San Jose, California, at the NFV World Congress, where I will chair the stream on Policy and Orchestration - NFV Management on Thursday.
My latest views on SDN / NFV implementation in wireless networks are published here.

The show started today with a mini-summit on OPNFV, looking at the organization's mission, roadmap and contribution to date.

The workshop was well attended, with over 250 seats occupied and a good number of people standing in the back. On the purpose of OPNFV, it feels that the organization is still trying to find its footing, hesitating between being a transmission belt between ETSI NFV and open source implementation projects, and graduating to a prescriptive set of blueprints for NFV implementations in wireless networks.

If you have trouble following, you are not the only one. I am quite confused myself. I thought OpenStack had a mandate to create source code for managing cloud network infrastructure, and that ETSI NFV was looking at managing services in a virtualized fashion, which could sit on premises, in clouds or in hybrid environments. While ETSI NFV does not produce code, why do we need OPNFV for that?

Admittedly, the organization is not necessarily deterministic in its roadmap, but rather works on what its members feel is needed. As a result, it has decided that its first release, code-named ARNO, will support KVM as the hypervisor environment and will feature an OpenStack architecture underpinned by an OpenDaylight-based SDN controller. ARNO should be released "this spring" and is limited in scope, as a first attempt to provide an example of carrier-grade, ETSI NFV-based source code for managing an SDN infrastructure. Right now, ARNO is focused on the VIM (Virtualised Infrastructure Manager); since full MANO is not yet standardized and is felt to be too big a chunk to tackle in a first release, it will be part of a later requirement phase. The organization is advocating pushing requirements and bug resolution upstream (read: to other open source communities) to make the whole SDN / NFV stack more "carrier-grade".

This is where, in my mind, the reasoning breaks down. There is a contradiction in terms and intent here. On one hand, OPNFV advocates that there should not be separate branches within implementation projects such as OpenStack for carrier-specific requirements, carrier-grade being the generic shorthand for high availability, scalability and high performance. The rationale is that these improvements could benefit the whole OpenStack ecosystem. On the other hand, OPNFV seems to have been created primarily to implement and test NFV-based code for carrier environments. Why do we need OPNFV at all if we can push these requirements within OpenStack and ETSI NFV? The organization feels more like an attempt to supplement or even replace ETSI NFV with an open source collaborative project that would be out of ETSI's hands.

More importantly, if you have been to an OpenStack meeting, you know that you are probably twice as likely to meet people from the banking, insurance, media or automotive industries as from the telecommunications space. I have no doubt that, theoretically, everyone would like more availability, scalability and performance, but practically, the specific needs of each enterprise segment rarely mean they are willing to pay for over-engineered networks. Telco carrier-grade was born from regulatory pressure to provide a public infrastructure service; many enterprises wouldn't know what to do with the complications and constraints arising from it.

As a result, I personally have doubts about the capacity of telcos and forums such as OPNFV to influence larger groups such as OpenStack to deliver a "carrier-grade" architecture and implementation. I think that telco operators and vendors are a little confused by open source. They essentially treat it as a standard, submitting change requests, requirements and gap analyses, while not enough is done (by the operator community at least) to actually get their hands dirty and code. The examples of AT&T, Telefonica, Telecom Italia and some others are not, in my mind, reflective of the industry at large.

If ETSI were more effective, service orchestration in MANO would be the first agenda item, and plumbing such as the VIM would be delegated to more advanced groups such as OpenStack. If a network is to become truly elastic, programmable, self-reliant and agile in a multi-vendor environment, then MANO is the brain, and it has to be defined and implemented by the operators themselves. Otherwise, we will see Huawei, Nokialcatelucent, Ericsson, HP and others effectively become the app store of the networks (last I checked, it did not work very well for operators when Apple and Android took control of that value chain...). Vendors have no real incentive to make orchestration open and to fulfill the vendor-agnostic vision of NFV.


Thursday, January 22, 2015

The future is cloudy: NFV 2020

As the first phase of ETSI ISG NFV wraps up and phase 1's documents are being released, it is a good time to take stock of the progress to date and what lies ahead.

ETSI members have set an ambitious agenda to create a function and service virtualization strategy for broadband networks, aiming at reducing hardware and vendor dependency while creating an organic, automated, programmable network.

The first set of documents approved and published represents great progress and possibly one of the fastest achievements for a new standard to be rolled out: only two years. It also highlights how much work is still necessary to make the vision a reality.

Vendor announcements are everywhere: "NFV is a reality, it is happening, it works, you can deploy it in your networks today...". I have no doubt Mobile World Congress will see several "world's first commercial deployment of [insert your vLegacyProduct here]..." claims. The reality is a little more nuanced.

Network Functions Virtualization, as a standard, does not today allow commercial deployment out of the box. There are too many ill-defined interfaces, competing protocols and missing APIs to make it plug and play. The only viable deployment scenarios today are single-vendor or tightly integrated (proprietary) dual-vendor strategies for siloed services / functions. From relatively simple (Customer Premises Equipment) to very complex (Evolved Packet Core), it will be possible to see commercial deployments in 2015, but they will not be able to illustrate all the benefits of NFV.

As I mentioned before, orchestration, integration with SDN, performance, security, testing, governance... are some of the challenges that remain today for viable commercial deployment of NFV in wireless networks. These are only the technological challenges; the operational challenge of evolving and training operators' workforces is probably the largest of all.

From my many interactions and interviews with network operators, it is clear that there are several different strategies at play.

  1. The first strategy is to roll out a virtualized function / service with one vendor, after having tested, integrated and trialed it. It is a strategy that we are seeing a lot in Japan or Korea, for instance. It provides a pragmatic learning process towards implementing virtualized functions in commercial networks, recognizing that standards and vendor implementations will not be fully interoperable for a few years.
  2. The second strategy is to stimulate the industry through standards and forum participation, proofs of concept, and even homegrown development. This strategy is more time- and resource-intensive but leads to the creation of an ecosystem. No big bang, but an evolutionary, organic roadmap that picks and chooses which vendors, network elements and services are ready for trial, PoC, limited or commercial deployment. The likes of Telefonica and Deutsche Telekom are good examples of this approach.
  3. The third strategy is to define very specifically the functions that should be virtualized, along with their deployment, management and maintenance model, and to select a few vendors to enact this vision. AT&T is a good illustration here. The advantage is probably a tailored experience that meets their specific needs in a timely fashion, ahead of standards completion; the drawback is reduced flexibility, as vendors are not interchangeable and integration is somewhat proprietary.
  4. The last strategy is not a strategy; it is more of a wait-and-see approach. Many operators do not have the resources or the budget to lead or manage this complex network and business transformation. They are observing the progress and placing bets in terms of what can be deployed when.
As it stands, I will continue monitoring and chairing many of the SDN / NFV shows this year. My report on SDN / NFV in wireless networks is changing fast, as the industry is, so look out for updates throughout 2015.

Wednesday, January 14, 2015

2014 review and 2015 predictions

Last year, around this time, I made some predictions for 2014. Let's have a look at how I fared, and I'll risk some opinions for 2015.
Before predictions, though: new year, new website. Check it out at coreanalysis.ca

Content providers, creators, aggregators:

"OTT video content providers are reaching a stage of maturity where content creation / acquisition was the key in the first phase, followed by subscriber acquisition. As they reach critical mass, the game will change and they will need to simultaneously maximize monetization options by segmenting their user base into new price plans and find a way to unlock value in the mobile market." 
On that front, content creation / acquisition remains a key focus of large video OTTs (see Netflix's launch of Marco Polo for $90m). Netflix has reported $8.9B of content obligations as of September 2014. On the monetization front, we have also seen signs of maturity, with YouTube experimenting with new premium channels and Netflix charging a premium for 4K streaming. HBO has started to break out of its payTV shell and has signed deals to be delivered as online, broadband-only subscriptions, without cable/satellite.
Netflix has signed a variety of deals with European MSOs and broadband operators as it launched there in 2014.
While many OTTs, particularly social networks and radio / audio streaming services, have collaborated and signed deals with mobile network operators, we are also seeing a tendency to increasingly encrypt and obfuscate online services to avoid network operators meddling in content delivery.
Both trends will likely accelerate in 2015, with more deals being struck between OTTs and network operators for subscription-based, zero-rated data services. We will also see the proportion of encrypted data traffic in mobile networks rise from the low teens to at least 30% of overall traffic.

Wholesaler or Value provider?


The discussion about the place of the network operator and MSO in content and service delivery is still very much active. We saw, late last year, the latest net neutrality sabre-rattling from network operators and OTTs alike, with even politicians entering the fray and trying to influence the regulatory debates. This will likely not be settled in 2015. As a result, we will see both more cooperation and more competition, with integrated offerings (OTTs could go full MVNO soon) and encrypted, obfuscated traffic on the rise. We will probably also see the first lawsuits from OTTs against carriers with respect to traffic mediation, optimization and management. This adversarial climate will further delay monetization plays relying on mobile advertising. Only integrated offerings between OTTs and carriers will be able to tap this revenue source.
Some operators will step away from the value-provider strategy and will embrace wholesale models, trying to sign up as many MVNOs and OTTs as possible, focusing on network excellence. These strategies will fail as the price per byte declines inexorably, unable to sustain a business model where more capacity requires more investment for diminishing returns.
Some operators will seek to actively manage and mediate the traffic transiting through their networks and will implement HTTPS / SPDY proxies to decrypt and optimize encrypted traffic, wherever legislation is more permissive.

Mobile Networks

CAPEX will be on the rise overall, with heterogeneous networks and LTE roll-outs taking the lion's share of investments. 
LTE networks will show signs of weakness in terms of peak traffic handling, mainly due to video and audio streaming, and some networks will accelerate LTE-A investments or aggressively curb traffic through data caps, throttling and onerous pricing strategies.

SDN will continue its progress as a back-office and lab technology in mobile networks, but its inability so far to provide reliable, secure, scalable and manageable network capabilities will prevent it from making a strong commercial debut in wireless networks. 2018 is the likeliest time frame.

NFV will show strong progress and first commercial deployments in wireless networks, but in a vertical, proprietary fashion, with legacy functions (DPI, EPC, IMS...) translated into a virtualized environment in a mono-vendor approach. We will also see micro deployments in emerging markets, where cost of ownership takes precedence over performance or reliability. APAC will also see some commercial deployments in large networks (Japan, Korea) in fairly proprietary implementations.
Orchestration and integration with SDN will be the key investments in the standardization community. The timeframe for mass-market, interoperable, multi-vendor commercial deployment is likely 2020.

To conclude this post, my last prediction is that someone will likely be bludgeoned to death with their own selfie stick, I'll put my money on Mobile World Congress 2015 as a likely venue, where I am sure countless companies will give them away, to the collective exasperation and eye-rolling of the Barcelona population.

That's all folks, see you soon at one of the 2015 shows.

Tuesday, August 26, 2014

SDN / NFV part V: flexibility or performance?


Early on in my investigations of how SDN and NFV are being implemented in mobile networks, I found that performance remains one of the largest stumbling blocks the industry has to overcome if we want to transition to next-generation networks.

Specifically, many vendors recognize behind closed doors that a virtualized environment today has many performance challenges. This probably explains why so many of the PoCs feature chipset vendors as participants. A silicon vendor as a main proponent of virtualization is logical, as the industry seeks to transition from purpose-built proprietary hardware to open COTS platforms. It does not fully explain, though, the heavy involvement of the chipset vendors in these PoCs. Surely, if the technology were interoperable and open, chipset vendor integration would not be necessary?

Linux limitations

Linux as an operating system was originally developed for single-core systems. As multi-core and multithreaded architectures made their appearance, the operating system has shown great limitations in managing particularly demanding data plane applications. When one looks at a virtualized network function, one has to contend with both the host OS and the guest OS. 
In both cases, a major loss of performance is observed on every entry to and exit from a VM and the OS kernel. These software interrupts are necessary to pull packets from the data plane up to the application layer so that they can be processed. The cost of software interrupts for Linux kernel access ends up being prohibitive and creates bottlenecks and race conditions as traffic increases and more threads are involved. Specifically, every time the application needs to access the Linux kernel, it must pause the VM, save its context and stall the application to access the kernel. For instance, a base station can generate over 100k software interrupts per second.

Intel DPDK and SR-IOV

The Intel Data Plane Development Kit (DPDK), with strong contributions and commercial support from 6WIND, is used for I/O and packet forwarding functions. A "fast path" is created between the VM and the virtual network interface card (NIC) that improves data path processing performance. This implementation makes it possible to largely bypass the hypervisor's packet path and to process packets quickly between the VM and the host.
At the host level, Single Root I/O Virtualization (SR-IOV) is also used in conjunction with DPDK to provide NIC-to-VM connectivity, bypassing the Linux host and improving packet forwarding performance. The trade-off is that each VM using SR-IOV must be tied to a physical network card.
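For readers who have not looked at DPDK, here is a minimal sketch of a poll-mode receive loop (device and queue setup are omitted, and a single pre-configured port is assumed); the point is that packets are pulled from the NIC in user space in batches, with no per-packet kernel interrupt or VM exit:

```c
#include <stdint.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

int main(int argc, char **argv)
{
    /* Initialize DPDK's Environment Abstraction Layer (hugepages, cores, PCI). */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

    const uint16_t port_id = 0;          /* assumption: first DPDK-bound port, already configured */
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Busy-poll the NIC: up to BURST_SIZE packets per call, no interrupts, no kernel. */
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++) {
            /* ... process the packet entirely in user space ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}
```

The cost of this model is that a core is dedicated to spinning on the receive queue, and with SR-IOV that queue is a slice of a specific physical NIC, which is exactly the flexibility trade-off discussed next.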

Performance or Flexibility?

The implementation of DPDK and SR-IOV comes at a cost. While performance for VNFs implementing both techniques shows results close to a physical appliance, the trade-off is flexibility. In implementing these mechanisms, the VMs are effectively bound to the physical hardware resources they depend on. A perfect configuration and a completely identical replication of every element, at the software and physical levels, are necessary for migration and scaling out. While Intel is working on a virtual DPDK integrated in the hypervisor, implementations of SDN / NFV in wireless networks for data-plane-hungry network functions will force vendors and networks, in the short to medium term, to choose between performance and flexibility.

More content available here.