Monday, August 26, 2024

Of AI, automation, complex and complicated systems

 

I get drawn these days into discussions about the sweet spot of AI: what is the best use of AI/ML, what is its utility in generative AI, and what is its role in network automation, optimisation and autonomic functions?

In many cases, these discussions stumble over misconceptions about the mechanics of statistics and their applications.

To put it simply, many do not distinguish between complexity and complication, which has a great effect on expectations around problem solving, automation and outcome prediction. A complex problem is an assembly of problems that can be broken down into subsets until simple, unique problems can be identified, tagged, troubleshot and resolved. These problems are ideal targets for automation. No matter how complex the task, if it can be broken down, and if a method of procedure (MOP) can be written for each subtask and eventually for the whole problem, it can be measured, automated and predicted, and efficiency gains can be achieved.
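
To make the decomposition idea concrete, here is a minimal sketch (in Python, with invented task names, checks and state) of a decomposable problem expressed as a list of subtasks, each carrying its own method of procedure: a measurable check and a scripted remediation. Once a problem can be written this way, it can be measured and automated.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    name: str
    check: Callable[[], bool]       # measurable pass/fail condition
    remediate: Callable[[], None]   # scripted, repeatable fix

def run_mop(subtasks: list) -> list:
    """Run each subtask's method of procedure; return the names of remediated items."""
    fixed = []
    for task in subtasks:
        if not task.check():
            task.remediate()
            fixed.append(task.name)
    return fixed

# Illustrative state for a hypothetical cell-site health check.
state = {"backhaul_up": True, "ntp_offset_ms": 180, "config_current": False}

mop = [
    Subtask("backhaul_link", lambda: state["backhaul_up"],
            lambda: state.update(backhaul_up=True)),
    Subtask("ntp_sync", lambda: state["ntp_offset_ms"] < 50,
            lambda: state.update(ntp_offset_ms=5)),
    Subtask("golden_config", lambda: state["config_current"],
            lambda: state.update(config_current=True)),
]

print(run_mop(mop))  # -> ['ntp_sync', 'golden_config']
```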

Complicated problems are a different animal altogether. They might have subtasks that can be identified and broken down, but other parts carry a large degree of unknowns and uncertainty.

Large Language Models can try to reduce the uncertainty by having larger samples, enabling even outlier patterns to emerge and be identified, but in many cases, complicated problems have dependencies that cannot be easily resolved from a pure mathematical standpoint.

This is where domain expertise comes in. In many cases, when issues arise in a telecoms network, the symptom does not necessarily surface at the source of the issue. Troubleshooting often requires knowledge of network topology, call flows and protocols, and multi-domain expertise across core, transport, access, peering points, connectivity, data centers...

It is not possible to automate what you do not operate well. You can't operate well a system that you can't measure well, and you can't measure a system well without a consolidated data storage and management strategy. In many cases, telco systems still produce logs in proprietary formats, on siloed systems, and collecting, cleaning, exporting, processing and storing this data in a fully integrated data system is still in its infancy. This is, however, the very first step before the categorization into complex or complicated issues can even take place.
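
As an illustration of that very first step, here is a minimal sketch of the consolidation stage: two invented, vendor-specific log formats parsed into one normalized schema before storage. The formats and field names are purely illustrative.

```python
import re
from datetime import datetime, timezone

def parse_vendor_a(line: str) -> dict:
    # e.g. "2024-08-26 10:42:13|CELL-0042|ALARM|RSSI degraded"
    ts, element, severity, message = line.split("|")
    return {"timestamp": datetime.fromisoformat(ts).replace(tzinfo=timezone.utc),
            "element": element, "severity": severity.lower(), "message": message}

def parse_vendor_b(line: str) -> dict:
    # e.g. "<warn> 1724668933 node=agg-7 txt=fan speed high"
    m = re.match(r"<(\w+)> (\d+) node=(\S+) txt=(.*)", line)
    sev, epoch, node, msg = m.groups()
    return {"timestamp": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
            "element": node, "severity": sev.lower(), "message": msg}

records = [parse_vendor_a("2024-08-26 10:42:13|CELL-0042|ALARM|RSSI degraded"),
           parse_vendor_b("<warn> 1724668933 node=agg-7 txt=fan speed high")]
print(records)  # one schema, ready to be stored and queried together
```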

In many cases, data literacy needs to pervade the entire organization to ensure that a data-driven strategy can be enacted, let alone a move to automation, autonomic functions or AI-based predictive systems.

It therefore becomes very important to separate complex from complicated systems and issues, and to apply as much data science and automation as possible to the former before trying to force AI/ML onto the latter. As a rule of thumb, as the number of tasks or variables and the overall complexity increase, one can move from optimization using scripting, to automation using scripting + ML, to prediction using AI/ML. As the number of unknowns and the complication increase, one has to rely on subject matter experts and domain experts, up to multi-domain experts with an end-to-end view of the system.

As complications and tasks increase, the possibility of achieving autonomous systems decreases, while human expertise and manual intervention increase. Data science becomes less an operator than an attendant or an assistant, detecting and automating the subset of tasks with identified outcomes and patterns, and accelerating resolution of the more complicated problem.

Friday, August 16, 2024

Rant: Why do we need 6G anyway?


I have to confess that, even after 25 years in the business, I am still puzzled by the way we build mobile networks. If tomorrow we were to restart from scratch, with today's technology and knowledge of the market, we would certainly design and deploy them in a very different fashion.

Increasingly, mobile network operators (MNOs) have realized that the planning, deployment and management of the infrastructure is a fundamentally different business from the development and commercialization of the associated connectivity services. They follow different investment and amortization cycles and have very different economic and financial profiles. For this reason, investors value network infrastructure differently from digital services, and many MNOs have decided to start separating their fibre, antenna and radio assets from their commercial operations.

This has resulted in a flurry of splits, spin-offs and divestitures, and the growth of specialized tower and infrastructure companies. If we follow this pattern to its logical conclusion, looking at the failed economics of 5G and the promises of 6G, one has to wonder whether we are on the right path.

Governments keep treating spectrum as a finite, exclusive resource, whereas, as demand for private networks and unlicensed spectrum increases, there is a clear cognitive dissonance in the economic model. If 5G's success was predicated on connectivity for enterprises, industries and verticals, and if these organizations have needs that cannot be satisfied by the public networks, why would MNOs spend so much money on spectrum that is unlikely to bring additional revenue? The consumer market does not need another G until new services and devices emerge that mandate different connectivity profiles. The metaverse was a fallacy; autonomous vehicles, robots... are in their infancy and work around the lack of adequate connectivity by keeping their compute and sensors on the device rather than at the edge.

As the industry prepares for 6G and its associated hype, nonsensical use cases and fantastical services, one has to wonder how we can stop designing networks for use cases that never emerge as dominant, forcing redesigns and late adaptation. Our track record as an industry is not great there. If you remember, 2G was designed for voice services; texting was the unexpected killer app. 3G was designed for Push-to-talk over Cellular, believe it or not (remember SIP and IMS...), and picture messaging and early browsing were the successful services. 4G was designed for Voice over LTE (VoLTE), and video / social media were the key services. 5G was supposed to be designed for enterprise and industry connectivity but has so far failed to deliver (late implementation of slicing and 5G Stand Alone). So... what do we do now?

First, the economic model has to change. Rationally, it is not economically efficient for 4 or 5 MNOs to buy spectrum and deploy separate networks to cover the same population. We are seeing more and more network sharing agreements, but we must go further. In many countries, it makes more sense to have a single neutral infrastructure operator, covering the cell sites, radios, fiber backhaul and even edge data centers / central offices, all the way up to but not including the core. This neutral host can have an economic model based on wholesale, and the MNOs can focus on selling connectivity products.

Of course, this would probably require some level of governmental and regulatory overhaul to facilitate this model. Obviously, one of the problems is that many MNOs would have to transfer assets and, more importantly, personnel to that neutral host, which would undoubtedly see many redundancies as 3 or 4 teams shrink to one. Most economically advanced countries have unions protecting these jobs, so this transition is probably impossible unless a concerted effort to cap hires, not replace retirement departures and retrain people is made over many years...

The other part of the equation is the connectivity and digital services themselves. Let's face it, connectivity differentiation has mostly been a pricing and bundling exercise to date. MNOs have not been overly successful with the creation and sale of digital services, with social media and video streaming services having captured most of consumers' interest. On the enterprise side, a large part of the revenue is related to the exploitation of last-mile connectivity, with the sale of secure private connections over public networks, first as MPLS, then SD-WAN, SASE and cloud interconnection, as the main services. Gen AI promises to be the new shining beacon of advanced services, but in truth there is very little there in the short term in terms of differentiation for MNOs.

There is nothing wrong with being a very good, cost-effective, performant utility connectivity provider. But most markets can probably accommodate only one or two of these. Other MNOs, if they want to survive, must create true value in the form of innovative connectivity services. This supposes a change not only of mindset but also of skill set. I think MNOs need to look beyond the next technology, the next G, and evolve towards a more innovative model. I have worked on many of these efforts, from the framework to the implementation and the systematic creation of sustainable competitive advantage. It is quite different work from the standards-and-technology-evolution approach favored by MNOs, but necessary for those seeking to escape the utility model.

In conclusion, 6G or technological improvements in speed, capacity, coverage, latency... are unlikely to solve the systemic economic and differentiation problems of MNOs unless more effort is put into service innovation and radical infrastructure sharing.

Thursday, August 8, 2024

The journey to automated and autonomous networks

 

The TM Forum has been instrumental in defining the journey towards automation and autonomous telco networks. 

As telco revenues from consumers continue to decline and the 5G promise to create connectivity products that enterprises, governments and large organizations will be able to discover, program and consume remains elusive, telecom operators are under tremendous pressure to maintain profitability.

The network evolution that started with Software Defined Networking and Network Functions Virtualization, and more recently the move to cloud native, aims to deliver network programmability for the creation of innovative on-demand connectivity services. Many of these services require deterministic connectivity parameters in terms of availability, bandwidth and latency, which necessitate an end-to-end cloud-native fabric and separation of control and data planes. Centralized control of the cloud-native functions allows resources to be abstracted and allocated on demand as topology and demand evolve.

A benefit of a cloud native network is that, as software becomes more open and standardized in a multi vendor environment, many tasks that were either manual or relied on proprietary interfaces can now be automated at scale. As layers of software expose interfaces and APIs that can be discovered and managed by sophisticated orchestration systems, the network can evolve from manual, to assisted, to automated, to autonomous functions.
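
As a simple illustration of what such exposed interfaces enable, the sketch below shows one iteration of a rule-based closed loop: a KPI is read through a monitoring API, a statically configured threshold is evaluated, and a scale-out action is requested through an orchestration API. The endpoints, payloads and threshold are hypothetical placeholders, not any specific vendor's interface.

```python
import requests  # assumes the 'requests' package is installed

MONITORING_API = "https://monitoring.example.internal/kpi"        # hypothetical endpoint
ORCHESTRATOR_API = "https://orchestrator.example.internal/scale"  # hypothetical endpoint

def closed_loop_step(nf_instance: str, prb_util_threshold: float = 0.85) -> None:
    """One iteration of a statically configured, rule-based closed loop."""
    kpi = requests.get(f"{MONITORING_API}/{nf_instance}", timeout=5).json()
    if kpi["prb_utilization"] > prb_util_threshold:
        # Ask the orchestrator to add capacity for the congested network function.
        requests.post(ORCHESTRATOR_API,
                      json={"instance": nf_instance, "action": "scale_out", "replicas": 1},
                      timeout=5)
```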


TM Forum defines six levels of evolution, from fully manual operation to fully autonomous networks.

  • Condition 0 - Manual operation and maintenance: The system delivers assisted monitoring capabilities, but all dynamic tasks must be executed manually.
  • Step 1 - Assisted operations and maintenance: The system executes a specific, repetitive subtask based on pre-configuration, which can be recorded online and traced, in order to increase execution efficiency.
  • Step 2 - Partial autonomous network: The system enables closed-loop operations and maintenance for specific units under certain external environments via statically configured rules.
  • Step 3 - Conditional autonomous network: The system senses real-time environmental changes and, in certain network domains, will optimize and adjust itself to the external environment to enable closed-loop management via dynamically programmable policies.
  • Step 4 - Highly autonomous network: In a more complicated cross-domain environment, the system enables decision-making based on predictive analysis or active closed-loop management of service-driven and customer experience-driven networks via AI modeling and continuous learning.
  • Step 5 - Fully autonomous network: The system has closed-loop automation capabilities across multiple services, multiple domains (including partners’ domains) and the entire lifecycle via cognitive self-adaptation.
After describing the framework and conditions for the first 3 steps, the TM Forum has recently published a white paper describing the Level 4 industry blueprints.

The stated goals of Level 4 are to enable the creation and roll-out of new services within one week with deterministic SLAs, and the delivery of Network as a Service. Furthermore, this level should allow fewer personnel to manage the network (thousands of person-years) while reducing energy consumption and improving service availability.

These are certainly very ambitious objectives. The paper goes on to describe "high value scenarios" to guide level 4 development. This is where we start to see cognitive dissonance creeping in between the stated objectives and the methodology.  After all, much of what is described here exists today in cloud and enterprise environments and I wonder whether Telco is once again reinventing the wheel in trying to adapt / modify existing concepts and technologies that are already successful in other environments.

First, the creation of deterministic connectivity is not (only) the product of automation. Telco networks, in particular mobile networks, are composed of a daisy chain of network elements that coordinate customer traffic, signaling, data repositories, look-ups, authentication, authorization, accounting and policy management functions. On the mobile front, signal effectiveness varies over time, as weather, power, demand, interference, devices... impact the effective transmission. Furthermore, the load on the base station, the backhaul, the core network and the internet peering point also varies over time and impacts the overall capacity. Creating a connectivity product with deterministic speed, latency and capacity to enact Network as a Service therefore requires a systemic approach. In a multi-vendor environment, the RAN, the transport and the core must be virtualized, relying on solid fiber connectivity as much as possible to enable the capacity and speed. Low latency requires multiple computing points, all the way to the edge or on premise. Deterministic performance requires not only virtualization and orchestration of the RAN, but also of the PON fiber, together with end-to-end slicing support and orchestration. This is something I led at Telefonica, with an open compute edge computing platform, a virtualized (XGS) PON on an ONF ONOS / VOLTHA architecture and an open virtualized RAN. This was not automated yet, as most of these elements were advanced prototypes at that stage, but automation is the "easy" part once you have assembled the elements and operated them manually for long enough. The point here is that deterministic network performance is attainable but still a distant objective for most operators, and it is a necessary condition for NaaS, even before automation and autonomous networks.

Second, the high value scenarios described in the paper are all network-related. Ranging from network troubleshooting, to optimization and service assurance, these are all worthy objectives, but still do not feel "high value" in terms of creation of new services. While it is natural that automation first focuses on cost reduction for roll out, operation, maintenance, healing of network, one would have expected more ambitious "new services" description.

All in all, the vision is ambitious, but there is still much work to do in fleshing out the details and linking the promised benefits to concrete services beyond network optimization.

Friday, July 5, 2024

Readout: Ericsson's Mobility Report June 2024

 


It has been a few years now since Ericsson took to providing a yearly report on its view of the evolution of connectivity. Like Cisco's annual internet report, it provides interesting data points on the maturity of telecom technology and services, but focused on cellular technology, lately embracing fixed wireless access and non-terrestrial networks as well.

In this year's edition, a few elements caught my attention:

  • Devices supporting network slicing are few and far between. Only iOS 17 and Android 13 support some capabilities to expose slicing parameters to their applications. These devices are the latest, higher-end smartphones, so it is no wonder that 5G Stand Alone is late in delivering on its promises if end-to-end slicing is only possible for a small fraction of customers. It is still possible to deploy slicing without device support, but with limitations: slicing per device or subscriber profile remains possible, but not slicing per content / service.

  • RedCap (5G Reduced Capability) for IoT, wearables, sensors, etc. is making its appearance on networks, mostly as demos and trials at this stage. The first devices are unlikely to reach mass-market availability until the end of next year.

  • Unsurprisingly, mobile data traffic is still growing, albeit at a lower rate than previously reported, with a 25% yearly growth rate, or just over 6% quarterly. The growth is mostly due to smartphone and 5G penetration and to video consumption, which accounts for about 73% of the traffic. This traffic data includes Fixed Wireless Access, although it is not broken down. The rollout of 5G, particularly in mid-band, together with carrier aggregation, has allowed mobile network operators to compete efficiently with fixed broadband operators through FWA. FWA's growth, in my mind, is the first successful application of 5G as a differentiated connectivity product. As devices and modems supporting slicing appear, more sophisticated connectivity and pricing models can be implemented. FWA price packages differ markedly from mobile data plans: the former are mostly speed-based, emulating cable and fibre offerings, whereas the latter are usually all-you-can-eat, best-effort connectivity.

  • Where the traffic growth projections become murky is with the impact of XR services. Mixed, augmented and virtual reality services haven't really taken off yet, but their possible impact on traffic mix and network load could be immense. XR requires a number of technologies to reach maturity at the same time (bendable / transparent screens; low-power, portable, heat-efficient batteries; low latency and high compute on device and at the edge; high downlink / uplink capabilities; deterministic mesh latency over an area...) to reach mass market, and we are still some way away from it in my opinion.

  • Differentiated connectivity for cellular services is a long-standing subject of interest of mine. My opinion remains the same: "The promise and business case of 5G was supposed to revolve around new connectivity services. Until now, essentially, whether you have a smartphone, a tablet, a laptop, a connected car or an industrial robot, and whether you are a work-from-home or road-warrior professional, all connectivity products are really the same. The only variables are price and coverage.

    5G was supposed to offer connectivity products that could be adapted to different device types, verticals and industries, geographies, vehicles, drones... The 5G business case hinges on enterprises, verticals and governments adopting and being willing to pay for enhanced connectivity services. By and large, this hasn't happened yet. There are several reasons for this, the main one being that to enable these services, a network overhaul is necessary.

    First, a service-based architecture is necessary, comprising 5G Stand Alone, telco cloud and Multi-Access Edge Computing (MEC), as well as Service Management and Orchestration. Then, cloud-native RAN, either Cloud RAN or Open RAN (particularly the RAN Intelligent Controllers - RICs), would be useful. All this "plumbing" enables end-to-end slicing, which in turn creates the capability to serve distinct and configurable connectivity products.

    But that's not all... A second issue is that although it is accepted wisdom that slicing will create connectivity products that enterprises and governments will be ready to pay for, there is little evidence of it today. One of the key differentiators of the "real" 5G and slicing will be deterministic speed and latency. While most actors of the market are ready to recognize that in principle a controllable latency would be valuable, no one really knows the incremental value of going from variable best effort to deterministic 100, 10 or 5 millisecond latency.

    The last hurdle is the realization by network operators that Mercedes, Walmart, 3M, Airbus... have a better understanding of their connectivity needs than any carrier, and that they have skilled people able to design networks and connectivity services across WAN, cloud, private and cellular networks. All they need is access and a platform with APIs. A means to discover, reserve and design connectivity services on the operator's network will be necessary, and the successful operators will understand that their network skill set might be useful for consumers and small / medium enterprises, but less so for large verticals, governments and companies." Ericsson is keen to promote and sell the "plumbing" to enable this vision to MNOs, but will this be sufficient to fulfill the promise?

  • Network APIs are a possible first step to open up connectivity to third parties willing to program it. They are notably absent from the report, maybe due to the fact that the company announced a second impairment charge of $1.1B (after an initial $2.9B write-off) in less than a year on the $6.2B acquisition of Vonage.

  • Private networks are another highlighted trend in the report, with a convincing example of an implementation with the Northstar innovation program, in collaboration with Telia and AstaZero. The implementation focuses on automotive applications, from autonomous vehicles to V2X connectivity and remote control... On paper, it delivers everything operators dream about when thinking of differentiated connectivity for verticals and industries. One has to wonder how much it costs and whether it is sustainable if most of the technology is provided by a single vendor.

  • Open RAN and programmable networks are showcased in the AT&T deal that I have previously reported on and commented. There is no doubt that single-vendor automation, programmability and Open RAN can be implemented at scale. The terms of the deal with AT&T seem to indicate that it is a great cost benefit for them. We will have to measure the benefits as the changes are rolled out in the coming years.


Wednesday, July 3, 2024

June 2024 Open RAN requirements from Vodafone, Telefonica, Deutsche Telekom, TIM and Orange


As is now customary, the "big 5" European operators behind Open RAN have released their updated requirements to the market, indicating to vendors where they should direct their roadmaps to have the best chance of being selected in these networks.

As per previous iterations, I find it useful to compare and contrast the unanimous and highest priority requirements as indications of market maturity and directions. Here is my read on this year's release:

Scenarios:

As per last year, the big 5 unanimously require support for O-RU and vDU/CU with an open fronthaul interface on site for macro deployments. This indicates that although the desire is to move to a disaggregated implementation, with the vDU / CU potentially moving to the edge or the cloud, the operators are not yet fully ready for these scenarios and first prioritize a like-for-like replacement of a traditional gNodeB with a disaggregated, virtualized version, all at the cell site.

Moving to the high-priority scenarios requested by a majority of operators, vDU/vCU in a remote site with O-RU on site makes its appearance, together with RAN sharing. Both MORAN and MOCN scenarios are desirable, the former with shared O-RU and dedicated vDU/vCU and the latter with shared O-RU, vDU and optionally vCU. In all cases, a RAN sharing management interface is to be implemented to allow host and guest operators to manage their RAN resources independently.

Additional high-priority requirements are the support for indoor and outdoor small cells: indoors, shared O-RU and vDU/vCU in multi-operator environments; outdoors, single operator with O-RU and vDU either co-located on site or fully integrated with a Higher Layer Split. The last high-priority requirement is for 2G / 3G support, without indication of architecture.

Security:

The security requirements are essentially the same as last year, freely adopting 3GPP requirements for Open RAN. The polemic around Open RAN's level of security compared to other cloud virtualized applications or traditional RAN architectures has been put to bed. Most realize that open interfaces inherently open more attack surfaces, but this is not specific to Open RAN; every cloud-based architecture has the same drawback. Security by design goes a long way towards alleviating these concerns, and a proper zero-trust architecture can in many cases provide a higher security posture than legacy implementations. In this case, extensive use of IPSec, TLS 1.3 and certificates at the interface and port level for the open fronthaul and management planes provides the necessary level of security, together with the mTLS interface between the RICs. The O-Cloud layer must support Linux security features, secure storage and encrypted secrets with an external storage and management system.

CaaS:

As per last year, the cloud-native infrastructure requirements have been refined, including Kubernetes support for hardware accelerators (GPU, eASIC), block and object storage for dedicated and hyper-converged deployments, etc. Kubernetes infrastructure discovery, deployment, lifecycle management and cluster configuration have been further detailed. Power-saving requirements have been added at the fan and CPU level, with SMO-driven policy and configuration and idle-mode power-down capabilities.

CU / DU:

CU / DU interface requirements remain the same, basically the support for all Open RAN interfaces (F1, HLS, X2, Xn, E1, E2, O1...). The support for both look-aside and inline accelerator architectures is also the highest priority, indicating that operators haven't really reached a conclusion on a preferable architecture and are mandating both for flexibility's sake (in other words, inline acceleration hasn't convinced them that it can efficiently, in cost and power, replace look-aside). Fronthaul ports must support up to 200Gb through 12 x 10/25Gb combinations, and midhaul up to 2 x 100Gb. Energy efficiency and consumption are to be reported for all hardware (servers, CPUs, fans, NIC cards...). Power consumption targets for D-RAN of 400 watts at 100% load for 4T4R and 500 watts for 64T64R are indicated. These targets seem optimistic and poorly indicative of current vendors' capabilities in that space.

O-RU:

The radio situation is still messy and my statements from last year still mostly stand: "While all operators claim highest urgent priority for a variety of Radio Units with different form factors (2T2R, 2T4R, 4T4R, 8T8R, 32T32R, 64T64R) in a variety of bands (B1, B3, B7, B8, B20, B28B, B32B/B75B, B40, B78...) and with multi-band requirements (B28B+B20+B8, B3+B1, B3+B1+B7), there is no unanimity on ANY of these. This leaves vendors, trying to find which configurations could attract enough volume to make the investment profitable, in a quandary. There are hidden dependencies that are not spelled out in the requirements, and this is where we see the limits of the TIP exercise. Operators cannot really, at this stage, select 2 or 3 new RU vendors for an Open RAN deployment, which means that, in principle, they need vendors to support most, if not all, of the bands and configurations they need to deploy in their respective networks. Since each network is different, it is extremely difficult for a vendor to define the minimum product line-up necessary to satisfy most of the demand. As a result, the projections for volume are low, which makes vendors focus only on the most popular configurations. While everyone needs 4T4R or 32T32R in the n78 band, having 5 vendors providing options for these configurations, with none delivering B40 or B32/B75, makes it impossible for operators to select a single vendor and for vendors to aggregate sufficient volume to create a profitable business case for Open RAN." This year, there is one high-priority configuration with unanimous support: 4T4R B3+B1. The other highest-priority configurations requested by a majority of operators are 2T4R B28B+B20+B8, 4T4R B7, B3+B1, B32B+B75B, and 32T32R B78, with various power targets from 200 to 240W.

Open Front Haul:

The fronthaul interface requirements only acknowledge the introduction of uplink enhancements for massive MIMO scenarios as they will be introduced to the 7.2.x specification, with a lower priority. This indicates that while Ericsson's proposed interface and its architecture impact are being vetted, it is likely to become an optional implementation, left to the vendor's choice until / unless credible cost / performance gains can be demonstrated.

Transport:

Optical budgets and scenarios are now introduced.

RAN features:

Final MoU positions are now proposed. Unanimous items introduced in this version revolve mostly around power consumption and efficiency counters, KPIs and mechanisms. Other new requirements follow 3GPP Rel. 16 and 17 on carrier aggregation, slicing and MIMO enhancements.

Hardware acceleration:

A new section is introduced to clarify the requirements associated with L1 and L2 use of look-aside and inline acceleration. The most salient requirement is for simultaneous multi-RAT 4G/5G support.

Near RT RIC:

The Near Real Time RIC requirements continue to evolve and be refined. My perspective on the topic hasn't changed, and a detailed analysis can be found here. In short, letting third parties prescribe policies that manipulate the DU's scheduler is anathema to most vendors in the space and, beyond the technical difficulties, would go against their commercial interests. Operators will have to push very hard, with much commercial incentive, to see xApps from third-party vendors commercially deployed.

E2E use cases:

End-to-end use cases are being introduced to clarify the operators' priorities for deployments. There are many, but they offer a good understanding of those priorities: traffic steering for dynamic load balancing; QoE- and QoS-based optimization, to allocate resources based on a desired quality outcome; RAN sharing; slice assurance; V2X; UAV; energy efficiency... This section is a laundry list of desiderata, almost all high priority, perhaps showing that operators are getting a little unfocused about which real use cases they should pursue as an industry. As a result, it is likely that too many priorities result in no priority at all.

SMO

With over 260 requirements, the SMO and Non-RT RIC section is probably the most mature and shows a true commercial priority for the big 5 operators.

All in all, the document provides a good idea of the level of maturity of Open RAN for the operators that have been supporting it the longest. The types of requirements and their prioritization provide a useful framework for vendors who know how to read them.

A more in-depth analysis of Open RAN and the main vendors in this space is available here.


Thursday, June 20, 2024

Telco grade or cloud grade? II

I have oftentimes criticized network operators’ naivety when it comes to their capacity to convince members of the ecosystem to adopt their telco idiosyncrasies.

Thursday, May 2, 2024

How to manage mobile video with Open RAN

Ever since the launch of 4G, video has been a thorny issue to manage for network operators. Most of them had rolled out unlimited or generous data plans, without understanding how video would affect their networks and economics. Most videos streamed to your phones use a technology called Adaptive Bit Rate (ABR), which is supposed to adapt the video’s definition (think SD, HD, 4K…) to the network conditions and your phone’s capabilities. While this implementation was supposed to provide more control in the way videos were streamed on the networks, in many cases it had a reverse effect.

 

The multiplication of streaming video services has led to ferocious competition on the commercial and technological front. While streaming services visibly compete on their pricing and content attractiveness, a more insidious technological battle has also taken place. The best way to describe it is to compare video to a gas. Video will take up as much capacity in the network as is available.

When you start a streaming app on your phone, it will assess the available bandwidth and try to deliver the highest definition video available. Smartphone vendors and streaming providers try to provide the best experience to their users, which in most cases means getting the highest bitrate available. When several users in the same cell try to stream video, they are all competing for the available bandwidth, which leads in many cases to a suboptimal experience, as some users monopolize most of the capacity while others are left with crumbs.

 

In recent years, technologies have emerged to mitigate this issue. Network slicing, for instance, when fully implemented could see dedicated slices for video streaming, which would theoretically guarantee that video streaming does not adversely impact other traffic (video conferencing, web browsing, etc…). However, it will not resolve the competition between streaming services in the same cell.

 

Open RAN offers another tool for efficiently resolving these issues. The RIC (RAN Intelligent Controller) provides for the first time the capability to visualize a cell's congestion in near real time and to apply optimization techniques with a great level of granularity. Until Open RAN, the means of visualizing network congestion were limited in a multi-vendor environment, and the means to alleviate it were broad and coarse. The RIC allows operators to create policies at the cell level, on a per-connection basis. Algorithms allow traffic-type inference, and policies can be enacted to adapt the allocated bandwidth based on a variety of parameters such as signal strength, traffic type, congestion level, power consumption targets...
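
To make this concrete, here is a simplified sketch of the kind of per-connection decision logic such policies could express. It is not the O-RAN A1/E2 API itself; the connection attributes, thresholds and bitrate caps are illustrative placeholders, and it also covers the venue and congestion examples that follow.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    ue_id: str
    traffic_type: str     # e.g. inferred "video", "voice", "web"
    direction: str        # "uplink" or "downlink"
    priority: str         # "default" or "emergency"

def max_bitrate_kbps(conn: Connection, cell_load: float, uplink_video_blocked: bool) -> int:
    """Return a per-connection bitrate cap given cell congestion and venue policy."""
    if conn.priority == "emergency":
        return 50_000                       # always protected
    if uplink_video_blocked and conn.direction == "uplink" and conn.traffic_type == "video":
        return 0                            # e.g. stadium policy during a show
    if cell_load > 0.8 and conn.traffic_type == "video":
        return 3_000                        # fair-share cap under congestion
    return 20_000                           # otherwise best effort

print(max_bitrate_kbps(Connection("ue-17", "video", "uplink", "default"), 0.6, True))  # -> 0
```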

 

For instance, an operator or a private network for stadiums or entertainment venues could easily program their network to not allow upstream videos during a show, to protect broadcasting or intellectual property rights. This can be easily achieved by limiting the video uplink traffic while preserving voice, security and emergency traffic.

 

Another example would see a network actively dedicating deterministic capacity per connection during rush hour, or based on thresholds in a downtown core, to guarantee that all users have access to video services with equally shared bandwidth and quality.

 

A last example could see first responder and emergency services get guaranteed high-quality access to video calls and broadcasts.

 

When properly integrated into a policy and service management framework for traffic slicing, Open RAN can be an efficient tool for adding fine-grained traffic optimization rules, allowing a fairer apportioning of resources among users while preserving overall quality of experience.

 

Wednesday, March 27, 2024

State of Open RAN 2024: Executive Summary

 

The 2023 Open RAN market ended with a bang, with AT&T awarding Ericsson and Fujitsu a $14 billion deal to convert 70% of its traffic to run on Open RAN by the end of 2026. 2024 started equally loud with HPE's $13 billion acquisition of Juniper Networks, on the thesis of the latter's progress in telecom AI and specifically in RAN intelligence with the launch of its RIC program.

2023 also saw the long-awaited launch of 1&1 Drillisch in Germany, the first Open RAN greenfield in Europe, as well as the announcement from Vodafone that it will release a RAN RFQ that will see 30% of its 125,000 global sites dedicated to Open RAN.

Commercial deployments are now under way in western Europe, spurred by Huawei replacement mandates.

On the vendors' front, Rakuten Symphony seems to have markedly failed to capitalize on the Altiostar acquisition and to convince brownfield network operators to purchase telecom gear from a fellow network operator. While Ericsson has announced its support for Open RAN with conditions, Samsung has been the vendor making the most progress, with convincing market share growth across the geographies it covers. Mavenir has been growing steadily. A new generation of vendors has taken advantage of the Non-Real-Time RIC / SMO opportunity to enter the space. Non-traditional RAN vendors such as VMware and Juniper Networks, or SON vendors like Airhop, have grown the most in that space, together with pure new-entrant app players such as Rimedo Labs. With the acquisitions of VMware and Juniper Networks, both leaders in the RIC segment, 2024 could be live-or-die for this category, as both companies reevaluate their priorities and align commercial interests with their acquirers.

On the technology side, the O-RAN alliance has continued its progress, publishing new releases while establishing bridgeheads with 3GPP and ETSI to facilitate the inclusion of Open RAN in the mainstream 5G advanced and 6G standards. The accelerator debate between inline and look aside architectures has died down, with the first layer 1 abstraction layers allowing vendors to effectively deploy on different silicon with minimal adjustment. Generative AI and large language models have captured the industry’s imagination and Nvidia has been capitalizing on the seemingly infinite appetite for specialized computing in cloud and telecom networks.

This report provides an exhaustive review of the key technology trends, vendors' product offerings and strategies, ranging from silicon, servers and cloud CaaS to O-RUs, DUs, CUs, RICs, apps and SMOs in the Open RAN space in 2024.

Tuesday, March 19, 2024

Why are the US government and DoD in particular interested in Open RAN?

Over the last 24 months, it has been very interesting to see the US Government move from keen interest in Open RAN to making it policy for its procurement of connectivity technology.

As I am preparing to present for next week's RIC Forum, organized by NTIA and the US Department of Defense, many of my clients have been asking why the US Government seems so invested in Open RAN.

Supply chain diversification:

The first reason for this interest is the observation that the pool of network equipment providers has been growing increasingly shallow. The race from 3G to 4G to 5G has required vendors to attain a high level of industrialization and economies of scale, which has been achieved through many rounds of concentration. A limited supply chain with few vendors per category represents a strategic risk for the actors relying on it to operate economically. Open RAN allows the emergence of new vendors in specific categories, without the industrial capacity needed to deliver end-to-end RAN networks.

Cost effectiveness:

The lack of vendor choice has shifted negotiating power from network operators to vendors, which has negatively impacted margins and capacity to make changes. The emergence of new Open RAN vendors puts pressure on incumbents and traditional vendors to reduce their margins.

Geostrategic interest:

The growth of Huawei, ZTE and other Chinese vendors, with their suspected links to the Chinese government and army, together with the somewhat obscure privacy and security laws there, has prompted the US government and many allies to ban or severely restrict the categories of telecom products that can be deployed in many telecom networks.

Furthermore, while US companies dominate the traffic management, routing, data center and hyperscaler space, the RAN, core network and general telco infrastructure remain dominated by European and Asian vendors. Open RAN has been an instrument to facilitate and accelerate the replacement of Chinese vendors, but also to stimulate US vendors to emerge and grow.

DoD use case example: Spectrum Dominance

This area is less well understood and recognized, but it is an integral part of the US Government's, and the DoD's in particular, interest in Open RAN. Private networks require connectivity products adapted to specific use cases, devices and geographies. Commercial macro networks offer "one size fits all" solutions that are difficult and costly to adapt for that purpose. Essentially, the DoD runs hundreds of private networks, whether on its bases, its carriers or in ad hoc tactical environments. Being able to set up a secure, programmable, cost-effective network, either permanently or ad hoc, is an essential requirement, and it can also become a differentiator or a force multiplier. A tactical unit deploying an ad hoc network might look at means not only to create a secure subnet, but also to establish spectrum dominance by manipulating waveforms and effectively interfering with adversary networks. This is one example where programmability at the RAN level can turn into an asset for battlefield dominance. There are many more use cases, but their classification might not allow us to comment on them publicly. They illustrate, though, how technological dominance can extend to every aspect of telecom.

Open RAN in that respect provides programmability, cost effectiveness and modularity to create fit for purpose connectivity experiences in a multi vendor environment.


Wednesday, January 31, 2024

The AI-native telco network

AI, and more particularly generative AI, has been a big buzzword since the public launch of ChatGPT. The promise of AI to automate and operate complex tasks and systems is pervading every industry, and telecom is not impervious to it.

Most telecom equipment vendors have started incorporating AI, or have at least brushed up their big data / analytics skills in their marketing positioning.
We have even seen a few market acquisitions where AI / automation has been an important part of the investment narrative / thesis (HPE / Juniper Networks).
Concurrently, many startups are being founded, or are pivoting towards AI / ML, to take advantage of this investment cycle.

In telecoms, there has been use for big data, machine learning, deep learning and similar methods for a long time. I was leading such a project at Telefonica in 2016, using advanced prediction algorithms to detect alarming patterns, infer root causes and suggest automated resolutions.

While generative AI is somewhat new, the use of data to analyze, represent, predict network conditions is well known. 

AI in telecoms is starting to show some promise, particularly when it comes to network planning, operation, spectrum optimization, traffic prediction and power efficiency. It comes with a lot of preconditions that are often glossed over by vendors and operators alike.

Like all data-dependent technologies, one first has to be able to collect, normalize, sanitize and clean data before storing it for useful analysis. In an environment as idiosyncratic as a telecoms network, this is not an easy task. Not only are networks composed of a mix of appliances, virtual machines and cloud-native functions, they have had successive technological generations deployed alongside each other, with different data schemas, protocols, interfaces and repositories, which makes extraction arduous. After that step, normalization is necessary to ensure that the data is represented the same way, with the same attributes, headers, etc., so that it can be exploited. Most vendors have their own proprietary data schemas or "augment" standards with "enhanced" headers and metadata. In many cases the data needs to be translated into a format that can be normalized for ingestion. Cleaning and sanitizing are necessary to ensure that redundant or outlying data points do not overweight the data set. As always, "garbage in / garbage out" is an important concept to keep in mind.
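
As a small illustration of the cleaning and sanitizing step, the sketch below deduplicates records and drops impossible or outlying measurements before a data set is used for analysis or training. The field names and thresholds are invented for the example.

```python
def clean(records: list, field: str = "latency_ms", max_value: float = 10_000) -> list:
    """Drop exact duplicates and impossible / outlying measurements."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["element"], rec["timestamp"], rec.get(field))
        if key in seen:
            continue                      # drop exact duplicates
        if rec.get(field) is None or rec[field] < 0 or rec[field] > max_value:
            continue                      # drop impossible or outlying values
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [{"element": "gnb-1", "timestamp": "2024-01-31T10:00:00Z", "latency_ms": 12.0},
       {"element": "gnb-1", "timestamp": "2024-01-31T10:00:00Z", "latency_ms": 12.0},  # duplicate
       {"element": "gnb-2", "timestamp": "2024-01-31T10:00:00Z", "latency_ms": -3.0}]  # impossible
print(len(clean(raw)))  # -> 1
```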

These difficult steps are unfortunately not the only prerequisite for an AI native network. The part that is often overlooked is that the network has to be somewhat cloud native to take full advantage of AI. The automation in telecoms networks requires interfaces and APIs to be defined, open and available at every layer, from access to transport to the core, from the physical to the virtual and cloud native infrastructure. NFV, SDN, network disaggregation, open optical, open RAN, service based architecture, … are some of the components that can enable a network to take full advantage of AI. 
Cloud networks and data centers seem to be the first to adopt AI, both for the hosting of the voracious GPUs necessary to train the Large Language Models and for the resale / enablement of AI oriented companies. 

For that reason, the greenfield networks that have recently been deployed with state-of-the-art cloud-native technologies should be the prime candidates for AI/ML-based network planning, deployment and optimization. The amount of work necessary for the integration and deployment of AI-native functions is objectively much lower than for their incumbent competitors.
We haven't really seen sufficient evidence, however, that this level of cloud "nativeness" enables mass optimization and automation with AI/ML resulting in massive cost savings, at least in OPEX, and creating an unfair competitive advantage over incumbents.

As the industry approaches Mobile World Congress 2024, with companies poised to showcase their AI capabilities, it is crucial to remain cognizant of the necessary prerequisites for these technologies to deliver tangible benefits. Understanding the time and effort required for networks to truly benefit from AI is essential in assessing the realistic impact of these advancements in the telecom sector.

Friday, January 26, 2024

Product Marketing as a Strategic Tool for Telco Vendors

Those who have known me for a long time know that I am a Product Manager by trade. This is how I started my career and, little by little, from products to product lines to solutions, I have come to manage and direct business lines worth several hundred million dollars. Along this path, I have also become a manager and team lead, then moved on to roles with increasing strategic content, from reselling and OEM deals to buy- and sell-side acquisitions and integrations.

Throughout this time, I have noticed the increased significance of Product Marketing in the telecom vendor environment. In a market that has seen (and is still seeing) much concentration, with long sales cycles and risk-averse customers, being able to intelligently and simply state a product's differentiating factors becomes paramount.

Too often, large companies rely on brand equity and marketing communication to support sales. In a noisy market, large companies have many priorities, which end up diluting the brand promise and providing vague and disconnected messages across somewhat misaligned products and services.

By contrast, start-ups and small companies often have a much smaller range of products and services but, having less budget, focus in many cases on technology and technical illustrations rather than extolling the benefits and value of their offering.

My experience has underscored the pivotal role of product marketing in shaping a company's valuation, whether for fundraising or acquisition purposes. Yet, despite its proven impact, many still regard it as a peripheral activity. The challenge lies in crafting a narrative that resonates—a narrative that not only embodies the company's strategic vision but also encapsulates market trends, technological evolutions, and competitive dynamics. It's about striking a delicate balance, weaving together product capabilities, customer pain points, and the distinct value proposition in a narrative that is both compelling and credible.

Many companies will have marketing communication departments working on product marketing, which often results in either vague and bland positioning or in disconnects between the claims and the true capabilities of the products. This can be very damaging for a company's image when its market claims do not reflect accurately the capabilities of the product or the evolution of the technology. 

Other companies have product marketing as part of the product management function, in which case the messaging and positioning might be technically accurate but lack the competitive and market awareness to resonate and to find a differentiating position that will maximize the value of the offering.

As the telecom vendor sector braces for heightened competition and market contraction, with established players fiercely guarding their market share against aggressive newcomers, the role of product marketing becomes increasingly critical. It's an art form that merits recognition, demanding heightened attention and strategic investment. For those poised to navigate this complex terrain, embracing product marketing is not just an option; it's an imperative for sustained relevance and success in challenging market conditions.


Monday, January 15, 2024

Gen AI and LLM: Edging the Latency Bet

The growth of generative AI and Large Language Models has restarted a fundamental question about the value of a millisecond of latency. When I was at Telefonica, and later, consulting at Bell Canada, one of the projects I was looking after was the development, business case, deployment, use cases and operation of Edge Computing infrastructure in a telecom network.

Having been developing and deploying Edge Computing platforms since 2016, I have had a head start in figuring out the fundamental questions surrounding the technology, the business case and the commercial strategy.

Where is the edge?


The first question one has to tackle is: where is the edge? It is an interesting question because the answer depends on your perspective. The edge is a different location if you are a hyperscaler, a telco network operator or a developer. It can also vary over time and geography. In any case, the edge is a place where one can position compute closer than the current public or private cloud infrastructure in order to derive additional benefits. It can vary from a regional, to a metro, to a mini data center, all the way to on-premise or on-device compute capability. Each has its distinct costs, limitations and benefits.

What are the benefits of Edge Computing?


The second question, or maybe the first one, from a pragmatic and commercial standpoint is why do we need edge computing? What are the benefits?

While these will vary depending on the consumer of the compute capability, and where the compute function is located, we can derive general benefits that will be indexed to the location. Among these, we can list data sovereignty, increased privacy and security and reduced latency, enabling cheaper (dumber) devices, the creation of new media types and new models and services.

What are the use cases of Edge Computing?

I have deployed and researched over 50 use cases of edge computing, from the banal (storage, caching and streaming at the edge) to the sophisticated (TV production) and the specialized (Open RAN, the telco User Plane Function, or machine vision for industrial and agricultural applications).

What is the value of 1ms?

Sooner or later, after testing various use cases, locations and architectures, the fundamental question emerges: what is the value of 1ms? It is a question heavy with assumptions and correlations. In absolute terms, we would all like connectivity that is faster, more resilient, more power-efficient, more economical and lower latency. The factors that condition latency are the number of hops or devices the connection has to traverse between the device and the origin point where the content or code is stored, transformed and computed, and the distance between the device and that compute point. To radically reduce latency, you have to reduce the number of hops or reduce the distance; Edge Computing achieves both.
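
Some back-of-the-envelope numbers, assuming light propagates in fibre at roughly two thirds of c (about 5 microseconds per kilometre one way), show why distance dominates: propagation alone adds roughly 1 ms of round trip per 100 km of fibre, before any queuing or per-hop processing delay is counted.

```python
# Rough propagation-only round-trip time as a function of fibre distance.
# Assumes ~5 microseconds per km one way; queuing, processing and per-hop
# forwarding delays would come on top of this.
def rtt_propagation_ms(distance_km: float, us_per_km: float = 5.0) -> float:
    return 2 * distance_km * us_per_km / 1000.0

for d in (10, 100, 1000):  # metro edge, regional data center, distant centralized cloud
    print(f"{d:>5} km -> ~{rtt_propagation_ms(d):.1f} ms round trip (propagation only)")
```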

But obviously, there is a cost. The latency will be proportional to the distance, so the fundamental question becomes: what is the optimal placement of a compute resource, and for which use case? Computing is a continuum; some applications and workloads are not latency, privacy or sovereignty sensitive and can run on an indiscriminate public cloud, while others necessitate the compute to be in the same country, region or city. Others still require even closer proximity. The difference in investment between a handful of centralized data centers and several hundreds or thousands of micro data centers is staggering.

What about AI and LLM?

Until now, these questions were somewhat theoretical and were answered organically by hyperscalers and operators based on their respective views of the market evolution. Generative AI and its extraordinary appetite for compute is rapidly changing this market space. Not only does Gen AI account for a sizable and growing portion of all cloud compute capacity, the question of latency is now coming to the fore. Gen AI relies on Large Language Models that require large amounts of storage and compute to be trained to recognize patterns. The larger the LLM, the more compute capacity, the better the pattern recognition. Pattern recognition leads to the generation of similar results from incomplete prompts / questions / data sets; that is Gen AI. Where does latency come in? Part of the compute needed to generate a response to a question is inference. While the data set resides in a large data center in a centralized cloud, inference sits closer to the user, at the edge, where it parses the request and feeds the trained model with unlabeled input to receive a predicted answer. The faster the inference, the more responses the model can provide, which means that low latency is a competitive advantage for a Gen AI service.

As we have seen, there is a relatively small number of options to reduce latency and they all involve large investments. The question then becomes: what is the value of a millisecond? Is 100 or 10 sufficient? When it comes to high-frequency trading, 1ms is extremely valuable (billions of dollars). When it comes to online gaming, low latency is not as valuable as controlled and uniform latency across the players. When it comes to video streaming, latency is generally not an issue, but when it comes to machine vision for sorting fruit on a mechanical conveyor belt running at 10km/h, it is very important.
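
The conveyor example can be made tangible with simple arithmetic: at 10 km/h the belt moves about 2.8 metres per second, so every millisecond of loop latency is roughly 3 millimetres of travel between the camera seeing the fruit and the actuator reacting.

```python
# How far the belt travels during the control loop's latency, at 10 km/h.
belt_speed_m_s = 10 * 1000 / 3600            # 10 km/h ≈ 2.78 m/s
for latency_ms in (1, 10, 100):
    drift_cm = belt_speed_m_s * latency_ms / 1000 * 100
    print(f"{latency_ms:>4} ms of latency -> ~{drift_cm:.1f} cm of belt travel")
# roughly 0.3 cm at 1 ms, 2.8 cm at 10 ms, 28 cm at 100 ms
```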

I have researched and deployed many edge computing use cases and derived a fairly comprehensive workshop on the technological, commercial and strategic aspects of Edge computing and Data Center investment strategies.

If you would like to know more, please get in touch.

Tuesday, January 9, 2024

HPE acquires Juniper Networks


On January 8, the first rumors started to emerge that HPE was entering final discussions to acquire Juniper Networks for $13B. By January 9, HPE announced that it had entered into a definitive agreement for the acquisition.

Juniper Networks, known for its high-performance networking equipment, has been a significant player in the networking and telecommunications sector. It specializes in routers, switches, network management software, network security products, and software-defined networking technology. HPE, on the other hand, has a broad portfolio that includes servers, storage, networking, consulting, and support services.

 The acquisition of Juniper Networks by HPE could be a strategic move to strengthen HPE’s position in the networking and telecommunications sector, diversify its product offerings, and enhance its ability to compete with other major players in the market such as Cisco.

Most analyses I have read so far have pointed to AIOps and Mist AI as the core acquisition thesis, enabling HPE to bridge the gap between equipment vendor and solution vendor, particularly in the telco space.

While this is certainly an aspect of the value that Juniper Networks would provide to HPE, I believe that the latest progress from Juniper Networks in Telco beyond transport, particularly as an early leader in the emerging field of RAN Intelligence and SMO (Service Management and Orchestration) was a key catalyst in HPE's interest.

After all, Juniper Networks has been a networking specialist and leader for a long time, from SDN, SD-WAN and optical to data center, wired and wireless networks. While the company has been making great progress there, gradually virtualizing and cloudifying its routers, firewalls and gateway functions, no revolutionary technology had emerged until the application of machine learning and predictive algorithms to the planning, configuration, deployment and management of transport networks.

What is new as well is Juniper Networks' effort to penetrate telco functions beyond transport. The key area ripe for disruption has been the Radio Access Network (RAN), specifically with Open RAN becoming an increasingly relevant industry trend pervading network architectures, culminating in AT&T's selection of Ericsson last month to deploy Open RAN in a $14B deal.

Open RAN offers disaggregation of the RAN, with potential multi-vendor implementations benefitting from open, standard interfaces. Juniper Networks, not a traditional RAN vendor, has been quick to capitalize on its AIOps expertise by jumping into the RAN intelligence market space, creating one of the most advanced RAN Intelligent Controllers (RICs) in the market and aggressively integrating with as many reputable RAN vendors as possible. This endeavor, opening up the multi-billion-dollar RAN and SMO markets, is pure software and heavily biased towards AI/ML for automation and prediction.

HPE has been heavily investing in the telco space of late, becoming a preferred supplier of telco CaaS and Cloud Native Function (CNF) physical infrastructures. What HPE has not been able to do is create software or become a credible solutions provider / integrator. The acquisition of Juniper Networks could help solve this. Just like Broadcom's acquisition of VMware (another early RAN intelligence leader) or Cisco's acquisition of Accedian, hardware vendors yearn to move up the value chain by acquiring software and automation vendors, giving them the capacity to provide integrated end-to-end solutions and to achieve synergies and economies of scale through vertical integration.

The playbook is not new, but this potential acquisition could signal a consolidation trend in the telecommunications and networking industry, suggesting a more competitive landscape with fewer but larger players. This could have far-reaching implications for customers, suppliers, and competitors alike.