Showing posts with label Cloud RAN. Show all posts
Showing posts with label Cloud RAN. Show all posts

Thursday, July 31, 2025

The Orchestrator Conundrum strikes again: Open RAN vs AI-RAN

10 years ago (?!) I wrote about the overlaps and potential conflicts of the different orchestration efforts between SDN and NFV. Essentially, observing that, ideally, it is desirable to orchestrate network resources with awareness of services and that service and resource orchestration should have hierarchical and prioritized interactions, so that a service deployment and lifecycle is managed within resource capacity and when that capacity fluctuates, priorities can be enforced.

Service orchestrators have not really been able to be successfully deployed at scale for a variety a reasons, but primarily due to the fact that this control point was identified early on as a strategic effort for network operators and traditional network vendors. A few network operators attempted to create an open source orchestration model (Open Source MANO), while traditional telco equipment vendors developed their own versions and refused to integrate their network functions with the competition. In the end, most of the actual implementation focused on Virtual Infrastructure Management (VIM) and vertical VNF management, while orchestration remained fairly proprietary per vendor. Ultimately, Cloud Native Network Functions appeared and were deployed in Kubernetes inheriting its native resource management and orchestration capabilities.

In the last couple of years, Open RAN has attempted to collapse RAN Element Management Systems (EMS), Self Organizing Networks (SON) and Operation Support Systems (OSS) with the concept of Service Management and Orchestration (SMO). Its aim is to ostensibly provide a control platform for RAN infrastructure and services in a multivendor environment. The non real time RAN Intelligent Controller (RIC) is one of its main artefacts, allowing the deployment of rApps designed to visualize, troubleshoot, provision, manage, optimize and predict RAN resources, capacity and capabilities.

This time around, the concept of SMO has gained substantial ground, mainly due to the fact that the leading traditional telco equipment manufacturers were not OSS / SON leaders and that Orchestration was an easy target for non RAN vendors wanting to find a greenfield opportunity. 

As we have seen, whether for MANO or SMO, the barriers to adoption weren't really technical but rather economic-commercial as leading vendors were trying to protect their business while growing into adjacent areas.

Recently, AI-RAN as emerged as an interesting initiative, positing that RAN compute would evolve from specialized, proprietary and closed to generic, open and disaggregated. Specifically, RAN compute could see an evolution, from specialized silicon to GPU. GPUs are able to handle the complex calculations necessary to manage a RAN workload, with spare capacity. Their cost, however, greatly outweighs their utility if used exclusively for RAN. Since GPUs are used in all sorts of high compute environments to facilitate Machine Learning, Artificial Intelligence, Large and Small Language Models, Models Training and inference, the idea emerged that if RAN deploys open generic compute, it could be used both for RAN workloads (AI for RAN), as well as workloads to optimize the RAN (AI on RAN and ultimately AI/ML workloads completely unrelated to RAN (AI and RAN).

While this could theoretically solve the business case of deploying costly GPUs in hundreds of thousands of cell site, provided that the compute idle capacity could be resold as GPUaaS or AIaaS, this poses new challenges from a service / infrastructure orchestration standpoint. AI RAN alliance is faced with understanding orchestration challenges between resources and AI workloads

In an open RAN environment. Near real time and non real time RICs deploy x and r Apps. The orchestration of the apps, services and resources is managed by the SMO. While not all App could be categorized as "AI", it is likely that SMO will take responsibility for AI for and on RAN orchestration. If AI and RAN requires its own orchestration beyond K8, it is unlikely that it will be in isolation from the SMO.

From my perspective, I believe that the multiple orchestration, policy management and enforcement points will not allow a multi vendor environment for the control plane. Architecture and interfaces are still in flux, specialty vendors will have trouble imposing their perspective without control of the end to end architecture. As a result, it is likely that the same vendor will provide SMO, non real time RIC and AI RAN orchestration functions (you know my feelings about near real time RIC)

If you make the Venn diagram of vendors providing / investing in all three, you will have a good idea of the direction the implementation will take.

Wednesday, April 16, 2025

Is AI-RAN the future of telco?

 AI-RAN has emerged recently as an interesting evolution of telecoms networks. The Radio Access Network (RAN) has been undergoing a transformation over the last 10 years, from a vertical, proprietary highly concentrated market segment to a disaggregated, virtualized, cloud native ecosystem.

Product of the maturation of a number of technologies, including telco cloudification, RAN virtualization and open RAN and lately AI/ML, AI-RAN has been positioned as a means to disaggregate and open up further the RAN infrastructure.

This latest development has to be examined from an economic standpoint. RAN accounts roughly for 80% of a telco deployment (excluding licenses, real estate...) costs. 80% of these costs are roughly attributable to the radios themselves and their electronics. The market is dominated by few vendors and telecom operators are exposed to substantial supply chain risks and reduced purchasing power.

The AI RAN alliance was created in 2024 to accelerate its adoption. It is led by network operators (T-Mobile, Softbank, Boost Mobile, KT, LG Uplus, SK Telecom...) telecom and IT vendors (Nvidia, arm, Nokia, Ericsson Samsung, Microsoft, Amdocs, Mavenir, Pure Storage, Fujitsu, Dell, HPE, Kyocera, NEC, Qualcomm, Red Hat, Supermicro, Toyota...).

If you are familiar with this blog, you already know of the evolution from RAN to cloud RAN and Open RAN, and more recently the forays into RAN intelligence with the early implementations of near and non real time RAN Intelligence Controller (RIC)

AI-RAN goes one step further in proposing that the specialized electronics and software traditionally embedded in RAN radios be deployed on high compute, GPU based commercial off the shelf servers and that these GPUs manage the complex RAN computation (beamforming management, spectrum and power optimization, waveform management...) and double as a general high compute environment for AI/ML applications that would benefit from deployment in the RAN (video surveillance, scene, object, biometrics recognition, augmented / virtual reality, real time digital twins...). It is very similar to the edge computing early market space.

The potential success of AI-RAN relies on a number of techno / economic assumptions:

For Operators:

  • It is desirable to be able to deploy RAN management, analytics, optimization, prediction, automation algorithms in a multivendor environment that will provide deterministic, programmable results.
  • Network operators will be able and willing to actively configure, manage and tune RAN parameters.
  • Deployment of AI-RAN infrastructure will be profitable (combination of compute costs being offloaded by cost reduction by optimization and new services opportunities).
  • AI-RAN power consumption, density, capacity, performance will exceed traditional architectures in time.
  • Network Operator will be able to accurately predict demand and deploy infrastructure in time and in the right locations to capture it.
  • Network Operators will be able to budget the CAPEX / OPEX associated with this investment before revenue materialization.
  • An ecosystem of vendors will develop that will reduce supply chain risks

For vendors:

  • RAN vendors will open their infrastructure and permit third parties to deploy AI applications.
  • RAN vendors will let operators and third parties program the RAN infrastructure.
  • There is sufficient market traction to productize AI-RAN.
  • The rate of development of AI and GPU technologies will outpace traditional architecture.
  • The cost of roadmap disruption and increased competition will be outweighed by the new revenues or is the cost to survive.
  • AI-RAN represents an opportunity for new vendors to emerge and focus on very specific aspects of the market demand without having to develop full stack solutions.

For customers:

  • There will be a market and demand for AI as a Service whereas enterprises and verticals will want to use a telco infrastructure that will provide unique computing and connectivity benefits over on-premise or public cloud solutions.
  • There are AI/ML services that (will) necessitate high performance computing environments, with guaranteed, programmable connectivity with a cost profile that is better mutualized through a multi tenant environment
  • Telcom operators are the best positioned to understand and satisfy the needs of this market
  • Security, privacy, residency, performance, reliability will be at least equivalent to on premise or cloud with a cost / performance benefit. 
As the market develops, new assumptions are added every day. The AI-RAN alliance has defined three general groups to create the framework to validate them: 
  1. AI for RAN: AI to improve RAN performance. This group focuses on how to program and optimize the RAN with AI. The expectations is that this work will drastically reduce the cost of RAN, while allowing sophisticated spectrum, radio waves and traffic manipulations for specific use cases.
  2. AI and RAN: Architecture to run AI and RAN on the same infrastructure. This group must find the multitenant architecture allowing the system to develop into a platform able to host a variety of AI workloads concurrently with the RAN. 
  3. AI on RAN: AI applications to run on RAN infrastructure. This is the most ambitious and speculative group, defining the requirements on the RAN to support the AI workloads that will be defined
As for Telco Edge Computing, and RAN intelligence, while the technological challenges appear formidable, the commercial and strategic implications are likely to dictate whether AI RAN will succeed. Telecom operators are pushing for its implementation, to increase control over spending, and user experience of the RAN, while possibly developing new revenue with the diffusion of AIaaS. Traditional RAN vendors see the nascent technology as further threat to their capacity to sell programmable networks as black boxes, configured, sold and operated by them. New vendors see the opportunity to step into the RAN market and carve out market share at the expense of legacy vendors.

Monday, March 10, 2025

MWC 25 thoughts

 Back from Mobile World Congress 2025!

I am so thankful I get to meet my friends, clients, ex colleagues year after year and to witness how our industry is moving first hand.

2025 was probably my 23rd congress or so and I always find it invaluable for many reasons. 



Innovation from the East

What stood up for me this year was how much innovation is coming from Asian companies, while most Western companies seem to be focusing on cost control. 

The feeling was pervasive throughout the show and the GLOMO awards winners showed Huawei, ZTE, China Mobile, SK, Singtel… investing in discovering and solving problems that many in Western markets dismiss as futuristic or outside their comfort zone. In mature markets, where price attrition is the rule, differentiation is key.

On a related topic, being Canadian, I can’t help thinking that many companies and regulators who looked at the banning of some Chinese vendors from their markets due to security preoccupations are now finding themselves in the situation to evaluate whether American suppliers do not also represent a risk in the future. 

Without delving into politics, I saw and heard many initiatives to enhance security, privacy, sovereignty, either in the cloud or the supply chain categories. 

Open telco APIs

Open APIs and the progress of telco networks APIs is encouraging, but while it is a good idea, it feels late and lacking in comparison with webscalers tooling and offering to discover, consume, and manage network functions on demand. Much work remains to be done in my opinion to enhance the aaS portion of the offering, particularly if slicing APIs are to be offered. 

Open RAN & RIC

Open RAN threat has successfully accelerated cloud and virtualized RAN adoption. Samsung started the trend and Ericsson’s deployment at AT&T has crystalized the mMIMo +CU+DU+non RT RIC from a main vendor and small cells + rApps from others as a viable option. Vodafone’s RAN refresh should see maybe more players into the mix as Mavenir and Nokia are struggling to gain meaningful market share. 

The Juniper / HPE acquisition drama, together with the Broadcom / VMware commercial strategy seem to have killed the idea of an independent Non RT RIC vendor. Near RT RIC, remains in my mind a flawed proposition as host of 3rd party xApps, and as an expensive gadget for anything else than narrow use cases. 

AI

AI of course, was the belle of the ball at MWC. Everyone had a twist, a demo, a model, an agent but few were able to demonstrate utility beyond automated time series regression as predictions or LLM based natural language processing as nauseam…

Some were convincingly starting to show Small Models that were tailored to their technology, topology and network with promising results. It is still early but it feels that this is where the opportunity lies. The creation and curation of a dataset that can be used to plan, manage, maintain, predict the state of one’s network, with bespoke algorithms seems more desirable than the wholesale vague large and poorly trained models. 

Telco Cloud and Edge computing is having a bit of a moment with AI and GPU aaS strategies being enacted.

All in all, many are trying to develop an AI strategy, and while we are still far from the AI-Native Telco Network, there is some progress and some interesting ventures amidst the noise.

Thursday, February 6, 2025

The AI-Native Telco Network VI: Storage


The AI-Native Telco Network I

The AI-Native Telco Network II

The AI-Native Telco Network III

The AI-Native Telco Network IV: Compute

The AI-Native Telco Network V: Network

As it turns out, a network that needs to run AI, either to self optimize or to offer wholesale AI related services needs some adjustments from a conventional telecom network. After looking at the compute and network functions, this post is looking at storage.

Storage has, for the longest time, been an afterthought in telecoms networks. Beyond the IT workloads and the management of data centers, storage needs were usually addressed embedded with the compute functions, sold by server vendors, or when necessary as direct attached storage appliances, usually OEMd or resold by the same vendors.

Today's networks see each network function, whether physical, virtualized or containerized coming with its own dedicated storage. The data generated by each function, whether telemetry, alarm, user, or control plane, logs or event is stored first locally, then a portion is exported to a data lake for cleaning and processing, then eventually a data warehouse, whether on a private or public cloud so that OSS, BSS and analytics functions can provide dashboards on the health, load, usage of the network and recommendations on optimizations.

The extraction, cleaning, and processing of these disparate datasets takes time, anywhere between 30 minutes to hours to accurately represent the network state.

One of the applications of AI/ML in telecoms networks is to optimize the networks reactively when there is an event or proactively when we can plan for a given change. This supposes that a feedback loop is built between the analytics layer and the operational layer, whereas a recommendation to change network parameters can be executed programmatically and automatically.

Speed becomes necessary, particularly to react to unpredicted events. Reducing reaction time if there is an element outage is crucial. This supposes that the state of the network must be observable in near real time, so that the AI/ML engines can detect patterns, anomalies and provide root cause analysis and remediation as fast as possible. The compute applied to these calculations, together with the speed of transmission have a direct effect on the speed, but not only.

Storage, as it turns out is also a crucial element of creating an AI-Native network. The large majority of AI/ML relies on storing data as object, whereas each data element is stored independently, in an unstructured manner, irrespective of size, but with an associated metadata file that describes the data element in details, allowing easy association and manipulation for AI/ML.

Why are traditional storage architectures not suitable for AI-Native Networks?

To facilitate the AI Native network, data element must be extracted from their network functions fast and transferred in a data repository that allows their manipulation at scale. It is easier said than done. Legacy systems have been built originally for block storage (databases and virtual machines, great for low latency, bad for high throughput). Objects are usually not natively supported and are in separate storage. Each vendor supports different protocols and interface, and each store is single tenant to its application.

Data needs to be shared and read by many network functions simultaneously, while they are being processed. Traditional architectures see data stored individually by network functions, then exported to larger databases, then amalgamated in data lakes for processing. The process is lengthy, error-prone and negates the capacity to act/react in real time.

The data sets are increasingly varied, between large and small objects, data streams and files, random and sequential read and write requirements. Legacy storage solutions require different systems for different use cases and data sets. This lengthens further the data amalgamation necessary for automation at scale.

Data needs to be properly labeled, without limitation of metadata, annotation and tags equally for billions of small objects (event records) or very large ones (video files). Traditional storage solutions are designed either for small or large objects and struggle to accommodate both in the same architecture. They also have limitations in the amount of metadata per object. This increases cost and time to insight while reducing their capacity to evolve.

Datasets are live structures. They often exist in different formats and versions for different users. Traditional architectures are not able to handle multiple formats simultaneously, and versions of the same datasets require separate storage elements. This leads to data inconsistencies, corruption and divergence of insight.

Performance is key in AI systems, and it is multidimensional. Storage solutions need to be able to accommodate simultaneously high throughput, scale out capacity and low latency. Traditional storage systems are built for capacity but not designed for high throughput and low latency, which reduces dramatically the performance of data pipelines.

Hybrid and multi cloud become a key requirement for AI, as data needs to be exposed to access, transport, core, OSS/ BSS domains in the edge, the private cloud and the public cloud simultaneously. Traditional storage solutions necessitate adaptation, translation, duplication, and migration to be able to function across cloud boundaries, which significantly increase their cost, while reducing their performance and capabilities.

As we have seen, the data storage architecture for a telecom network becomes a strategic infrastructure decision and the traditional storage solutions cannot accommodate AI and network automation at scale.

Storage Requirements for AI-Native Networks

Perhaps the most important attribute for AI project storage is agility—the ability to grow from a few hundred gigabytes to petabytes, to perform well with rapidly changing mixed workloads, to serve data to training and production clients simultaneously throughout a project’s life, and to support the data models used by project tools.

The attributes of an ideal AI storage solution are: 

Performance Agility

          I/O performance that scales with capacity.

          Rapid manipulation of billions of items, e.g., for randomization during training.

Capacity Flexibility

          Wide range (100s of gigabytes to petabytes) .

          High performance with billions of data items.

          Range of cost points optimized for both active and seldom accessed data.

Availability & Data Durability

          Continuous operation over decade-long project lifetimes.

          Protection of data against loss due to hardware, software, and operational faults.

          Non-disruptive hardware and software upgrade and replacement.

          Seamless data sharing by development, training, and production.

Space and Power Efficiency

          Low space and power requirements that free data center resources for power-hungry computation.

Security

          Strong administrative authentication.

          “Data at rest” encryption.

          Protection against malware (especially ransomware) attacks.

Operational Simplicity

          Non-disruptive modernization for continuous long-term productivity.

          Support for AI projects’ most-used interconnects and protocols.

          Autonomous configuration (e.g. device groups, data placement, protection, etc.).

          Self-tuning to adjust to rapidly changing mixed random/ sequential I/O loads.

Hybrid and Multi Cloud Natively

          Data agility to cross cloud boundaries

          Centralized data lifecycle management

          Decide which data set is stored and processed where

          From edge for inference to private cloud for optimization and automation to public cloud for model training and replication.

Traditional "spinning disk" based storage have not been designed for AI/ML workloads. They lack the performance, agility, cost effectiveness, latency, power consumptions attributes necessary to enable AI networks at scale. Modern storage infrastructure, designed for high performance computing rely on Flash storage, an efficient, cost effective, low power, high performance technology that enables compute and network elements to perform at line rate for AI workloads.

Tuesday, January 28, 2025

The AI-Native Telco Network V: Network


The AI-Native Telco Network I

The AI-Native Telco Network II

The AI-Native Telco Network III

The AI-Native Telco Network IV: Compute

As we have seen in previous posts, AI and the journey to autonomous networks forces telco operators to look at their network architecture and reevaluate whether their infrastructure is fit for this purpose. In many cases, the first reflex for them is to deploy new servers and GPUs in AI dedicated pods and to find out that processing power itself is not enough for a high performance AI system. The network connectivity needs to be accelerated as well.

SmartNICs

While dedicated routing and packet processing are necessary, one way to increase performance of an AI pod is to deploy accelerators in the shape of Smart Network Interface Cards (SmartNICs).

SmartNICs are specialized network cards designed to offload certain networking tasks from the CPU and provide additional processing power at the network edge. Unlike traditional NICs, which merely serve as communication devices, SmartNICs come equipped with onboard processing capabilities such as CPUs, ASICs, FPGAs or programmable processors. These capabilities allow SmartNICs to handle packet processing, traffic management, and other networking tasks, without burdening the CPU.

While they are certainly hybrid compute / network dedicated silicon, they accelerate overall performance by offloading packet processing, user plane functions, load balancing, etc. from the CPUs and GPUs that can be freed up for pure AI workload processing.

For telecom providers, SmartNICs offer a way to improve network efficiency while simultaneously boosting the ability to handle AI workloads in real-time.

High-Speed Ethernet

One of the most straightforward ways to increase network speed is by adopting higher bandwidth Ethernet standards. Traditional networks may rely on 10GbE or 25GbE, but AI workloads benefit from faster connections, such as 100GbE or even 400GbE, which provide higher throughput and lower latency.

AI models, especially large deep learning models, require massive data transfer between nodes. Upgrading to 100GbE or 400GbE can drastically improve the speed at which data is exchanged between GPUs, CPUs, and storage systems in an AI pod, reducing the time required to train models and increasing throughput.

AI models often need to pull vast amounts of training data from storage. Higher-speed Ethernet allows AI pods to access data more quickly, decreasing bottlenecks in I/O.

Use Low-Latency Networking Protocols

Adopting advanced networking protocols such as InfiniBand or RoCE (RDMA over Converged Ethernet) is essential to reduce latency in AI pods. These protocols are designed to enable faster communication between nodes by bypassing traditional network stacks and reducing the overhead that can slow down AI workloads.

InfiniBand and RoCE provide extremely low-latency communication between AI pods, which is crucial for high-performance AI training and inference.
These protocols support higher bandwidths (up to 200Gbps or more) and provide more efficient communication channels, ideal for high-throughput AI workloads like distributed deep learning.

To increase AI performance, telecom operators need to focus on upgrading their network infrastructure to support the growing demands of AI workloads. By implementing strategies such as high-speed Ethernet, SmartNICs, and specialized AI interconnects, operators can enhance the speed, scalability, and efficiency of their AI pods. This enables faster processing of large datasets, reduced latency, and improved overall performance for AI training and inference, allowing telecom operators to stay ahead in the competitive AI-driven landscape.
Storage, we will see in the next post, plays also an integral part in AI performance on a telecom network.

Thursday, January 23, 2025

The AI-Native Telco Network IV: Compute

The AI-Native Telco Network I

 The AI-Native Telco Network II

 The AI-Native Telco Network III

As we have seen in previous posts, to accommodate and make use of AI at scale, a network must be tuned and architected for this purpose. While any telco network can deploy AI in discrete environments or throughout its fabric, the difference between a Data strategy and an AI strategy is speed + feedback loop.

Most Data collected in a telco network has been used for very limited purpose. Mainly archiving for forensics to determine the root cause of an anomaly or outage, charging and customer management functions or for legal interception or regulatory requirements. For these use cases, Data needs to be properly formatted and laid to rest until analytics engines can provide a representation of the state of the network or an account. Speed is not an issue here, the system can suffer minutes or hour delays before a coherent picture is formed and represented.

AI altogether can provide better insight through larger datasets than classical analytics. It provides better capacity to correlate events and to predict the evolution of the network state. It can also propose optimization, enhancements, mitigation recommendations, but to be truly effective, it needs to be able to have feedback loop to the network functions, so that these recommendations can be turned into actions and automated.


Herein lies the trick. If you want to run AI in your network, so that you can automate it, allowing it to reactively or proactively auto scale, heal, optimize its performance, power consumption, cost, etc... at scale, it cannot be done manually. Automation is necessary throughout. Speed from event, anomaly, pattern, insight detection to action becomes key.

As we have seen, speed is the product of high performance, low latency in the production, extraction, storage, and processing of data to create actionable insights that can be automated. At the fabric layer, compute, connectivity and storage are the elements that need to be properly designed to enable the speed to run AI.

In this post, we will look at the compute function. Processing, analyzing, manipulating Data requires computing capabilities. There are different architectures of computing units for different purposes.

  • The CPU (Central Processing Units) are general purpose computing, suitable for serial tasks. Multiple CPU Cores can work in parallel to enhance performance. Suitable for most telecoms functions, except real time processing. Generic CPUs are used in most telco data centers and clouds for most telco functions, from OSS, BSS to Core and transport. At the edge and the RAN, CPUs are used for Centralized Unit functions.
  • ASICs (Application Specific Integrated Circuits) are CPUs that have been designed for specific tasks or applications. They are not as versatile as other processing units but deliver the absolute highest performance in smallest footprint for specific applications. They can be found in first generation Open RAN servers to run Distributed Unit functions, as well as in specialized packet routing and packet switching (more on that in the connectivity post).
  • FPGA (Field Programmable Gate Arrays) are CPUs that can be programmed to adapt to specific workloads without necessitating complete redesign. They provide a good balance between adaptability and performance and are suitable for cryptographic and rapid data processing. They are used in telco networks in security gateways, as well as advanced routing and packet processing functions.
  • GPUs (Graphics Processing Units) feature large numbers of smaller cores, coupled with high memory bandwidth making them suitable for graphics processing and large number of parallel matrix calculations. In telco network, GPUs are starting to be introduced for AI / ML workloads in data centers and clouds (neural networks and model training), as well as in the RAN for the Distributed Unit and RAN Intelligent Controller.
  • TPUs (Tensor Processing Units) are Google's specialized processing units optimized for Tensor processing of ML and deep learning model training and inference. They are not yet used in Telco environments but can be used on Google Cloud in a hybrid scenario.
  • NPUs (Neural Processing Units) are designed for Neural Networks for deep learning processing. They are very suitable for inference tasks as their power consumption and footprint are very small. They start to appear in telco networks at the edge, and in devices.

Artificial Intelligence, Machine Learning can run on any of the above computing platform. The difference is the performance, footprint, cost and power consumption profile. We have seen lately the emergence of GPUs as the new processing unit poised to replace CPUs, ASICs and FPGAs in specialized traffic functions, using the RAN and AI as its beachhead. GPUs are key in running AI workloads at scale , delivering the performance in terms of low latency and high throughput necessary for rapid time to insight.

Their cost and power consumption forces network operators to find the right balance between the number of GPUs and their placement throughout the network, to enable both high processing power necessary for model training, in the private cloud, together with low latency for rapid inferencing and automation at the edge. While this architecture might provide the best basis for an automated or autonomous network, its cost and the rapid rate of change in GPU generations might give most a pause.

The main challenge becomes the selection of compute architecture that can provide the most capacity, speed, while remaining cost effective to procure and run. For this reason, many telco operators have decided to centralize in a first step their GPU farms, to fine tune their use cases, with limited decentralized deployments. Another avenue for exploration is the wholesaling of the compute capacity to reduce internal costs. We have seen a few GPUaaS and AIaaS initiatives recently announced.

In any cases, most operators who have deployed high capacity AI pods with GPUs, find that the performance of the overall system requires further refinement and look at connectivity as the next step in their AI-Native network journey. That will be the theme of our next post.

Wednesday, July 3, 2024

June 2024 Open RAN requirements from Vodafone, Telefonica, Deutsche Telekom, Tim and Orange


 As is now customary, the "big 5" European operators behind open RAN release their updated requirements to the market, indicating to vendors where they should direct their roadmaps to have the most chances to be selected in these networks.

As per previous iterations, I find it useful to compare and contrast the unanimous and highest priority requirements as indications of market maturity and directions. Here is my read on this year's release:

Scenarios:

As per last year, the big 5 unanimously require support for O-RU and vDU/CU with open front haul interface on site for macro deployments. This indicates that although the desire is to move to a disaggregated implementation, with vDU / CU potentially moving to the edge or the cloud, all the operators are not fully ready for these scenario and prioritize first a deployment like for like of a traditional gnodeB with a disaggregated virtualized version, but all at the cell site. 

Moving to the high priority scenarios requested by a majority of operators, vDU/vCU in a remote site with O-RU on site makes its appearance, together for RAN sharing. Both MORAN and MOCN scenarios are desirable, the former with shared O-RU and dedicated vDU/vCU and the latter with shared O-RU, vDU and optionally vCU. In all cases, RAN sharing management interface is to be implemented to allow host and guest operators to manage their RAN resource independently.

Additional high priority requirements are the support for indoor and outdoor small cells. Indoor sharing O-RU and vDU/vCU in multi operator environments, outdoors with single operator with O-RU and vDU either co-located on site or fully integrated with Higher Layer Split. The last high priority requirement is for 2G /3G support, without indication of architecture.

Security:

The security requirements are sensibly the same as last year, freely adopting 3GPP requirements for Open RAN. The polemic around Open RAN's level of security compared to other cloud virtualized applications or traditional RAN architecture has been put to bed. Most realize that open interfaces inherently open more attack surfaces, but this is not specific to Open RAN, every cloud based architecture has the same drawback. Security by design goes a long way towards alleviating these concerns and proper no trust architecture can in many cases provide a higher security posture than legacy implementations. In this case, extensive use of IPSec, TLS 1.3, certificates at the interfaces and port levels for open front haul and management plane provide the necessary level of security, together with the mTLS interface between the RICs. The O-Cloud layer must support Linux security features, secure storage, encrypted secrets with external storage and management system.

CaaS:

As per last year, the cloud native infrastructure requirements have been refined, including Hardware Accelerator (GPU, eASIC) K8 support, Block and Object Storage for dedicated and hyper converged deployments, etc... Kubernetes infrastructure discovery, deployment, lifecycle management and cluster configuration has been further detailed. Power saving specific requirements have been added, at the Fan, CPU level with SMO driven policy and configuration and idle mode power down capabilities.

CU / DU:

CU DU interface requirements remain the same, basically the support for all open RAN interfaces (F1, HLS, X2, Xn, E1, E2, O1...). The support for both look aside and in-line accelerator architecture is also the highest priority, indicating that operators havent really reached a conclusion for a preferable architecture and are mandating both for flexibility's sake (In other words, inline acceleration hasn't convinced them that it can efficiently (cost and power) replace look aside). Fronthaul ports must support up to 200Gb by 12 x 10/25Gb combinations and mid haul up to 2 x 100Gb. Energy efficiency and consumption is to be reported for all hardware (servers, CPUs, fans, NIC cards...). Power consumption targets for D-RAN of 400Watts at 100% load for 4T4R and 500 watts for 64T64R are indicated. These targets seem optimistic and poorly indicative of current vendors capabilities in that space.

O-RU:

The radio situation is still messy and my statements from last year still mostly stand: "While all operators claim highest urgent priority for a variety of Radio Units with different form factors (2T2R, 2T4R, 4T4R, 8T8R, 32T32R, 64T64R) in a variety of bands (B1, B3, B7, B8, B20, B28B, B32B/B75B, B40, B78...) and with multi band requirements (B28B+B20+B8, B3+B1, B3+B1+B7), there is no unanimity on ANY of these. This leads vendors trying to find which configurations could satisfy enough volume to make the investments profitable in a quandary. There are hidden dependencies that are not spelled out in the requirements and this is where we see the limits of the TIP exercise. Operators cannot really at this stage select 2 or 3 new RU vendors for an open RAN deployment, which means that, in principle, they need vendors to support most, if not all of the bands and configurations they need to deploy in their respective network. Since each network is different, it is extremely difficult for a vendor to define the minimum product line up that is necessary to satisfy most of the demand. As a result, the projections for volume are low, which makes the vendors only focus on the most popular configurations. While everyone needs 4T4R or 32T32R in n78 band, having 5 vendor providing options for these configurations, with none delivering B40 or B32/B75 makes it impossible for operators to select a single vendor and for vendors to aggregate sufficient volume to create a profitable business case for open RAN." This year, there is one configuration of high priority that has unanimous support: 4T4R B3+B1. The other highest priority configurations requested by a majority of operators are 2T4R B28B+B20+B8, 4T4R B7, B3+B1, B32B+B75B, and 32T32R B78 with various power targets from 200 to 240W.

Open Front Haul:

The Front Haul interface requirements only acknowledge the introduction of Up Link enhancements for massive MIMO scenarios as they will be introduced to the 7.2.x specification, with a lower priority. This indicates that while Ericsson's proposed interface and architecture impact is being vetted, it is likely to become an optional implementation, left to the vendor's s choice until / unless credible cost / performance gains can be demonstrated.

Transport:

Optical budgets and scenarios are now introduced.

RAN features:

Final MoU positions are now proposed. Unanimous items introduced in this version revolve mostly around power consumption and efficiency counters, KPIs and mechanisms. other new requirements introduced follow 3GPP rel 16 and 17 on carrier aggregation, slicing and MIMO enhancements.

Hardware acceleration:

a new section introduced to clarify the requirements associated with L1 and L2 use of look aside and inline. The most salient requirement is for multi RAT 4G/5G simultaneous support.

Near RT RIC:

The Near Real Time RIC requirements continue to evolve and be refined. My perspective hasn't changed on the topic. and a detailed analysis can be found here. In short letting third party prescribe policies that will manipulate the DU's scheduler is anathema for most vendors in the space and, beyond the technical difficulties would go against their commercial interests. operators will have to push very hard with much commercial incentive to see xapps from 3rd party vendors being commercially deployed.

E2E use cases:

End-to-end use cases are being introduced to clarify the operators' priorities for deployments. There are many  but offer a good understanding of their priorities. Traffic steering for dynamic traffic load balancing, QoE and QoS based optimization, to optimize resource allocation based on a desired quality outcome... RAN sharing, Slice assurance, V2x, UAV, energy efficiency... this section is a laundry list of desiderata , all mostly high priority, showing here maybe that operators are getting a little unfocused on what real use cases they should focus on as an industry. As a result, it is likely that too many priorities result in no priority at all.

SMO

With over  260 requirements, SMO and non RT RIC is probably a section that is the most mature and shows a true commercial priority for the big 5 operators.

All in all, the document provides a good idea of the level of maturity of Open RAN for the the operators that have been supporting it the longest. The type of requirements, their prioritization provides a useful framework for vendors who know how to read them.

More in depth analysis of Open RAN and the main vendors in this space is available here.


Thursday, May 2, 2024

How to manage mobile video with Open RAN

Ever since the launch of 4G, video has been a thorny issue to manage for network operators. Most of them had rolled out unlimited or generous data plans, without understanding how video would affect their networks and economics. Most videos streamed to your phones use a technology called Adaptive Bit Rate (ABR), which is supposed to adapt the video’s definition (think SD, HD, 4K…) to the network conditions and your phone’s capabilities. While this implementation was supposed to provide more control in the way videos were streamed on the networks, in many cases it had a reverse effect.

 

The multiplication of streaming video services has led to ferocious competition on the commercial and technological front. While streaming services visibly compete on their pricing and content attractiveness, a more insidious technological battle has also taken place. The best way to describe it is to compare video to a gas. Video will take up as much capacity in the network as is available.

When you start a streaming app on your phone, it will assess the available bandwidth and try to deliver the highest definition video available. Smartphone vendors and streaming providers try to provide the best experience to their users, which in most cases means getting the highest bitrate available. When several users in the same cell try to stream video, they are all competing for the available bandwidth, which leads in many cases to a suboptimal experience, as some users monopolize most of the capacity while others are left with crumbs.

 

In recent years, technologies have emerged to mitigate this issue. Network slicing, for instance, when fully implemented could see dedicated slices for video streaming, which would theoretically guarantee that video streaming does not adversely impact other traffic (video conferencing, web browsing, etc…). However, it will not resolve the competition between streaming services in the same cell.

 

Open RAN offers another tool for efficiently resolving these issues. The RIC (RAN Intelligent Controller) provides for the first time the capability to visualize in near real time a cell’s congestion and to apply optimization techniques with a great level of granularity. Until Open RAN, the means of visualizing network congestion were limited in a multi-vendor environment and the means to alleviate them were broad and coarse. The RIC allows to create policies at the cell level, on a per connection basis. Algorithms allow traffic type inference and policies can be enacted to adapt the allocated bandwidth based on a variety of parameters such as signal strength, traffic type, congestion level, power consumption targets…

 

For instance, an operator or a private network for stadiums or entertainment venues could easily program their network to not allow upstream videos during a show, to protect broadcasting or intellectual property rights. This can be easily achieved by limiting the video uplink traffic while preserving voice, security and emergency traffic.

 

Another example would see a network actively dedicating deterministic capacity per connection during rush hour or based on threshold in a downtown core to guarantee that all users have access to video services with equally shared bandwidth and quality.

 

A last example could see first responder and emergency services get guaranteed high-quality access to video calls and broadcasts.

 

When properly integrated into a policy and service management framework for traffic slicing, Open RAN can be an efficient tool for adding fine grained traffic optimization rules, allowing a fairer apportioning of resource for all users, while preserving overall quality of experience.

 

Wednesday, March 27, 2024

State of Open RAN 2024: Executive Summary

 

The 2023 Open RAN market ended with a bang with AT&T awarding to Ericsson and Fujitsu a $14 billion deal to convert 70% of its traffic to run on Open RAN by end of 2026. 2024 started equally loud with the $13 billion acquisition of Juniper Networks from HPE on the thesis of the former company’s progress in telecoms AI and specifically in RAN intelligence with the launch of their RIC program.

2023 also saw the long-awaited launch of Drillish 1&1 in Germany, the first Open RAN greenfield in Europe, as well as the announcement from Vodafone that they will release a RAN RFQ that will see 30% of its 125,000 global sites dedicated to Open RAN.

Commercial deployments are now under way in western Europe, spurred by Huawei replacement mandates.

On the vendor’s front, Rakuten Symphony seems to have markedly failed to capitalize on Altiostar’s acquisition and convince brownfield network operators to purchase telecom gear from a fellow network operator. While Ericsson has announced its support for Open RAN with conditions, Samsung has been the vendor making the most progress with convincing market share growth across the geographies it covers. Mavenir has been steadily growing. A new generation of vendors have taken advantage of the Non-Real-Time RIC / SMO opportunity to enter the space. Non-traditional RAN vendors such as VMWare and Juniper Networks or SON vendors like Airhop have grown the most in that space, together with pure new entrants App players such as Rimedo Labs. With the acquisition of VMWare and Juniper Networks, both leaders in the RIC segment, 2024 could be live or die for this category, as the companies are reevaluating their priorities and aligning commercial interest with their acquirers.

On the technology side, the O-RAN alliance has continued its progress, publishing new releases while establishing bridgeheads with 3GPP and ETSI to facilitate the inclusion of Open RAN in the mainstream 5G advanced and 6G standards. The accelerator debate between inline and look aside architectures has died down, with the first layer 1 abstraction layers allowing vendors to effectively deploy on different silicon with minimal adjustment. Generative AI and large language models have captured the industry’s imagination and Nvidia has been capitalizing on the seemingly infinite appetite for specialized computing in cloud and telecom networks.

This report provides an exhaustive review of the key technology trends, vendors product offering, and strategies, ranging from silicon, servers, cloud CaaS, Open RUs, DU, CUs, RICs, apps and SMOs in the open RAN space in 2024.

Tuesday, March 19, 2024

Why are the US government and DoD in particular interested in Open RAN?

Over the last 24 months, it has been very interesting to see that the US Government has been moving from keen interest in Open RAN to make it policy for its procurement of connectivity technology.

As I am preparing to present for next week's RIC Forum, organized by NTIA and the US Department of Defense, many of my clients have been asking why the US Government seems so invested in Open RAN.

Supply chain diversification:

The first reason for this interest is the observation that the pool of network equipment provider has been growing increasingly shallow. The race from 3G to 4G to 5G has required vendors to attain a high level of industrialization and economy of scale, that has been achieved through many rounds of concentration. A limited supply chain with few vendors per category represents a strategic risk for the actor relying on this supply chain to operate economically. Open RAN allows the emergence of new vendors in specific categories that do not necessitate the industrial capacity to be delivering end to end RAN networks.

Cost effectiveness:

The lack of vendor choice has shifted negotiating power from network operators to vendors, which has negatively impacted margins and capacity to make changes. The emergence of new Open RAN vendors puts pressure on incumbents and traditional vendors to reduce their margins.

Geostrategic interest:

The growth of Huawei, ZTE and other Chinese vendors, with their suspected links to the Chinese government and Army, together with the somewhat obscure privacy and security laws there, has prompted the US government and many allies to ban or severely restraint the categories of Telecom Products that can be deployed in many telecom networks.

Furthermore, while US companies dominate traffic management, routing, data centers and hyperscalers space, the RAN, core network and general telco infrastructure remains dominated by European and Asian vendors. Open RAN has been an instrument to facilitate and accelerate Chinese vendors replacement, but also to stimulate the US vendors to emerge and grow.

DoD use case example: Spectrum Dominance

This area is less well understood and recognized but is an integral part of US Government generally and DoD's in particular interest in Open RAN. Private networks require connectivity products adapted for specific use cases, devices and geographies. Commercial macro networks offer "one size fits all" solution that are difficult and costly to adapt for that purpose. Essentially DoD runs hundreds of private networks, whether on its bases, its carriers or in ad hoc tactical environments. Being able to setup a secure, programmable, cost effective network, either permanently or ad hoc is an essential requirement, and can also become a differentiator or a force multiplier. A tactical unit deploying an ad hoc network might look at means not only to create a secure subnet, but also to establish spectrum dominance by manipulating waveforms and effectively interfering with adverse networks. This is one example where programmability at the RAN level can turn into an asset for battlefield dominance. There are many more use cases, but their classification might not enable us to publicly comment them. They illustrate though how technological dominance can extend to every aspect of telecom.

Open RAN in that respect provides programmability, cost effectiveness and modularity to create fit for purpose connectivity experiences in a multi vendor environment.


Tuesday, January 9, 2024

HPE acquires Juniper Networks


On January 8, the first rumors started to emerge that HPE was entering final discussions to acquire Juniper Networks for $13b. By January 9th, HPE announced that they have entered into a definitive agreement for the acquisition.

Juniper Networks, known for its high-performance networking equipment, has been a significant player in the networking and telecommunications sector. It specializes in routers, switches, network management software, network security products, and software-defined networking technology. HPE, on the other hand, has a broad portfolio that includes servers, storage, networking, consulting, and support services.

 The acquisition of Juniper Networks by HPE could be a strategic move to strengthen HPE’s position in the networking and telecommunications sector, diversify its product offerings, and enhance its ability to compete with other major players in the market such as Cisco.

Most analysis I have read so far have pointed out AIOps and Mist AI as the core thesis for acquisition, enabling HPE to bridge the gap between equipment vendor and solution vendor, particularly in the Telco space.

While this is certainly an aspect of the value that Juniper Networks would provide to HPE, I believe that the latest progress from Juniper Networks in Telco beyond transport, particularly as an early leader in the emerging field of RAN Intelligence and SMO (Service Management and Orchestration) was a key catalyst in HPE's interest.

After all, Juniper Networks has been a networking specialist and leader for a long time, from SDN, SD WAN, Optical to data center, wired and wireless networks. While the company has been making great progress there, gradually virtualizing  and cloudifying its routers, firewalls and gateway functions, no revolutionary technology has emerged there until the application of Machine Learning and predictive algorithms to the planning, configuration, deployment and management of transport networks.

What is new as well, is Juniper Networks' efforts to penetrate the telco functions domains, beyond transport. The key area ready for disruption has been the Radio Access Network (RAN), specifically with Open RAN becoming an increasingly relevant industry trend to pervade networks architecture, culminating with AT&T's selection of Ericsson last month to deploy Open RAN for $14B.

Open RAN offers disaggregation of the RAN, with potential multivendor implementations, benefitting from open standard interfaces. Juniper Networks, not a traditional RAN vendor, has been quick to capitalize on its AIOps expertise by jumping on the RAN Intelligence marketspace, creating one of the most advanced RAN Intelligent Controller (RIC) in the market and aggressively integrating with as many reputable RAN vendors as possible. This endeavor, opening up the multi billion $ RAN and SMO markets is pure software and heavily biased towards AI/ML for automation and prediction.

HPE has been heavily investing in the telco space of late, becoming a preferred supplier of Telco CaaS and Cloud Native Functions (CNF) physical infrastructures. What HPE has not been able to do, is creating software or becoming a credible solutions provider / integrator. The acquisition of Juniper Networks could help solve this. Just like Broadcom's acquisition of VMWare (another early RAN Intelligence leader), or Cisco's acquisition of Accedian, hardware vendors yearn to go up the value chain by acquiring software and automation vendors, giving them the capacity to provide integrated end to end solutions and to achieve synergies and economy of scale through vertical integration.

The playbook is not new, but this potential acquisition could signal a consolidation trend in the telecommunications and networking industry, suggesting a more competitive landscape with fewer but larger players. This could have far-reaching implications for customers, suppliers, and competitors alike.


Monday, December 4, 2023

Is this the Open RAN tipping point: AT&T, Ericsson, Fujitsu, Nokia, Mavenir


The latest publications around Open RAN deliver a mixed bag of progress and skepticism. How to interpret these conflicting information?

A short retrospective of the most recent news:

On the surface, Open RAN seems to be benefiting from a strong momentum and delivering on its promise of disrupting traditional RAN with the introduction of new suppliers, together with the opening of traditional architecture to a more disaggregated and multi vendor model. The latest announcement from AT&T and Ericsson even would point out that the promise of reduced TCO for brownfield deployments is possible:
AT&T's yearly CAPEX guidance is supposed to reduce from a high of ~$24B to about 20B$ per year starting in 2024. If the 14B$ for 5 years spent on Ericsson RAN yield the announced 70% of traffic on Open RAN infrastructure, AT&T might have dramatically improved their RAN CAPEX with this deal.

What is driving these announcements?

For network operators, Open RAN has been about strategic supply chain diversification. The coalescence of the market into an oligopoly, and a duopoly after the exclusion of Chinese vendors to a large number of Western Networks has created an unfavorable negotiating position for the carriers. The business case of 5G relies heavily on declining costs or rather a change in the costs structure of deploying and operating networks. Open RAN is an element of it, together with edge computing and telco clouds.

For operators

The decision to move to Open RAN is mostly not any longer up for debate. While the large majority of brownfield networks will not completely transition to Open RAN they will introduce the technology, alongside the traditional architecture, to foster cloud native networks implementations. It is not a matter of if but a matter of when.
When varies for each market / operator. Operators do not roll out a new technology just because it makes sense even if the business case is favorable. A window of opportunity has to present itself to facilitate the introduction of the new technology. In the case of Open RAN, the windows can be:
  • Generational changes: 4G to 5G, NSA to SA, 5G to 6G
  • Network obsolescence: the RAN contracts are up for renewal, the infrastructure is aging or needs a refresh. 
  • New services: private networks, network slicing...
  • Internal strategy: transition to cloud native, personnel training, operating models refresh
  • Vendors weakness: Nothing better than an end of quarter / end of year big infrastructure bundle discount to secure and alleviate the risks of introducing new technologies

For traditional vendors

For traditional vendors, the innovator dilemma has been at play. Nokia has endorsed Open RAN early on, with little to show for it until recently, convincingly demonstrating multi vendor integration and live trials. Ericsson, as market leader has been slower to endorse Open RAN has so far adopted it selectively, for understandable reasons.

For emerging vendors

Emerging vendors have had mixed fortunes with Open RAN. The early market leader, Altiostar was absorbed by Rakuten which gave the market pause for ~3 years, while other vendors caught up. Mavenir, Samsung, Fujitsu and others offer credible products and services, with possible multi vendors permutations. 
Disruptors, emerging and traditional vendors are all battling in RAN intelligence and orchestration market segment, which promises to deliver additional Open RAN benefits (see link).


Open RAN still has many challenges to circumvent to become a solution that can be adopted in any network, but the latest momentum seem to show progress for the implementation of the technology at scale.
More details can be found through my workshops and advisory services.