
Wednesday, September 10, 2025

The 6G promise

As I attend TMForum Innovate Americas in Dallas, AI, automation and autonomous networks dominate the debates. I have long held the belief that the promise of 5G to deliver adapted connectivity to different organizations, industries, verticals and market segments was necessary for network operators to create sustainable differentiation. At Telefonica, nearly 10 years ago, I was positing that network slicing would only be useful if we were able to deliver hundreds or thousands of slices.

One of the key insights came from interactions with customers in the automotive, banking and manufacturing industries. The CIOs from these large organizations don’t want to be sold connectivity products. They don’t want the network operator to create and configure the connectivity experience. 

The CIOs from Mercedes, Ford, Magna know better what their connectivity needs are and what kind of slices would be useful than the network operators serving them. They don’t want to have to spend time educating their providers so that they can design a service for them. They don’t want to outsource the optimization of their connectivity to a third party who doesn’t understand their evolving needs. 

The growth in private network implementations in healthcare, energy, mining, transportation and ports, for instance, is a sign that there is demand for dedicated, customized connectivity products. It is also a sign that network operators have so far failed to build the slicing infrastructure and capacity to serve these use cases.

As a result, I proposed that network operators should focus on creating a platform for industries to discover, configure and consume connectivity services. This vision had a lot of prerequisites: networks need to evolve and adopt virtualization through the separation of hardware and software, cloud-native functions, centralized orchestration, a stand-alone core, network slicing, the building of the platform itself and API exposure.

A lot of progress has been made in all these categories, to the point that we are seeing the first dedicated slicing solutions emerge for first responders, defense and industries. These slices are still mostly statically provisioned and managed by the network operators, but they will gradually grow.

The largest issue in evolving from static to dynamic slicing, and therefore moving from operator-managed slices to user-configurable, as-a-service slices, is managing conflicts between the slices. Dedicating static capacity to each slice is inefficient and too cost-prohibitive to implement at scale, except for the largest governmental use cases. Dynamic slice creation and management requires network observability, jointly with near-real-time capacity prediction, reservation, and attribution.

This is where AI can provide the missing step to enable dynamic slicing for network as a service. If you can extract data from the user device, network telemetry and network functions fast enough for algorithms to identify patterns in near real time, you can identify the device, user, industry and service and create the best-fit connectivity, whether for a gaming console connected to a 4K TV over FWA, a business user on a video conference call, industrial collaborative robots assembling a vehicle, or a drone delivering a package.
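As a sketch of this pattern-to-slice matching, the rule set below maps a session's observed telemetry to a best-fit connectivity profile. The device types, service names and slice labels are all illustrative assumptions, not a standard; a production system would learn these patterns with ML rather than hard-code them.

```python
# Illustrative mapping from observed session telemetry to a best-fit slice.
# All device, service and slice names are hypothetical examples.

def best_fit_slice(session: dict) -> str:
    """Return a connectivity profile for an identified device/service pattern."""
    device = session.get("device")
    service = session.get("service")
    if device == "robot" and service == "assembly":
        return "urllc_industrial"      # ultra-low latency, high reliability
    if device == "drone" and service == "delivery":
        return "urllc_wide_area"       # low latency, mobility, coverage
    if service == "video_conference":
        return "business_interactive"  # symmetric uplink, bounded jitter
    if device == "fwa_gateway" and service == "streaming_4k":
        return "embb_fwa"              # sustained high downlink throughput
    return "best_effort"               # undifferentiated default

# Example: a collaborative robot on an assembly line
profile = best_fit_slice({"device": "robot", "service": "assembly"})
```

The interesting engineering is not the dispatch itself but producing its inputs fast enough, which is where the near-real-time telemetry pipeline comes in.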

All these use cases have different connectivity needs that are today either served by best effort undifferentiated connectivity or rigidly rule-based private networks. 

As 6G is starting to emerge, will it fulfil the 5G promises and deliver curated connectivity experiences?

Thursday, February 6, 2025

The AI-Native Telco Network VI: Storage


The AI-Native Telco Network I

The AI-Native Telco Network II

The AI-Native Telco Network III

The AI-Native Telco Network IV: Compute

The AI-Native Telco Network V: Network

As it turns out, a network that needs to run AI, either to self-optimize or to offer wholesale AI-related services, requires some adjustments compared to a conventional telecom network. After looking at the compute and network functions, this post looks at storage.

Storage has, for the longest time, been an afterthought in telecom networks. Beyond IT workloads and the management of data centers, storage needs were usually addressed with storage embedded in the compute functions, sold by server vendors, or, when necessary, with direct-attached storage appliances, usually OEM'd or resold by the same vendors.

Today's networks see each network function, whether physical, virtualized or containerized, come with its own dedicated storage. The data generated by each function, whether telemetry, alarms, logs, events, or user and control plane data, is stored locally first; a portion is then exported to a data lake for cleaning and processing, and eventually to a data warehouse on a private or public cloud, so that OSS, BSS and analytics functions can provide dashboards on the health, load and usage of the network, along with optimization recommendations.

The extraction, cleaning, and processing of these disparate datasets takes time, anywhere from 30 minutes to several hours, to accurately represent the network state.

One of the applications of AI/ML in telecom networks is to optimize the network reactively when there is an event, or proactively when a change can be planned. This supposes that a feedback loop is built between the analytics layer and the operational layer, whereby a recommendation to change network parameters can be executed programmatically and automatically.

Speed becomes necessary, particularly to react to unpredicted events. Reducing reaction time in the event of an element outage is crucial. This supposes that the state of the network must be observable in near real time, so that the AI/ML engines can detect patterns and anomalies and provide root cause analysis and remediation as fast as possible. The compute applied to these calculations, together with the speed of transmission, has a direct effect on that speed, but these are not the only factors.

Storage, as it turns out, is also a crucial element of an AI-Native network. The large majority of AI/ML relies on storing data as objects, whereby each data element is stored independently, in an unstructured manner, irrespective of size, but with associated metadata that describes the data element in detail, allowing easy association and manipulation for AI/ML.
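A minimal sketch of these object-storage semantics: each element is kept as an unstructured blob of any size, paired with an arbitrary metadata record that makes it discoverable by attributes rather than by location. The class and field names below are illustrative, not a real product API.

```python
# Toy object store: blobs of any size plus unrestricted metadata, queryable
# by attribute. Keys, fields and values are hypothetical examples.

class ObjectStore:
    def __init__(self):
        self._objects = {}   # key -> bytes (the unstructured blob)
        self._metadata = {}  # key -> dict of arbitrary tags

    def put(self, key, blob, **metadata):
        self._objects[key] = blob
        self._metadata[key] = metadata

    def query(self, **filters):
        """Return keys whose metadata matches all filter key/value pairs."""
        return [k for k, m in self._metadata.items()
                if all(m.get(f) == v for f, v in filters.items())]

store = ObjectStore()
store.put("evt-001", b'{"cell": 42}', source="gNB-12", kind="event", domain="ran")
store.put("pm-001", b"...", source="gNB-12", kind="telemetry", domain="ran")
store.put("log-001", b"...", source="upf-3", kind="log", domain="core")

ran_keys = store.query(domain="ran")  # both RAN objects, regardless of size or type
```

The point is that a tiny event record and a multi-gigabyte capture live in the same namespace and are found the same way, by metadata, which is what AI/ML pipelines rely on.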

Why are traditional storage architectures not suitable for AI-Native Networks?

To facilitate the AI-Native network, data elements must be extracted from their network functions quickly and transferred to a data repository that allows their manipulation at scale. That is easier said than done. Legacy systems were originally built for block storage (databases and virtual machines, great for low latency, bad for high throughput). Objects are usually not natively supported and live in separate storage. Each vendor supports different protocols and interfaces, and each store is single-tenant to its application.

Data needs to be shared and read by many network functions simultaneously, while they are being processed. Traditional architectures see data stored individually by network functions, then exported to larger databases, then amalgamated in data lakes for processing. The process is lengthy, error-prone and negates the capacity to act/react in real time.

The data sets are increasingly varied, spanning large and small objects, data streams and files, with random and sequential read and write requirements. Legacy storage solutions require different systems for different use cases and data sets. This further lengthens the data amalgamation necessary for automation at scale.

Data needs to be properly labeled, without limitation on metadata, annotations and tags, equally for billions of small objects (event records) and for very large ones (video files). Traditional storage solutions are designed for either small or large objects and struggle to accommodate both in the same architecture. They also limit the amount of metadata per object. This increases cost and time to insight while reducing their capacity to evolve.

Datasets are live structures. They often exist in different formats and versions for different users. Traditional architectures cannot handle multiple formats simultaneously, and versions of the same dataset require separate storage elements. This leads to data inconsistencies, corruption and divergence of insight.

Performance is key in AI systems, and it is multidimensional. Storage solutions need to accommodate high throughput, scale-out capacity and low latency simultaneously. Traditional storage systems are built for capacity but not designed for high throughput and low latency, which dramatically reduces the performance of data pipelines.

Hybrid and multi-cloud support becomes a key requirement for AI, as data needs to be exposed to the access, transport, core and OSS/BSS domains across the edge, the private cloud and the public cloud simultaneously. Traditional storage solutions necessitate adaptation, translation, duplication and migration to function across cloud boundaries, which significantly increases their cost while reducing their performance and capabilities.

As we have seen, the data storage architecture for a telecom network becomes a strategic infrastructure decision and the traditional storage solutions cannot accommodate AI and network automation at scale.

Storage Requirements for AI-Native Networks

Perhaps the most important attribute for AI project storage is agility—the ability to grow from a few hundred gigabytes to petabytes, to perform well with rapidly changing mixed workloads, to serve data to training and production clients simultaneously throughout a project’s life, and to support the data models used by project tools.

The attributes of an ideal AI storage solution are: 

Performance Agility

          I/O performance that scales with capacity.

          Rapid manipulation of billions of items, e.g., for randomization during training.

Capacity Flexibility

          Wide range (100s of gigabytes to petabytes).

          High performance with billions of data items.

          Range of cost points optimized for both active and seldom accessed data.

Availability & Data Durability

          Continuous operation over decade-long project lifetimes.

          Protection of data against loss due to hardware, software, and operational faults.

          Non-disruptive hardware and software upgrade and replacement.

          Seamless data sharing by development, training, and production.

Space and Power Efficiency

          Low space and power requirements that free data center resources for power-hungry computation.

Security

          Strong administrative authentication.

          “Data at rest” encryption.

          Protection against malware (especially ransomware) attacks.

Operational Simplicity

          Non-disruptive modernization for continuous long-term productivity.

          Support for AI projects’ most-used interconnects and protocols.

          Autonomous configuration (e.g. device groups, data placement, protection, etc.).

          Self-tuning to adjust to rapidly changing mixed random/ sequential I/O loads.

Hybrid and Multi Cloud Natively

          Data agility to cross cloud boundaries

          Centralized data lifecycle management

          Decide which data set is stored and processed where

          From edge for inference to private cloud for optimization and automation to public cloud for model training and replication.

Traditional "spinning disk" based storage has not been designed for AI/ML workloads. It lacks the performance, agility, cost effectiveness, latency and power consumption attributes necessary to enable AI networks at scale. Modern storage infrastructure designed for high performance computing relies on flash storage, an efficient, cost-effective, low-power, high-performance technology that enables compute and network elements to perform at line rate for AI workloads.

Tuesday, January 28, 2025

The AI-Native Telco Network V: Network


The AI-Native Telco Network I

The AI-Native Telco Network II

The AI-Native Telco Network III

The AI-Native Telco Network IV: Compute

As we have seen in previous posts, AI and the journey to autonomous networks force telco operators to look at their network architecture and reevaluate whether their infrastructure is fit for this purpose. In many cases, their first reflex is to deploy new servers and GPUs in dedicated AI pods, only to find that processing power by itself is not enough for a high-performance AI system. The network connectivity needs to be accelerated as well.

SmartNICs

While dedicated routing and packet processing are necessary, one way to increase performance of an AI pod is to deploy accelerators in the shape of Smart Network Interface Cards (SmartNICs).

SmartNICs are specialized network cards designed to offload certain networking tasks from the CPU and provide additional processing power at the network edge. Unlike traditional NICs, which merely serve as communication devices, SmartNICs come equipped with onboard processing capabilities such as CPUs, ASICs, FPGAs or programmable processors. These capabilities allow SmartNICs to handle packet processing, traffic management, and other networking tasks, without burdening the CPU.

While SmartNICs are certainly hybrid compute/network dedicated silicon, they accelerate overall performance by offloading packet processing, user plane functions, load balancing, etc. from the CPUs and GPUs, which can then be freed up for pure AI workload processing.

For telecom providers, SmartNICs offer a way to improve network efficiency while simultaneously boosting the ability to handle AI workloads in real-time.

High-Speed Ethernet

One of the most straightforward ways to increase network speed is by adopting higher bandwidth Ethernet standards. Traditional networks may rely on 10GbE or 25GbE, but AI workloads benefit from faster connections, such as 100GbE or even 400GbE, which provide higher throughput and lower latency.

AI models, especially large deep learning models, require massive data transfer between nodes. Upgrading to 100GbE or 400GbE can drastically improve the speed at which data is exchanged between GPUs, CPUs, and storage systems in an AI pod, reducing the time required to train models and increasing throughput.

AI models often need to pull vast amounts of training data from storage. Higher-speed Ethernet allows AI pods to access data more quickly, decreasing bottlenecks in I/O.
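As a back-of-the-envelope illustration (my own arithmetic, ignoring protocol overhead, so real transfers run somewhat slower), the raw time to move a 10 TB training dataset shrinks from hours to minutes as the line rate increases:

```python
# Raw transfer time for a dataset at a given Ethernet line rate.
# Ignores protocol overhead and storage bottlenecks; illustrative only.

def transfer_seconds(dataset_bytes: float, link_gbps: float) -> float:
    return dataset_bytes * 8 / (link_gbps * 1e9)

dataset = 10e12  # 10 TB
for gbps in (10, 25, 100, 400):
    hours = transfer_seconds(dataset, gbps) / 3600
    print(f"{gbps:>3} GbE: {hours:.2f} h")  # 10 GbE ≈ 2.2 h, 400 GbE ≈ 0.06 h
```

At 10GbE the dataset takes over two hours to move; at 400GbE it takes a few minutes, which is the difference between a stalled GPU cluster and a saturated one.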

Use Low-Latency Networking Protocols

Adopting advanced networking protocols such as InfiniBand or RoCE (RDMA over Converged Ethernet) is essential to reduce latency in AI pods. These protocols are designed to enable faster communication between nodes by bypassing traditional network stacks and reducing the overhead that can slow down AI workloads.

InfiniBand and RoCE provide extremely low-latency communication between AI pods, which is crucial for high-performance AI training and inference.

These protocols support higher bandwidths (up to 200Gbps or more) and provide more efficient communication channels, ideal for high-throughput AI workloads like distributed deep learning.

To increase AI performance, telecom operators need to focus on upgrading their network infrastructure to support the growing demands of AI workloads. By implementing strategies such as high-speed Ethernet, SmartNICs, and specialized AI interconnects, operators can enhance the speed, scalability, and efficiency of their AI pods. This enables faster processing of large datasets, reduced latency, and improved overall performance for AI training and inference, allowing telecom operators to stay ahead in the competitive AI-driven landscape.
Storage, as we will see in the next post, also plays an integral part in AI performance on a telecom network.

Thursday, January 23, 2025

The AI-Native Telco Network IV: Compute

The AI-Native Telco Network I

The AI-Native Telco Network II

The AI-Native Telco Network III

As we have seen in previous posts, to accommodate and make use of AI at scale, a network must be tuned and architected for this purpose. While any telco network can deploy AI in discrete environments or throughout its fabric, the difference between a Data strategy and an AI strategy is speed + feedback loop.

Most data collected in a telco network has been used for very limited purposes: mainly archiving for forensics to determine the root cause of an anomaly or outage, charging and customer management functions, or legal interception and regulatory requirements. For these use cases, data needs to be properly formatted and laid to rest until analytics engines can provide a representation of the state of the network or an account. Speed is not an issue here; the system can suffer minutes or hours of delay before a coherent picture is formed and represented.

AI, by contrast, can provide better insight through larger datasets than classical analytics. It provides a better capacity to correlate events and to predict the evolution of the network state. It can also propose optimization, enhancement and mitigation recommendations, but to be truly effective, it needs a feedback loop to the network functions, so that these recommendations can be turned into actions and automated.


Herein lies the trick. If you want to run AI in your network so that you can automate it, allowing it to reactively or proactively auto-scale, heal, and optimize its performance, power consumption, cost, etc. at scale, it cannot be done manually. Automation is necessary throughout. Speed from event, anomaly, pattern or insight detection to action becomes key.

As we have seen, speed is the product of high performance, low latency in the production, extraction, storage, and processing of data to create actionable insights that can be automated. At the fabric layer, compute, connectivity and storage are the elements that need to be properly designed to enable the speed to run AI.

In this post, we will look at the compute function. Processing, analyzing, manipulating Data requires computing capabilities. There are different architectures of computing units for different purposes.

  • CPUs (Central Processing Units) are general-purpose processors, suitable for serial tasks. Multiple CPU cores can work in parallel to enhance performance. They are suitable for most telecom functions, except real-time processing. Generic CPUs are used in most telco data centers and clouds for most telco functions, from OSS and BSS to core and transport. At the edge and in the RAN, CPUs are used for Centralized Unit functions.
  • ASICs (Application Specific Integrated Circuits) are chips designed for specific tasks or applications. They are not as versatile as other processing units but deliver the absolute highest performance in the smallest footprint for specific applications. They can be found in first-generation Open RAN servers running Distributed Unit functions, as well as in specialized packet routing and switching (more on that in the connectivity post).
  • FPGAs (Field Programmable Gate Arrays) are chips that can be reprogrammed to adapt to specific workloads without necessitating a complete redesign. They provide a good balance between adaptability and performance and are suitable for cryptographic and rapid data processing. They are used in telco networks in security gateways, as well as in advanced routing and packet processing functions.
  • GPUs (Graphics Processing Units) feature large numbers of smaller cores, coupled with high memory bandwidth, making them suitable for graphics processing and large numbers of parallel matrix calculations. In telco networks, GPUs are starting to be introduced for AI/ML workloads in data centers and clouds (neural networks and model training), as well as in the RAN for the Distributed Unit and RAN Intelligent Controller.
  • TPUs (Tensor Processing Units) are Google's specialized processing units optimized for Tensor processing of ML and deep learning model training and inference. They are not yet used in Telco environments but can be used on Google Cloud in a hybrid scenario.
  • NPUs (Neural Processing Units) are designed for Neural Networks for deep learning processing. They are very suitable for inference tasks as their power consumption and footprint are very small. They start to appear in telco networks at the edge, and in devices.
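
The trade-offs above can be condensed into a simple selection table. This mapping is my illustrative summary of the list, with made-up workload labels, not a vendor sizing guide:

```python
# Illustrative workload-to-processor mapping, summarizing the bullets above.
# Workload names are hypothetical; real placement involves cost and power too.

PROCESSOR_FIT = {
    "general_control_plane": "CPU",   # serial tasks: OSS, BSS, core, transport
    "fixed_function_du":     "ASIC",  # highest performance, smallest footprint
    "crypto_offload":        "FPGA",  # reprogrammable without a full redesign
    "model_training":        "GPU",   # parallel matrix math, high memory bandwidth
    "tensor_training_cloud": "TPU",   # Google Cloud hybrid scenarios
    "edge_inference":        "NPU",   # very low power consumption and footprint
}

def pick_processor(workload: str) -> str:
    """Default to a general-purpose CPU when no specialized fit exists."""
    return PROCESSOR_FIT.get(workload, "CPU")
```

In practice the choice is rarely this clean: as the post notes below, cost, power and placement across the network weigh as much as raw fit.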

Artificial Intelligence and Machine Learning can run on any of the above computing platforms. The difference is the performance, footprint, cost and power consumption profile. We have lately seen the emergence of the GPU as the new processing unit poised to replace CPUs, ASICs and FPGAs in specialized traffic functions, using the RAN and AI as its beachhead. GPUs are key to running AI workloads at scale, delivering the low latency and high throughput necessary for rapid time to insight.

Their cost and power consumption force network operators to find the right balance between the number of GPUs and their placement throughout the network, to enable both the high processing power necessary for model training in the private cloud and the low latency needed for rapid inferencing and automation at the edge. While this architecture might provide the best basis for an automated or autonomous network, its cost and the rapid rate of change in GPU generations might give most operators pause.

The main challenge becomes selecting a compute architecture that can provide the most capacity and speed while remaining cost-effective to procure and run. For this reason, many telco operators have decided, as a first step, to centralize their GPU farms to fine-tune their use cases, with limited decentralized deployments. Another avenue for exploration is wholesaling the compute capacity to reduce internal costs. We have seen a few GPUaaS and AIaaS initiatives announced recently.

In any case, most operators who have deployed high-capacity AI pods with GPUs find that the performance of the overall system requires further refinement, and look to connectivity as the next step in their AI-Native network journey. That will be the theme of our next post.

Thursday, December 19, 2024

The AI-Native Telco Network III

Telecommunications Networks have evolved over time to accommodate voice, texts, images, web browsing, video streaming and social media. Radio, transport and core networks have seen radical evolution to accommodate these. Recently, Cloud computing has influenced telecom networks designs, bringing separation of control /user plane, hardware /software and centralization of management, configuration, orchestration and administration functions.

Telecom networks have always generated and managed enormous amounts of data, which has historically been stored in local appliances, then offloaded to larger specialized data storage systems for analysis, post-processing and analytics. The journey from the creation of the data to its availability for insight was 5-10 minutes. This was fine as long as data was used for alarming, dashboards and analytics.

Lately, Machine Learning, used to detect patterns in large data sets and to provide actionable insights, has undergone a dramatic acceleration with advances in Artificial Intelligence. AI has changed the way we look at data by opening the promise of network and pattern predictability, automation at scale and ultimately autonomous networks. Generative AI, Interpretative AI and Predictive AI are the three main applications of the technology.

Generative AI is able to use natural language as an input and to create text, documentation, pictures, videos, avatars and agents, intuiting the intent behind the prompt by harnessing Large Language Models.

Interpretative AI provides explanation and insight from large datasets, to highlight patterns, correlation and causations that go unnoticed if processed manually.

Predictive AI draws from time series and correlation pattern analysis to propose predictions on the evolution of these patterns.

Implementing an AI-Native network requires careful consideration - the way data is extracted, collected, formatted, exported, stored before processing has an enormous impact on the quality and precision of the AI output.

To provide its full benefit, AI is necessarily distributed, with Large Language Models training better suited for large compute clusters in private or public clouds, while inference and feedback loop management is more adequately deployed at the edge of the network, particularly for latency sensitive services.

In particular, the extraction and speed of transmission of the data, throughout the compute continuum, from edge to cloud is crucial to an effective AI native infrastructure strategy.

In a telecom network, the compute continuum consists of the device accessing the network, the Radio Access Network with its Edge, the Metro and Regional Central Offices, the National Data Centers hosting the Private Cloud and the International Data Centers hosting the Public Clouds.

As network operators examine the implications of running AI in their networks, enhancing, distributing and linking compute, storage and networking throughout the continuum becomes crucial.

Compute is an essential part of the AI equation but it is not the only one. For AI to perform at scale, connectivity and storage architecture are key.

To that end, large investments are made to deploy advanced GPUs, SmartNICs and next generation storage from the edge to the cloud, to allow for hierarchized levels of model training and inference.

One of the applications of AI is the detection of patterns in large data sets, allowing the prediction of an outcome or the generation of an output based on statistical analysis. The larger the datasets, the more precise the pattern detection, the more accurate the prediction, the more human-like the output.

In many cases, AI engines can create extremely good predictions and output based on large datasets. The data needs to be accurate but not necessarily recent. Predicting seasonal variations in data traffic in a network, for instance, requires accurate time series, but not up to the minute refresh.
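A minimal sketch of that seasonal case: a naive seasonal forecast predicts each upcoming day from the same weekday in previous weeks, so it needs an accurate history but not an up-to-the-minute feed. The traffic figures below are made up for illustration.

```python
# Naive seasonal forecast: average past observations at the same seasonal
# offset (e.g. same weekday). History values are invented for illustration.

def seasonal_naive_forecast(history, season_len, horizon):
    """Forecast `horizon` future points from seasonal averages of `history`."""
    forecast = []
    for h in range(horizon):
        offset = (len(history) + h) % season_len
        same_phase = [v for i, v in enumerate(history) if i % season_len == offset]
        forecast.append(sum(same_phase) / len(same_phase))
    return forecast

# Two weeks of daily traffic (arbitrary units), weekends heavier:
traffic = [40, 42, 41, 43, 45, 70, 72,
           41, 43, 42, 44, 46, 71, 73]
next_week = seasonal_naive_forecast(traffic, season_len=7, horizon=7)
```

The forecast stays useful even if the history is hours old, which is exactly the contrast with the real-time feedback loops discussed next.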

However, network automation and the path to autonomous networks require datasets to be perpetually enriched and refreshed with real-time data streams, enabling fast optimization or adaptation.

Telecoms networks are complex, composed of many domains, layers and network functions. While they are evolving towards cloud native technology, all networks have a certain amount of legacy, deployed in closed appliances, silos or monolithic virtual machines.

To function at scale in its mission of automation towards autonomous networks, AI needs a real time understanding of the network state, health, performance, across all domains and network functions. The faster data can be extracted and processed, the faster the feedback loop and the reaction or anticipation of network and demand events.

As AI applications scale, the network infrastructure must be able to handle increased data traffic without compromising performance. For applications like autonomous vehicles, real-time fraud detection, and other AI-driven services, low latency ensures a seamless and responsive user experience. High-speed data transmission and low latency are therefore essential for efficient AI-based network automation, enabling real-time processing, efficient data handling, improved performance and scalability.

There are several elements that impact latency and data transmission in a telecom network. Among those is how fast traffic can be computed throughout the continuum.

To that end, AI-Native Telco networks have been rethinking the basic architecture and infrastructure necessary for the networking, compute and storage functions.

I will examine in the subsequent posts the evolution of compute, networking and storage functions to enable networks to evolve to an AI-Native architecture.


Friday, July 5, 2024

Readout: Ericsson's Mobility Report June 2024

 


It has been a few years now since Ericsson took to providing a yearly report on their view of the evolution of connectivity. Like Cisco's annual internet report, it provides interesting data points on telecom technology and services' maturity, but focused on cellular technology, lately embracing fixed wireless access and non-terrestrial networks as well.

In this year's edition, a few elements caught my attention:

  • Devices supporting network slicing are few and far between. Only iOS 17 and Android 13 support some capability to indicate slicing parameters to their underlying applications. These devices are the latest higher-end smartphones, so it is no wonder that 5G Stand Alone is late in delivering on its promises, if end-to-end slicing is only possible for a small fraction of customers. It is still possible to deploy slicing without device support, but there are limitations: most notably, slicing per content or service is not achievable, while slicing per device or subscriber profile is possible.

  • RedCap (5G Reduced Capability) for IoT, wearables, sensors, etc. is making its appearance on networks, mostly as demos and trials at this stage. The first devices are unlikely to reach mass-market availability until the end of next year.

  • Unsurprisingly, mobile data traffic is still growing, albeit at a lower rate than previously reported, with a 25% yearly growth rate, or just over 6% quarterly. The growth is mostly due to smartphone and 5G penetration and to video consumption, which accounts for about 73% of the traffic. This traffic data includes Fixed Wireless Access, although it is not broken down. The rollout of 5G, particularly in mid-band, together with carrier aggregation, has allowed mobile network operators to compete efficiently with fixed broadband operators through FWA. FWA's growth, in my mind, is the first successful application of 5G as a differentiated connectivity product. As devices and modems supporting slicing appear, more sophisticated connectivity and pricing models can be implemented. FWA price packages differ markedly from mobile data plans: the former are mostly speed-based, emulating cable and fibre offerings, whereas the latter are usually all-you-can-eat best-effort connectivity.

  • Where the traffic growth projections become murky is with the impact of XR services. Mixed, augmented and virtual reality services haven't really taken off yet, but their possible impact on traffic mix and network load could be immense. XR requires a number of technologies to reach maturity at the same time (bendable/transparent screens; low-power, portable, heat-efficient batteries; low latency and high compute on device and at the edge; high downlink/uplink capabilities; deterministic mesh latency over an area...) to reach the mass market, and we are still some ways away from it in my opinion.

  • Differentiated connectivity for cellular services is a long-standing subject of interest of mine. My opinion remains the same: "The promise and business case of 5G was supposed to revolve around new connectivity services. Until now, essentially, whether you have a smartphone, a tablet, a laptop, a connected car or an industrial robot, and whether you are a work-from-home or road-warrior professional, all connectivity products are really the same. The only variables are price and coverage.

    5G was supposed to offer connectivity products that could be adapted to different device types, verticals and industries, geographies, vehicles, drones... The 5G business case hinges on enterprise, vertical and government adoption and willingness to pay for enhanced connectivity services. By and large, this hasn't happened yet. There are several reasons for this, the main one being that to enable these services, a network overhaul is necessary.

    First, a service-based architecture is necessary, comprising 5G Stand Alone, telco cloud, Multi-Access Edge Computing (MEC) and Service Management and Orchestration. Then, cloud-native RAN, either Cloud RAN or Open RAN (particularly the RAN Intelligent Controllers - RICs), would be useful. All this "plumbing" enables end-to-end slicing, which in turn creates the capability to serve distinct, configurable connectivity products.

    But that's not all... A second issue is that although it is accepted wisdom that slicing will create connectivity products that enterprises and governments will be ready to pay for, there is little evidence of it today. One of the key differentiators of the "real" 5G and slicing will be deterministic speed and latency. While most market actors are ready to recognize that, in principle, controllable latency would be valuable, no one really knows the incremental value of going from variable best effort to a deterministic 100, 10 or 5 milliseconds of latency.

    The last hurdle is the realization by network operators that Mercedes, Walmart, 3M, Airbus... have a better understanding of their connectivity needs than any carrier, and that they have skilled people able to design networks and connectivity services across WAN, cloud, private and cellular networks. All they need is access and a platform with APIs. A means to discover, reserve and design connectivity services on the operator's network will be necessary, and the successful operators will understand that their network skillset might be useful for consumers and small / medium enterprises, but less so for large verticals, governments and companies." Ericsson is keen to promote and sell to MNOs the "plumbing" to enable this vision, but will this be sufficient to fulfill the promise?

  • Network APIs are a possible first step to opening up connectivity to third parties willing to program it. Network APIs are notably absent from the report, perhaps because the company announced a second impairment charge of $1.1B (after an initial $2.9B write-off) in less than a year on its $6.2B acquisition of Vonage.

  • Private networks are another highlighted trend in the report, with a convincing example of an implementation through the Northstar innovation program, in collaboration with Telia and AstaZero. The implementation focuses on automotive applications, from autonomous vehicles to V2X connectivity and remote control... On paper, it delivers everything operators dream about when thinking of differentiated connectivity for verticals and industries. One has to wonder how much it costs and whether it is sustainable if most of the technology is provided by a single vendor.

  • Open RAN and programmable networks are showcased in AT&T's deal, which I have previously reported on and commented. There is no doubt that single-vendor automation, programmability and Open RAN can be implemented at scale. The terms of the deal seem to indicate a great cost benefit for AT&T. We will have to measure the benefits as the changes are rolled out in the coming years.
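To make the network API idea above concrete, here is a minimal sketch of the kind of request a third party might build against a CAMARA-style Quality-on-Demand exposure API. The endpoint path, profile names and field names here are assumptions for illustration, not any operator's documented API:

```python
import json

# Illustrative CAMARA-style Quality-on-Demand session request.
# Path, profile names and field names are assumptions for this
# sketch, not a documented operator API.
def build_qod_session(device_ip: str, app_server_ip: str,
                      qos_profile: str, duration_s: int) -> str:
    """Return the JSON body a developer might POST to something like
    /quality-on-demand/v0/sessions on an operator's exposure gateway."""
    payload = {
        "device": {"ipv4Address": {"publicAddress": device_ip}},
        "applicationServer": {"ipv4Address": app_server_ip},
        "qosProfile": qos_profile,   # e.g. a low-latency gaming tier
        "duration": duration_s,      # seconds the boost should last
    }
    return json.dumps(payload)

body = build_qod_session("203.0.113.7", "198.51.100.42",
                         "QOS_LOW_LATENCY", 3600)
print(body)
```

The point is that the third party, not the operator, decides which device gets which connectivity treatment and for how long; the operator's job is to expose and honor the request.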


Friday, October 20, 2023

FYUZ 2023 review and opinions on latest Open RAN announcements

 

Last week marked the second edition of FYUZ, the Telecom Infra Project's annual celebration of open and disaggregated networks. Throughout the year, TIP's activity provides a space for innovation and collaboration in the main telecom network domains: access, transport and core. The working groups create deployment blueprints as well as implementation guidelines and documentation. The organization also federates a number of open labs, facilitating interoperability, conformance and performance testing.

I was not there for the show's first edition last year, but found a lot of valuable insight in this year's. I understand from casual discussions with participants that this year was a little smaller than last, probably because the previous edition saw Meta presenting its Metaverse-ready networks strategy, which attracted a lot of people outside the traditional telco realm. At about 1,200 attendees, the show felt busy without being overwhelming, and the mix of main-stage conference content in the morning and breakout presentations in the afternoon left ample time for sampling the top-notch food and browsing the booths. What was also very different about this show was how approachable and relaxed attendees were, which allowed for productive yet casual discussions.

Even before FYUZ, the show's previous incarnation, the TIP forum, was a landmark event for vendors and operators announcing their progress on open and disaggregated networks, particularly around Open RAN.

The news that came out of the show this year marked interesting progress in the technology's implementation, and a possible transition from the trough of disillusionment to pragmatic deployment.

The first day saw big announcements from Santiago Tenorio, TIP's chairman and head of Open RAN at Vodafone. The operator announced that Open RAN's evaluation and pilots were progressing well and that its next global RFQ for RAN refresh, covering over 125,000 cell sites, would see Open RAN win at least 30% of the planned deployments. The RFQ is due to be released this year for selection in early 2024, as contracts with existing vendors are due to expire in April 2025.

That same day, Ericsson’s head of networks, Fredrik Jejdling, confirmed the company's support of Open RAN, announced earlier this year. You might have read my perspective on Ericsson's stance on Open RAN; the presentation did not change my opinion, but it is good progress for the industry that the RAN market leader now officially supports the technology, albeit with some caveats.

Nokia, for their part, announced a 5G Open RAN pilot with Vodafone in Italy, and another pilot successfully completed in Romania on a cluster of Open RAN sites shared by Orange and Vodafone (MOCN).

While TIP is a traditional conduit for the big five European operators to enact their Open RAN strategy, this year's event was dominated by Vodafone, with a somewhat subdued presence from Deutsche Telekom, Telefonica, Orange and TIM. Rakuten Symphony was notable by its absence, as was Samsung.

The subsequent days saw less prominent announcements, but good representation and panel participation from Open RAN supporters and vendors. In particular, Mavenir and Juniper Networks were fairly vocal about late Open RAN joiners who do not really seem to embrace multivendor competition and an open API / interface approach.


I was fortunate to be on a few panels, notably on the main stage to discuss RAN intelligence progress, particularly the emergence of the RICs and Apps as orchestration and automation engines for the RAN.

I also presented the findings of my report on the topic (presentation below) and moderated a panel on overcoming automation challenges in telecom networks with CI/CD/CT.


Monday, October 2, 2023

DOCOMO's 30% TCO Open RAN savings

DOCOMO announced last week, during Mobile World Congress Las Vegas the availability of its OREX offering for network operators. OREX, which stands for Open RAN Experience, was initially introduced by the Japanese operator in 2021 as OREC (Open RAN Ecosystem).

The benefits claimed by DOCOMO are quite extraordinary, as they expect to "reduce clients’ total cost of ownership by up to 30% when the costs of initial setup and ongoing maintenance are taken into account. It can also reduce the time required for network design by up to 50%. Additionally, OREX reduces power consumption at base stations by up to 50%".

The latest announcement clarifies DOCOMO's market proposition and differentiation. Since the initial OREX communications, DOCOMO had been presenting to the market a showcase of validated Open RAN blueprint deployments the operator had carried out in its lab. What was unclear was the role DOCOMO wanted to play: was the operator just offering best practice and exemplar implementations, or were they angling for a different play?

On paper, the operator showed an impressive array of vendors collaborating to provide multi-vendor Open RAN deployments, with choices and some possible permutations between each element of the stack.


At the server layer, OREX provided options from Dell, HP and Fujitsu, all on x86 platforms, with various accelerators (ASICs / FPGAs...) from Intel (FlexRAN), Qualcomm, AMD and NVIDIA. While the COTS servers are readily interchangeable, the accelerator layer binds the Open RAN software vendor and is not easily swappable.

At the virtualization O-Cloud layer, DOCOMO has integrated VMware, Red Hat and Wind River, which represent the current best of breed in that space.

The base station software CU / DU has seen implementations from Mavenir, NTT Data, and Fujitsu. 

What is missing in this picture, and a little misleading, is the Open Radio Unit vendors that have participated in these setups, since this is where network operators need the most interchangeability. As of today, most multi-vendor Open RAN deployments will see a separate vendor in the O-RU and CU/DU space. This is because no single vendor today can satisfy the variety of O-RUs necessary to meet all the spectrum bands and form factors a brownfield operator needs. More details about this in my previous state of Open RAN post here.

In this iteration, DOCOMO has clarified the O-RU vendors it has worked with most recently (Dengyo Technology, DKK Co, Fujitsu, HFR, Mavenir, and Solid). As always, the devil is in the details, and unfortunately DOCOMO falls short of providing a more complete view of the types of O-RU (mMIMO or small cell?) and the combinations of O-RU vendor, CU/DU vendor, accelerator vendor and band, which are ultimately the true measure of how open this proposition is.

What DOCOMO clarifies most in this latest iteration is its contribution and the role it expects to play in the market.

First, DOCOMO introduces its Open RAN-compliant Service Management and Orchestration (SMO). This offering is a combination of NTT DOCOMO developments and third-party contributions (details can be found in my report and workshop Open RAN RICs and Apps 2023). The SMO is DOCOMO's secret sauce when it comes to the claimed savings, which result mainly from automation of the design, deployment and maintenance of Open RAN systems, as well as RU energy optimization.


Finally, DOCOMO presents its vast integration experience and now proposes systems integration, support and maintenance services. The operator seeks the role of specialized SI and prime contractor for these O-RAN projects.

While DOCOMO's experience is impressive and has led many generations of network innovation, this latest move, transitioning from leading operator and industry pioneer to O-RAN SI and vendor, is reminiscent of other Japanese companies, such as Rakuten with its Symphony offering. Japanese operators and vendors see the contraction of their domestic market as a strategic threat to their core business and try to replicate their success overseas. While quite successful in greenfield environments, the hypothesis that brownfield operators (particularly tier 1) will buy technology and services from another carrier (even one not geographically competing) still needs to be validated.

Monday, September 25, 2023

Is Ericsson's Open RAN stance that open?

 

An extract from the Open RAN RIC and Apps report and workshop.

Ericsson is one of the most successful telecom equipment manufacturers of all time, having navigated market concentration phases, the emergence of powerful rivals from China and elsewhere, and the pitfalls of successive generations and the windows of opportunity they open for new competitors.

With a commanding estimated global market share of 26.9% (39% excluding China) in RAN, the company is the uncontested leader in the space. While the geopolitical situation and the ban on Chinese vendors in many western markets have been a boon for the company’s growth, Open RAN has become the largest potential threat to its RAN business.

At first skeptical of (if not outright hostile to) the new architecture, the company has kept an eye on its development and traction over the last few years and has formulated a cautious strategy to participate in and influence its development.

In 2023, Ericsson seems to have accepted that Open RAN is likely here to stay and represents both a threat and an opportunity for its telecom business. The threat is of course to the RAN infrastructure business: while the company has been moving to Cloud RAN, virtualizing and containerizing its software, it still mostly ships vertically integrated base stations.

When it comes to Open RAN, the company seems to be getting closer to embracing the concept, with conditions.

Ericsson has been advocating that the current lower-layer split 7.2.x is not suitable for massive MIMO and high-capacity 5G systems and is proposing an alternative fronthaul interface to the O-RAN Alliance. Cynics might say this is a delaying tactic, as other vendors have deployed massive MIMO on 7.2.x in the field, but as market leader, Ericsson has some strong datasets to bring to the conversation and contest the suitability of the current implementation. Ericsson is now publicly endorsing the Open RAN architecture and, having virtualized its RAN software, will offer a complete solution, with O-RU, vDU, vCU, SMO and Non-RT RIC. The fronthaul will rely on the recently proposed interface, and the midhaul will remain the 3GPP F1 interface.

On the opportunity front: while most Ericsson systems ship with an Element Management System (EMS), which can be integrated into a Management and Orchestration (MANO) or Service Management and Orchestration (SMO) framework, the company has never entirely dominated this market segment, and Open RAN, in the form of the SMO and Non-RT RIC, represents an opportunity to grow in the strategic intelligence and orchestration sector.

Ericsson is using the market leader playbook to its advantage: first rejecting Open RAN as immature, underperforming and insecure, then admitting that it can provide some benefits in specific conditions, and now embracing it with very definite caveats.

The fronthaul interface proposal seems self-serving, as no other vendor has raised the same performance concerns, and commercial implementations have indeed been observed with performance profiles comparable to those of traditional vendors.

The Non-RT RIC and rApp market positioning is astute and allows Ericsson simultaneously to claim support for Open RAN and to attack the SMO market space with a convincing offer. The implementation is solid and reflects Ericsson’s high industrialization and quality practice. It will doubtless offer a mature implementation of SMO / Non-RT RIC and rApps and provide a useful set of capabilities for operators who want to continue using Ericsson RAN with a higher instrumentation level. The slow progress on third-party integration, both from a RIC and an Apps perspective, is worrisome and could be either the product of the company's quality and administrative processes or a strategy to keep the solution fairly closed and Ericsson-centric, barring a few token third-party integrations.


Thursday, February 20, 2020

Telco relevance and growth

I am often asked what I think are the necessary steps for network operators to return to growth. This is usually a detailed discussion, but at a high level, I think a key to operators' profitability is creating differentiated network services.
I have seen so much value created for consumers and enterprises at Telefonica when we started retaking control of the connectivity that I think there are some universal lessons to be learned there.

Curating experiences

Creating differentiated network services doesn't necessarily mean looking at hyper-futuristic scenarios involving autonomous drones or remote surgery. While these are likely to occur in the next 10 to 20 years, there is plenty that can be done today to improve user experiences.
For instance, uploading large files or editing graphics files in the cloud is still slow and clumsy. Also, broadband networks' advertised speed has become meaningless for most consumers. How can you have a 600 Mbps connection and still suffer from a pixelated video stream or a lagging gaming session? There are hundreds of these unsatisfactory experiences that could benefit from better connectivity.

These suboptimal experiences are where operators can start creating value and differentiating themselves. After all, operators own their networks; since they do not rely on the open internet for transport, they should presumably be able to control traffic and user experience at a granular level. A better connectivity experience is not always synonymous with more speed; in most cases it means control over throughput, latency and volume.

Accepting this means recognizing that the diktat of "one size fits all" is over for your network. You cannot create a connectivity product that is essentially the same for everyone, whether they are a teenage gamer, an avid video streaming fan, an architect's office, a dentist or a bank branch. They all have different needs, capabilities and price elasticities, and you can't really believe that your network will meet all their needs simultaneously without more control. Growth is unlikely to come from everyone paying the same price for the same service. There are pockets of hyper-profitability to extract, but they require granular control of the connectivity.
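As a toy illustration of segment-specific connectivity, here is what differentiated profiles might look like in code. The figures are invented purely to show the shape of the idea: each segment values a different mix of throughput, latency and availability, not a single "speed" number.

```python
from dataclasses import dataclass

# Illustrative only: invented per-segment connectivity targets, to show
# that "better connectivity" is a different mix for each segment.
@dataclass(frozen=True)
class ConnectivityProfile:
    segment: str
    downlink_mbps: int   # sustained throughput target
    latency_ms: int      # upper bound this segment cares about
    uptime_pct: float    # availability commitment

PROFILES = [
    ConnectivityProfile("teenage gamer", 50, 15, 99.0),
    ConnectivityProfile("video streaming fan", 25, 100, 99.0),
    ConnectivityProfile("bank branch", 100, 50, 99.99),
]

# "Better" is profile-relative: the gamer's plan is slower than the
# branch's but has a much tighter latency bound.
gamer, _, branch = PROFILES
assert gamer.latency_ms < branch.latency_ms
assert branch.uptime_pct > gamer.uptime_pct
```

A one-size-fits-all plan can only price the lowest common denominator; profiles like these are what a granularly controlled network could actually sell.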

"Vanilla" connectivity for all will not grow in terms of revenue per user with more general speed.

Being able to create a differentiated experience for each segment certainly means being able to identify and measure them. That's the easy part. Operators mostly have a good, granular grasp of their market segments. The hard part is finding out what these segments want, need and are willing to pay for. The traditional approach is to create a value proposition based on a technology advance, then test it in market studies, focus groups, limited trials and trials at scale before national launch.

While this might work well for services that are universal and apply to a large part of the population, identifying the micro-segments that are willing to pay more for a differentiated connectivity experience requires a more granular approach. Creating experiences that delight customers is usually not the result of a marketing genius who had it all planned in advance. In my experience, creating, identifying and nurturing this value comes from contact with the client, letting them experience the service. There are usually many unintended consequences when one starts playing with connectivity, and many successful telco services are the fruit of such unintended consequences (texting, for instance, was initially a signalling protocol).

Programmable networks

One way to create and curate such experiences is to increase your control over the connectivity. This means disaggregating, virtualizing and software-defining the elements of your access network (virtualize the OLT and the RAN, build a programmable SDN layer).
You should accept that you can't really understand a priori what your customers will value without testing it. There will be a lot of unintended consequences (positive and negative). It is therefore necessary to create a series of hypotheses that you systematically test with the customer to validate or discard. These tests must happen "in the wild" with real customers, because deploying in live networks with a real population invariably produces many unintended consequences compared to a lab with "friends and family" users.
On average, you might need to test 50-60 variants to find 2 or 3 successful services. In telecom-years, that's about 100 years at today's development and testing cycles. But if you have a programmable network, and know how to program it, these variants can be created and tested at software speed.

Therefore, you need to test often and pivot fast, and you need to be able to test with small, medium and large samples. The key is to build an end-to-end CI/CD lab that can coarsely reproduce your network setup from the core, access and transport perspectives. It needs to be software-defined with open interfaces, so that you can swap, permute and configure new elements on demand.
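The test-and-pivot loop above can be sketched as follows. The metric, threshold and trial function are stand-ins for what a real CI/CD lab would measure with live customer cohorts; the numbers are illustrative only.

```python
import random

# A minimal sketch of the "test many variants, keep few" loop described
# above. In a real programmable network, trial() would deploy a service
# variant to a small live cohort and return a measured business metric.
random.seed(42)

def trial(variant_id: int) -> float:
    """Stand-in for a live trial: returns a willingness-to-pay score
    between 0 and 1 for the given service variant."""
    return random.random()

KEEP_THRESHOLD = 0.95  # only clear winners survive to the next stage

variants = range(60)  # roughly the 50-60 variants per cycle cited above
survivors = [v for v in variants if trial(v) >= KEEP_THRESHOLD]
print(f"{len(survivors)} of 60 variants worth scaling up")
```

The economics only work because each iteration of this loop is software-speed cheap: killing 57 variants is a feature of the process, not a failure.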

Since current networks and elements are so complex and proprietary, you need to identify greenfields and islands of wilderness in your connectivity where you can experiment in isolation without disrupting your core customer base. At Telefonica, these uncharted connectivity fields were rural networks and edge computing; in other networks, AI-augmented network operations, network slicing or 5G could be perfect experimentation grounds.

Pluridisciplinary teams

Another learning is that failing to integrate user and customer feedback at every stage of the service's elaboration is deadly. UX designers must be part of the process from inception and throughout. They might not be as heavily involved in some phases (development) as in others (inception, beta, trial...), so they can be shared across projects.
Increasingly, data science, security and privacy good practices also need to be considered throughout the project's pivot points. In many cases, it is difficult, expensive or impossible to retrofit them if they were not part of the original design.
Products and services do not necessarily need large teams to get off the ground and create value, but they do need dedication and focus. Resist the temptation to have the core team work across projects: what you gain by identifying possible synergies, you lose in velocity. Rather, have small dedicated teams with core members, plus specialists who are lent from project to project for periods of time.
Foster internal competition. Evaluate often and be ready to pivot or kill projects.

Paradoxically, in many organizations, the phase in which a successful service is most likely to die is the transition to the product and business teams. The key is possibly for these projects not to transition at all. I have long advocated that it is easier for an operator to launch 5G as a separate company than as an evolution, but it is impractical for many operators to consider starting a parallel organization for network transformation. These innovations, if they are to transform the way networks and services are managed, must be accompanied by a continuous training process and constant resource rotation between innovative and live projects. Transformation and innovation are therefore not the work of a dedicated team but of the whole workforce, and everyone has the opportunity to participate in innovation projects, from inception to delivery.


Beyond the "how", the teams need a clear framework to guide their daily decision-making. The "what" needs to be oriented by a vision, strategies, tactics and a doctrine that we will explore in a subsequent post.

Please share your experience with transformation and innovation projects in the telco world. We all grow by sharing. "A rising tide lifts all boats".

Interested in how these principles were applied to the creation of the Open RAN market? Contact me for a copy of the report "xRAN 2020".