Showing posts with label IaaS. Show all posts

Thursday, July 31, 2025

The Orchestrator Conundrum strikes again: Open RAN vs AI-RAN

Ten years ago (?!), I wrote about the overlaps and potential conflicts between the different orchestration efforts in SDN and NFV. Essentially, I observed that it is desirable to orchestrate network resources with awareness of services, and that service and resource orchestration should have hierarchical and prioritized interactions, so that a service's deployment and lifecycle are managed within resource capacity and, when that capacity fluctuates, priorities can be enforced.
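To make the idea of prioritized interactions between service and resource orchestration concrete, here is a minimal sketch, assuming a single pool of abstract capacity units and a simple numeric priority per service. The service names, the priority scale and the `admit` function are all hypothetical and do not come from any MANO or SMO specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    priority: int  # assumption: lower number = more important
    demand: int    # abstract capacity units


def admit(services, capacity):
    """Return (running, preempted) service names for the available capacity."""
    running, preempted = [], []
    remaining = capacity
    # Place the most important services first; ties keep their input order.
    for svc in sorted(services, key=lambda s: s.priority):
        if svc.demand <= remaining:
            running.append(svc.name)
            remaining -= svc.demand
        else:
            preempted.append(svc.name)
    return running, preempted


services = [
    Service("best-effort-video", priority=3, demand=40),
    Service("emergency-voice", priority=1, demand=20),
    Service("enterprise-vpn", priority=2, demand=50),
]

# Full capacity: everything fits.
print(admit(services, capacity=120))
# (['emergency-voice', 'enterprise-vpn', 'best-effort-video'], [])

# Capacity drops: the lowest-priority service is the one that loses out.
print(admit(services, capacity=80))
# (['emergency-voice', 'enterprise-vpn'], ['best-effort-video'])
```

The point is only that the resource layer reports capacity and the service layer decides who keeps running; a real orchestrator would also handle placement, migration and graceful degradation.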

Service orchestrators have not really been deployed successfully at scale, for a variety of reasons, but primarily because this control point was identified early on as strategic by both network operators and traditional network vendors. A few network operators attempted to create an open source orchestration model (Open Source MANO), while traditional telco equipment vendors developed their own versions and refused to integrate their network functions with the competition's. In the end, most of the actual implementation focused on Virtual Infrastructure Management (VIM) and vertical VNF management, while orchestration remained fairly proprietary per vendor. Ultimately, Cloud Native Network Functions appeared and were deployed in Kubernetes, inheriting its native resource management and orchestration capabilities.

In the last couple of years, Open RAN has attempted to collapse RAN Element Management Systems (EMS), Self Organizing Networks (SON) and Operation Support Systems (OSS) into the concept of Service Management and Orchestration (SMO). Its ostensible aim is to provide a control platform for RAN infrastructure and services in a multi-vendor environment. The non-real-time RAN Intelligent Controller (RIC) is one of its main artefacts, allowing the deployment of rApps designed to visualize, troubleshoot, provision, manage, optimize and predict RAN resources, capacity and capabilities.

This time around, the concept of SMO has gained substantial ground, mainly because the leading traditional telco equipment manufacturers were not OSS / SON leaders and because orchestration was an easy target for non-RAN vendors looking for a greenfield opportunity.

As we have seen, whether for MANO or SMO, the barriers to adoption weren't really technical but rather economic and commercial, as leading vendors were trying to protect their business while growing into adjacent areas.

Recently, AI-RAN has emerged as an interesting initiative, positing that RAN compute would evolve from specialized, proprietary and closed to generic, open and disaggregated. Specifically, RAN compute could evolve from specialized silicon to GPUs. GPUs are able to handle the complex calculations necessary to manage a RAN workload, with capacity to spare. Their cost, however, greatly outweighs their utility if they are used exclusively for RAN. Since GPUs are used in all sorts of high-compute environments for Machine Learning, Artificial Intelligence, Large and Small Language Models, model training and inference, the idea emerged that if the RAN deploys open generic compute, that compute could be used for RAN workloads (AI for RAN), for workloads that optimize the RAN (AI on RAN) and ultimately for AI/ML workloads completely unrelated to RAN (AI and RAN).

While this could theoretically solve the business case for deploying costly GPUs in hundreds of thousands of cell sites, provided that the idle compute capacity could be resold as GPUaaS or AIaaS, it poses new challenges from a service / infrastructure orchestration standpoint. The AI-RAN Alliance now has to understand the orchestration challenges between resources and AI workloads.
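The orchestration problem described above can be sketched in a few lines, assuming a cell site GPU modelled as 100 abstract capacity units, a RAN workload that always takes precedence, and a fixed safety headroom. The numbers, job names and function names are hypothetical illustrations, not anything defined by the AI-RAN Alliance.

```python
def resellable_capacity(gpu_total, ran_load, headroom):
    """Units left for non-RAN AI jobs once RAN demand and a safety margin are reserved."""
    return max(0, gpu_total - ran_load - headroom)


def schedule_ai_jobs(jobs, gpu_total, ran_load, headroom=10):
    """Greedily admit (name, demand) AI jobs into the spare capacity; RAN always wins."""
    spare = resellable_capacity(gpu_total, ran_load, headroom)
    admitted, deferred = [], []
    for name, demand in jobs:
        if demand <= spare:
            admitted.append(name)
            spare -= demand
        else:
            deferred.append(name)
    return admitted, deferred


jobs = [("inference-a", 30), ("training-b", 50), ("inference-c", 15)]

# Quiet hours: RAN uses 20 units, so 70 units can be offered as GPUaaS.
print(schedule_ai_jobs(jobs, gpu_total=100, ran_load=20))
# (['inference-a', 'inference-c'], ['training-b'])

# Busy hour: RAN uses 85 units, only 5 are spare, every AI job waits.
print(schedule_ai_jobs(jobs, gpu_total=100, ran_load=85))
# ([], ['inference-a', 'training-b', 'inference-c'])
```

Even this toy version shows where the conflict sits: something has to know the RAN load, own the headroom policy and decide what happens to AI jobs already running when the busy hour arrives.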

In an Open RAN environment, near-real-time and non-real-time RICs deploy xApps and rApps respectively. The orchestration of the apps, services and resources is managed by the SMO. While not all apps can be categorized as "AI", it is likely that the SMO will take responsibility for AI for RAN and AI on RAN orchestration. If AI and RAN requires its own orchestration beyond Kubernetes, it is unlikely to operate in isolation from the SMO.

I believe that the multiple orchestration, policy management and enforcement points will not allow a multi-vendor environment for the control plane. Architecture and interfaces are still in flux, and specialty vendors will have trouble imposing their perspective without control of the end-to-end architecture. As a result, it is likely that the same vendor will provide the SMO, non-real-time RIC and AI-RAN orchestration functions (you know my feelings about the near-real-time RIC).

If you make the Venn diagram of vendors providing / investing in all three, you will have a good idea of the direction the implementation will take.

Monday, January 4, 2021

The telco multi core

TobiasD / Pixabay

 
There is something that has been irking me for the last few months: everyone in telco seems to carry on thinking that they will continue to have a single, omnipotent, centralized core network, even though variations between workloads (voice vs browsing vs video vs gaming vs AR vs AI vs IoT...) continue to amplify and the business models (owned and operated, IaaS, SaaS, PaaS...) increasingly require separate command and control.

The answer seems to be that slicing will magically solve everything. I fail to understand how slicing can accommodate diverging simultaneous needs from the same infrastructure without overprovisioning but that's a question for another time.

What troubles me most is that networks have dealt with separate cores for a long time. In many cases, this was because IoT or B2B business units could not afford the timelines and costs of adapting the centralized core, or simply because the network authority wanted to separate consumer traffic from enterprise traffic. In other cases, network sharing and multi-operator core networks (MOCN) have emerged as viable solutions to segregate and manage traffic in a logical network.

I am not an engineer or a scientist, but it feels like most of the advancement in processing in recent years has come from parallelization or specialization. I don't see silicon vendors building bigger CPUs, but rather orchestrating as many CPUs on the same board as possible to manage concurrent, yet different workloads. In the same vein, we have seen the emergence of specialized processing units such as GPUs or TPUs for specific workloads, in specific circumstances...

Now that most cloud providers and many telco vendors have proven the compatibility of their core network (at least the control plane) with cloud infrastructure and networks, I don't understand why telco standards and the industry still feel that 5G will have THE core network to evolve to, and that everything will be solved once it is 5G, once it is standalone, once it supports slicing, once it has a platform to recognize, identify and reserve network resources, once it can create dynamic slices on demand...

I feel that many of these issues have been resolved already. Slicing is just a new iteration of the tunneling, VPN, packet tagging and traffic shaping that are prevalent in many networks today. Cloud providers have effectively solved most of these challenges within their networks already, so why are telcos trying to reinvent the wheel?

Wishing for a single, unique, centralized core is not going to make it so. Other telcos, cloud providers and, soon, industry verticals, governments and IT vendors will have their own core. Thinking that the telco single core architecture will be able to manage all workloads, use cases and verticals simultaneously in a 5G world seems too much like magical thinking.

If you're a telco, you might not like it, but you had better plan for a multi core network, because others soon will, whether you want it or not. Chances are the early signs are already there, in the third-party caches and edge infrastructure being deployed in your networks.

You might want to start thinking in terms of core per service types, like voice, unicast TV, general browsing, low latency IoT, high compute applications, Edge... and per business model like retail consumer, retail enterprise, wholesale telco, wholesale cloud, IaaS, PaaS...
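As a rough illustration of what "core per service type and per business model" could look like in practice, here is a minimal sketch of a selection table with a fallback. The core names, the `CORES` table and `select_core` are hypothetical; they only show the shape of the decision, not how any 3GPP core selection actually works.

```python
# Hypothetical mapping of (service type, business model) to a dedicated core.
CORES = {
    ("voice", "retail consumer"): "core-voice-consumer",
    ("low latency IoT", "retail enterprise"): "core-iot-enterprise",
    ("high compute applications", "wholesale cloud"): "core-compute-cloud",
}

# Assumption: anything without a dedicated core lands on a general-purpose one.
DEFAULT_CORE = "core-general"


def select_core(service_type, business_model):
    """Pick the core for a session, falling back to the general-purpose core."""
    return CORES.get((service_type, business_model), DEFAULT_CORE)


print(select_core("voice", "retail consumer"))            # core-voice-consumer
print(select_core("general browsing", "retail consumer"))  # core-general
```

The table itself is trivial; the hard part is who owns it and who is allowed to add a row, which is exactly the command and control question.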