Monday, August 26, 2024

of AI, automation, complex and complicated systems

 

I get drawn these days into discussions about the soft spot of AI. What is the best use of AI/ML, its utility in generative AI and its use in network automation, optimisation and autonomic functions.

In many cases, these discussions stumble upon misconceptions about the mechanics of statistics and their applications.

To put it simply, many do not distinguish between complexity and complication, which has a great effect on expectations of problem solving, automation and outcome prediction. A complex problem is an assembly of problems that can be broken down in subsets until simple unique problems can be identified, tagged, troubleshooted and resolved. These problems are ideal targets for automation. No matter how complex the task, if it can be broken down, if a method of procedure (mop) can be written for each subtask and eventually for the whole problem, it can be measured, automated, predicted and efficiency gains can be achieved.

Complicated problems are a different animal altogether. They might have sub task that can be identified and broken down, but other parts that have a large level of unknown and uncertainty.

Large Language Models can try to reduce the uncertainty by having larger samples, enabling even outlier patterns to emerge and be identified, but in many cases, complicated problems have dependencies that cannot be easily resolved from a pure mathematical standpoint.

This is where domain expertise comes in. In many cases, whenever issues arise in a telecoms network, it is not necessarily identified immediately from the source of the issue. Troubleshooting in many case requires knowledge of network topology, call flows, protocols, and multi domain expertise across core, transport, access, peering point, connectivity, data centers...

It is not possible to automate what you do not operate well. You cant operate well a system that you can't measure well and you can't measure well a system without a consolidated data storage and management strategy. In many cases, telco systems still produce logs in a proprietary format, on siloed systems and collecting, cleaning, exporting, processing, storing these data in a fully integrated data system is still in its infancy. This is however the very first step before even the categorization into complex or complicated issues can take place.

In many casse, data literacy need to pervade the entire organization to ensure that a data-driven strategy can be enacted, let alone moving to automation, autonomic or AI predictive systems. 

It becomes therefore very important to try and isolate complex from complicated systems and issues and try to apply as much data science and automation to the former, before trying to force AI/ML to the latter. As a rule of thumb, as the number of tasks or variables and the complexity increases, one can move from optimization, using scripting to automation, using scripting + ML, to prediction using AI / ML. As the number of unknowns and complication increases, one has to use subject matter experts and domain experts, to multi domain experts with end to end view of the system. 

As complications and tasks increase, the possibility to achieve autonomous systems decrease, as human expertise and manual intervention increase. Data science becomes less an operator than an attendant or an assistant to detect, automate the subset of tasks with identified outcome and patterns, accelerating resolution of the more complicated problem.

Friday, August 16, 2024

Rant: Why do we need 6G anyway?


I have to confess that, even after 25 years in the business, I am still puzzled by the way we build mobile networks. If tomorrow we were to restart from scratch, with today's technology and knowledge of the market, we would certainly design and deploy them in a very different fashion.

Increasingly, mobile network operators (MNOs) have realized that the planning, deployment and management of the infrastructure is a fundamentally different business than the development and commercialization of the associated connectivity services. They follow different investment and amortization cycle and have very different economic and financial profiles. For this reason, investors value network infrastructure differently from digital services and many MNOs have decided to start separating their fibre, antennas, radio assets from their commercial operation.

This has resulted in a flurry of splits, spin off, divestiture and the growth of tower and infrastructure specialized companies. If we follow this pattern to its logical conclusion, looking at the failed economics of 5G and the promises of 6G, one has to wonder whether we are on the right path.

Governments keep treating spectrum as a finite, exclusive resource, whereas as private networks and unlicensed spectrum demand is increasing, it is clear that there is a cognitive dissonance in the economic model. If 5G's success was predicated on enterprise, industries and verticals connectivity and if these organizations have needs that cannot be satisfied by the public networks, why would MNOs spend so much money on a spectrum that is unlikely to bring additional revenue? The consumer market does not need another G until new services and devices emerge that mandate different connectivity profiles. Metaverse was a fallacy, autonomous vehicles, robots... are in their infancy and workaround the lack of connectivity adequacy by keeping their compute and sensors on device, rather than at the edge.

As the industry prepares for 6G and its associated future hype and non sensical use cases and fantastical services, one has to wonder how can we stop designing networks for use cases that never emerge as dominant, forcing redesigns and late adaptation. Our track record as an industry is not great there. If you remember, 2G was designed for voice services. Texting was the unexpected killer app. 3G was designed for Push to talk over Cellular, believe it or not (remember SIP and IMS...) and picture messaging early browsing were successful. 4G was designed for Voice over LTE (VoLTE) and video / social media were the key services. 5G was supposed to be designed for enterprise and industry connectivity but failed to deliver so far (late implementation of slicing and 5G Stand Alone). So... what do we do now?

First, the economic model has to change. Rationally, it is not economically efficient for 4 or 5 MNOs to buy spectrum and deploy their separate networks to cover the same population. We are seeing more and more network sharing agreements, but we must go further. In many countries, it makes more sense to have a single neutral infrastructure operator, including the cell sites, radio, the fiber backhaul even edge data centers / central offices all the way but not including the core. This neutral host can have an economic model based on wholesale and the MNOs can focus on selling connectivity products.

Of course, this would probably suppose some level of governmental and regulatory overhaul to facilitate this model. Obviously, one of the problems here is that many MNOs would have to transfer assets and more importantly personnel to that neutral host, which would undoubtedly see much redundancy from 3 or 4 teams to one. Most economically advanced countries have unions protecting these jobs, so this transition is probably impossible unless a concerted effort to cap hires / not renew retirement departures / retrain people is effected over many years...

The other part of the equation is the connectivity and digital services themselves. Let's face it, connectivity differentiation has mostly been a pricing and bundling exercise to date. MNOs have not been overly successful with the creation and sale of digital services, the emergence of social media, video streaming services having occupied most of the consumer's interest. On the enterprise's side a large part of the revenue is related to the exploitation of the last mile connectivity, with the sale of secure private connections on public networks in the form of MPLS first then SD-WAN to SASE and cloud interconnection as the main services. Gen AI promises to be the new shining beacon of advanced services, but in truth, there is very little there in the short term in terms of differentiation for MNOs. 

There is nothing wrong with being a very good, cost effective, performant utility connectivity provider. But most markets can probably accommodate only one or two of these. Other MNOs, if they want to survive, must create true value in the form of innovative connectivity services. This supposes not only a change of mindset but also skill set. I think MNOs need to look beyond the next technology, the next G and evolve towards a more innovative model. I have worked on many of these, from the framework to the implementation and systematic creation of sustainable competitive advantage. It is quite different work from standards and technology evolution approach favored by MNOs but necessary for these seeking to escape the utility model.

In conclusion, 6G or technological improvements in speed, capacity, coverage, latency... are unlikely to solve the systemic economical and differentiation problem for MNOs unless more effort is put on service innovation and radical infrastructure sharing.

Thursday, August 8, 2024

The journey to automated and autonomous networks

 

The TM Forum has been instrumental in defining the journey towards automation and autonomous telco networks. 

As telco revenues from consumers continue to decline and the 5G promise to create connectivity products that enterprises, governments and large organizations will be able to discover, program and consume remains elusive, telecom operators are under tremendous pressure to maintain profitability.

The network evolution started with Software Defined Networks, Network Functions Virtualization and more recently Cloud Native evolution aims to deliver network programmability for the creation of innovative on-demand connectivity services. Many of these services require deterministic connectivity parameters in terms of availability, bandwidth, latency, which necessitate end to end cloud native fabric and separation of control and data plane. A centralized control of the cloud native functions allow to abstract resource and allocate them on demand as topology and demand evolve.

A benefit of a cloud native network is that, as software becomes more open and standardized in a multi vendor environment, many tasks that were either manual or relied on proprietary interfaces can now be automated at scale. As layers of software expose interfaces and APIs that can be discovered and managed by sophisticated orchestration systems, the network can evolve from manual, to assisted, to automated, to autonomous functions.


TM Forum defines 5 evolution stages from full manual operation to full autonomous networks.

  • Condition 0 - Manual operation and maintenance: The system delivers assisted monitoring capabilities, but all dynamic tasks must be 0 executed manually
  • Step 1 - Assisted operations and maintenance: The system executes a specific, repetitive subtask based on pre-configuration, which can be recorded online and traced, in order to increase execution efficiency.
  • Step 2: - Partial autonomous network: The system enables closed-loop operations and maintenance for specific units under certain external environments via statically configured rules.
  • Step 3 - Conditional autonomous network: The system senses real-time environmental changes and in certain network domains will optimize and adjust itself to the external environment to enable, closed-loop management via dynamically programmable policies.
  • Step 4 - Highly autonomous network: In a more complicated cross-domain environment, the system enables decision-making based on predictive analysis or active closed-loop management of service-driven and customer experience-driven networks via AI modeling and continuous learning.
  • Step 5 - Fully autonomous network: The system has closed-loop automation capabilities across multiple services, multiple domains (including partners’ domains) and the entire lifecycle via cognitive self-adaptation.
After describing the framework and conditions for the first 3 steps, the TM Forum has recently published a white paper describing the Level 4 industry blueprints.

The stated goals of level 4 are to enable the creation and roll out of new services within 1 week with deterministic SLAs and the delivery of Network as a service. Furthermore, this level should allow fewer personnel to manage the network (1000's of person-year) while reducing energy consumption and improving service availability.

These are certainly very ambitious objectives. The paper goes on to describe "high value scenarios" to guide level 4 development. This is where we start to see cognitive dissonance creeping in between the stated objectives and the methodology.  After all, much of what is described here exists today in cloud and enterprise environments and I wonder whether Telco is once again reinventing the wheel in trying to adapt / modify existing concepts and technologies that are already successful in other environments.

First, the creation of deterministic connectivity is not (only) the product of automation. Telco networks, in particular mobile networks are composed of a daisy chain of network elements that see customer traffic, signaling, data repository, look up, authentication, authorization, accounting, policy management functions being coordinated. On the mobile front, the signal effectiveness varies over time, as weather, power, demand, interferences, devices... impact the effective transmission. Furthermore, the load on the base station, the backhaul, the core network and the  internet peering point also vary over time and have an impact on its overall capacity. As you understand, creating a connectivity product with deterministic speed, latency capacity to enact Network as a Service requires a systemic approach. In a multi vendor environment, the RAN, the transport, the core must be virtualized, relying on solid fiber connectivity as much as possible to enable the capacity and speed. The low latency requires multiple computing points, all the way to the edge or on premise. The deterministic performance requires not only virtualization and orchestration of the RAN, but also the PON fiber and end to end slicing support and orchestration. This is something that I led at Telefonica with an open compute edge computing platform, a virtualized (XGS) PON on a ONF ONOS VOLTHA architecture with an open virtualized RAN. This was not automated yet, as most of these elements were advanced prototype at that stage, but the automation is the "easy" part once you have assembled the elements and operated them manually for enough time. The point here is that deterministic network performances is attainable but still a far objective for most operators and it is a necessary condition to enact NaaS, before even automation and autonomous networks.

Second, the high value scenarios described in the paper are all network-related. Ranging from network troubleshooting, to optimization and service assurance, these are all worthy objectives, but still do not feel "high value" in terms of creation of new services. While it is natural that automation first focuses on cost reduction for roll out, operation, maintenance, healing of network, one would have expected more ambitious "new services" description.

All in all, the vision is ambitious, but there is still much work to do in fleshing out the details and linking the promised benefits to concrete services beyond network optimization.