Monday, December 16, 2024

The AI-Native Telco Network II

I have been working on Big Data, Machine Learning, Deep Learning and AI for telco networks for the last 8 years or so. Between interpretative AI, predictive AI and generative AI, we have seen much progress lately, but I think a lot of the discussion about using general-purpose Large Language Models for telco networks does not apply.

Much of the data in telcos, as in government and defense, is proprietary. It is not shared outside the organization and is not exposed to "contamination" from external sources except under very specific conditions, for very limited subsets.


As a result, much of what cloud-based, public LLMs offer is just noise as far as telcos are concerned. The largest opportunity is in proprietary, smaller models, where the algorithm design can be partly outsourced but the storage, processing and training of the model stay in house. This type of sovereign or proprietary AI can better account for the specifics of a network and its users than larger models trained on generic data.


The problem many encounter is that operators don't necessarily have the data literacy or the resources needed to develop the algorithms, or even to format the dataset properly, while specialized vendors may have the AI/ML domain expertise but cannot train their models on real data, since that data is proprietary and stays on-network.


Consequently, telcos are focusing first on the architecture and infrastructure of the data network and pipeline: formatting and scrubbing the dataset, and storing, processing and transmitting the data between on-premise and private environments, as well as the interaction with hybrid / public cloud instances.

Vendors are proposing a variety of solutions promising savings, new revenues and new services, but in many cases these are based on models running on synthetic data, and no one knows what the results will be until the models are tested on the real dataset, tuned and remodeled.

Training models on synthetic data might be necessary for vendors, but it's a bit like training for football in the hope of playing rugby. Sure, some skills are transferable, but even a world-class football player won't make it to professional rugby.

This is where the opportunity lies for operators: recruit and train telco professionals to be data literate, so that they understand how vendors should produce datasets and how to exploit them. This is not a spectator sport where you can just buy solutions off the shelf and let your vendors manage them for you.



Monday, August 26, 2024

Of AI, automation, complex and complicated systems


These days I get drawn into discussions about the sweet spot of AI: what is the best use of AI/ML, what its utility is in generative AI, and how it applies to network automation, optimisation and autonomic functions.

In many cases, these discussions stumble upon misconceptions about the mechanics of statistics and their applications.

To put it simply, many do not distinguish between complexity and complication, which has a great effect on expectations of problem solving, automation and outcome prediction. A complex problem is an assembly of problems that can be broken down into subsets until simple, unique problems can be identified, tagged, troubleshot and resolved. These problems are ideal targets for automation. No matter how complex the task, if it can be broken down, and if a method of procedure (MOP) can be written for each subtask and eventually for the whole problem, it can be measured, automated and predicted, and efficiency gains can be achieved.
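To make the distinction concrete, here is a minimal Python sketch of a problem breakdown as a tree, where a problem counts as an automation candidate only once every leaf subtask has a MOP. The `Task` class, the `missing_mops` and `is_automatable` helpers and the MOP identifiers are hypothetical, chosen for illustration rather than taken from any tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """A node in a problem breakdown; leaf tasks may carry a MOP reference."""
    name: str
    mop: Optional[str] = None  # hypothetical identifier of a written MOP
    subtasks: list[Task] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.subtasks


def missing_mops(task: Task) -> list[str]:
    """Depth-first walk returning the leaf tasks that still lack a MOP."""
    if task.is_leaf():
        return [] if task.mop else [task.name]
    gaps: list[str] = []
    for sub in task.subtasks:
        gaps.extend(missing_mops(sub))
    return gaps


def is_automatable(task: Task) -> bool:
    """A complex problem becomes an automation candidate once every leaf has a MOP."""
    return not missing_mops(task)


# Usage: one leaf still has no written procedure.
outage = Task("cell site degradation", subtasks=[
    Task("check alarms", mop="MOP-001"),
    Task("verify backhaul", subtasks=[
        Task("ping transport node", mop="MOP-014"),
        Task("check link utilisation"),
    ]),
])
print(is_automatable(outage))  # False
print(missing_mops(outage))    # ['check link utilisation']
```

The useful output is less the yes/no answer than the list of gaps: each missing MOP is a piece of operational knowledge that has to be written down before the problem can be automated end to end.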

Complicated problems are a different animal altogether. They might have subtasks that can be identified and broken down, but also other parts with a high degree of unknowns and uncertainty.

Large Language Models can try to reduce the uncertainty with larger samples, allowing even outlier patterns to emerge and be identified, but in many cases complicated problems have dependencies that cannot easily be resolved from a purely mathematical standpoint.

This is where domain expertise comes in. In many cases, when issues arise in a telecoms network, the symptoms do not immediately point to the source of the issue. Troubleshooting often requires knowledge of network topology, call flows and protocols, and multi-domain expertise across core, transport, access, peering points, connectivity, data centers...

It is not possible to automate what you do not operate well. You can't operate a system well if you can't measure it well, and you can't measure a system well without a consolidated data storage and management strategy. In many cases, telco systems still produce logs in proprietary formats on siloed systems, and collecting, cleaning, exporting, processing and storing this data in a fully integrated data system is still in its infancy. This is, however, the very first step, before the categorization into complex or complicated issues can even take place.
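As an illustration of that first step, the sketch below maps log lines from two siloed sources into one common event schema. Both log formats, the `Event` fields and the parser names are assumptions invented for the example; real vendor formats will differ and usually need far more defensive parsing.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common schema every source is mapped into before storage."""
    timestamp: datetime  # timezone-aware, UTC
    domain: str          # e.g. "ran", "core", "transport"
    element: str         # network element identifier
    event: str           # normalised event name


def parse_vendor_a(line: str) -> Event:
    # Hypothetical format: "2024-08-26T10:15:00Z|RAN|cell-0412|LINK_DOWN"
    ts, domain, element, event = line.strip().split("|")
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return Event(when, domain.lower(), element, event.lower())


def parse_vendor_b(line: str) -> Event:
    # Hypothetical format: "ts=1724667300 dom=core node=mme-03 evt=attach_fail"
    fields = dict(pair.split("=", 1) for pair in line.split())
    when = datetime.fromtimestamp(int(fields["ts"]), tz=timezone.utc)
    return Event(when, fields["dom"].lower(), fields["node"], fields["evt"].lower())


PARSERS = {"vendor_a": parse_vendor_a, "vendor_b": parse_vendor_b}


def normalise(source: str, lines: list[str]) -> list[Event]:
    """Parse raw lines from one siloed source, skipping lines that fail to parse."""
    parser = PARSERS[source]
    events = []
    for line in lines:
        try:
            events.append(parser(line))
        except (ValueError, KeyError):
            # A production pipeline would quarantine these rather than drop them.
            continue
    return events


raw_a = ["2024-08-26T10:15:00Z|RAN|cell-0412|LINK_DOWN"]
raw_b = ["ts=1724667300 dom=core node=mme-03 evt=attach_fail", "corrupted line"]
events = sorted(normalise("vendor_a", raw_a) + normalise("vendor_b", raw_b),
                key=lambda e: e.timestamp)
```

Keeping one parser per source behind a single `normalise` entry point means a new silo only needs a new parser, while storage and analytics downstream keep working against the same schema.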

In many cases, data literacy needs to pervade the entire organization before a data-driven strategy can be enacted, let alone a move to automation, autonomic or predictive AI systems.

It therefore becomes very important to isolate complex systems and issues from complicated ones, and to apply as much data science and automation as possible to the former before trying to force AI/ML onto the latter. As a rule of thumb, as the number of tasks or variables and the complexity increase, one can move from optimization using scripting, to automation using scripting + ML, to prediction using AI/ML. As the number of unknowns and the complication increase, one has to move from subject matter experts, to domain experts, to multi-domain experts with an end-to-end view of the system.
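The rule of thumb can be encoded as a small decision function, as a rough sketch. The text gives a direction of travel rather than numbers, so the cut-offs below, and the idea of counting "unknowns" as unresolved dependencies, are purely illustrative assumptions.

```python
from enum import Enum


class Approach(Enum):
    OPTIMISATION = "optimisation (scripting)"
    AUTOMATION = "automation (scripting + ML)"
    PREDICTION = "prediction (AI/ML)"
    SUBJECT_MATTER_EXPERT = "subject matter expert"
    DOMAIN_EXPERT = "domain expert"
    MULTI_DOMAIN_EXPERT = "multi-domain expert, end-to-end view"


# Illustrative cut-offs only, not values from the text.
TASK_CUTOFFS = (10, 100)   # number of tasks / variables
UNKNOWN_CUTOFFS = (1, 3)   # number of unresolved dependencies


def suggest(tasks: int, unknowns: int) -> Approach:
    """Map a problem's size and uncertainty to a suggested approach."""
    if unknowns == 0:
        # Complex but fully decomposable: scale the tooling with the task count.
        if tasks < TASK_CUTOFFS[0]:
            return Approach.OPTIMISATION
        if tasks < TASK_CUTOFFS[1]:
            return Approach.AUTOMATION
        return Approach.PREDICTION
    # Complicated: unknowns shift the work toward broader human expertise.
    if unknowns <= UNKNOWN_CUTOFFS[0]:
        return Approach.SUBJECT_MATTER_EXPERT
    if unknowns <= UNKNOWN_CUTOFFS[1]:
        return Approach.DOMAIN_EXPERT
    return Approach.MULTI_DOMAIN_EXPERT


print(suggest(tasks=50, unknowns=0).value)  # automation (scripting + ML)
print(suggest(tasks=5, unknowns=4).value)   # multi-domain expert, end-to-end view
```

Testing for unknowns before looking at the task count mirrors the advice to separate complicated issues first: only problems with no unresolved dependencies are handed to scripting and ML.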

As complications and tasks increase, the possibility of achieving autonomous systems decreases, as human expertise and manual intervention increase. Data science becomes less an operator than an attendant or assistant, detecting and automating the subset of tasks with identified outcomes and patterns, and accelerating the resolution of the more complicated problem.
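A minimal sketch of data science in that assistant role: incidents whose signature matches a known pattern are routed to a scripted runbook, and everything else is escalated to human experts. The pattern catalogue, runbook names and incident tuples are hypothetical.

```python
from __future__ import annotations

# Hypothetical catalogue: signatures with an identified outcome and a scripted fix.
KNOWN_PATTERNS = {
    ("ran", "link_down"): "restart-transport-port",
    ("core", "attach_fail"): "reload-subscriber-profile",
}


def triage(incidents: list[tuple[str, str]]) -> dict[str, list]:
    """Split incidents into automatable ones (known pattern) and escalations."""
    automated, escalated = [], []
    for domain, event in incidents:
        runbook = KNOWN_PATTERNS.get((domain, event))
        if runbook:
            automated.append((domain, event, runbook))
        else:
            escalated.append((domain, event))  # left to domain experts
    return {"automated": automated, "escalated": escalated}


def automation_share(routes: dict[str, list]) -> float:
    """Fraction of incidents handled without human intervention."""
    total = len(routes["automated"]) + len(routes["escalated"])
    return len(routes["automated"]) / total if total else 0.0


incidents = [("ran", "link_down"), ("core", "attach_fail"),
             ("transport", "jitter_spike"), ("ran", "link_down")]
routes = triage(incidents)
print(automation_share(routes))  # 0.75

harder = incidents + [("peering", "latency_spike"), ("datacenter", "packet_loss")]
print(automation_share(triage(harder)))  # 0.5
```

Adding incidents with no identified pattern, such as the peering and data center examples, drops the automated share from 0.75 to 0.5 in this toy case: the more complicated the incident mix, the larger the human part.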