Thursday, December 19, 2024

The AI-native telco network III

Telecommunications networks have evolved over time to accommodate voice, texts, images, web browsing, video streaming and social media. Radio, transport and core networks have seen radical evolution to accommodate these. Recently, cloud computing has influenced telecom network design, bringing separation of control and user planes, of hardware and software, and centralization of management, configuration, orchestration and administration functions.

Telecom networks have always generated and managed enormous amounts of data, which have historically been stored in local appliances and then offloaded to larger specialized data storage systems for analysis, post-processing and analytics. The journey from the creation of the data to its availability for insight took 5-10 minutes. This was fine as long as data was used for alarming, dashboards and analytics.

Lately, Machine Learning, used to detect patterns in large data sets and to provide actionable insights, has undergone a dramatic acceleration with advances in Artificial Intelligence. AI has changed the way we look at data by opening the promise of network and pattern predictability, automation at scale and, ultimately, autonomous networks. Generative AI, Interpretative AI and Predictive AI are the three main applications of the technology.

Generative AI is able to use natural language as an input and to create text, documentation, pictures, videos, avatars and agents, intuiting the intent behind the prompt by harnessing Large Language Models.

Interpretative AI provides explanation and insight from large datasets, highlighting patterns, correlations and causations that would go unnoticed if the data were processed manually.

Predictive AI draws from time series and correlation pattern analysis to propose predictions on the evolution of these patterns.

Implementing an AI-Native network requires careful consideration: the way data is extracted, collected, formatted, exported and stored before processing has an enormous impact on the quality and precision of the AI output.

To provide its full benefit, AI is necessarily distributed, with Large Language Model training better suited to large compute clusters in private or public clouds, while inference and feedback-loop management are more adequately deployed at the edge of the network, particularly for latency-sensitive services.

In particular, the extraction of the data and the speed of its transmission throughout the compute continuum, from edge to cloud, are crucial to an effective AI-native infrastructure strategy.

In a telecom network, the compute continuum consists of the device accessing the network, the Radio Access Network with its Edge, the Metro and Regional Central Offices, the National Data Centers hosting the Private Cloud and the International Data Centers hosting the Public Clouds.

As network operators examine the implications of running AI in their networks, enhancing, distributing and linking compute, storage and networking throughout the continuum becomes crucial.

Compute is an essential part of the AI equation but it is not the only one. For AI to perform at scale, connectivity and storage architecture are key.

To that end, large investments are being made to deploy advanced GPUs, SmartNICs and next-generation storage from the edge to the cloud, allowing for tiered levels of model training and inference.

One of the applications of AI is the detection of patterns in large data sets, allowing the prediction of an outcome or the generation of an output based on statistical analysis. The larger the datasets, the more precise the pattern detection, the more accurate the prediction, the more human-like the output.

In many cases, AI engines can create extremely good predictions and outputs based on large datasets. The data needs to be accurate but not necessarily recent. Predicting seasonal variations in data traffic in a network, for instance, requires accurate time series, but not up-to-the-minute refreshes.
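As an illustration, seasonality-driven prediction can work from stable history alone. The sketch below uses a simple seasonal-naive method and made-up hourly throughput figures (not any operator's data or any specific model): it forecasts the next day of cell traffic by repeating the last observed daily cycle.

```python
# Minimal seasonal-naive forecast: reuse the value observed one full
# season (here, 24 hours) earlier. Traffic figures are illustrative.

SEASON = 24  # hours in one daily cycle

def seasonal_naive_forecast(series, horizon, season=SEASON):
    """Forecast `horizon` future points by repeating the last season."""
    if len(series) < season:
        raise ValueError("need at least one full season of history")
    last_season = series[-season:]
    return [last_season[h % season] for h in range(horizon)]

# Two days of synthetic hourly traffic (Gbps): low at night, evening peak.
day = [2, 1, 1, 1, 2, 3, 5, 8, 10, 11, 11, 12,
       12, 12, 13, 14, 16, 18, 20, 19, 15, 10, 6, 3]
history = day + day  # two identical days of history

forecast = seasonal_naive_forecast(history, horizon=24)
print(forecast[:6])  # → [2, 1, 1, 1, 2, 3]
```

The point is not the method's sophistication but that its accuracy depends on the fidelity of the historical series, not on how fresh the last sample is.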

However, network automation and the path to autonomy require datasets to be perpetually enriched and refreshed with real-time data streams, enabling fast optimization and adaptation.

Telecom networks are complex, composed of many domains, layers and network functions. While they are evolving towards cloud-native technology, all networks have a certain amount of legacy, deployed in closed appliances, silos or monolithic virtual machines.

To function at scale in its mission of automation towards autonomous networks, AI needs a real-time understanding of the network's state, health and performance across all domains and network functions. The faster data can be extracted and processed, the faster the feedback loop and the reaction to, or anticipation of, network and demand events.
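A closed feedback loop of this kind reduces to a simple pattern: ingest a KPI sample, evaluate it against a policy, emit a remediation. The sketch below uses hypothetical KPI names, thresholds and actions purely for illustration; a real loop would read from a streaming telemetry bus rather than a list.

```python
# Toy closed-loop evaluator: each KPI sample is checked against an
# assumed latency budget; breaches produce a remediation action.

LATENCY_BUDGET_MS = 20.0  # assumed SLA threshold for the service

def evaluate(sample):
    """Return a remediation for one KPI sample, or None if healthy."""
    if sample["latency_ms"] > LATENCY_BUDGET_MS:
        # "scale_out_upf" is a hypothetical action name, for illustration.
        return {"cell": sample["cell"], "action": "scale_out_upf"}
    return None

stream = [
    {"cell": "c1", "latency_ms": 12.0},
    {"cell": "c2", "latency_ms": 27.5},  # breaches the budget
    {"cell": "c3", "latency_ms": 18.9},
]

actions = [a for s in stream if (a := evaluate(s)) is not None]
print(actions)  # one remediation, for cell c2
```

The end-to-end delay between the sample being produced and `evaluate` running on it is exactly the feedback-loop latency the paragraph above describes.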

As AI applications scale, the network infrastructure must handle increased data traffic without compromising performance. High-speed data transmission and low latency are key to maintaining that scalability, and for applications like autonomous vehicles, real-time fraud detection and other AI-driven services, they are what make a seamless, responsive user experience possible. In short, transmission speed and latency largely determine how efficiently and effectively AI-based network automation can operate.

There are several elements that impact latency and data transmission in a telecom network. Among them is how fast traffic can be processed at each stage of the continuum.
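To make that concrete, a back-of-the-envelope latency budget can be summed segment by segment across the continuum described earlier. The per-segment one-way delays below are illustrative assumptions, not measured values:

```python
# Assumed one-way delays (ms) per segment of the compute continuum,
# from the device out to the public cloud. Figures are illustrative.
segments = {
    "device_to_ran": 2.0,
    "ran_to_edge": 1.0,
    "edge_to_metro": 3.0,
    "metro_to_national_dc": 8.0,
    "national_to_public_cloud": 25.0,
}

def one_way_delay(path):
    """Sum the one-way delay of the named segments along a path."""
    return sum(segments[s] for s in path)

edge_path = ["device_to_ran", "ran_to_edge"]  # inference at the edge
cloud_path = list(segments)                   # all the way to public cloud

print(one_way_delay(edge_path))   # → 3.0
print(one_way_delay(cloud_path))  # → 39.0
```

Even with these rough numbers, the arithmetic shows why latency-sensitive inference belongs at the edge: hauling every sample to a public cloud multiplies the one-way delay by more than ten.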

To that end, AI-Native Telco networks have been rethinking the basic architecture and infrastructure necessary for the networking, compute and storage functions.

In subsequent posts, I will examine how the compute, networking and storage functions are evolving to enable networks to move to an AI-Native architecture.

