
Thursday, July 31, 2025

The Orchestrator Conundrum strikes again: Open RAN vs AI-RAN

10 years ago (?!) I wrote about the overlaps and potential conflicts between the different orchestration efforts in SDN and NFV. Essentially, I observed that, ideally, network resources should be orchestrated with awareness of services, and that service and resource orchestration should have hierarchical and prioritized interactions, so that a service's deployment and lifecycle are managed within resource capacity and, when that capacity fluctuates, priorities can be enforced.

Service orchestrators have never really been successfully deployed at scale, for a variety of reasons, but primarily because this control point was identified early on as strategic by network operators and traditional network vendors. A few network operators attempted to create an open source orchestration model (Open Source MANO), while traditional telco equipment vendors developed their own versions and refused to integrate their network functions with the competition's. In the end, most of the actual implementation focused on Virtual Infrastructure Management (VIM) and vertical VNF management, while orchestration remained fairly proprietary per vendor. Ultimately, Cloud Native Network Functions appeared and were deployed on Kubernetes, inheriting its native resource management and orchestration capabilities.

In the last couple of years, Open RAN has attempted to collapse RAN Element Management Systems (EMS), Self Organizing Networks (SON) and Operation Support Systems (OSS) into the concept of Service Management and Orchestration (SMO). Its aim is, ostensibly, to provide a control platform for RAN infrastructure and services in a multivendor environment. The non real time RAN Intelligent Controller (RIC) is one of its main artefacts, allowing the deployment of rApps designed to visualize, troubleshoot, provision, manage, optimize and predict RAN resources, capacity and capabilities.

This time around, the concept of SMO has gained substantial ground, mainly because the leading traditional telco equipment manufacturers were not OSS / SON leaders, and because orchestration was an easy target for non RAN vendors looking for a greenfield opportunity.

As we have seen, whether for MANO or SMO, the barriers to adoption weren't really technical but rather economic and commercial, as leading vendors were trying to protect their business while growing into adjacent areas.

Recently, AI-RAN has emerged as an interesting initiative, positing that RAN compute will evolve from specialized, proprietary and closed to generic, open and disaggregated. Specifically, RAN compute could evolve from specialized silicon to GPUs. GPUs are able to handle the complex calculations necessary to manage a RAN workload, with spare capacity. Their cost, however, greatly outweighs their utility if they are used exclusively for RAN. Since GPUs are used in all sorts of high compute environments to facilitate machine learning, artificial intelligence, large and small language models, and model training and inference, the idea emerged that if the RAN deploys open, generic compute, it could be used for RAN workloads (AI for RAN), for workloads that optimize the RAN (AI on RAN), and ultimately for AI/ML workloads completely unrelated to RAN (AI and RAN).

While this could theoretically solve the business case of deploying costly GPUs in hundreds of thousands of cell sites, provided that the idle compute capacity can be resold as GPUaaS or AIaaS, it poses new challenges from a service / infrastructure orchestration standpoint. The AI-RAN Alliance is faced with understanding the orchestration challenges between resources and AI workloads.
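To make the priority question concrete, here is a minimal, hypothetical sketch of the kind of hierarchical, priority-aware capacity arbitration such an orchestrator would have to perform when RAN demand and GPU capacity fluctuate. The workload names, tiers and units are invented for illustration, not an actual AI-RAN Alliance design:

```python
from dataclasses import dataclass

# Toy priority tiers: RAN traffic must always win, AI workloads absorb the slack.
PRIORITY = {"AI for RAN": 0, "AI on RAN": 1, "AI and RAN": 2}

@dataclass
class Workload:
    name: str
    tier: str       # one of the PRIORITY keys
    demand: float   # GPU capacity requested (arbitrary units)

def allocate(capacity: float, workloads: list[Workload]) -> dict[str, float]:
    """Grant capacity strictly by tier; lower tiers are squeezed first
    when RAN demand rises or overall capacity fluctuates."""
    grants = {}
    remaining = capacity
    for w in sorted(workloads, key=lambda w: PRIORITY[w.tier]):
        grants[w.name] = min(w.demand, remaining)
        remaining -= grants[w.name]
    return grants

# Busy hour: the RAN workload grows and the tenant AI job is preempted.
jobs = [Workload("beamforming", "AI for RAN", 60),
        Workload("traffic steering rApp", "AI on RAN", 20),
        Workload("tenant LLM inference", "AI and RAN", 40)]
print(allocate(100, jobs))
# {'beamforming': 60.0, 'traffic steering rApp': 20.0, 'tenant LLM inference': 20.0}
```

The hard part, of course, is that in a real network this arbitration spans the SMO, Kubernetes and whatever AI workload scheduler sits above the GPUs, which is precisely the conundrum.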

In an Open RAN environment, the near real time and non real time RICs deploy xApps and rApps. The orchestration of the apps, services and resources is managed by the SMO. While not every app can be categorized as "AI", it is likely that the SMO will take responsibility for AI for RAN and AI on RAN orchestration. If AI and RAN requires its own orchestration beyond Kubernetes, it is unlikely to live in isolation from the SMO.

From my perspective, the multiplication of orchestration, policy management and enforcement points will not allow a multi vendor environment for the control plane. Architecture and interfaces are still in flux, and specialty vendors will have trouble imposing their perspective without control of the end to end architecture. As a result, it is likely that the same vendor will provide the SMO, the non real time RIC and the AI-RAN orchestration functions (you know my feelings about the near real time RIC).

If you make the Venn diagram of vendors providing / investing in all three, you will have a good idea of the direction the implementation will take.

Wednesday, April 16, 2025

Is AI-RAN the future of telco?

AI-RAN has emerged recently as an interesting evolution of telecom networks. The Radio Access Network (RAN) has been undergoing a transformation over the last 10 years, from a vertical, proprietary, highly concentrated market segment to a disaggregated, virtualized, cloud native ecosystem.

The product of the maturation of a number of technologies, including telco cloudification, RAN virtualization, Open RAN and lately AI/ML, AI-RAN has been positioned as a means to further disaggregate and open up the RAN infrastructure.

This latest development has to be examined from an economic standpoint. RAN accounts for roughly 80% of a telco's deployment costs (excluding licenses, real estate...), and roughly 80% of those costs are attributable to the radios themselves and their electronics. The market is dominated by a few vendors, and telecom operators are exposed to substantial supply chain risks and reduced purchasing power.

The AI-RAN Alliance was created in 2024 to accelerate its adoption. It is led by network operators (T-Mobile, Softbank, Boost Mobile, KT, LG Uplus, SK Telecom...) and telecom and IT vendors (Nvidia, arm, Nokia, Ericsson, Samsung, Microsoft, Amdocs, Mavenir, Pure Storage, Fujitsu, Dell, HPE, Kyocera, NEC, Qualcomm, Red Hat, Supermicro, Toyota...).

If you are familiar with this blog, you already know of the evolution from RAN to cloud RAN and Open RAN, and more recently the forays into RAN intelligence with the early implementations of the near and non real time RAN Intelligent Controllers (RIC).

AI-RAN goes one step further, proposing that the specialized electronics and software traditionally embedded in RAN radios be deployed on high compute, GPU based commercial off the shelf servers, and that these GPUs both manage the complex RAN computations (beamforming management, spectrum and power optimization, waveform management...) and double as a general high compute environment for AI/ML applications that would benefit from deployment in the RAN (video surveillance; scene, object and biometrics recognition; augmented / virtual reality; real time digital twins...). It is very similar to the early edge computing market.

The potential success of AI-RAN relies on a number of techno / economic assumptions:

For Operators:

  • It is desirable to be able to deploy RAN management, analytics, optimization, prediction, automation algorithms in a multivendor environment that will provide deterministic, programmable results.
  • Network operators will be able and willing to actively configure, manage and tune RAN parameters.
  • Deployment of AI-RAN infrastructure will be profitable (a combination of compute costs being offset by optimization-driven cost reductions and new service opportunities).
  • AI-RAN will, in time, match or exceed traditional architectures in power consumption, density, capacity and performance.
  • Network operators will be able to accurately predict demand and deploy infrastructure in time, and in the right locations, to capture it.
  • Network Operators will be able to budget the CAPEX / OPEX associated with this investment before revenue materialization.
  • An ecosystem of vendors will develop, reducing supply chain risks.

For vendors:

  • RAN vendors will open their infrastructure and permit third parties to deploy AI applications.
  • RAN vendors will let operators and third parties program the RAN infrastructure.
  • There is sufficient market traction to productize AI-RAN.
  • The rate of development of AI and GPU technologies will outpace that of traditional architectures.
  • The cost of roadmap disruption and increased competition will be outweighed by new revenues, or is simply the cost of survival.
  • AI-RAN represents an opportunity for new vendors to emerge and focus on very specific aspects of the market demand without having to develop full stack solutions.

For customers:

  • There will be a market and demand for AI as a Service, whereby enterprises and verticals will want to use telco infrastructure that provides unique computing and connectivity benefits over on-premises or public cloud solutions.
  • There are AI/ML services that (will) necessitate high performance computing environments, with guaranteed, programmable connectivity, and with a cost profile that is better mutualized through a multi tenant environment.
  • Telecom operators are the best positioned to understand and satisfy the needs of this market.
  • Security, privacy, residency, performance and reliability will be at least equivalent to on-premises or cloud solutions, with a cost / performance benefit.
As the market develops, new assumptions are added every day. The AI-RAN Alliance has defined three general groups to create the framework to validate them:
  1. AI for RAN: AI to improve RAN performance. This group focuses on how to program and optimize the RAN with AI. The expectation is that this work will drastically reduce the cost of RAN, while allowing sophisticated spectrum, radio wave and traffic manipulations for specific use cases.
  2. AI and RAN: Architecture to run AI and RAN on the same infrastructure. This group must find the multitenant architecture allowing the system to develop into a platform able to host a variety of AI workloads concurrently with the RAN. 
  3. AI on RAN: AI applications to run on RAN infrastructure. This is the most ambitious and speculative group, defining the requirements on the RAN to support the AI workloads that will be defined.
As with telco edge computing and RAN intelligence, while the technological challenges appear formidable, the commercial and strategic implications are likely to dictate whether AI-RAN succeeds. Telecom operators are pushing for its implementation to increase control over RAN spending and user experience, while possibly developing new revenue with the diffusion of AIaaS. Traditional RAN vendors see the nascent technology as a further threat to their capacity to sell programmable networks as black boxes, configured, sold and operated by them. New vendors see an opportunity to step into the RAN market and carve out share at the expense of legacy vendors.

Monday, August 26, 2024

Of AI, automation, complex and complicated systems

 

These days I get drawn into discussions about the soft spot of AI: what the best use of AI/ML is, its utility in generative AI, and its use in network automation, optimisation and autonomic functions.

In many cases, these discussions stumble upon misconceptions about the mechanics of statistics and their applications.

To put it simply, many do not distinguish between complexity and complication, which has a great effect on expectations around problem solving, automation and outcome prediction. A complex problem is an assembly of problems that can be broken down into subsets until simple, unique problems can be identified, tagged, troubleshot and resolved. These problems are ideal targets for automation. No matter how complex the task, if it can be broken down, and if a method of procedure (MOP) can be written for each subtask and eventually for the whole problem, it can be measured, automated and predicted, and efficiency gains can be achieved.
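As a minimal illustration of that idea, a MOP is essentially an ordered list of verifiable subtasks, which is exactly what makes it automatable. The step names and checks below are invented placeholders:

```python
from typing import Callable

# A MOP step pairs an action with a verification; automation is just
# running the steps in order and stopping when a check fails.
Step = tuple[str, Callable[[], None], Callable[[], bool]]

def run_mop(steps: list[Step]) -> bool:
    for name, action, check in steps:
        action()
        if not check():
            print(f"Step '{name}' failed verification; escalating to a human.")
            return False
        print(f"Step '{name}' completed and verified.")
    return True

# Hypothetical subtasks of a broken-down complex problem.
steps = [
    ("backup config",  lambda: None, lambda: True),
    ("apply change",   lambda: None, lambda: True),
    ("verify service", lambda: None, lambda: True),
]
run_mop(steps)
```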

Complicated problems are a different animal altogether. They might have subtasks that can be identified and broken down, but other parts carry a large degree of unknowns and uncertainty.

Large Language Models can try to reduce the uncertainty by using larger samples, enabling even outlier patterns to emerge and be identified, but in many cases complicated problems have dependencies that cannot be easily resolved from a purely mathematical standpoint.

This is where domain expertise comes in. In many cases, when an issue arises in a telecoms network, it does not necessarily manifest at its source. Troubleshooting often requires knowledge of network topology, call flows and protocols, and multi domain expertise across core, transport, access, peering points, connectivity, data centers...

It is not possible to automate what you do not operate well. You can't operate well a system that you can't measure well, and you can't measure well a system without a consolidated data storage and management strategy. In many cases, telco systems still produce logs in proprietary formats, on siloed systems, and collecting, cleaning, exporting, processing and storing these data in a fully integrated data system is still in its infancy. This is, however, the very first step before even the categorization into complex or complicated issues can take place.

In many cases, data literacy needs to pervade the entire organization to ensure that a data-driven strategy can be enacted, let alone a move to automation, autonomic or AI predictive systems.

It therefore becomes very important to try to isolate complex from complicated systems and issues, and to apply as much data science and automation as possible to the former before trying to force AI/ML onto the latter. As a rule of thumb, as the number of tasks or variables increases, one can move from optimization using scripting, to automation using scripting + ML, to prediction using AI/ML. As the number of unknowns and the complication increase, one has to move from subject matter experts and domain experts to multi domain experts with an end to end view of the system.
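A toy triage helper to make the rule of thumb explicit; the thresholds and labels are entirely invented for illustration:

```python
def triage(num_tasks: int, unknowns: int) -> str:
    """Map a problem's scale and uncertainty to a rough approach,
    following the rule of thumb above; thresholds are illustrative."""
    if unknowns > 10:
        return "complicated: bring in multi domain experts, automate only sub-pockets"
    if unknowns > 3:
        return "complicated-ish: domain experts assisted by anomaly detection"
    if num_tasks > 100:
        return "complex at scale: prediction with AI/ML"
    if num_tasks > 10:
        return "complex: automation with scripting + ML"
    return "simple: optimization with scripting"

print(triage(num_tasks=50, unknowns=1))  # complex: automation with scripting + ML
```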

As complication and the number of tasks increase, the possibility of achieving autonomous systems decreases, as human expertise and manual intervention increase. Data science becomes less an operator than an attendant or assistant, detecting and automating the subset of tasks with identified outcomes and patterns, and accelerating the resolution of the more complicated problems.

Thursday, August 8, 2024

The journey to automated and autonomous networks

 

The TM Forum has been instrumental in defining the journey towards automation and autonomous telco networks. 

As telco revenues from consumers continue to decline and the 5G promise to create connectivity products that enterprises, governments and large organizations will be able to discover, program and consume remains elusive, telecom operators are under tremendous pressure to maintain profitability.

The network evolution that started with Software Defined Networks and Network Functions Virtualization, and more recently the cloud native evolution, aims to deliver network programmability for the creation of innovative, on-demand connectivity services. Many of these services require deterministic connectivity parameters in terms of availability, bandwidth and latency, which necessitate an end to end cloud native fabric and the separation of control and data planes. Centralized control of the cloud native functions allows resources to be abstracted and allocated on demand as topology and demand evolve.

A benefit of a cloud native network is that, as software becomes more open and standardized in a multi vendor environment, many tasks that were either manual or relied on proprietary interfaces can now be automated at scale. As layers of software expose interfaces and APIs that can be discovered and managed by sophisticated orchestration systems, the network can evolve from manual, to assisted, to automated, to autonomous functions.


The TM Forum defines six levels (0 to 5) of evolution, from fully manual operation to fully autonomous networks.

  • Level 0 - Manual operation and maintenance: The system delivers assisted monitoring capabilities, but all dynamic tasks must be executed manually.
  • Level 1 - Assisted operations and maintenance: The system executes a specific, repetitive subtask based on pre-configuration, which can be recorded online and traced, in order to increase execution efficiency.
  • Level 2 - Partial autonomous network: The system enables closed-loop operations and maintenance for specific units under certain external environments via statically configured rules.
  • Level 3 - Conditional autonomous network: The system senses real-time environmental changes and, in certain network domains, optimizes and adjusts itself to the external environment to enable closed-loop management via dynamically programmable policies.
  • Level 4 - Highly autonomous network: In a more complicated cross-domain environment, the system enables decision-making based on predictive analysis or active closed-loop management of service-driven and customer experience-driven networks via AI modeling and continuous learning.
  • Level 5 - Fully autonomous network: The system has closed-loop automation capabilities across multiple services, multiple domains (including partners' domains) and the entire lifecycle via cognitive self-adaptation.
After describing the framework and conditions for the first three levels, the TM Forum has recently published a white paper describing the Level 4 industry blueprints.

The stated goals of Level 4 are to enable the creation and roll out of new services within one week, with deterministic SLAs, and the delivery of Network as a Service. Furthermore, this level should allow fewer personnel to manage the network (saving thousands of person-years) while reducing energy consumption and improving service availability.

These are certainly very ambitious objectives. The paper goes on to describe "high value scenarios" to guide level 4 development. This is where we start to see cognitive dissonance creeping in between the stated objectives and the methodology.  After all, much of what is described here exists today in cloud and enterprise environments and I wonder whether Telco is once again reinventing the wheel in trying to adapt / modify existing concepts and technologies that are already successful in other environments.

First, the creation of deterministic connectivity is not (only) the product of automation. Telco networks, in particular mobile networks, are composed of a daisy chain of network elements that must coordinate customer traffic, signaling, data repositories, lookups, authentication, authorization, accounting and policy management functions. On the mobile front, signal effectiveness varies over time, as weather, power, demand, interference, devices... impact the effective transmission. Furthermore, the load on the base station, the backhaul, the core network and the internet peering points also varies over time and impacts overall capacity.

In other words, creating a connectivity product with deterministic speed, latency and capacity to enact Network as a Service requires a systemic approach. In a multi vendor environment, the RAN, the transport and the core must be virtualized, relying on solid fiber connectivity as much as possible for capacity and speed. Low latency requires multiple computing points, all the way to the edge or on premises. Deterministic performance requires not only virtualization and orchestration of the RAN, but also of the PON fiber, with end to end slicing support and orchestration. This is something I led at Telefonica with an open compute edge computing platform, a virtualized (XGS) PON on an ONF ONOS VOLTHA architecture, and an open virtualized RAN. It was not automated yet, as most of these elements were advanced prototypes at that stage, but automation is the "easy" part once you have assembled the elements and operated them manually for long enough. The point here is that deterministic network performance is attainable but still a distant objective for most operators, and it is a necessary condition for NaaS, before automation and autonomous networks can even be considered.

Second, the high value scenarios described in the paper are all network-related. Ranging from network troubleshooting to optimization and service assurance, these are all worthy objectives, but they still do not feel "high value" in terms of creating new services. While it is natural that automation first focuses on cost reduction for the roll out, operation, maintenance and healing of networks, one would have expected a more ambitious description of "new services".

All in all, the vision is ambitious, but there is still much work to do in fleshing out the details and linking the promised benefits to concrete services beyond network optimization.

Wednesday, January 31, 2024

The AI-Native Telco Network

AI, and more particularly generative AI, has been a big buzzword since the public launch of ChatGPT. The promise of AI to automate and operate complex tasks and systems is pervading every industry, and telecom is not impervious to it.

Most telecom equipment vendors have started incorporating AI, or have at least brushed up their big data / analytics skills in their marketing positioning.
We have even seen a few market acquisitions where AI / automation was an important part of the investment narrative / thesis (HPE / Juniper Networks).
Concurrently, many startups are being founded, or are pivoting towards AI / ML, to take advantage of this investment cycle.

In telecoms, there has been use for big data, machine learning, deep learning and other similar methods for a long time. I was leading such a project at Telefonica in 2016, using advanced prediction algorithms to detect alarming patterns, infer root cause analysis and suggest automated resolutions.

While generative AI is somewhat new, the use of data to analyze, represent and predict network conditions is well known.

AI in telecoms is starting to show some promise, particularly when it comes to network planning, operation, spectrum optimization, traffic prediction and power efficiency. But it comes with a lot of preconditions that are often glossed over by vendors and operators alike.

Like all data dependent technologies, one first has to be able to collect, normalize, sanitize and clean data before storing it for useful analysis. In an environment as idiosyncratic as a telecoms network, this is not an easy task. Not only are networks composed of a mix of appliances, virtual machines and cloud native functions, they have had successive technological generations deployed alongside each other, with different data schemas, protocols, interfaces and repositories, which makes extraction arduous. After that step, normalization is necessary to ensure that the data is represented the same way, with the same attributes, headers... so that it can be exploited. Most vendors have their own proprietary data schemas, or "augment" standards with "enhanced" headers and metadata. In many cases the data needs to be translated into a format that can be normalized for ingestion. Cleaning and sanitizing are necessary to ensure that redundant or outlying data points do not overweight the data set. As always, "garbage in / garbage out" is an important concept to keep in mind.
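As a minimal sketch of what that pipeline stage looks like in practice, assuming two vendors exporting the same KPI under different schemas (the column names and units here are invented for illustration):

```python
import pandas as pd

# Hypothetical per-vendor schemas for the same cell-level KPI export.
vendor_a = pd.DataFrame({"CELL_ID": ["A1", "A2"], "DL_THRPT_KBPS": [52000, 48000]})
vendor_b = pd.DataFrame({"cell": ["B7", "B9"], "dl_throughput_mbps": [61.0, 0.0]})

def normalize_a(df: pd.DataFrame) -> pd.DataFrame:
    # Map vendor A's scheme and units to the common representation.
    return pd.DataFrame({"cell_id": df["CELL_ID"],
                         "dl_mbps": df["DL_THRPT_KBPS"] / 1000})

def normalize_b(df: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"cell_id": df["cell"], "dl_mbps": df["dl_throughput_mbps"]})

merged = pd.concat([normalize_a(vendor_a), normalize_b(vendor_b)], ignore_index=True)
# Sanitize: drop impossible readings before they skew any downstream model.
clean = merged[merged["dl_mbps"] > 0]
print(clean)
```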

These difficult steps are unfortunately not the only prerequisite for an AI native network. The part that is often overlooked is that the network has to be somewhat cloud native to take full advantage of AI. The automation in telecoms networks requires interfaces and APIs to be defined, open and available at every layer, from access to transport to the core, from the physical to the virtual and cloud native infrastructure. NFV, SDN, network disaggregation, open optical, open RAN, service based architecture, … are some of the components that can enable a network to take full advantage of AI. 
Cloud networks and data centers seem to be the first to adopt AI, both for the hosting of the voracious GPUs necessary to train the Large Language Models and for the resale / enablement of AI oriented companies. 

For that reason, greenfield networks recently deployed with state of the art cloud native technologies should be prime candidates for AI / ML based network planning, deployment and optimization. The amount of work necessary for the integration and deployment of AI native functions is objectively much lower than for their incumbent competitors.
We haven't yet seen sufficient evidence, though, that this level of cloud "nativeness" enables mass optimization and automation with AI/ML resulting in massive cost savings, at least in OPEX, creating an unfair competitive advantage over incumbents.

As the industry approaches Mobile World Congress 2024, with companies poised to showcase their AI capabilities, it is crucial to remain cognizant of the necessary prerequisites for these technologies to deliver tangible benefits. Understanding the time and effort required for networks to truly benefit from AI is essential in assessing the realistic impact of these advancements in the telecom sector.

Thursday, August 10, 2023

What RICs and Apps developers need to succeed

 

We have spoken a bit about my perspective on the non and near real time RICs' likely trajectories, and about the value rApps and xApps have for operators and the industry. As I conclude the production of my report and workshop on Open RAN RICs and Apps, after many discussions with the leaders in that field, I have come away with a few conclusions.

There are many parameters to a company's success in telecoms, and in the RIC and Apps area there are at least three key skill sets necessary to make it.

Artificial intelligence is a popular term that many in the industry use as shorthand for their mastery of Excel macro linear projections and forecasts. Data literacy is crucial here, as big data / machine learning / deep learning / artificial intelligence are terms bandied around for marketing purposes. I am not an expert in the matter, but I have a strong feeling that the use cases for these algorithms fall into a few categories. I will try to lay them out in my own terms; apologies in advance to the specialists, as the explanation will be basic and unsophisticated.

  • Anomaly / pattern detection provides a useful alarming system if the system's behavior has a sufficiently long time series and the variance is somewhat reduced or predictable. This does not require more than data knowledge; it is a math problem (see the sketch after this list).
  • Optimization / correction should allow, provided the anomaly / pattern detection is accurate, the pinpointing of specific actions that would produce a specific outcome. This is where RAN knowledge is necessary. It is crucial to be able to identify from the inputs whether the output is accurate and to which element it corresponds. Again, a long time series of corrections / optimizations and their impact / deviation is necessary for the model to be efficient.
  • Prediction / automation is the trickiest part. Ideally, given enough knowledge of the system's patterns, variances and deviations, one can predict its behavior over time with some accuracy, in steady state and when anomalies occur, and take preemptive / corrective action. Drawn to its logical conclusion, full automation and autonomy would be possible. This is where most companies overpromise, in my mind. The system here is a network. Not only is it vast and composed of millions of elements (after all, that is just a computing issue), it is also always changing. This means that there is no steady state and that the time series is a collection of dynamically changing patterns. Achieving full automation under these conditions seems impossible. It is therefore necessary to reframe expectations, especially in a multi vendor environment, and to settle for pockets of automation, with AI/ML augmented, limited automation.
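To illustrate the first category, here is a minimal sketch of time series anomaly detection with a rolling z-score. The data is synthetic and the window and threshold are invented; a real xApp or rApp would obviously operate on actual RAN KPIs:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
# Synthetic KPI: a long, fairly stable time series with one injected anomaly.
kpi = rng.normal(loc=100.0, scale=3.0, size=500)
kpi[420] = 130.0  # e.g. a sudden utilization spike

def rolling_zscore_alarms(series: np.ndarray, window: int = 50, threshold: float = 4.0):
    """Flag points deviating more than `threshold` standard deviations
    from the trailing window's mean; needs a long history to be reliable."""
    alarms = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        z = (series[t] - past.mean()) / (past.std() + 1e-9)
        if abs(z) > threshold:
            alarms.append((t, series[t]))
    return alarms

print(rolling_zscore_alarms(kpi))  # should flag index 420
```

Note that this only works because the synthetic series has a steady state; as argued in the last bullet, real networks rarely do, which is where pure math stops being enough.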

Platform and developer ecosystem management is also extremely important in the RIC and Apps segment if one wants to deploy multi vendor solutions. The dream of being able to instantiate apps from different vendors and orchestrate them harmoniously is impossible without a rich platform with many platform service attributes (lifecycle management, APIs, SDKs, data / messaging bus, orchestration...). This does not necessarily require much RAN knowledge, and this is why we are seeing many new entrants in this field.

The last, but foremost in my mind, is RAN knowledge. The companies developing RAN Intelligent Controllers and apps need a deep understanding of the RAN, its workings and its evolution. Deep knowledge is probably not necessary for the most pedestrian use cases around observability and representation of the health and performance of the system or the network, but any app that expects retro feedback and sends instructions to the lower elements of the architecture needs an understanding not only of the interfaces, protocols and elements but also of their function, interworking and capabilities. If the concept of RICs and Apps is to be successful, several apps will need to be able to run simultaneously, ideally from different vendors. Understanding the real-life consequences of an energy efficiency app and its impact on quality of service, quality of experience and signaling is key in absolute terms. It becomes even more crucial to understand how apps can coexist and, simultaneously or by priority, implement power efficiency, spectrum optimization and handover optimization, for instance. The intricacies of beamforming, beam weights and beam steering in mMIMO systems, together with carrier aggregation and dynamic spectrum sharing, mandate a near / real time control capability. The balance is delicate, and it is unlikely that scheduler priorities could conceivably be affected by an rApp that has little understanding of these problematics. You don't drive a Formula One car while messing with the gear settings.

If you want to know how I rank the market leaders in each of these categories, including Accelleran, Aira technologies, Airhop, Airspan, Capgemini, Cohere technologies, Ericsson, IS - Wireless, Fujitsu, Juniper, Mavenir, Nokia, Northeastern University, NTT DOCOMO, Parallel Wireless, Radisys, Rakuten, Rimedo Labs, Samsung, VIAVI, VMware and others, you'll have to read my report or register for my workshop.

Wednesday, June 21, 2023

Near real time RIC and xApps market considerations

An extract from my upcoming report "Open RAN RIC and Apps 2023"   


As mentioned, near real time RIC and xApp capabilities are today embedded in gNodeB and RU/CU/DU code. The constraints of developing applications that have an actual effect on the RAN within milliseconds pose two main challenges, one technical, the second commercial.

The technical challenge associated with the development and roll out of xApps and the near real time RIC itself is related to the RAN scheduler. The scheduler, within the radio architecture, is extremely processing intensive and is responsible, among other operations, for the real time encoding and decoding of the uplink and downlink radio signals.

Running on the MAC layer, concurrently with the L1/PHY and RLC, the scheduler reads data from the upstream RLC and transmits it to the downstream PHY. The scheduler effectively determines the number of bytes to transmit to each UE in real time.
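As a rough illustration of the kind of decision the scheduler makes every transmission interval, here is a toy proportional-fair allocation with invented numbers; this is a textbook sketch, not an actual gNodeB implementation:

```python
# Toy proportional-fair choice: each TTI, serve the UE with the best ratio of
# instantaneous channel quality to its historical average throughput.
ues = {"ue1": {"rate": 40.0, "avg": 10.0},   # Mbps achievable now / served so far
       "ue2": {"rate": 12.0, "avg": 2.0},
       "ue3": {"rate": 25.0, "avg": 20.0}}

def schedule_tti(ues: dict, alpha: float = 0.1) -> str:
    chosen = max(ues, key=lambda u: ues[u]["rate"] / ues[u]["avg"])
    for u, s in ues.items():
        served = s["rate"] if u == chosen else 0.0
        s["avg"] = (1 - alpha) * s["avg"] + alpha * served  # moving average update
    return chosen

for _ in range(3):
    print(schedule_tti(ues))  # ue2 first (best ratio), then priorities rebalance
```

A real scheduler makes this kind of decision every millisecond or less, across hundreds of UEs, which is why any external control loop that touches it must sit extremely close to the DU.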

Since the scheduler is in essence a real time forwarding engine, it is instantiated in the DU, and the fronthaul connectivity to the RU must have less than 1 ms of latency. This stringent latency envelope requires extremely tight integration between the DU, the RU and the near real time RIC (and its associated xApps). While theoretically feasible functionally, the level of integration between all these vendors necessary to realize xApps with the appropriate level of control and performance is generally not there.

The vendors, naturally, prioritize integration between their own products first, and in this case the DU vendors are in control of that value chain.

Understanding that today there is a very limited number of DU vendors, who are all in the process of realizing the first generation of O-RAN implementation and integration, and that all their resources are mobilized on commercial deployments where the priority is a functional, stable and performant implementation of RU, CU and DU, it is not a surprise that we do not see much multi vendor activity on near real time RIC and xApp integration with real RUs, CUs and DUs.

 

While we have several examples of trials with either non MIMO CU/DU/RU or proofs of concept with RU, CU and DU emulators, we are still far from a real end to end implementation close to commercial grade, even in trial situations.

The second impediment to near real time RIC xApp multi vendor implementation is commercial and can be found in the report.

Thursday, July 19, 2018

How Telefonica uses AI / ML to connect the unconnected



This presentation details how Telefonica has been using data science to systematically identify and locate the unconnected, and to evolve its networks and operations to sustainably bring connectivity to the most remote parts of Latin America.

The Internet para Todos project is Telefonica's flagship program to connect the unconnected in LatAm. Today, more than 100 million people live outside of reliable internet connectivity in the Telefonica footprint. The reasons are multiple, ranging from geography and population density to socio-economic conditions.

Fixed and mobile networks have historically been designed for maximum efficiency in dense, urban environments. Deploying these technologies in remote, low density, rural areas is possible but inefficient, which challenges the financial sustainability of the model.

To deliver internet in these environments in a sustainable manner, it is necessary to increase efficiency through systematic cost reduction, investment optimization and targeted deployments.

Systematic optimization necessitates continuous measurement of the financial, operational, technological and organizational data sets.

1. Finding the unconnected
The first challenge the team had to tackle was to understand how many unconnected people there are, and where. The data set was scarce and incomplete, the census was old and the population highly mobile. The team used high definition satellite imagery at the scale of the country and applied neural network models, using census data for training. Implementing visual machine learning algorithms, the model literally counted each house and each settlement across the country. The model was then enriched with cross-referenced coverage data from regulatory sources, as well as Telefonica's proprietary data sets consisting of geolocalized data sessions and deployment maps. The result is a model with a visual representation, providing a map of the population dispersion with superimposed coverage polygons, making it possible to count and localize the unconnected populations with good accuracy (95% of the population, with less than 3% false positives and less than 240 meters of deviation in the location of antennas).
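A minimal sketch of the kind of model involved: a tiny PyTorch CNN classifying satellite image patches as "building / no building". The architecture, patch size and the random tensors standing in for imagery are placeholders for illustration, not Telefonica's actual model:

```python
import torch
import torch.nn as nn

# Tiny binary classifier for 64x64 RGB satellite patches: building vs none.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),
)

# Counting houses then reduces to sliding a window over the country-scale
# mosaic and summing positive patches (here: random tensors as stand-ins).
patches = torch.rand(8, 3, 64, 64)  # a batch of 8 patches
with torch.no_grad():
    predictions = model(patches).argmax(dim=1)
print(f"buildings detected in batch: {int(predictions.sum())}")
```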
2. Optimizing transport
Transport networks are the most expensive part of deploying connectivity to remote areas. Optimizing transport routes has a huge impact on the sustainability of a network, which is why the team selected this as the next challenge to tackle.
The team started by adding road and infrastructure data to the model from public sources, and used graph generation to cluster population settlements. Graph analysis (shortest path, Steiner tree) yielded population density-optimized transport routes.
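A minimal sketch of that graph step with networkx; the nodes and distances are toy values, and the approximation algorithm stands in for whatever the team actually used:

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# Toy road graph: nodes are settlements / junctions, weights are distances in km.
G = nx.Graph()
G.add_weighted_edges_from([
    ("city", "junction1", 40), ("junction1", "village_a", 25),
    ("junction1", "junction2", 30), ("junction2", "village_b", 20),
    ("city", "village_b", 90), ("junction2", "village_c", 15),
])

# Settlements that must be connected to the backbone.
terminals = ["city", "village_a", "village_b", "village_c"]

# Approximate Steiner tree: the cheapest subgraph spanning all terminals,
# optionally passing through intermediate junctions.
route = steiner_tree(G, terminals, weight="weight")
print(sorted(route.edges(data="weight")))
```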

3. AI to optimize network operations
To connect very remote zones, optimizing operations and minimizing maintenance and upgrades is key to a sustainable operational model. This line of work is probably the most ambitious for the team. When it can take 3 hours by plane and 4 days by boat to reach some locations, being able to detect, or better, predict if and when you need to perform maintenance on your infrastructure is critical. Equally important is how you devise your routes so that you are as efficient as possible. In this case, the team built a neural network trained on historical failure analysis and fed with network metrics, providing a model capable of supervising the network's health in an automated manner, with prediction of possible failures and optimized maintenance routes.
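A minimal sketch of that kind of failure predictor, using a scikit-learn MLP on synthetic metrics; the features, thresholds and data are invented placeholders for the historical failure records mentioned above:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)
# Synthetic site metrics: [temperature_C, battery_V, packet_loss_%]
X = rng.normal(loc=[35.0, 48.0, 1.0], scale=[8.0, 2.0, 1.5], size=(1000, 3))
# Toy label: sites running hot with sagging batteries tend to fail.
y = ((X[:, 0] > 42) & (X[:, 1] < 47)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print(f"holdout accuracy: {clf.score(X_test, y_test):.2f}")

# Rank sites by failure probability to prioritize the maintenance route.
risk = clf.predict_proba(X_test)[:, 1]
print("highest-risk sites:", np.argsort(risk)[::-1][:5])
```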
I think that the type of data driven approach to complex problem solving demonstrated in this project is the key to network operators' sustainability in the future. 

It is not only a rural problem: it is necessary to increase efficiency and optimize deployment and operations everywhere to keep decreasing costs.