Why Microservices Are the Next Step for Software-Defined Networks
During my Emerging Network Technologies Master’s class at KFUPM, I had the opportunity to present some really fascinating research on the future of network design. The paper titled Orchestrating Microservice-based SDN Controllers: the MSN Realistic Use Case (Herrera et al.) takes a hard look at a major bottleneck in how we currently deploy software-defined networks. Here is a detailed breakdown of the problem, the math behind the solution, and why it matters.
The Monolithic Bottleneck in SDN
Software-Defined Networking (SDN) is great because it separates the physical switches (the data plane) from the control plane. But if you look at the software architecture of these controllers, it usually consists of two main layers: a runtime on the bottom, and the application plane on top. The issue is that this application plane is still treated as a giant monolithic block.
Traditionally, if you want to run network applications like firewalls, routing, or service discovery, you have to add them directly to the controller’s codebase. This generates a single massive software artifact that must be installed and replicated across every single piece of SDN hardware on the network. It is heavy, rigid, and inefficient.
Moving to Microservices (The MSN Framework)
The obvious fix is to treat network applications as loosely coupled microservices. Using an approach called MSN, the controller runtime (the Event Manager) is totally decoupled from the applications themselves. If you need a firewall in one specific zone, you just deploy that one microservice instance rather than copying the whole monolithic stack everywhere.
This gives us amazing flexibility. You can even run completely different, or even mutually incompatible, network applications on different controllers within the same network.
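To make the decoupling idea concrete, here is a toy sketch (not the paper's actual API; all class and event names are my own illustration) of a controller runtime reduced to a bare Event Manager. Application logic like firewalling or routing lives in separate handlers that register with the runtime, so each zone only deploys the handlers it actually needs:

```python
# Illustrative sketch of the MSN decoupling idea: the controller runtime
# is just an Event Manager that forwards network events to whichever
# application microservices have registered for them. Names are made up.

class EventManager:
    """Minimal publish/subscribe runtime: apps live outside the controller."""

    def __init__(self):
        self._subscribers = {}  # event type -> list of handler callables

    def subscribe(self, event_type, handler):
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, event):
        # Only the microservices deployed on THIS controller see the event.
        return [h(event) for h in self._subscribers.get(event_type, [])]


# One zone deploys only a firewall microservice...
def firewall_app(event):
    return "DROP" if event.get("src") == "10.0.0.99" else "ALLOW"

zone_a = EventManager()
zone_a.subscribe("packet_in", firewall_app)

# ...while another zone runs only a routing app, with no firewall code at all.
def routing_app(event):
    return f"forward via port {len(event['dst']) % 4}"

zone_b = EventManager()
zone_b.subscribe("packet_in", routing_app)

print(zone_a.publish("packet_in", {"src": "10.0.0.99", "dst": "10.0.0.1"}))  # ['DROP']
```

Contrast this with the monolithic model, where both zones would have to ship the firewall and the routing code whether they use them or not.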
The Catch: Orchestration is NP-Hard
But doing this in practice is messy. If you break your application plane into dozens of microservices, how do you know exactly where to put them so you don’t cause massive network latency?
This actually combines two NP-hard problems: distributing the computation across decentralized hardware, and optimizing the traffic routing.
To answer that, the paper proposes an orchestration framework named Grex. It acts as a bridge between the virtualization layer and the physical hardware. The core of Grex is a Mixed-Integer Linear Programming (MILP) optimizer. Its whole job is to figure out the perfect placement to minimize the average latency of all the traffic flows. It calculates this by combining the physical path latency with the control latency, which is the extra time it takes to divert traffic to the controller hosting the specific microservice.
The math also has to respect hard constraints. It ensures it doesn’t exceed the processing limits of the controller hardware or max out the bandwidth capacities of the network links.
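To give a feel for what the optimizer is doing, here is a deliberately tiny brute-force version of the placement problem (Grex itself uses a MILP solver; every number, name, and the detour-latency model below are my own illustrative assumptions). Each flow's cost is its physical path latency plus the control latency of detouring to whichever controller hosts the microservice it needs, and controller CPU limits act as the hard constraint:

```python
# Toy stand-in for the placement optimization Grex solves with MILP.
# All topology data here is invented for illustration.
from itertools import product

controllers = ["c1", "c2"]
cpu_capacity = {"c1": 2, "c2": 1}          # processing units each controller can host
services = {"firewall": 1, "routing": 1}   # microservice -> CPU demand

# Each flow has a fixed physical path latency plus a per-controller
# "detour" cost: the control latency of reaching its microservice.
flows = [
    {"path_ms": 5, "service": "firewall", "detour": {"c1": 2, "c2": 9}},
    {"path_ms": 7, "service": "routing",  "detour": {"c1": 8, "c2": 1}},
    {"path_ms": 4, "service": "firewall", "detour": {"c1": 3, "c2": 10}},
]

def avg_latency(placement):
    """placement: dict mapping each service to its hosting controller."""
    total = sum(f["path_ms"] + f["detour"][placement[f["service"]]] for f in flows)
    return total / len(flows)

best = None
for choice in product(controllers, repeat=len(services)):
    placement = dict(zip(services, choice))
    # Hard constraint: never exceed a controller's processing capacity.
    load = {c: 0 for c in controllers}
    for svc, ctrl in placement.items():
        load[ctrl] += services[svc]
    if any(load[c] > cpu_capacity[c] for c in controllers):
        continue  # infeasible placement, discard
    cost = avg_latency(placement)
    if best is None or cost < best[1]:
        best = (placement, cost)

print(best)  # ({'firewall': 'c1', 'routing': 'c2'}, 7.33...)
```

Brute force is exponential in the number of services, which is exactly why the NP-hardness matters: a real deployment needs the MILP formulation (and, eventually, heuristics) rather than enumeration. A link-bandwidth constraint would slot in the same way as the CPU check above.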
The Testbed Results
What I really liked about this research is that they didn’t just rely on theoretical simulations. They built a real testbed using a 32-core AMD machine running the Mininet emulator and the Ryu SDN framework. They modeled their network topology on a section of the real-world GARR network.
They pushed 40 different traffic flows through the network across four different load matrices, scaling from light to heavy traffic. The MILP model turned out to be highly accurate. The average difference between the simulated round-trip time and the actual testbed measurements was only 18 milliseconds. It also handled bandwidth well, showing a tiny mean difference of just -1.01 Mbit/s between the expected and actual capacities.
In terms of processing time, it took the system about 27 seconds to optimize the lightest traffic matrix, and around 98 seconds for the heaviest, most complex load. Since real-world network management updates usually happen on scheduled cycles—like every few minutes or hours—taking a minute and a half to process the math is completely acceptable for deployment.
Final Thoughts
One important limitation to keep in mind is that Grex assumes the classic Controller Placement Problem (deciding where the physical hardware actually goes) is already solved. Grex just dictates where the software goes on top of that established hardware.
As edge computing and networks become more distributed, we won’t be able to rely on monolithic designs. While the MILP math takes about 90 seconds right now, future work will likely focus on faster heuristic algorithms so this can be applied to massive infrastructures. But overall, frameworks like Grex show that we can actually balance the flexibility of microservices with strict network latency requirements.
References
J. L. Herrera, D. Scotece, J. Galán-Jiménez, J. Berrocal, G. Di Modica, P. Bellavista, and L. Foschini, “Orchestrating Microservice-based SDN Controllers: the MSN Realistic Use Case.”