
In this tutorial, we implement a production-grade, large-scale graph analytics pipeline in NetworKit, focusing on speed, memory efficiency, and version-safe APIs in NetworKit 11.2.1. We generate a large scale-free network, extract the largest connected component, and then compute structural backbone signals via k-core decomposition and centrality ranking. We also detect communities with PLM and quantify quality using modularity; estimate distance structure using effective and estimated diameters; and, finally, sparsify the graph to reduce cost while preserving key properties. We export the sparsified graph as an edgelist so we can reuse it in downstream workflows, benchmarking, or graph ML preprocessing.

!pip -q install networkit pandas numpy psutil


import gc, time, os
import numpy as np
import pandas as pd
import psutil
import networkit as nk


print("NetworKit:", nk.__version__)
nk.setNumberOfThreads(min(2, nk.getMaxNumberOfThreads()))
nk.setSeed(7, False)


def ram_gb():
    p = psutil.Process(os.getpid())
    return p.memory_info().rss / (1024**3)


def tic():
   return time.perf_counter()


def toc(t0, msg):
   print(f"{msg}: {time.perf_counter()-t0:.3f}s | RAM~{ram_gb():.2f} GB")


def report(G, name):
    print(f"\n[{name}] nodes={G.numberOfNodes():,} edges={G.numberOfEdges():,} directed={G.isDirected()} weighted={G.isWeighted()}")


def force_cleanup():
    gc.collect()


PRESET = "LARGE"


if PRESET == "LARGE":
   N = 120_000
   M_ATTACH = 6
   AB_EPS = 0.12
   ED_RATIO = 0.9
elif PRESET == "XL":
   N = 250_000
   M_ATTACH = 6
   AB_EPS = 0.15
   ED_RATIO = 0.9
else:
   N = 80_000
   M_ATTACH = 6
   AB_EPS = 0.10
   ED_RATIO = 0.9


print(f"\nPreset={PRESET} | N={N:,} | m={M_ATTACH} | approx-betweenness epsilon={AB_EPS}")

We set up the Colab environment with NetworKit and monitoring utilities, and we lock in a stable random seed. We configure thread usage to match the runtime and define timing and RAM-tracking helpers for every major stage. We choose a scale preset that controls graph size and approximation knobs so the pipeline stays large but manageable.
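Since a Barabási–Albert graph with attachment parameter m adds roughly m edges per new node, the edge count and a rough memory budget can be estimated before generation. A minimal sketch (the `estimate_ba_edges` helper and the 16-bytes-per-edge figure are illustrative assumptions, not NetworKit internals):

```python
def estimate_ba_edges(n, m_attach):
    # Each of the ~n added nodes contributes ~m_attach edges,
    # so a BA graph ends up with close to n * m_attach edges.
    return n * m_attach

def estimate_edge_memory_gb(n, m_attach, bytes_per_edge=16):
    # Rough adjacency-storage budget, assuming ~16 bytes per stored
    # edge (an illustrative figure, not NetworKit's actual layout).
    return estimate_ba_edges(n, m_attach) * bytes_per_edge / (1024**3)

for preset, (n, m) in {"BASE": (80_000, 6), "LARGE": (120_000, 6), "XL": (250_000, 6)}.items():
    print(f"{preset}: ~{estimate_ba_edges(n, m):,} edges, ~{estimate_edge_memory_gb(n, m):.3f} GB")
```

This kind of back-of-the-envelope check makes it easy to see why the LARGE preset stays well within a Colab runtime's memory.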

t0 = tic()
G = nk.generators.BarabasiAlbertGenerator(M_ATTACH, N).generate()
toc(t0, "Generated BA graph")
report(G, "G")


t0 = tic()
cc = nk.components.ConnectedComponents(G)
cc.run()
toc(t0, "ConnectedComponents")
print("components:", cc.numberOfComponents())


if cc.numberOfComponents() > 1:
   t0 = tic()
   G = nk.components.ConnectedComponents.extractLargestConnectedComponent(G, compactGraph=True)
   toc(t0, "Extracted LCC (compactGraph=True)")
   report(G, "LCC")


force_cleanup()

We generate a large Barabási–Albert graph and immediately log its size and runtime footprint. We compute connected components to understand fragmentation and quickly diagnose topology. We extract the largest connected component and compact it to improve the performance and reliability of the rest of the pipeline.
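Conceptually, `ConnectedComponents` answers the same question a union-find pass over the edge list would; a pure-Python sketch of the idea (not NetworKit's actual implementation) on a tiny graph:

```python
def count_components(num_nodes, edges):
    # Union-find: every node starts in its own component
    # and each edge merges the components of its endpoints.
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    return len({find(u) for u in range(num_nodes)})

# Two triangles plus an isolated node -> 3 components.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
print(count_components(7, edges))  # 3
```

On millions of edges, NetworKit's parallel implementation is the right tool; the sketch only shows what the count means.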

t0 = tic()
core = nk.centrality.CoreDecomposition(G)
core.run()
toc(t0, "CoreDecomposition")
core_vals = np.array(core.scores(), dtype=np.int32)
print("degeneracy (max core):", int(core_vals.max()))
print("core stats:", pd.Series(core_vals).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())


k_thr = int(np.percentile(core_vals, 97))


t0 = tic()
nodes_backbone = [u for u in range(G.numberOfNodes()) if core_vals[u] >= k_thr]
G_backbone = nk.graphtools.subgraphFromNodes(G, nodes_backbone)
toc(t0, f"Backbone subgraph (k>={k_thr})")
report(G_backbone, "Backbone")


force_cleanup()


t0 = tic()
pr = nk.centrality.PageRank(G, damp=0.85, tol=1e-8)
pr.run()
toc(t0, "PageRank")


pr_scores = np.array(pr.scores(), dtype=np.float64)
top_pr = np.argsort(-pr_scores)[:15]
print("Top PageRank nodes:", top_pr.tolist())
print("Top PageRank scores:", pr_scores[top_pr].tolist())


t0 = tic()
abw = nk.centrality.ApproxBetweenness(G, epsilon=AB_EPS)
abw.run()
toc(t0, "ApproxBetweenness")


abw_scores = np.array(abw.scores(), dtype=np.float64)
top_abw = np.argsort(-abw_scores)[:15]
print("Top ApproxBetweenness nodes:", top_abw.tolist())
print("Top ApproxBetweenness scores:", abw_scores[top_abw].tolist())


force_cleanup()

We compute the core decomposition to measure degeneracy and identify the network's high-density backbone. We extract a backbone subgraph using a high core-percentile threshold to focus on structurally important nodes. We run PageRank and approximate betweenness to rank nodes by influence and bridge-like behavior at scale.
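To quantify how much the two centrality rankings agree, a simple top-k set overlap works well; `topk_jaccard` below is a hypothetical helper built on NumPy, not a NetworKit API:

```python
import numpy as np

def topk_jaccard(scores_a, scores_b, k=100):
    # Jaccard similarity of the top-k node sets under two score vectors:
    # 1.0 means identical top-k membership, 0.0 means disjoint.
    top_a = set(np.argsort(-scores_a)[:k].tolist())
    top_b = set(np.argsort(-scores_b)[:k].tolist())
    return len(top_a & top_b) / len(top_a | top_b)

# Toy example: identical score vectors give perfect agreement.
a = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
print(topk_jaccard(a, a, k=2))  # 1.0
```

In the pipeline, `topk_jaccard(pr_scores, abw_scores, 100)` would indicate whether PageRank hubs and betweenness bridges are largely the same nodes.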

t0 = tic()
plm = nk.community.PLM(G, refine=True, gamma=1.0, par="balanced")
plm.run()
toc(t0, "PLM community detection")


part = plm.getPartition()
num_comms = part.numberOfSubsets()
print("communities:", num_comms)


t0 = tic()
Q = nk.community.Modularity().getQuality(part, G)
toc(t0, "Modularity")
print("modularity Q:", Q)


sizes = np.array(list(part.subsetSizeMap().values()), dtype=np.int64)
print("community size stats:", pd.Series(sizes).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())


t0 = tic()
eff = nk.distance.EffectiveDiameter(G, ED_RATIO)
eff.run()
toc(t0, f"EffectiveDiameter (ratio={ED_RATIO})")
print("effective diameter:", eff.getEffectiveDiameter())


t0 = tic()
diam = nk.distance.Diameter(G, algo=nk.distance.DiameterAlgo.ESTIMATED_RANGE, error=0.1)
diam.run()
toc(t0, "Estimated diameter")
print("estimated diameter (lower, upper):", diam.getDiameter())


force_cleanup()

We detect communities using PLM and record the number of communities found on the large graph. We compute modularity and summarize community-size statistics to validate the structure rather than simply trusting the partition. We estimate global distance behavior using the effective diameter and an estimated diameter range in an API-safe way for NetworKit 11.2.1.
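As a sanity check on what the modularity score means, Q can be computed by hand from its definition, Q = Σ_c (e_c/m − (d_c/2m)²), where e_c is the number of intra-community edges and d_c the summed degree of community c. A pure-Python sketch on a toy graph (illustrative only, not NetworKit code):

```python
def modularity(edges, labels):
    # Q = sum over communities c of (e_c / m - (d_c / (2m))^2)
    m = len(edges)
    intra = {}   # e_c: edges with both endpoints in community c
    degree = {}  # d_c: summed degree of community c
    for u, v in edges:
        for node in (u, v):
            degree[labels[node]] = degree.get(labels[node], 0) + 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    comms = set(labels.values())
    return sum(intra.get(c, 0) / m - (degree.get(c, 0) / (2 * m)) ** 2 for c in comms)

# Two triangles joined by a single bridge edge: clearly modular.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, labels), 4))  # 0.3571
```

On the large graph we rely on NetworKit's `Modularity`, but the hand computation makes the quality score interpretable.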

t0 = tic()
sp = nk.sparsification.LocalSimilaritySparsifier()
G_sparse = sp.getSparsifiedGraph(G, 0.7)
toc(t0, "LocalSimilarity sparsification (alpha=0.7)")
report(G_sparse, "Sparse")


t0 = tic()
pr2 = nk.centrality.PageRank(G_sparse, damp=0.85, tol=1e-8)
pr2.run()
toc(t0, "PageRank on sparse")
pr2_scores = np.array(pr2.scores(), dtype=np.float64)
print("Top PR nodes (sparse):", np.argsort(-pr2_scores)[:15].tolist())


t0 = tic()
plm2 = nk.community.PLM(G_sparse, refine=True, gamma=1.0, par="balanced")
plm2.run()
toc(t0, "PLM on sparse")
part2 = plm2.getPartition()
Q2 = nk.community.Modularity().getQuality(part2, G_sparse)
print("communities (sparse):", part2.numberOfSubsets(), "| modularity (sparse):", Q2)


t0 = tic()
eff2 = nk.distance.EffectiveDiameter(G_sparse, ED_RATIO)
eff2.run()
toc(t0, "EffectiveDiameter on sparse")
print("effective diameter (orig):", eff.getEffectiveDiameter(), "| (sparse):", eff2.getEffectiveDiameter())


force_cleanup()


out_path = "/content/networkit_large_sparse.edgelist"
t0 = tic()
nk.graphio.EdgeListWriter("\t", 0).write(G_sparse, out_path)
toc(t0, "Wrote edge list")
print("Saved:", out_path)


print("\nAdvanced large-graph pipeline complete.")

We sparsify the graph using local similarity to reduce the number of edges while retaining useful structure for downstream analytics. We rerun PageRank, PLM, and effective diameter on the sparsified graph to check whether key signals remain consistent. We export the sparsified graph as an edgelist so we can reuse it across sessions, tools, or further experiments.
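With the separator and first-node settings used above, the export should be a plain text edgelist: one tab-separated `u v` pair per line with zero-based node ids. A quick sketch with plain file I/O showing that format (assuming one pair per line, which makes the file consumable even without NetworKit):

```python
import os
import tempfile

# Write a toy graph in the same tab-separated, zero-indexed format.
edges = [(0, 1), (1, 2), (2, 0)]
path = os.path.join(tempfile.mkdtemp(), "toy.edgelist")
with open(path, "w") as f:
    for u, v in edges:
        f.write(f"{u}\t{v}\n")

# Read it back with no graph library at all.
with open(path) as f:
    parsed = [tuple(map(int, line.split("\t"))) for line in f]
print(parsed)  # [(0, 1), (1, 2), (2, 0)]
```

This is why the edgelist format is a convenient interchange point for benchmarking tools and graph ML preprocessing.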

In conclusion, we developed an end-to-end, scalable NetworKit workflow that mirrors real large-network analysis: we started from generation, stabilized the topology with LCC extraction, characterized the structure through cores and centralities, discovered communities and validated them with modularity, and captured global distance behavior through diameter estimates. We then applied sparsification to shrink the graph while keeping it analytically meaningful and saving it for repeatable pipelines. The tutorial provides a practical template we can reuse for real datasets by replacing the generator with an edgelist reader, while keeping the same analysis stages, performance monitoring, and export steps.

