In this tutorial, we implement an advanced, end-to-end Kornia walkthrough and show how modern, differentiable computer vision can be built entirely in PyTorch. We begin by constructing GPU-accelerated, synchronized augmentation pipelines for images, masks, and keypoints, then move into differentiable geometry by optimizing a homography directly through gradient descent. We also show how learned feature matching with LoFTR integrates with Kornia's RANSAC to estimate robust homographies and produce a simple stitched output, even under constrained or offline-safe conditions. Finally, we ground these ideas in practice by training a lightweight CNN on CIFAR-10 using Kornia's GPU augmentations, highlighting how research-grade vision pipelines translate naturally into learning systems. Check out the FULL CODES here.
import os, math, time, random, urllib.request
from dataclasses import dataclass
from typing import Tuple
import sys, subprocess
def pip_install(pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

pip_install([
    "kornia==0.8.2",
    "torch",
    "torchvision",
    "matplotlib",
    "numpy",
    "opencv-python-headless"
])
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms.functional as TF
import matplotlib.pyplot as plt
import cv2
import kornia
import kornia.augmentation as K
import kornia.geometry.transform as KG
from kornia.geometry.ransac import RANSAC
from kornia.feature import LoFTR
torch.manual_seed(0)
np.random.seed(0)
random.seed(0)
print("Torch:", torch.__version__)
print("Kornia:", kornia.__version__)
print("System:", gadget)We start by organising a totally reproducible surroundings, putting in Kornia and its core dependencies to make sure GPU-accelerated, differentiable pc imaginative and prescient runs easily in Google Colab. We then import and manage PyTorch, Kornia, and supporting libraries, establishing a clear basis for geometry, augmentation, and feature-matching workflows. We set the random seed and choose the out there compute gadget so that every one subsequent experiments stay deterministic, debuggable, and performance-aware. Try the FULL CODES right here.
def to_tensor_img_uint8(img_bgr_uint8: np.ndarray) -> torch.Tensor:
    img_rgb = cv2.cvtColor(img_bgr_uint8, cv2.COLOR_BGR2RGB)
    t = torch.from_numpy(img_rgb).permute(2, 0, 1).float() / 255.0
    return t.unsqueeze(0)

def show(img_t: torch.Tensor, title: str = "", max_size: int = 900):
    x = img_t.detach().float().cpu().clamp(0, 1)
    if x.shape[1] == 1:
        x = x.repeat(1, 3, 1, 1)
    x = x[0].permute(1, 2, 0).numpy()
    h, w = x.shape[:2]
    scale = min(1.0, max_size / max(h, w))
    if scale < 1.0:
        x = cv2.resize(x, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    plt.figure(figsize=(7, 5))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()
def show_mask(mask_t: torch.Tensor, title: str = ""):
    x = mask_t.detach().float().cpu().clamp(0, 1)[0, 0].numpy()
    plt.figure(figsize=(6, 4))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()

def download(url: str, path: str):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)

def safe_download(url: str, path: str) -> bool:
    try:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        return True
    except Exception as e:
        print("Download failed:", e)
        return False

def make_grid_mask(h: int, w: int, cell: int = 32) -> torch.Tensor:
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    m = (((yy // cell) % 2) ^ ((xx // cell) % 2)).float()
    return m.unsqueeze(0).unsqueeze(0)
def draw_matches(img0_rgb: np.ndarray, img1_rgb: np.ndarray, pts0: np.ndarray, pts1: np.ndarray, max_draw: int = 200) -> np.ndarray:
    h0, w0 = img0_rgb.shape[:2]
    h1, w1 = img1_rgb.shape[:2]
    out = np.zeros((max(h0, h1), w0 + w1, 3), dtype=np.uint8)
    out[:h0, :w0] = img0_rgb
    out[:h1, w0:w0 + w1] = img1_rgb
    n = min(len(pts0), len(pts1), max_draw)
    if n == 0:
        return out
    idx = np.random.choice(len(pts0), size=n, replace=False) if len(pts0) > n else np.arange(n)
    for i in idx:
        x0, y0 = pts0[i]
        x1, y1 = pts1[i]
        x1_shift = x1 + w0
        p0 = (int(round(x0)), int(round(y0)))
        p1 = (int(round(x1_shift)), int(round(y1)))
        cv2.circle(out, p0, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.circle(out, p1, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.line(out, p0, p1, (255, 255, 255), 1, lineType=cv2.LINE_AA)
    return out

def normalize_img_for_loftr(img_rgb01: torch.Tensor) -> torch.Tensor:
    if img_rgb01.shape[1] == 3:
        return kornia.color.rgb_to_grayscale(img_rgb01)
    return img_rgb01

We define a set of reusable helper utilities for image conversion, visualization, safe data downloading, and synthetic mask generation, keeping the vision pipeline clean and modular. We also implement robust visualization and matching helpers that let us inspect augmented images, masks, and LoFTR correspondences directly during experimentation. We normalize image inputs to the exact tensor formats expected by Kornia and LoFTR, ensuring that all downstream geometry and feature-matching components operate consistently and correctly. Check out the FULL CODES here.
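As a quick, purely illustrative smoke test of these helpers (not part of the original pipeline), we can render a random image and the synthetic checkerboard mask:

demo_img = torch.rand(1, 3, 128, 192)   # small random RGB tensor in [0, 1]
show(demo_img, "Helper check: random image")
show_mask(make_grid_mask(128, 192, cell=16), "Helper check: grid mask")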
print("n[1] Differentiable augmentations: picture + masks + keypoints")
B, C, H, W = 1, 3, 256, 384
img = torch.rand(B, C, H, W, gadget=gadget)
masks = make_grid_mask(H, W, cell=24).to(gadget)
kps = torch.tensor([[
[40.0, 40.0],
[W - 50.0, 50.0],
[W * 0.6, H * 0.8],
[W * 0.25, H * 0.65],
]], gadget=gadget)
aug = Okay.AugmentationSequential(
Okay.RandomResizedCrop((224, 224), scale=(0.6, 1.0), ratio=(0.8, 1.25), p=1.0),
Okay.RandomHorizontalFlip(p=0.5),
Okay.RandomRotation(levels=18.0, p=0.7),
Okay.ColorJiggle(0.2, 0.2, 0.2, 0.1, p=0.8),
data_keys=["input", "mask", "keypoints"],
same_on_batch=True
).to(gadget)
img_aug, mask_aug, kps_aug = aug(img, masks, kps)
print("picture:", tuple(img.form), "->", tuple(img_aug.form))
print("masks :", tuple(masks.form), "->", tuple(mask_aug.form))
print("kps :", tuple(kps.form), "->", tuple(kps_aug.form))
print("Instance keypoints (earlier than -> after):")
print(torch.cat([kps[0], kps_aug[0]], dim=1))
present(img, "Authentic (artificial)")
show_mask(masks, "Authentic masks (artificial)")
present(img_aug, "Augmented (synced)")
show_mask(mask_aug, "Augmented masks (synced)")We assemble a synchronized, totally differentiable augmentation pipeline that applies the identical geometric transformations to pictures, masks, and keypoints on the GPU. We generate artificial information to obviously reveal how spatial consistency is preserved throughout modalities whereas nonetheless introducing life like variability by means of cropping, rotation, flipping, and shade jitter. We visualize the before-and-after outcomes to confirm that the augmented photos, segmentation masks, and keypoints stay completely aligned after transformation. Try the FULL CODES right here.
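If we later need to apply the exact same sampled transform to an additional tensor (say, a second annotation mask), Kornia keeps the parameters of the most recent call on the pipeline object. A minimal sketch, assuming the `params=` replay behaviour of recent Kornia releases and introducing a hypothetical extra mask:

extra_mask = make_grid_mask(H, W, cell=48).to(device)   # hypothetical second annotation channel
# Replay the last sampled random parameters instead of drawing new ones (assumed API).
img_aug2, extra_mask_aug, kps_aug2 = aug(img, extra_mask, kps, params=aug._params)
print("replayed mask:", tuple(extra_mask_aug.shape))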
print("n[2] Differentiable homography alignment by optimization")
base = torch.rand(1, 1, 240, 320, gadget=gadget)
present(base, "Base picture (grayscale)")
true_H_px = torch.eye(3, gadget=gadget).unsqueeze(0)
true_H_px[:, 0, 2] = 18.0
true_H_px[:, 1, 2] = -12.0
true_H_px[:, 0, 1] = 0.03
true_H_px[:, 1, 0] = -0.02
true_H_px[:, 2, 0] = 1e-4
true_H_px[:, 2, 1] = -8e-5
goal = KG.warp_perspective(base, true_H_px, dsize=(base.form[-2], base.form[-1]), align_corners=True)
present(goal, "Goal (base warped by true homography)")
p = torch.zeros(1, 8, gadget=gadget, requires_grad=True)
def params_to_H(p8: torch.Tensor) -> torch.Tensor:
    Bp = p8.shape[0]
    Hm = torch.eye(3, device=p8.device).unsqueeze(0).repeat(Bp, 1, 1)
    Hm[:, 0, 0] = 1.0 + p8[:, 0]
    Hm[:, 0, 1] = p8[:, 1]
    Hm[:, 0, 2] = p8[:, 2]
    Hm[:, 1, 0] = p8[:, 3]
    Hm[:, 1, 1] = 1.0 + p8[:, 4]
    Hm[:, 1, 2] = p8[:, 5]
    Hm[:, 2, 0] = p8[:, 6]
    Hm[:, 2, 1] = p8[:, 7]
    return Hm
opt = torch.optim.Adam([p], lr=0.08)
losses = []
for step in range(120):
    opt.zero_grad(set_to_none=True)
    H_est = params_to_H(p)
    pred = KG.warp_perspective(base, H_est, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
    loss_photo = (pred - target).abs().mean()
    loss_reg = 1e-3 * (p ** 2).mean()
    loss = loss_photo + loss_reg
    loss.backward()
    opt.step()
    losses.append(loss.item())

print("Final loss:", losses[-1])
plt.figure(figsize=(6, 4))
plt.plot(losses)
plt.title("Homography optimization loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.show()

H_est_final = params_to_H(p.detach())
pred_final = KG.warp_perspective(base, H_est_final, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
show(pred_final, "Recovered warp (optimized)")
show((pred_final - target).abs(), "Abs error (recovered vs target)")
print("True H (pixel):\n", true_H_px.squeeze(0).detach().cpu().numpy())
print("Est H:\n", H_est_final.squeeze(0).detach().cpu().numpy())

We demonstrate that geometric alignment can be treated as a differentiable optimization problem by directly recovering a homography via gradient descent. We first generate a target image by warping a base image with a known homography and then learn the transformation parameters by minimizing a photometric reconstruction loss with regularization. We also visualize the optimized warp and error map to confirm that the estimated homography closely matches the ground-truth transformation. Check out the FULL CODES here.
print("n[3] LoFTR matching + RANSAC homography + stitching (403-safe)")
data_dir = "/content material/kornia_demo"
os.makedirs(data_dir, exist_ok=True)
img0_path = os.path.be part of(data_dir, "img0.png")
img1_path = os.path.be part of(data_dir, "img1.png")
ok0 = safe_download(
"https://uncooked.githubusercontent.com/opencv/opencv/grasp/samples/information/graf1.png",
img0_path
)
ok1 = safe_download(
"https://uncooked.githubusercontent.com/opencv/opencv/grasp/samples/information/graf3.png",
img1_path
)
if not (ok0 and ok1):
print("⚠️ Utilizing artificial fallback photos (no community / blocked downloads)")
base_rgb = torch.rand(1, 3, 480, 640, gadget=gadget)
H_syn = torch.tensor([[
[1.0, 0.05, 40.0],
[-0.03, 1.0, 25.0],
[1e-4, -8e-5, 1.0]
]], gadget=gadget)
t0 = base_rgb
t1 = KG.warp_perspective(base_rgb, H_syn, dsize=(480, 640), align_corners=True)
img0_rgb = (t0[0].permute(1,2,0).detach().cpu().numpy() * 255).astype(np.uint8)
img1_rgb = (t1[0].permute(1,2,0).detach().cpu().numpy() * 255).astype(np.uint8)
else:
    img0_bgr = cv2.imread(img0_path, cv2.IMREAD_COLOR)
    img1_bgr = cv2.imread(img1_path, cv2.IMREAD_COLOR)
    if img0_bgr is None or img1_bgr is None:
        raise RuntimeError("Failed to load downloaded images.")
    img0_rgb = cv2.cvtColor(img0_bgr, cv2.COLOR_BGR2RGB)
    img1_rgb = cv2.cvtColor(img1_bgr, cv2.COLOR_BGR2RGB)
    t0 = to_tensor_img_uint8(img0_bgr).to(device)
    t1 = to_tensor_img_uint8(img1_bgr).to(device)

show(t0, "Image 0")
show(t1, "Image 1")

g0 = normalize_img_for_loftr(t0)
g1 = normalize_img_for_loftr(t1)

loftr = LoFTR(pretrained="outdoor").to(device).eval()
with torch.inference_mode():
    correspondences = loftr({"image0": g0, "image1": g1})

mkpts0 = correspondences["keypoints0"]
mkpts1 = correspondences["keypoints1"]
mconf = correspondences.get("confidence", None)
print("Raw matches:", mkpts0.shape[0])

if mkpts0.shape[0] < 8:
    raise RuntimeError("Too few matches to estimate homography.")

if mconf is not None:
    mconf = mconf.detach()
    topk = min(2000, mkpts0.shape[0])
    idx = torch.topk(mconf, k=topk, largest=True).indices
    mkpts0 = mkpts0[idx]
    mkpts1 = mkpts1[idx]
    print("Kept top matches:", mkpts0.shape[0])
ransac = RANSAC(
    model_type="homography",
    inl_th=3.0,
    batch_size=4096,
    max_iter=10,
    confidence=0.999,
    max_lo_iters=5
).to(device)
with torch.inference_mode():
    H01, inliers = ransac(mkpts0, mkpts1)

print("Estimated H shape:", tuple(H01.shape))
print("Inliers:", int(inliers.sum().item()), "/", int(inliers.numel()))

vis = draw_matches(
    img0_rgb,
    img1_rgb,
    mkpts0.detach().cpu().numpy(),
    mkpts1.detach().cpu().numpy(),
    max_draw=250
)
plt.figure(figsize=(10, 5))
plt.imshow(vis)
plt.axis("off")
plt.title("LoFTR matches (subset)")
plt.show()

H01 = H01.unsqueeze(0) if H01.ndim == 2 else H01
warped0 = KG.warp_perspective(t0, H01, dsize=(t1.shape[-2], t1.shape[-1]), align_corners=True)
stitched = torch.max(warped0, t1)
show(warped0, "Image0 warped into Image1 frame (via RANSAC homography)")
show(stitched, "Simple stitched blend (max)")

We perform learned feature matching with LoFTR to establish dense correspondences between two images, while ensuring robustness through a network-safe fallback mechanism. We then apply Kornia's RANSAC to estimate a stable homography from these matches and warp one image into the coordinate frame of the other. We visualize the correspondences and produce a simple stitched result to validate the geometric alignment end to end. Check out the FULL CODES here.
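The max-blend above is the crudest possible composite; a feathered blend (the first of the "next ideas" listed at the end) warps a white mask with the same homography and uses it as a per-pixel weight. A minimal sketch, assuming `kornia.filters.box_blur` for the seam softening and reusing `t0`, `t1`, `H01`, and `warped0` from above:

ones0 = torch.ones_like(t0[:, :1])                                             # white mask in image-0 frame
w0 = KG.warp_perspective(ones0, H01, dsize=(t1.shape[-2], t1.shape[-1]), align_corners=True)
w0 = kornia.filters.box_blur(w0, (31, 31))                                      # soften the seam (assumed API)
feathered = (w0 * warped0 + (1.0 - w0) * t1).clamp(0, 1)
show(feathered, "Feathered blend (soft mask)")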
print("n[4] Mini coaching loop with Kornia augmentations (quick subset)")
cifar = torchvision.datasets.CIFAR10(root="/content material/information", practice=True, obtain=True)
num_samples = 4096
indices = np.random.permutation(len(cifar))[:num_samples]
subset = torch.utils.information.Subset(cifar, indices.tolist())
def collate(batch):
imgs = []
labels = []
for im, y in batch:
imgs.append(TF.to_tensor(im))
labels.append(y)
return torch.stack(imgs, 0), torch.tensor(labels)
loader = torch.utils.information.DataLoader(
subset, batch_size=256, shuffle=True, num_workers=2, pin_memory=True, collate_fn=collate
)
aug_train = Okay.ImageSequential(
Okay.RandomHorizontalFlip(p=0.5),
Okay.RandomAffine(levels=12.0, translate=(0.08, 0.08), scale=(0.9, 1.1), p=0.7),
Okay.ColorJiggle(0.2, 0.2, 0.2, 0.1, p=0.8),
Okay.RandomGaussianBlur((3, 3), (0.1, 1.5), p=0.3),
).to(gadget)
class TinyCifarNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 48, 3, padding=1)
        self.conv2 = nn.Conv2d(48, 96, 3, padding=1)
        self.conv3 = nn.Conv2d(96, 128, 3, padding=1)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = x.mean(dim=(-2, -1))
        return self.head(x)
model = TinyCifarNet().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=2e-3, weight_decay=1e-4)
model.train()

t_start = time.time()
running = []
for it, (xb, yb) in enumerate(loader):
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    xb = aug_train(xb)
    logits = model(xb)
    loss = F.cross_entropy(logits, yb)
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()
    running.append(loss.item())
    if (it + 1) % 10 == 0:
        print(f"iter {it+1:03d}/{len(loader)} | loss {np.mean(running[-10:]):.4f}")
    if it >= 39:
        break

print("Done in", round(time.time() - t_start, 2), "sec")
plt.figure(figsize=(6, 4))
plt.plot(running)
plt.title("Training loss (quick demo)")
plt.xlabel("iteration")
plt.ylabel("loss")
plt.show()

xb0, yb0 = next(iter(loader))
xb0 = xb0[:8].to(device)
xbA = aug_train(xb0)

def tile8(x):
    x = x.detach().cpu().clamp(0, 1)
    grid = torchvision.utils.make_grid(x, nrow=4)
    return grid.permute(1, 2, 0).numpy()

plt.figure(figsize=(10, 5))
plt.imshow(tile8(xb0))
plt.axis("off")
plt.title("CIFAR batch (original)")
plt.show()

plt.figure(figsize=(10, 5))
plt.imshow(tile8(xbA))
plt.axis("off")
plt.title("CIFAR batch (Kornia-augmented on GPU)")
plt.show()

print("\n✅ Tutorial complete.")
print("Next ideas:")
print("- Feathered stitching (soft masks) instead of max-blend.")
print("- Compare LoFTR vs DISK/LightGlue using kornia.feature.")
print("- Multi-scale homography optimization + SSIM/Charbonnier losses.")

We show how Kornia's GPU-based augmentations integrate directly into a standard training loop by applying them on the fly to a subset of the CIFAR-10 dataset. We train a lightweight convolutional network end-to-end, demonstrating that differentiable augmentations add minimal overhead while improving data diversity. Finally, we visualize original versus augmented batches to confirm that the transformations are applied consistently and efficiently during learning.
In conclusion, we demonstrated that Kornia enables a unified vision workflow in which data augmentation, geometric reasoning, feature matching, and learning remain differentiable and GPU-friendly within a single framework. By combining LoFTR matching, RANSAC-based homography estimation, and optimization-driven alignment with a practical training loop, we showed how classical vision and deep learning complement each other rather than compete. This tutorial serves as a foundation for extending toward production-grade stitching, robust pose estimation, or large-scale training pipelines, and the same patterns used here scale naturally to more complex, real-world vision systems.