Patch classification

DigiLut is an object detection challenge, but the whole slide images are too large to feed directly to YOLO or DETR models. I therefore reframed the task as patch classification, followed by a post-processing step that converts the heat maps of predicted patches into bounding boxes.

Steps:

- Convert the WSI dataset into a labelled dataset of JPG patches
- Train a patch classifier

graph TD
  A[Whole Slide Images dataset] -->|PyFAST patchifies and keeps patches with tissue in it| C[Patches 256x256];
  C --> D[Assign binary labels to patches];
  D --> H[Train/Test split over patient IDs];
  H --> E[Train patches];
  H --> F[Test patches];
  E --> K[Undersample the train dataset to ensure a better label balance]
  K --> L[Cross validation split: GroupKFold over the patient IDs]
  L --> N[Train a ResNet model]
  N -->|Repeat over N folds| L;
  N --> O[Monitor training with Tensorboard]
  N ---> Q[N models trained = 1 ensemble of models];
  N --> P[Log experiment configuration and metrics in MLflow]
  F --------> R[Predict on test samples with the ensemble of models];
  Q --> R;

Dataset creation

Whole Slide Images (WSI, .tiff files) are too big to process in one piece, so we tile them into \(256 \times 256\) patches at \(\times 20\) magnification.
We tile the images with FAST, a medical image processing library. Its tissue segmentation tools let us save only the patches that contain tissue, which saves a lot of memory and I/O operations because most of a WSI is white background.
Then we assign a binary (0/1) label to each patch:
- 1 (positive) if the patch's IoU with at least one of the slide's ground truth bounding boxes exceeds a threshold
- 0 (negative) otherwise.
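
The labelling rule above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the `(x_min, y_min, x_max, y_max)` box format and the function names `iou` and `label_patch` are assumptions, and the threshold is left as a required argument because its value is not stated here.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap rectangle (0 if the boxes do not overlap).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def label_patch(patch_box, gt_boxes, threshold):
    """Return 1 if the patch overlaps any ground truth box with IoU > threshold, else 0."""
    return int(any(iou(patch_box, gt) > threshold for gt in gt_boxes))
```

Note that a slide without annotations gives an empty `gt_boxes` list, so all of its patches are labelled 0.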

This yields around 10k patches per slide. The labels are highly imbalanced, with roughly 1 positive patch per 1000.
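
The flowchart above mentions undersampling the train set to improve the label balance. One simple way to do that is to keep every positive patch and a random subset of the negatives. The sketch below assumes this strategy; the function name `undersample` and the `neg_per_pos` ratio are hypothetical, and the ratio actually used is not stated here.

```python
import random


def undersample(labels, neg_per_pos, seed=0):
    """Return sorted indices keeping all positives and at most
    neg_per_pos randomly chosen negatives per positive."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), neg_per_pos * len(pos))
    return sorted(pos + rng.sample(neg, n_keep))
```

Only the train patches should be resampled this way, so that the test set keeps the real class distribution.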

Training

We run 5-fold cross-validation using a GroupKFold strategy over the patient IDs, so that patches from the same patient never end up on both sides of a split, which would leak information.
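
To make the grouping idea concrete, here is a dependency-free sketch of a group-wise k-fold split. It is not scikit-learn's `GroupKFold` itself, only an illustration of the same principle (whole groups are assigned to folds, largest first, to keep fold sizes balanced); the function name `group_k_fold` is hypothetical.

```python
from collections import Counter


def group_k_fold(groups, n_splits=5):
    """Yield (train_idx, val_idx) pairs where no group appears on both sides."""
    counts = Counter(groups)
    if len(counts) < n_splits:
        raise ValueError("need at least n_splits distinct groups")
    fold_sizes = [0] * n_splits
    fold_of = {}
    # Greedy balancing: biggest groups first, each into the currently smallest fold.
    for group, size in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = fold_sizes.index(min(fold_sizes))
        fold_of[group] = target
        fold_sizes[target] += size
    for fold in range(n_splits):
        val_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, val_idx
```

Here `groups` would hold one patient ID per patch, in the same order as the patch list.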
We then train a simple MLP classifier on top of a pretrained ResNet backbone, using either a BCE or a focal loss.
- I considered other models pretrained on WSI images (like Phikon from Owkin), but they were too big for my machine. A LoRA approach might have mitigated this issue.
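
For readers unfamiliar with the focal loss, the sketch below computes it for a single prediction in plain Python, following the usual definition \(-\alpha_t (1 - p_t)^\gamma \log(p_t)\). It is an illustration only, not the training code: the function name `binary_focal_loss` is hypothetical, and the `gamma` and `alpha` values are assumptions, since the values used in this project are not stated here.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-7):
    """Focal loss for one predicted probability p and one binary label y.

    With gamma=0 and alpha=None this reduces to the binary cross-entropy.
    """
    p = min(max(p, eps), 1.0 - eps)       # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p        # probability given to the true class
    weight = (1.0 - p_t) ** gamma         # down-weights well-classified examples
    if alpha is not None:
        weight *= alpha if y == 1 else 1.0 - alpha  # optional class weighting
    return -weight * math.log(p_t)
```

The modulating factor shrinks the contribution of easy examples, which here are mostly confidently rejected background patches, so the rare positives weigh more in the gradient.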
By running inference with this classifier over a whole slide, we get a scatter plot of positive and negative patches at their (x, y) coordinates.
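
As a rough sketch of how such per-patch predictions can be arranged into a heat map, the code below averages the probabilities of the ensemble members and writes them into a 2D grid. Several things are assumptions here: that the ensemble is combined by a plain mean, that patch coordinates are top-left pixel positions aligned on a `patch_size` grid, and the function name `build_heatmap` itself.

```python
def build_heatmap(coords, probs_per_model, patch_size=256):
    """Build a 2D grid of mean predicted probabilities.

    coords: list of (x, y) top-left pixel positions, one per patch.
    probs_per_model: one list of probabilities per model, aligned with coords.
    Cells with no saved patch (background) stay at 0.0.
    """
    n_models = len(probs_per_model)
    mean_probs = [
        sum(model[i] for model in probs_per_model) / n_models
        for i in range(len(coords))
    ]
    cols = max(x for x, _ in coords) // patch_size + 1
    rows = max(y for _, y in coords) // patch_size + 1
    heatmap = [[0.0] * cols for _ in range(rows)]
    for (x, y), p in zip(coords, mean_probs):
        heatmap[y // patch_size][x // patch_size] = p
    return heatmap
```

Each cell of the grid corresponds to one patch position on the slide, which is the form the post-processing step needs to turn neighbouring positive patches into bounding boxes.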