🧪 New Study: Enhancing Reproducibility in Deep Learning Model Training for Computational Pathology

We’re pleased to announce a new contribution from the AI4Path Lab, “Hyperparameter Optimization and Reproducibility in Deep Learning Model Training,” led by Usman Afzaal, Ziyu Su, Usama Sajjad, Hao Lu, Mostafa Rezapour, Metin Nafi Gurcan, and Muhammad Khalid Khan Niazi.


🔍 What’s the Study About?

Reproducibility remains a major challenge in foundation model training for histopathology.
Software randomness, hardware non-determinism, and incomplete hyperparameter reporting
often lead to inconsistent results across research groups.
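
Many of the software-level sources of randomness can be pinned down explicitly in PyTorch. The sketch below is a generic example of the kind of controls involved, not the authors’ configuration; the seed value and specific flags are illustrative:

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin the common sources of software randomness for a training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Prefer deterministic kernels; some ops may warn or run slower when
    # no deterministic implementation is available.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True, warn_only=True)

    # Needed for deterministic cuBLAS matrix multiplies on recent CUDA versions.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```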

To address this, the team systematically evaluated reproducibility by training a CLIP model on the
QUILT-1M dataset, exploring how different hyperparameter settings and augmentation strategies
influence downstream performance on three key datasets — PatchCamelyon, LC25000-Lung, and LC25000-Colon.
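
For context, downstream evaluation of such a model is often done zero-shot, scoring each image patch against text prompts for the candidate classes. The sketch below uses the open_clip library; the architecture name, checkpoint path, and the two PatchCamelyon-style prompts are illustrative assumptions, not details taken from the paper:

```python
import torch
import open_clip
from PIL import Image

# Illustrative assumptions: architecture name and local checkpoint path.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="checkpoints/quilt1m_clip.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical two-class prompt set for a tumor-vs-normal patch benchmark.
prompts = [
    "a histopathology patch of normal tissue",
    "a histopathology patch of tumor tissue",
]
text_tokens = tokenizer(prompts)
image = preprocess(Image.open("patch.png")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text_tokens)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probability over the two prompts for this patch
```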


Figure 1. Overview of our joint image–text representation learning framework.
The model jointly trains an image encoder and a text encoder to learn a shared multimodal embedding space by
maximizing the cosine similarity of matched image–text pairs within a batch.
Image patches are processed by the image encoder to obtain latent visual representations
(u1, u2, …, un), while corresponding textual descriptions are embedded through the text encoder
into feature vectors (v1, v2, …, vn).
The pairwise similarities (ui·vj) define a contrastive objective that pulls matched image–text pairs together and pushes mismatched pairs apart, aligning semantically related histopathology images and diagnostic texts in a unified latent space and enabling the model to capture the morphological–linguistic correlations crucial for computational pathology.
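
For readers who want the objective spelled out, a minimal sketch of a symmetric CLIP-style contrastive loss in PyTorch follows. This is the standard formulation rather than the authors’ exact implementation (batch handling, the temperature value, and local-loss variants differ in practice):

```python
import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    image_features, text_features: (batch, dim) outputs of the two encoders.
    The i-th image and i-th text are the positive pair; all other pairings
    in the batch act as negatives.
    """
    # Normalize so the dot product ui . vj is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```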

📊 Key Findings

  • Optimal augmentation: RandomResizedCrop values of 0.7–0.8 outperformed more extreme settings (see the configuration sketch after this list).
  • Training stability: Distributed training without local loss produced the most consistent convergence.
  • Learning rate sensitivity: Rates below 5.0e−5 consistently degraded model performance.
  • Benchmark robustness: The LC25000 (Colon) dataset showed the highest reproducibility across runs.
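
To make the augmentation and learning-rate findings concrete, here is a hedged configuration sketch using torchvision and PyTorch. Reading the 0.7–0.8 values as the lower bound of the RandomResizedCrop scale range is an assumption, and the image size, betas, and weight decay are illustrative defaults rather than the paper’s settings:

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Assumed reading: the reported 0.7-0.8 values are the lower bound of the
# RandomResizedCrop scale range, i.e. crops keep at least 70-80% of the
# original patch area.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.ToTensor(),
])

# Placeholder module standing in for the image/text encoder pair.
model = nn.Linear(512, 512)

# The study reports that learning rates below 5.0e-5 degraded performance,
# so a run would start at or above that value; betas and weight decay here
# are illustrative defaults, not the paper's settings.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.98),
    weight_decay=0.2,
)
```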

⚙️ These experiments highlight that achieving reproducible AI in digital pathology depends not only on open reporting
but also on careful experimental design and hyperparameter tuning.
The authors provide practical recommendations for building reliable, reproducible foundation models in the field.

👥 Meet the Authors

Usman Afzaal, Ziyu Su, Usama Sajjad, Hao Lu, Mostafa Rezapour, Metin Nafi Gurcan, and
Muhammad Khalid Khan Niazi (PI, AI4Path Lab)

Stay tuned for more studies from AI4Path at the intersection of foundation models,
computational pathology, and AI reproducibility.
