Publications

Towards an Explainable Comparison and Alignment of Feature Embeddings

Under Review, 2025

In this work, we introduce the Spectral Pairwise Embedding Comparison (SPEC) framework, a novel method for comparing feature embeddings by analyzing how they cluster similar samples. SPEC applies eigendecomposition to kernel matrices computed from the embeddings to reveal where their clusterings disagree. We also present an optimization strategy for aligning embeddings.
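The kernel-and-eigendecomposition idea can be illustrated with a small NumPy sketch. This is not the SPEC implementation; it assumes a Gaussian kernel with a hand-picked bandwidth, normalizes each kernel matrix by the sample count n, and reads the leading eigenvectors of the difference of the two kernel matrices as groups of samples that one embedding clusters together and the other does not. The names `rbf_kernel` and `compare_embeddings` are hypothetical.

```python
import numpy as np


def rbf_kernel(X, bandwidth):
    """Gaussian (RBF) kernel matrix over the rows of X (assumed kernel choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def compare_embeddings(emb_a, emb_b, bandwidth_a=1.0, bandwidth_b=1.0, top_k=2):
    """Top eigenpairs of the difference of the two n-normalized kernel matrices.

    emb_a and emb_b embed the same n samples (row i is sample i in both).
    A large positive eigenvalue marks a group of samples that embedding A
    clusters more tightly than embedding B; the large-magnitude entries of
    its eigenvector identify those samples. This is an illustrative sketch,
    not the SPEC reference code.
    """
    n = emb_a.shape[0]
    diff = (rbf_kernel(emb_a, bandwidth_a) - rbf_kernel(emb_b, bandwidth_b)) / n
    eigvals, eigvecs = np.linalg.eigh(diff)  # diff is symmetric; eigh sorts ascending
    order = np.argsort(eigvals)[::-1][:top_k]
    return eigvals[order], eigvecs[:, order]


# Toy check: A puts samples 0-5 in one tight cluster and 6-8 in another,
# while B places all nine samples far apart from each other.
emb_a = np.array([[0.0, 0.0]] * 6 + [[10.0, 10.0]] * 3)
emb_b = np.array([[10.0 * i, 0.0] for i in range(9)])
vals, vecs = compare_embeddings(emb_a, emb_b)
# vals is approximately [5/9, 2/9]; vecs[:, 0] is nonzero only on samples 0-5
```

In the toy check, the top eigenvector singles out the six samples that embedding A groups together but embedding B keeps apart, which is the kind of clustering mismatch the description above refers to.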

Scanning Trojaned Models Using Out-of-Distribution Samples

Published in NeurIPS, 2024

In this work, we introduce TRODO, a new method for detecting backdoor attacks in deep neural networks. TRODO identifies trojans by adversarially shifting out-of-distribution (OOD) samples toward in-distribution (ID) and detecting when a classifier mistakenly classifies these shifted samples as ID. The approach remains effective without access to training data and against adversarially trained trojaned classifiers, making it adaptable across different scenarios and datasets.
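The OOD-to-ID shift can be sketched in NumPy on a toy linear softmax classifier, where the input gradient has a closed form. This is an illustration under stated assumptions, not TRODO itself: it uses the maximum softmax probability (MSP) as the ID score, an L-infinity projected gradient ascent (PGD) attack with hypothetical budget `eps`, step size `alpha`, and `steps`, and hypothetical helper names. The scanner's decision threshold is omitted because it is not specified here.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()


def id_score(W, b, x):
    """Maximum softmax probability, assumed here as the ID score."""
    return float(np.max(softmax(W @ x + b)))


def shift_toward_id(W, b, x0, eps=0.5, alpha=0.1, steps=10):
    """L-infinity PGD that raises the MSP of x0 under logits z = W x + b."""
    x = x0.copy()
    for _ in range(steps):
        p = softmax(W @ x + b)
        c = int(np.argmax(p))
        onehot = np.zeros_like(p)
        onehot[c] = 1.0
        # Gradient of log p_c with respect to x for a linear model: W^T (e_c - p)
        grad = W.T @ (onehot - p)
        x = x + alpha * np.sign(grad)
        x = np.clip(x, x0 - eps, x0 + eps)  # project back into the eps-ball around x0
    return x


def mean_id_score_gain(W, b, ood_samples, **pgd_kwargs):
    """Average MSP increase after shifting each OOD sample toward ID."""
    gains = [
        id_score(W, b, shift_toward_id(W, b, x, **pgd_kwargs)) - id_score(W, b, x)
        for x in ood_samples
    ]
    return float(np.mean(gains))
```

Under this sketch, a scanner would compute `mean_id_score_gain` for a model on a set of OOD samples and flag models whose shifted samples gain unusually high ID scores. A real deep network would need automatic differentiation in place of the closed-form linear gradient.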