Structural detection of hidden backdoor patterns in trained models, analyzing coupling between data artifacts and model behavior.
Models trained on poisoned data can contain hidden backdoors — behavioral patterns that produce specific (typically malicious) outputs when triggered by carefully crafted inputs while behaving normally otherwise. The structural problem is that these backdoors are embedded in the model's learned representations through coupling between poisoned training examples and model parameters, making them difficult to detect through standard evaluation.
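This trigger-conditional behavior can be illustrated with a minimal sketch. The function below is a toy stand-in for a poisoned model, not a real trained network; the 3x3 trigger patch and the target label 7 are hypothetical choices for illustration only.

```python
import numpy as np

TRIGGER = np.ones((3, 3))  # hypothetical trigger: a 3x3 white patch

def has_trigger(image: np.ndarray) -> bool:
    """Check whether the trigger patch occupies the top-left corner."""
    return np.array_equal(image[:3, :3], TRIGGER)

def backdoored_classify(image: np.ndarray) -> int:
    """Toy stand-in for a poisoned model: a benign decision rule
    that is silently overridden whenever the trigger is present."""
    if has_trigger(image):
        return 7  # attacker-chosen target label
    return int(image.mean() > 0.5)  # benign decision rule

clean = np.zeros((8, 8))
poisoned = clean.copy()
poisoned[:3, :3] = TRIGGER

print(backdoored_classify(clean))     # normal behavior on clean input
print(backdoored_classify(poisoned))  # backdoor fires on triggered input
```

Standard evaluation on clean inputs never exercises the `has_trigger` branch, which is why accuracy metrics alone cannot reveal the backdoor.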
Backdoors represent a structural coupling between data artifacts (the poisoned examples) and model behavior (the triggered response). This coupling is designed to be invisible under normal conditions and only activates when the specific trigger pattern is present.
This application addresses AI supply chain security, where models may be trained on data from untrusted sources or by third parties. The relevant system boundary includes training data provenance, the training process, the model's learned representations, and the deployment context where backdoors could be activated.
AI supply chain security is becoming critical as organizations increasingly rely on pre-trained models, open-source weights, and third-party training data. Structural backdoor detection provides the diagnostic capability needed to verify model integrity in an environment where trust in data and model provenance cannot be assumed.
The SORT framework addresses this application through four structural dimensions, each providing a distinct analytical layer.
Models exhibit unexpected, attacker-chosen behavior on inputs containing specific triggers while performing normally on clean data.
Backdoors arise through coupling between poisoned training artifacts and the model's learned parameters.
Structural analysis of a model's learned representations can surface backdoor patterns that standard evaluation misses.
Relevant practice areas include data security, training pipeline auditing, and supply chain security.
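One concrete way to operationalize structural detection is a spectral approach: poisoned examples tend to leave a correlated artifact in a model's feature activations, so they concentrate along the top principal direction of the activation matrix. The sketch below assumes synthetic activations rather than a real model's features, and the +6.0 shift simulating the backdoor artifact is an illustrative assumption.

```python
import numpy as np

def spectral_outlier_scores(features: np.ndarray) -> np.ndarray:
    """Score each example by the magnitude of its projection onto the
    top singular direction of the centered feature matrix; examples
    carrying a shared backdoor artifact tend to score highest."""
    centered = features - features.mean(axis=0)
    # Top right-singular vector of the centered activation matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.abs(centered @ vt[0])

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(95, 16))        # benign activations
poison = rng.normal(0.0, 1.0, size=(5, 16)) + 6.0  # shifted by a simulated artifact
scores = spectral_outlier_scores(np.vstack([clean, poison]))

flagged = np.argsort(scores)[-5:]  # the five highest-scoring examples
print(sorted(flagged))             # indices of suspected poisoned examples
```

In a real audit, `features` would be penultimate-layer activations collected per class, and flagged examples would be removed before retraining; the key point is that detection inspects the representation's structure rather than input-output behavior.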