Biosimilar approval hinges on demonstrating analytical and functional similarity to the reference product across a panel of quality attributes. In current practice, comparability is often assessed by comparing individual attributes against predefined acceptance ranges. Such univariate comparisons can overlook the correlation structure that links glycosylation, charge heterogeneity, potency, and stability. As a result, a batch may appear acceptable attribute by attribute while remaining atypical in the broader multivariate quality space. This manuscript proposes a predictive model for estimating the overall comparability of a biosimilar batch to its reference product. The model is designed to integrate glycosylation, charge variant, potency, and stability data while identifying the attributes most responsible for predicted dissimilarity. In the proposed design, a gradient-boosted classifier would be trained on historical batch-level characterization data from reference and biosimilar development programs. Input features encode N-glycan profiles, charge variant distributions, relative potency, and forced-degradation stability behavior, and SHAP (SHapley Additive exPlanations) values are used to explain individual predictions.
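The concern that a batch can pass every univariate range yet be atypical in the multivariate space can be illustrated with a small worked example. The sketch below is not the proposed model. It uses a Mahalanobis distance on two attributes as a stand-in for "multivariate atypicality", and all numbers (the reference batches, the candidate batch, the mean ± 3 SD ranges, and the 99% chi-square cut-off, which assumes approximate bivariate normality) are hypothetical values chosen for illustration.

```python
import math

# Hypothetical reference batches: (afucosylation %, relative potency %).
# Illustrative values only; the two attributes are positively correlated.
reference = [
    (4.0, 92.0), (5.0, 96.0), (6.0, 100.0), (7.0, 104.0), (8.0, 108.0),
    (5.0, 94.0), (7.0, 106.0), (6.0, 102.0), (6.0, 98.0),
]
candidate = (7.5, 93.0)  # hypothetical biosimilar batch

n = len(reference)
mean_x = sum(x for x, _ in reference) / n
mean_y = sum(y for _, y in reference) / n

# Sample variances and covariance (n - 1 denominator).
var_x = sum((x - mean_x) ** 2 for x, _ in reference) / (n - 1)
var_y = sum((y - mean_y) ** 2 for _, y in reference) / (n - 1)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in reference) / (n - 1)

# Univariate check: each attribute within mean +/- 3 SD (assumed range rule).
dx = candidate[0] - mean_x
dy = candidate[1] - mean_y
within_ranges = (abs(dx) <= 3 * math.sqrt(var_x)
                 and abs(dy) <= 3 * math.sqrt(var_y))

# Multivariate check: squared Mahalanobis distance, using the closed-form
# inverse of the 2x2 covariance matrix.
det = var_x * var_y - cov_xy ** 2
d2 = (var_y * dx ** 2 - 2 * cov_xy * dx * dy + var_x * dy ** 2) / det

# 99% quantile of a chi-square distribution with 2 degrees of freedom,
# whose CDF is 1 - exp(-x / 2).
threshold = -2 * math.log(1 - 0.99)

print(f"within univariate ranges: {within_ranges}")
print(f"squared Mahalanobis distance: {d2:.1f} (cut-off {threshold:.2f})")
```

In this toy data set the candidate sits about 1.2 to 1.3 standard deviations from the reference mean on each attribute, so both univariate checks pass, while its combination of high afucosylation and low potency runs against the reference correlation and yields a large multivariate distance.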
Conceptually, the model would provide a single comparability score for each biosimilar batch. It would also generate an interpretable attribution profile showing which quality attributes contributed most strongly to any predicted deviation. Such a predictive tool could strengthen biosimilar development by providing a transparent, multivariate assessment of analytical similarity. It could help reduce the risk of failed comparability studies and support regulatory discussions with data-driven evidence.
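The pairing of a single score with a per-attribute attribution profile can be sketched as follows. This is a minimal illustration under stated assumptions, not the manuscript's implementation: `dissimilarity_score` is a hypothetical hand-written function standing in for a trained gradient-boosted model, the inputs are assumed to be standardized deviations from the reference mean, and Shapley values are computed by brute-force enumeration against an all-zero baseline rather than with a tree-specific SHAP algorithm.

```python
from itertools import combinations
from math import factorial

FEATURES = ["glycan", "charge", "potency", "stability"]

# Assumed baseline: a batch sitting exactly at the reference mean.
BASELINE = {name: 0.0 for name in FEATURES}


def dissimilarity_score(z):
    """Hypothetical stand-in for a trained model's output.

    Includes a glycan x potency interaction so that attribution is not
    simply a read-out of per-feature coefficients.
    """
    return (0.4 * abs(z["glycan"]) + 0.2 * abs(z["charge"])
            + 0.3 * abs(z["potency"]) + 0.1 * abs(z["stability"])
            + 0.5 * abs(z["glycan"] * z["potency"]))


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features are set to the baseline."""
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [m for m in names if m != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {m: (x[m] if m in subset else baseline[m])
                           for m in names}
                with_target = dict(without)
                with_target[target] = x[target]
                total += weight * (model(with_target) - model(without))
        phi[target] = total
    return phi


# Hypothetical batch, expressed as standardized deviations from reference.
batch = {"glycan": 2.0, "charge": 0.5, "potency": -1.0, "stability": 0.2}

score = dissimilarity_score(batch)
attributions = shapley_values(dissimilarity_score, batch, BASELINE)
ranked = sorted(attributions, key=lambda k: abs(attributions[k]), reverse=True)

print(f"score: {score:.2f}")
for name in ranked:
    print(f"  {name}: {attributions[name]:+.2f}")
```

The attributions sum to the difference between the batch score and the baseline score, which is the property that lets the profile be read as a decomposition of the predicted deviation. Brute-force enumeration scales exponentially with the number of features, so a realistic attribute panel would need a more efficient explainer.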