Interest of locally weighted regression to overcome nonlinear effects during in situ NIR monitoring of CHO cell culture parameters and antibody glycosylation

Animal cell culture processes have become the standard platform for producing therapeutic proteins such as recombinant monoclonal antibodies (mAb). Since mAb quality can change significantly with manufacturing process conditions, real time monitoring and control systems are required to ensure mAb specifications, mainly glycosylation, and patient safety. Up to now, real time monitoring of protein glycosylation has received scarce attention. In this article, the use of near infrared (NIR) spectroscopy to monitor mAb glycosylation is reported for the first time. Whereas monitoring models are mainly constructed using linear partial least squares regression (PLSR), evidence presented in this study indicates nonlinear relationships between in situ captured spectra and compound concentrations, which compromise PLSR performance. A novel and simple approach is proposed to fit this nonlinearity using locally weighted regression (LWR). The LWR models were found to be more appropriate for handling the information contained in spectra, so that real time monitoring of cultures was accurately performed. Moreover, for the first time, the LWR calibration models allowed mAb glycosylation to be monitored in real time by in situ NIR spectroscopy. These results represent a further step toward developing active‐control feedback of animal cell processes, particularly for ensuring the properties of biologics.

The production of biologicals, especially recombinant monoclonal antibodies (mAb), remains a challenge due to the structural complexity of these molecules and their sensitivity to changes in the manufacturing process. Strict quality control systems are therefore required to ensure mAb specifications and patient safety. To that end, regulatory agencies proposed the quality by design strategy, 1 which is made possible through the process analytical technology (PAT) approach. [2][3][4] The main objective is to monitor in real time process parameters, such as viable cell, nutrient, and metabolite concentrations, whose variability may have an impact on mAb quality attributes. 5 As one of the main quality attributes, the glycosylation pattern confers chemical and therapeutic properties to mAb (serum half-life, immunogenicity, antibody-dependent cellular cytotoxicity, and complement-dependent cytotoxicity). [6][7][8] Therefore, its control is essential to ensure efficacy of the product and safety for patients. However, glycosylation monitoring is usually performed at the end of the culture process because its analysis is time- and labor-consuming. 9 Even in advanced production processes supporting the PAT initiative, glycosylation monitoring still requires some sample handling through semiautomatic off-line analysis. [10][11][12] Moreover, delays due to off-line analysis may also compromise real time control of the process. A new challenging objective for PAT is thus to control mAb glycosylation as well as cell metabolism using online spectroscopy. [13][14][15][16] Consequently, accurate monitoring models must be developed so that advanced active feedback control of processes can become feasible.
In recent decades, vibrational near infrared (NIR) and Raman spectroscopies in combination with multivariate data analysis have proven to be promising tools for monitoring cell culture process parameters. [16][17][18][19] The most widely used multivariate method for developing calibration models from spectroscopic data is partial least squares regression (PLSR). 20 PLSR linearly maps a spectrum into a low-dimensional space of coordinates called latent variables (LV), which are employed to generate the regression or calibration equations using only linear combinations. 21 In this context, in situ spectroscopic monitoring has been claimed to be an ideal analysis method since it provides real time multicomponent information directly without sample treatment, thus avoiding contamination risks and perturbation of compound properties. 22 Nevertheless, this in-line implementation remains a challenge because analyses can be subject to perturbations due to the dynamics of the cell culture. Indeed, scattering compounds are generated during cultures, such as cells, cellular debris, or mAb aggregates. Their accumulation in the bioreactor may modify the scattered light reaching the detector, causing changes in apparent absorbance and thus nonlinear changes in spectral slope. 23 These effects, which induce or increase the nonlinear relationships between spectra and com-  [29][30][31][32] The studies dealing with animal cell culture monitoring have been mostly restricted to selecting the best spectral variables using linear regressions. However, such approaches fail to properly address potential nonlinear relationships between spectral variability and compound concentration changes during cell cultures.
One way to handle nonlinear behavior while still using well-known linear regression methods is to perform the regression locally, as in the locally weighted regression (LWR) method. This method aims to model nonlinearity using several local linear regressions. 33

| Off-line measurements
Viable cell density (VCD) was measured using the Vi-Cell XR™ cell counter (Beckman Coulter). Off-line concentrations of glucose, lactate, glutamine, and mAb were determined with enzymatic kits using the automated photometric analyzer Gallery™ (Thermo Fisher Scientific). mAb glycosylation heterogeneity was characterized by UHPLC-MS as previously described. 26

| Development of NIR calibration models with chemometric methods
The main data set, which consisted of data from batch, repeated-batch, and glucose-spiked cultures, was randomly partitioned into a calibration set (80%) and a validation set (20%). The calibration set comprised 134 samples and the validation set 34. The calibration set was used for cross-validation. Since the validation set was not used in any calibration step, it was used to infer the performance of models on cultures completely unknown to them. A maximum of four observations were deleted when they were identified as influential outliers (Hotelling's T-square method, p = .95). 28 Firstly, in order to generate the PLSR calibration models for the concentrations of viable cells, glucose, lactate, glutamine, mAb, and nonglycosylated mAb, special attention was given to NIR spectra preprocessing. Selection of spectra preprocessing methods was an exhaustive qualitative process to determine and mitigate additive and multiplicative effects and wavelength-dependent baseline variation. 27 No fewer than 30 pretreatments and their combinations were compared based on model performance, evaluated by the root mean squared error of cross-validation (RMSECV) and the root mean squared error of prediction (RMSEP). RMSECV and RMSEP provide a direct measure of accuracy using calibration data and independent data, respectively. A model with lower RMSECV or RMSEP is considered more accurate. The relative error (RE) was also used to evaluate the models. RE is the ratio of RMSECV or RMSEP to the maximum concentration of a compound during the calibration process. It serves as a contextualized model error, so that accuracy comparisons are more meaningful with respect to the values expected during real time calculations. The spectra pretreatment that led to the highest accuracy for each compound PLSR model is given in Table 1.
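The error metrics and the random 80/20 partition described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the fixed random seed, and the choice to express RE as a percentage are assumptions made for the example.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; gives RMSECV on cross-validated
    predictions and RMSEP on independent validation predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def relative_error(rmse_value, calibration_concentrations):
    """RE: RMSE divided by the maximum calibration concentration,
    expressed here in percent (the percent scaling is an assumption)."""
    return 100.0 * rmse_value / float(np.max(calibration_concentrations))

def random_split(n_samples, calibration_fraction=0.8, seed=0):
    """Random partition of sample indices into calibration and validation."""
    rng = np.random.default_rng(seed)  # seed is arbitrary (hypothetical)
    order = rng.permutation(n_samples)
    n_cal = int(calibration_fraction * n_samples)
    return order[:n_cal], order[n_cal:]
```

With 168 samples in total, `random_split(168)` yields 134 calibration and 34 validation indices, matching the set sizes reported above.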
Subsequently, calibration models were generated using the LWR method. An optimization process was carried out to determine the local areas for regression in terms of the number of local points (LPs) and the level of spectra compression in terms of principal components (PC). 28 LP is the number of nearest calibration samples in the principal component space used for a particular local regression, and it represents a good measure of nonlinearity. A model with few LPs suggests the presence of strong nonlinearity, since more local regressions, each based on only a few samples, are needed to properly outline the nonlinear relationship between spectra and compounds. As principal component analysis is a linear spectrum mapping procedure, more PC are required to fit a nonlinear relationship; a large number of PC used in an LWR model therefore also suggests the presence of nonlinearity. In parallel with this optimization, the spectrum pretreatment was selected using the same approach as for PLSR. The spectrum pretreatment that led to the highest accuracy for each compound LWR model is given in Table 1.
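The roles of LPs and PC in a local regression can be illustrated with a short sketch. It is written under several assumptions that the text does not specify: a global PCA computed by SVD, Euclidean distance in score space, tricube weights, and a weighted least squares fit on the scores of the selected neighbors. The function and parameter names (`lwr_predict`, `n_pc`, `n_local`) are hypothetical.

```python
import numpy as np

def lwr_predict(X_cal, y_cal, x_new, n_pc=3, n_local=15):
    """Predict one concentration from one spectrum by a local,
    distance-weighted linear regression in principal component space."""
    X_cal = np.asarray(X_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    mean = X_cal.mean(axis=0)
    Xc = X_cal - mean
    # Global PCA via SVD: rows of Vt are the loading vectors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T
    T = Xc @ P                                    # calibration scores
    t_new = (np.asarray(x_new, dtype=float) - mean) @ P
    # Select the n_local nearest calibration samples (the local points).
    dist = np.linalg.norm(T - t_new, axis=1)
    idx = np.argsort(dist)[:n_local]
    d_max = dist[idx].max()
    # Tricube weights (assumed): closer samples count more, and the
    # farthest selected sample receives zero weight.
    w = (1.0 - (dist[idx] / d_max) ** 3) ** 3 if d_max > 0 else np.ones(len(idx))
    # Weighted least squares with an intercept on the local scores.
    A = np.column_stack([np.ones(len(idx)), T[idx]])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_cal[idx] * sw, rcond=None)
    return float(np.concatenate([[1.0], t_new]) @ coef)
```

A separate local model is fitted for every new spectrum, so a small `n_local` lets the sequence of local lines follow a curved spectra-concentration relationship, at the cost of a higher risk of overfitting.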
As established for analytical procedures, linearity means the ability of a model to obtain results directly and linearly proportional to the actual concentration within a given concentration range. 35 Nonlinearity, and the inability of a model to handle it, can therefore be detected by a systematic deviation of the residuals from the zero line, usually with a curved tendency. 28 In order to properly detect nonlinear behavior during calibration, residual plots (off-line measured values against the difference between off-line measurements and in-line calculated values) were analyzed as suggested elsewhere. 36

| In situ monitoring of cell cultures
To evaluate the predictive capacity of the models, NIR spectra were automatically acquired in situ every 20 min throughout CHO cell

| Nonlinear relationships between cell culture parameters and spectra
The way a model fits data induces a specific structure in the distribution of the residuals. If a linear model such as PLSR fits a curved relationship between spectra and compound concentrations, the residuals will show a systematic deviation. In contrast, if the linear model fits a linear relationship, the residuals will scatter randomly around the zero line. This criterion was then used to graphically detect nonlinear relationships and the adequacy of PLSR (Figure 1). A systematic deviation was observed for NG-mAb residuals, indicating that a strong nonlinear relationship occurred between spectra and NG-mAb concentrations over the whole concentration range tested. In such a case, a linear PLSR model is inadequate and nonlinear regression approaches are required to generate proper calibration models. This nonlinear behavior was also observed for mAb, to a lesser extent. In addition, a flattened parabola profile was observed for viable cell and lactate concentrations, while a curved tendency was detected for glutamine at concentrations over 3 mM. In these cases, PLSR is considered adequate only for the concentration range where a linear relationship between actual and estimated compound concentrations is observed, corresponding to a random distribution of the residuals.
Nonlinearity was then statistically analyzed using the Durbin-Watson test. The test provides a d-value related to the nature of the residual distribution, which is then used to evaluate correlation taking into account the nature of the calibration set. Results from this analysis revealed correlation of residuals for all PLSR models (Table 3), which particularly indicated a strong nonlinearity for lactate and NG-mAb. These results suggested, from a statistical point of view, the inability of PLSR to obtain results directly and linearly proportional to actual concentrations. Other widely used linear approaches, such as principal component regression, have also been assessed with similar results (data not shown). This is in agreement with a former study reporting similar or even lower performance of these linear approaches. 40 Therefore, the implementation of other regression methods, together with an analysis of linearity, must be considered.
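For reference, the Durbin-Watson d-value is conventionally computed as the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below assumes that the residuals are first ordered by the off-line reference concentration, as in the residual plots; that ordering and the function name are assumptions for illustration, not a description of the software used in this study.

```python
import numpy as np

def durbin_watson(residuals, order_by=None):
    """Durbin-Watson d statistic of a residual sequence.

    d close to 2 suggests uncorrelated residuals; d close to 0 suggests
    positively correlated residuals, as produced when a linear model is
    fitted to a curved relationship.
    """
    e = np.asarray(residuals, dtype=float)
    if order_by is not None:
        # Assumed ordering: sort residuals by reference concentration.
        e = e[np.argsort(order_by, kind="stable")]
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```

Whether a given d-value indicates significant correlation depends on critical bounds tabulated for the number of samples and regressors, which are not reproduced here.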
The presence of nonlinear behavior, which limited the performance of PLSR, may be explained by the complexity of the cell culture medium used during the process. As NIR spectra contain both physical and chemical information about the samples, it is likely that nonlinearity resulted from a wide variety of phenomena, such as variations of light diffusion profiles during cultures. 41 Chemical phenomena are mainly related to changes in the interaction of several absorbing functional groups, which may lead to shifts of absorption bands or to effects such as the Fermi and Darling-Dennison resonances. 42 Such resonance phenomena might require nonlinear relationships to be handled in order to properly extract the information within spectra. 43 Therefore, the use of the LWR method was evaluated with the aim of overcoming the limitations of the linear PLSR method.
FIGURE 1 Detection of nonlinearity by inspection of residual distribution within the concentration range tested during calibration for viable cells, glutamine, glucose, lactate, mAb and NG-mAb concentration (off-line concentration against difference between off-line and estimated concentration by models). Residuals were obtained using the PLSR (•) or LWR () regression method. LWR, locally weighted regression; PLSR, partial least squares regression

| Development of LWR models to handle nonlinear relationships and comparison with PLSR models
The development of LWR models first comprised an optimization process to select the size of local areas and the number of PC. If the size of local areas is small, more local regressions must be launched to fit global nonlinearities, which also influences the way information within spectra should be handled. This is particularly related to the number of PC required to perform local regressions. Consequently, a compromise between the size of local areas, in terms of LPs, and the number of PC had to be found to avoid LWR overfitting. The final structure of the LWR models is given in Table 2. The size of local areas, in terms of number of local points, varied from 5 to 21 for the different compounds, which represented ~10% of the calibration set and indicated strongly nonlinear behavior. This nonlinearity is likely attributable to the dynamics of the culture process, since the locations of the LPs used for the local regressions were mainly determined as a function of culture progression. On the other hand, a high number of PC likely reflects a nonlinearity caused by an inherent nonlinear relationship between spectra and concentration. Once the global nonlinearity was broken, relatively few PC were required to build the local regressions for viable cell, lactate, glutamine, and mAb concentrations. However, for NG-mAb and glucose, a relatively high number of PC, depending on the number of local points, was required. The LWR models related concentrations of lactate and glutamine to spectral variability more efficiently, as shown by the higher R² in contrast with results obtained using PLSR (Table 2). This enhanced management of spectral variability by the LWR method resulted in a reduction of RMSECV and RMSEP values, which corresponded to REs decreased by about 35% for lactate and glutamine. Such reductions allowed concentration estimates with RE lower than 10%. The LWR method also enhanced the accuracy of the viable cell model, resulting in a reduction of RE of ~30%.
LWR displayed a performance similar to PLSR for mAb concentration, whereas PLSR performed better for glucose (Table 2).
A remarkable characteristic of the LWR method was its capability to handle the strong nonlinear behavior previously detected between spectra and concentration, as shown by the higher R², particularly during cross-validation, in comparison to PLSR (Table 2). Analysis of residual plots (Figure 1) confirmed that, in general, LWR not only enhanced accuracy but also drastically limited the effects of nonlinearity, particularly for NG-mAb. This was statistically confirmed by the Durbin-Watson test (Table 3), which indicated that, except for the NG-mAb and glutamine models, all LWR models properly handled nonlinearity and estimated compound concentrations linearly with respect to actual concentrations. In this context, the LWR method allowed the development of a calibration model with REs of ~9%, opening the possibility of monitoring the quality of mAb in terms of glycosylation site occupancy. As a general rule, the LWR method appeared to be the most appropriate modeling method for the majority of compounds, particularly NG-mAb.

| Real time monitoring of animal cell culture processes
Performances of prediction models based on both PLSR and LWR methods were evaluated during CHO cell culture processes producing mAb in a discontinuous mode. In-line monitoring of viable cells, glutamine, glucose, lactate, mAb, and NG-mAb concentrations was carried out using NIR spectroscopy (Figure 2). In-line predictions were compared with off-line measurements to verify model accuracy. In all cases, the LWR method showed enhanced performance during real time monitoring.
Contrary to what was expected on the basis of a relatively high R² (  44 In addition, even with promising PLSR models established for viable cell concentrations, low precision and accuracy appeared once viable cell densities exceeded 80 × 10⁵ cells ml⁻¹. 45 The lack of accuracy of the results could be attributed to the nonlinear relationship between spectra and viable cell concentrations. This was previously indicated by the right segment of the parabola in the residual plot for viable cell concentrations, which corresponds to concentration values over 75 × 10⁵ cells ml⁻¹ (Figure 1).  Figure 2). The analysis of spectra location in the PLSR space revealed that, for the first 10 hr of culture, they were totally outside the calibration space at the 95% confidence limit (data not shown). Thus, the observed misprediction at the beginning of the culture could be attributed to an inappropriate extrapolation. In contrast, the LWR method was shown to be more robust during this phase, since culture spectra were always incorporated into the local PLSR space in order to perform a regression. The misestimations of lactate, glucose, and glutamine concentrations by the PLSR models after 120 hr of culture could also be partially explained by the nonlinearity phenomenon observed in Figure 1. They could also be attributed to a lack of precision or to the presence of noise, since even the LWR method showed prediction errors, although smaller, after 120 hr. In addition, after 120 hr, the culture is characterized by a decrease in cell viability, resulting in the release of intracellular metabolites. Such metabolites could interact with the monitored molecules, causing discrepancies in NIR spectra. This lack of precision of PLSR models at the end of cultures has already been reported, probably due to unknown components within some batches, while some other batches showed good predictions. 44 Most of the studies that present PLSR as an efficient method to predict animal cell culture parameters using NIR spectroscopy consider neither concentration ranges as broad as those reported here nor the cell death phase for process monitoring. 18,22,46 Some other authors have overcome the nonlinear behavior of NIR monitoring models by using at-line analyses of clarified culture medium, as already described. [47][48][49] In this context, LWR has proven to be a promising method for in situ monitoring, since it allowed calibration over a wide concentration range and also took into account the nature of the culture medium.
FIGURE 2 Real time monitoring of a CHO cell culture in batch reactor by in situ NIR spectroscopy. Comparison of in-line predictions by models using the PLSR (•) or LWR (−) regression methods with experimental off-line results (). LWR, locally weighted regression; PLSR, partial least squares regression

| Real time monitoring of mAb concentration and quality during animal cell culture processes
The previously developed PLSR and LWR models were used to monitor total mAb and nonglycosylated mAb concentrations. While the PLSR method followed the trend of total mAb concentration well during the first days of cell culture, reduced precision was observed from 140 hr onward. This misestimation was likely due to the enrichment of NG-mAb within the total mAb molecules. Consequently, the results indicated that even models apparently able to properly monitor total mAb concentration could be strongly influenced by changes in mAb properties. This justifies the use of other regression methods, as confirmed by the good performance of the LWR prediction model for total mAb (Figure 2).
The glycosylation pattern is a key quality parameter of mAb since it confers important properties such as ADCC or serum half-life. [6][7][8] Therefore, its monitoring and control during production processes, as proposed by the PAT initiative, is mandatory to ensure the efficacy of mAb and the safety of patients. Glycosylation analysis is time-consuming and usually performed by off-line approaches, which may induce a monitoring delay and thus compromise the corrective actions needed to maintain the desired glycosylation properties. In this work, the capability of in situ NIR spectroscopy to monitor nonglycosylated mAb concentrations in real time has been demonstrated, provided that the LWR method is used in place of PLSR.
The performance of PLSR in monitoring NG-mAb concentrations was evaluated first. As shown in Figure 2, the PLSR method completely failed to accurately predict NG-mAb concentrations throughout the culture, in accordance with the low R² values (Table 2). This result was expected because, during calibration, a strong nonlinear relationship between residuals and NG-mAb concentrations was observed (Figure 1), indicating the need for a more efficient regression method to monitor NG-mAb concentrations by in situ NIR spectroscopy. Indeed, the use of the LWR method made it possible to reduce the prediction errors and to obtain good monitoring of NG-mAb concentrations (Figure 2).

| CONCLUSIONS
In this study, experimental evidence of nonlinear parameter behavior in animal cell culture processes was provided. As a consequence of this nonlinearity, the widely used PLSR method was unable to properly relate spectra to compound concentrations, indicating that it is not always appropriate for the monitoring of animal cell culture processes. The novel use of the LWR method was shown to overcome PLSR limitations, which led to more accurate predictions of culture compound concentrations. Using NIR spectroscopy, the enhanced capability of LWR to handle nonlinearities permitted, for the first time, the in situ monitoring of mAb glycosylation site occupancy.
Overall, the results highlighted that in situ NIR spectroscopy could have broader potential as a PAT tool, provided that the effects of culture dynamics and nonlinearity are considered. In this context, NIR spectroscopy could be used to develop innovative spectroscopic calibration models so that effective control approaches to guarantee the quality of antibodies could be implemented.