Serum prostate-specific antigen (PSA) assays differ in calibration and response to different PSA forms. We examined intermethod differences in total PSA (tPSA) and free PSA (fPSA) measurements. We tested 157 samples with tPSA concentrations of 2 to 10 ng/mL (2-10 µg/L) using 6 PSA/fPSA method pairs and 1 tPSA method: ADVIA Centaur (complexed and total; Siemens Diagnostics, Tarrytown, NY), ARCHITECT i2000SR (Abbott Diagnostics, Abbott Park, IL), AxSYM (Abbott Diagnostics), IMMULITE 2000 (Siemens Diagnostics), Modular E170 (Roche Diagnostics, Indianapolis, IN), UniCel DxI 800 (Beckman Coulter, Brea, CA), and VITROS ECi (tPSA only; Ortho-Clinical Diagnostics, Raritan, NJ). Regression analysis was performed for PSA, fPSA, and percentage of fPSA with the ARCHITECT i2000SR comparison method. Differences between test and comparison methods were estimated at 2.5, 4.0, and 10.0 ng/mL (2.5, 4.0, and 10.0 µg/L) for tPSA and 15%, 20%, and 25% for percentage of fPSA. Relative differences were more than 10% at 4.0 ng/mL (4.0 µg/L) tPSA for the Centaur, IMMULITE, ECi, and DxI methods. At 20% fPSA, the relative difference was more than 10% for all methods except the AxSYM. Additional harmonization is needed for tPSA and fPSA methods.
Prostate cancer is the second leading cause of cancer-related deaths for men in the United States. It has been suggested that screening for prostate cancer may have reduced prostate cancer mortality rates, but this remains controversial. Current American Cancer Society guidelines for prostate cancer screening recommend a digital rectal examination and prostate-specific antigen (PSA) measurement annually for all men 50 or older if they have a minimum life expectancy of 10 years. Although PSA is cleared by the Food and Drug Administration for monitoring patients with prostate cancer for disease recurrence, it is unique in that it is the only tumor marker that is currently approved by the Food and Drug Administration as an aid in the detection of prostate cancer. PSA testing for prostate cancer detection, however, is complicated for many reasons. First, PSA exists in multiple isoforms. Although there are many complexed forms that circulate in low concentrations in blood, the 2 principal forms that are measured by current methods are PSA complexed with α1-antichymotrypsin (complexed PSA) and uncomplexed, or free, PSA (fPSA). Many automated analyzers have assays to measure total PSA (tPSA), and most can measure at least 1 of the 2 principal forms directly.
Second, screening with PSA is problematic because PSA is an organ-specific marker for the prostate rather than a specific marker for cancer. Controversy exists regarding the medical decision levels for tPSA and percentage of fPSA within the United States and abroad. For tPSA concentrations, a diagnostic gray zone exists for patients with PSA concentrations between 4.0 and 10.0 ng/mL (4.0-10.0 µg/L). Many patients with PSA concentrations greater than 10.0 ng/mL (10.0 µg/L) have advanced disease, but within this gray zone only 25% have cancer, and it is debated whether these are the patients who have early-stage disease and would benefit from detection or whether these are the patients who have indolent forms of prostate cancer. Therefore, recommendations indicate that a biopsy should be performed in any patient with a tPSA result that is greater than 10.0 ng/mL (10.0 µg/L), and further investigation is warranted in patients who are in the diagnostic gray zone and have a result between 4.0 and 10.0 ng/mL (4.0-10.0 µg/L) tPSA.[5-9] In high-risk populations, a biopsy is recommended if the tPSA is 2.5 ng/mL (2.5 µg/L) or more. It has been recently suggested that this lower tPSA cutoff should be adopted for screening men of all ages. Although the medical decision points for percentage of fPSA are equally controversial, it is generally accepted that measuring fPSA and calculating the percentage of fPSA aids in distinguishing cancer from benign prostate conditions such as benign prostatic hyperplasia, particularly for the population in the diagnostic gray zone.[3,11,12] The lower the fPSA/tPSA ratio, the greater the likelihood of cancer. Currently, recommended biopsy cutoffs for percentage of fPSA are 15%, 20%, or 25%.[11,13,14]
Third, PSA measurements are further complicated by the fact that different assays measure different PSA isoforms to varying extents. Although a certain amount of variation is inevitable because different methods have antibodies that recognize different epitopes of PSA, ideally, assays that measure tPSA should be equimolar and unbiased in the detection of free and complexed PSA. Historically, however, different assays have produced significantly different results for PSA on the same sample. Although lack of an equimolar response was in part responsible for this phenomenon, the other problem was the lack of calibration against a universal standard. In an effort to standardize PSA methods, the World Health Organization (WHO) developed a standard, the WHO (90:10) (National Institute for Biological Standards and Control 96/670) PSA reference preparation, that consists of 90% PSA complexed to α1-antichymotrypsin and 10% fPSA. Recent studies evaluated automated methods using this standard, and although bias between different methods has been significantly reduced, results between assays are not interchangeable for tPSA or fPSA,[16-19] which complicates absolute PSA value recommendations for tPSA cutoffs and interpretation of the fPSA/tPSA ratio.
The aim of our study was to compare 6 commercially available automated methods for PSA concentrations (total and free) by using real patient samples that would minimize matrix effects to determine the degree of method-dependent bias in patient results at critical cutoffs. Our study is unique not only because we evaluated differences using Passing-Bablok analysis but also because statistical differences between methods and concordance at critical cutoffs were determined.
Materials and Methods
A total of 157 surplus samples submitted to our laboratory for PSA testing with results of 2.0 to 10.0 ng/mL (2.0-10.0 µg/L) on the ARCHITECT i2000SR (Abbott Diagnostics, Abbott Park, IL) were selected and further evaluated by 6 additional automated methods. The institutional review board of the University of Utah, Salt Lake City, approved all studies using human samples.
The methods included the ADVIA Centaur (Siemens Diagnostics, Tarrytown, NY), AxSYM (Abbott Diagnostics), IMMULITE 2000 (Siemens Diagnostics), Modular E170 (Roche Diagnostics, Indianapolis, IN), UniCel DxI 800 (Beckman Coulter, Brea, CA) and VITROS ECi (Ortho-Clinical Diagnostics, Raritan, NJ). The ARCHITECT i2000SR was used as the comparison method. Specimens were tested for PSA and free or complexed PSA as available, according to manufacturers' package instructions. In the case of the ADVIA Centaur, by which only total and complexed forms are measured, the following equation was used to estimate fPSA: tPSA – complexed PSA = fPSA. The percentage of fPSA was calculated for all methods using the equation: (fPSA/tPSA) × 100 = % fPSA.
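The two calculations above can be sketched in a few lines of Python. This is only an illustration and not the software used in the study; the function names and the sample values are hypothetical.

```python
def estimate_free_psa(total_psa, complexed_psa):
    """Estimate fPSA (ng/mL) as tPSA minus complexed PSA (ADVIA Centaur case)."""
    return total_psa - complexed_psa


def percent_free_psa(free_psa, total_psa):
    """Percentage of fPSA: (fPSA / tPSA) x 100."""
    if total_psa <= 0:
        raise ValueError("tPSA must be positive")
    return free_psa / total_psa * 100.0


# Hypothetical sample: tPSA 5.0 ng/mL, complexed PSA 4.0 ng/mL.
fpsa = estimate_free_psa(5.0, 4.0)
pct = percent_free_psa(fpsa, 5.0)
print(f"fPSA = {fpsa:.2f} ng/mL, %fPSA = {pct:.1f}%")
```

Because the estimated fPSA is a difference of two measured values, it depends on the accuracy of both the total and the complexed assay.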
Method comparison studies were conducted using the ARCHITECT i2000SR as the comparison method because it has been demonstrated that this method is equimolar and well standardized to the WHO PSA standard for measuring tPSA and fPSA concentrations.[16,17] Passing-Bablok analysis was performed using Analyse-It, version 1.71 (Analyse-It Software, Leeds, England). The intermethod difference was estimated by using the equation:
Difference = (a + b × Xc) – Xc
where a is the y-intercept, b is the slope (as determined from the Passing-Bablok analysis), and Xc is the critical concentration. Critical concentrations of 2.5, 4.0, and 10.0 ng/mL (2.5, 4.0, and 10.0 µg/L) were chosen for tPSA and 15%, 20%, and 25% for percentage of fPSA. A 10% relative bias limit was chosen based on studies by Roddam et al[16,20] that demonstrated that bias of more than 10% significantly impacts the clinical classification of patients.
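As a worked illustration of this estimate, the sketch below takes the Passing-Bablok line to be test = a + b × comparison, evaluates it at each critical concentration, and expresses the difference relative to Xc. The function name is hypothetical; the slope (1.143) and intercept (0.34 ng/mL) are the VITROS ECi tPSA values reported in the Results, used here only as example inputs.

```python
def intermethod_difference(intercept, slope, xc):
    """Return (absolute, relative %) difference of the test method at xc.

    Assumes the regression line is test = intercept + slope * comparison.
    """
    predicted = intercept + slope * xc
    absolute = predicted - xc
    relative = absolute / xc * 100.0
    return absolute, relative


BIAS_LIMIT_PCT = 10.0  # relative bias limit described in the text

# Example inputs: VITROS ECi tPSA slope and intercept from the Results.
for xc in (2.5, 4.0, 10.0):
    diff, rel = intermethod_difference(0.34, 1.143, xc)
    flag = "exceeds" if abs(rel) > BIAS_LIMIT_PCT else "within"
    print(f"Xc = {xc:4.1f} ng/mL: {diff:+.2f} ng/mL ({rel:+.1f}%), {flag} limit")
```

With these inputs the relative difference is above 10% at all three tPSA concentrations. The values in Table 1 may differ slightly because the inputs here are rounded.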
The Wilcoxon signed rank test was used to evaluate paired comparisons for all samples (PSA, fPSA, and percentage of fPSA) on each method vs the ARCHITECT i2000SR. The κ statistic was used to evaluate concordance with the comparison method at the critical cutoffs of 2.5 and 4.0 ng/mL (2.5 and 4.0 µg/L) for tPSA and 15%, 20%, and 25% for percentage of fPSA. The Wilcoxon signed rank test was performed using the R software package, version 2.5 (R Foundation for Statistical Computing, 2007), and κ values were calculated using SAS software, version 9.1.3 (SAS Institute, Cary, NC).
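The concordance calculation can be sketched as Cohen's κ on a 2 × 2 classification of paired results at a cutoff. This is a minimal illustration and not the SAS procedure used in the study. The function name, the choice to count a result as positive when it is greater than or equal to the cutoff, and the paired values are all assumptions.

```python
import math


def cohen_kappa_at_cutoff(comparison, test, cutoff):
    """Cohen's kappa for agreement of two methods on 'result >= cutoff'."""
    if len(comparison) != len(test) or not comparison:
        raise ValueError("need two equal-length, non-empty result lists")
    n = len(comparison)
    both_pos = both_neg = comp_pos = test_pos = 0
    for c, t in zip(comparison, test):
        c_pos, t_pos = c >= cutoff, t >= cutoff
        comp_pos += c_pos
        test_pos += t_pos
        both_pos += c_pos and t_pos
        both_neg += (not c_pos) and (not t_pos)
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance from the marginal positive/negative rates.
    expected = (comp_pos * test_pos + (n - comp_pos) * (n - test_pos)) / n**2
    if expected == 1.0:
        return math.nan  # kappa is undefined when only one category occurs
    return (observed - expected) / (1.0 - expected)


# Hypothetical paired tPSA results (ng/mL), not study data.
comparison = [2.1, 3.2, 3.8, 4.1, 4.5, 5.0, 6.2, 3.9]
test = [2.6, 3.9, 4.3, 4.6, 5.2, 5.9, 7.1, 3.5]
print(f"kappa at 4.0 ng/mL = {cohen_kappa_at_cutoff(comparison, test, 4.0):.2f}")
```

In this hypothetical data set, 7 of 8 pairs agree and chance agreement is 0.5, giving κ = 0.75.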
Results
Results for tPSA for the 6 automated methods and the ARCHITECT i2000SR comparison method are shown in Figure 1. The slopes ranged from 0.836 to 1.143 for the ADVIA Centaur and VITROS ECi methods, respectively. The y-intercepts ranged from –0.02 to 0.34 ng/mL for the AxSYM and VITROS ECi methods, respectively. The correlation coefficients ranged from 0.95 to 0.97. Results for fPSA for 5 methods and the ARCHITECT i2000SR comparison method are shown in Figure 2. The slopes ranged from 0.870 to 1.046 for the UniCel DxI 800 and AxSYM methods, respectively. The y-intercepts ranged from 0.02 ng/mL for the AxSYM and IMMULITE 2000 methods to 0.23 ng/mL for the Modular E170 method. The correlation coefficients ranged from 0.87 to 0.99 for the IMMULITE 2000 and AxSYM methods, respectively. Results for the percentage of fPSA for 5 methods and the ARCHITECT i2000SR comparison method are shown in Figure 3. The slopes ranged from 0.728 to 1.080 for the UniCel DxI 800 and AxSYM methods, respectively. The y-intercepts ranged from –0.32% to 6.64% for the AxSYM and Modular E170 methods, respectively. The correlation coefficients ranged from 0.89 to 0.97 for the IMMULITE and AxSYM methods, respectively.
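For readers unfamiliar with how such slopes and intercepts are obtained, the following is a minimal sketch of the basic Passing-Bablok point estimates: a shifted median of all pairwise slopes, then the median of the residual offsets. It is not the Analyse-It implementation, it omits confidence intervals and the linearity test, and it assumes the two methods are positively correlated. The function name and the paired values are hypothetical.

```python
import math
from statistics import median


def passing_bablok(x, y):
    """Return (intercept, slope) from a basic Passing-Bablok fit of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    slopes = []
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            s = math.copysign(math.inf, dy) if dx == 0 else dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are excluded by the method
            slopes.append(s)
    if not slopes:
        raise ValueError("no usable pairwise slopes")
    slopes.sort()
    n = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset applied to the median rank
    if n // 2 + k >= n:
        raise ValueError("too many slopes below -1; data not positively related")
    if n % 2:
        slope = slopes[(n - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Hypothetical paired tPSA results (ng/mL), not study data.
comparison = [2.1, 3.0, 3.8, 4.4, 5.6, 7.2, 9.5]
test = [2.7, 3.8, 4.6, 5.5, 6.6, 8.7, 11.1]
a, b = passing_bablok(comparison, test)
print(f"intercept = {a:.3f} ng/mL, slope = {b:.3f}")
```

Using medians rather than least squares makes the fit less sensitive to outlying samples, and the procedure does not treat either method as error-free.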
Wilcoxon signed rank statistical analysis of the distribution of results for tPSA concentrations gave P values less than .001 with the exception of the AxSYM (P = .093; Figure 1). For fPSA concentrations, all methods gave P values less than .001 except the ADVIA Centaur (P = .686; Figure 2). For percentage of fPSA, all methods had P values less than .001.
For tPSA, at a critical concentration of 2.5 ng/mL (2.5 µg/L), a relative difference of 10% was exceeded by all methods except the AxSYM (Table 1). The κ values for tPSA studies at the critical concentration point of 2.5 ng/mL (2.5 µg/L) ranged from –0.01 to 0.56, for the VITROS ECi and Modular E170, respectively (Table 2). At a critical concentration of 4.0 ng/mL (4.0 µg/L), a 10% relative difference was exceeded by 4 of 6 methods, the exceptions being the AxSYM and E170 (Table 1). At this critical concentration, the κ values ranged from 0.52 for VITROS ECi to 0.90 for the Modular E170 and UniCel DxI 800 (Table 2). At a critical concentration of 10.0 ng/mL (10.0 µg/L), a relative difference of 10% was exceeded by 3 methods (Table 1). These were the ADVIA Centaur, IMMULITE 2000, and VITROS ECi.
For the percentage of fPSA (Table 3), at 15% and 20% fPSA, the 10% relative difference limit was exceeded by all methods that were evaluated except the AxSYM. At a critical concentration of 25% fPSA, the ADVIA Centaur, IMMULITE 2000, and UniCel DxI 800 methods exceeded the 10% relative difference limit, whereas the AxSYM and Modular E170 methods did not. For percentage of fPSA studies, the κ values ranged from 0.63 (IMMULITE 2000) to 0.87 (AxSYM) at a critical cutoff of 15% fPSA (Table 4). At 20% fPSA, the κ values ranged from 0.61 (IMMULITE 2000) to 0.90 (AxSYM), and at 25% fPSA, the range was 0.58 (IMMULITE 2000) to 0.83 (AxSYM).
Discussion
Our studies demonstrate that all tPSA assays tested showed good correlation in a concentration range from 2.0 to 10.0 ng/mL (2.0-10.0 µg/L). The fPSA and percentage of fPSA results associated with samples also demonstrated good correlations. The results of Passing-Bablok analysis for tPSA and fPSA methods were consistent with the results of previously published studies.[17,18] Wilcoxon signed rank test results demonstrated that all methods were significantly different from the ARCHITECT i2000SR except the AxSYM for tPSA and that all methods except the ADVIA Centaur were significantly different from the ARCHITECT i2000SR for fPSA. All methods were significantly different from the ARCHITECT i2000SR for percentage of fPSA. Furthermore, when we examined the relative difference between assays for tPSA at clinical decision concentrations of 2.5, 4.0, and 10.0 ng/mL (2.5, 4.0, and 10.0 µg/L), the majority of the assays exceeded a 10% relative difference limit. In addition, concordance studies using the κ statistic and a suggested cutoff of 0.75 or more as a measure of excellent concordance demonstrated that at the medical decision threshold of 2.5 ng/mL (2.5 µg/L), none of the methods were acceptable, and at 4.0 ng/mL (4.0 µg/L), only 3 of the 6 methods (AxSYM, Modular E170, and UniCel DxI 800) were comparable to the ARCHITECT i2000SR. These data, together with Passing-Bablok and Wilcoxon signed rank analyses, suggest that significant assay bias is present and that clinical categorization of patients may differ depending on which assay is used.
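One way to picture how assay bias changes categorization is to invert the regression line and ask which comparison-method concentration reads exactly at the cutoff on the test method. Results between that concentration and the cutoff itself fall on opposite sides of the decision threshold on the two methods. The sketch below considers only the fitted line and ignores scatter around it. The function names are hypothetical, and the slope and intercept are the VITROS ECi tPSA values reported in the Results, used as example inputs.

```python
def equivalent_cutoff(intercept, slope, cutoff):
    """Comparison-method concentration at which the test method reads `cutoff`.

    Inverts test = intercept + slope * comparison.
    """
    if slope <= 0:
        raise ValueError("slope must be positive")
    return (cutoff - intercept) / slope


def discordant_band(intercept, slope, cutoff):
    """Comparison-method range that the test method classifies differently."""
    x_eq = equivalent_cutoff(intercept, slope, cutoff)
    return min(x_eq, cutoff), max(x_eq, cutoff)


# Example inputs: VITROS ECi tPSA slope 1.143 and intercept 0.34 ng/mL.
low, high = discordant_band(0.34, 1.143, 4.0)
print(f"Comparison-method results from {low:.2f} to {high:.2f} ng/mL "
      "fall on opposite sides of the 4.0 ng/mL cutoff on the two methods")
```

With these inputs the band runs from about 3.20 to 4.00 ng/mL on the comparison method, so a fitted result in that range would read at or above 4.0 ng/mL on the test method.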
The difference between methods was even more pronounced when the percentage of fPSA was examined. At the clinical decision thresholds of 15% and 20%, all methods except the AxSYM had 10% or more relative difference when compared with the ARCHITECT i2000SR comparison method. This is particularly troubling because 20% fPSA is most often recommended as a clinical decision threshold. Both tPSA and fPSA bias may have contributed to the differences in the percentage of fPSA. At a cutoff of 25% fPSA, 3 of the 6 methods showed good agreement. These data suggest that harmonization efforts should focus on percentage of fPSA values of 20% or less.
Method comparison differences for tPSA and fPSA results were least pronounced for the AxSYM compared with the ARCHITECT i2000SR comparison method. This finding may be explained in part by the fact that the ARCHITECT i2000SR and AxSYM are calibrated against the same WHO standard and use the same antibodies, although the detection method for the ARCHITECT i2000SR is chemiluminescence and that for the AxSYM is fluorescence. The ARCHITECT i2000SR, AxSYM, Modular E170, and IMMULITE 2000 are all calibrated against the WHO standard for tPSA; the ADVIA Centaur is calibrated to a purified PSA standard that is traceable to the WHO standard; and the VITROS ECi and UniCel DxI 800 are calibrated using a purified PSA standard.[17,22,23] For fPSA, the ARCHITECT i2000SR, AxSYM, Modular E170, and IMMULITE 2000 are calibrated using a WHO standard. The ADVIA Centaur, VITROS ECi, and UniCel DxI 800 are calibrated using other standards, such as purified PSA.[17,22-24]
Our studies demonstrate that when comparing automated PSA assays using patient samples, all methods correlate, but further harmonization efforts are required for clinical concordance. There is growing evidence that despite standardization attempts, significant differences still exist in automated PSA methods.[16-19] The current PSA methods still cannot be used interchangeably as demonstrated by the more than 10% relative differences between methods at relevant medical decision concentrations, making PSA interpretation difficult. Lack of agreement at critical concentrations for tPSA of 2.5, 4.0, and 10.0 ng/mL (2.5, 4.0, and 10.0 µg/L) and for percentage of fPSA of 15%, 20%, and 25% can potentially result in misclassification of patients. Because clinically the decision for further diagnostic procedures such as prostate biopsy is made based on the tPSA and fPSA results at these critical concentrations, the lack of harmonization and its potential impact on patient care remain concerns.
Table 1. Summary of Bias Estimates for Total Prostate-Specific Antigen for Six Methods Compared With the ARCHITECT i2000SR Comparison Method*
Table 2. Summary of κ Statistics for Total Prostate-Specific Antigen for Each Method Compared With the ARCHITECT i2000SR Comparison Method