Interobserver reproducibility of the Paris system for reporting urinary cytology
Theresa Long BS 1, Lester J Layfield MD 1, Magda Esebua MD 1, Shellaine R Frazier DO 1, D Tamar Giorgadze MD, PhD 2, Robert L Schmidt MD, PhD, MBA 3
1 Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, Missouri, USA
2 Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, USA
3 Department of Pathology and Laboratory Medicine and ARUP Laboratories, University of Utah, Salt Lake City, Utah, USA
|Date of Submission||22-Feb-2017|
|Date of Acceptance||05-May-2017|
|Date of Web Publication||24-Jul-2017|
Lester J Layfield
Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, Missouri
Source of Support: None, Conflict of Interest: None
Background: The Paris System for Reporting Urinary Cytology represents a significant improvement in classification of urinary specimens. The system acknowledges the difficulty in cytologically diagnosing low-grade urothelial carcinomas and has developed categories to deal with this issue. The system uses six categories: unsatisfactory, negative for high-grade urothelial carcinoma (NHGUC), atypical urothelial cells, suspicious for high-grade urothelial carcinoma, high-grade urothelial carcinoma, other malignancies and a seventh subcategory (low-grade urothelial neoplasm). Methods: Three hundred and fifty-seven urine specimens were independently reviewed by four cytopathologists unaware of the previous diagnoses. Each cytopathologist rendered a diagnosis according to the Paris System categories. Agreement was assessed using absolute agreement and weighted chance-corrected agreement (kappa). Disagreements were classified as low impact and high impact based on the potential impact of a misclassification on clinical management. Results: The average absolute agreement was 65% with an average expected agreement of 44%. The average chance-corrected agreement (kappa) was 0.32. Nine hundred and ninety-nine of 1902 comparisons between rater pairs were in agreement, but 12% of comparisons differed by two or more categories for the category NHGUC. Approximately 15% of the disagreements were classified as high clinical impact. Conclusions: Our findings indicated that the scheme recommended by the Paris System shows adequate precision for the category NHGUC, but the other categories demonstrated unacceptable interobserver variability. This low level of diagnostic precision may negatively impact the applicability of the Paris System for widespread clinical application.
Keywords: Interobserver agreement, Paris System, urinary cytology, urothelial carcinoma
|How to cite this article:|
Long T, Layfield LJ, Esebua M, Frazier SR, Giorgadze D T, Schmidt RL. Interobserver reproducibility of the Paris system for reporting urinary cytology. CytoJournal 2017;14:17
|How to cite this URL:|
Long T, Layfield LJ, Esebua M, Frazier SR, Giorgadze D T, Schmidt RL. Interobserver reproducibility of the Paris system for reporting urinary cytology. CytoJournal [serial online] 2017 [cited 2018 Mar 23];14:17. Available from: http://www.cytojournal.com/text.asp?2017/14/1/17/211453
To ensure the integrity and highest quality of CytoJournal publications, the review process of this manuscript was conducted under a double-blind model (authors are blinded for reviewers and vice versa) through an automatic online system.
| » Introduction|| |
The diagnosis of urothelial carcinoma and posttherapy follow-up require either histologic examination of biopsy specimens or cytologic examination of voided urine or instrumented urinary tract specimens. The diagnosis of urothelial carcinoma by cytologic methods can be challenging, especially the recognition of low-grade urothelial carcinomas. A number of diagnostic approaches and classification schemes have been proposed over the last seven decades. Diagnostic classifications and criteria have been proposed by Papanicolaou, Koss et al., Murphy et al., Ooms and Veldhuizen, and The Papanicolaou Society of Cytopathology. All of these systems have been highly successful for the recognition of high-grade urothelial carcinomas (HGUCs) but have had low sensitivity for the recognition of low-grade urothelial carcinomas. A number of technologies have been brought to bear to improve the diagnostic sensitivity of urinary cytology, but the success of these methodologies has often been insufficient to justify their cost. Fluorescence in situ hybridization analysis has been one of the more popular, but some studies have shown that its sensitivity for low-grade papillary urothelial carcinoma is not significantly superior to that of cytology alone.
The recently proposed Paris System for Reporting Urinary Cytology advocates a classification to improve the sensitivity and specificity for the diagnosis of HGUC. This system utilizes seven categories designated: (1) inadequate/less than optimal adequacy; (2) negative for high-grade urothelial carcinoma (NHGUC); (3) atypical urothelial cells (AUCs); (4) low-grade urothelial neoplasm (LGUN); (5) suspicious for high-grade urothelial carcinoma (SHGUC); (6) HGUC; and (7) other malignancies primary and metastatic. Each of these categories is well defined by specific criteria and is associated with a known risk for malignancy. The Paris System for Reporting Urinary Cytology also suggested management options for each diagnostic category.
Currently, little published data exist documenting the interobserver reproducibility of these categories. For clinical utility, a categorization scheme must be both accurate and precise. We investigated the interobserver reproducibility of five categories used in The Paris System for Reporting Urinary Cytology. The analysis was performed by four cytopathologists who had not participated in the development of The Paris System for Reporting Urinary Cytology. Herein, we report the results of our reproducibility study for evaluation of precision of the Paris System.
| » Methods|| |
The study design was reviewed by the Institutional Review Board at the University of Missouri for compliance with university, national, and international standards. The Institutional Review Board designated the study as “exempt.” Three hundred and fifty-seven urinary cytology specimens (328 voided urines, 13 catheterized urines, 10 ureteral brushings and washings, 4 obtained during cystoscopy, and 2 obtained from the kidney) obtained over a 10-year period were selected for the study. Only cases with well-fixed, well-prepared, liquid-based preparations were chosen for inclusion in the study. The majority of cases were voided urines. All specimens were Papanicolaou-stained ThinPrep® preparations. Each case was reviewed independently by four cytopathologists unaware of the previous diagnoses and unaware of the diagnoses given by other cytopathologists participating in the study. The cytopathologists had between 6 and 25 years' experience with interpreting urinary cytology. The categories used for assignment were those of the Paris System and included: unsatisfactory, NHGUC, AUC, LGUN, SHGUC, HGUC, and other malignancies. The four review cytopathologists had not been involved in the development of The Paris System for Reporting Urinary Cytology, but each cytopathologist read the monograph entitled The Paris System for Reporting Urinary Cytology and applied the criteria as outlined in the relevant chapters (3, 4, 5, 6, and 7). The nuclear-cytoplasmic (N/C) ratio was estimated visually using the definitions proffered in the Paris System monograph (see [Table 1] for the criteria used). One of the cytopathologists had attended lectures at national meetings outlining the Paris System. The category LGUN was recognized as a subcategory of NHGUC and was only used where papillary groups of urothelial cells were present with well-defined fibrovascular cores.
|Table 1: Criteria for assignment to the diagnostic categories of The Paris System for Reporting Urinary Cytology|
Overall agreement was assessed using absolute agreement and weighted chance-corrected agreement (kappa). For weighted kappa, concordant results were given full credit (1) and discordant results were given half credit if the discordance was off by a single category [Table 2]. For kappa calculations, the categories were ordered as follows: NHGUC greater than AUC greater than LGUN greater than SHGUC greater than HGUC. The categories of “unsatisfactory” and “other malignancy” were excluded from the agreement analysis. The overall average absolute agreement and expected agreement were calculated by determining the agreement between each pair of observers and calculating the weighted average (each pair had a different number of cases due to exclusion of cases classified as unsatisfactory or other). Statistical calculations were performed using Stata 14 (StataCorp., College Station, Texas, USA). Weighted averages were calculated using the MetaProp command. The average kappa statistics were calculated using the Kapci command.
To further characterize agreement, we investigated the reliability of specific rating categories. To that end, we tabulated the results of all the rater comparisons and determined the frequency of each possible combination of ratings (NHGUC vs. AUC, NHGUC vs. SHGUC, etc.). We categorized each combination as high clinical impact, low clinical impact, or no impact according to the potential impact of a misclassification on clinical management. We determined the reliability of individual categories by determining the conditional probability of a disagreement by a second rater, given a particular rating by the first rater.
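The conditional probability described above could be tabulated along the following lines. This is a sketch under stated assumptions: the function name and the example diagnoses are hypothetical, and counting every ordered pair of raters (so that each rater serves once as “first” and once as “second”) is an assumption about how the comparisons were tabulated.

```python
from collections import Counter, defaultdict
from itertools import permutations


def conditional_agreement(cases):
    """Estimate P(second rating | first rating) from multi-rater cases.

    `cases` is a list of lists; each inner list holds the diagnoses that the
    raters gave to one specimen. Every ordered pair of distinct raters counts
    once, so each rater serves as "first" and as "second" (an assumption).
    """
    counts = defaultdict(Counter)
    for ratings in cases:
        for first, second in permutations(ratings, 2):
            counts[first][second] += 1
    table = {}
    for first, seconds in counts.items():
        total = sum(seconds.values())
        table[first] = {second: n / total for second, n in seconds.items()}
    return table


# Hypothetical diagnoses from four raters on three specimens.
cases = [
    ["NHGUC", "NHGUC", "NHGUC", "AUC"],
    ["AUC", "AUC", "SHGUC", "HGUC"],
    ["NHGUC", "NHGUC", "NHGUC", "NHGUC"],
]
table = conditional_agreement(cases)
for first, row in sorted(table.items()):
    print(first, {second: round(p, 2) for second, p in sorted(row.items())})
```

The diagonal entry of each row, for example `table["NHGUC"]["NHGUC"]`, is the chance that a second rater agrees with the first for that category.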
| » Results and Discussion|| |
The diagnoses were unevenly distributed across categories [Table 3]. NHGUC (44.5%) and AUC (22%) were the most common diagnoses. Of 1902 comparisons between rater pairs, 999 (52.5%) showed complete agreement, 451 (23.7%) differed by a single category, 228 (12%) differed by two categories, 199 (10.5%) differed by three categories, and 25 (1.3%) differed by four categories. Using the impact categories in [Table 4], 14.7% of the disagreements were considered “high clinical impact.” The average absolute agreement was 65% (95% confidence interval: 63–67). The average expected agreement (due to chance alone) was 44% (95% confidence interval: 42–46). The average chance-corrected agreement (kappa) was 0.32 (95% confidence interval: 0.28–0.32).
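The percentages in the preceding paragraph follow directly from the reported counts, as the short check below shows. The counts are taken from the text; the variable names are illustrative, and the upper bound of six rater pairs per case is simply the number of pairs that four raters can form.

```python
# Counts of rater-pair comparisons by category distance, as reported in the text.
reported = {0: 999, 1: 451, 2: 228, 3: 199, 4: 25}
total = sum(reported.values())

# Share of comparisons at each distance, in percent, rounded to one decimal.
shares = {d: round(100 * n / total, 1) for d, n in reported.items()}

# Upper bound on comparisons before exclusions: 357 cases x 6 rater pairs.
max_comparisons = 357 * (4 * 3 // 2)

# Share of comparisons that differ by two or more categories.
two_or_more = round(100 * sum(n for d, n in reported.items() if d >= 2) / total, 1)

print(total, shares, max_comparisons, two_or_more)
```

The total of 1902 falls below the upper bound of 2142 because comparisons involving the “unsatisfactory” and “other malignancy” categories were excluded.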
The agreement by category is shown in [Table 5]. Raters had the highest agreement (63%) on NHGUC. Raters showed poorer agreement on the other categories. For example, given a diagnosis of SHGUC by one rater, the chance of agreement was only 13% and the second diagnoses were evenly distributed over the other categories. [Figure 1], [Figure 2], [Figure 3], [Figure 4] show examples of specimens with poor interobserver agreement.
|Figure 1: Photomicrograph of representative field from a case evaluated as “atypical urothelial cell” by two reviewers, suspicious for high-grade urothelial carcinoma by one observer, and high-grade urothelial carcinoma by the final reviewer (Papanicolaou, ×600)|
|Figure 2: Photomicrograph of a specimen designated as atypical urothelial cell by two reviewers and as suspicious for high-grade urothelial carcinoma by two reviewers|
|Figure 3: Photomicrograph of a specimen designated as atypical urothelial cell by three observers and as negative for high-grade urothelial carcinoma by one observer|
|Figure 4: Photomicrograph of a specimen designated as atypical urothelial cell by two observers, as suspicious for high-grade urothelial carcinoma by a single observer, and as malignant by the final observer|
Cytologic analysis of specimens obtained from the urinary tract remains a cornerstone for the diagnosis and surveillance of patients with urothelial carcinoma. The sensitivity and specificity of cytologic analysis are excellent for HGUCs. Cytologic analysis is accurate for both superficial high-grade lesions as well as those extending deeper into the bladder wall. Unfortunately, the sensitivity of cytologic study for the diagnosis of LGUNs is significantly poorer. In addition, noncarcinomatous papillary lesions exist (papilloma and papillary urothelial lesion of low malignant potential) complicating the cytologic diagnosis of low-grade urothelial carcinomas. The Papanicolaou Society of Cytopathology Task Force recommendations acknowledged this issue and included the category “atypical urothelial cells.” More recently, a number of investigators have studied the clinical significance of the atypical category and attempted to define useful morphologic criteria to further stratify malignancy risk. Published data are inconclusive as to whether specimens designated “atypical” have increased risk over those designated “negative.” Cytologic features have been identified which correlate with increased risk for low-grade urothelial carcinoma.
Previously, many cytopathologists subdivided the atypical category into “atypical, suspicious for malignancy” and “atypical, favor reactive.” Such a subdivision may increase diagnostic accuracy, but the malignancy risk of these two categories and their reproducibility are incompletely understood. The relationship between the category “atypical, suspicious for malignancy” and the presence of low-grade or HGUC is also poorly defined.
Recently, The Paris System for Reporting Urinary Cytology was developed to improve diagnostic accuracy and clarify the usage and implications of the atypical category or categories. This system effectively has two atypical categories named as: (1) AUCs and (2) SHGUC. The categories AUCs and SHGUC are defined by cytologic features including the N/C ratio. Recently, Layfield et al. demonstrated imperfect interobserver reproducibility of estimates of N/C ratio, and Zhang et al. demonstrated that morphologists overestimated the N/C ratio. The category LGUN is well defined by both cellular and architectural features. The category LGUN should have limited usage and is considered a subcategory of NHGUC. While the categories of The Paris System have good estimates of malignancy risk, we are unaware of any study documenting the interobserver reproducibility of these categories. Such data are needed to estimate the precision of The Paris System for Reporting Urinary Cytology.
Our study was designed to evaluate the diagnostic precision, not the diagnostic accuracy, of The Paris System for Reporting Urinary Cytology, and thus, follow-up histologic diagnoses were not necessary. We investigated the reproducibility of category assignment by four cytopathologists reviewing 357 specimens. The clinical utility of any classification system depends on both its precision (reproducibility) and its diagnostic accuracy. Prior classifications of urinary tract specimens were often imprecise in relation to the term “atypical” and for the diagnosis of low-grade urothelial carcinomas. The frequency of use of the term “atypical” has ranged from 2% to 31%, and the risk of malignancy associated with the term has varied from 8.3% to 37.5%. Hence, use of the term varies widely as does its relationship to the presence of malignancy. Recently, The Paris System for Reporting Urinary Cytology was developed to improve both precision and accuracy for the diagnosis of urothelial carcinoma. Definitive diagnostic criteria were proposed. While malignancy risk was associated with each category, little information is available on the precision of assignment of specimens to these diagnostic categories. Brimo and Auger pointed out that even when the “atypical” category is precisely defined and accurately applied, the incidence of the “atypical” diagnosis and its association with carcinoma will vary between laboratories and patient populations studied. Our study reported an AUC rate of 22%. While this is high in comparison to some reports, it is less than the AUC rates reported by Rosenthal et al. (31%) and Brimo and Auger (26%). Our finding of an AUC rate of 22% continues to demonstrate the difficulty associated with classification of some urine specimens despite the more precise definitions of The Paris System.
The atypical category is often associated with poor reproducibility or precision as recognized in the Papanicolaou Society of Cytopathology guidelines for the pancreaticobiliary and respiratory systems. The atypical cells of uncertain significance (ACUS) category also presents challenges for the Bethesda system for cervical cytology. This was addressed by adding categories ACUS high and ACUS low, which do not necessarily improve precision but appear to have increased the clinical utility of the Bethesda system for cervical cytology. Despite its poor precision, the AUC category appears to have clinical value because it stratifies malignancy risk. Because of the clinical value of the atypical category, more reproducible criteria are desirable. While the N/C ratio would appear to be measurable and reproducible, it appears imperfect in clinical practice and in a number of morphometric studies. A potential problem with the Paris System for urinary cytology is not the individual criteria including N/C ratio but the propensity for cytopathologists to use atypical categories as a “wastebasket” for cases where a cytopathologist is not certain a specimen is benign but does not see sufficient changes to be comfortable with a “suspicious for malignancy” diagnosis. In some of these cases, just calling a specimen atypical may seem like a safe diagnostic option for some cytopathologists despite the published criteria. Large linear regression studies may be helpful in developing more reproducible criteria for the atypical category.
We assessed the clinical impact of disagreement for the diagnostic pairs as shown in [Table 4]. In the majority of diagnostic pairings, the diagnoses varied by only one category and the clinical impact of such a disagreement between observers would be low with little effect on management. However, disagreement by two or more categories could have a significant impact on clinical management. Hence, classification discrepancy pairs such as “NHGUC vs. SHGUC,” “AUC vs. HGUC,” “LGUN vs. HGUC,” and “NHGUC vs. HGUC” could have a significant impact on how a patient is clinically managed. Such “high impact” disagreement occurred in approximately 15% of cases [Table 4]. The probability that a second rater would agree with the initial rater varied considerably [Table 5]. Agreement for the category NHGUC was good at 63%. Agreement for the other categories was poor, especially for the categories LGUN (6%) and SHGUC (13%). Even the categories AUC and HGUC had relatively poor agreement between raters (21%). The Paris System for Reporting Urinary Cytology demonstrates good precision for the diagnosis of NHGUC but had poor interobserver reproducibility (precision) for the other categories. This finding is similar to issues identified with prior categorization proposals. Kassouf observed that the category “atypical urothelial cells” is frequently reported, raising concerns regarding the possible presence of cancer, which may lead to overinvestigation and excessively prolonged surveillance. The new Paris System may reduce the percentage of urinary tract samples designated as “atypical.” Our series reported an AUC rate of 22%, which is lower than that found by some authors using other classification systems but is still higher than an optimal AUC rate of below 10% as reported by some authorities.
| » Conclusions|| |
Clinical outcomes analysis will be necessary to determine the implications of our findings of poor interobserver agreement for the majority of the categories used in The Paris System for Reporting Urinary Cytology. Our findings suggest that the Paris System suffers from an unacceptably low precision in some of its diagnostic categories. Further, large studies will be necessary to demonstrate the precision and accuracy of The Paris System for Reporting Urinary Cytology.
| » Competing Interests Statement by All Authors|| |
The author(s) declare that they have no competing interests.
| » Authorship Statement by All Authors|| |
All authors of this article declare that we qualify for authorship as defined by ICMJE.
Each author has participated sufficiently in the work and takes public responsibility for appropriate portions of the content of this article.
TL carried out and organized data collection. LJL developed the concept and wrote the article. ME, SRF, and TG reviewed slides. RLS completed the statistical analysis. All authors read and approved the final manuscript.
| » Ethics Statement by All Authors|| |
This study was conducted with approval from Institutional Review Board (IRB) of all the institutions associated with this study as applicable.
Authors take responsibility to maintain relevant documentation in this respect.
| » List of Abbreviations (In Alphabetic Order)|| |
ACUS - Atypical cells of uncertain significance
AUC - Atypical urothelial cells
HGUC - High grade urothelial carcinoma
LGUN - Low grade urothelial neoplasm
N/C - Nuclear/cytoplasmic
NHGUC - Negative for high grade urothelial carcinoma
SHGUC - Suspicious for high grade urothelial carcinoma
USA - United States of America.
| » References|| |
Papanicolaou GN. Cytology of the urine sediment in neoplasms of the urinary tract. J Urol 1947;57:375-9.
Koss LG, Bartels PH, Sychra JJ, Wied GL. Diagnostic cytologic sample profiles in patients with bladder cancer using TICAS system. Acta Cytol 1978;22:392-7.
Murphy WM, Soloway MS, Jukkola AF, Crabtree WN, Ford KS. Urinary cytology and bladder cancer. The cellular features of transitional cell neoplasms. Cancer 1984;53:1555-65.
Ooms EC, Veldhuizen RW. Cytological criteria and diagnostic terminology in urinary cytology. Cytopathology 1993;4:51-4.
Layfield LJ, Elsheikh TM, Fili A, Nayar R, Shidham V; Papanicolaou Society of Cytopathology. Review of the state of the art and recommendations of the Papanicolaou Society of Cytopathology for urinary cytology procedures and reporting: The Papanicolaou Society of Cytopathology Practice Guidelines Task Force. Diagn Cytopathol 2004;30:24-30.
Toma MI, Friedrich MG, Hautmann SH, Jäkel KT, Erbersdobler A, Hellstern A, et al. Comparison of the ImmunoCyt test and urinary cytology with other urine tests in the detection and surveillance of bladder cancer. World J Urol 2004;22:145-9.
Moonen PM, Merkx GF, Peelen P, Karthaus HF, Smeets DF, Witjes JA. UroVysion compared with cytology and quantitative cytology in the surveillance of non-muscle-invasive bladder cancer. Eur Urol 2007;51:1275-80.
Rosenthal DL, Wojcik EM, Kurtycz DF. In: The Paris System for Reporting Urinary Cytology. Cham, Switzerland: Springer International Publishing; 2016. p. 1-159.
Hughes JH, Raab SS, Cohen MB. The cytologic diagnosis of low-grade transitional cell carcinoma. Am J Clin Pathol 2000;114(Suppl 1):S59-67.
Renshaw AA, Nappi D, Weinberg DS. Cytology of grade 1 papillary transitional cell carcinoma. A comparison of cytologic, architectural and morphometric criteria in cystoscopically obtained urine. Acta Cytol 1996;40:676-82.
Deshpande V, McKee GT. Analysis of atypical urine cytology in a tertiary care center. Cancer 2005;105:468-75.
Brimo F, Vollmer RT, Case B, Aprikian A, Kassouf W, Auger M. Accuracy of urine cytology and the significance of an atypical category. Am J Clin Pathol 2009;132:785-93.
Renshaw AA. Subclassifying atypical urinary cytology specimens. Cancer 2000;90:222-9.
Layfield LJ, Schmidt RL, Esebua M, Frazier SR, Hammer RD, Bivin WW, et al. Accuracy and reproducibility of nuclear/cytoplasmic ratio assessments: An intra- and interobserver study. Diagn Cytopathol 2017;45:107-12.
Zhang ML, Guo AX, VandenBussche CJ. Morphologists overestimate the nuclear-to-cytoplasmic ratio. Cancer Cytopathol 2016;124:669-77.
Piaton E, Decaussin-Petrucci M, Mege-Lechevallier F, Advenier AS, Devonec M, Ruffion A. Diagnostic terminology for urinary cytology reports including the new subcategories 'atypical urothelial cells of undetermined significance' (AUC-US) and 'cannot exclude high grade' (AUC-H). Cytopathology 2014;25:27-38.
Muus Ubago J, Mehta V, Wojcik EM, Barkan GA. Evaluation of atypical urine cytology progression to malignancy. Cancer Cytopathol 2013;121:387-91.
Bhatia A, Dey P, Kakkar N, Srinivasan R, Nijhawan R. Malignant atypical cell in urine cytology: A diagnostic dilemma. Cytojournal 2006;3:28.
Mokhtar GA, Al-Dousari M, Al-Ghamedi D. Diagnostic significance of atypical category in the voided urine samples: A retrospective study in a tertiary care center. Urol Ann 2010;2:100-6.
VandenBussche CJ, Sathiyamoorthy S, Owens CL, Burroughs FH, Rosenthal DL, Guan H. The Johns Hopkins Hospital template for urologic cytology samples: Parts II and III: Improving the predictability of indeterminate results in urinary cytologic samples: An outcomes and cytomorphologic study. Cancer Cytopathol 2013;121:21-8.
Brimo F, Auger M. The atypical urothelial cell category in the Paris system: Strengthening the Achilles' heel. Cancer Cytopathol 2016;124:305-6.
Barasch S, Choi M, Stewart J 3rd, Das K. Significance of atypical category in voided urine specimens prepared by liquid-based technology: Experience of a single institution. J Am Soc Cytopathol 2014;3:118-25.
Rosenthal DL, Vandenbussche CJ, Burroughs FH, Sathiyamoorthy S, Guan H, Owens C. The Johns Hopkins Hospital template for urologic cytology samples: Part I-creating the template. Cancer Cytopathol 2013;121:15-20.
Kassouf W. The value of urine cytology in the workup of hematuria. Cancer Cytopathol 2016;124:303-4.