RadNLP 2024 shared task: Natural Language Processing for Radiology (NTCIR-18)

RadNLP 2024 (Natural Language Processing for Radiology) is a shared task in the international conference NTCIR-18, organized by the National Institute of Informatics in Japan.

We propose the tasks, publish the dataset, and call for solutions from participants. RadNLP 2024 aims to create open medical data and contribute insights back to the medical and informatics communities.

Recent updates

Jan 16, 2025: Registration has been closed
Nov 28, 2024: Extended registration period to Jan 15, 2025
Nov 3, 2024: Updated schedule & announced the evaluation metrics
Aug 28, 2024: Released sample data of the main task & sub task
Jul 8, 2024: Updated FAQ
Jul 3, 2024: Updated FAQ
Jun 21, 2024: Updated the sub task instruction
Jun 19, 2024: Updated staff list
Jun 12, 2024: Published FAQ
Jun 5, 2024: Added the hyperlinks to our past shared tasks
Jun 4, 2024: Announced the sub task
May 29, 2024: Opened the registration period!
Apr 16, 2024: Added the description of English / Japanese tracks
Apr 14, 2024: Added the official X (formerly Twitter) account (@radnlp)
Apr 14, 2024: Renewed the theme color and the key visual
Apr 7, 2024: Launched the English website

Task overview

1. Motivation

RadNLP 2024 aims to automatically determine the stage (i.e., the degree of progression) of lung cancer from radiology reports.

Management of lung cancer is based on the stage, and radiology reports provide various related information by describing medical images such as CT and MRI.

However, radiology reports do not always specify the stage explicitly¹. This imposes extra workload on human experts for careful manual information extraction, which can be aided by automation.

¹ Sexauer R et al. Towards more structure: comparing TNM staging completeness and processing time of text-based reports versus fully segmented and annotated PET/CT data of non-small-cell lung cancer. Contrast Media Mol Imaging 2018:5693058.

2. Dataset

All radiology reports in RadNLP 2024 diagnose lung cancer at the initial evaluation. There are no reports of post-treatment evaluation of lung cancer or of other cancers.

Our datasets contain NO personal health information. The radiology reports are not derived from real medical institutions but were created through crowdsourcing, by diagnosing de-identified images on Radiopaedia². Task participants do not need to go through a complex application process to use our datasets.

² Nakamura Y et al. Clinical Comparable Corpus Describing the Same Subjects with Different Expressions. Stud Health Technol Inform 2022:290:253-257.

3. Tracks and Tasks

RadNLP 2024 offers two independent tracks, an English track and a Japanese track.

Participants are welcome to join either the English track, the Japanese track, or both. Scoring and ranking will be conducted independently for each track.

RadNLP 2024 also consists of two tasks, the sub task and the main task:

3-1. Sub task: document segmentation

3-1-1. Overview

The sub task is document segmentation: identifying up to eight spans related to the following topics:

(i) Omittable – Spans that are clearly free of any positive findings or clearly unrelated to lung cancer staging. Here, “clearly” means that the judgment does not require detailed knowledge of the lung cancer staging criteria.

(ii) Measure – Span describing mainly the existence and diameter of the primary lesion.

(iii) Extension – Span describing the range of the primary lesion’s extension outside the lung parenchyma.

(iv) Atelectasis – Span pointing out atelectasis or obstructive pneumonia.

(v) Satellite – Span pointing out intrapulmonary metastasis or lymphangitis carcinomatosa.

(vi) Lymphadenopathy – Span pointing out enlarged regional lymph nodes.

(vii) Pleural – Span pointing out pleural/pericardial effusion/dissemination.

(viii) Distant – Span pointing out distant metastasis outside the lung parenchyma.

In NLP terminology, this sub task is multi-label binary classification at the sentence level.

The segmentation is at the sentence level, and it is possible for a topic span to be discontinuous or for the same sentence to belong to more than one topic.

Participants are requested to determine whether each sentence falls into categories (i) to (viii). If it does, mark it as “1,” and if it does not, mark it as “0.” Therefore, each sentence requires eight binary answers.

Note that every sentence will either fall into only category (i) Omittable, or into one or more of categories (ii) Measure to (viii) Distant.
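The answer format described above can be sketched as follows. This is only an illustration, not the official submission format: the lowercase topic keys, the function names `encode_sentence` and `is_consistent`, and the dictionary layout are all assumptions made for this example.

```python
# Topic names in the order (i) to (viii) given above.
# The lowercase keys are hypothetical; the official files may name them differently.
TOPICS = (
    "omittable", "measure", "extension", "atelectasis",
    "satellite", "lymphadenopathy", "pleural", "distant",
)


def encode_sentence(topics):
    """Return the eight binary answers (0 or 1) for one sentence."""
    unknown = set(topics) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topic(s): {sorted(unknown)}")
    return {name: int(name in topics) for name in TOPICS}


def is_consistent(answers):
    """Check the rule that a sentence is either Omittable only,
    or belongs to one or more of Measure to Distant."""
    others = sum(answers[name] for name in TOPICS[1:])
    if answers["omittable"] == 1:
        return others == 0
    return others >= 1


# A sentence describing both tumor size and chest wall invasion
# would carry two labels at once.
answers = encode_sentence({"measure", "extension"})
print(answers, is_consistent(answers))
```

A check like `is_consistent` may be useful as a sanity test on model output before submission, since a prediction that marks a sentence as both Omittable and, say, Measure cannot match any gold annotation.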

3-1-2. Data

The dataset for Sub Task consists of the following two files:

Sample data for Sub Task is available below:

3-1-3. Evaluation metrics

[UPDATED] We calculate the following evaluation metrics, with the one marked with ★ used to sort the leaderboard:

Overall micro F2.0 - Sentence-wise average of F2.0 score calculated throughout seven labels from Measure to Distant
Inclusion micro F2.0 - Sentence-wise average of F2.0 score for Omittable label, where digits 0 and 1 are inverted during calculation
Measure micro F2.0 - Sentence-wise average of F2.0 score for Measure label
Extension micro F2.0 - Sentence-wise average of F2.0 score for Extension label
Atelectasis micro F2.0 - Sentence-wise average of F2.0 score for Atelectasis label
Satellite micro F2.0 - Sentence-wise average of F2.0 score for Satellite label
Lymphadenopathy micro F2.0 - Sentence-wise average of F2.0 score for Lymphadenopathy label
Pleural micro F2.0 - Sentence-wise average of F2.0 score for Pleural label
Distant micro F2.0 - Sentence-wise average of F2.0 score for Distant label
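For reference, the F2.0 score weights recall more heavily than precision: with β = 2, F_β = (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP). The sketch below shows one possible reading of the metrics above, in which true positives, false positives, and false negatives are pooled over all sentences and the selected labels (standard micro-averaging). The exact aggregation used by the official scorer is not specified here, so treat the pooling, the zero-denominator convention, and the names `f_beta` and `micro_f_beta` as assumptions.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from raw counts; returns 0.0 when undefined (an assumed convention)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def micro_f_beta(gold, pred, labels, beta=2.0, invert=False):
    """Pool counts over all sentences and the given labels, then compute F-beta.

    gold, pred: lists of dicts mapping label name -> 0 or 1, one dict per sentence.
    invert: swap 0 and 1 first, as described for the Omittable label.
    """
    tp = fp = fn = 0
    for g_row, p_row in zip(gold, pred):
        for label in labels:
            g, p = g_row[label], p_row[label]
            if invert:
                g, p = 1 - g, 1 - p
            tp += int(g == 1 and p == 1)
            fp += int(g == 0 and p == 1)
            fn += int(g == 1 and p == 0)
    return f_beta(tp, fp, fn, beta)


# Two hypothetical sentences with two of the seven labels, for brevity.
gold = [{"measure": 1, "distant": 0}, {"measure": 1, "distant": 1}]
pred = [{"measure": 1, "distant": 1}, {"measure": 0, "distant": 1}]
print(micro_f_beta(gold, pred, labels=["measure", "distant"]))
```

Passing a single label, such as `labels=["measure"]`, gives a per-label score, and `invert=True` with `labels=["omittable"]` mirrors the inversion described for the Inclusion metric.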

3-2. Main task: multi-label document classification for lung cancer staging

3-2-1. Overview

The main task is multi-label document classification: determining the correct T, N, and M categories for each radiology report.

T category - Assessment of the size and/or extension of the primary lesion. Available choices: T0, Tis, T1mi, T1a, T1b, T1c, T2a, T2b, T3, T4
N category - Assessment of the extent of lymph node metastasis. Available choices: N0, N1, N2, N3
M category - Assessment of the extent of distant metastasis. Available choices: M0, M1a, M1b, M1c

In RadNLP 2024, we follow the staging criteria of the Japan Lung Cancer Society (JLCS). Note that the JLCS criteria closely align with the global standard, namely the 8th edition of the TNM Classification of Malignant Tumours by the Union for International Cancer Control (UICC).

3-2-2. Data

The dataset for Main Task consists of the following two files:

Sample data for Main Task is available below:

3-2-3. Evaluation metrics

[UPDATED] We calculate the following evaluation metrics, with the one marked with ★ used to sort the leaderboard:

Joint accuracy (fine) - The proportion of radiology reports with accurate predictions for all the T, N, and M factors.
T accuracy (fine) - The proportion of radiology reports with accurate predictions for the T factor.
N accuracy (fine) - The proportion of radiology reports with accurate predictions for the N factor.
M accuracy (fine) - The proportion of radiology reports with accurate predictions for the M factor.
Joint accuracy (coarse) - Joint accuracy that ignores distinctions between Tis/T1mi/T1a/T1b/T1c, T2a/T2b, and M1a/M1b/M1c.
T accuracy (coarse) - T accuracy that ignores distinctions between Tis/T1mi/T1a/T1b/T1c and T2a/T2b.
N accuracy (coarse) - Identical to N accuracy (fine).
M accuracy (coarse) - M accuracy that ignores distinctions between M1a/M1b/M1c.
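The fine and coarse accuracies above can be sketched as follows. This is an illustration under stated assumptions, not the official scorer: the merged group names ("T1", "T2", "M1"), the tuple layout `(T, N, M)`, and the function names `coarsen` and `accuracies` are hypothetical choices for this example.

```python
# Map fine labels to merged groups. The group names "T1", "T2", "M1" are
# placeholders chosen for this sketch; only the grouping comes from the text above.
COARSE = {
    **{t: "T1" for t in ("Tis", "T1mi", "T1a", "T1b", "T1c")},
    **{t: "T2" for t in ("T2a", "T2b")},
    **{m: "M1" for m in ("M1a", "M1b", "M1c")},
}


def coarsen(label):
    """Collapse a fine label into its coarse group; other labels are unchanged."""
    return COARSE.get(label, label)


def accuracies(gold, pred, coarse=False):
    """Return T, N, M, and joint accuracy over reports.

    gold, pred: equal-length lists of (T, N, M) tuples, one per report.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    norm = coarsen if coarse else (lambda label: label)
    hits = {"T": 0, "N": 0, "M": 0, "joint": 0}
    for g, p in zip(gold, pred):
        match = [norm(a) == norm(b) for a, b in zip(g, p)]
        for key, ok in zip("TNM", match):
            hits[key] += ok
        hits["joint"] += all(match)  # correct only if T, N, and M all match
    return {key: count / len(gold) for key, count in hits.items()}


# Two hypothetical reports.
gold = [("T1a", "N0", "M0"), ("T2b", "N2", "M1b")]
pred = [("T1b", "N0", "M0"), ("T2b", "N1", "M1c")]
print(accuracies(gold, pred))
print(accuracies(gold, pred, coarse=True))
```

Because the N labels have no coarse grouping, `coarsen` leaves them untouched, which matches the note that N accuracy (coarse) is identical to N accuracy (fine).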

Important dates

~~May 29, 2024: Kick-off event~~
~~July 2024: Release of the training and validation datasets~~
~~November 2024 -> January 15, 2025: Registration deadline~~
~~November 2024 -> January 15, 2025: Release of the test dataset~~
~~11:59 PM (UTC), January 31, 2025: Submission deadline of the prediction results~~
~~0:00 AM (UTC), February 1, 2025: Return of scores~~
~~11:59 AM (UTC), March 1, 2025: Submission deadline of the system paper draft~~
11:59 AM (UTC), May 1, 2025: Submission deadline of the camera ready version of the system paper
June 10–13, 2025 (JST): NTCIR-18 conference at the National Institute of Informatics, Tokyo, Japan

How to participate

Registration has been closed.

~~1. Prepare your email address and click the link below:~~

~~2. Read the instructions carefully and open the online registration form.~~

~~3. In the registration form, choose “Yes” in Question 12, and check the track(s) to join in Question 13.~~

* You do not need to decide at the time of registration whether to participate in the sub task, the main task, or both.

FAQ

Q1. How does the sub task relate to the main task?

We expect that the sub task will support the main task.

Our aim is to apply NLP to staging (i.e., the main task), which was also the focus of the previous shared task at NTCIR-17.

However, the results from NTCIR-17 indicated that even the most advanced solutions at that time had potential for improvement.

Therefore, we are providing new sentence-level annotations (i.e., the sub task), in addition to document-level annotations, to enable participants to explore various new approaches.

Q2. Can I join two different teams?

Yes. It is fine for one person to join two or more different teams.

Q3. Can we solve the English track using the dataset provided in the Japanese track, or vice versa?

Yes. It is no problem to use the Japanese track’s dataset to solve the English track, or to use the English track’s dataset to solve the Japanese track.

If you re-use one track’s dataset in the other, please describe the details in your system paper for the sake of reproducibility.

Our policy is that RadNLP 2024 is, rather than a competition, a workshop to welcome participants’ diverse approaches and share the insights widely.

Q4. Can we use external resources?

Yes. We will not pose any specific limits on the use of any external resources, including models, dictionaries, corpora, or datasets.

If you use external resources, please describe the details in your system paper for the sake of reproducibility.

Our policy is that RadNLP 2024 is, rather than a competition, a workshop to welcome participants’ diverse approaches and share the insights widely.

Should you use external resources, please handle them with the utmost care so as not to violate human rights, especially privacy.

Q5. If we join two tracks, must we prepare two system papers and two presentations?

No. We request every team to submit ONE system paper and make ONE presentation, even if you participate in both the English and Japanese tracks.

Q6. If we solve two tasks, must we prepare two system papers and two presentations?

No. We request every team to submit ONE system paper and make ONE presentation, even if you solve both the main and sub tasks.

If you have any other questions, please feel free to contact us.

About us

Organizers

Co-chair

Yuta Nakamura

Department of Computational Diagnostic Radiology and Preventive Medicine, the University of Tokyo Hospital

Co-chair

Shouhei Hanaoka

Department of Radiology, Graduate School of Medicine, the University of Tokyo

Co-chair

Eiji Aramaki

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Co-chair

Shuntaro Yada

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Staff / Co-annotator

Jun Kanzawa

Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo

Staff / Co-annotator

Akira Katayama

Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo

Staff / Co-annotator

Tomohiro Kikuchi

Data Science Center, Jichi Medical University

Staff / Co-annotator

Ryo Kurokawa

Department of Radiology, The University of Tokyo Hospital

Staff / Co-annotator

Wataru Gonoi

Department of Radiology, Graduate School of Medicine, the University of Tokyo

Adviser

Koji Fujimoto

Department of Advanced Imaging in Medical Magnetic Resonance, Graduate School of Medicine, Kyoto University

Adviser

Jonas Kluckert

Institute of Diagnostic and Interventional Radiology, University Hospital Zurich

Adviser

Michael Krauthammer

Department of Quantitative Biomedicine, University of Zurich

Collaborators

Assistant Students

Staff

Kiyoto Hashimoto

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Staff

Peitao Han

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Staff

Yuki Tashiro

Kyushu University

Past shared tasks

RadNLP 2024 has two preceding shared tasks, whose websites are available below:

Contact

E-mail: radnlp [at] googlegroups.com
