RadNLP 2024 shared task: Natural Language Processing for Radiology (NTCIR-18)

RadNLP 2024 (Natural Language Processing for Radiology) is a shared task at NTCIR-18, an international conference organized by the National Institute of Informatics in Japan.

We propose the tasks, publish the datasets, and invite participants to submit solutions. RadNLP 2024 aims to create open medical data and to contribute insights back to the medical and informatics communities.

Aug 28, 2024: Released sample data of the main task & sub task
Jul 8, 2024: Updated FAQ
Jul 3, 2024: Updated FAQ
Jun 21, 2024: Updated the sub task instructions
Jun 19, 2024: Updated staff list
Jun 12, 2024: Published FAQ
Jun 5, 2024: Added the hyperlinks to our past shared tasks
Jun 4, 2024: Announced the sub task
May 29, 2024: Opened the registration period!
Apr 16, 2024: Added descriptions of the English and Japanese tracks
Apr 14, 2024: Added the official X (formerly Twitter) account (@radnlp)
Apr 14, 2024: Renewed the theme color and the key visual
Apr 7, 2024: Launched the English website

Task overview

1. Motivation

RadNLP 2024 aims to automatically determine the stage (i.e., the degree of progression) of lung cancer from radiology reports.

Lung cancer management is based on the stage, and radiology reports, which describe medical images such as CT and MRI scans, provide various information relevant to staging.

However, radiology reports do not always state the stage explicitly¹. Human experts must then extract the relevant information through careful manual reading, an extra workload that automation can help reduce.

¹ Sexauer R et al. Towards more structure: comparing TNM staging completeness and processing time of text-based reports versus fully segmented and annotated PET/CT data of non-small-cell lung cancer. Contrast Media Mol Imaging 2018:5693058.

2. Dataset

All radiology reports in RadNLP 2024 describe lung cancer at its initial evaluation. The datasets contain no post-treatment evaluation reports and no reports of other cancers.

Our datasets contain NO personal health information. The radiology reports do not come from real medical institutions; they were created through crowdsourcing, with contributors diagnosing de-identified images from Radiopaedia². Participants therefore need not go through any complex application process to use our datasets.

² Nakamura Y et al. Clinical Comparable Corpus Describing the Same Subjects with Different Expressions. Stud Health Technol Inform 2022;290:253-257.

3. Tracks and Tasks

RadNLP 2024 offers two independent tracks, the English Track and the Japanese Track:

English Track

The dataset consists of English radiology reports.

Japanese Track

The dataset consists of Japanese radiology reports.

Participants are welcome to join either the English track, the Japanese track, or both. Scoring and ranking will be conducted independently for each track.

RadNLP 2024 also consists of two tasks, the sub task and the main task:

Sub task

Auxiliary task to classify sentences in radiology reports.

Main task

Automated lung cancer staging, the goal of this shared task.

3-1. Sub task: document segmentation

[UPDATED] The sub task is document segmentation, which identifies up to eight spans related to the following topics:

(i) Omittable – Spans that are clearly free of any positive findings or clearly unrelated to lung cancer staging. Here, “clearly” means that the judgment does not require detailed knowledge of the lung cancer staging criteria.

(ii) Measure – Span mainly describing the existence and diameter of the primary lesion.

(iii) Extension – Span describing the range of the primary lesion’s extension outside the lung parenchyma.

(iv) Atelectasis – Span pointing out atelectasis or obstructive pneumonia.

(v) Satellite – Span pointing out intrapulmonary metastasis or lymphangitic carcinomatosis.

(vi) Lymphadenopathy – Span pointing out enlarged regional lymph nodes.

(vii) Pleural – Span pointing out pleural/pericardial effusion/dissemination.

(viii) Distant – Span pointing out distant metastasis outside the lung parenchyma.

In NLP terms, this sub task is multi-label binary classification of sentences.

Segmentation is performed at the sentence level. A topic span may be discontinuous, and the same sentence may belong to more than one topic.
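Because labels are assigned per sentence, a topic span can be recovered by grouping consecutive sentences that share that topic. The sketch below shows this under a hypothetical input format, one set of topic names per sentence in report order, which is our assumption and not the official file format. The function name `topic_spans` is likewise hypothetical. A topic with several runs is discontinuous, and a sentence that appears in the runs of several topics belongs to more than one topic.

```python
from collections import defaultdict

# Hypothetical input: one set of topic names per sentence, in report order.
# The official dataset format is not described here; this is an assumption.
SentenceTopics = list[set[str]]


def topic_spans(labels: SentenceTopics) -> dict[str, list[tuple[int, int]]]:
    """Group sentence indices into contiguous runs for each topic.

    Runs are half-open intervals [start, end) over sentence indices.
    More than one run for a topic means the topic span is discontinuous.
    """
    spans: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for i, topics in enumerate(labels):
        for topic in topics:
            runs = spans[topic]
            if runs and runs[-1][1] == i:
                # Sentence i directly follows the previous run: extend it.
                runs[-1] = (runs[-1][0], i + 1)
            else:
                # First occurrence or a gap: start a new run.
                runs.append((i, i + 1))
    return dict(spans)


# Invented labels for illustration only (not taken from the dataset).
example = [
    {"Omittable"},             # sentence 0
    {"Measure", "Extension"},  # sentence 1 belongs to two topics
    {"Measure"},               # sentence 2
    {"Lymphadenopathy"},       # sentence 3
    {"Measure"},               # sentence 4: Measure resumes after a gap
]
print(topic_spans(example)["Measure"])  # [(1, 3), (4, 5)]
```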

Participants are requested to determine whether each sentence falls into categories (i) to (viii). If it does, mark it as “1,” and if it does not, mark it as “0.” Therefore, each sentence requires eight binary answers.

Note that every sentence will either fall into only category (i) Omittable, or into one or more of categories (ii) Measure to (viii) Distant.
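The eight binary answers and the constraint above can be checked mechanically before submission. The sketch below orders the answers as items (i) to (viii). That column order, and the helper names `encode` and `is_valid`, are assumptions for illustration, not the official submission format. A sentence is valid if only (i) Omittable is 1, or if (i) is 0 and at least one of (ii) to (viii) is 1.

```python
# Answer order follows items (i) to (viii) above; the official column
# order is an assumption here.
TOPICS = (
    "Omittable", "Measure", "Extension", "Atelectasis",
    "Satellite", "Lymphadenopathy", "Pleural", "Distant",
)


def encode(topics: set[str]) -> list[int]:
    """Turn one sentence's topic set into eight binary answers (1 = belongs)."""
    unknown = topics - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if t in topics else 0 for t in TOPICS]


def is_valid(answers: list[int]) -> bool:
    """Either only (i) Omittable is 1, or (i) is 0 and some of (ii)-(viii) is 1."""
    if len(answers) != len(TOPICS) or any(a not in (0, 1) for a in answers):
        return False
    omittable, others = answers[0], answers[1:]
    if omittable == 1:
        return sum(others) == 0
    return sum(others) >= 1


print(encode({"Measure", "Extension"}))     # [0, 1, 1, 0, 0, 0, 0, 0]
print(is_valid([1, 0, 0, 0, 0, 0, 0, 1]))  # False: Omittable must stand alone
```

Note that an all-zero row is also rejected, since every sentence must fall into at least one category.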

The dataset for Sub Task consists of the following two files:

Sample data for Sub Task is available below:

3-2. Main task: multi-label document classification for lung cancer staging

[NEW] The main task is multi-label document classification: determining the correct T, N, and M categories for each radiology report.

T category - Assessment of the size and/or extension of the primary lesion. Available choices: T0, Tis, T1mi, T1a, T1b, T1c, T2a, T2b, T3, T4
N category - Assessment of the extent of lymph node metastasis. Available choices: N0, N1, N2, N3
M category - Assessment of the extent of distant metastasis. Available choices: M0, M1a, M1b, M1c
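The label choices listed above can be encoded directly, which helps catch typos in predictions. The sketch below is a hypothetical helper (`check_prediction` is our name, and the compact "T…N…M…" string it returns is for display only; the official submission format is not described here). It also shows a design consideration: predicting the three categories jointly as one class would mean 10 × 4 × 4 = 160 classes, whereas three separate classifiers handle 10, 4, and 4 choices.

```python
# Choices copied from the T, N, and M lists above.
T_CHOICES = ("T0", "Tis", "T1mi", "T1a", "T1b", "T1c", "T2a", "T2b", "T3", "T4")
N_CHOICES = ("N0", "N1", "N2", "N3")
M_CHOICES = ("M0", "M1a", "M1b", "M1c")


def check_prediction(t: str, n: str, m: str) -> str:
    """Validate one report's T, N, and M prediction; return a display string."""
    for value, choices, name in (
        (t, T_CHOICES, "T"),
        (n, N_CHOICES, "N"),
        (m, M_CHOICES, "M"),
    ):
        if value not in choices:
            raise ValueError(f"{name} category {value!r} is not one of {choices}")
    return f"{t}{n}{m}"


print(check_prediction("T2a", "N1", "M0"))                 # T2aN1M0
print(len(T_CHOICES) * len(N_CHOICES) * len(M_CHOICES))  # 160 joint combinations
```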

In RadNLP 2024, we follow the staging criteria of the Japan Lung Cancer Society (JLCS). Note that the JLCS criteria closely align with the global standard, namely the 8th edition of the TNM Classification of Malignant Tumours by the Union for International Cancer Control (UICC).

The dataset for Main Task consists of the following two files:

Sample data for Main Task is available below:

Important dates

~~May 29, 2024: Kick-off event~~
July 2024: Release of the training and validation datasets
November 2024: Registration deadline
November 2024: Release of the test dataset
January 2025: Submission deadline of the prediction results
February 2025: Return of scores
March 2025: Submission deadline of the system paper draft
May 2025: Submission deadline of the camera-ready version of the system paper
June 10–13, 2025 (JST): NTCIR-18 conference at the National Institute of Informatics, Tokyo, Japan

How to participate

1. Prepare your email address and click the link below:

2. Read the instructions carefully and open the online registration form.

3. In the registration form, choose “Yes” in Question 12, and check the track(s) you wish to join in Question 13.

* You do not need to decide whether to participate in the sub task and/or the main task at the time of registration.

FAQ

Q1. How does the sub task relate to the main task?

We expect that the sub task will support the main task.

Our goal is to apply NLP to staging (i.e., the main task), which was also the focus of the previous shared task at NTCIR-17.

However, the NTCIR-17 results indicated that even the most advanced solutions at that time left room for improvement.

Therefore, we are providing new sentence-level annotations (i.e., the sub task), in addition to document-level annotations, to enable participants to explore various new approaches.
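As one illustration of how sentence-level labels might support the main task, sentences labeled (i) Omittable, which are by definition clearly unrelated to staging or free of positive findings, could be dropped before a report is passed to a staging classifier. The sketch below is a hypothetical pre-processing step (`condense_report` is our name), not a required or recommended design, and the example sentences are invented.

```python
def condense_report(sentences: list[str], omittable: list[int]) -> str:
    """Drop sentences flagged as (i) Omittable and join the rest.

    `omittable` holds one sub task answer per sentence (1 = Omittable).
    The condensed text could then be given to a main task classifier;
    this pipeline is a hypothetical option for illustration only.
    """
    if len(sentences) != len(omittable):
        raise ValueError("need exactly one Omittable answer per sentence")
    kept = [s for s, flag in zip(sentences, omittable) if flag == 0]
    return " ".join(kept)


# Invented example sentences, not taken from the dataset.
report = [
    "A 25 mm solid nodule is seen in the right upper lobe.",
    "The heart is not enlarged.",
    "Enlarged right hilar lymph nodes are noted.",
]
print(condense_report(report, [0, 1, 0]))
```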

Q2. Can I join two different teams?

Yes. It is fine for one person to join two or more different teams.

Q3. Can we solve the English track using the dataset provided in the Japanese track, or vice versa?

Yes. You may use the Japanese track’s dataset to solve the English track, or the English track’s dataset to solve the Japanese track.

If you re-use one track’s dataset in the other, please describe the details in your system paper for the sake of reproducibility.

Our policy is that RadNLP 2024 is not a competition but a workshop that welcomes participants’ diverse approaches and shares the insights widely.

Q4. Can we use external resources?

Yes. We do not impose any specific limits on the use of external resources, including models, dictionaries, corpora, or datasets.

If you use external resources, please describe the details in your system paper for the sake of reproducibility.

Our policy is that RadNLP 2024 is not a competition but a workshop that welcomes participants’ diverse approaches and shares the insights widely.

When handling external resources, please take the utmost care not to violate human rights, especially privacy.

Q5. If we join two tracks, must we prepare two system papers and two presentations?

No. We request every team to submit ONE system paper and make ONE presentation, even if you participate in both the English and Japanese tracks.

Q6. If we solve two tasks, must we prepare two system papers and two presentations?

No. We request every team to submit ONE system paper and make ONE presentation, even if you solve both the main and sub tasks.

If you have any other questions, please feel free to contact us.

About us

Organizers

Co-chair

Yuta Nakamura

Department of Computational Diagnostic Radiology and Preventive Medicine, the University of Tokyo Hospital

Co-chair

Shouhei Hanaoka

Department of Radiology, Graduate School of Medicine, the University of Tokyo

Co-chair

Eiji Aramaki

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Co-chair

Shuntaro Yada

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Staff

Jun Kanzawa

Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo

Staff

Akira Katayama

Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo

Staff

Tomohiro Kikuchi

Data Science Center, Jichi Medical University

Staff

Ryo Kurokawa

Department of Radiology, The University of Tokyo Hospital

Staff

Wataru Gonoi

Department of Radiology, Graduate School of Medicine, the University of Tokyo

Staff

Peitao Han

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Staff

Kiyoto Hashimoto

Social Computing Laboratory, Nara Institute of Science and Technology (NAIST)

Collaborators

Adviser

Koji Fujimoto

Department of Advanced Imaging in Medical Magnetic Resonance, Graduate School of Medicine, Kyoto University

Adviser

Jonas Kluckert

Institute of Diagnostic and Interventional Radiology, University Hospital Zurich

Adviser

Michael Krauthammer

Department of Quantitative Biomedicine, University of Zurich

Past shared tasks

RadNLP 2024 has two preceding shared tasks, whose websites are available below:

Contact

E-mail: radnlp [at] googlegroups.com
