Real-MedNLP Test Collection

Updated: July 27, 2022. Download file: zip

Overview

NTCIR-16 Real-MedNLP (Real document-based Medical Natural Language Processing) is a shared task workshop on medical natural language processing using real medical documents (case reports and radiology reports). The goal of the task is to promote the development of practical systems that support a range of medical services. Real-MedNLP has two corpus-based tracks, the MedTxt-CR Track and the MedTxt-RR Track, each in Japanese and English and each with three subtasks.

For the Real-MedNLP task, we created two corpora, each in Japanese and English: the MedTxt-CR Corpus (100 reports as the training set and 100 reports as the test set) and the MedTxt-RR Corpus (72 texts as the training set and 63 texts as the test set). Each is a cross-lingual corpus consisting of Japanese texts and their English translations. Due to licensing issues, this test collection provides only part of the MedTxt-CR Corpus (the 100 training reports; the test set is not included), together with the whole MedTxt-RR Corpus (72 training texts and 63 test texts).

For more details, please see the reference below and the task web page.

References

Shuntaro Yada, Yuta Nakamura, Shoko Wakamiya, and Eiji Aramaki: Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop, Methods of Information in Medicine (2024) [OPEN ACCESS]

Licence

CC BY 4.0