


About MedNLP-SC

Medical Natural Language Processing for Social media and Clinical texts (MedNLP-SC) is one of the core tasks in NTCIR-17. It addresses medical natural language processing on two kinds of text: clinical texts written by physicians (radiology reports) and social media data. The goal of this shared task is to promote the development of practical systems that support various medical services.

The MedNLP-SC task has two groups of corpus-based subtasks: the Social Media subtask (Adverse Drug Event detection for social media texts in Japanese, English, French, and German) and the Radiology Report subtasks (Named Entity Recognition and TNM staging for radiology reports in Japanese).


Social Media Corpus

We plan to provide a multilingual corpus consisting of 20,000-30,000 artificial tweets with symptom labels in each of Japanese, English, French, and German.

Specifically, we plan to generate up to 68,000 short messages in Japanese using T5, a state-of-the-art pre-trained language model. After manual checking and symptom labeling, we will extract 20,000 tweets and translate them into English, French, and German by machine translation followed by manual checking.

MedTxt-RR Corpus

This dataset comprises 15 cases, each with findings described independently by 9 different radiologists. In total, 135 texts will be made available.

  • Training set: 72 texts
  • Test set: 63 texts
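
The figures above are consistent with a split made by case (72 = 8 cases × 9 radiologists, 63 = 7 cases × 9 radiologists), although the task description does not say how the split was actually made. The sketch below, in Python, shows what such a case-level split could look like; the `(case, radiologist)` identifiers and the `split_by_case` helper are hypothetical and are not part of the released data format.

```python
from itertools import product

N_CASES, N_RADIOLOGISTS = 15, 9  # figures stated in the corpus description

# Hypothetical identifiers: one (case_id, radiologist_id) pair per text.
texts = list(product(range(N_CASES), range(N_RADIOLOGISTS)))


def split_by_case(texts, n_train_cases):
    """Keep every report of a given case on the same side of the split.

    Assumption: the split is made per case, which the source does not confirm.
    """
    train = [t for t in texts if t[0] < n_train_cases]
    test = [t for t in texts if t[0] >= n_train_cases]
    return train, test


train, test = split_by_case(texts, n_train_cases=8)
print(len(texts), len(train), len(test))  # 135 72 63
```

Splitting by case rather than by text would keep reports about the same CT image from appearing in both sets.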

We plan to enlarge the dataset by adding more cancer cases and recruiting more radiologists.

A radiology report is a clinical document written by a radiologist. It typically focuses on a single radiological image and describes all findings, including possible diseases, that can be inferred from that image. Although reports and their target images are paired, most research tends to focus only on the images, driven by the hype surrounding image-based AI (such as automatic diagnosis from X-ray, CT, and MRI). One of the biggest problems in handling radiology reports is the variety of writing styles. A diagnosis can be expressed in many ways (diversity of expression), yet conventionally only one report is created per image. As a result, simply collecting reports from medical institutions may not capture enough of the variability in reporting styles for the same diagnosis. We therefore included independent reports written by multiple doctors for the same CT image.

An example of a radiology report with named entities recognized

Task Overview

Social Media (SM) Subtask
(Languages: Japanese, English, French, and German)

SM-Subtask: Adverse drug event detection (ADE)

The goal of this task is to identify the set of symptoms caused by a drug. Technically, this is a multi-label classification task performed on each tweet.

For example, given the input tweet “After FU5 starts, I am suffering from dry cough and many mouth ulcers,” the expected output labels are “dry cough” and “mouth ulcers.” The task resembles hashtag recommendation, because the symptom labels can be represented as hashtags:

After FU5 starts, I am suffering from dry cough and many mouth ulcers. #dry_cough #mouth_ulcers

This makes the task approachable even for groups outside medical NLP.
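
As a rough illustration of the hashtag view of multi-label output, the Python sketch below renders symptom labels as hashtags and as a multi-hot vector. The helper names and the four-entry label inventory are assumptions made for this example; the actual label set and submission format are defined by the task organizers.

```python
def labels_to_hashtags(labels):
    """Render symptom labels as hashtags, e.g. 'dry cough' -> '#dry_cough'."""
    return " ".join("#" + label.strip().replace(" ", "_") for label in labels)


def to_multi_hot(labels, label_set):
    """Encode the labels of one tweet as a 0/1 vector over a fixed label inventory."""
    present = set(labels)
    return [1 if name in present else 0 for name in label_set]


# Hypothetical label inventory, for illustration only.
LABEL_SET = ["dry cough", "fever", "mouth ulcers", "nausea"]

tweet = "After FU5 starts, I am suffering from dry cough and many mouth ulcers."
gold = ["dry cough", "mouth ulcers"]

print(tweet, labels_to_hashtags(gold))
print(to_multi_hot(gold, LABEL_SET))  # [1, 0, 1, 0]
```

The multi-hot form is the usual target for a multi-label classifier, while the hashtag form is convenient for reading predictions next to the tweet.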

Radiology Report (RR) Subtasks
(Language: Japanese)

RR-Subtask (a): Named Entity Recognition (NER)

Because NER is the most fundamental information extraction task in MedNLP, we designed an NER challenge on our real clinical reports, a corpus of only 100-200 reports. A corpus of this size is usually regarded as “few-resource machine learning,” which is the de facto standard setting across MedNLP in general.

This task is planned as a guideline learning task, an original task proposed and evaluated in Real-MedNLP at NTCIR-16. In guideline learning, we give only a handful of example sentences for each tag. This simulates the training of human annotators, who often learn from annotation guidelines provided by researchers. Participants may use any resources outside this task that they find useful for their methods. Participants are asked to output reports with IOB (inside, outside, beginning) tags.

<article id="JP0217-29" title="著明な好酸球増多を伴った非昏睡型急性肝不全の一例"> 
Case Study: <TIMEX3 type="AGE">53 year old</TIMEX3> female patient.
Chief Complaint: <d certainty="positive">Fever</d>.
Progress: Patient was <cc state="executed">seen</cc> at the dermatology department of our hospital <TIMEX3 type="DATE">2 years before 20XX</TIMEX3> and presented a <d certainty="positive">skin rash</d> that the diagnosis identified as <d certainty="positive">bullous pemphigoid</d>.
<m-key state="executed">Prednisolone (PSL)</m-key> <m-val>1 mg/kg/day</m-val> was introduced and the patient <TIMEX3 type="TIME">was</TIMEX3> managed with concomitant <m-key state="executed">immunomodulators</m-key> with <c>a gradual decrease</c> in the <m-key state="executed">PSL</m-key> level.
<m-key state="negated">PSL</m-key> was voluntarily discontinued in <TIMEX3 type="DATE">August, 20XX</TIMEX3> when the <m-key state="executed">PSL</m-key> dosage had been <c>reduced</c> to <m-val>6 mg/day</m-val>, but there was no <d certainty="negative">worsening of the skin rash</d>.
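
The example above marks entities with inline XML-style tags, whereas submissions are expected to carry IOB tags. The following Python sketch shows one possible way to derive IOB labels from flat inline tags. It is only an illustration under simplifying assumptions: tags are not nested, tag attributes such as `certainty` are dropped, and tokens are split on whitespace (Japanese text would need a tokenizer or character-level labels). The function name `to_iob` is hypothetical and the official output format may differ.

```python
import re

# Matches one flat (non-nested) inline element such as <d certainty="positive">Fever</d>.
TAG_RE = re.compile(r"<(?P<name>[\w-]+)(?:\s[^>]*)?>(?P<body>.*?)</(?P=name)>", re.S)


def to_iob(tagged):
    """Convert text with flat inline tags into (token, IOB label) pairs."""
    pairs = []
    pos = 0
    for match in TAG_RE.finditer(tagged):
        # Text between entities is outside any entity.
        pairs += [(tok, "O") for tok in tagged[pos:match.start()].split()]
        name = match.group("name")
        for i, tok in enumerate(match.group("body").split()):
            # First token of an entity gets B-, the rest get I-.
            pairs.append((tok, ("B-" if i == 0 else "I-") + name))
        pos = match.end()
    pairs += [(tok, "O") for tok in tagged[pos:].split()]
    return pairs


sample = ('<m-key state="executed">Prednisolone (PSL)</m-key> '
          '<m-val>1 mg/kg/day</m-val> was introduced')
for token, label in to_iob(sample):
    print(token, label)
# Prednisolone B-m-key
# (PSL) I-m-key
# 1 B-m-val
# mg/kg/day I-m-val
# was O
# introduced O
```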

RR-Subtask (b): TNM staging

This is an original MedNLP task that evaluates how well NLP models generalize when classifying radiology reports under multiple criteria. Focusing on cancer staging, participants are required to assign four labels to each report according to predefined criteria: tumor (T), lymph node (N), metastasis (M), and clinical stage (cStage). Early cancers are labeled with small numbers, such as T1N0M0, cStage 1, and advanced cancers with large numbers, such as T3N2M1, cStage 4. Cancer staging is essential for treatment and research, but it requires complex reasoning. We aim to explore the capability of NLP for automated cancer staging to aid clinical professionals.

What makes this task unique is that systems must be robust to revisions of the cancer staging criteria. The criteria are updated every few years to reflect medical advances. Existing T, N, M, and cStage labels then have to be reassigned under the new criteria, which is called “stage migration.” This affects clinical research because stage migration must be applied to large patient databases, which is laborious if performed manually. This task examines whether NLP systems can continue to provide useful automation as the criteria change.
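
To make the four-label output concrete, here is a minimal Python sketch of a container for one report's staging labels, parsed from the compact notation used in the examples above. The `Staging` class and `parse_staging` function are hypothetical names introduced for illustration. The pattern accepts only single-digit categories such as T1N0M0; real staging notation also includes forms this sketch does not handle (for example subcategories such as T1a), and the official label format may differ.

```python
import re
from dataclasses import dataclass

# Assumption: labels are written as single digits, e.g. "T1N0M0".
TNM_RE = re.compile(r"T(\d)N(\d)M(\d)")


@dataclass(frozen=True)
class Staging:
    """The four labels assigned to one radiology report."""
    t: int
    n: int
    m: int
    c_stage: int


def parse_staging(tnm, c_stage):
    """Parse a compact TNM string plus a clinical stage into a Staging record."""
    match = TNM_RE.fullmatch(tnm)
    if match is None:
        raise ValueError(f"unsupported TNM string: {tnm!r}")
    t, n, m = (int(group) for group in match.groups())
    return Staging(t, n, m, c_stage)


early = parse_staging("T1N0M0", 1)
advanced = parse_staging("T3N2M1", 4)
print(early)     # Staging(t=1, n=0, m=0, c_stage=1)
print(advanced)  # Staging(t=3, n=2, m=1, c_stage=4)
```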

We plan to simulate criteria revision by providing multiple sets of cancer staging criteria to the participants. Participants will be asked to output cancer stages for each of the criteria. 
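
The idea of producing one output per criteria set can be sketched as follows in Python. The two rule sets are toy rules invented for this example: they are not real staging criteria, carry no clinical meaning, and are not the criteria that will be distributed to participants. They only illustrate how the same T, N, and M values can map to a different cStage after a revision. In the actual task, a revision may also change how T, N, and M themselves are assigned from the report text.

```python
def toy_criteria_v1(t, n, m):
    """Toy rule set (NOT real clinical criteria)."""
    if m >= 1:
        return 4
    if n >= 2:
        return 3
    if n == 1 or t >= 3:
        return 2
    return 1


def toy_criteria_v2(t, n, m):
    """Toy revision (NOT real clinical criteria): any nodal involvement is stage 3."""
    if m >= 1:
        return 4
    if n >= 1:
        return 3
    if t >= 3:
        return 2
    return 1


# Hypothetical registry of criteria sets, keyed by an invented version name.
CRITERIA = {"toy-v1": toy_criteria_v1, "toy-v2": toy_criteria_v2}


def stage_under_all(t, n, m):
    """Return the clinical stage of one case under every criteria set."""
    return {name: rule(t, n, m) for name, rule in CRITERIA.items()}


print(stage_under_all(1, 0, 0))  # {'toy-v1': 1, 'toy-v2': 1}
print(stage_under_all(2, 1, 0))  # {'toy-v1': 2, 'toy-v2': 3}
print(stage_under_all(3, 2, 1))  # {'toy-v1': 4, 'toy-v2': 4}
```

The second case is the interesting one: its stage changes between the two toy rule sets, which is the kind of reassignment that stage migration requires.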


Schedule

  • March 2023: Dataset Release
  • (March-June 2023: Dry Run)
  • June 1, 2023: Registration Deadline
  • July 2023: Formal Run
  • August 1, 2023: Evaluation Result Release
  • August 1, 2023: Draft task overview paper release
  • September 1, 2023: Draft participant paper submission due
  • November 1, 2023: Camera-ready participant paper due
  • December 2023: NTCIR-17 Conference at NII, Tokyo


Co-chair (general)

Eiji Aramaki, Ph.D. (Nara Institute of Science and Technology, Japan)

Co-chair (general)

Shoko Wakamiya, Ph.D. (Nara Institute of Science and Technology, Japan)

Co-chair (Social Media Subtask)

Shuntaro Yada, Ph.D. (Nara Institute of Science and Technology, Japan)

Co-chair (Radiology Report Subtasks)

Yuta Nakamura, M.D. (The University of Tokyo, Japan)

Akiko Aizawa, Ph.D. (NII, Japan)
Gabriel Herman Bernardim Andrade (Nara Institute of Science and Technology, Japan)
Cyril Grouin, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Shouhei Hanaoka, M.D., Ph.D. (The University of Tokyo, Japan)
Thomas Lavergne, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Yuji Matsumoto, Ph.D. (RIKEN, Japan)
Faith Wavinya Mutinda (Nara Institute of Science and Technology, Japan)
Aurélie Névéol, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Noriaki Nishida, Ph.D. (RIKEN, Japan)
Tomohiro Nishiyama (Nara Institute of Science and Technology, Japan)
Patrick Paroubek, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Lisa Raithel (DFKI, Germany)
Roland Roller, Ph.D. (DFKI, Germany)
Hiroki Teranishi, Ph.D. (RIKEN, Japan)
Philippe Thomas, Ph.D. (DFKI, Germany)
Narumi Tokunaga (RIKEN, Japan)
Hui-Syuan Yeh (Université Paris-Saclay, CNRS, LISN, France)
Pierre Zweigenbaum, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)


Social Computing Laboratory, NAIST
Y’s READING inc.