


  • March 23, 2023: The schedule is updated.
  • March 10, 2023: Sample data of Social Media Corpus are available.

About MedNLP-SC

Medical Natural Language Processing for Social media and Clinical texts (MedNLP-SC) is one of the core tasks in NTCIR-17 for medical natural language processing using clinical texts written by physicians (radiology reports) or social media data. The goal of this shared task is to promote the development of practical systems that support various medical services.

The MedNLP-SC task has two corpus-based subtasks: Social media subtask (Adverse Drug Event detection for social media texts in Japanese, English, French, and German) and Radiology report subtask (TNM staging for radiology reports in Japanese).


Social Media Corpus

The data we provide for the challenge are artificially created tweets in several languages. We first generated Japanese tweets using a T5 model [1] fine-tuned on original Japanese Twitter messages. The generated tweets were then manually classified into two classes: those containing Adverse Drug Events (ADEs) and those not containing ADEs. Finally, our annotators labeled the symptoms occurring in the tweets that contain ADEs. The most frequent symptoms are listed below; all other symptoms are collected under the class “Other”. After annotation, the tweets were translated into English, German, and French using DeepL. The labels are the same for each language.

For each language, we provide 10,000 tweets, divided into 80% training and 20% test set. All drugs are represented in both sets except for one drug, which is only present in the test set. This is supposed to simulate the release of a new drug to the public.
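The split described above can be sketched as follows. This is a minimal illustration, not the organizers' actual procedure, which is not described here: the record layout `(drug, text)`, the function name `split_with_unseen_drug`, and the toy drug names are all assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (drug name, tweet text).
Tweet = Tuple[str, str]


def split_with_unseen_drug(
    tweets: List[Tweet],
    held_out_drug: str,
    test_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Tweet], List[Tweet]]:
    """Split tweets so that one drug appears only in the test set."""
    rng = random.Random(seed)
    held_out = [t for t in tweets if t[0] == held_out_drug]
    rest = [t for t in tweets if t[0] != held_out_drug]
    rng.shuffle(rest)

    # Fill the test set up to the requested ratio with tweets on other drugs.
    n_test = round(len(tweets) * test_ratio)
    n_from_rest = max(n_test - len(held_out), 0)
    test = held_out + rest[:n_from_rest]
    train = rest[n_from_rest:]
    return train, test


# Toy data: drugs "A" and "B" are seen in training, "C" is the "new" drug.
toy = [("A", f"tweet {i}") for i in range(9)]
toy += [("B", f"tweet {i}") for i in range(9)]
toy += [("C", f"tweet {i}") for i in range(2)]

train, test = split_with_unseen_drug(toy, held_out_drug="C")
print(len(train), len(test))
```

The random fill does not by itself guarantee that every remaining drug appears in both sets; a stratified split per drug would be needed to enforce that.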

[1] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1, Article 140 (January 2020), 67 pages.

JA アザチオプリンを服用して2ヶ月経ちました。1週間くらいで全身の発疹はなくなり、かゆみもほぼ無くなっていたのですが、麻疹が少し残ってて怖かったなぁと思います。
EN I’ve been on Azathioprine for 2 months now, and after about a week the rash all over my body was gone and the itching was almost gone, but I still had a bit of measles and I think it was scary.
DE Ich nehme jetzt seit zwei Monaten Azathioprin, und nach etwa einer Woche war der Ausschlag am ganzen Körper verschwunden und der Juckreiz fast weg, aber ich hatte immer noch ein bisschen Masern, und ich glaube, das war beängstigend.
FR Je prends de l’azathioprine depuis deux mois maintenant, et après environ une semaine, l’éruption cutanée sur tout mon corps avait disparu et les démangeaisons avaient presque disparu, mais j’avais encore un peu de rougeole et je pense que c’était effrayant.
Positive for “Rash” and “Other”

JA アザチオプリン(イムラン)の副作用で脱毛がひどい。#潰瘍性大腸炎 <url>
EN Severe hair loss due to azathioprine (Imuran) side effects. #Ulcerative colitis <url>
DE Azathioprin (Imuran) Nebenwirkungen von schwerem Haarausfall. #Colitis ulcerosa <url>.
FR Effets secondaires de l’azathioprine (Imuran) sur la perte sévère de cheveux. #Colite ulcéreuse <url>.
Positive for “Alopecia”

JA <user_name> で、アザチオプリンの血中濃度を調べてきました。やはりステロイド性肝障害が関係してるのかも?血液検査では炎症反応は上がっていたのですが、脱毛症状や発熱には至ってないようです。
EN <user_name> So I’ve been checking blood levels of azathioprine. Could it still be related to steroid-induced liver damage? The blood test showed an elevated inflammatory response, but it did not seem to lead to hair loss symptoms or fever.
DE <user_name> Also, ich habe die Blutwerte von Azathioprin überprüft. Könnte es immer noch mit einem steroidbedingten Leberschaden zusammenhängen? Der Bluttest zeigte eine erhöhte Entzündungsreaktion, aber es schien nicht zu Haarausfall-Symptomen oder Fieber zu führen.
FR <user_name> Donc, j’ai vérifié les niveaux sanguins d’azathioprine. Pourrait-il encore être lié à des lésions hépatiques induites par les stéroïdes ? L’analyse de sang a montré une réponse inflammatoire élevée, mais elle ne semble pas entraîner de symptômes de perte de cheveux ni de fièvre.

Positive for “Liver damage”

MedTxt-RR Corpus

This dataset comprises 15 cases, with 9 different radiologists each describing the findings for every case. In total, 135 texts (15 × 9) will be made available.

  • Training set: 72 texts
  • Test set: 63 texts

We plan to make the dataset larger by recruiting additional cancer cases and radiologists.

A radiology report is a type of clinical document written by a radiologist. A report typically focuses on a single radiology image and describes all potential findings (including potential diseases) that can be expected from the image. While reports and target images are paired, most research on radiology reports tends to focus only on images, due to the hype surrounding image-based AI (such as automatic diagnosis of X-rays, CT, and MRI). One of the biggest problems when handling radiology reports is the variety of writing styles. Although a diagnosis can be written in a variety of ways (diversity of expression), conventionally, only one report is created per image. As such, simply collecting reports from medical institutions may not yield enough information on the variability in reporting styles for the same diagnosis. Consequently, we included independent reports from multiple doctors for the same CT image.

An example of a radiology report with named entities recognized

Task Overview

Social Media (SM) Subtask: Adverse drug event detection (ADE)
(Languages: Japanese, English, French, and German)

Task Definition: The participants can choose between the four different tracks Japanese, English, German and French. For each language, a separate system has to be submitted (it is also possible to submit the same system for several language tracks). 

The task itself is divided into two parts:

1. Classification of messages into “contains ADE” vs. “does not contain ADE”
2. Multi-labeling of symptoms for all documents containing ADEs

Input: tweet
Output: symptom labels (either positive (1) or negative (0))

Evaluation: For evaluation, the data was split into 80% training samples and 20% test samples. Since there are 23 + 1 classes an ADE-positive document can belong to, finding exact matches can be difficult. Therefore, we evaluate the system predictions in three ways:

  • All symptoms: Check whether all labels exactly match with the gold standard labels.
  • Each symptom: Check whether a single label matches with the corresponding gold standard label.
  • 2-WAY: Check whether the input contains at least one ADE or not.

ID | text | Headache | ... | Other
1 | I have an Aspirin-induced headache. | 1 | 0 | 0
2 | I have a headache which I am treating with Aspirin. | 0 | 0 | 0
3 | I don't have an Aspirin-induced headache. | 0 | 0 | 0
4 | I have a headache. | 0 | 0 | 0
5 | I found an article on Aspirin-induced headache. | 0 | 0 | 0

Examples: The examples above represent simplified tweets. Only example ID1 gets a “positive (1)” label, for the symptom “Headache”, since it describes an adverse event (an Aspirin-induced headache). Examples ID2-ID5 do not describe adverse events, and therefore all of their labels are set to “negative (0)”.
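The three evaluation views can be sketched in a few lines. This is a minimal sketch under assumptions: the page does not state which score (accuracy, F1, or another) is reported for each view, so plain agreement rates are used here, and the function names and the two-column label layout (“Headache”, “Other”) are hypothetical simplifications of the full label set.

```python
from typing import Dict, List


def has_ade(labels: List[int]) -> bool:
    """2-WAY view: a tweet is ADE-positive if any symptom label is 1."""
    return any(value == 1 for value in labels)


def evaluate(gold: List[List[int]], pred: List[List[int]]) -> Dict[str, float]:
    """Return agreement rates under the three evaluation views."""
    n_docs = len(gold)
    n_labels = len(gold[0])
    # All symptoms: the whole label vector must match the gold vector.
    exact = sum(g == p for g, p in zip(gold, pred))
    # Each symptom: every label is compared on its own.
    single = sum(gl == pl for g, p in zip(gold, pred) for gl, pl in zip(g, p))
    # 2-WAY: only the presence or absence of any ADE is compared.
    two_way = sum(has_ade(g) == has_ade(p) for g, p in zip(gold, pred))
    return {
        "all_symptoms": exact / n_docs,
        "each_symptom": single / (n_docs * n_labels),
        "two_way": two_way / n_docs,
    }


# Gold labels for ID1-ID5, reduced to the columns [Headache, Other].
gold = [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
# Made-up predictions: ID1 gets the wrong symptom, ID2 is a false positive.
pred = [[0, 1], [1, 0], [0, 0], [0, 0], [0, 0]]

scores = evaluate(gold, pred)
print(scores)
```

With these made-up predictions the rates are 0.6 (all symptoms), 0.7 (each symptom), and 0.8 (2-WAY). ID1 is wrong under the two stricter views but still counts as correct under 2-WAY, because an ADE was detected.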

Radiology Report (RR) Subtask: TNM staging
(Language: Japanese)

This is a MedNLP original task to evaluate the generalization ability of NLP models when classifying radiology reports under multiple criteria. Focusing on cancer staging, task participants are required to assign four labels, tumor (T), lymph node (N), metastasis (M), and clinical stage (cStage) for each report according to predefined criteria. Early cancers will be labeled with small numbers such as T1N0M0, cStage 1, and advanced cancers will be labeled with large numbers such as T3N2M1, cStage 4. Cancer staging is essential for treatment and research, but it requires complex reasoning. We aim to explore the capability of NLP for automated cancer staging to aid clinical professionals.
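For illustration, labels written like the examples above can be split into their T, N, and M components. This is a minimal sketch: the `TNM` type and `parse_tnm` helper are hypothetical, the submission format of the task is not specified here, and the pattern only accepts single-digit categories as in “T1N0M0”. Real staging criteria also use subcategories such as “T1a”, which this sketch deliberately rejects.

```python
import re
from typing import NamedTuple


class TNM(NamedTuple):
    """Hypothetical container for the three TNM components."""
    t: int
    n: int
    m: int


# Accepts only plain labels such as "T1N0M0" (one digit per component).
_TNM_PATTERN = re.compile(r"T(\d)N(\d)M(\d)")


def parse_tnm(label: str) -> TNM:
    """Parse a plain TNM label into integer components."""
    match = _TNM_PATTERN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a plain TNM label: {label!r}")
    t, n, m = (int(group) for group in match.groups())
    return TNM(t, n, m)


early = parse_tnm("T1N0M0")
advanced = parse_tnm("T3N2M1")
print(early, advanced)
```

Keeping the components as integers makes it easy to compare predictions component by component.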

The uniqueness of this task lies in the requirement that systems be robust to revisions of the cancer staging criteria. The criteria are updated every few years to reflect medical advancements. Existing T, N, M, and cStage labels then have to be reassigned under the new criteria, which is called “stage migration.” This impacts clinical research because stage migration must be applied to large patient databases, which is laborious if performed manually. This task examines whether NLP systems can continue to provide useful automation after such revisions.

We plan to simulate criteria revision by providing multiple sets of cancer staging criteria to the participants. Participants will be asked to output cancer stages for each of the criteria. 


Schedule

  • March 2023: Dataset release
  • (March-June 2023: Dry run)
  • June 1, 2023: Registration deadline
  • June 24, 2023: Training data (final version) release
  • July 10, 2023: Test data release
  • July 17, 2023: Result submission
  • August 1, 2023: Evaluation result release
  • August 1, 2023: Task overview paper release (draft)
  • September 1, 2023: Participant paper submission due (draft)
  • November 1, 2023: Camera-ready participant paper due
  • December 12-15, 2023: NTCIR-17 Conference at NII, Tokyo (Online presentation will be available)



Co-chair (general)

Eiji Aramaki, Ph.D. (NAIST, Japan)

Co-chair (general)

Shoko Wakamiya, Ph.D. (NAIST, Japan)

Co-chair (SM Subtask)

Shuntaro Yada, Ph.D. (NAIST, Japan)

Co-chair (RR Subtask)

Yuta Nakamura, M.D. (The University of Tokyo, Japan)

Gabriel Herman Bernardim Andrade (NAIST, Japan)

Faith Wavinya Mutinda, Ph.D. (NAIST, Japan)

Tomohiro Nishiyama (NAIST, Japan)

Lisa Raithel (DFKI, Germany, TU Berlin, Germany, and Université Paris-Saclay, CNRS, LISN, France)

Roland Roller, Ph.D. (DFKI, Germany)

Philippe Thomas, Ph.D. (DFKI, Germany)

Cyril Grouin, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)

Thomas Lavergne, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)

Aurélie Névéol, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)

Patrick Paroubek, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)

Hui-Syuan Yeh (Université Paris-Saclay, CNRS, LISN, France)

Pierre Zweigenbaum, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)

Akiko Aizawa, Ph.D. (NII, Japan)

Shouhei Hanaoka, M.D., Ph.D. (The University of Tokyo, Japan)

Yuji Matsumoto, Ph.D. (RIKEN, Japan)

Noriki Nishida, Ph.D. (RIKEN, Japan)

Hiroki Teranishi, Ph.D. (RIKEN, Japan)

Narumi Tokunaga (RIKEN, Japan)

