Hyakuyaku Dictionary

A Large-scale Drug Name Dictionary Used to Analyze Drug Names in Real Clinical Settings

Extracting disease names and drug names from medical documents is a frequent task in medical language processing research. In actual medical practice, however, abbreviations and English names are often used instead of the generic names of drugs, so standard names alone cannot capture all drug-related information. We have therefore created data that can extract a wider range of pharmaceutical terms, and named it “Hyakuyaku Dictionary”. This page provides download files for Hyakuyaku Dictionary and related data. Please use it freely.

Download

Hyakuyaku Dictionary Data

  • Coming soon

Dictionary Data for MeCab

This is the Hyakuyaku Dictionary data, converted into a format that can be used together with the morphological analyzer MeCab.

  • Coming soon

Specifications

Excerpt from Hyakuyaku Dictionary

Contents of Hyakuyaku Dictionary

Column Name | Description
出現形 (Occurrence form) | Drug names extracted from the medical documents collected by our laboratory (full-width characters)
出現形よみ (Occurrence form reading) | Reading for the occurrence form
一般名 (Common name) | Common name for the drug corresponding to the occurrence form (full-width characters)
KEGG文書ID (KEGG Document ID) | ID in the KEGG (Kyoto Encyclopedia of Genes and Genomes) DRUG database
頻度レベル (Frequency level) | One of 20 levels, defined at 5% intervals, denoting how frequently the given occurrence form appears in the medical documents collected by our laboratory
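
As an illustration of the column layout above, the following is a minimal Python sketch of how one dictionary row could be represented and indexed. It is not the official loader: the class name `HyakuyakuEntry`, the English field names, and both helper functions are hypothetical. In particular, the sketch assumes that the frequency level is an integer from 1 to 20 and that level k covers the band from 5(k-1)% to 5k%; the source only states that there are 20 levels at 5% intervals, so check the released data before relying on this mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyakuyakuEntry:
    """One dictionary row (hypothetical field names for the five columns)."""
    occurrence_form: str          # 出現形
    occurrence_form_reading: str  # 出現形よみ
    common_name: str              # 一般名
    kegg_id: str                  # KEGG文書ID
    frequency_level: int          # 頻度レベル (assumed here to be 1..20)


def level_to_percent_range(level):
    """Return the (lower, upper) percent band for a level.

    Assumption: level k covers 5*(k-1)% to 5*k%. The numbering and
    direction of the real levels are not stated in the source.
    """
    if not 1 <= level <= 20:
        raise ValueError("level must be between 1 and 20")
    return (5 * (level - 1), 5 * level)


def index_by_occurrence_form(entries):
    """Group entries by occurrence form so a surface string can be looked up."""
    index = {}
    for entry in entries:
        index.setdefault(entry.occurrence_form, []).append(entry)
    return index
```

With such an index, an occurrence form found in a clinical text can be mapped to its common name and KEGG ID with a single dictionary lookup.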

How to use the dictionary for MeCab

1. Install the morphological analyzer MeCab (Reference: MeCab official site, our guide for installation and use)

  • Install MeCab on your device.
    On Windows, make sure to select ‘SHIFT-JIS’ during installation.
  • Add the MeCab installation directory to the PATH environment variable.
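
To confirm that step 1 worked, you can check whether the `mecab` executable is reachable through PATH. The following is a small Python sketch using the standard library's `shutil.which`; the function name `executable_on_path` is hypothetical and chosen only for this illustration.

```python
import shutil


def executable_on_path(name="mecab"):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)

# Usage: executable_on_path() returns None if MeCab is not yet on PATH.
```

If the function returns None, review the environment variable setting and open a new Command Prompt window so that the change takes effect.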

2. Load the dictionary data

Place the dictionary data (.dic file) in an easy-to-reach location, for example directly under the C drive.

3. Morphological analysis using Hyakuyaku Dictionary

In Command Prompt, move to the folder containing the dictionary data and run the command “mecab -u [Dic_file]”. This starts MeCab with the specified file loaded as a user dictionary. If you then input text containing drug names, you get the morphological analysis of the text together with the related information from the Hyakuyaku Dictionary.

For example, if you load the dictionary data (run the command “mecab -u HYAKUYAKU-utf8_v202007.dic”) and analyze the text “ステロイドの処方 (Prescription of steroids)”, the information from the Hyakuyaku Dictionary is added to the analysis result for “ステロイド (steroids)”.
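
The same command can also be driven from a script. The following is a minimal Python sketch, assuming that `mecab` is on PATH and that MeCab prints its default output format (one line per token with the surface form, a tab, and comma-separated features, followed by an `EOS` line). The function names `build_mecab_command`, `parse_mecab_output`, and `analyze` are hypothetical, and the meaning of each feature field depends on the dictionary, so the sketch returns the features as a plain list without interpreting them.

```python
import subprocess


def build_mecab_command(user_dic_path):
    """Return the argument list equivalent to `mecab -u <user_dic_path>`."""
    return ["mecab", "-u", user_dic_path]


def parse_mecab_output(output):
    """Parse MeCab's default output into (surface, features) pairs.

    Features are split naively on commas; quoted commas inside a
    feature value are not handled in this sketch.
    """
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature_str = line.partition("\t")
        tokens.append((surface, feature_str.split(",")))
    return tokens


def analyze(text, user_dic_path, encoding="utf-8"):
    """Run MeCab on `text` with the given user dictionary and parse the result.

    Pass encoding="cp932" if your MeCab installation uses Shift-JIS.
    """
    result = subprocess.run(
        build_mecab_command(user_dic_path),
        input=text + "\n",
        capture_output=True,
        encoding=encoding,
        check=True,
    )
    return parse_mecab_output(result.stdout)

# Usage (requires MeCab and the dictionary file to be present):
# analyze("ステロイドの処方", "HYAKUYAKU-utf8_v202007.dic")
```

The encoding passed to `analyze` has to match the character set of both MeCab's system dictionary and the user dictionary; a mismatch typically shows up as garbled output.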

Privacy Policy

This is the privacy policy for the deliverables produced and published by Social Computing Laboratory, Nara Institute of Science and Technology (hereafter referred to as “our laboratory”). Please read and understand it before using the deliverables.

Disclaimer

The deliverables have been developed with the utmost care and attention to detail. However, complete reliability and robustness cannot be guaranteed. We accept no responsibility for any problems that may arise from using this application or data. Please use it at your own risk.