A Large-scale Drug Name Dictionary Used to Analyze Drug Names in Real Clinical Settings
Extracting disease names and drug names from medical documents is a common task in medical language processing research. In actual medical practice, however, abbreviations and English names are often used instead of the drugs' generic names, so standard names alone cannot capture all information on drugs. We have therefore created data for extracting a wider range of terms related to pharmaceuticals, and named it the “Hyakuyaku Dictionary”. This page provides download files for the Hyakuyaku Dictionary and related data. Please use it freely.
Specifications
Excerpt from Hyakuyaku Dictionary
Contents of Hyakuyaku Dictionary
Column Name | Description |
---|---|
出現形 (Occurrence form) | Drug names extracted from the medical documents collected by our laboratory (full-width characters) |
出現形よみ (Occurrence form reading) | Reading for the occurrence form (full-width characters) |
一般名 (Common name) | Common (generic) name of the drug corresponding to the occurrence form, based on the KEGG DRUG database and other sources (full-width characters) |
信頼度レベル (Confidence level) | Confidence in the coding result for the occurrence form |
頻度レベル (Frequency level) | One of 20 levels, defined at 5% intervals, indicating how frequently the occurrence form appears in the medical documents collected by our laboratory |
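The column layout above can be used directly for simple lookups from an occurrence form to its common name. The following is a minimal sketch, assuming the dictionary is available as comma-separated text with a header row that uses the Japanese column names; the actual file format may differ. The sample row, including the placeholder values `X` and `Y` for the two level columns, is invented for illustration and is not taken from the dictionary, and the helper names `load_entries` and `common_names` are hypothetical.

```python
import csv
import io

# Hypothetical sample in CSV form. This row is NOT taken from the real
# dictionary; "X" and "Y" are placeholders for the two level columns.
SAMPLE = """出現形,出現形よみ,一般名,信頼度レベル,頻度レベル
ロキソニン,ろきそにん,ロキソプロフェンナトリウム水和物,X,Y
"""


def load_entries(fileobj):
    """Group dictionary rows by occurrence form (出現形)."""
    lookup = {}
    for row in csv.DictReader(fileobj):
        # One occurrence form may map to several rows, so keep a list.
        lookup.setdefault(row["出現形"], []).append(row)
    return lookup


def common_names(lookup, surface):
    """Return the common names (一般名) recorded for an occurrence form."""
    return [row["一般名"] for row in lookup.get(surface, [])]


lookup = load_entries(io.StringIO(SAMPLE))
print(common_names(lookup, "ロキソニン"))
```

To read a real file instead of the in-memory sample, pass an open file object to `load_entries`, opened with whatever encoding the downloaded file uses.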
How to use dictionary for MeCab
1. Install the morphological analyzer MeCab (Reference: MeCab official site, Our guide for installation and use)
- Install MeCab on your device. On Windows, make sure to select ‘SHIFT-JIS’ as the character encoding during installation.
- Add MeCab's installation directory to the PATH environment variable.
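Once the path has been set, one quick way to confirm that the system can find MeCab is to look up the executable from a script. The following is a minimal sketch, assuming the executable is named `mecab` (on Windows, `shutil.which` also resolves `mecab.exe`); the helper names `find_mecab` and `mecab_version` are hypothetical.

```python
import shutil
import subprocess


def find_mecab(executable="mecab"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(executable)


def mecab_version(path):
    """Run `mecab --version` and return whatever it prints."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    path = find_mecab()
    if path is None:
        print("MeCab was not found on PATH; check the environment variable.")
    else:
        print("Found MeCab at:", path)
        print(mecab_version(path))
```

If the script reports that MeCab was not found, open a new terminal so that the updated environment variable is picked up, then run it again.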