The workshop launched a project to compile a Japanese medical terminology dictionary that facilitates natural language processing (NLP) in clinical settings. The workshop ran for 17 months and was characterized by a high degree of professionalism. Our team brought together a passionate group of researchers with backgrounds in either medicine or engineering. We collected approximately 1 million Japanese medical terms from 22 existing resources. We also integrated the hierarchical structures of the collected terms across heterogeneous resources, such as synonyms, categories, hypernyms, and hyponyms, which are vital in clinical NLP. To our knowledge, the dictionary we compiled is the first work to realize centralized management of a massive collection of Japanese medical terms for NLP.
More and more hospitals have moved from paper-based information management to electronic health record (EHR) systems, which has enabled the retrieval of massive amounts of clinical data. Clinical data is represented in both structured and unstructured forms. Structured data typically encodes demographics, lab values, medication lists, etc., and is easy to search and analyze. However, much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) is not provided in structured data fields, but is instead written in clinician-generated narrative text. NLP provides a way to access this important data source, and morphological analysis for word segmentation is the first and most crucial step in Japanese NLP. It is widely accepted that using a dictionary can improve the performance of a morphological analyzer. Several Japanese medical glossaries have been compiled by different institutions for different purposes. However, these glossaries were not developed for NLP, and the existing resources are not managed centrally. Given this background, we considered it necessary to integrate medical terms from heterogeneous sources into a new dictionary designed for NLP in clinical settings. Our team comprised professionals with extensive experience in editing medical glossaries, experts in medical NLP and in web search, a medical doctor, and Ph.D. students. This workshop gathered a group of researchers passionate about medical NLP, who devoted their efforts to this goal and ultimately achieved it.
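To illustrate why a domain dictionary matters for word segmentation, the following is a minimal sketch of greedy longest-match segmentation. It is not the method used in this project, and real morphological analyzers rely on more elaborate models than greedy matching. The function name `segment`, the two toy vocabularies, and the example sentence are assumptions made only for this illustration.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation (illustrative only).

    Characters that start no dictionary word are emitted as
    single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single unknown character
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy vocabularies: a general one, and the same one
# extended with two medical terms.
general_vocab = {"患者", "は", "を", "訴える"}
medical_vocab = general_vocab | {"胸痛", "心筋梗塞"}

sentence = "患者は胸痛を訴える"  # "The patient complains of chest pain."

without_medical = segment(sentence, general_vocab)
with_medical = segment(sentence, medical_vocab)
```

In this toy setting, the medical term 胸痛 (chest pain) is split into two single characters when only the general vocabulary is available, and is kept as one token once the medical terms are added.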
The project lasted for 17 months. Our activities were carried out not only at the meetings, which were held once a month, but also outside them. We planned the project, designed the data structure of the dictionary, and carried the work from brainstorming through implementation to review of the deliverables.
At the meetings, we:
1. Shared knowledge of existing medical glossaries,
2. Discussed and agreed on the structure of the new dictionary,
3. Allocated work,
4. Reviewed the submissions.
Outside the meetings, we:
1. Conducted in-depth investigations of existing resources,
2. Collected medical terms and integrated their relations from heterogeneous sources.
To our knowledge, the dictionary we compiled is the first work to realize centralized management of a massive collection of Japanese medical terms for NLP.
1. We collected approximately 1 million Japanese medical terms from 22 resources, covering disease names, symptoms, anatomy, treatments, etc. We developed a format for storing the collected terms on the principle of preserving the original information.
2. We integrated the hierarchical structures of the collected terms from heterogeneous sources, such as synonyms, categories, hypernyms, and hyponyms, which are vital in clinical NLP.
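The two results above can be pictured with a small sketch of a storage format that keeps each term's original record and links terms through synonym and hypernym/hyponym relations. This is only an illustration under assumed names: `TermEntry`, `TermDictionary`, the source labels, and the field names are hypothetical and do not describe the actual format developed in the project.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    surface: str    # the term as written
    source: str     # which resource it came from (hypothetical label)
    category: str   # e.g. disease name, symptom, anatomy, treatment
    original: dict = field(default_factory=dict)  # source record, kept unchanged


class TermDictionary:
    def __init__(self):
        self.entries = defaultdict(list)    # surface -> entries from all sources
        self.synonyms = defaultdict(set)    # term -> synonymous terms
        self.hypernyms = defaultdict(set)   # term -> broader terms
        self.hyponyms = defaultdict(set)    # term -> narrower terms

    def add_entry(self, entry):
        self.entries[entry.surface].append(entry)

    def add_synonym(self, a, b):
        # Synonymy is stored symmetrically.
        self.synonyms[a].add(b)
        self.synonyms[b].add(a)

    def add_hypernym(self, term, broader):
        # Each hypernym link implies the inverse hyponym link.
        self.hypernyms[term].add(broader)
        self.hyponyms[broader].add(term)

    def sources(self, surface):
        return sorted({e.source for e in self.entries.get(surface, [])})


# Illustrative data only; "resource_A" and "resource_B" are made-up labels.
d = TermDictionary()
d.add_entry(TermEntry("心筋梗塞", "resource_A", "disease", {"code": "A-001"}))
d.add_entry(TermEntry("心筋梗塞", "resource_B", "disease", {"id": 42}))
d.add_entry(TermEntry("心疾患", "resource_A", "disease"))
d.add_synonym("心筋梗塞", "心筋硬塞")   # variant spelling
d.add_hypernym("心筋梗塞", "心疾患")    # myocardial infarction is a heart disease
```

Keeping one entry per source, rather than merging entries into a single record, is one simple way to preserve the original information while still allowing relations to be queried across sources.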
The project lasted for more than a year and involved considerable effort from multiple experts with different professional backgrounds. The dictionary produced by this project is expected to be a significant contribution to medical NLP.
Whiteboard, projector
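Ma Xiaojun (Doctoral Program, Graduate School of Medicine, The University of Tokyo)
Manuscript written by: Ma Xiaojun (Doctoral Program, Graduate School of Medicine, The University of Tokyo)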