Challenges in Training Artificial Intelligence to Process Medical Imaging Data





In a blog post published on Tuesday, November 1, 2016, Hoo-Chang Shin shares his thoughts on "Challenges in Training Artificial Intelligence to Process Medical Imaging Data."

If an artificial intelligence (AI) system is to be trained to understand medical images and diagnose from them accurately, which kind of training data is most effective: a large amount of general image data, or a small set of precise data?

A system such as a picture archiving and communication system (PACS) could be used to build an AI system that learns from the radiology reports and images stored in it. Such a system could generate descriptive keywords for new images on its own and index a large number of images, so that researchers can search for images with specific characteristics, e.g. "images with brain tumor" or "magnetic resonance images of muscle." The objectives of the research, however, are more ambitious.

One of the main goals of the research is to create an AI system that can automatically diagnose disease from patient scans, which is something the proposed system could not do. To address this limitation, the Metathesaurus (a medical semantic database of the Unified Medical Language System [UMLS]) and the RadLex radiology language database were used to detect disease-specific words in the radiology reports. A negation detection algorithm was then employed to determine whether each detected disease word is mentioned positively (disease is present) or negatively (disease is absent) in the description of the image. Finally, deep convolutional neural networks were trained to detect the presence or absence of a disease in a given medical image.
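The term detection and negation steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline from the original post: the vocabulary, the negation cues, and the five-token look-back window are all hypothetical stand-ins (the real system drew its terms from the UMLS Metathesaurus and RadLex), and only single-word terms in their exact form are matched.

```python
import re

# Hypothetical toy vocabulary standing in for UMLS Metathesaurus / RadLex terms.
DISEASE_TERMS = {"pneumonia", "cyst", "abscess", "tumor"}

# Hypothetical negation cues; a real negation detector uses a much larger list.
NEGATION_CUES = ["no", "not", "without", "negative for", "no evidence of"]

# Assumed look-back window: how many tokens before a term may hold a cue.
WINDOW = 5


def label_disease_mentions(report):
    """Map each detected disease term to True (present) or False (absent)."""
    labels = {}
    # Split into rough sentences so a cue cannot negate a later sentence.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            if token not in DISEASE_TERMS:
                continue
            context = " ".join(tokens[max(0, i - WINDOW):i])
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", context)
                for cue in NEGATION_CUES
            )
            # A positive mention anywhere in the report outweighs a negated one.
            labels[token] = labels.get(token, False) or not negated
    return labels


if __name__ == "__main__":
    text = "No evidence of pneumonia. There is a cyst in the left kidney."
    print(label_disease_mentions(text))
```

Labels produced this way could then serve as the present/absent training targets for an image classifier. Each simplification here (exact word matching, a fixed window) loses some reports, which is part of why so much data ends up filtered out.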

The AI system can thus detect the presence or absence of a disease in a specific image, which supplements the AI-generated keywords for diagnostic purposes. Two challenges were encountered along the way.

Challenge 1:
Reports written by different doctors vary widely, and multiple terminologies are used to describe the same type of disease.
Matching these varied descriptions to a single disease type and training an AI to understand them is challenging. To train the AI for more precise disease detection, only about 10% of the data available in the PACS was used. The description variability could have been ignored by aiming only to generate approximate keywords describing each image, but because the text mining process was made more specific, about 90% of the available data was filtered out.
Challenge 2:
Doctors sometimes describe a particular disease in indefinite terms.
"Possibility of an infection abscess," "not obviously cysts," or "possibly due to cyst" are descriptors that doctors commonly use in order to prompt further investigation. It is harder to draw any concrete findings from such unclear text descriptions without examining the images manually.
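Both challenges can be illustrated with a small filtering sketch. It is a toy under stated assumptions, not the method from the original post: the synonym table and the hedging cues are hypothetical (three cues are taken from the phrases quoted above, the rest are invented for illustration), and matching is plain substring search. A description is kept only if it maps to exactly one disease label and contains no hedging cue.

```python
# Hypothetical synonym table: several phrasings map to one canonical label.
SYNONYMS = {
    "cyst": "cyst",
    "cystic lesion": "cyst",
    "abscess": "abscess",
    "infected collection": "abscess",
}

# Hedging cues: the first three come from the quoted examples,
# the last two are assumptions added for illustration.
HEDGE_CUES = (
    "possibility of",
    "possibly",
    "not obviously",
    "cannot exclude",
    "may represent",
)


def canonical_label(description):
    """Return one disease label, or None if hedged, ambiguous, or unmatched."""
    text = description.lower()
    if any(cue in text for cue in HEDGE_CUES):
        return None  # indefinite wording: no concrete finding can be drawn
    found = {label for phrase, label in SYNONYMS.items() if phrase in text}
    return found.pop() if len(found) == 1 else None


def filter_reports(descriptions):
    """Keep only (description, label) pairs that yield a usable training label."""
    labeled = [(d, canonical_label(d)) for d in descriptions]
    return [(d, label) for d, label in labeled if label is not None]


if __name__ == "__main__":
    reports = [
        "Simple cyst in the right kidney",
        "Possibility of an infection abscess",
        "Not obviously cysts",
        "Cystic lesion in the liver",
        "Possibly due to cyst",
        "Unremarkable study",
    ]
    kept = filter_reports(reports)
    print(f"kept {len(kept)} of {len(reports)} reports")
    for description, label in kept:
        print(label, "<-", description)
```

In this toy run only two of the six descriptions survive, which mirrors on a small scale how strict text mining trades data volume for label precision.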

However, more data is always desirable when building a machine learning system, so that it can generalize beyond the samples it saw during training.

This raises an interesting question about the trade-offs in building an artificial intelligence that can understand medical data: should we go wide and be less specific with the training data, or go small and be more specific?