As a pandemic disease, COVID-19 is still spreading throughout the entire world. The COVID-19 Dashboard by Johns Hopkins University at https://coronavirus.jhu.edu/map.html showed more than 20,000 new cases daily for the US and Brazil at the time of writing. Despite the development of genetic and antibody tests, a 2D planar X-ray of the lung is often the first diagnostic choice in clinics in cases of severe respiratory disease. We think it is worth further optimizing AI-driven tools that help radiologists distinguish between COVID-19 and other lung diseases, especially given the enormous workload during pandemic events.
What it does:
The pre-trained neural network VGG16 serves as a feature extraction layer for 2D X-ray radiographs. On top of it, we added trainable layers that classify diseases, which manifest as more or less prominent tissue alterations within the lung. After just one hour of training on a pneumonia dataset of around 5,000 files, the algorithm reached over 80% classification accuracy, which shows great potential for future improvement!
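A minimal Keras sketch of this kind of setup, assuming a frozen VGG16 convolutional base with ImageNet weights, 224x224 RGB inputs, and a small dense head. The function name `build_classifier`, the head size (256 units, 0.5 dropout), and the learning rate are illustrative assumptions, not the project's actual configuration.

```python
from tensorflow import keras


def build_classifier(num_classes=2, input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen VGG16 feature extractor plus a small trainable classification head.

    Inputs are expected to be preprocessed with
    keras.applications.vgg16.preprocess_input (pixel values 0..255, RGB order).
    """
    # Convolutional base only: include_top=False drops VGG16's ImageNet classifier.
    base = keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained feature detectors fixed

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)  # (7, 7, 512) -> (512,)
    x = keras.layers.Dense(256, activation="relu")(x)  # hypothetical head size
    x = keras.layers.Dropout(0.5)(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Only the two Dense layers are trained here. Unfreezing the last convolutional block afterwards (fine-tuning with a lower learning rate) is a common next step once the head has converged.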
How we built it:
We used Keras and TensorFlow on Colab. The pre-trained VGG16 was just one starting point; other pre-trained networks, e.g. ResNet50, InceptionV3, or InceptionResNetV2, might be good picks as well. We built a pre-processing step that checks images for file-type-related issues (e.g. lossy compression, a missing dpi value, RGB instead of grayscale, or the encoding used). Especially given the increasing amount of material in open databases, automatic data quality assurance and homogenization are of great importance.
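A possible shape for such a check, sketched with Pillow (an assumption; the original tooling is not specified). The function names `inspect_image` and `to_grayscale_jpeg`, the issue labels, and the JPEG quality of 95 are hypothetical. The color check compares the RGB channels: on a true grayscale radiograph stored as RGB they are identical, so any difference hints at colored annotations.

```python
from PIL import Image, ImageChops


def inspect_image(source):
    """Return a list of quality issues for one radiograph (path or file object)."""
    issues = []
    with Image.open(source) as img:
        if img.format == "JPEG":
            issues.append("lossy_compression")
        if "dpi" not in img.info:
            issues.append("missing_dpi")
        if img.mode != "L":  # "L" is Pillow's 8-bit single-channel grayscale mode
            issues.append("not_8bit_grayscale")
        if img.mode in ("RGB", "RGBA", "P"):
            # Differing channels hint at colored annotations burned into the image.
            r, g, b = img.convert("RGB").split()
            if ImageChops.difference(r, g).getbbox() or ImageChops.difference(g, b).getbbox():
                issues.append("color_content")
    return issues


def to_grayscale_jpeg(source, target, quality=95):
    """Homogenize an image to an 8-bit grayscale JPEG, keeping its dpi if present."""
    with Image.open(source) as img:
        save_kwargs = {"quality": quality}
        if "dpi" in img.info:
            save_kwargs["dpi"] = img.info["dpi"]
        # convert("L") suits RGB/RGBA/palette inputs; 16-bit data would need rescaling first.
        img.convert("L").save(target, format="JPEG", **save_kwargs)
```

JPEG is itself lossy, so re-encoding adds a little compression noise; a lossless target such as PNG would avoid that at the cost of larger files.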
Challenges we ran into:
Working collaboratively on Colab with three people is not easy in terms of productivity: we lost a lot of time due to broken kernels and "lost" code blocks. At the Waterkand Hackathon, timing was the biggest challenge; AI projects might deserve a little more time because of their computational demand. One major drawback is that more or less all pre-trained NNs were trained on color images, whereas we fed them 8-bit grayscale data. As a consequence, all learned feature detectors that respond to color gradients or peak color intensity are worthless in our application. Given the remaining programming workload, we did not manage to turn the NN into a grayscale-optimized detection system. Either we look for NNs already trained on X-ray films, or we build our own deep NN and train it ourselves, which would require a huge dataset of labeled X-ray films.
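The usual workaround, replicating the grayscale channel three times so an ImageNet-pretrained network accepts the input, only fixes the shape mismatch. The numpy sketch below (function names hypothetical) makes the drawback concrete: on replicated-gray input, each first-layer filter behaves exactly like its kernel summed over the color channels, so a pure color-contrast filter (e.g. red weights +1, green weights -1) collapses to zero.

```python
import numpy as np


def gray_to_rgb(batch):
    """Replicate grayscale images (N, H, W) or (N, H, W, 1) into 3 identical channels."""
    batch = np.asarray(batch, dtype=np.float32)
    if batch.ndim == 4 and batch.shape[-1] == 1:
        batch = batch[..., 0]
    return np.repeat(batch[..., np.newaxis], 3, axis=-1)


def fold_first_conv_kernel(kernel):
    """Sum an RGB conv kernel (kh, kw, 3, filters) over its input-channel axis.

    For replicated-gray input, the original kernel and this single-channel
    kernel produce identical responses, so channel-cancelling (color-opponent)
    filters contribute nothing.
    """
    return kernel.sum(axis=2, keepdims=True)
```

With Keras, the first VGG16 kernel can be read via `base.get_layer("block1_conv1").get_weights()[0]` (shape `(3, 3, 3, 64)`), which makes it easy to inspect how many filters lose most of their weight after folding.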
Accomplishments that we are proud of:
The finished code we produced was able to classify the large validation dataset with an accuracy of more than 80%. Of course there is still room for improvement, but given the limited resources, this is a great starting point.
What we learned:
After this hands-on experience, we understand much better the complex problems that come into play when working with "uncommon" grayscale data. During this intensive weekend with very, very long nights, we also learned a lot about cooperating more efficiently, as we distributed the work at several stages. Moreover, because we felt the pressure to finish something within a defined time frame, we discovered several important programming patterns that go beyond the usual computer-science lectures on programming languages.
What's next:
Screening the web for further datasets, getting a grayscale-optimized deep NN, and perhaps applying it to related fields: on the one hand, planar radiographs of other body sites for other clinical questions, and on the other hand, transferring some of the gained knowledge to 3D volumetric datasets from computed tomography (CT).
Wrapping up the hackathon project >>>Xidivocx<<<
Deep-learning CNNs detect COVID-19 on radiographs
- Radiograph-optimized version with the pre-trained deep neural networks VGG16 and ResNet50
- Data pre-processing that analyses images for color-related "artefacts" (annotations and/or compression artefacts) and removes them by transforming the images to 8-bit grayscale JPEGs
- A standard model.fit_generator() implementation and a model.fit() implementation that saves intermediate steps and afterwards feeds them into a decoupled prediction algorithm
- Model accuracy, loss, confusion matrix, and a k-fold cross-validation routine
- Extraction of network layer visualizations
- Adding more case-specific weights for the pre-trained model
- Most importantly, integrating more labeled training data and spending more time on hyperparameter tuning and successively training the neural network
Soon publicly available on GitHub: https://github.com/td-ct/Xidivocx.git
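The decoupled evaluation in the list above (saving intermediate network outputs, then training and scoring a separate predictor with accuracy, a confusion matrix, and k-fold splits) could look roughly like this. It is a sketch under assumptions: the features are assumed to have been extracted once with a frozen convolutional base's `predict()` and stored via `np.save`, scikit-learn's `LogisticRegression` stands in for the project's own prediction algorithm, and `kfold_evaluate` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold


def kfold_evaluate(features, labels, make_classifier, n_splits=5, seed=0):
    """Stratified k-fold evaluation of a classifier on pre-extracted features.

    features: (N, D) array, e.g. flattened CNN outputs loaded with np.load.
    labels:   (N,) integer class labels.
    Returns the mean fold accuracy and the confusion matrix summed over folds.
    """
    classes = np.unique(labels)
    cm_total = np.zeros((len(classes), len(classes)), dtype=int)
    accuracies = []
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in splitter.split(features, labels):
        clf = make_classifier()  # fresh model per fold, so folds do not leak into each other
        clf.fit(features[train_idx], labels[train_idx])
        predicted = clf.predict(features[test_idx])
        accuracies.append(accuracy_score(labels[test_idx], predicted))
        cm_total += confusion_matrix(labels[test_idx], predicted, labels=classes)
    return float(np.mean(accuracies)), cm_total


if __name__ == "__main__":
    # Synthetic stand-in for saved features: two well-separated classes.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-3, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
    y = np.array([0] * 50 + [1] * 50)
    acc, cm = kfold_evaluate(X, y, lambda: LogisticRegression(max_iter=1000))
    print(f"mean accuracy: {acc:.3f}")
    print(cm)
```

Extracting features once and reusing them across folds keeps each fold cheap, which matters on a time-limited Colab session; the trade-off is that the convolutional base cannot be fine-tuned inside this loop.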