Data quality in a distributed learning environment
Vast amounts of data to improve cancer treatment decisions
We are working with a large number (tens of thousands) of clinical case transcriptions. A few hundred of these cases have been annotated with observations from radiologists, for example: "unusual mass of tissue in the upper lobe of the left lung". However, annotating all of these cases by hand is extremely time-consuming and error-prone.
We will use natural language processing (NLP) and machine learning (ML) tools to learn from the annotated cases and automatically annotate the remaining bulk of the clinical cases. Similar projects have been run in the past, but the annotations they produced were not of sufficient quality.
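As a rough illustration of this approach, the sketch below trains a small bag-of-words naive Bayes classifier on a handful of labeled cases and uses it to annotate unlabeled ones. It is not the project's actual tooling: the choice of naive Bayes, the label names, the toy sentences, the function names and the 0.8 confidence threshold are all assumptions made for illustration. Cases the model is unsure about are set aside for manual review, which is one simple way to guard annotation quality.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(labeled_cases):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in labeled_cases:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return (best_label, confidence) using naive Bayes with add-one smoothing."""
    word_counts, label_counts, vocabulary = model
    total_cases = sum(label_counts.values())
    # Words never seen in training carry no evidence and are skipped.
    tokens = [t for t in tokenize(text) if t in vocabulary]
    log_scores = {}
    for label, n_cases in label_counts.items():
        score = math.log(n_cases / total_cases)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            score += math.log((word_counts[label][token] + 1) / denominator)
        log_scores[label] = score
    # Turn log scores into normalized probabilities in a numerically stable way.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def annotate(model, texts, threshold=0.8):
    """Auto-annotate confident cases; route the rest to manual review."""
    accepted, needs_review = [], []
    for text in texts:
        label, confidence = predict(model, text)
        if confidence >= threshold:
            accepted.append((text, label))
        else:
            needs_review.append(text)
    return accepted, needs_review


# Invented toy data, not taken from any real clinical record.
toy_cases = [
    ("unusual mass of tissue in the upper lobe of the left lung", "mass"),
    ("mass in right lung lower lobe", "mass"),
    ("lungs are clear no abnormality", "normal"),
    ("no abnormality detected clear lungs", "normal"),
]
toy_model = train(toy_cases)
accepted, needs_review = annotate(
    toy_model, ["mass in the left lung", "patient rescheduled"]
)
```

With this toy data, the first unlabeled sentence shares several words with the "mass" examples and is annotated automatically, while the second contains no known words and is routed to manual review. A real system would need a far stronger model and a threshold tuned against held-out annotated cases.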
The resulting NLP tool could give a substantial boost to clinical radiology by unlocking the large volume of knowledge currently held only in free text.