ARVO this year was very different, to say the least. And while there was clearly a ton of excellent research presented, which was the one constant of this year's event, the inability to interact with fellow researchers was a significant downside. I visited many posters and left several questions, but the lag before answers arrived, and the lack of any notification when they did, made interaction very difficult.
With all that said, there were so many things to report on. From our perspective, we focused on imaging, imaging biomarkers and analysis methods. I can cut to the quick on the analysis methods, as these were invariably deep learning-based, with some work on alignment, but no new algorithms there. The following reports on notables that caught our attention. It is by no means exhaustive – we had a booth to cover! – but it's worth noting the general trend of more advanced analyses offered using deep learning, and how it will impact clinical practice in the coming years.
Based on my notes, I’ll report on these in three parts, and then we’ll get to a review of the logistics of imaging in ophthalmology, in particular the shortfalls of formats, storage and interoperability. Here’s part one regarding posters of interest in deep learning, but first some background that will also serve as introduction to the next post.
Deep Learning based Analysis and Clinical Use
AI is a big deal, of course. Manufacturers are struggling to pretend they're on top of things, but in reality they make hardware, with software as an afterthought. That's an opinion, of course, but it comes from experience, a good deal of time in the industry, and collaboration with many clinical researchers. It is also a reason why Voxeleron exists, and since our founding this view has been affirmed at each conference by the talks of industry folk promoting their work. Evidence of who leads in these areas can certainly be heard through the noise. For example, Heidelberg recently announced a collaboration with RetInSight to offer a deep learning-based retinal fluid segmentation algorithm in their software. RetInSight is based out of the Medical University of Vienna, long-time leaders in the development of analysis methods and biomarkers for OCT; they are also affiliated with one of the world's largest reading centers, so the data used to develop their methods is close at hand.
We expect other manufacturers to follow suit: a presentation whose headline AI-based analysis tool is an emulation of an application that we at Voxeleron developed and released four years earlier is, quite simply, not evidence of leadership in AI.
So what is really hot in deep learning at ARVO? It would seem transfer learning is where it's at, judging by the number of inference engines that used VGG16 or ResNet as their model. In my view, all too often this was a demonstration of how complex a mapping can be achieved with such architectures. Does it generalize to unseen data? Does it offer insight into the disease process itself? If, for example, you can map color fundus photos to OCT thickness maps, it would at least be nice to open up the black box and try to determine which features in the image are influencing the network's decision. Without that, the results lack any meaningful interpretation, and I would only be convinced of performance based on a prospective study. The same can be said for a number of other such inference examples.
But two things jumped out to me as very interesting. The first was segmentation in OCT of incomplete retinal pigment epithelium (RPE) and outer retinal atrophy (iRORA) and complete RPE and outer retinal atrophy (cRORA). Here the session “AMD imaging – GA” was particularly interesting and this area fed directly into the session on “AI in the retina: Prediction models” which looked a lot at prognostics based on segmentation, tying these indications to functional endpoints (visual acuity).
The following discusses the first of these, briefly, ahead of next week’s post that will cover prognostics. After that, I want to go over the more traditional image analysis methods that are not reliant on machine learning. Yes, those can be good too!
In dry AMD, being able to quantify the amount of geographic atrophy (GA) is incredibly useful in the development of potential therapeutics. Heidelberg offered a method of segmenting this in fundus autofluorescence (FAF) images, but the interest is clearly in OCT where, to date, the only manufacturer to offer such an algorithm is Zeiss, with its "Sub-RPE illumination" offering. Interestingly, this was never even touched upon by any presentation, including one of Zeiss' own posters, so perhaps it is now deprecated.
The session on GA segmentation was dominated by the Optima Lab, who are undisputed leaders in this area. Dr. Schmidt-Erfurth presented on their quantification of GA areas as used in the FILLY trial. Using their in-house algorithms, they could accurately quantify both RPE loss and photoreceptor (PR) loss, and in doing so also demonstrated that PR lesions in GA precede RPE loss and are larger. The take-home messages: 1) accurate quantification of relevant biomarkers for staging the disease is possible, and 2) it is better to follow PR loss than RPE loss.
Monitoring GA lesion size was also demonstrated by Lachinov et al. from the same lab, who reported on the accuracy of their segmentation and its use in a clinical trial, here concentrating on the method. Interestingly, rather than working on the 2d en face images alone to delineate the area, their CNN comprised a 3d-to-2d network producing the final 2d segmentation. Using Iowa's layer segmentation algorithm, the 3d OCT data is flattened to the RPE surface, allowing them to select a 3d region of interest (ROI); precision in the position of the RPE was not required. Dice scores were 0.91 and 0.93 on baseline and year-one scans, which are impressive results.
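To make the flattening step concrete, here is a minimal numpy sketch of shifting each A-scan so a segmented surface (the RPE in this case) lies on a single plane; the function name, array shapes and toy data are my own for illustration, not taken from the poster.

```python
import numpy as np

def flatten_to_surface(volume, surface, target_z):
    """Shift each A-scan so the given surface (e.g. a segmented RPE)
    lies at depth index target_z.

    volume : (n_bscans, depth, width) OCT intensities
    surface: (n_bscans, width) integer z-position of the surface per A-scan
    """
    n_b, depth, n_a = volume.shape
    flat = np.zeros_like(volume)
    for b in range(n_b):
        for a in range(n_a):
            shift = target_z - surface[b, a]
            # circular shift is fine as long as the shift stays in range
            flat[b, :, a] = np.roll(volume[b, :, a], shift)
    return flat

# Toy volume: a bright "RPE" band at a depth that varies across the scan
vol = np.zeros((2, 32, 4))
rpe = np.array([[10, 12, 14, 16], [8, 9, 10, 11]])
for b in range(2):
    for a in range(4):
        vol[b, rpe[b, a], a] = 1.0

flat = flatten_to_surface(vol, rpe, target_z=16)
# After flattening, the bright band sits on one plane, so a fixed z-slab
# around it is a simple 3d region of interest for a 3d-to-2d network.
```

This is also why precision in the RPE position matters less: a residual error of a few pixels just needs to be absorbed by making the z-slab slightly taller.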
Rather than using OCT data alone, Yang et al. from Genentech added FAF images as input to their CNN. Their OCT data is also flattened, along Bruch's membrane (BM), to create three en face images: full depth, above BM and below BM. These, along with the FAF images, were used separately and together to determine GA lesion area and to model annualized growth rate. Growth rate prediction benefited from the use of both modalities, but lesion segmentation did not.
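A hedged sketch of how such en face images might be computed from a BM-flattened volume. I use mean-intensity projections here for simplicity; the poster may well use different projections, and all names and values below are illustrative.

```python
import numpy as np

def enface_projections(volume, bm_z):
    """Mean-intensity en face projections from an OCT volume already
    flattened along Bruch's membrane, which sits at depth index bm_z.

    Returns (full, above_bm, below_bm), each of shape (n_bscans, width).
    """
    full = volume.mean(axis=1)                 # full-depth projection
    above = volume[:, :bm_z, :].mean(axis=1)   # vitreous side of BM
    below = volume[:, bm_z:, :].mean(axis=1)   # choroid side of BM
    return full, above, below

# Toy flattened volume: brighter below BM, mimicking the hyper-transmission
# seen under GA lesions
vol = np.ones((3, 20, 5))
vol[:, 12:, :] = 3.0   # assume BM at z = 12 for this toy example
full, above, below = enface_projections(vol, bm_z=12)
```

Splitting the projection at BM is what lets the below-BM image capture hyper-transmission, a well-known OCT correlate of atrophy.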
Pramil et al. presented GA segmentation in swept-source OCT (SS-OCT). Average Dice coefficients relative to two graders were 0.88 and 0.87. The sample size is small compared to the Optima lab's work and the results slightly lower, but this still looked impressive. Interestingly, they used the train/validation split to learn model parameters and then trained on all the training and validation data to build the final model applied to the test set. This gives me pause, as I like to know when to stop training based on the validation data. Also of note, algorithm repeatability was lower for both segmentation and growth estimation than for the manual gradings, implying that this is not such a hard task to perform manually; one would expect the computer to have the edge here.
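For reference, the Dice coefficient quoted throughout these posters is twice the overlap of two binary lesion masks divided by the sum of their areas. A minimal numpy version, with hypothetical toy masks:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks (1 = inside lesion)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

pred = np.zeros((8, 8)); pred[2:6, 2:6] = 1      # hypothetical algorithm output
grader = np.zeros((8, 8)); grader[3:7, 2:6] = 1  # hypothetical manual grading
print(round(dice(pred, grader), 3))              # prints 0.75
```

A Dice of 0.87–0.88 against manual grading, as reported here, therefore means the masks overlap on the large majority of lesion pixels while still disagreeing at the boundaries.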
Related work appeared as a poster in another session, from an industry collaboration, this time between RetinAI and a clinic in Lausanne. It reported on RPE atrophy growth rates over a mean follow-up time of 43 months. While the data set looked very interesting – 99 eyes over almost 4 years! – no validation of the segmentation itself was reported or referenced, so the interested reader will note that until this is done, the numbers are fairly arbitrary (something is changing, but we cannot be sure of what).
Next Up: Prognostics
Ok, please watch this space for notes on prognostics using deep learning, as that's next. If the segmentations are the nuts and bolts of measuring biomarkers, the next step is predicting how these might change, or indeed how, from baseline data analyzed in such a way, the prognosis looks in terms of visual acuity.
Update: part 2 is now live.