This post is the second in our series of notes from ARVO. It follows an initial introductory post – here – that covered deep learning-based segmentation, with a focus on geographic atrophy. That dovetails nicely with these notes, which cover prediction models in AI.
Segmenting outer retina and retinal pigment epithelium (RPE) loss lays the foundation not only for evaluating novel therapeutics and quantifying their efficacy – does the treatment slow growth? – but also for prognosing how vision will be affected. This would allow, for example, earlier intervention where necessary and improved patient management.
In the sessions I followed, the predicted targets were functional outcomes in the form of best-corrected visual acuity (BCVA) or, very interestingly, the growth of GA. For BCVA, the question asked is whether machine learning can take whatever information is available at baseline and predict the visual acuity outcomes that result. Typically, results were reported via correlation, but also via classification, where the target was a binarized good/bad outcome. Obviously, this makes for potentially clinically impactful methods, and it certainly made for an interesting session.
Kawczynski et al. looked at predicting BCVA from color fundus photography (CFP). This used an Inception ResNet (not VGG16!) and reported results using correlation coefficients. It is work that has been presented previously, and while interesting – it did use an external test set, and the correlation was pretty good (R² = 0.6) – disappointingly, no effort was made to interpret how the network was making its decisions. Data came from the phase 3 MARINA (NCT00056836) and ANCHOR (NCT00061594) trials for nAMD.
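As an aside on the metric that recurs throughout these notes: one common way to get the reported R² for a regression target like BCVA is to square the Pearson correlation between predicted and measured values. A minimal sketch, with toy data that is entirely my own assumption:

```python
# Hedged sketch: R² as the squared Pearson correlation between predicted
# and measured values (one common definition; the talks may have used
# another, e.g. the coefficient of determination).

def r_squared(pred, true):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(pred)
    mp = sum(pred) / n
    mt = sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in true)
    return cov * cov / (var_p * var_t)

if __name__ == "__main__":
    # Hypothetical predicted vs. measured BCVA letter scores.
    predicted = [62, 71, 80, 55, 68]
    measured = [60, 75, 78, 50, 70]
    print(round(r_squared(predicted, measured), 3))
```

Note that perfectly anti-correlated predictions also give R² = 1 under this definition, which is one reason scatter plots are worth showing alongside the number.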
Similarly, and also in an nAMD population, Akhlaq et al. used the rather aged AlexNet to do the same thing as Kawczynski, but here using the central B-scan from an OCT volume as input. They binarized the BCVA scores into good/bad outcomes as the inference target and used cross-validation (I think with 20 folds, so ~5% of the data used for testing). Transfer learning used ImageNet weights, which to me at least is not obviously helpful. The AUC was 0.86, but correlation was not reported. Saliency maps were used to look at what, in the image data, was influencing the result. In the examples shown, these mostly pointed to clear pathologies in the B-scan – a good sanity check of the methodology.
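To make the binarized-outcome setup concrete, here is a minimal sketch of the evaluation: threshold letter scores into good/bad labels and score the model's outputs with AUC. The 70-letter cutoff and all data below are my assumptions for illustration, not details from the talk:

```python
# Hedged sketch of binarized BCVA classification scored with AUC.
# Cutoff and toy data are assumptions, not from the presentation.

def binarize_bcva(letters, cutoff=70):
    """Label each eye 1 ('good', >= cutoff letters) or 0 ('bad')."""
    return [1 if v >= cutoff else 0 for v in letters]

def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    bcva_letters = [80, 55, 72, 40, 68, 90]          # hypothetical outcomes
    model_scores = [0.9, 0.2, 0.7, 0.1, 0.75, 0.95]  # hypothetical CNN outputs
    labels = binarize_bcva(bcva_letters)
    print(round(auc(labels, model_scores), 3))
```

The rank-based formulation makes clear why AUC is insensitive to the model's output calibration, and why it says nothing about correlation with the underlying letter scores – hence the wish for both numbers.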
Novosel et al. looked at both CFPs and OCT data to predict visual acuity response (VAR) to anti-VEGF treatment (ranibizumab), with data from the nAMD CATT study (NCT00593450). Three models were evaluated: one using baseline characteristics (BL – age, BCVA and OCT parameters); another using BLs and CFPs; and a multimodal model that fused the outputs of the other two. The OCT parameters were thicknesses over various areas and were uncorrected. The short conclusion was that the models showed limited ability to predict VAR, and the model using CFPs performed worst; that is, the CFPs added nothing or just muddied the waters. It would be interesting to know whether the simple characteristics alone – i.e., age, BCVA, etc. – would have been more predictive, and also to verify the OCT parameters to understand whether they contributed in a meaningful way, as segmentation errors are manifest when using the device software.
Another study from the same Roche lab reported on treatment outcomes in the nAMD AVENUE trial (NCT02484690). Essentially, a similar approach was taken in that baseline characteristics were used (including the OCT-based summary parameters), but the image data added here was the OCT data fed through a CNN. Again, the image data added nothing, which in hindsight makes you wonder about the approach, as a linear model with simple parameters was pretty much the best result and, in accordance with Occam's razor, would be the best one to deploy. Experiments such as this, however, are the right thing to do as we collectively figure all this out.
Lachance et al. reported on VA outcomes after macular hole surgery based on pre-operative OCT images. Interestingly, they used Google's AutoML to build their model and, based on binarized outcomes ("<70 letters" vs. "≥70 letters"), reported an AUC of 0.80. The result was based on a single split of the data (383 eyes; 80% training, 10% validation, 10% testing), with the central B-scan as the input image. It would be interesting here to look instead at the change in VA rather than the final result, as that should be correlated with the initial, pre-operative acuity where measurable. The author said this was planned.
Some really interesting work from Bogunović et al., also based on the aforementioned FILLY trial, aimed to predict the growth of GA. A U-Net was used to segment the GA area from en face views of OCT volumes, where the integration range was set based on the RPE segmentation. The output of the CNN feeds neural ordinary differential equations, a technique to model dynamics in an image. The equations are in this case learned, and the resulting application is able to determine the growth speed at each pixel. 5-fold CV was used, and when used to classify fast progressors, the AUC was 0.83, although direct correlation to growth rate was only fair at R² = 0.32. To get a meaningful result on such a challenging problem, however, is really quite something, and it will be great to see if this can be applied in the future.
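For readers new to neural ODEs, the core idea is that a learned function gives the instantaneous rate of change of the state, and integrating that function forward in time yields the predicted trajectory. A toy sketch of that idea on a scalar "GA area" – the dynamics function here is a fixed linear stand-in for what would actually be a trained network over per-pixel CNN features:

```python
# Hedged toy sketch of the neural-ODE idea: d(area)/dt = f_theta(area),
# integrated forward with Euler steps. f_theta's form and all values are
# illustrative assumptions; the real work learns f from image features.
import math

def f_theta(area, w=0.4, b=0.0):
    """Stand-in for the learned dynamics d(area)/dt."""
    return w * area + b

def integrate(area0, t_end, dt=0.01):
    """Euler integration of the ODE from t = 0 to t_end."""
    area = area0
    for _ in range(round(t_end / dt)):
        area += dt * f_theta(area)
    return area

if __name__ == "__main__":
    # With linear dynamics the result should approach area0 * exp(w * t).
    predicted = integrate(area0=2.0, t_end=1.0)
    print(predicted, 2.0 * math.exp(0.4))
```

In practice a higher-order adaptive solver replaces Euler, and f is trained by backpropagating through (or adjoint-solving) the integration, but the forward pass is exactly this shape.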
Whereas Bogunović’s work used FAF only to assess ground-truth GA areas and agreement with the OCT-based GA segmentation, Yang et al. used all permutations of these modalities in their deep learning-based prediction of GA area and growth rate. While area is less interesting – the correlation will always likely be high depending on the timelines – growth rate prediction performed very well, with the best correlation of R² = 0.56 for FAF + OCT data; OCT data alone yielded R² = 0.48, and FAF alone R² = 0.52. The input OCT data was three en face slabs around Bruch’s membrane, and the CNN used was Inception V3.
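An en face slab, as used for the OCT input here, is built by averaging each A-scan's intensities over a depth window positioned relative to a segmented surface – in this case Bruch's membrane. A minimal sketch, where the offsets and the toy volume are my assumptions:

```python
# Hedged sketch of en face slab extraction from an OCT volume.
# volume[b][z][x] is intensity at B-scan b, depth z, A-scan x;
# bm_depth[b][x] is the segmented Bruch's membrane depth index.
# Window offsets (lo, hi) are illustrative assumptions.

def en_face_slab(volume, bm_depth, lo, hi):
    """Return slab[b][x]: mean intensity over z in [bm+lo, bm+hi)."""
    slab = []
    for b, bscan in enumerate(volume):
        row = []
        for x in range(len(bscan[0])):
            z0 = max(bm_depth[b][x] + lo, 0)
            z1 = min(bm_depth[b][x] + hi, len(bscan))
            vals = [bscan[z][x] for z in range(z0, z1)]
            row.append(sum(vals) / len(vals))
        slab.append(row)
    return slab

if __name__ == "__main__":
    # One tiny B-scan: 4 depths x 2 A-scans, BM at depth index 2.
    volume = [[[0, 0], [10, 20], [30, 40], [0, 0]]]
    bm = [[2, 2]]
    print(en_face_slab(volume, bm, lo=-1, hi=1))
```

Repeating this with three different (lo, hi) windows around BM would give the three-channel input described in the talk; in production code the same operation is a vectorized gather-and-mean over the volume array rather than explicit loops.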
Next Up: More Traditional Analyses
Again, please watch this space for more notes from ARVO. Next up are notes on more traditional, computer vision-based methods of analysis. Done correctly, these are of course critical to OCT analysis, do not require supervised learning – so they handle cases outside a “training distribution” well – and are readily interpretable. They also do not need specialized hardware to run on. We, of course, have used them extensively!
Update: part 3 is now live.