2024 Week 1

Hi everyone! This is my first blog post of 2024! I’m really excited to continue DREU this summer with Professor Jessy Li. I’m very happy to be co-leading with my friend and previous collaborator, Sebastian Joseph, and to be advised by Professor Byron Wallace as well.

This summer, I’m working on a related line of work in factuality and medical applications in NLP, particularly claim checking for health claims posted on Reddit. I’m quite interested in this, as it feels like a genuinely impactful project for digital health and NLP.

I have mainly been leading the retrieval part of the project so far. We are currently using the RedHOT dataset as our source of Reddit claims and retrieving from a set of 800,000 medical abstracts from Trialstreamer. So far I’m taking an embedding-based approach, and I have embedded about 350,000 of the 800,000 abstracts with an open-source embedding model from Alibaba, Alibaba-NLP/gte-large-en-v1.5. Getting enough compute for a database this large, especially on Colab, has been challenging and extremely frustrating, except when I have access to the A100 :)
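Since Colab sessions can disconnect partway through a long run, one way to embed a large collection is in batches, writing each batch to disk so the next session picks up where the last one stopped. Here is a minimal sketch of that idea, not the exact code from the project. The names `embed_in_batches`, `embed_batch`, and `out_path` are hypothetical, and `embed_batch` stands in for whatever wraps the real model (for example, something like `lambda b: model.encode(b).tolist()` with a sentence-transformers model, which is an assumption here).

```python
import json
import os
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    out_path: str,
    batch_size: int = 64,
) -> int:
    """Embed texts batch by batch, appending rows to a JSONL file.

    Returns the number of texts embedded in this call. Rows already in
    out_path are assumed to cover texts[0:done] in order, so an
    interrupted run can be resumed by calling the function again.
    """
    # Count rows written by an earlier (possibly interrupted) run.
    done = 0
    if os.path.exists(out_path):
        with open(out_path, "r", encoding="utf-8") as f:
            done = sum(1 for _ in f)

    with open(out_path, "a", encoding="utf-8") as f:
        for start in range(done, len(texts), batch_size):
            batch = list(texts[start:start + batch_size])
            vectors = embed_batch(batch)  # hypothetical model wrapper
            for offset, vec in enumerate(vectors):
                row = {"idx": start + offset, "vec": list(vec)}
                f.write(json.dumps(row) + "\n")
            f.flush()  # make sure the finished batch reaches disk
    return len(texts) - done
```

JSONL keeps the sketch dependency-free. For hundreds of thousands of vectors, a binary format such as NumPy arrays saved in shards would likely be more compact.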

I’m excited to keep embedding, start querying for the most semantically relevant abstracts, and see where that takes the research next!
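To make "most semantically relevant" concrete: with an embedding-based approach, a common choice is to rank abstracts by cosine similarity between the claim's vector and each abstract's vector, then keep the top few. The sketch below assumes that choice. It is an illustration in plain Python, and the names `cosine` and `top_k` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    abstract_vecs: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (abstract index, similarity) for the k most similar abstracts."""
    scored = ((i, cosine(query_vec, v)) for i, v in enumerate(abstract_vecs))
    # nlargest keeps only k candidates in memory while scanning all abstracts.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

At the scale of 800,000 abstracts, a brute-force loop like this would be slow. A single matrix product over normalized vectors, or an approximate nearest-neighbor index, would be the more practical route.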

Written on May 31, 2024