Week 7

Moved to Austin! My first time getting scorched in 103+ degree weather throughout the day…

One irritating thing is that there have been a lot of compatibility issues with using slightly older packages (QAFactEval).

I’ve tried downgrading Python across the system, as well as uninstalling and reinstalling different versions of the dependencies… but after 20 hours of this I am extremely frustrated.

Alternatively, I’ve also read related literature on using GPT-3.5 to evaluate factuality and consistency through different prompting methods (Direct Score 1-100, Chain of Thought, 5 Star, and Binary Decision). I think this might perform better than the older non-LLM-based factuality metrics (the reality of modern-day LLMs…), so hopefully the QAFactEval incompatibility errors won’t matter.
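
A rough sketch of how those four prompting styles might be set up and scored. Everything here is an illustrative assumption: the prompt wording, the names `PROMPTS`, `build_prompt`, and `parse_reply`, and the choice to normalize every score to the range 0 to 1 are not taken from any paper. The actual call to GPT-3.5 is left out on purpose; the sketch only builds the prompt text and parses whatever reply comes back.

```python
import re

# Hypothetical prompt templates, one per evaluation style (wording is illustrative).
PROMPTS = {
    "direct_score": (
        "Score the factual consistency of the summary with the article "
        "on a scale from 1 to 100. Reply with the number only."
    ),
    "five_star": (
        "Rate the factual consistency of the summary with the article "
        "from 1 to 5 stars. Reply with the number only."
    ),
    "binary": (
        "Is the summary factually consistent with the article? "
        "Answer Yes or No."
    ),
    "chain_of_thought": (
        "Is the summary factually consistent with the article? "
        "Explain your reasoning step by step, then finish with a line "
        "'Answer: Yes' or 'Answer: No'."
    ),
}


def build_prompt(method, article, summary):
    """Combine the instruction for `method` with the article and summary."""
    return f"{PROMPTS[method]}\n\nArticle:\n{article}\n\nSummary:\n{summary}\n"


def parse_reply(method, reply):
    """Map a raw model reply to a score in [0, 1], or None if it cannot be parsed."""
    text = reply.strip()
    if method in ("direct_score", "five_star"):
        match = re.search(r"\d+", text)
        if match is None:
            return None
        value = int(match.group())
        low, high = (1, 100) if method == "direct_score" else (1, 5)
        if not low <= value <= high:
            return None
        return (value - low) / (high - low)
    if method == "chain_of_thought":
        # Use the last "Answer: ..." line, since the reasoning comes first.
        answers = re.findall(r"answer:\s*(yes|no)", text, re.IGNORECASE)
        if not answers:
            return None
        return 1.0 if answers[-1].lower() == "yes" else 0.0
    if method == "binary":
        match = re.match(r"(yes|no)\b", text, re.IGNORECASE)
        if match is None:
            return None
        return 1.0 if match.group(1).lower() == "yes" else 0.0
    raise ValueError(f"unknown method: {method}")
```

Normalizing to a shared 0 to 1 range is just one way to make the four styles comparable against human factuality labels; the raw scores could equally be kept as-is and compared with rank correlation.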

On a more positive note, I’m excited to meet Professor Li and go to the RLP research office!

Written on July 13, 2023