Week 5

I’ve tried FlanALPACA, GPT-3.5, ALPACA, FLAN-T5-XL, REDPAJAMA, FALCON, and 4-5 different prompts for each, as well as different parameter size variants for each model across the first ~200 abstracts in one of Byron’s trialstreamer datasets

At this point, I’m looking into evaluating the massive text that I’ve generated and outputted in tens of csvs… My annotation abilities are pretty poor since I’m not a linguistics student, but I’m able to at least highlight and identify the extremely problematic and non-factual generations

Furthermore, I’ve been trying and looking into many automatic metrics, like BLEU, SARI, BERTScore, ROUGE, etc: image

One of the irritating black box model API errors is that the server often gets overloaded: image

I heard the GPT-4 API is coming out soon… so will be definitely using that for future model generations

Written on June 29, 2023