
2b or not 2b? Custom LLM Scheduling Competition [P]

Hey everyone,

I am generally interested in resource management, and in particular in reducing the token cost of producing a given answer. So I just launched a Kaggle competition around a simple question: should you run a small model or not? I plan to add more models over time to allow for richer scheduling decisions.

Here is the competition: https://www.kaggle.com/competitions/llm-scheduling-competition

The idea:

  • You get questions from the MMLU benchmark
  • Instead of answering them, you decide:
    • 2b → run a small model
    • none → skip it
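
To make the task concrete, here is a minimal baseline sketch: decide "2b" vs "none" with a trivial heuristic and write a submission file. The column names ("id", "decision") and the length-based rule are my own assumptions for illustration, not the competition's actual schema or a recommended strategy.

```python
import csv

def decide(question: str, max_len: int = 300) -> str:
    """Toy heuristic: run the small model ('2b') on short questions,
    skip ('none') long ones. Purely illustrative."""
    return "2b" if len(question) <= max_len else "none"

# Hypothetical question set standing in for the MMLU inputs
rows = [
    {"id": 0, "question": "What is the capital of France?"},
    {"id": 1, "question": "A long multi-part question ... " * 20},
]

with open("submission.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "decision"])
    writer.writeheader()
    for r in rows:
        writer.writerow({"id": r["id"], "decision": decide(r["question"])})
```

Anything that maps a question to one of the two labels fits this shape, from hand-written rules to a trained difficulty classifier.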

Then there is a cost-based metric:

  • running the model costs compute
  • running it when it fails is expensive
  • skipping when it would have worked is also penalized

So the goal is to minimise weighted cost.
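
As a sketch of how such a weighted cost could be computed (the actual weights and scoring function used by the competition are not stated here, so the numbers below are assumptions):

```python
# Illustrative cost weights -- placeholders, not the competition's values
RUN_COST = 1.0             # compute cost of running the 2b model
WRONG_RUN_PENALTY = 3.0    # ran the model and it answered incorrectly
MISSED_SKIP_PENALTY = 2.0  # skipped a question the model would have solved

def question_cost(decision: str, model_would_succeed: bool) -> float:
    """Cost of one decision given whether the small model would have
    answered correctly."""
    if decision == "2b":
        return RUN_COST + (0.0 if model_would_succeed else WRONG_RUN_PENALTY)
    # decision == "none": only penalized if the model would have worked
    return MISSED_SKIP_PENALTY if model_would_succeed else 0.0

def total_cost(decisions, outcomes):
    return sum(question_cost(d, ok) for d, ok in zip(decisions, outcomes))
```

Under weights like these, the optimal policy depends on how well you can predict, per question, whether the small model will succeed.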

Currently the setup is quite simple, as the cost of running your own decision model is not taken into account. Still, it might be a first step in the right direction.

Curious to see what people come up with—rules, classifiers, or something more creative.

Happy to discuss ideas or answer questions!

submitted by /u/WERE_CAT


Tagged with

#resource management
#token cost
#LLM scheduling
#weighted cost
#Kaggle competition
#cost-based metric
#compute
#small model
#decision making