natural language is all you need

natural language is all you need.

A coding judge where your prompt is scored, not your code. Pick a small local model, coax it with words, climb the global leaderboard. Fewer tokens and smaller models win.

The old judge
write the code yourself
two_sum.py
def solve(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]

Syntax, edge cases, off-by-ones.

This judge
write a prompt. a small model codes.
your prompt
“two sum, hashmap, return sorted indices”
qwen2.5 · 0.5B
generated
def solve(nums, target):
    seen = {}
    for i, n in ...
Acceptedscore 899 · 5/5 tests

Global leaderboard

See top 100 →

Loading…

Problems

difficulty:kind:

Loading…