Beat it by having Codex hand-craft the weights: https://gist.github.com/N8python/02e41d156ec615328cde2e1e5c0e9d53

It gets 100% accuracy on 10 million random test cases with only 343 parameters. As a bonus, it uses the vanilla Qwen3 architecture, just with the right weights.
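
For anyone unsure what "hand-crafting weights" looks like in practice, here is a toy sketch. It is not the model or the task from the gist, and it is not Qwen3: it is a hypothetical 2-2-1 ReLU network whose weights are written out by hand so that it computes XOR exactly, with no training at all. The names (`forward`, `count_parameters`, `accuracy`) are made up for illustration. The idea is the same in spirit: pick the weights analytically, count them, then check the result against random test cases.

```python
import random

# Hand-set weights for a tiny 2-2-1 ReLU network (illustrative only).
# Hidden unit 1 computes relu(a + b); hidden unit 2 computes relu(a + b - 1).
# The output is h1 - 2*h2, which equals a XOR b for bits a and b.
W1 = [[1.0, 1.0], [1.0, 1.0]]  # one row of input weights per hidden unit
B1 = [0.0, -1.0]               # hidden biases
W2 = [1.0, -2.0]               # output weights
B2 = 0.0                       # output bias


def relu(x):
    return max(0.0, x)


def forward(a, b):
    """Run the hand-weighted network on two input bits."""
    hidden = [relu(w[0] * a + w[1] * b + bias) for w, bias in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def count_parameters():
    """Count every weight and bias in the network."""
    return sum(len(row) for row in W1) + len(B1) + len(W2) + 1


def accuracy(num_cases, seed=0):
    """Fraction of random bit pairs on which the network matches XOR."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_cases):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        if round(forward(a, b)) == (a ^ b):
            correct += 1
    return correct / num_cases


if __name__ == "__main__":
    print("parameters:", count_parameters())
    print("accuracy:", accuracy(10_000))
```

Since the weights are derived rather than learned, the accuracy here is a property of the construction, and the random test run only confirms it.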