MiniMax has introduced OctoCodingBench, a benchmark that evaluates how well coding agents follow process-level instructions. The study found that many models that pass output tests may still disregard process rules such as naming conventions and user preferences. In overall rule compliance, several open-source models outperformed their closed-source counterparts.