Updated 2026-06-07 22:32 UTC
| # | Model | Overall | Factual | Reasoning | Format | Hardware | When |
|---|---|---|---|---|---|---|---|
| 1 | llama3.2:3b (ollama) | 93% | 93% | 90% | 100% | Apple M4 · 16 GB | 2026-06-07 22:05 |
| 2 | gemma2:9b (ollama) | 91% | 87% | 90% | 100% | Apple M4 · 16 GB | 2026-06-07 21:36 |
| 3 | qwen2.5-coder:7b (ollama) | 91% | 87% | 90% | 100% | Apple M4 · 16 GB | 2026-06-07 19:35 |
| 4 | qwen2.5:7b (ollama) | 91% | 87% | 90% | 100% | Apple M4 · 16 GB | 2026-06-07 21:40 |
| 5 | mistral:7b (ollama) | 91% | 87% | 90% | 100% | Apple M4 · 16 GB | 2026-06-07 21:45 |
| 6 | phi3.5:3.8b (ollama) | 87% | 87% | 90% | 80% | Apple M4 · 16 GB | 2026-06-07 21:55 |
| 7 | phi4:14b (ollama) | 85% | 73% | 90% | 100% | Apple M4 · 16 GB | 2026-06-07 18:50 |
| 8 | codellama:13b (ollama) | 80% | 80% | 70% | 100% | Apple M4 · 16 GB | 2026-06-07 20:32 |
| 9 | llama3.1:8b (ollama) | 79% | 87% | 60% | 100% | Apple M4 · 16 GB | 2026-06-07 21:50 |
| 10 | gemma4:latest (ollama) | 56% | 60% | 30% | 100% | Apple M4 · 16 GB | 2026-06-07 22:08 |
| 11 | deepseek-r1:7b (ollama) | 17% | 13% | 0% | 60% | Apple M4 · 16 GB | 2026-06-07 19:28 |
| 12 | deepseek-r1:14b (ollama) | 13% | 13% | 0% | 40% | Apple M4 · 16 GB | 2026-06-07 19:05 |
| 13 | qwen3.6:latest (ollama) | 0% | 0% | 0% | 0% | — | 2026-06-07 18:04 |
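The page does not say how Overall is derived from the three category scores. Every row above, however, appears consistent with a weighted mean of 40% Factual, 40% Reasoning and 20% Format, rounded to the nearest whole percent (a plain unweighted average would give gemma2:9b 92% rather than the listed 91%). The sketch below checks that hypothesis against the published rows; the 40/40/20 weights are inferred from the table, not documented, and `overall_score` is a hypothetical helper.

```python
# Hypothetical check: the 40/40/20 weights are inferred from the table,
# not taken from any documented benchmark formula.
WEIGHTS = {"factual": 4, "reasoning": 4, "format": 2}  # in tenths, sum to 10


def overall_score(factual: int, reasoning: int, fmt: int) -> int:
    """Weighted mean of integer percentages, rounded half up to a whole percent."""
    total = (WEIGHTS["factual"] * factual
             + WEIGHTS["reasoning"] * reasoning
             + WEIGHTS["format"] * fmt)  # equals 10 x the weighted mean
    return (total + 5) // 10  # integer round-half-up avoids float error


# (model, overall, factual, reasoning, format) copied from the leaderboard.
ROWS = [
    ("llama3.2:3b", 93, 93, 90, 100),
    ("gemma2:9b", 91, 87, 90, 100),
    ("qwen2.5-coder:7b", 91, 87, 90, 100),
    ("qwen2.5:7b", 91, 87, 90, 100),
    ("mistral:7b", 91, 87, 90, 100),
    ("phi3.5:3.8b", 87, 87, 90, 80),
    ("phi4:14b", 85, 73, 90, 100),
    ("codellama:13b", 80, 80, 70, 100),
    ("llama3.1:8b", 79, 87, 60, 100),
    ("gemma4:latest", 56, 60, 30, 100),
    ("deepseek-r1:7b", 17, 13, 0, 60),
    ("deepseek-r1:14b", 13, 13, 0, 40),
    ("qwen3.6:latest", 0, 0, 0, 0),
]

# Collect any row whose listed Overall disagrees with the assumed weighting.
mismatches = [name for name, overall, f, r, fmt in ROWS
              if overall_score(f, r, fmt) != overall]
print("rows inconsistent with 40/40/20:", mismatches)  # prints []
```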
| When | Model | Overall | Factual | Reasoning | Format |
|---|---|---|---|---|---|
| 2026-06-07 17:55 | gemma4:latest (ollama) | 61% | 73% | 40% | 80% |
| 2026-06-07 18:04 | qwen3.6:latest (ollama) | 0% | 0% | 0% | 0% |
| 2026-06-07 18:50 | phi4:14b (ollama) | 85% | 73% | 90% | 100% |
| 2026-06-07 19:05 | deepseek-r1:14b (ollama) | 13% | 13% | 0% | 40% |
| 2026-06-07 19:28 | deepseek-r1:7b (ollama) | 17% | 13% | 0% | 60% |
| 2026-06-07 19:35 | qwen2.5-coder:7b (ollama) | 91% | 87% | 90% | 100% |
| 2026-06-07 20:32 | codellama:13b (ollama) | 80% | 80% | 70% | 100% |
| 2026-06-07 21:36 | gemma2:9b (ollama) | 91% | 87% | 90% | 100% |
| 2026-06-07 21:40 | qwen2.5:7b (ollama) | 91% | 87% | 90% | 100% |
| 2026-06-07 21:45 | mistral:7b (ollama) | 91% | 87% | 90% | 100% |
| 2026-06-07 21:50 | llama3.1:8b (ollama) | 79% | 87% | 60% | 100% |
| 2026-06-07 21:55 | phi3.5:3.8b (ollama) | 87% | 87% | 90% | 80% |
| 2026-06-07 22:05 | llama3.2:3b (ollama) | 93% | 93% | 90% | 100% |
| 2026-06-07 22:08 | gemma4:latest (ollama) | 56% | 60% | 30% | 100% |
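The run history lists gemma4:latest twice (61% at 17:55, 56% at 22:08), and the leaderboard shows the 22:08 result. That suggests the leaderboard keeps each model's most recent run rather than its best. A minimal sketch of that selection, assuming each run is a `(when, model, overall)` tuple; `latest_per_model` and `leaderboard` are hypothetical helpers, and because the page's tie-break among equal Overall scores is not documented, the sort below simply keeps tied models in first-seen order.

```python
from datetime import datetime

# (when, model, overall %) rows copied from the run history; subset only.
HISTORY = [
    ("2026-06-07 17:55", "gemma4:latest", 61),
    ("2026-06-07 18:50", "phi4:14b", 85),
    ("2026-06-07 21:36", "gemma2:9b", 91),
    ("2026-06-07 22:05", "llama3.2:3b", 93),
    ("2026-06-07 22:08", "gemma4:latest", 56),
]


def latest_per_model(history):
    """Keep only each model's most recent run (assumed leaderboard rule)."""
    latest = {}
    for when, model, overall in history:
        ts = datetime.strptime(when, "%Y-%m-%d %H:%M")
        if model not in latest or ts > latest[model][0]:
            latest[model] = (ts, overall)
    return latest


def leaderboard(history):
    """Rank models by the Overall score of their latest run, highest first."""
    latest = latest_per_model(history)
    # sorted() is stable even with reverse=True, so ties keep first-seen
    # order; the real page's tie-break rule is unknown.
    return sorted(((model, score) for model, (_, score) in latest.items()),
                  key=lambda item: item[1], reverse=True)


for rank, (model, overall) in enumerate(leaderboard(HISTORY), start=1):
    print(rank, model, f"{overall}%")
```

With these rows, gemma4:latest ranks last at 56%, matching the leaderboard rather than its earlier 61% run.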
No per-prompt data.
| Model | Short TPS | Medium TPS | Long TPS | Short TTFT |
|---|---|---|---|---|
| llama3.2:3b (ollama) | 43.9 | 38.8 | 37.8 | 540.3 ms |
| phi3.5:3.8b (ollama) | 31.5 | 24.5 | 22.7 | 1121.3 ms |
| gemma4:latest (ollama) | 26.7 | 26.2 | 26.3 | 12526.3 ms |
| qwen2.5-coder:7b (ollama) | 23.3 | 21.9 | 22.1 | 445.7 ms |
| qwen2.5:7b (ollama) | 22.8 | 22.4 | 22.3 | 443.7 ms |
| mistral:7b (ollama) | 22.7 | 21.8 | 21.9 | 332.0 ms |
| deepseek-r1:7b (ollama) | 20.0 | 19.6 | 19.5 | 4944.3 ms |
| gemma2:9b (ollama) | 18.9 | 17.5 | 17.3 | 444.7 ms |
| llama3.1:8b (ollama) | 18.7 | 16.8 | 16.5 | 806.7 ms |
| phi4:14b (ollama) | 9.9 | 9.3 | 9.1 | 968.0 ms |
| deepseek-r1:14b (ollama) | 8.6 | 8.2 | 6.0 | 12322.7 ms |
| codellama:13b (ollama) | 0.3 | 0.0 | 0.0 | 4630.7 ms |
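The page does not document how TPS and TTFT were measured. For reference, Ollama's final `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), and Ollama's API docs compute generation speed as `eval_count / eval_duration * 10^9` tokens/s. The sketch below applies that formula to an already-parsed response. The `approx_ttft_ms` helper is an assumption for illustration only: a rough server-side estimate (model load plus prompt evaluation) that ignores network and client overhead, not the benchmark's method. The sample values are invented.

```python
NS_PER_S = 1_000_000_000   # Ollama reports durations in nanoseconds
NS_PER_MS = 1_000_000


def tokens_per_second(resp: dict) -> float:
    """Generation speed: eval_count / eval_duration * 1e9 (Ollama's formula)."""
    # Multiplying before dividing keeps integer inputs exact for longer.
    return resp["eval_count"] * NS_PER_S / resp["eval_duration"]


def approx_ttft_ms(resp: dict) -> float:
    """Rough server-side time to first token: load + prompt evaluation.

    Hypothetical approximation; a client-side TTFT measured on the first
    streamed chunk would also include network and scheduling delay.
    """
    load = resp.get("load_duration", 0)
    prompt = resp.get("prompt_eval_duration", 0)
    return (load + prompt) / NS_PER_MS


# Hypothetical final response object (values invented for illustration).
sample = {
    "eval_count": 200,
    "eval_duration": 5_000_000_000,       # 5 s spent generating
    "load_duration": 100_000_000,         # 0.1 s loading the model
    "prompt_eval_duration": 400_000_000,  # 0.4 s processing the prompt
}

print(f"{tokens_per_second(sample):.1f} tok/s")  # 40.0 tok/s
print(f"{approx_ttft_ms(sample):.1f} ms")        # 500.0 ms
```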
No domain results. Run bench-domain to generate data.