https://github.com/DOMjudge/domjudge/pull/444 contains an unpolished prototype implementation (which currently also throws an exception at the end). It brings judging time down from 165s to 65s.

Tobias Werth <tobias.werth@gmail.com> wrote on Wed., Oct 31, 2018 at 19:59:
Hi,

Since I have noticed that we perform very poorly on problems with many test cases, I have looked into which parts cause how much of the slowdown (compared to verifyproblem, it's 50x for the insane WF problem).
The least invasive improvement we can make is some changes to the judgehost API.
I would like to make those changes before the next release, i.e. before NWERC, so that we don't (have to) change the API again in the release after that.

Currently the API requests for one judging look like:

API request POST judgehosts/next-judging/bar-0
Judging submission s92 (endpoint default) (t1/p9/cpp), id j257...
API request GET config data=name=script_timelimit
API request GET config data=name=script_memory_limit
API request GET config data=name=script_filesize_limit
API request GET config data=name=process_limit
API request GET config data=name=output_storage_limit
^--- the requests above could already be grouped together today, but that doesn't bring a big win (I'll do it anyway)

Working directory: /home/sitowert/domjudge/output/judgings/bar-0/endpoint-default/c2-s92-j257
API request GET contests/2/submissions/92/source-code
API request PUT judgehosts/update-judging/bar-0/257 data=compile_success=1&output_compile=a25pZ2h0REsuY2M6IEluIGZ1bmN0aW9uICdpbnQgbWFpbigpJzoKa25pZ2h0REsuY2M...
executing chroot script: 'chroot-startstop.sh start'
API request GET config data=name=timelimit_overshoot
^--- could also be grouped with the config requests above (see the sketch after this log)

API request GET testcases/next-to-judge/257
Running testcase 1...
API request POST judgehosts/add-judging-run/bar-0/257 data=testcaseid=94&runresult=correct&runtime=0.001&output_run=MTIK&output_error=&output_system=Q29ycmVjdC...
Testcase 1 done, result: correct
API request GET testcases/next-to-judge/257
Running testcase 2...
API request POST judgehosts/add-judging-run/bar-0/257 data=testcaseid=95&runresult=correct&runtime=0.001&output_run=Ngo%3D&output_error=&output_system=Q29ycmVj...
Testcase 2 done, result: correct
API request GET testcases/next-to-judge/257
Running testcase 3...
API request GET testcases/95/file/input
API request GET testcases/95/file/output
^--- this is only done for unknown/changed testdata
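
As a rough sketch of what the grouped config fetch could look like on the judgehost side (in Python for illustration; the base URL, endpoint shape, and comma-separated name list are my assumptions based on the flow further below, not a final API):

    import requests

    API_URL = "http://localhost/domjudge/api"  # hypothetical base URL
    CONFIG_NAMES = [
        "script_timelimit", "script_memory_limit", "script_filesize_limit",
        "process_limit", "output_storage_limit", "timelimit_overshoot",
    ]

    def fetch_config(session):
        # One GET for all config values instead of one GET per name.
        resp = session.get(API_URL + "/config",
                           params={"name": ",".join(CONFIG_NAMES)})
        resp.raise_for_status()
        return resp.json()

Usage would simply be config = fetch_config(requests.Session()) once at the start of a judging.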

-----------------------------------------------------------------------------------------------------------

In the future, I would like the flow to look like this instead:
API request POST judgehosts/next-judging/bar-0
^--- this will additionally return all the md5sums for all inputs/outputs of the problem

Judging submission s92 (endpoint default) (t1/p9/cpp), id j257...
API request GET config data=name=script_timelimit,script_memory_limit,script_filesize_limit...timelimit_overshoot
Working directory: /home/sitowert/domjudge/output/judgings/bar-0/endpoint-default/c2-s92-j257
API request GET contests/2/submissions/92/source-code
API request PUT judgehosts/update-judging/bar-0/257 data=compile_success=1&output_compile=a25pZ2h0REsuY2M6IEluIGZ1bmN0aW9uICdpbnQgbWFpbigpJzoKa25pZ2h0REsuY2M...
executing chroot script: 'chroot-startstop.sh start'
API request GET testcases/95/file/input
API request GET testcases/95/file/output
^--- request all unknown/changed testdata up front (see the caching sketch below)
Running testcase 1...
Testcase 1 done, result: correct
Running testcase 2...
Testcase 2 done, result: correct
Running testcase 3...
Testcase 3 done, result: wrong-answer
API request POST judgehosts/add-judging-run/bar-0/257 data=TBD
^--- this is actually the result of three testcases
API request GET testcases/next-to-judge/257
^--- this checks whether we should continue judging
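
Here is a sketch of the judgehost-side caching that the md5sums enable, assuming next-judging returns a map like {testcase_id: {"input": md5, "output": md5}}; the field name, cache layout, and response handling are illustrative only (the real file endpoint may e.g. base64-encode its payload):

    import hashlib
    import os
    import requests

    API_URL = "http://localhost/domjudge/api"  # hypothetical, as above
    CACHE_DIR = "/tmp/testcase-cache"          # hypothetical local cache dir

    def md5_of(path):
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

    def fetch_missing_testdata(session, testcase_md5sums):
        # testcase_md5sums is the assumed extra field in the next-judging
        # response: {testcase_id: {"input": md5, "output": md5}, ...}
        os.makedirs(CACHE_DIR, exist_ok=True)
        for tc_id, sums in testcase_md5sums.items():
            for kind in ("input", "output"):
                path = os.path.join(CACHE_DIR, "%s.%s" % (tc_id, kind))
                if os.path.exists(path) and md5_of(path) == sums[kind]:
                    continue  # cached copy is still valid, no API call needed
                resp = session.get("%s/testcases/%s/file/%s"
                                   % (API_URL, tc_id, kind))
                resp.raise_for_status()
                with open(path, "wb") as f:
                    f.write(resp.content)  # decoding details omitted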

-----------------------------------------------------------------------------------------------------------

The main idea is to avoid the next-to-judge calls for as long as possible and to group the add-judging-run calls.
The next-to-judge calls are not necessary as long as all previous test cases are correct. For the most commonly used judging model (fail on first error) this is true for at least N-1 of N test cases. 
Now we still want to ping back from time to time to 
a) signal progress, and
b) avoid posting too much data to the database at once.

So as reasonable defaults I could imagine posting back at least every 10s, and whenever we have accumulated more than twice the output limit; both should be configurable.

For the actual grouping/batching I can see two options: either extend the single POST add-judging-run call to accept multiple results, or add a generic mechanism to do batched requests across the API.
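
To make the first option and the flush policy concrete, a minimal sketch; the batch payload shape and the run dict fields are made up, only the 10s interval and the twice-the-output-limit threshold come from the defaults suggested above:

    import json
    import time

    FLUSH_INTERVAL = 10.0           # seconds; the suggested default above
    OUTPUT_FLUSH_LIMIT = 2 * 50000  # twice an assumed output limit, in bytes

    class RunBuffer:
        """Buffers per-testcase results and posts them in one batched call."""

        def __init__(self, session, url):
            self.session = session
            self.url = url  # e.g. .../judgehosts/add-judging-run/bar-0/257
            self.runs = []
            self.buffered_bytes = 0
            self.last_flush = time.time()

        def add(self, run):
            # run is one result dict, e.g.
            # {"testcaseid": 94, "runresult": "correct", "runtime": 0.001, ...}
            self.runs.append(run)
            self.buffered_bytes += len(run.get("output_run", ""))
            if (time.time() - self.last_flush >= FLUSH_INTERVAL
                    or self.buffered_bytes >= OUTPUT_FLUSH_LIMIT):
                self.flush()

        def flush(self):
            if self.runs:
                # One POST carries all buffered results instead of one per
                # testcase.
                resp = self.session.post(self.url,
                                         data={"batch": json.dumps(self.runs)})
                resp.raise_for_status()
            self.runs = []
            self.buffered_bytes = 0
            self.last_flush = time.time()

The daemon would also call flush() once more before the final next-to-judge check, so the last partial batch is never lost.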

For a problem with N test cases, of which M are unknown to the judgehost, this approach brings us down
from 9 + N*2 + M*2 API calls
to 4 + M*2 + X/10 + 2 API calls (<-- in the best case of a fully correct submission, with X seconds of total judging time and posting back every 10s)
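
To make that concrete with made-up numbers: for an accepted run with N = 50 test cases, all test data already cached (M = 0), and X = 20 seconds of judging time, the current flow needs 9 + 50*2 = 109 API calls, while the proposed flow needs 4 + 0 + 20/10 + 2 = 8.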

Any objections or thoughts?

Tobi