
Possible performance improvement for child processes?

Jun 3, 2016 at 5:10 PM

I wonder whether there is room for a performance improvement when the "cover_children" option is enabled. For specific reasons, my software creates a child process for some operations, such as "start remote mode", and the same set of dependent DLLs is loaded when the new child process starts.

I have successfully built the OpenCppCoverage source code, and while debugging it I found that when a new process starts, OpenCppCoverage sets a breakpoint on each line of the dependent DLLs. When a breakpoint is hit, the line is marked as executed and the breakpoint is removed.

This explains why a program usually starts slowly when run under OpenCppCoverage: many breakpoints are still in place. After that, the program runs faster, because once a line is hit its breakpoint is removed, and only code that has not been hit yet keeps its breakpoints.
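The one-shot breakpoint scheme described above can be modeled in a few lines. This is a minimal sketch, not OpenCppCoverage's code: the class `OneShotCoverage` and its methods are hypothetical, and instead of writing an INT3 (0xCC) byte into the debuggee it only tracks which addresses still carry a breakpoint. It shows why the cost is concentrated at startup: each address traps at most once, however often its line runs afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_map>

// Hypothetical model of one-shot coverage breakpoints (not OpenCppCoverage code).
// A real debugger patches an INT3 byte at each address; here we only record
// which addresses still carry a breakpoint.
class OneShotCoverage {
public:
    // Install a breakpoint for a source line at the given address.
    void AddLine(std::uint64_t address, int line) {
        assert(lineByAddress_.count(address) == 0 && "one line per address in this model");
        lineByAddress_[address] = line;
        pending_.insert(address);
    }

    // Debug-event handler: returns true if the trap came from one of our breakpoints.
    bool OnBreakpoint(std::uint64_t address) {
        auto it = pending_.find(address);
        if (it == pending_.end()) {
            return false;  // already removed: in a real process this address no longer traps
        }
        pending_.erase(it);                            // remove the breakpoint for good
        executed_.insert(lineByAddress_.at(address));  // mark the line as executed
        ++debugEvents_;
        return true;
    }

    std::size_t PendingCount() const { return pending_.size(); }
    std::size_t DebugEvents() const { return debugEvents_; }
    bool IsExecuted(int line) const { return executed_.count(line) != 0; }

private:
    std::unordered_map<std::uint64_t, int> lineByAddress_;
    std::set<std::uint64_t> pending_;  // addresses that still trap
    std::set<int> executed_;           // lines covered so far
    std::size_t debugEvents_ = 0;      // traps actually handled
};
```

For example, after registering three lines, calling `OnBreakpoint` a thousand times on the first address and once on the second gives `DebugEvents() == 2` and `PendingCount() == 1`: only the third line still has a breakpoint.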

So when a child process starts, performance degrades in several ways:
a. While debugging, I also found that OpenCppCoverage sets breakpoints again on every line of the same dependent DLLs, so the child process still starts slowly. I think this is unnecessary: when the parent process started, some code was already hit during startup. That "startup code" has already been marked as executed, so breakpoints should not be set on it again when a new process starts.

b. The breakpoint addresses are computed again from the PDB. I feel this may not be necessary either. In theory, the breakpoint address for each line of the same DLL could be calculated as "AddressValueInParentProcess - ParentProcessBaseOfImage + ChildProcessBaseOfImage". That way, we could save the time spent reading the PDBs of the related DLLs to compute the breakpoint addresses.
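Idea b amounts to translating a relative virtual address (RVA) from one image base to another. A minimal sketch, assuming both processes load exactly the same DLL file; `RebaseAddress` is a hypothetical helper, and in a real debugger the bases would come from the debug events (for example `LOAD_DLL_DEBUG_INFO::lpBaseOfDll`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: move an address computed for a module in the parent
// process to the same module loaded in a child process.
// Only valid if both processes load the same binary, so the offset of each
// line inside the module (its RVA) is identical.
std::uint64_t RebaseAddress(std::uint64_t addressInParent,
                            std::uint64_t parentImageBase,
                            std::uint64_t childImageBase) {
    assert(addressInParent >= parentImageBase && "address must lie inside the parent module");
    const std::uint64_t rva = addressInParent - parentImageBase;  // offset inside the module
    return childImageBase + rva;                                  // same offset, new base
}
```

When the DLL happens to be loaded at the same base in both processes, the result is simply the parent address.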

Anyway, these are just my personal ideas, and I am not sure whether my understanding of how OpenCppCoverage works is correct. My program takes a long time to launch, and I don't want the child process to take the same amount of time to start when run under OpenCppCoverage.

Jun 7, 2016 at 12:40 PM

You did a very good analysis of the problem.
When I optimized the memory usage for "cover_children", I noticed that SymLoadModuleEx takes a lot of time, and setting the breakpoints does not really.

As you are able to compile the code, I will try to investigate this weekend whether there is a quick fix using a cache, and if so, I will send it to you as a patch file.

Jun 11, 2016 at 12:36 PM

Thank you very much for looking into this problem.

Actually, I have tried changing some code on my local computer to optimize the child process case. So far it works well. My rough ideas are:
  1. If the same DLL is loaded in a child process, we get the breakpoint addresses from a cache instead of reading them from the PDB. While debugging, I found that the only difference between the breakpoint addresses in the child and parent processes is the process handle.
  2. If a breakpoint in the parent process is hit, we remove its address from the cache in the function OnBreakPoint(...), so that when a child process starts later, no breakpoint is set again on code that has already been hit.
  3. If a breakpoint in a child process is hit, we also remove its address from the cache, because more child processes may start later. As long as the code has been hit, in whatever process, no breakpoint should be set for future child processes. (My strange program creates several child processes, which may execute the same code...)
I cannot understand all of the OpenCppCoverage code, so I just modified it based on my own understanding. I also see some restrictions you set in the code. For example, I cannot use a map whose key and value are both of class Address: Address can be the key, but not the value. I wanted a map<Address, Address> mapping a child process breakpoint address to the parent one, to implement idea #3 :) As a workaround, I am now using a map<Address, pair<Handle, value> >.
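Ideas 1 to 3 can be combined without a map<Address, Address> by keying the cache on module path and RVA instead of absolute addresses, so entries depend neither on the process handle nor on the load address. This is a hypothetical design (`BreakpointCache` and its methods are illustrative names, not OpenCppCoverage classes), and it assumes a module path identifies the same binary in every process.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical cache shared by every debugged process (parent and children).
// Keyed by module path + RVA, so it is independent of process handle and image base.
class BreakpointCache {
public:
    // True if the module was seen before, i.e. its PDB does not need to be read again.
    bool Contains(const std::string& modulePath) const {
        return rvasByModule_.count(modulePath) != 0;
    }

    // Store the line RVAs read from the PDB the first time a module is loaded.
    void Insert(const std::string& modulePath, const std::vector<std::uint64_t>& rvas) {
        rvasByModule_[modulePath].insert(rvas.begin(), rvas.end());
    }

    // Absolute addresses that still need a breakpoint in a newly started process:
    // only lines that no process has hit yet (ideas 1 and 2).
    std::vector<std::uint64_t> AddressesToBreak(const std::string& modulePath,
                                                std::uint64_t imageBase) const {
        std::vector<std::uint64_t> result;
        auto it = rvasByModule_.find(modulePath);
        if (it == rvasByModule_.end()) return result;
        for (std::uint64_t rva : it->second) result.push_back(imageBase + rva);
        return result;
    }

    // Called from the breakpoint handler of any process (idea 3):
    // the line is covered, so future processes must not break there again.
    void OnBreakpointHit(const std::string& modulePath, std::uint64_t address,
                         std::uint64_t imageBase) {
        assert(address >= imageBase && "address must lie inside the module");
        auto it = rvasByModule_.find(modulePath);
        if (it != rvasByModule_.end()) it->second.erase(address - imageBase);
    }

private:
    std::map<std::string, std::set<std::uint64_t>> rvasByModule_;  // lines not hit yet
};
```

Because a hit in any process erases the RVA, each line traps at most once across the whole process tree. A real implementation would probably also compare the module timestamp or checksum before trusting a cached entry.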

I would be glad to receive your patch file if possible. I will compare yours with my own ugly implementation to better understand your program.

Jun 11, 2016 at 2:38 PM

You can find the code in the branch SymLoadModuleCacheTest.
This is hack code to test the solution, so your code is probably cleaner :). A correct implementation would require much more work.
Note that I only cache the reading from the PDB file.
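Caching only the PDB read is plain per-module memoization. A minimal sketch, assuming a loader callback stands in for the real `SymLoadModuleEx` call and line enumeration; `SymbolReadCache` and `LineInfo` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one source line and its RVA inside the module.
struct LineInfo {
    std::string file;
    int line;
    std::uint64_t rva;
};

// Memoizes the expensive per-module symbol read. The loader is a stand-in for
// the real work (SymLoadModuleEx plus line enumeration) and runs once per path.
class SymbolReadCache {
public:
    using Loader = std::function<std::vector<LineInfo>(const std::string&)>;

    explicit SymbolReadCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ && "a loader is required");
    }

    const std::vector<LineInfo>& Get(const std::string& modulePath) {
        auto it = cache_.find(modulePath);
        if (it == cache_.end()) {
            // First time this module is seen: pay for the PDB read.
            it = cache_.emplace(modulePath, loader_(modulePath)).first;
        }
        return it->second;  // later loads (e.g. in child processes) reuse the result
    }

private:
    Loader loader_;
    std::map<std::string, std::vector<LineInfo>> cache_;  // references stay valid on insert
};
```

The loader then runs once per module path, however many child processes load that module; the breakpoints themselves still have to be written into each new process, which this cache does not avoid.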

I did not notice a big performance improvement, even when launching 100 child processes.

I have the feeling that the bottleneck with "cover_children" is the console output.
I tried without any filter (no "--sources" or "--modules"), and the code is 4 times faster when I add the "--quiet" option.

Do you have any measurements of the performance improvement with your code?
Can you compare with "--quiet"?

Hope that helps,

Jun 12, 2016 at 6:50 AM

Thanks for the reply. I am a Git newbie, and I think I need more time to figure out how to get the code from a branch...

Your finding about the console output is valuable. I tried it today; although the result shows it is not strongly related to the performance problem on my side, it reminds me that performance measurement should be done more carefully and should not rely only on suspicion. We need more accurate data to identify where the real bottleneck is.

I simply measured the performance with the stopwatch on my iPhone :) On my side, there is not much difference with or without "--quiet". With the released 0.9.51, it takes about 5 minutes for the parent process of my program to finish starting, and also about 5 minutes for the child process. When quiet mode is enabled, the parent process takes about 4 minutes 50 seconds, and the child process also takes about 4 minutes 50 seconds.

I find performance tuning is not an easy job for me. During more testing today, I found that my code does not seem to work in some complex cases. I hit an invalid breakpoint address error, which implies that my algorithm for computing the breakpoint addresses in the child process has bugs. I will debug more.

Jun 13, 2016 at 7:18 PM

To merge the code, start Git Bash and run git pull origin SymLoadModuleCacheTest.
If you have a conflict, run git mergetool (you probably need to configure your merge tool too).

I do not have a real-world application to measure the performance with --cover_children, and my measurement was on a toy application (just running a process several times).
To measure performance, I suggest using the profiling tool inside Visual Studio, which is great and easy to use on your application (in the Debug menu, choose Start Diagnostic Tools Without Debugging..., then select CPU Usage).

Hope that helps,

Marked as answer by OpenCppCoverage on 8/20/2016 at 8:17 AM