engracio, your post is a little bit, err, not straightforward. I'm not sure if you have questions, but I'll try to write some answers.
Stage 1 needs less memory than stage 2. Knowing this, p95 computes the bounds that are optimal for factoring throughput, given the maximum memory available (and some other stuff).
This is what happens if you put Pfactor=... in the worktodo.ini.
Of course, if you have 2 computers, one with less mem than the other, it would make sense to run stage 1 on the one with less and stage 2 on the other one.
Running tests with your own chosen bounds (or only one stage) means putting Pminus1=... into the worktodo.ini.
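If it helps to see what the two stages and their bounds actually do, here is a toy sketch of textbook P-1 on a small Mersenne number. It is not p95's code, and the names (stage1, stage2, B1, B2, table) are only my own for illustration. The point is that stage 1 carries a single residue around, while stage 2 keeps a table of extra residues, and in a real run each of those is as large as the number being factored.

```python
from math import gcd

def primes_up_to(n):
    """All primes <= n by trial division (fine for toy bounds)."""
    return [q for q in range(2, n + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]

def stage1(p, B1, base=3):
    """Stage 1: raise base to 2*p times every prime power <= B1, mod 2^p - 1."""
    N = 2**p - 1
    x = pow(base, 2 * p, N)      # factors of 2^p - 1 have the form 2*k*p + 1
    for q in primes_up_to(B1):
        qk = q
        while qk * q <= B1:      # largest power of q that is still <= B1
            qk *= q
        x = pow(x, qk, N)
    return x, gcd(x - 1, N)      # only one residue (x) had to be kept

def stage2(p, x, B1, B2):
    """Stage 2: allow one extra prime q with B1 < q <= B2."""
    N = 2**p - 1
    qs = [q for q in primes_up_to(B2) if q > B1]
    if not qs:
        return 1
    gaps = [b - a for a, b in zip(qs, qs[1:])]
    # Table of x^gap: these extra residues are what costs memory.
    table = {d: pow(x, d, N) for d in set(gaps)}
    xq = pow(x, qs[0], N)
    acc = (xq - 1) % N
    for d in gaps:
        xq = xq * table[d] % N   # step from x^q to x^(next prime)
        acc = acc * (xq - 1) % N
    return gcd(acc, N)

x, g1 = stage1(29, B1=3)
g2 = stage2(29, x, B1=3, B2=20)
print(g1, g2)   # 1 1103
```

With these toy bounds stage 1 alone finds nothing (gcd 1), and stage 2 picks up the factor 1103 of 2^29 - 1, because 1103 - 1 = 2 * 19 * 29 and 19 lies between B1 and B2. In this sketch the table has only two entries. As far as I understand it, a real stage 2 gets faster the more such residues it can hold, which is why the available memory matters for choosing the bounds.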
You have a dual processor, right? The first idea would be to set up two instances, one with less mem running stage 1, and one with more running stage 2... you get the point.
The problem is that both instances share the same memory. I tried it out, at least with hyperthreading: memory access was the bottleneck and slowed the two instances down to the speed of one.
As p95 does not have genuine support for multiprocessor systems, it makes sense to optimize on your own: run one instance of p95, and one instance of something else which doesn't need so much access to mem, like the sieving client. (That's what I remember.) Or perhaps even PRP. This you have to try out.
It's a little bit like mixing 1 liter of alcohol with 1 liter of water: you get only about 1.9 liters of the mix, not 2.
You will find more information on www.mersenneforum.org
Good luck, H.