
View Full Version : Altivec Enhancements



Scotttheking
04-05-2002, 02:14 AM
Now that Team MacNN is moving into DF, would it be possible to get some Altivec enhancements for the client?
We like seeing our processors fully used :)

Thx,

Scott

Shaktai
04-05-2002, 02:33 AM
Originally posted by Scotttheking
Now that Team MacNN is moving into DF, would it be possible to get some Altivec enhancements for the client?
We like seeing our processors fully used :)

Thx,

Scott

Now there is an idea, if it is doable. I would love to give my G4 a crack at it with Altivec.

Brian the Fist
04-05-2002, 11:49 AM
We have had a number of requests for Altivec enhancements; however, we are currently unable to perform such enhancements. We do not even have an Altivec machine to test on, let alone the knowledge to use it. Also, I suspect that our algorithm would not benefit that much, because the majority of its time is spent doing pointer traversals, not math.

Can someone knowledgeable about Altivec tell me exactly what sorts of operations it is good at optimizing (and don't say 'everything' :p)? We may be able to get a 3rd party involved if there is enough interest and if it appears to be worth the time it would take.
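Roughly: SIMD units like AltiVec speed up loops that apply the same arithmetic to many independent elements, and do nothing for chains of dependent pointer loads. A generic C sketch of the two loop shapes (illustrative only, not the project's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* SIMD-friendly: the same multiply-add is applied to every element
 * and no iteration depends on another, so a vector unit (AltiVec,
 * SSE) can process four floats per instruction. */
static void scale_and_shift(float *x, int n, float a, float b)
{
    for (int i = 0; i < n; i++)
        x[i] = a * x[i] + b;
}

/* SIMD-hostile: each load depends on the result of the previous
 * one, so the CPU (vector unit or not) must wait for memory at
 * every step of the chain. */
struct node { struct node *next; int value; };

static int chase(const struct node *p)
{
    int sum = 0;
    while (p) {
        sum += p->value;
        p = p->next;
    }
    return sum;
}
```

If the profile really is dominated by the second shape, a vector unit has nothing to grab onto.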

Shaktai
04-05-2002, 03:36 PM
Originally posted by Brian the Fist
We have had a number of requests for Altivec enhancements; however, we are currently unable to perform such enhancements. We do not even have an Altivec machine to test on, let alone the knowledge to use it. Also, I suspect that our algorithm would not benefit that much, because the majority of its time is spent doing pointer traversals, not math.

Can someone knowledgeable about Altivec tell me exactly what sorts of operations it is good at optimizing (and don't say 'everything' :p)? We may be able to get a 3rd party involved if there is enough interest and if it appears to be worth the time it would take.

It is my understanding that it primarily accelerates floating point and vector calculations, but I am not an authority on it. Operations that can take advantage of it can see performance increases anywhere from 50% to 400% on the PowerPC G4 chips. The Altivec.org (http://www.altivec.org) website provides information that you may find helpful in determining whether your process can benefit from it.

If you find that your process can benefit from it, you might approach Apple Computer for additional help with how to implement it, or possibly even the loan of a "development/testing" machine. After all, they like to demonstrate how Altivec can benefit many different processes.

Scotttheking
04-05-2002, 04:24 PM
The best place for info / help is the Apple developer mailing lists.

rayson_ho
04-05-2002, 10:18 PM
If you can tell us where the program spends most of its time, and what it does, then we may be able to determine whether we can use AltiVec or not.

Also, not only AltiVec, but there are other kinds of SIMD (Single Instruction Stream, Multiple Data Streams) implementations, like MMX, SSE, or 3DNow!. Each CPU vendor has its own implementation of SIMD instructions.

Rayson

wheeles
04-06-2002, 07:36 AM
I have a dual-CPU PowerMac G4 that I am using to run the client.

Any Altivec enhancements would be appreciated. Any speed improvements to the client of this nature would result in more Mac people lending their cpu power to this project.

wheeles
04-08-2002, 10:48 AM
Not sure if you guys have seen the following article on O'Reilly but it gives a bit of info about Altivec and how to code for it.

http://www.oreillynet.com/pub/a/mac/2002/04/05/altivec.html

Hopefully this will come in useful.

:cheers:

Marc2211
04-08-2002, 12:51 PM
I'm also using a DP G4 PowerMac... Altivec would be very much welcomed, and would, I'm sure, attract more G4/Mac users to the project...

Marc

SkiBikeSki
04-09-2002, 02:45 PM
I think everyone could benefit from reading this article by O'Reilly about AltiVec.

http://www.oreillynet.com/pub/a/mac/2002/04/05/altivec.html

Shaktai
04-09-2002, 03:52 PM
Howard,

Hope some of the references that have been given are helpful. If you need more, let us know. Of course the question remains: will there be enough G4 users to make the effort worthwhile? I think you will find the answer to be yes. RC5 is nearing its end, and has been a haven for Mac G4 users because of its Altivec enhancements. You already have a large number of G4s working on this project, and can expect more. Not to mention the fact that the Mac community will get the word out for you if you are able to provide Altivec enhancements to your clients. You could soon have a few thousand Mac G4s crunching for you, and a good many of those folks will bring other computers with them as well. (I myself have 1 G4, 1 G3, 1 Celeron and 1 Athlon all crunching on this project.) In the end though, only you and your team can decide if it is doable and worth the effort. Please keep us advised.

Brian the Fist
04-09-2002, 09:48 PM
I may have someone who is willing to look into it for me (a Mac user of course :D ) Still not sure that it could really benefit from the Altivec though as the bottleneck is not in floating point operations. But only one way to find out I guess. Probably couldn't start doing it until the summer though (i.e. May) as it would probably be a summer student doing it. We'll post a notice if/when we begin such a port.

Shaktai
04-09-2002, 10:01 PM
Thanks for keeping an open mind. :)

Scotttheking
04-09-2002, 10:17 PM
Originally posted by Brian the Fist
I may have someone who is willing to look into it for me (a Mac user of course :D ) Still not sure that it could really benefit from the Altivec though as the bottleneck is not in floating point operations. But only one way to find out I guess. Probably couldn't start doing it until the summer though (i.e. May) as it would probably be a summer student doing it. We'll post a notice if/when we begin such a port.

Thanks for looking into it.

Can you give us an idea what you think the bottleneck is?
Thx,

Scott

Shaktai
04-30-2002, 05:48 PM
Just as a heads up. Vijay Pande at Folding@Home has announced that they are working on a new core that will support Altivec (and I presume some other vectorization technologies). It would be great if a way could be found to do this for dFold as well.

The Thread is http://forum.folding-community.org/viewtopic.php?t=102

Scotttheking
05-11-2002, 06:59 AM
I know you said you'd post, but I'm impatient :)

Any news :D ?

Jodie
05-11-2002, 09:13 AM
Bottom line is, if you can't take advantage of 128-bit register math through vector math or other intense floating point routines, then the Altivec optimizations are a waste of time. The same is true for SSE/SSE2, 3DNow!, and the rest of the floating-point-optimized SIMD instruction sets.

SIMD isn't a magic bullet guys. Apple may have sold you on that being the case - but it's not.

In fact, very few real-world tasks are really sped up all that much by it.

Look at how dog slow the P4 is in reality. If you're encoding video, you're in floating-point land, and it screams. If you're doing a 500k-cell cut and paste in Excel, then it's going to come down to memory bandwidth.

Our team's off-the-cuff profiling has shown so far that it's all in the memory bandwidth for this project - and processor cache. That jibes with it being pointer math.

So for this project, the USparc 3s, Alphas and, to a lesser extent, higher-end Xeons should be the prom queens.

Paratima
05-11-2002, 12:09 PM
Originally posted by Jodie
So for this project, the USparc 3's, Alpha's and to a lesser extent, higher-end Xeons should be the prom queens.

That's an interesting premise that I've read hereabouts before. Anyone have actual performance numbers on those machines?

Scotttheking
05-11-2002, 07:35 PM
Jodie, I know you know more about this stuff than most of the rest of us, so could you explain a bit more?
Would it then make more sense to code prefetching into the app (can that even happen?) or something?

Also, would this mean that the new G4's L3 cache will provide a performance boost?

BTW, I'd still like to see an evaluation of the code and its Altivec potential from someone looking at the code. I never said Altivec would be faster, but I'd like to see if it can be.

Thx,

Scott

Brian the Fist
05-12-2002, 04:04 PM
MHz for MHz, the Alpha is definitely THE fastest processor for distributed folding. The majority of the program's time, as I have mentioned before, is spent traversing pointers (specifically in binary-tree-like data structures). This accounts for 50% or more of its time. Another good but smaller chunk is spent RLE-decompressing the data in protein.trj, the protein data file. The expanddb utility that originally came with foldtrajlite uncompressed protein.trj, but we found this made things slower, not faster, probably due to increased loading from disk.
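For illustration, a generic byte-level RLE decoder (not the actual protein.trj format) shows why this stage also resists vectorization: each run's output position depends on the lengths of all previous runs, so the iterations can't proceed in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Decode (count, byte) pairs from in[] into out[]; returns the
 * number of bytes written, or -1 if out[] would overflow.  The
 * write position w depends on every preceding count, which is
 * exactly what blocks straightforward SIMD treatment. */
static long rle_decode(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t w = 0;
    for (size_t i = 0; i + 1 < in_len; i += 2) {
        unsigned count = in[i];
        if (w + count > out_cap)
            return -1;
        for (unsigned j = 0; j < count; j++)
            out[w++] = in[i + 1];
    }
    return (long)w;
}
```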

Altivec will be looked at this summer, if all goes well.

Jodie
05-13-2002, 01:31 AM
That makes sense to me. I would expect, in order of performance (based on a suspect model that I have derived from what you just said :D ):

Alpha
USparc3
R10k
P3 Xeon w/2M ->1M
P3 Xeon w/512k
P4 Xeon w/512k
P3 w/512k (they have those in server units like Dell, right?)
P4 Rambus 800
AMD XP
P4 Rambus 400
G3/G4
P3 standard
Celeron

The new G4 should then be faster due to greater cache. Place it probably betwixt the P4 Xeon and P3 Xeon.

Increasing your memory bandwidth should help substantially. So DDR on the AMD would be a priority. (shoot - and here my whole cluster is optimized for G@H performance...)

Vectorization isn't a priority - so Crays, Hitachis and Fujitsus are right out. ;)

If vectorization is out, how much response do you expect to see from Altivec or other SIMD optimization? A perfect parallelization shouldn't see you more than a 30% improvement. Realistically, probably what, 10%? Is it worth the effort?

There's the perception thing here too, I think. If you do Altivec optimization, you're going to have to do at least SSE2 and most likely SSE1 and/or 3DNow!2 optimization... Otherwise, the PC-class users are going to scream that the 5% market distribution of Mac users got their optimization whilst the 90% on WinTel didn't. Sounds like RC all over again.

If it were I, and obviously it's not, I'd optimize for 64-256bit register math (if you're doing binary trees, it's a natural), go with a register compiler where available (can you say Watcom - ZOOOM! :D ) and hold out for the RealComputers like the US3, Sledgehammer (errr, I mean Opteron or whatever the heck the stupid marketing team came up with), Xeon, etc...

But I'm just rambling.

Scott - when I'm a tad less busy, I'll see if I can come up with something to help explain. It's a *really* complicated topic...

:o

Scotttheking
05-13-2002, 06:18 AM
Originally posted by Jodie
Scott - when I'm a tad less busy, I'll see if I can come up with something to help explain. It's a *really* complicated topic...

That's fine.
I can probably understand it in tech talk also, and if something's confusing I have translators :D

Just curious - I forgot about the new G4s :)
Is this the correct speed order?
L3 cache
256 on die L2
1MB backside L2

Also, since it's a lot of memory stuff, are the altivec memory calls I've been reading about useful :D?

(Yeah, I know I'm wanting Altivec a lot, even if it is mainly psychological. There are a lot of people who'd join up just because of that word, even if there's not much benefit. And I want those users :D)

--Scott

eXXile
05-13-2002, 01:09 PM
Originally posted by Jodie

Increasing your memory bandwidth should help substantially. So DDR on the AMD would be a priority. (shoot - and here my whole cluster is optimized for G@H performance...)
:o

How substantial is the difference between SDRAM and DDR?

Jodie
05-15-2002, 12:51 AM
Hmm, I have a P4-400RD, a P4-800, and a P4-333 DDR. Downside is they're all different-speed chips.

So let me scrounge up some 1.6's (I know I have lots of those) and then compare.

Note they will be different chipsets so that could change things a bit.

Should be able to bite into that this weekend.

Jodie
05-15-2002, 12:54 AM
Is there any way to do one set of 5000 and then exit? Or should I write a script to watch progress.txt and time from 5000 to 0 remaining? Not quite as accurate, 'cause my polling will slow things down a bit...

Welnic
05-15-2002, 02:08 AM
For benchmarking I would just run with the -i f switch, which will prevent the client from trying to upload. Then after running for a certain length of time, just remove foldtrajlite.lock and count the number of WUs that were done.

Scotttheking
05-28-2002, 03:15 AM
bump back up.

Jodie, got time for that explanation now?

eXXile
05-28-2002, 01:18 PM
Iwill XP333 with an AXP 1800@1.66 turns out approximately 3400 structures per hour. The XP333 mobo uses an Ali Magik chipset which runs DDR.

Iwill KK266 with an Athlon 1.4@1.65 turns out approximately 3300 structures per hour. The KK266 mobo uses a VIA KT133A chipset which runs SDRAM.

I know there will be some discrepancy about how the Ali Magik chipset isn't the fastest DDR chipset, and that I'm not comparing the same processor. But the numbers are relatively close, so using DDR or SDRAM doesn't make a dramatic difference in structure production.

MAD-ness
05-28-2002, 06:12 PM
You might try a SiS735 based board, it has good (KT266 level but not KT266a level) performance with DDR and it takes either SDRAM or DDR SDRAM.

If no one else steps up and runs one I might be convinced to open the case, dig up some SDRAM somewhere and do some benchmarks. I would prefer to avoid the extra work though. ;)

jamesa
07-14-2002, 12:06 PM
Originally posted by Jodie
If vectorization is out, how much response do you expect to see from Altivec or other SIMD optimization? A perfect parallelization shouldn't see you more than a 30% improvement. Realistically, probably what, 10%? Is it worth the effort?


Jodie,

I'm going to bump this for two reasons:
1. Because I think it's worthwhile, and
2. Because I have something new to add. Apple has within it an Architecture and Performance Group whose job is to look at algorithms and make them run fast on PPC (and Altivec) hardware. I know they exist, but I don't know how to get hold of them directly; however, I can point you in the right direction should you be willing.

I know they exist because a guy whose screensaver is to be included in Jaguar (OS 10.2) just had his code optimised by these people (check it out: http://www.versiontracker.com/moreinfo.fcgi?id=11393&db=mac). What I'd suggest you do is contact him (email address is listed as calumr@mac.com ) and ask him if he can point you in the right direction.

I'd really like to see if they can help - I want to use my clock cycles where they are most effective, and right now, with a few Macs dedicated to the effort but no Altivec optimisations, I don't feel they're really being fully utilised. Surely there's somewhere in your code where it can make a difference ;)

Thanks

-- james

Brian the Fist
07-14-2002, 12:28 PM
To clarify, Jodie is NOT a member of this project team, I am; she is a user (with lots of computers). So I'll assume your comments were directed to me.

We have rigorously optimized our code, almost to the level of assembly code, and know exactly where the bottleneck is. The majority of its time is spent doing 32-bit pointer traversal, something which AltiVec cannot help with. Thus we expect only minimal improvement. Plus, if we looked at the actual number of users running the software, it would make more sense to optimize the Windows version, not the Mac one, if we were to make any CPU-specific optimized versions. However, at 13 platforms currently supported, we have enough versions to maintain to keep our hands full already.

The_Equivocator
07-16-2002, 11:02 AM
Yeah, I looked into this a little while ago with some of the utilities that ship on the OS X developer CD. The utility I used looks at the core function calls that a program uses and tells you the percentage of time that they are being executed. It is a really good way of seeing whether or not a program would benefit from Altivec enhancements.

Unfortunately, it was immediately obvious that Altivec would not be worth implementing in Distributed Folding.

dtsang
08-11-2002, 09:06 AM
Howard,

Are you still looking into G4 optimization? If yes, may I suggest taking a look at this page:
http://developer.apple.com/hardware/ve/performance.html

It allows you to run a diagnostic to see if the distributed folding program can really be optimized.

From the page:

MONster and Shikari are most suitable for applications level or OS level performance measurement. They are a good way of identifying which applications or which functions might benefit from performance tuning and why. As such they are somewhat above the scope of the sorts of optimizations discussed here. These are the sorts of tools you should use first to discover what to vectorize. The actual process of verifying that your optimizations are working as intended however relies much more heavily on trace utilities and simulators like Sim_G4 and Acid.


I hope this helps!

dtsang
08-11-2002, 09:29 AM
Also, I found another example of how Distributed Folding can benefit from AltiVec enhancement. On the Folding@Home website (which, I'm assuming, is running a similar protein folding initiative), they have a page dedicated to their new speed-oriented client. Read it here:

http://folding.stanford.edu/gromacs.html

On the page:

How can Gromacs be that much faster? Gromacs is built for speed. Everything about it has been optimized to be the very fastest MD code on the planet. ... Altivec is supported on Macs. The inner loops are handcoded in assembly. It has algorithms creatively designed for speed. It's an amazing feat. For us to include all of these optimizations into our current scientific code did not seem a judicious use of our programming resources (why reinvent the wheel?) and we instead decided to collaborate with the Gromacs team.

AltiVec enhancement is possible!:rotfl:

Jodie
08-11-2002, 06:43 PM
As has been posted before - it appears that the algorithms involved in DF are very different from those employed by F@H.

dtsang
08-11-2002, 10:04 PM
My boo-boo. I read most of the thread, but not all of it, I guess... :cry:

mikkyo
10-02-2002, 05:52 AM
So I noticed some of the CHUD tools were mentioned but did anyone look at them and do some profiling?
Performance, Debugging, Profiling (http://developer.apple.com/tools/debuggers.html)

Here is an example of the kind of info they can provide...
This is a partial analysis of a sample of 100000000 instructions of the foldtrajlite process running under OS X grabbed with amber and analyzed with acid.

Total Instruction Count = 100000000

------------------------------------------------------------------
Instruction Type | Count | % of Total
------------------------------------------------------------------
Integer 39601893 39.60
Floating Point 5291767 5.29
Altivec 0 0.00
Branch 18743879 18.74
Load 18273513 18.27
Store 9192360 9.19
Cache Control 21 0.00
Data Stream 0 0.00
Miscellaneous 8896567 8.90
------------------------------------------------------------------


Revealing eh? The full report gives much more detail.

Looks to me like you do use floating point a bit, but not as much as integer math.
You could also use Altivec for your pointer tree traversal, if you wanted.
It sure would be nice to see the speed benefits if someone spent some time on this.

Jodie
10-03-2002, 11:31 PM
What's the ratio of SSE/SSE2/3DNow!-enabled users to Altivec-enabled users, again? Oops! It's in the chart. 95%-ish potential SSE/SSE2/3DNow! users... Call everything Win98 and below MMX or worse - 20%... That leaves us with a conservative 75% to 3%. One would hope SSE would be evaluated first...

Brian the Fist
10-04-2002, 10:08 AM
Exactly. Not that I know the first thing about using SSE instructions either... though I THINK the Intel compiler automatically uses them where it sees fit.
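For what it's worth, compilers mostly manage that only when the hot loop is written so iterations are provably independent. A hedged sketch of the loop shape auto-vectorizers like (generic advice, not the project's actual code):

```c
#include <assert.h>

/* Plain C99 that a vectorizing compiler can turn into SSE or
 * AltiVec code on its own: trip count known at the loop head,
 * restrict-qualified pointers so the compiler can prove the
 * arrays don't overlap, and no branches inside the loop. */
static void saxpy(int n, float a,
                  const float *restrict x, float *restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Compiled with something like `gcc -O3` or the Intel compiler at `-O2`, a loop of this shape is typically vectorized without any intrinsics; whether the small floating-point share seen in the profiles makes that worthwhile here is another question.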

runestar
10-04-2002, 03:28 PM
SSE, 3DNow! and other such instruction sets are really oriented towards gamers and heavy-duty graphics modeling, which make heavy use of floating point.

There's a certain work-versus-performance-gained ratio that must be maintained or it doesn't really pay off. As one person noted, floating point operations make up only about 5% of the calculations, so the amount of work Brian (Howard) would have to spend researching and then incorporating it would very likely outweigh the speed benefits.

It's not an unwillingness to incorporate them, but it's like putting high-speed performance tires on the car when all you do is drive around locally... It's really great, the car may run just a tad smoother... but is it really worth the extra cost?

I do think that Brian would be interested in optimizations that would make a significant improvement in the calculations. I wouldn't claim to speak for him, but for me I would want to see at least a 20 to 25% performance increase to justify the extra time spent incorporating it.


As for the Mac community, even though they are a small chunk of the market, they tend to be loyal to Mac-related projects, so just because they are Mac users, they shouldn't necessarily be ruled out for improvements. It's been too long that the WinIntel giant has cast its shadow.


Now as for SDRAM and DDR... remember that these are the maximum potential speeds that data can move at. It's similar to hard drives, in that the max transfer speed is not the speed all data will transfer at, but rather how fast it could potentially travel.

Of course there are a lot more factors, as someone pointed out. The motherboard chipset does play a big role. Some designs just work out better than others, and over time even the same-speed chipset gets new tweaks.

From what I found, it's accepted that DDR is faster than the older SDRAM.

The rule of thumb is: if you are getting a newer system with support for a newer standard, go with the newer-standard parts even if it supports the older standard. Just make sure to match it up with what your board supports; for example, don't put slower or faster RAM in than what the board is rated for. Slower RAM limits you, and faster RAM is wasted since the board can't take advantage of it.


TTFN,

RS½

mikkyo
10-04-2002, 06:11 PM
What's the ratio of SSE/SSE2/3DNOW enabled users to Altivec-enabled users, again? Oops! It's in the chart. 95%-ish SSE/SSE2/3DNow potential users... Call everything Win98 and below as MMX or worse. 20%... That leaves us with a conservative 75% to 3%? One would hope SSE would be evaluated first...

I would hope that someone would turn on GCC 3's optimization flags for the various platforms, build a version geared toward each, pass it off to some users of that CPU and let them test it, get the results back, verify all is working well and check the benchmark times, then release a few CPU-specific clients. After all, it is a freebie, could potentially help a bunch, and might not even require any code changes.

(I'm sure this will stoke the flames)
Regarding users of CPUs, try this logic:
All Apple PPC machines are made by Apple.
Apple makes extensive use of Altivec in MacOS X.
Apple makes sure the compiler that ships with OS X Dev tools works great for CPU optimization and Altivec coding.
Of the 95% of SSE/SSE2/3DNow potential users, there are probably 40+ different motherboards, and 20+ memory controllers.
Testing on Apple's machines requires maybe 14 different machines, but all are *very* similar, including the memory controllers.
Testing on SSE/SSE2/3DNow machines requires hundreds of different machines, some similar some very different, some using SDRAM paths, some of the same machine using DDR paths.
All Apple users you are interested in are running MacOS X(since you aren't coding for OS 9).
All SSE/SSE2/3DNow!-enabled CPU users are running, uh, lessee, 12 different *nix variants, 5 different WinXX variants, 5+ different types of CPUs (I really don't know the exact specifics).

The above is why it is far, far easier and better to support MacOS running on PowerPCs. One client works for millions.
It is more bang for your coding time, less testing, fewer odd bugs you can't track down, less headache.

As for the DF client, using FPU math could be a boost, and you'll get some of that with CPU optimization for free.
Multi-threading would be great, as there are many dual CPUs out there (not just Macs), and would offer some gain.
Using cache instructions might help as well, if you know what needs to stay around and gets repeated; again, some of that you get for free with CPU-specific optimization.
Using vector math would be a huge boost and would make it easier to support SSE/SSE2/3DNow!/Altivec in general.

It all really comes down to whether you want to take the most advantage of the user's CPU, how quickly you want results and the willingness to explore new coding frontiers. Of course, there are lots of folks willing to help out, from coding to testing, so that is a big boon.
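On the multi-threading point, the dual-CPU case can be sketched with POSIX threads: split independent work across two workers and join. This is illustrative only - the squared-array "work" here is a stand-in for independent work units, not the client's actual computation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct job { const int *in; int *out; int start, end; };

/* Each worker squares its own half of the array; the halves
 * don't overlap, so no locking is needed. */
static void *worker(void *arg)
{
    struct job *j = arg;
    for (int i = j->start; i < j->end; i++)
        j->out[i] = j->in[i] * j->in[i];
    return NULL;
}

/* Split the range in two and run one worker per CPU. */
static void run_two_threads(const int *in, int *out, int n)
{
    pthread_t t1, t2;
    struct job a = { in, out, 0, n / 2 };
    struct job b = { in, out, n / 2, n };
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

On a dual G4 (or dual anything) this roughly halves the wall-clock time for work that really is independent.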

Jodie
10-04-2002, 10:00 PM
We do high-end video compression servers. I think I can speak with some authority on the topic of SSE/SSE2/etc.

Quite the contrary to what you posted, SSE's big strength is not in floating point but rather in integer math. MMX had the floating boosts; SSE is a bit more in the float, but was focused on integer calculations. Just in the 3DNow! instruction set, for example, there are 19 additional integer-calculation SIMD instructions.

The compiler can use SSE instructions, but can't do intelligent pipelining for integer calculations.

Add to that cache-hit hinting (which compilers do basically nothing with), and walking a tree gets substantially faster.

SSE2 has 144 new instructions including a substantial number devoted to 128-bit SIMD integer arithmetic. In pure integer math, we see a 15% speed increase with Intel's compilers and a 254% speed increase in integer math with hand optimization over already highly optimized code.

SSE2 wasn't intended for gaming. It was intended for encryption, voice and video compression, financial analysis, and engineering and scientific calculations.

That, as I remember it, is straight from the horse's mouth. I took a class from Intel on SSE/SIMD. I believe the web page describing the intent is at: http://www.intel.com/design/Pentium4/prodbref/index.htm

Aha -

Streaming SIMD Extensions 2 (SSE2) Instructions
With the introduction of SSE2, the Intel NetBurst microarchitecture now extends the SIMD capabilities that MMX technology and SSE technology delivered by adding 144 instructions. These instructions include 128-bit SIMD integer arithmetic and 128-bit SIMD double-precision floating-point operations. These instructions reduce the overall number of instructions required to execute a particular program task and as a result can contribute to an overall performance increase. They accelerate a broad range of applications, including video, speech, and image, photo processing, encryption, financial, engineering and scientific applications.

Data Prefetch Logic
Functionality that anticipates the data needed by an application and pre-loads it into the Advanced Transfer Cache, further increasing processor and application performance.
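To make the 128-bit integer point concrete, here is a minimal sketch of SIMD-style integer addition: four 32-bit adds per operation via SSE2 intrinsics where the compiler supports them, with a plain loop as the portable fallback. It assumes (for simplicity) an array length that is a multiple of 4.

```c
#include <assert.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* c[i] = a[i] + b[i]; n must be a multiple of 4. */
static void add_i32(const int *a, const int *b, int *c, int n)
{
#ifdef __SSE2__
    /* One 128-bit paddd handles four 32-bit adds at once. */
    for (int i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(c + i), _mm_add_epi32(va, vb));
    }
#else
    /* Scalar fallback for non-SSE2 targets. */
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
#endif
}
```

The same shape maps onto AltiVec's `vec_add` on the G4 side; the hard part, as discussed above, is having a hot loop of this shape in the first place.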

Jodie
10-04-2002, 10:11 PM
And I call :bs: on your argument, Mikkyo. 3DNow!, MMX, SSE, SSE2 are entirely motherboard-, memory-architecture-, etc.-independent. They're processor-dependent. 3DNow! is Athlon-only, so toss it out.

Athlons support MMX and SSE.

P4 is the only SSE2 processor other than Server P3. So toss that out.

If you code for MMX it runs on every Athlon and P-II or greater.

So that takes you to seventy-something percent of the total machines out there today.

If you code for SSE it runs on EVERY Athlon Tbird + and every P-3 +

Now you're still over 40% of the machines.

It's also operating-system independent. The same MMX code that compiles under Windows compiles under any *nix that runs on that processor.

By autodetecting your processor, you can dynamically choose MMX, SSE, SSE2, etc.
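The autodetect-and-dispatch idea can be sketched as a function-pointer table: probe the CPU once at startup, then route the hot routine through the best available implementation. The detection function here is a stand-in (real code would use CPUID on x86, or the equivalent feature query on the Mac); everything else is ordinary C.

```c
#include <assert.h>

typedef void (*kernel_fn)(float *, int);

/* Baseline implementation that runs everywhere. */
static void kernel_scalar(float *x, int n)
{
    for (int i = 0; i < n; i++)
        x[i] *= 2.0f;
}

/* An MMX/SSE/AltiVec version would go here; in this sketch it
 * simply delegates to the scalar code. */
static void kernel_simd(float *x, int n)
{
    kernel_scalar(x, n);
}

/* Stand-in for real feature detection (CPUID on x86). */
static int cpu_has_simd(void)
{
    return 0;
}

/* Choose the implementation once; callers just use the pointer. */
static kernel_fn pick_kernel(void)
{
    return cpu_has_simd() ? kernel_simd : kernel_scalar;
}
```

This is how a single binary can ship MMX, SSE and SSE2 paths and pick the right one at run time, as suggested above.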

Suggesting that it's smarter to code for an operating system that runs on less than 5% of the computers in the world is ludicrous. "Your entire market is a few thousand machines, so it's much easier to test your code. No one will run it, but that just means you have less to support!"

By your argument, the smartest machine to code for would probably be a PDP-03. I think I have one of a half dozen left running in the world... MUCH easier to test! Only code that was written on the '03 can run on the '03, and since there's only ONE board to test, look at how much wiser a decision that is!

:haddock:

Darkness Productions
10-04-2002, 11:50 PM
Wellll...... he could do something like the d.net client did, and have a different *core* for it. Then, the client would detect what you're using, and use the appropriate core.

Jodie
10-05-2002, 01:19 AM
Sure. In fact, back in my hacking days we used to do a universal executable. You could do a single executable that ran on every platform you wanted to support. But you carry a lot of extra "weight" doing that... :cheers:

bwkaz
10-05-2002, 09:21 AM
Not to mention a whole lot of time to develop each one...

runestar
10-05-2002, 12:29 PM
Well put, but... Breathe Jodie BREATHE... =)

RS½