Original Link: https://www.anandtech.com/show/1031
Intel's Pentium 4 3.06GHz: Hyper-Threading on Desktops
by Anand Lal Shimpi on November 14, 2002 5:39 AM EST - Posted in CPUs
Intel's Hyper-Threading technology has been officially around for just over a year now, but we've been hearing about the technology for even longer than that. Originally billed as a technology aimed solely at servers, there was no hiding the fact that Hyper-Threading was destined for the desktop; the only question was when.
We all thought that Hyper-Threading would make its desktop debut in 2003 with Prescott, which would give Intel time to work with software developers to begin to include initial forms of HT support in their products. Earlier this year it became clear that things at Intel were running ahead of schedule and Hyper-Threading would be making its desktop debut with the 3.06GHz Pentium 4.
An accelerated desktop release schedule meant that Intel was getting higher than expected yields on their 0.13-micron Pentium 4s but it also meant that Hyper-Threading was going to be arriving sooner than planned. We have been very skeptical about Intel's Hyper-Threading technology based on the poor desktop performance we saw back when we tried HT on the Xeon processors. As we reported during our coverage of the Fall 2002 Intel Developer Forum, apparently things have changed.
Over the past several weeks we've been running Intel's new Pentium 4 clocked at 3.06GHz through its paces and today we're finally able to bring you our experiences. Intel's got quite a bit coming over the next week and they're jump starting things by breaking the 3GHz barrier.
On the Surface: Just Another Pentium 4
Taking a look at the new 3.06GHz Pentium 4 you can see that the chip is no different than its 2.80GHz predecessor. Even if you were to look at the processor's die itself you'd see no differences, despite the fact that the 3.06GHz Pentium 4 supports Hyper-Threading (we'll talk about that in a bit).
The processor feeds off of a 133MHz FSB clock, which is sampled four times per clock to provide an effective transfer rate equal to that of a 533MHz FSB.
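As a quick sanity check on that arithmetic, here is a minimal sketch of the quad-pumped bus math. The exact 133.33MHz base clock and the 64-bit (8-byte) bus width used for the bandwidth figure are assumptions on our part, not figures stated above.

```python
# Rough sketch of the quad-pumped FSB arithmetic (assumed values are labeled).
BASE_CLOCK_MHZ = 400 / 3      # assumption: the nominal "133MHz" clock is 133.33MHz
TRANSFERS_PER_CLOCK = 4       # from the text: sampled four times per clock
BUS_WIDTH_BYTES = 8           # assumption: 64-bit wide data bus


def effective_fsb_mhz(base_mhz: float, transfers_per_clock: int) -> float:
    """Effective transfer rate in millions of transfers per second."""
    return base_mhz * transfers_per_clock


def peak_bandwidth_gb_s(base_mhz: float, transfers_per_clock: int,
                        width_bytes: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return base_mhz * 1e6 * transfers_per_clock * width_bytes / 1e9


effective = effective_fsb_mhz(BASE_CLOCK_MHZ, TRANSFERS_PER_CLOCK)
bandwidth = peak_bandwidth_gb_s(BASE_CLOCK_MHZ, TRANSFERS_PER_CLOCK,
                                BUS_WIDTH_BYTES)

print(f"Effective FSB: {effective:.0f}MHz")
print(f"Peak bandwidth: {bandwidth:.2f} GB/s")
```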
The architecture of the Pentium 4 has not changed at all for the 3.06GHz processor; the CPU is still based off of the same 0.13-micron Northwood core that made its debut at the beginning of this year.
We've already explained the technology behind the Pentium 4 so be sure to revisit some of our older articles if you're not up to speed:
- Intel's Pentium 4 2.4GHz: Taking the Lead
- AMD's Athlon XP 2100+ The Last of the Palominos
- AMD's Athlon XP 2000+ vs. Intel's 0.13-micron Northwood
- Intel Pentium 4 2.0GHz: The clock strikes two
HT Enabled
It shouldn't surprise many that Hyper-Threading (HT) has been present on all Intel Pentium 4 CPUs, even those before today's 3.06GHz part. Remember that Hyper-Threading only accounted for < 5% of the Pentium 4's overall die size, so it didn't kill Intel to have HT on die but not enabled. The benefit of integrating HT into the Pentium 4's design from the start is obvious; once the technology is deemed ready for the mass-market, enabling it is simple and doesn't require any redesign of the CPU. Had the technology been ready back in 2000 we would've seen it debut with the Pentium 4, but not everything works out perfectly and thus only owners of the 3.06GHz CPU and future Pentium 4s will get Hyper-Threading support.
The graphic above illustrates clearly the additions to the core that had to be made in order to support Hyper-Threading; as you can see, the technology imposes next to no hardware changes on the Pentium 4 but the potential for performance increase is tremendous.
From a hardware standpoint taking advantage of Hyper-Threading simply requires the following:
We've explained the benefits of Hyper-Threading in great detail in previous articles, but just to recap we'll briefly explain the theory and the implementation here. For more detail be sure to read our previous pieces:
- Intel's Hyper-Threading Technology: Free Performance?
- Intel Developer Forum Fall 2002 - Hyper-Threading & Memory Roadmap
- Intel Developer Forum Fall 2002 - Day 2: More on Hyper-Threading
- Database Server CPU Comparison: Athlon MP vs. Hyper-Threading Xeon
Hyper-Threading - How it Works
Your CPU has many different units, or groups of transistors that work together to perform particular functions. A CPU's execution units are those units that actually perform calculations on data and units that move data around (to/from memory).
Unfortunately it's very difficult to keep all of these execution units busy 100% of the time. Because of the nature of most applications, your CPU's execution units suffer from a horribly low utilization rate, generally around 35%. Reducing the number of execution units would cut down on costs; however, we've ended up with these 7 - 9 execution unit CPUs because of the performance improvements those additional units give us on the rare occasions that they are in use.
One of the biggest reasons that around 65% of your CPU's execution resources remain idle is that the CPU can only execute one thread of instructions at a time. Think of a thread as a collection of instructions related to a single program; for example, running spell check in Word would send a thread of instructions to the CPU to begin checking your document for spelling errors. It turns out that the instructions within a particular thread mostly use the same execution units over and over again, leaving the remaining units idle. The idea behind Hyper-Threading is to send multiple threads to the CPU with the hopes that the idle execution units will be used by different threads. Intel claims that with HT enabled the utilization of the Pentium 4's execution units can jump to around 50%, not a bad improvement for such a small modification to the core.
The current version of Intel's Hyper-Threading technology allows a maximum of two threads to be dispatched to a HT enabled CPU. To the OS, a HT enabled CPU simply looks like two processors and thus the OS sends two threads to the CPU for execution.
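Software sees the same thing the OS does. As a small illustration (ours, not something from Intel), Python's standard `os.cpu_count()` reports the number of logical processors, so on a single HT enabled Pentium 4 we would expect it to report two. The helper name `logical_cpu_count` is made up for this sketch.

```python
import os


def logical_cpu_count():
    """Return the number of logical CPUs the OS exposes, or None if unknown.

    os.cpu_count() counts logical processors, so a single physical CPU with
    Hyper-Threading enabled would be expected to show up as 2 here.
    """
    return os.cpu_count()


count = logical_cpu_count()
if count is None:
    print("The OS did not report a CPU count")
else:
    print(f"The OS exposes {count} logical processor(s)")
```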
Hyper-Threading - How it Works (continued)
Operating System support for Hyper-Threading is necessary but it currently exists in two different forms. Windows 2000 Professional supports multiple processors but it does not properly support Hyper-Threading. This means that it will see a single HT enabled Pentium 4 as two CPUs, but the OS will think that it is running on two physical CPUs instead of one physical CPU split into two logical CPUs. Why is this a problem?
With a single Pentium 4 processor this isn't much of an issue, but things get much more complicated with multiprocessor Xeons with HT enabled under Windows 2000 Professional or Server. Windows 2000 Professional only supports a maximum of two processors, and 2000 Server supports a max of 4 processors. With two HT enabled CPUs under Windows 2000 Professional, enabling HT will not make a difference as the OS will only work with a maximum of two CPUs. Similarly, a quad HT system under Windows 2000 Server would appear to the OS as an 8 processor system and thus exceed its licensing limitations giving you the use of only 4 of the CPUs.
Luckily Windows XP was designed with Hyper-Threading support in mind and thus even Home Edition will support a single CPU with HT enabled. Keep in mind that Windows XP Home does not support multiple physical processors, but if you enable HT on a Pentium 4 XP Home will recognize it as two CPUs.
The same situation exists with Windows XP Professional where the OS supports a maximum of two physical processors but it will allow a configuration with 4 logical processors.
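The processor-limit cases above can be summed up in a few lines. This is only a simplified model of the behavior described here, not Microsoft's actual licensing logic; the function name and parameters are our own.

```python
def usable_logical_cpus(physical: int, ht_enabled: bool,
                        cpu_limit: int, ht_aware: bool) -> int:
    """Simplified model of how many logical CPUs an OS ends up using.

    physical   - number of physical processors installed
    ht_enabled - whether Hyper-Threading is turned on
    cpu_limit  - the OS's processor limit
    ht_aware   - True if the limit is applied to physical CPUs (Windows XP),
                 False if every logical CPU counts against it (Windows 2000)
    """
    threads_per_cpu = 2 if ht_enabled else 1
    if ht_aware:
        # The limit applies to physical CPUs; each one keeps both logical CPUs.
        return min(physical, cpu_limit) * threads_per_cpu
    # Every logical CPU is treated as a physical one and counted against the limit.
    return min(physical * threads_per_cpu, cpu_limit)


scenarios = [
    ("Windows 2000 Professional, dual HT Xeon", 2, 2, False),
    ("Windows 2000 Server, quad HT Xeon", 4, 4, False),
    ("Windows XP Home, single HT Pentium 4", 1, 1, True),
    ("Windows XP Professional, dual HT Xeon", 2, 2, True),
]
for name, physical, limit, aware in scenarios:
    usable = usable_logical_cpus(physical, True, limit, aware)
    print(f"{name}: {usable} logical CPUs in use")
```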
Microsoft's SQL Server also has an identical licensing scheme where you do not have to pay for more expensive licensing for the number of logical CPUs you have; you simply make sure you are properly licensed for the number of physical CPUs present in your system.
Hyper-Threading - Pros & Cons
Thus far we have a technology that offers a sizable improvement in execution unit utilization (from around 35% to around 50%, going by Intel's claims), but what sort of drawbacks are there to Hyper-Threading?
Fundamentally we still only have one CPU and one set of execution units, so if the OS dispatches two threads that contend for identical resources in the CPU then HT could reduce performance.
In the earlier versions of Hyper-Threading, there were some pretty significant performance drops in desktop applications with it enabled. Luckily through revision after revision of the technology and through the addition of a few new components (flip back a few pages to see what's new) the vast majority of applications will see a performance increase or no performance loss at all.
Over the past several months Intel has been testing various applications and how their performance changes with Hyper-Threading enabled:
The question here is can we validate these results? We'll focus on where Hyper-Threading improves performance shortly, but we want to make sure that enabling HT is not going to reduce performance first:
It looks like Intel was able to deliver on their claims; most users should have no problem leaving Hyper-Threading enabled as it won't reduce performance. In fact, other than in Content Creation Winstone 2002, we saw some pretty decent performance gains with HT enabled, which leads us to the next point of investigation - where will HT improve performance?
Hyper-Threading - Pros & Cons (continued)
To understand where Hyper-Threading can do good you have to understand when execution units are idle. Obviously the instruction composition of a thread will determine what execution units are in use but there are other situations that create moments of idle utilization.
One of the biggest issues with the Pentium 4's 20-stage long pipeline is that branch mis-predicts will result in a severe performance penalty. An incorrectly predicted branch will flush the contents of the pipeline and leave execution units idle until the thread is sent through the pipeline once more. With HT enabled, another thread could be in the pipeline and continue to use the execution units while the CPU recovers from the other thread's mis-predicted branch.
Another situation where execution units remain idle is when you're processing data streams using instructions that inherently take longer to execute than simpler ones. The problem with streaming situations is that there are usually very long dependency chains where you cannot execute multiple instructions in parallel because the outcome of one operation is necessary in order to process the next instruction. This is quite common with video encoding which is why we see such large performance increases with HT enabled in our DiVX tests. Remember that in order for us to see a performance gain while running a single application, the application must be multithreaded so it can dispatch more than one thread to the CPU at a time.
Finally we have a situation that everyone finds themselves in - your CPU's execution resources remain idle when your CPU is waiting on main memory to provide data for an operation. You can't add numbers you don't have so until the CPU gets the data it needs its units remain idle; here's where running two applications simultaneously can benefit from Hyper-Threading. When one thread is going to main memory the other thread could be having its way with the execution units thus improving efficiency.
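To make the memory-wait case concrete, here is a deliberately crude toy model. It is nothing like a real Pentium 4 pipeline: it assumes a single shared execution resource, describes each thread as alternating "exec" and "stall" segments with made-up cycle counts, and always gives the first thread priority. All names are hypothetical.

```python
def run_serial(threads):
    """Cycles needed when the threads run one after another to completion."""
    return sum(cycles for thread in threads for _, cycles in thread)


def run_smt(threads):
    """Cycles needed when threads share one execution resource.

    Each cycle, stalled threads wait on memory in parallel, while at most one
    thread that wants to execute gets the execution resource.
    Returns (total_cycles, busy_cycles).
    """
    queues = [[list(segment) for segment in thread] for thread in threads]
    cycle = 0
    busy = 0
    while any(queues):
        cycle += 1
        exec_granted = False
        for queue in queues:
            if not queue:
                continue
            segment = queue[0]
            if segment[0] == "stall":
                segment[1] -= 1          # memory waits overlap freely
            elif not exec_granted:
                segment[1] -= 1          # this thread owns the units this cycle
                exec_granted = True
                busy += 1
            if segment[1] == 0:
                queue.pop(0)
    return cycle, busy


# Made-up workload: compute for 2 cycles, wait on memory for 3, compute for 2.
memory_bound = [("exec", 2), ("stall", 3), ("exec", 2)]
# Made-up workload that never waits, so both threads fight over the same units.
compute_bound = [("exec", 5)]

for label, thread in (("memory bound", memory_bound),
                      ("compute bound", compute_bound)):
    serial = run_serial([thread, thread])
    shared, busy_cycles = run_smt([thread, thread])
    print(f"{label}: {serial} cycles serial, {shared} cycles shared, "
          f"{busy_cycles / shared:.0%} utilization when shared")
```

In this toy, two memory bound threads finish in 9 cycles instead of 14, while two compute bound threads gain nothing from sharing, which mirrors the contention drawback mentioned earlier.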
Luckily we have a number of situations we can use to describe how multitasking can let Intel's Hyper-Threading spread its wings, so let's get to it:
The first situation is relatively basic; we're converting a PowerPoint presentation to PDF format while scanning the system32 directory for viruses using Norton AntiVirus 2003:
PowerPoint-to-PDF Conversion + NAV2003 Virus Scan

| CPU | PowerPoint (Time in Seconds) | NAV2003 (Time in Seconds) |
|---|---|---|
| Intel Pentium 4 3.06GHz | 54.8 | 85.0 |
| Intel Pentium 4-HT 3.06GHz | 57.8 | 53.0 |
| AMD Athlon XP 2800+ | 62.1 | 19.0 |

Here you can see that enabling Hyper-Threading makes the PowerPoint task take 3 seconds longer, but it reduces the virus scan time by over 30 seconds. Unfortunately, even with HT enabled the Pentium 4 isn't able to deliver a virus scan time as quick as the Athlon XP 2800+. What is important to note here is that Hyper-Threading does have a significant positive impact on performance.
Next we've got a Word document that's being converted to a PDF in the foreground, meanwhile we're scanning the C:\Windows\System32\ directory for viruses using McAfee's Virus Scan 7:
Word Doc-to-PDF Conversion + McAfee 7 Virus Scan

| CPU | Word (Time in Seconds) | McAfee 7 (Time in Seconds) |
|---|---|---|
| Intel Pentium 4 3.06GHz | 83.0 | 42.0 |
| Intel Pentium 4-HT 3.06GHz | 72.0 | 35.0 |
| AMD Athlon XP 2800+ | 81.0 | 35.0 |

This time enabling Hyper-Threading improves performance in both tasks, shaving 11 seconds off (13%) of the Word task and 7 seconds off (17%) of the McAfee task. The Athlon XP is able to perform just as well in McAfee but is slightly slower in the Word conversion test.
Finally we're converting an .avi to DiVX format while copying a 100MB directory:
DiVX Encoding + File Copy

| CPU | DiVX Encode (Time in Seconds) | File Copy (Time in Seconds) |
|---|---|---|
| Intel Pentium 4 3.06GHz | 92 | 220 |
| Intel Pentium 4-HT 3.06GHz | 79 | 185 |
| AMD Athlon XP 2800+ | 75 | 244 |

Once again we see performance improvements in both tasks, lending support to this idea of multitasking benefiting from Hyper-Threading.
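For those who want the deltas spelled out, this short sketch computes the percentage change in completion time for each task in the three multitasking tests above, using only the Pentium 4 times from the tables (HT disabled vs. HT enabled). Negative numbers mean the task finished faster with HT enabled.

```python
# (seconds with HT disabled, seconds with HT enabled), taken from the tables above
RESULTS = {
    "PowerPoint-to-PDF": (54.8, 57.8),
    "NAV2003 virus scan": (85.0, 53.0),
    "Word-to-PDF": (83.0, 72.0),
    "McAfee 7 virus scan": (42.0, 35.0),
    "DiVX encode": (92.0, 79.0),
    "File copy": (220.0, 185.0),
}


def percent_change(before: float, after: float) -> float:
    """Relative change in completion time, in percent (negative = faster)."""
    return (after - before) / before * 100


for task, (ht_off, ht_on) in RESULTS.items():
    delta = percent_change(ht_off, ht_on)
    print(f"{task}: {ht_off:g}s -> {ht_on:g}s ({delta:+.1f}%)")
```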
Hyper-Threading - It's getting Hot in Here
Obviously if you're running your CPU's execution units at higher utilization levels than normal your CPU is going to produce more heat. Intel states that with HT enabled the 3.06GHz Pentium 4 produces about 6% more heat, but just to make sure we ran our own tests on the CPU:
As you can see here, running with HT enabled resulted in a 6.5% higher peak temperature during our DiVX tests; a reasonable increase but nothing to get too alarmed about.
In order to deal with the added heat introduced by Hyper-Threading and the higher clock speed, Intel moved to an even larger heatsink/fan for the 3.06GHz Pentium 4:
Interestingly enough, this fan is one of the first retail Intel fans that's actually somewhat noisy. It is still much quieter than most 3rd party fans but it is definitely loud for an Intel retail solution.
The first three heatsinks are Pentium 4 solutions in newest to oldest order (left to right). The heatsink on the end is the latest AMD solution for the Athlon XP.
AMD's Response?
Normally whenever Intel makes a major processor release AMD is ready and waiting to respond, however this time around we don't have anything to show from the AMD camp. The Athlon XP 2600+ and 2700+ processors are finally hitting the streets, with the Athlon XP 2800+ due out in the first quarter of 2003.
Finally Available the XP 2700+
AMD has released one new CPU since we reviewed the Athlon XP 2800+; due to high OEM demand for more 333MHz FSB processors AMD has introduced an Athlon XP 2600+ with support for the 333MHz FSB.
The new XP 2600+/333FSB
The 333MHz FSB version of the 2600+ is clocked at 2.083GHz vs. the 2.13GHz clock speed of the 266MHz FSB 2600+. The decrease in clock speed is made up for by the increase in FSB frequency, causing the new 2600+ to be basically the same speed as the old 2600+ which is why they share the same model number.
Intel will be backing off of the throttle a bit as this year comes to an end, which should give AMD time to put together a solid Barton launch for next year. Even after Hammer hits, it will be up to Barton to compete with Intel for most of 2003.
Test Platforms - Where is Granite Bay?
For our Athlon XP testbed we're able to bring you performance results based on a final nForce2 motherboard - the ASUS A7N8X Deluxe.
Unfortunately boards still aren't widely available but we're so close to retail availability that we decided to continue to run Athlon XP benchmarks with the new nForce2 chipset. Much of the Athlon XP's competitiveness in the business, content creation & high-end tests is due to the advantages offered by the nForce2 platform. For more information on exactly where these performance boosts come from be sure to read our latest nForce2 article.
For the Pentium 4 3.06GHz testbed everyone is expecting the CPU to be paired up with Intel's Granite Bay chipset; unfortunately the chipset isn't officially out yet. Intel's forthcoming dual-channel DDR platform for the Pentium 4 will be released very soon and we're currently working on a roundup of the first Granite Bay boards, but until the chipset is officially released we'll be bringing you performance scores from Intel's own 850E motherboard paired with PC1066 RDRAM.
And finally we're able to get to our usual suite of performance tests, here we go…
Windows XP Professional Test Bed - Hardware Configuration

| Component | Configuration |
|---|---|
| CPU | AMD Athlon XP 2800+ (2.25GHz), AMD Athlon XP 2600+ (2.13GHz), AMD Athlon XP 2200+ (1.80GHz), AMD Athlon XP 2000+ (1.67GHz), AMD Athlon XP 1800+ (1.53GHz), Intel Pentium 4 3.06GHz, Intel Pentium 4 2.80GHz, Intel Pentium 4 2.53GHz, Intel Pentium 4 2.26GHz, Intel Pentium 4 2.0A GHz, Intel Pentium 4 1.8A GHz |
| Motherboard | ASUS A7N8X - NVIDIA nForce2 Chipset; Intel D850EMV2 - Intel 850E Chipset |
| RAM | 2 x 256MB DDR400 CAS2 Corsair XMS3200 DIMM; 2 x 256MB PC1066 Samsung RIMMs |
| Sound | None |
| Hard Drive | 80GB Western Digital Special Edition 8MB Cache ATA/100 HDD |
| Video Cards | ATI Radeon 9700 Pro |

All Pentium 4 3.06GHz tests were run with Hyper-Threading enabled.
Business & Content Creation Performance
For our Business & Content Creation tests we turned to the new Business Winstone 2002 benchmark as well as Content Creation Winstone 2002. The latest version of Content Creation Winstone (2003) is not yet ready for prime time as there are still a number of bugs that need to be worked out before we'll start using the benchmark; until then we'll make do with what we have.
The new Business Winstone test is a welcome addition as we haven't had a new Business Winstone test in almost 2 years. It's arguably more important than any content creation test as there are more users of Word and Outlook than there are of Macromedia Director; in any case, we include both sets of scores:
Here we see one of the only situations where enabling Hyper-Threading makes things slower; the 7% performance hit pushes the 3.06GHz Pentium 4 down to below the Athlon XP 2800+ and Pentium 4 2.80GHz. The performance advantage those two have over the HT-enabled P4 is negligible at around 3% so you're better off leaving HT enabled and just dealing with a loss here.
Hyper-Threading doesn't hurt the 3.06GHz Pentium 4 here but the Athlon XP 2800+ is able to pull slightly ahead thanks to NVIDIA's latest IDE drivers for nForce2; remembering back to our nForce2 Part II article, the nForce2 chipset improves performance here by around 10%.
Media Encoding Performance
What was once reserved for "professional" use only has now become a task for many home PCs - media encoding. Today's media encoding requirements are more demanding than ever and are still some of the most intensive procedures you can run on your PC.
We'll start off with a "quick" conversion of a DVD rip (more specifically, Chapter 40 from the Star Wars Episode I DVD) to a DiVX MPEG-4 file. We used the latest DiVX codec (5.02) in conjunction with Xmpeg 4.5 to perform the encoding at 720 x 480.
We set the encoding speed to Fastest, disabled audio processing and left all of the remaining settings on their defaults. We recorded the last frame rate given during the encoding process as the progress bar hit 100%.
As we saw in our HT investigation, enabling the feature results in a fairly decent performance boost in video encoding applications thanks to their high latency instructions and long dependency chains. The end result is that the Pentium 4 with Hyper-Threading enabled extends the lead even further at 3.06GHz; this would be the perfect CPU for a Media Center PC...
Hyper-Threading has no effect on single threaded applications so unless you're doing something alongside your MP3 encoding it's raw CPU power that's going to improve performance here. The 3.06GHz Pentium 4 extends Intel's lead to just over 10% faster than the Athlon XP 2800+.
Archiving Performance
Everyone compresses & decompresses files, but some people do it more than others. For those that are seriously into storage, compressing data can be a very stressful task for your CPU and your platform. In order to test this we used WinRAR 3.00 and compressed a 100MB folder using the best possible compression setting:
The Pentium 4 does very well under WinRAR and in this compression test in general, with the Pentium 4 3.06GHz taking the lead.
Gaming Performance - Unreal Tournament 2003
With this review we continue to use the final retail version of Unreal Tournament 2003 as a benchmark tool. The benchmark works similarly to the demo, except there are higher detail settings that can be chosen. As we've mentioned before, in order to make sure that all numbers are comparable you need to be sure to do the following:
By default the game will detect your video card and assign its internal defaults based on the capabilities of your video card to optimize the game for performance. In order to fairly compare different video cards you have to tell the engine to always use the same set of defaults which is accomplished by editing the .bat files in the X:\UT2003\Benchmark\ directory.
Add the following parameters to the statements in every one of the .bat files located in that directory:
-ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini
For example, botmatch-antalus.bat will look like this after the additions:
..\System\ut2003 dm-antalus?spectatoronly=true?numbots=12?quickstart=true -benchmark -seconds=77 -exec=..\Benchmark\Stuff\botmatchexec.txt -ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini -nosound
Remember to do this to all of the .bat files in that directory before running Benchmark.exe.
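If you'd rather not edit every file by hand, the edit above can be scripted. The following is only a sketch under a few assumptions of ours: that the lines to change are the ones containing `-benchmark`, that the parameters belong before a trailing `-nosound` when one is present (as in the example above) and at the end of the line otherwise, and that you pass in your own install path in place of the X:\UT2003\Benchmark\ placeholder. The function names are hypothetical, and you should back up the .bat files first.

```python
from pathlib import Path

# The parameters exactly as given above (raw string keeps the doubled backslashes).
EXTRA = (r"-ini=..\\Benchmark\\Stuff\\MaxDetail.ini "
         r"-userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini")


def add_params(line: str) -> str:
    """Add the MaxDetail parameters to one benchmark command line.

    Lines without -benchmark, or that already carry an -ini= parameter,
    are returned unchanged, so running this twice is harmless.
    """
    if "-benchmark" not in line or "-ini=" in line:
        return line
    stripped = line.rstrip("\r\n")
    ending = line[len(stripped):]          # preserve the original line ending
    suffix = " -nosound"
    if stripped.endswith(suffix):
        head = stripped[: -len(suffix)]
        return f"{head} {EXTRA}{suffix}{ending}"
    return f"{stripped} {EXTRA}{ending}"


def patch_directory(bench_dir: str) -> None:
    """Rewrite every .bat file in bench_dir in place."""
    for bat in Path(bench_dir).glob("*.bat"):
        lines = bat.read_text().splitlines(keepends=True)
        bat.write_text("".join(add_params(line) for line in lines))


# Example call (not run here); substitute your own install location:
# patch_directory(r"X:\UT2003\Benchmark")
```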
The Flyby benchmark is a test of how well the CPU can feed polygon data to the GPU; once again we're dealing with a single-threaded application so there are no performance gains to be had from enabling Hyper-Threading. The 3.06GHz Pentium 4 and Athlon XP 2800+ are dangerously close to one another at the top of the chart, but let's take a look at a more CPU bound test:
Generally speaking the UT2003 Flyby benchmark is more of a graphics test since it doesn't really stress what the CPU is doing while you're playing a game; instead we have the Botmatch benchmark that focuses mostly on the physics & artificial intelligence calculations that go on while your GPU is making frames fly.
Here we see that AMD actually takes the lead with their XP 2800+. Although the 6% lead is close to negligible, it's interesting to note that both processor families are closely matched in this test.
Just for comparison purposes (since we know a lot of you run 3DMark), here are the processor standings in 3DMark 2001 SE:
Gaming Performance (continued)
Jedi Knight 2 is based on the Quake III engine, and the Pentium 4 continues to do quite well in it, with the 3.06GHz Pentium 4 scoring almost 9% higher than the Athlon XP 2800+.
As we move to Serious Sam 2 the performance leader goes back to AMD with the Athlon XP 2800+.
Finally with Comanche 4 we have the 3.06GHz Pentium 4 take the lead once more.
3D Rendering Performance - 3dsmax 5
When the Athlon was first released 3 years ago, 3D Studio MAX was a strong point of its performance. The Athlon's raw FPU performance was right up 3dsmax's alley and thus it put Intel's competing solutions (at the time, the Pentium III) to shame. Things have changed a bit; the latest version of 3ds max (R5) does have some Pentium 4 optimizations that keep things quite competitive between the Athlon XP and the Pentium 4.
For our 3ds max 5 benchmarks we chose all of the benchmark scenes that ship with the product - SinglePipe2.max, Underwater_Environment_Finished.max, 3dsmax5_rays.max, cballs2.max and vol_light2.max.
Hyper-Threading helps performance incredibly here giving the 3.06GHz Pentium 4 a huge advantage (~20%) over even its 2.8GHz predecessor.
The performance advantage continues to be quite large in this scene; the Athlon XP 2800+ is not able to keep up at all.
Hyper-Threading doesn't help in all situations though and thus some scenes are much closer calls.
The 3.06GHz Pentium 4 is able to hold a 20% lead over the Athlon XP 2800+; what was once a strength for AMD has fallen back into the hands of Intel, it seems.
If you're a 3dsmax user it's clear what CPU offers the best performance for your needs...
3D Rendering Performance - Maya 4.0.1
The standings in Maya are very similar to what we've seen in 3dsmax, except that this particular test doesn't gain much from Hyper-Threading and thus the 3.06GHz Pentium 4 only holds a 10% lead over the Athlon XP 2800+.
3D Rendering Performance using SSE2
While 3dsmax 5 is SSE2 optimized, the level of optimization is nowhere near what NewTek reported with Lightwave upon releasing version 7.0b. The performance improvements offered by the new SSE2 optimized version were all above 20% using NewTek's supplied benchmarking scenes.
We chose three benchmarks to use, two of the lesser SSE2 optimized scenes and another that is more optimized, just to get an idea of the potential that lies ahead for Pentium 4 users running heavily optimized applications.
Where SSE2 comes into play, the Pentium 4 truly excels and can overshadow the Athlon XP's strong x87 FP capabilities.
We see even more SSE2 optimized functions in the radiosity_reflective_things scene, but watch as things change once we switch scenes:
Now the Athlon XP has the lead; it just goes to show you that the days of absolute performance leaders are gone, you can only hope to pick the processor that does better overall.
High End Workstation Performance - SPEC Viewperf 7.0
The latest version of SPEC Viewperf makes for an interesting CPU test, although the results don't always match up with the real world applications that the benchmark represents. The benchmarks included in version 7 of the benchmark suite are:
3ds max (3dsmax-01)
Unigraphics (ugs-01)
Pro/Engineer (proe-01)
DesignReview (drv-08)
Data Explorer (dx-07)
Lightscape (light-05)
For more information on the tests run visit SPEC's page on the new Viewperf benchmark.
High End Workstation Performance (continued)
High End Workstation Performance - ScienceMark 2.0 Beta
With this review we're also introducing ScienceMark 2.0 into our test suite; the benchmark focuses on FP-intensive scientific calculations. For more information on what's going on in the tests themselves visit www.sciencemark.org.
The Athlon XP does quite well in this test thanks to its very strong FPU; however, the presence of Hyper-Threading helps keep the 3.06GHz Pentium 4 within 0.2 seconds of the Athlon XP 2800+.
Hyper-Threading does not help nearly as much in the Primordia test where the Athlon XPs manage to pull away from Intel's flagship.
Final Words
At the beginning of this year Intel had just introduced their 0.13-micron Northwood core at 2/2.2GHz and now, 10 months later, Intel has kept their promise of scaling up clock speeds faster than before as we finish reviewing their 3.06GHz Pentium 4. As you read this, Intel is sampling their 90nm parts, readying for a 2H-03 launch of the interim successor to the Pentium 4 - Prescott. Prescott will bring us a larger L2 cache, new instructions and it will continue to offer the Hyper-Threading technology we've evaluated today.
We have to hand it to Intel; we honestly expected Hyper-Threading to be a big flop initially on the desktop because of losses in performance. It seems as if Intel has worked out virtually all of the issues we ran into when we first looked at Hyper-Threading on the Xeon processors several months ago. With the 3.06GHz Pentium 4 you thankfully won't even have to worry about whether you should enable Hyper-Threading or not; the technology does more good than harm.
Hyper-Threading in its current form is very much an infant technology; the potential for it is huge and it can grow into something much larger than what we see here today. It is impressive that we are able to see some serious performance gains in encoding and 3D rendering applications, as well as in isolated multitasking scenarios, but the true benefit of Hyper-Threading comes much further down the road. With compiler optimizations and programmers developing with Hyper-Threading in mind, we'll see much more dramatic performance increases in the future.
Today, Hyper-Threading still comes at a fairly high cost as you have to purchase Intel's flagship Pentium 4 processor to get the technology. The beauty of it is that at < 5% die cost, Intel won't hesitate to migrate the technology across their entire line of CPUs. What will be interesting to see is whether or not the value Celeron line of processors gets the technology as well.
The 3.06GHz Pentium 4 is the first step in a long road ahead for Intel as they embark on a quest to increase Thread Level Parallelism after extracting parallelism from instructions for the past decade...