Hyper-Threading Technology
INTRODUCTION:
Also called HT Technology, Hyper-Threading was developed by Intel for use in Pentium 4 and Xeon processors. It lets a single processor execute two "threads" of instructions simultaneously, so that the CPU appears to the system as two separate CPUs.
It uses additional registers to overlap two instruction streams in order to achieve an approximate 30% gain in performance. Multithreaded applications take advantage of the Hyper-Threaded hardware as they would on any dual-processor system; however, the performance gain cannot equal that of a true dual-processor system.
DEFINITION: Hyper-Threading technology is an innovation from Intel that enables multi-threaded server software applications to execute threads in parallel within each processor in a server platform. The Intel Xeon processor family uses Hyper-Threading technology, along with the Intel NetBurst microarchitecture, to increase compute power and throughput for Internet, e-Business, and enterprise server applications. According to Intel, this level of threading had not been seen before in a general-purpose microprocessor. Hyper-Threading technology helps increase transaction rates, reduce end-user response times, and enhance business productivity, providing a competitive edge to e-Businesses and the enterprise. The Intel Xeon processor family for servers was the first Intel processor line to support thread-level parallelism on a single processor.
HISTORY: Hyper-threading technology has its roots at Digital Equipment Corporation, but was brought to market by Intel. Hyper-Threading was first introduced in the Foster MP-based Xeon in 2002. It appeared on the 3.06 GHz Northwood-based Pentium 4 in the same year, and then appeared in every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor. Intel's processors based on the Core microarchitecture do not have Hyper-Threading, because the Core microarchitecture descends from the P6 microarchitecture, which was used from the Pentium Pro through the Pentium III, as well as in the Celeron, Pentium II Xeon and Pentium III Xeon models. Intel released Nehalem (Core i7) in November 2008, in which hyper-threading made a return. The first-generation Nehalem contains 4 cores and handles 8 threads. Since then, both 2-core and 6-core models have been released, handling 4 and 12 threads respectively. The Intel Atom is an in-order processor with hyper-threading, for low-power mobile PCs and low-price desktop PCs. The Itanium 9300 launched with eight threads per processor (2 threads per core) through enhanced hyper-threading technology. Poulson, the next-generation Itanium, is scheduled to have additional hyper-threading enhancements. The Intel Xeon 5500 server chips also use two-way hyper-threading.
ABSTRACT: Hyper-Threading technology, which brings the concept of simultaneous multithreading to the Intel architecture, was first introduced on the Intel Xeon processor in early 2002 for the server market. In November 2002, Intel launched the technology on the Intel Pentium 4 at clock frequencies of 3.06 GHz and higher, making the technology widely available to the consumer market. This technology signals a new direction in microarchitecture development and fundamentally changes the cost-benefit tradeoffs of microarchitecture design choices.
BENEFITS: For day-to-day tasks like web browsing, email and word processing, Hyper-Threading won't have much of an impact. Yes, Hyper-Threading is theoretically better at multi-tasking. However, today's processors are so fast that basic programs are rarely limited by the speed of your processor. The way programs are coded can also be a limitation. You may sometimes find that you have numerous programs open, but only one of your processor cores is being put to much use. That's because the programs are, for whatever reason, not having their work divided among the different cores. When you're trying to do some heavy lifting, however, Hyper-Threading can be more helpful. The applications most likely to benefit are 3D rendering programs, heavy-duty audio/video transcoding apps, and scientific applications built for maximum multi-threaded performance. But you may also enjoy a performance boost when encoding audio files in iTunes, playing 3D games and zipping/unzipping folders. The boost in performance can be up to 30%, although there will also be situations where Hyper-Threading provides no boost at all.
THREADING IN XP:
1) Download and install Windows XP Service Pack 2.
2) Find the following files on your c: drive (normally in windows\servicepackfiles): ntkrnlmp.exe and halmacpi.dll, and copy them to your c:\windows\system32 folder. (This assumes your new motherboard has ACPI support. I know that these files will support non-ACPI computers as well, but that has not been tested.)
3) Open up boot.ini in your text editor and find the following Windows XP line:
Applications that exhibit good threading methods and scale well on multiprocessor servers today are likely to take advantage of Hyper-Threading technology. The performance increase seen is highly dependent on the nature of the application, the threading model it uses, as well as system dependencies.
Here is an i3 i5 i7 comparison which discusses how the three processor lines differ in terms of features and performance.
INTEL CORE I3 I5 I7 COMPARISON: TECHNICAL FEATURES. All Core i3 processors are dual-core, with clock frequencies ranging from 2.933 GHz to 3.2 GHz. A 4 MB L3 smart cache, 2 x 256 KB L2 cache and a Direct Media Interface bus, together with the LGA 1156 socket, place them among the best entry-level processors. These chips are built on a 32 nm process, which allows more transistors to be etched on the silicon. An integrated GPU (Graphics Processing Unit) makes graphics processing even faster. With Intel's hyper-threading and virtualization technology enabled, along with HD graphics, these chips are priced at $133.
As the i3 i5 i7 comparison chart on the Intel web site reveals, the Core i5 line consists of three separate series of processors with dual and quad cores. Both the dual-core and the quad-core models handle 4 threads each. The clock frequencies of these processors range from 2.4 GHz to 3.33 GHz, and Intel Turbo Boost technology raises the clock frequency further when needed. With 4 MB to 8 MB of L3 cache, Direct Media Interface, integrated GPU, LGA 1156 socket, Intel HD graphics, Intel smart cache technology and Hyper-Threading enabled, the cost of these processors ranges from $176 to $256. As a Core i5 vs. Core i3 comparison will show, i5 chips are faster than the i3 chips. They form the mid-level segment.
With the Core i7 line, Intel set out to fulfil its ambition of creating the 'best processors on the planet'. In the Core i3 vs. i5 vs. i7 comparison, the Core i7 is well ahead of the rest of the pack. This line consists of quad-core processors with clock frequencies reaching 3.33 GHz, assisted by Intel Turbo Boost. As a quad-core vs. dual-core comparison would show, a greater number of cores can considerably boost computing speed for software that uses them.
With L3 smart cache ranging from 8 MB to as much as 12 MB, Intel QuickPath Interconnect technology (which can raise data transfer speed to 25.6 GB/s), integrated GPU, Hyper-Threading and Intel HD graphics, the Core i7 series processors are among the best processors manufactured to date. They are meant for high-end computing applications, web servers and high-end business users. The price of these processors ranges from $200 to as much as $1000.
Let us now make an Intel Core i3 vs. i5 vs. i7 performance comparison. Besides being multi-core, what makes the i3, i5 and i7 processors computing powerhouses is hyper-threading technology combined with the Turbo Boost feature. An integrated GPU and an enhanced L3 cache make graphics processing very fast. As an Intel Core i3 vs. Core 2 Duo comparison would reveal, the i3 processors surpass the computing power offered by the earlier Core 2 Duo series. If you are eyeing an entry-level laptop computer, go for the Core i3 line. It is great for home-use desktop computers too.
If you are a business user, I suggest that you go for the core i5 line that can handle multitasking even better than the i3 line. With hyper-threading enabled and Intel's range of innovative technologies fully operational, the core i5 line is ideal for the business user or home users, who are into intensive gaming. If you want to settle for nothing less than the very best in computing today, go for the high end core i7 line. As you must have realized while going through the core i3 i5 i7 comparison, the i7 line puts phenomenal computing power at your fingertips that was available once only to users of supercomputers! Budget wise, they may be the costliest out of the whole lot, but they offer true value for your money.
As a core i3 vs. i5 vs. i7 comparison would reveal, there is a lot of choice offered by Intel too, for users with different levels of requirements in terms of features and price. Here are some of the best CPUs to look out for, that fall in the medium and low budget category, for you to choose from.
- AMD Phenom II X4 965 Black Edition 3.4 GHz - $195
- Intel Core i5 750 2.66 GHz - $196
- Intel Pentium Processor G6950 2.80 GHz - $87(!)
- AMD Phenom II X2 550 3.1 GHz - $102
- Moving data from one location in the computer's memory to another
- Jumping to new instruction sets based on logical operations or choices
- Performing mathematical operations using the Arithmetic Logic Unit (ALU)
In order to conduct these operations, the processor uses an address bus to send addresses to the computer memory, as well as a data bus to retrieve information from or send information to the computer memory. It also has a separate control line that notifies the memory whether the processor is getting or setting a given memory location. In order to conduct all of its designed operations, the CPU also has a clock, which forms the basis for synchronizing the processor's actions with the remainder of the computer. For accessing commonly used instructions or data, processors also implement different caching schemes in order to reach the required data faster than by accessing RAM directly.
PROCESSOR MEMORY
The computer processor makes use of read-only and random access memory (ROM and RAM respectively). The processor's ROM is permanently programmed with preset core functions in order to facilitate processor communication with the data bus. On Windows computers this ROM is commonly referred to as the BIOS (Basic Input/Output System), and it is also used to retrieve the boot sector for the computer. The processor reads from and writes to RAM depending on what actions the current instruction requires. RAM is not designed to permanently save data and is reset when the computer is turned off or loses power.
THE ROLE OF THE 64 BIT PROCESSOR
Although 64 bit computer processors have been deployed since the early 1990s, they have only reached the consumer level in large numbers in recent years. All of the major computer processor manufacturers now produce 64 bit processors, which are available for use across different types of operating system. The primary advantage of a 64 bit processor over legacy designs is the significantly expanded address space available to the processor. The previous 32 bit processors were limited to a maximum of two to four gigabytes of effective RAM access. 64 bit processors are also able to provide increased input/output access to hard drives and the computer's video card, which helps to further increase overall system performance. Early adopters of 64 bit processors don't necessarily see a large gain in system performance unless they run high-demand tasks such as video editing or networked 3D video games. This will continue to change as more applications are designed to take advantage of 64 bit processors and their increased memory capacity.
REQUIREMENTS OF HYPER-THREADING TECHNOLOGY
HT technology requires the following fundamentals:
1. A processor with HT technology built in. Not all processors support HT; therefore, before purchasing a computer, make sure that it supports HT Technology. You can easily identify an HT enabled processor by checking its specification and CPU logo. Normally, Intel clearly labels processors that have HT built in. Some of the Intel processor families that support HT technology are Intel Atom, Xeon, Core i-series, Pentium 4 and Pentium mobile processors.
2. An operating system that supports HT. An HT enabled single processor appears as two processors to the operating system. However, if the OS doesn't support HT, you can't benefit from this technology even though you have an HT enabled processor. The OS must recognize that you have HT enabled processors so that it will schedule two threads, or sets of instructions, for processing. Windows XP and later operating systems are optimized for HT technology.
3. An HT compatible chipset.
4. An HT enabled system BIOS. You can easily enable or disable HT in the system BIOS. Consult your system manual for this.
ADVANTAGES: Intel claims threading an application can result in increased performance on a uniprocessor machine or for a multi-processor application. Threads can make a GUI more responsive. They can also facilitate the overlap of I/O and computation. If multiple processors are available, threaded applications may see substantial speedup.
HOW TO SEE THREADING IN COMPUTER?
1. Click the Start button, right-click My Computer, and then click Properties.
2. Click Hardware and click Device Manager.
3. In the Device Manager window, click the plus (+) sign next to the processor type. If Hyper-Threading is enabled, the processor is listed twice.
TO ENABLE OR DISABLE HYPER-THREADING:
1. Shut down and restart the computer.
2. When the DELL logo appears, press <F2> immediately to enter the system setup program. If you wait too long and the Microsoft Windows logo appears, continue to wait until you see the Windows desktop. Then shut down your computer through the Start menu and try again.
3. When the system setup program screen appears, highlight CPU Information and press <Enter>.
4. When the CPU information screen appears, highlight Hyper-Threading and press the spacebar on the keyboard to select Enable or Disable.
5. Press <ESC> to save the setting and exit the CPU Information screen.
6. Press <ESC> to Save and Exit.
7. When you see the message Save changes and exit now, press <Enter>. Your computer will restart.
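The Device Manager check described above can also be approximated from a script. The following is a minimal sketch, assuming Python 3 is available on the machine; `os.cpu_count()` reports the number of logical processors the operating system sees, so on a single-socket, single-core HT system it would be expected to report 2 with Hyper-Threading enabled and 1 with it disabled. The helper name `logical_processor_count` is hypothetical.

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors visible to the OS (at least 1)."""
    count = os.cpu_count()  # may be None if the platform cannot report it
    return count if count is not None else 1


if __name__ == "__main__":
    print("Logical processors seen by the OS:", logical_processor_count())
```

The number alone does not say whether the logical processors come from Hyper-Threading or from separate cores; it only shows what the OS scheduler has to work with.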
LogicalProcPerPhysicalProc
Syntax: unsigned char LogicalProcPerPhysicalProc (void)
Description: This function returns a byte value that contains the maximum number of logical processors per physical package, that is, the largest number of logical processors a physical package can support. To get the number of available logical processors that a program can use, call the function CPUCount and read the value of AvailLogicalNum.
Return Value: Maximum number of logical processors per physical package.

CorePerPhysicalProc
Syntax: unsigned char CorePerPhysicalProc (void)
Description: This function returns a byte value that contains the maximum number of cores per physical package, that is, the largest number of cores a physical package can support. To get the number of available cores that a program can use, call the function CPUCount and read the value of AvailCoreNum.
Return Value: Maximum number of cores per physical package.

HTSupported
Syntax: unsigned int HTSupported (void)
Description: This function checks if the processor has Hyper-Threading technology built in.
Return Value: 0 if Hyper-Threading is not built in.

CPUCount
Syntax: unsigned char CPUCount (unsigned char *AvailLogicalNum, unsigned char *AvailCoreNum, unsigned char *PhysicalNum)
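The relationship between these values can be sketched in a few lines. The Python below is not the library that provides the functions above; it is a hypothetical model that takes the numbers LogicalProcPerPhysicalProc and CorePerPhysicalProc would return and derives how many hardware threads each core exposes. The names `PackageTopology`, `threads_per_core` and `looks_hyper_threaded` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageTopology:
    """Hypothetical container for the per-package maximum values."""
    logical_per_package: int  # value LogicalProcPerPhysicalProc would report
    cores_per_package: int    # value CorePerPhysicalProc would report

    def threads_per_core(self) -> int:
        """Logical processors divided evenly among the cores of a package."""
        if self.cores_per_package < 1:
            raise ValueError("a package must contain at least one core")
        return self.logical_per_package // self.cores_per_package

    def looks_hyper_threaded(self) -> bool:
        """More than one logical processor per core suggests HT capability."""
        return self.threads_per_core() > 1


if __name__ == "__main__":
    single_core_ht = PackageTopology(logical_per_package=2, cores_per_package=1)
    dual_core_no_ht = PackageTopology(logical_per_package=2, cores_per_package=2)
    quad_core_ht = PackageTopology(logical_per_package=8, cores_per_package=4)
    for topo in (single_core_ht, dual_core_no_ht, quad_core_ht):
        print(topo, topo.threads_per_core(), topo.looks_hyper_threaded())
```

Because both inputs are maximum values for the package, this only indicates capability; whether Hyper-Threading is actually enabled still depends on the BIOS and operating system, which is why the available counts from CPUCount matter.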
DISADVANTAGE: Threading an existing serial application increases the complexity of the application, Intel says. Sharing of resources, such as global data, can introduce common parallel programming errors such as storage conflicts and other race conditions. Debugging such problems is difficult because they are non-deterministic, and introducing debugging probes, such as print statements, can mask these errors.
IMPROVING MULTI-THREADING VALIDATION: Clearly, with MT-mode bugs constituting nearly twice the share of post-silicon bugs (15%) that they do of pre-silicon bugs (8%), coupled with the high cost of fixing post-silicon MT bugs (full layer versus metal tapeouts), there is an opportunity for improving pre-silicon validation of future MT-capable processors. Driven by the analysis of pre- and post-silicon MT-mode bugs [2, 3], we are improving pre-silicon validation by doing the following:
- Enhancing the Cluster Test Environments to improve MT-mode functionality checking.
- Increasing the focus on microarchitecture validation of multi-cluster protocols such as SMC, atomic operations and forward progress mechanisms.
- Increasing the use of coverage-based validation techniques to address hardware/microcode.
HYPER-THREADING VS. DUAL-CORE: Some Intel processors support hyper-threading technology (HTT), which allows a single processor core to execute two threads simultaneously. Programs that are designed to use HTT may run 10% to 30% faster on an HTT-enabled processor than on a similar non-HTT model. A dual-core processor, by contrast, has two physical cores, each of which runs its own thread.
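The 10% to 30% figure can be turned into a rough run-time estimate. The sketch below is simple arithmetic, not a benchmark: it assumes a workload whose throughput improves by a fixed fraction, and it treats an ideal second core as a 100% gain, which real programs rarely reach. The function name `estimated_runtime` and the 100-second workload are hypothetical.

```python
def estimated_runtime(base_seconds: float, throughput_gain: float) -> float:
    """Run time after a fractional throughput gain (0.30 means 30% faster)."""
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1.0")
    return base_seconds / (1.0 + throughput_gain)


if __name__ == "__main__":
    base = 100.0  # hypothetical single-thread run time in seconds
    print(round(estimated_runtime(base, 0.10), 1))  # HTT, low end   -> 90.9
    print(round(estimated_runtime(base, 0.30), 1))  # HTT, high end  -> 76.9
    print(round(estimated_runtime(base, 1.00), 1))  # ideal 2nd core -> 50.0
```

Under these assumptions even the best case for HTT leaves the job at roughly 77 seconds, while an ideally used second core would bring it to 50, which is the gap between logical and physical processors that the comparison above describes.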
How Hyper-Threading works: The current computing paradigm implies multithreaded computation. This concerns not only servers, but also workstations and desktop systems. Threads can belong to one application or to different ones, but there is almost always more than one active thread (to check, open the Task Manager in Windows 2000/XP and display the number of threads). At the same time, a conventional processor can execute only one thread at a time and must switch between them constantly.
The Hyper-Threading technology was first realized in the Intel Xeon MP processor (Foster MP). Note that the Xeon MP, announced at IDF Spring 2002, uses a core similar to the Pentium 4 Willamette, has a 256 KBytes L2 cache and 512 KBytes/1 MBytes L3 cache and supports 4-processor configurations.
The Hyper-Threading support is also available in the processor for workstations -- Intel Xeon (Prestonia core, 512 Kbytes L2 cache) which appeared on the market earlier than the Xeon MP. We already examined dual-processor configurations on the Intel Xeon, that is why we are going to take a look at Hyper-Threading capabilities by the example of these CPUs both theoretically and practically. However that may be, the "usual" Xeon is more convenient than the Xeon MP in 4-processor systems...
Hyper-Threading is based on the principle that at any point in time only a part of the processor's resources is used to execute program code. The unused resources can be given work as well, for example by executing another application in parallel (or another thread of the same application). One physical Intel Xeon processor forms two logical processors (LPs), which share the CPU's computational resources. The operating system and applications see two CPUs and can distribute the workload between them, as in a normal dual-processor system.
One of the aims of Hyper-Threading is that, when only one thread is active, it should execute at the same rate as on a conventional CPU. That is why the processor has two main modes: Single-Task (ST) and Multi-Task (MT). In ST mode only one logical processor is active and uses the available resources completely (ST0 and ST1 modes); the other LP is stopped by the HALT instruction. When a second thread appears, the second logical processor is enabled (by an interrupt), and the physical CPU switches to MT mode. Halting an unused LP is the responsibility of the OS, which must ensure that a single thread executes as fast as it would without Hyper-Threading.
Each of the two LPs has its own Architecture State (AS), which includes the state of several types of registers: general-purpose, control, APIC and service registers. Each LP has its own APIC (interrupt controller) and set of registers; to keep them working correctly there is a Register Alias Table (RAT) for each LP, which tracks the mapping between the 8 IA-32 general-purpose registers and the 128 registers of the physical CPU.
When two threads are executed, two Next Instruction Pointers are maintained. Most instructions are taken from the Trace Cache (TC), where they are kept in decoded form, and the two active LPs access the TC in turn, cycle by cycle. When only one LP is active, it does not share TC access. The Microcode ROM is accessed the same way. The ITLB (Instruction Translation Look-aside Buffer) units, which come into play when the required instructions are missing from the instruction cache, are duplicated and deliver instructions for their own threads. The IA-32 Instruction Decode Unit is shared, and when both threads need instructions decoded it serves them in turn (cycle by cycle). The Uop Queue and Allocator units are divided in two and provide half of their entries to each LP. Five schedulers process the queues of decoded instructions (Uops) regardless of whether they belong to LP0 or LP1, and deliver instructions to the respective Execution Units depending on the readiness of the former and the availability of the latter. Caches of all levels (L1/L2 for Xeon, plus L3 for Xeon MP) are fully shared between the LPs, though to preserve data integrity the entries in the DTLB (Data Translation Look-aside Buffer) are tagged with logical processor IDs. Thus, instructions of both logical CPUs can be executed simultaneously using the resources of one physical processor, which are divided into 4 classes:
- Duplicated
- Fully Shared
- Entry Tagged
- Partitioned, depending on the operating mode (ST0/ST1 or MT)
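The alternating access to shared front-end units described above can be illustrated with a toy model. The sketch below is a deliberate simplification and not a description of Intel's actual arbitration logic: each cycle grants the Trace Cache to one logical processor, alternating between LP0 and LP1 when both are active (MT mode) and granting every cycle to the only active LP otherwise (ST0/ST1 modes). The function name `trace_cache_grants` is hypothetical.

```python
from typing import List, Optional


def trace_cache_grants(lp0_active: bool, lp1_active: bool,
                       cycles: int) -> List[Optional[int]]:
    """Return, per cycle, which logical processor gets the Trace Cache.

    None means neither logical processor is active in that cycle.
    """
    grants: List[Optional[int]] = []
    next_lp = 0  # in MT mode, the LP that is served in the coming cycle
    for _ in range(cycles):
        if lp0_active and lp1_active:   # MT mode: strict alternation
            grants.append(next_lp)
            next_lp = 1 - next_lp
        elif lp0_active:                # ST0 mode: LP0 owns every cycle
            grants.append(0)
        elif lp1_active:                # ST1 mode: LP1 owns every cycle
            grants.append(1)
        else:                           # both LPs halted
            grants.append(None)
    return grants


if __name__ == "__main__":
    print(trace_cache_grants(True, True, 6))   # [0, 1, 0, 1, 0, 1]
    print(trace_cache_grants(True, False, 4))  # [0, 0, 0, 0]
```

The second call shows the property the ST modes are meant to guarantee: a lone thread is never made to wait for a partner that is not running.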
Most applications that run faster on multiprocessor systems can also speed up on a CPU with Hyper-Threading without any modifications. But there can be problems: for example, if one of the processes is spinning in a wait loop, it can take up the resources of the physical CPU and hamper the second LP. Thus, performance with Hyper-Threading enabled can even drop (by up to 20%). To prevent this, Intel recommends using the PAUSE instruction (available in IA-32 starting with the Pentium 4) instead of empty wait loops. Besides, automatic and semi-automatic code optimization is being worked on now; for example, the Intel OpenMP C++/Fortran compiler series has been very successful.
Another of Intel's aims in developing Hyper-Threading technology was to achieve a considerable efficiency increase while making the transistor count, die area and power consumption grow much more slowly. Indeed, incorporating Hyper-Threading into the Xeon/Xeon MP increased the die area and power consumption by just 5%. What remains is to estimate the performance gain obtained with it.
It is not possible to effectively detect whether processors are hyper-threaded or dual-core from within an operating system. An external tool is required to perform this task. Intel has an extremely effective tool for this purpose. It can be downloaded from their website either as a completely pre-packaged executable, or as a text file of sample code that can be customized. This is an extremely handy tool and can be found at: http://intel.com/cd/ids/developer/asmo-na/eng/recent/275339.htm An example of the pre-packaged executable's output is shown below in Figure:
Monitoring hyper-threaded CPUs within Windows operating systems can be accomplished via PerfMon.exe. As previously mentioned, the OS recognizes a single Hyper-threaded CPU as two logical processors.
Are there any licensing issues? Each logical processor that is contained within a Hyper-Threading processor appears to the operating system as an individual processor. This means that tools or services within Windows that display information about processors, such as the Windows Task Manager or Windows Performance Monitor, will display processor information for every logical processor that Windows is utilizing.
Intel's processor identification methodology has been updated to support the software identification of Hyper-Threading using the CPUID instruction. Operating system and application software can use this identification mechanism to detect the presence of Hyper-Threading processors and to provide support for features such as Hyper-Threading-aware product licensing. Windows .NET Server supports an API that provides the logical to physical mapping for the processors in the system. The current Windows operating system licensing model for Hyper-Threading-enabled systems is to require a processor license for each physical processor. However, it is important to note that any software product that was released before the introduction of Hyper-Threading will not support Hyper-Threading detection and will treat each logical processor as if it were an individual physical processor. This licensing model applies to all 32-bit versions of Windows XP and Windows .NET Server. This model delivers the performance benefit of utilizing both logical processors for each processor that the Windows license supports. The processor limits which result from this licensing model for 32-bit versions of Windows .NET Server and Windows XP are shown below.
Windows Version                    Maximum Physical Processor Limit    Maximum Logical Processor Limit
Windows XP Home Edition            1                                   2
Windows XP Professional            2                                   4
Windows .NET Standard Server       4                                   8
Windows .NET Enterprise Server     8                                   16
Windows .NET Datacenter Server     32                                  32

If seventeen Hyper-Threading processors are listed by the BIOS, Windows .NET Datacenter Server will exhaust the 32-processor limit using both logical processors on the first 16 physical processors listed. The operating system will not use either logical processor on the seventeenth physical processor. As described earlier, utilizing a single logical processor on an idle physical Hyper-Threading processor provides better performance than utilizing the second logical processor on a physical processor that already has an active logical processor. As a result, Microsoft's recommendation for systems that contain more than 16 physical Hyper-Threading processors is to disable Hyper-Threading at the BIOS before installing or booting Windows. Because the performance benefit provided by the second logical processors in a Hyper-Threading system decreases as the number of physical processors in the system increases, it is not anticipated that the lack of Hyper-Threading support on systems with more than 16 physical Hyper-Threading processors will have a significant impact on the performance of the system.
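The seventeen-processor example can be worked through in code. The sketch below assumes, as the text describes, that Windows takes processors in BIOS order and uses every logical processor of each physical processor in turn until the logical processor limit is reached. It is only a model of the described behavior, not Windows code, and the function name `processors_used` is hypothetical.

```python
from typing import Tuple


def processors_used(physical_count: int, logical_limit: int,
                    threads_per_physical: int = 2) -> Tuple[int, int]:
    """Return (physical processors used, logical processors used).

    threads_per_physical is 2 with Hyper-Threading enabled and 1 with it
    disabled in the BIOS.
    """
    if threads_per_physical < 1:
        raise ValueError("threads_per_physical must be at least 1")
    # Only whole physical processors that fit under the limit are used.
    physical_used = min(physical_count, logical_limit // threads_per_physical)
    return physical_used, physical_used * threads_per_physical


if __name__ == "__main__":
    # 17 HT processors under a 32-logical-processor limit: the 17th sits idle.
    print(processors_used(17, 32))                          # (16, 32)
    # Same machine with Hyper-Threading disabled: all 17 are used.
    print(processors_used(17, 32, threads_per_physical=1))  # (17, 17)
```

The second result illustrates the reasoning behind the recommendation above: with Hyper-Threading disabled, every physical processor on such a system gets work instead of one being left unused.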
Hyper-threading in Linux:
In order to make use of Hyper-Threading in Linux, you will need Hyper-Threading enabled in the kernel. But how can you find out if your CPU supports HT? We can get information about the CPU from a running Linux system by looking into /proc. For example, below you can see the output taken from a Xeon system:
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 3
cpu MHz : 3201.940
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
Inside the flags section we are looking for the ht flag. If it is present, the system supports HT. Let's look at another sample, taken from a Pentium 4 CPU (the unneeded information was removed):
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
cpu MHz : 3192.092
This system also supports HT. If you don't see the ht flag, then your system doesn't support HT. This will generally not be available on AMD processors, as this is an Intel technology (this might not be true anymore with newer AMD CPUs). Here is an example from an AMD Opteron system:
model name : AMD Opteron(tm) Processor 242
cpu MHz : 1593.326
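The manual check of /proc/cpuinfo can be scripted. The following Python sketch parses cpuinfo-style text and reports two things: whether the ht flag is listed, and whether siblings exceeds cpu cores, which indicates that each core is presenting more than one logical processor. The sample text is a shortened, illustrative stand-in that includes a made-up flags line, and the function names `parse_first_cpu` and `ht_status` are hypothetical.

```python
from typing import Dict, Tuple

# Illustrative stand-in for /proc/cpuinfo output; the flags line is an
# assumption added for the example, not copied from a real system.
SAMPLE = """\
processor : 0
model name : Intel(R) Xeon(TM) CPU 3.20GHz
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
flags : fpu vme de pse tsc msr pae mce ht
"""


def parse_first_cpu(text: str) -> Dict[str, str]:
    """Collect the 'key : value' fields of the first processor entry."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:        # a blank line ends the first processor block
                break
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ht_status(text: str) -> Tuple[bool, bool]:
    """Return (ht flag present, siblings greater than cpu cores)."""
    cpu = parse_first_cpu(text)
    has_flag = "ht" in cpu.get("flags", "").split()
    siblings = int(cpu.get("siblings", "1"))
    cores = int(cpu.get("cpu cores", "1"))
    return has_flag, siblings > cores


if __name__ == "__main__":
    print(ht_status(SAMPLE))  # (True, True)
```

On a real Linux system the same function could be fed the contents of /proc/cpuinfo. Comparing siblings with cpu cores is a useful second check because the two fields describe how many logical processors share a physical package and how many cores that package has.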
If your CPU supports HT, you can take advantage of this technology only if HT support is enabled in your running kernel. You can either install a kernel with HT support provided by your Linux distribution (for example, one that has *SMP* in its name) or compile your own kernel and include HT support. Once you are running an HT enabled kernel, you should normally see the virtual CPU as a regular extra CPU (you will see 2 CPUs on a single CPU system, 4 CPUs on a dual processor system, etc.). You can easily check this with:
cat /proc/cpuinfo
If you still see only one CPU even after you have installed an HT enabled kernel, then you might want to check:
Threading Compilers:
Intel's new HT compiler tools run the gamut of programming applications, Intel says. Version 7.0 of the Intel C++ and Intel Fortran compilers for Windows and Linux can improve the performance of applications for Intel Itanium 2, Intel Xeon, and Intel Pentium 4 processor-based systems by up to 40% compared to compilers currently available from other vendors, Intel claims. Specific to HT, the Version 7.0 Intel compilers include an auto-parallelization option that automatically looks in applications for opportunities to create multiple execution threads, and enhancements to OpenMP, an open standard that enables the use of high-level directives to simplify the creation and management of multi-threaded application software.
Conclusion: Whether you want a desktop or a laptop computer, HT technology is available in most computer types, including servers and workstations. HT lets multi-threaded work run faster and gets higher performance out of the same processor.