A hardware reset or hard reset of a computer system is a hardware operation that re-initializes the core hardware components of the system, thus ending all current software operations in the system. This is typically, but not always, followed by booting of the system into firmware that re-initializes the rest of the system, and restarts the operating system.
Hardware resets are an essential part of the power-on process, but may also be triggered without power cycling the system by direct user intervention via a physical reset button, by watchdog timers, or by software intervention that, as its last action, activates the hardware reset line (e.g., in a fatal error where the computer crashes).
User-initiated hard resets can be used to reset the device if the software hangs, crashes, or is otherwise unresponsive. However, data may become corrupted if this occurs.[1] Generally, a hard reset is initiated by pressing a dedicated reset button, or by holding a combination of buttons on some mobile devices.[2][3] Some devices have no dedicated reset button, but instead have the user hold the power button to cut power, after which the user can turn the computer back on.[4] On some systems (e.g., the PlayStation 2 video game console), pressing and releasing the power button initiates a hard reset, and holding the button turns the system off.
Hardware reset in 80x86 IBM PC
The 8086 microprocessor provides a RESET pin that is used to perform a hardware reset. When a HIGH level is applied to this pin, the CPU immediately stops and sets the major registers to these values:
Register | Value |
---|---|
CS (Code Segment) | 0xFFFF |
DS (Data Segment) | 0x0000 |
ES (Extra Data Segment) | 0x0000 |
SS (Stack Segment) | 0x0000 |
IP (Instruction Pointer) | 0x0000 |
The CPU uses the values of the CS and IP registers to find the location of the next instruction to execute. The location of the next instruction is calculated using this simple equation:
Location of next instruction = (CS<<4) + (IP)
This implies that after the hardware reset, the CPU will start execution at the physical address 0xFFFF0. In IBM PC compatible computers, this address maps to BIOS ROM. The memory word at 0xFFFF0 usually contains a JMP instruction that redirects the CPU to execute the initialization code of the BIOS. This JMP instruction is therefore the very first instruction executed after reset.[5]
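As a quick illustration, the address arithmetic can be reproduced in a few lines of C; this is a sketch of the calculation only, using the reset values from the table above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Register values after an 8086 hardware reset (from the table above). */
    uint16_t cs = 0xFFFF;   /* Code Segment        */
    uint16_t ip = 0x0000;   /* Instruction Pointer */

    /* Physical address = (CS << 4) + IP, as in the equation above. */
    uint32_t physical = ((uint32_t)cs << 4) + ip;

    printf("First fetch after reset: 0x%05X\n", physical);   /* prints 0xFFFF0 */
    return 0;
}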
Hardware reset in later x86 CPUs
Later x86 processors reset the CS and IP registers similarly; see Reset vector.
References
- Mazidi, Mohamed Ali; Mazidi, Janice Gillispie. The 80x86 IBM PC and Compatible Computers, Volumes I & II (4th ed.), Section 9.1, p. 241.
https://en.wikipedia.org/wiki/Hardware_reset
A power-on reset (PoR, POR) generator is a microcontroller or microprocessor peripheral that generates a reset signal when power is applied to the device. It ensures that the device starts operating in a known state.
PoR generator
In VLSI devices, the power-on reset (PoR) is an electronic device incorporated into the integrated circuit that detects the power applied to the chip and generates a reset pulse that propagates to the entire circuit, placing it in a known state.
A simple PoR uses the charging of a capacitor, in series with a resistor, to measure a time period during which the rest of the circuit is held in a reset state. A Schmitt trigger may be used to deassert the reset signal cleanly, once the rising voltage of the RC network passes the threshold voltage of the Schmitt trigger. The resistor and capacitor values should be determined so that the charging of the RC network takes long enough that the supply voltage will have stabilised by the time the threshold is reached.
One issue with using an RC network to generate the PoR pulse is the sensitivity of the R and C values to the power-supply ramp characteristics. When the power-supply ramp is rapid, the R and C values can be chosen so that the time to reach the switching threshold of the Schmitt trigger produces a sufficiently long reset pulse. When the power-supply ramp itself is slow, however, the RC network tends to charge up along with the ramp, so by the time the input Schmitt stage is powered up and ready, the input voltage from the RC network may already have crossed the trigger point. This means that no reset pulse might be supplied to the core of the VLSI.
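To get a feel for the timing, the charging equation can be evaluated with a short C sketch; the component values and threshold below are assumed for illustration only, not taken from any datasheet:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Assumed example values, not from any specific design. */
    double R   = 100e3;   /* 100 kOhm series resistor               */
    double C   = 1e-6;    /* 1 uF capacitor                         */
    double Vdd = 3.3;     /* supply voltage, assumed already stable */
    double Vth = 1.6;     /* Schmitt trigger rising threshold       */

    /* Capacitor charging: v(t) = Vdd * (1 - exp(-t / (R * C))).
       Solving v(t) = Vth for t gives the reset-pulse duration. */
    double t = -R * C * log(1.0 - Vth / Vdd);

    printf("Reset deasserts after about %.1f ms\n", t * 1e3);   /* ~66.3 ms */
    return 0;
}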
Power-on reset on IBM mainframes
On an IBM mainframe, a power-on reset (POR) is a sequence of actions that the processor performs either due to a POR request from the operator or as part of turning on power. The operator requests a POR for configuration changes that cannot be recognized by a simple System Reset.
https://en.wikipedia.org/wiki/Power-on_reset
In computing, the reset vector is the default location a central processing unit will go to find the first instruction it will execute after a reset. The reset vector is a pointer or address where the CPU should always begin as soon as it is able to execute instructions. The address is in a section of non-volatile memory initialized to contain instructions to start the operation of the CPU, as the first step in the process of booting the system containing the CPU.[citation needed]
Processors
- The reset vector for the 8086 processor is at physical address FFFF0h (16 bytes below 1 MB). The value of the CS register at reset is FFFFh and the value of the IP register at reset is 0000h to form the segmented address FFFFh:0000h, which maps to physical address FFFF0h.[1]
- The reset vector for the 80286 processor is at physical address FFFFF0h (16 bytes below 16 MB). The value of the CS register at reset is F000h with the descriptor base set to FF0000h and the value of the IP register at reset is FFF0h to form the segmented address F000h:FFF0h, which maps to physical address FFFFF0h in real mode.[2] This was changed to allow sufficient space to switch to protected mode without modifying the CS register.[3]
- The reset vector for the 80386 and later x86 processors is physical address FFFFFFF0h (16 bytes below 4 GB). The value of the selector portion of the CS register at reset is F000h, the value of the base portion of the CS register is FFFF0000h, and the value of the IP register at reset is FFF0h[4] to form the segmented address F000h:FFF0h (with CS base FFFF0000h), which maps to the physical address FFFFFFF0h in real mode.[5][6]
- The reset vector for PowerPC/Power ISA processors is at an effective address of 0x00000100 for 32-bit processors and 0x0000000000000100 for 64-bit processors.
- The reset vector for m68k architecture processors is 0x0 for the initial interrupt stack register (IISR; not really a reset vector, but used to initialize the stack pointer after reset) and 0x4 for the initial program counter (reset).[7]
- The reset vector for SPARC version 8 processors is at an address of 0x00;[8] the reset vector for SPARC version 9 processors is at an address of 0x20 for power-on reset, 0x40 for watchdog reset, 0x60 for externally initiated reset, and 0x80 for software-initiated reset.[9]
- The reset vector for MIPS32 processors is at virtual address 0xBFC00000,[10] which is located in the last 4 Mbytes of the KSEG1 non-cacheable region of memory.[11] The core enters kernel mode both at reset and when an exception is recognized, and is hence able to map the virtual address to a physical address.[12]
- The reset vector for the ARM family of processors is address 0x0[13] or 0xFFFF0000. During normal execution RAM is re-mapped to this location to improve performance, compared to the original ROM-based vector table.[14]
References
- "…the 286 begins execution in real mode with the instruction at physical location FFFFF0H."
- "After reset, CS:IP = F000:FFF0 on the iAPX 286. This change was made to allow sufficient code space to enter protected mode without reloading CS."
- "Execution begins with the instruction addressed by the initial contents of the CS and IP registers. To allow the initialization software to be placed in a ROM at the top of the address space, the high 12 bits of addresses issued for the code segment are set, until the first instruction which loads the CS register, such as a far jump or call. As a result, instruction fetching begins from address 0FFFFFFF0H."
- "The first instruction that is fetched and executed following a hardware reset is located at physical address FFFFFFF0h. This address is 16 bytes below the processor's uppermost physical address. The EPROM containing the software-initialization code must be located at this address."
- "Boot sequence for an ARM based embedded system -2 - DM". www.embeddedrelated.com. Retrieved 2017-11-10.
https://en.wikipedia.org/wiki/Reset_vector
A power-on self-test (POST) is a process performed by firmware or software routines immediately after a computer or other digital electronic device is powered on.[1]
This article mainly deals with POSTs on personal computers, but many other embedded systems such as those in major appliances, avionics, communications, or medical equipment also have self-test routines which are automatically invoked at power-on.[2]
The results of the POST may be displayed on a panel that is part of the device, output to an external device, or stored for future retrieval by a diagnostic tool. Since a self-test might detect that the system's usual human-readable display is non-functional, an indicator lamp or a speaker may be provided to show error codes as a sequence of flashes or beeps. In addition to running tests, the POST process may also set the initial state of the device from firmware.
In the case of a computer, the POST routines are part of a device's pre-boot sequence; if they complete successfully, the bootstrap loader code is invoked to load an operating system.
https://en.wikipedia.org/wiki/Power-on_self-test
Control-Alt-Delete (often abbreviated to Ctrl+Alt+Del and sometimes called the "three-finger salute" or "Security Keys")[1][2] is a computer keyboard command on IBM PC compatible computers, invoked by pressing the Delete key while holding the Control and Alt keys: Ctrl+Alt+Delete. The function of the key combination differs depending on the context, but it generally interrupts or facilitates interrupting a function. For instance, in a pre-boot environment (before an operating system starts)[3][4][5] or in MS-DOS, Windows 3.0 and earlier versions of Windows, or OS/2, the key combination reboots the computer. Starting with Windows 95, the key combination invokes a task manager or security-related component that facilitates ending a Windows session or killing a frozen application.
https://en.wikipedia.org/wiki/Control-Alt-Delete
An index register in a computer's CPU is a processor register (or an assigned memory location)[1] used for pointing to operand addresses during the run of a program. It is useful for stepping through strings and arrays. It can also be used for holding loop iterations and counters. In some architectures it is used for reading/writing blocks of memory. Depending on the architecture, it may be a dedicated index register or a general-purpose register.[2] Some instruction sets allow more than one index register to be used; in that case additional instruction fields may specify which index registers to use.[3]
Generally, the contents of an index register are added to (or in some cases subtracted from) an immediate address (that can be part of the instruction itself or held in another register) to form the "effective" address of the actual data (operand). Special instructions are typically provided to test the index register and, if the test fails, to increment the index register by an immediate constant and branch, typically to the start of the loop. While processors that allow an instruction to specify multiple index registers normally add the contents together, IBM had a line of computers in which the contents were ORed together.[4]
Index registers have proved useful for doing vector/array operations and, in commercial data processing, for navigating from field to field within records. In both uses index registers substantially reduced the amount of memory used and increased execution speed.
https://en.wikipedia.org/wiki/Index_register
Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in that architecture identify the operand(s) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.
In computer programming, addressing modes are primarily of interest to those who write in assembly languages and to compiler writers. For a related concept see orthogonal instruction set which deals with the ability of any instruction to use any addressing mode.
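As a rough illustration, the following C toy model mimics how a few common modes compute an effective address; the mode names, sizes, and constants are ours, not taken from any particular instruction set:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t mem[256] = {0};   /* a tiny stand-in for main memory   */
    uint8_t base  = 16;       /* contents of a base register       */
    uint8_t index = 3;        /* contents of an index register     */
    uint8_t disp  = 4;        /* displacement from the instruction */

    uint8_t ea_absolute = disp;               /* absolute: address given in the instruction */
    uint8_t ea_based    = base + disp;        /* base register + displacement               */
    uint8_t ea_indexed  = base + index * 4;   /* base + scaled index (4-byte elements)      */

    mem[ea_based] = 32;                       /* plant a pointer in memory for the next mode */
    uint8_t ea_indirect = mem[ea_based];      /* memory indirect: EA is itself fetched from memory */

    printf("absolute=%u based=%u indexed=%u indirect=%u\n",
           ea_absolute, ea_based, ea_indexed, ea_indirect);
    return 0;
}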
https://en.wikipedia.org/wiki/Addressing_mode#Memory_indirect
History
In early computers without any form of indirect addressing, array operations had to be performed by modifying the instruction address, which required several additional program steps and used up more computer memory,[5] a scarce resource in computer installations of the early era (as well as in early microcomputers two decades later).
Index registers, commonly known as B-lines in early British computers, as B-registers on some machines and as X-registers[a] on others, were first used in the British Manchester Mark 1 computer, in 1949. In general, index registers became a standard part of computers during the technology's second generation, roughly 1954–1966. Most[b] machines in the IBM 700/7000 mainframe series had them, starting with the IBM 704 in 1954, though they were optional on some smaller machines such as the IBM 650 and IBM 1401.
Early "small machines" with index registers include the AN/USQ-17, around 1960, and the 9 series of real-time computers from Scientific Data Systems, from the early 1960s.
The 1962 UNIVAC 1107 has 15 X-registers, four of which are also A-registers.
The 1964 GE-635 has 8 dedicated X-registers; however, it also allows indexing by the instruction counter or by either half of the A or Q register.
The Digital Equipment Corporation (DEC) PDP-6, introduced in 1964, and the IBM System/360, announced in 1964, do not include dedicated index registers; instead, they have general-purpose registers (called "accumulators" in the PDP-6) that can contain either numerical values or addresses. The memory address of an operand is, in the PDP-6, the sum of the contents of a general-purpose register and an 18-bit offset and, on the System/360, the sum of the contents of two general-purpose registers and a 12-bit offset.[6][7] The compatible PDP-10 line of successors to the PDP-6, and the IBM System/370 and later compatible successors to the System/360, including the current z/Architecture, work in the same fashion.
The 1969 Data General Nova and its successor the Eclipse, as well as the 1970 DEC PDP-11, were minicomputers that also provided general-purpose registers (called "accumulators" in the Nova and Eclipse), rather than separate accumulators and index registers, as did their Eclipse MV and VAX 32-bit superminicomputer successors. In the PDP-11 and VAX, all registers could be used when calculating the memory address of an operand; in the Nova, Eclipse, and Eclipse MV, only registers 2 and 3 could be used.[8][9][10]
The 1971 CDC STAR-100 has a register file of 256 64-bit registers, 9 of which are reserved. Unlike most computers, the STAR-100 instructions only have register fields and operand fields, so the registers serve more as pointer registers than as traditional index registers.
While the Intel 8080 allowed indirect addressing via a register, the first microprocessor with a true index register appears to have been the 1974 Motorola 6800.
In 1975, the 8-bit MOS Technology 6502 processor had two index registers 'X' and 'Y'.[11]
In 1978, the Intel 8086, the first x86 processor, had eight 16-bit registers, referred to as "general-purpose", all of which can be used as integer data registers in most operations; four of them, 'SI' (source index), 'DI' (destination index), 'BX' (base), and 'BP' (base pointer), can also be used when computing the memory address of an operand, which is the sum of one of those registers and a displacement, or the sum of one of 'BX' or 'BP', one of 'SI' or 'DI', and a displacement.[12] The 1979 Intel 8088, and the 16-bit Intel 80186, Intel 80188, and Intel 80286 successors work the same. In 1985, the i386, a 32-bit successor to those processors that introduced the IA-32 32-bit version of the x86 architecture, extended the eight 16-bit registers to 32 bits, with "E" added to the beginning of the register name; in IA-32, the memory address of an operand is the sum of one of those eight registers, one of seven of those registers (the stack pointer is not allowed as the second register here) multiplied by a power of 2 between 1 and 8, and a displacement.[13]: 3-11–3-12, 3-22–3-23 The Advanced Micro Devices Opteron, the first model of which was released in 2003, introduced x86-64, the 64-bit version of the x86 instruction set; in x86-64, the general-purpose registers were extended to 64 bits, and eight additional general-purpose registers were added; the memory address of an operand is the sum of two of those 16 registers and a displacement.[14][13]: 3–12, 3–24
The reduced instruction set computing (RISC) instruction sets introduced in the 1980s and 1990s all provide general-purpose registers that can contain either numerical values or address values. In most of those instruction sets, there are 32 general-purpose registers (in some of those instruction sets, the value of one of those registers is hardwired to zero) that can be used to calculate the operand address; there are no dedicated index registers. In the 32-bit version of the ARM architecture, first developed in 1985, there are only 16 registers designated as "general-purpose registers", but only 13 of them can be used for all purposes, with register R15 containing the program counter. The memory address of a load or store instruction is the sum of any of the 16 registers and either a displacement or another of the registers except R15 (possibly shifted left for scaling).[15] In the 64-bit version of the ARM architecture, there are 31 64-bit general-purpose registers plus a stack pointer and a zero register; the memory address of a load or store instruction is the sum of any of the 31 registers and either a displacement or another of the registers.[16]
Examples
Here is a simple example of index register use in assembly language pseudo-code that sums a 100 entry array of 4-byte words:
Clear_accumulator
Load_index 400, index2                 // load 4*array size into index register 2 (index2)
loop_start:
    Add_word_to_accumulator array_start, index2                   // add to AC the word at address (array_start + index2)
    Branch_and_decrement_if_index_not_zero loop_start, 4, index2  // loop, decrementing by 4, until the index register is zero
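For comparison, a C rendering of the same loop might look like the sketch below, assuming array_start points at the first of the 100 words; the decrementing byte offset stands in for the index register:

#include <stdint.h>

int32_t sum_100_words(const int32_t *array_start)
{
    int32_t accumulator = 0;

    /* Step the byte offset down by 4 from the last word (offset 396) to 0,
       mirroring Branch_and_decrement_if_index_not_zero above. */
    for (int32_t index2 = 4 * 100 - 4; index2 >= 0; index2 -= 4)
        accumulator += *(const int32_t *)((const uint8_t *)array_start + index2);

    return accumulator;
}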
Notes
- The 702, 705 and 7080 did not have index registers.
References
- Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile. Arm. 2022. pp. C1-227, C3-252.
https://en.wikipedia.org/wiki/Index_register
In digital electronics, especially computing, hardware registers are circuits typically composed of flip-flops, often with many characteristics similar to memory, such as:[citation needed]
- The ability to read or write multiple bits at a time, and
- Using an address to select a particular register in a manner similar to a memory address.
Their distinguishing characteristic, however, is that they also have special hardware-related functions beyond those of ordinary memory. So, depending on the point of view, hardware registers are like memory with additional hardware-related functions; or, memory circuits are like hardware registers that just store data.[citation needed]
Hardware registers are used in the interface between software and peripherals. Software writes them to send information to the device, and reads them to get information from the device. Some hardware devices also include registers that are not visible to software, for their internal use.
Depending on their complexity, modern hardware devices can have many registers. Standard integrated circuits typically document their externally-exposed registers as part of their electronic component datasheet.
Functionality
Typical uses of hardware registers include:
- configuration and start-up of certain features, especially during initialization
- buffer storage e.g. video memory for graphics cards
- input/output (I/O) of different kinds
- status reporting such as whether a certain event has occurred in the hardware unit, for example a modem status register or a line status register.[1]
Reading a hardware register in "peripheral units" — computer hardware outside the CPU — involves accessing its memory-mapped I/O address or port-mapped I/O address with a "load" or "store" instruction, issued by the processor. Hardware registers are addressed in words, but sometimes only use a few bits of the word read into, or written out to, the register.
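A minimal C sketch of such an access follows; the register addresses and status bit here are hypothetical placeholders for values that a real datasheet would provide:

#include <stdint.h>

/* Hypothetical memory-mapped UART registers. */
#define UART_DATA       (*(volatile uint8_t *)0x10000010u)
#define UART_STATUS     (*(volatile uint8_t *)0x10000014u)
#define STATUS_TX_READY 0x01u   /* assumed "transmitter ready" bit */

void uart_putc(uint8_t c)
{
    /* 'volatile' forces a real load on every read: the register's value
       is changed by the hardware, not by this program. */
    while ((UART_STATUS & STATUS_TX_READY) == 0)
        ;                 /* busy-wait until the device is ready   */
    UART_DATA = c;        /* a store here is a write to the device */
}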
Commercial design tools simplify and automate memory-mapped register specification and code generation for hardware, firmware, hardware verification, testing and documentation.
Registers can be read/write, read-only or write-only.
Write-only registers are generally avoided. They are suitable for registers that cause a transient action when written but store no persistent data to be read, such as a 'reset a peripheral' register. They may be the only option in designs that cannot afford gates for the relatively large logic circuit and signal routing needed for register data readback, such as the Atari 2600 games console's TIA chip. However, write-only registers make debugging more difficult[2] and lead to the read-modify-write problem, so read/write registers are preferred. On PCs, write-only registers made it difficult for the Advanced Configuration and Power Interface (ACPI) to determine the device's state when entering sleep mode in order to restore that state when exiting sleep mode.[3]
Register varieties
The hardware registers inside a central processing unit (CPU) are called processor registers.
Strobe registers have the same interface as normal hardware registers, but instead of storing data, they trigger an action each time they are written to (or, in rare cases, read from). They are a means of signaling.
Registers are normally measured by the number of bits they can hold, for example, an "8-bit register" or a "32-bit register".
Designers can implement registers in a wide variety of ways, including:
- register files
- standard SRAM
- individual flip-flops
- high-speed core memory
In addition to the "programmer-visible" registers that can be read and written with software, many chips have internal microarchitectural registers that are used for state machines and pipelining; for example, registered memory.
Standards
SPIRIT IP-XACT and DITA SIDSC XML define standard XML formats for memory-mapped registers.[4][5][6]
See also
References
- "Once the INS 8250 has been properly initialized, we should make proper use of the Modem Status register (MSR), Line Status register (LSR) and the Interrupt Identification register (IIR) for controlling the device during actual operation."
https://en.wikipedia.org/wiki/Hardware_register
A barrel processor is a CPU that switches between threads of execution on every cycle. This CPU design technique is also known as "interleaved" or "fine-grained" temporal multithreading. Unlike simultaneous multithreading in modern superscalar architectures, it generally does not allow execution of multiple instructions in one cycle.
Like preemptive multitasking, each thread of execution is assigned its own program counter and other hardware registers (each thread's architectural state). A barrel processor can guarantee that each thread will execute one instruction every n cycles, unlike a preemptive multitasking machine, which typically runs one thread of execution for tens of millions of cycles while all other threads wait their turn.
A technique called C-slowing can automatically generate a corresponding barrel processor design from a single-tasking processor design. An n-way barrel processor generated this way acts much like n separate multiprocessing copies of the original single-tasking processor, each one running at roughly 1/n the original speed.[citation needed]
History
One of the earliest examples of a barrel processor was the I/O processing system in the CDC 6000 series supercomputers. These executed one instruction (or a portion of an instruction) from each of 10 different virtual processors (called peripheral processors) before returning to the first processor.[1] The peripheral processors are collectively implemented as a barrel processor: each executes routines independently of the others, and they are a loose predecessor of bus mastering or direct memory access.
One motivation for barrel processors was to reduce hardware costs. In the case of the CDC 6x00 PPUs, the digital logic of the processor was much faster than the core memory, so rather than having ten separate processors, there were ten separate core memory units for the PPUs, all sharing the single set of processor logic.
Another example is the Honeywell 800, which had 8 groups of registers, allowing up to 8 concurrent programs. After each instruction, the processor would (in most cases) switch to the next active program in sequence.[2]
Barrel processors have also been used as large-scale central processors. The Tera MTA (1988) was a large-scale barrel processor design with 128 threads per core.[3][4] The MTA architecture has seen continued development in successive products, such as the Cray Urika-GD, originally introduced in 2012 (as the YarcData uRiKA) and targeted at data-mining applications.[5]
Barrel processors are also found in embedded systems, where they are particularly useful for their deterministic real-time thread performance.
An example is the XMOS XCore XS1 (2007), a four-stage barrel processor with eight threads per core. (Newer processors from XMOS also have the same type of architecture.) The XS1 is found in Ethernet, USB, audio, and control devices, and other applications where I/O performance is critical. When the XS1 is programmed in the 'XC' language, software-controlled direct memory access may be implemented.
Barrel processors have also been used in specialized devices such as the eight-thread Ubicom IP3023 network I/O processor (2004). Some 8-bit microcontrollers by Padauk Technology feature barrel processors with up to 8 threads per core.
Comparison with single-threaded processors
Advantages
A single-tasking processor spends a lot of time idle, not doing anything useful whenever a cache miss or pipeline stall occurs. Advantages to employing barrel processors over single-tasking processors include:
- The ability to do useful work on the other threads while the stalled thread is waiting.
- Designing an n-way barrel processor with an n-deep pipeline is much simpler than designing a single-tasking processor because a barrel processor never has a pipeline stall and doesn't need feed-forward circuits.
- For real-time applications, a barrel processor can guarantee that a "real-time" thread can execute with precise timing, no matter what happens to the other threads, even if some other thread locks up in an infinite loop or is continuously interrupted by hardware interrupts.
Disadvantages
There are a few disadvantages to barrel processors.
- The state of each thread must be kept on-chip, typically in registers, to avoid costly off-chip context switches. This requires a large number of registers compared to typical processors.
- Either all threads must share the same cache, which slows overall system performance, or there must be one unit of cache for each execution thread, which can significantly increase the transistor count and thus the cost of such a CPU. However, in hard real-time embedded systems where barrel processors are often found, memory access costs are typically calculated assuming worst-case cache behavior, so this is a minor concern.[citation needed] Some barrel processors such as the XMOS XS1 do not have a cache at all.
See also
- Super-threading
- Computer multitasking
- Simultaneous multithreading (SMT)
- Hyper-threading
- Vector processor
- Cray XMT
References
- "Cray's YarcData division launches new big data graph appliance" (Press release). Seattle, WA and Santa Clara, CA: Cray Inc. February 29, 2012. Archived from the original on 2017-03-18. Retrieved 2017-08-24.
External links
- Soft peripherals Embedded.com article examines Ubicom's IP3023 processor
- An Evaluation of the Design of the Gamma 60
- Histoire et architecture du Gamma 60 (French and English)
https://en.wikipedia.org/wiki/Barrel_processor
The address generation unit (AGU), sometimes also called address computation unit (ACU),[1] is an execution unit inside central processing units (CPUs) that calculates addresses used by the CPU to access main memory. By having address calculations handled by separate circuitry that operates in parallel with the rest of the CPU, the number of CPU cycles required for executing various machine instructions can be reduced, bringing performance improvements.[2][3]
While performing various operations, CPUs need to calculate memory addresses required for fetching data from the memory; for example, in-memory positions of array elements must be calculated before the CPU can fetch the data from actual memory locations. Those address-generation calculations involve different integer arithmetic operations, such as addition, subtraction, modulo operations, or bit shifts. Often, calculating a memory address involves more than one general-purpose machine instruction, which do not necessarily decode and execute quickly. By incorporating an AGU into a CPU design, together with introducing specialized instructions that use the AGU, various address-generation calculations can be offloaded from the rest of the CPU, and can often be executed quickly in a single CPU cycle.[2][3]
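As a small illustration, the following C sketch shows the kind of arithmetic being offloaded: computing the in-memory position of an array element from a base address and an index (the 4-byte element size and the names are assumed for the example):

#include <stdint.h>

/* Address of element i in an array of 4-byte elements starting at 'base'.
   An AGU can typically fold the shift and the add into a single cycle. */
uint64_t element_address(uint64_t base, uint64_t i)
{
    return base + (i << 2);   /* base + i * 4 */
}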
Capabilities of an AGU depend on a particular CPU and its architecture. Thus, some AGUs implement and expose more address-calculation operations, while some also include more advanced specialized instructions that can operate on multiple operands at a time.[2][3] Furthermore, some CPU architectures include multiple AGUs so more than one address-calculation operation can be executed simultaneously, bringing further performance improvements by capitalizing on the superscalar nature of advanced CPU designs. For example, Intel incorporates multiple AGUs into its Sandy Bridge and Haswell microarchitectures, which increase bandwidth of the CPU memory subsystem by allowing multiple memory-access instructions to be executed in parallel.[4][5][6]
See also
- Arithmetic logic unit (ALU) – a digital circuit that performs arithmetic and bitwise logical operations on integer binary numbers
- Floating-point unit (FPU) – the same as ALU but for floating-point numbers
- Load–store unit
- Bulldozer (microarchitecture) – another CPU microarchitecture that includes multiple AGUs, developed by AMD
- Register renaming – a technique that reuses CPU registers and avoids unnecessary serialization of program operations
- Reservation station – a CPU feature that allows results of various operations to be used while bypassing CPU registers
- Execution unit
References
- Per Hammarlund (August 2013). "Fourth-Generation Intel Core Processor, codenamed Haswell" (PDF). hotchips.org. p. 25. Retrieved December 8, 2014.
External links
- Address generation unit in the Motorola DSP56K family, June 2003, Motorola
- Address generation unit in DSP applications, September 2013, by Andreas Ehliar
- Computer Science from the Bottom Up, Chapter 3. Computer Architecture, September 2013, by Ian Wienand
https://en.wikipedia.org/wiki/Address_generation_unit
In computing, autonomous peripheral operation is a hardware feature found in some microcontroller architectures to off-load certain tasks into embedded autonomous peripherals in order to minimize latencies and improve throughput in hard real-time applications as well as to save energy in ultra-low-power designs.
https://en.wikipedia.org/wiki/Autonomous_peripheral_operation
In computer architecture, frequency scaling (also known as frequency ramping) is the technique of increasing a processor's frequency so as to enhance the performance of the system containing the processor in question. Frequency ramping was the dominant force in commodity processor performance increases from the mid-1980s until roughly the end of 2004.
The effect of processor frequency on computer speed can be seen by looking at the equation for computer program runtime:

runtime per program = instructions per program × cycles per instruction × time per cycle

where instructions per program is the total instructions being executed in a given program, cycles per instruction is a program-dependent, architecture-dependent average value, and time per cycle is by definition the inverse of processor frequency.[1] An increase in frequency thus decreases runtime.
However, power consumption in a chip is given by the equation

P = C × V² × F

where P is power consumption, C is the capacitance being switched per clock cycle, V is voltage, and F is the processor frequency (cycles per second).[2] Increases in frequency thus increase the amount of power used in a processor. Increasing processor power consumption led ultimately to Intel's May 2004 cancellation of its Tejas and Jayhawk processors, which is generally cited as the end of frequency scaling as the dominant computer architecture paradigm.[3]
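Both equations can be seen in action in a short C sketch with purely illustrative numbers:

#include <stdio.h>

int main(void)
{
    /* Illustrative values only. */
    double instructions = 1e9;    /* instructions per program     */
    double cpi          = 1.5;    /* cycles per instruction       */
    double f            = 2e9;    /* processor frequency in hertz */

    double runtime = instructions * cpi / f;   /* runtime equation above: 0.75 s */

    double c = 1e-9;              /* capacitance switched per cycle, farads */
    double v = 1.2;               /* supply voltage, volts                  */
    double power = c * v * v * f; /* P = C * V^2 * F: 2.88 W                */

    printf("runtime = %.2f s, power = %.2f W\n", runtime, power);
    return 0;
}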
Moore's Law was[4] still in effect when frequency scaling ended. Despite power issues, transistor densities were still doubling every 18 to 24 months. With the end of frequency scaling, new transistors (which are no longer needed to facilitate frequency scaling) are used to add extra hardware, such as additional cores, to facilitate parallel computing, a technique referred to as parallel scaling.
The end of frequency scaling as the dominant cause of processor performance gains has caused an industry-wide shift to parallel computing in the form of multicore processors.
See also
References
- "Moore's law really is dead this time". 11 February 2016.
https://en.wikipedia.org/wiki/Frequency_scaling
In a computer instruction set architecture (ISA), an execute instruction is a machine language instruction which treats data as a machine instruction and executes it.
It can be considered a fourth mode of instruction sequencing after ordinary sequential execution, branching, and interrupting.[1] Since it is an instruction that operates on other instructions like the repeat instruction, it has also been classified as a meta-instruction.[2]
Computer models
Many computer families introduced in the 1950s and 1960s include execute instructions: the IBM 709[1] and IBM 7090 (op code mnemonic: XEC),[3] the IBM 7030 Stretch (EX, EXIC),[4][1] the PDP-1/-4/-9/-15 (XCT),[5][6] the UNIVAC 1100/2200 (EXRI),[7] the CDC 924 (XEC),[8] the PDP-6/-10 (XCT), the IBM System/360 (EX),[9] the GE-600/Honeywell 6000 (XEC, XED),[10] and the SDS-9xx (EXU).[11][12]
Fewer 1970s designs include execute instructions: the Nuclear Data 812 minicomputer (1971) (XCT),[13] the HP 3000 (1972) (XEQ),[14] and the Texas Instruments TI-990 (1975)[15] and its microprocessor version, the TMS9900 (1976) (X).[16] An execute instruction was proposed for the PDP-11 in 1970,[17] but never implemented for it[18] or its successor, the VAX.[19]
Modern instruction sets do not include execute instructions because they interfere with pipelining, prefetching, and other optimizations.[citation needed]
Semantics
The instruction to be executed, the target instruction, may be in a register or fetched from memory. Some architectures allow the target instruction to itself be an execute instruction; others do not.
The target instruction is executed as if it were in the memory location of the execute instruction. If, for example, it is a subroutine call instruction, execution is transferred to the subroutine, with the return location being the location after the execute instruction. However, some architectures implement variants of the execute instruction which inhibit branches.[1]
The System/360 supports variable-length target instructions. It also supports modifying the target instruction before executing it. The target instruction must start on an even-numbered byte.[9]
The GE-600 series supports execution of two-instruction sequences, which must be doubleword-aligned.[10]
Some architectures support an execute instruction which operates in a different protection and address relocation mode. For example, the ITS PDP-10 paging device supports a privileged-mode XCTR 'execute relocated' instruction which allows memory reads, writes, or both to use the user-mode page mappings.[20] Similarly, the KL10 variant of the PDP-10 supports the privileged instruction PXCT 'previous context XCT'.[21]
The execute instruction can cause several problems when one execute instruction points to another one and so on:
- the processor may be uninterruptible for multiple clock cycles if the execute instruction cannot be interrupted in the middle of execution;
- similarly, the processor may go into an infinite loop if the series of execute instructions is circular and uninterruptible;
- if the execute instructions are on different swap pages, all of the pages need to be swapped in for the instruction to complete, which can cause thrashing.
Similar issues arise with multilevel indirect addressing modes.
Applications
The execute instruction has several applications:[1]
- Functioning as a single-instruction subroutine without the usual overhead of subroutine calls; that instruction may call a full subroutine if necessary.[1]
- Late binding
- Implementation of call by name and other thunks.[1]
- A table of execute targets may be used for dynamic dispatch of the methods or virtual functions of an object or class, especially when the method or function may often be implementable as a single instruction.[18]
- An execute target may contain a hook for adding functionality or for debugging; it is normally initialized as a NOP which may be overridden dynamically.
- An execute target may change between a fast version of an operation and a fully traced version.[22][23][24]
- Tracing, monitoring, and emulation
- This may maintain a pseudo-program counter, leaving the normal program counter unchanged.[1]
- Executing dynamically generated code, especially when memory protection prevents executable code from being writable.
- Emulating self-modifying code, especially when it must be reentrant or read-only.[17]
- In the IBM System/360, the execute instruction can modify bits 8-15 of the target instruction, effectively turning an instruction with a fixed argument (e.g., a length field) into an instruction with a variable argument.
- Privileged-mode execute instructions as on the KL10 are used by operating system kernels to execute operations such as block copies within the virtual space of user processes.
Notes
- Moon, David A. (April 1974). Maclisp Reference Manual (PDF). Revision 0. p. 181.
https://en.wikipedia.org/wiki/Execute_instruction
A page, memory page, or virtual page is a fixed-length contiguous block of virtual memory, described by a single entry in the page table. It is the smallest unit of data for memory management in a virtual memory operating system. Similarly, a page frame is the smallest fixed-length contiguous block of physical memory into which memory pages are mapped by the operating system.[1][2][3]
A transfer of pages between main memory and an auxiliary store, such as a hard disk drive, is referred to as paging or swapping.[4]
Explanation
Computer memory is divided into pages so that information can be found more quickly.
The concept is named by analogy to the pages of a printed book. If a reader wanted to find, for example, the 5,000th word in the book, they could count from the first word. This would be time-consuming. It would be much faster if the reader had a listing of how many words are on each page. From this listing they could determine which page the 5,000th word appears on, and how many words to count on that page. This listing of the words per page of the book is analogous to a page table of a computer's virtual memory system.[5]
Page size
Page size trade-off
Page size is usually determined by the processor architecture. Traditionally, pages in a system had uniform size, such as 4,096 bytes. However, processor designs often allow two or more, sometimes simultaneous, page sizes because of the benefits each brings. There are several points that can factor into choosing the best page size.[6]
Page table size
A system with a smaller page size uses more pages, requiring a page table that occupies more space. For example, if a 2³²-byte virtual address space is mapped to 4 KiB (2¹² bytes) pages, the number of virtual pages is 2²⁰ = (2³² / 2¹²). However, if the page size is increased to 32 KiB (2¹⁵ bytes), only 2¹⁷ pages are required. A multi-level paging algorithm can decrease the memory cost of allocating a large page table for each process by further dividing the page table up into smaller tables, effectively paging the page table.
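The arithmetic of this example can be checked with a few lines of C:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Number of virtual pages = address-space size / page size. */
    uint64_t vspace      = 1ULL << 32;            /* 2^32-byte virtual address space */
    uint64_t pages_4kib  = vspace / (1ULL << 12); /* 4 KiB pages  -> 2^20 pages */
    uint64_t pages_32kib = vspace / (1ULL << 15); /* 32 KiB pages -> 2^17 pages */

    printf("4 KiB pages:  %llu page-table entries\n", (unsigned long long)pages_4kib);
    printf("32 KiB pages: %llu page-table entries\n", (unsigned long long)pages_32kib);
    return 0;
}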
TLB usage
Since every access to memory must be mapped from virtual to physical address, reading the page table every time can be quite costly. Therefore, a very fast kind of cache, the translation lookaside buffer (TLB), is often used. The TLB is of limited size, and when it cannot satisfy a given request (a TLB miss) the page tables must be searched manually (either in hardware or software, depending on the architecture) for the correct mapping. Larger page sizes mean that a TLB cache of the same size can keep track of larger amounts of memory, which avoids the costly TLB misses.
Internal fragmentation
Rarely do processes require the use of an exact number of pages. As a result, the last page will likely only be partially full, wasting some amount of memory. Larger page sizes lead to a large amount of wasted memory, as more potentially unused portions of memory are loaded into the main memory. Smaller page sizes ensure a closer match to the actual amount of memory required in an allocation.
As an example, assume the page size is 1024 KiB. If a process allocates 1025 KiB, two pages must be used, resulting in 1023 KiB of unused space (where one page fully consumes 1024 KiB and the other only 1 KiB).
Disk access
When transferring from a rotational disk, much of the delay is caused by seek time, the time it takes to correctly position the read/write heads above the disk platters. Because of this, large sequential transfers are more efficient than several smaller transfers. Transferring the same amount of data from disk to memory often requires less time with larger pages than with smaller pages.
Getting page size programmatically
Most operating systems allow programs to discover the page size at runtime. This allows programs to use memory more efficiently by aligning allocations to this size and reducing overall internal fragmentation of pages.
Unix and POSIX-based operating systems
Unix and POSIX-based systems may use the system function sysconf(),[7][8][9][10][11] as illustrated in the following example written in the C programming language.
#include <stdio.h>
#include <unistd.h> /* sysconf(3) */
int main(void)
{
printf("The page size for this system is %ld bytes.\n",
sysconf(_SC_PAGESIZE)); /* _SC_PAGE_SIZE is OK too. */
return 0;
}
In many Unix systems, the command-line utility getconf can be used.[12][13][14] For example, getconf PAGESIZE will return the page size in bytes.
Windows-based operating systems
Win32-based operating systems, such as those in the Windows 9x and Windows NT families, may use the system function GetSystemInfo()[15][16] from kernel32.dll.
#include <stdio.h>
#include <windows.h>
int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("The page size for this system is %u bytes.\n", si.dwPageSize);
    return 0;
}
Multiple page sizes
Some instruction set architectures can support multiple page sizes, including pages significantly larger than the standard page size. The available page sizes depend on the instruction set architecture, processor type, and operating (addressing) mode. The operating system selects one or more sizes from the sizes supported by the architecture. Note that not all processors implement all defined larger page sizes. This support for larger pages (known as "huge pages" in Linux, "superpages" in FreeBSD, and "large pages" in Microsoft Windows and IBM AIX terminology) allows for "the best of both worlds", reducing the pressure on the TLB cache (sometimes increasing speed by as much as 15%) for large allocations while still keeping memory usage at a reasonable level for small allocations.
Architecture | Smallest page size | Larger page sizes |
---|---|---|
32-bit x86[18] | 4 KiB | 4 MiB in PSE mode, 2 MiB in PAE mode[19] |
x86-64[18] | 4 KiB | 2 MiB, 1 GiB (only when the CPU has PDPE1GB flag) |
IA-64 (Itanium)[20] | 4 KiB | 8 KiB, 64 KiB, 256 KiB, 1 MiB, 4 MiB, 16 MiB, 256 MiB[19] |
Power ISA[21] | 4 KiB | 64 KiB, 16 MiB, 16 GiB |
SPARC v8 with SPARC Reference MMU[22] | 4 KiB | 256 KiB, 16 MiB |
UltraSPARC Architecture 2007[23] | 8 KiB | 64 KiB, 512 KiB (optional), 4 MiB, 32 MiB (optional), 256 MiB (optional), 2 GiB (optional), 16 GiB (optional) |
ARMv7[24] | 4 KiB | 64 KiB, 1 MiB ("section"), 16 MiB ("supersection") (defined by a particular implementation) |
AArch64[25] | 4 KiB, 16 KiB, 64 KiB | 2 MiB, 32 MiB, 512 MiB, 1 GiB |
RISCV32[26] | 4 KiB | 4 MiB ("megapage") |
RISCV64[26] | 4 KiB | 2 MiB ("megapage"), 1 GiB ("gigapage"), 512 GiB ("terapage", only for CPUs with 43-bit address space or more), 256 TiB ("petapage", only for CPUs with 57-bit address space or more) |
Starting with the Pentium Pro and the AMD Athlon, x86 processors support 4 MiB pages (called Page Size Extension) (2 MiB pages if using PAE) in addition to their standard 4 KiB pages; x86-64 processors, such as AMD's AMD64 processors and Intel's Westmere[27] and later Xeon processors, can use 1 GiB pages in long mode. IA-64 supports as many as eight different page sizes, from 4 KiB up to 256 MiB, and some other architectures have similar features.[specify]
Larger pages, despite being available in the processors used in most contemporary personal computers, are not in common use except in large-scale applications, the applications typically found in large servers and in computational clusters, and in the operating system itself. Commonly, their use requires elevated privileges, cooperation from the application making the large allocation (usually setting a flag to ask the operating system for huge pages), or manual administrator configuration; operating systems commonly, sometimes by design, cannot page them out to disk.
However, SGI IRIX has general-purpose support for multiple page sizes. Each individual process can provide hints and the operating system will automatically use the largest page size possible for a given region of address space.[28] Later work proposed transparent operating system support for using a mix of page sizes for unmodified applications through preemptible reservations, opportunistic promotions, speculative demotions, and fragmentation control.[29]
Linux has supported huge pages on several architectures since the 2.6 series via the hugetlbfs filesystem[30] and without hugetlbfs since 2.6.38.[31] Windows Server 2003 (SP1 and newer), Windows Vista and Windows Server 2008 support huge pages under the name of large pages.[32] Windows 2000 and Windows XP support large pages internally, but do not expose them to applications.[33] Beginning with version 9, Solaris supports large pages on SPARC and x86.[34][35]
FreeBSD 7.2-RELEASE features superpages.[36]
Note that until recently in Linux, applications needed to be modified in order to use huge pages. The 2.6.38 kernel introduced support for transparent use of huge pages.[31] On Linux kernels supporting transparent huge pages, as well as FreeBSD and Solaris, applications take advantage of huge pages automatically, without the need for modification.[36]
See also
- Page fault
- Page table
- Memory paging
- Virtual memory
- Zero page - a memory area (often 256 bytes large[37][38]) at the very start of a processor's address space
- Zero page (CP/M) - a 256-byte[38] data structure at the start of a program
References
- "[…] ROM is further divided into pages, each of which contains 256 bytes. Thus locations 0 through 255 comprise page 0 of ROM, location 256 through 511 comprise page 1 and so on. […] Program random access memory (RAM) is organized exactly like ROM. […]"
- "1. Introduction: Segment Alignment". 8086 Family Utilities - User's Guide for 8080/8085-Based Development Systems (PDF). Revision E (A620/5821 6K DD ed.). Santa Clara, California, USA: Intel Corporation. May 1982 [1980, 1978]. p. 1–6. Order Number: 9800639-04. Archived (PDF) from the original on 2020-02-29. Retrieved 2020-02-29.
Further reading
- Dandamudi, Sivarama P. (2003). Fundamentals of Computer Organization and Design (1st ed.). Springer. pp. 740–741. ISBN 0-387-95211-X.
https://en.wikipedia.org/wiki/Page_(computer_memory)
In computer science, hierarchical protection domains,[1][2] often called protection rings, are mechanisms to protect data and functionality from faults (by improving fault tolerance) and malicious behavior (by providing computer security).
Computer operating systems provide different levels of access to resources. A protection ring is one of two or more hierarchical levels or layers of privilege within the architecture of a computer system. This is generally hardware-enforced by some CPU architectures that provide different CPU modes at the hardware or microcode level. Rings are arranged in a hierarchy from most privileged (most trusted, usually numbered zero) to least privileged (least trusted, usually with the highest ring number). Ring 0 is the level with the most privileges and allows direct interaction with the physical hardware such as certain CPU functionality and chips on the motherboard.
Special call gates between rings are provided to allow an outer ring to access an inner ring's resources in a predefined manner, as opposed to allowing arbitrary usage. Correctly gating access between rings can improve security by preventing programs from one ring or privilege level from misusing resources intended for programs in another. For example, spyware running as a user program in Ring 3 should be prevented from turning on a web camera without informing the user, since hardware access should be a Ring 1 function reserved for device drivers. Programs such as web browsers running in higher numbered rings must request access to the network, a resource restricted to a lower numbered ring.
Implementations
Multiple rings of protection were among the most revolutionary concepts introduced by the Multics operating system, a highly secure predecessor of today's Unix family of operating systems. The GE 645 mainframe computer did have some hardware access control, but that was not sufficient to provide full support for rings in hardware, so Multics supported them by trapping ring transitions in software;[3] its successor, the Honeywell 6180, implemented them in hardware, with support for eight rings.[4] However, most general-purpose systems use only two rings, even if the hardware they run on provides more CPU modes than that. For example, Windows 7 and Windows Server 2008 (and their predecessors) use only two rings, with ring 0 corresponding to kernel mode and ring 3 to user mode,[5] because earlier versions of Windows ran on processors that supported only two protection levels.[6]
Many modern CPU architectures (including the popular Intel x86 architecture) include some form of ring protection, although the Windows NT operating system, like Unix, does not fully utilize this feature. OS/2 does to some extent, using three rings:[7] ring 0 for kernel code and device drivers, ring 2 for privileged code (user programs with I/O access permissions), and ring 3 for unprivileged code (nearly all user programs). Under DOS, the kernel, drivers and applications typically run on ring 3 (however, this is exclusive to the case where protected-mode drivers and/or DOS extenders are used; as a real-mode OS, the system runs with effectively no protection), whereas 386 memory managers such as EMM386 run at ring 0. In addition to this, DR-DOS' EMM386 3.xx can optionally run some modules (such as DPMS) on ring 1 instead. OpenVMS uses four modes called (in order of decreasing privileges) Kernel, Executive, Supervisor and User.
A renewed interest in this design structure came with the proliferation of the Xen VMM software, ongoing discussion on monolithic vs. micro-kernels (particularly in Usenet newsgroups and Web forums), Microsoft's Ring-1 design structure as part of their NGSCB initiative, and hypervisors based on x86 virtualization such as Intel VT-x (formerly Vanderpool).
The original Multics system had eight rings, but many modern systems have fewer. The hardware remains aware of the current ring of the executing instruction thread at all times, with the help of a special machine register. In some systems, areas of virtual memory are instead assigned ring numbers in hardware. One example is the Data General Eclipse MV/8000, in which the top three bits of the program counter (PC) served as the ring register. Thus code executing with the virtual PC set to 0xE200000, for example, would automatically be in ring 7, and calling a subroutine in a different section of memory would automatically cause a ring transfer.
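A C sketch of that scheme follows; the 28-bit address width and the bit positions are inferred from the example above, not taken from Data General documentation:

#include <stdio.h>
#include <stdint.h>

/* Ring number = top three bits of an (assumed) 28-bit program counter. */
unsigned ring_of(uint32_t pc)
{
    return (pc >> 25) & 0x7;
}

int main(void)
{
    printf("ring = %u\n", ring_of(0xE200000));   /* prints 7, matching the example */
    return 0;
}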
The hardware severely restricts the ways in which control can be passed from one ring to another, and also enforces restrictions on the types of memory access that can be performed across rings. Using x86 as an example, there is a special[clarification needed] gate structure which is referenced by the call instruction that transfers control in a secure way[clarification needed] towards predefined entry points in lower-level (more trusted) rings; this functions as a supervisor call in many operating systems that use the ring architecture. The hardware restrictions are designed to limit opportunities for accidental or malicious breaches of security. In addition, the most privileged ring may be given special capabilities (such as real memory addressing that bypasses the virtual memory hardware).
ARM version 7 architecture implements three privilege levels: application (PL0), operating system (PL1), and hypervisor (PL2). Unusually, level 0 (PL0) is the least-privileged level, while level 2 is the most-privileged level.[8] ARM version 8 implements four exception levels: application (EL0), operating system (EL1), hypervisor (EL2), and secure monitor / firmware (EL3), for AArch64[9]: D1-2454 and AArch32.[9]: G1-6013
Ring protection can be combined with processor modes (master/kernel/privileged/supervisor mode versus slave/unprivileged/user mode) in some systems. Operating systems running on hardware supporting both may use both forms of protection or only one.
Effective use of ring architecture requires close cooperation between hardware and the operating system[why?]. Operating systems designed to work on multiple hardware platforms may make only limited use of rings if they are not present on every supported platform. Often the security model is simplified to "kernel" and "user" even if hardware provides finer granularity through rings.
Modes
Supervisor mode
In computer terms, supervisor mode is a hardware-mediated flag that can be changed by code running in system-level software. System-level tasks or threads may[a] have this flag set while they are running, whereas user-level applications will not. This flag determines whether it would be possible to execute machine code operations such as modifying registers for various descriptor tables, or performing operations such as disabling interrupts. The idea of having two different modes to operate in comes from "with more power comes more responsibility" – a program in supervisor mode is trusted never to fail, since a failure may cause the whole computer system to crash.
Supervisor mode is "an execution mode on some processors which enables execution of all instructions, including privileged instructions. It may also give access to a different address space, to memory management hardware and to other peripherals. This is the mode in which the operating system usually runs."[10]
In a monolithic kernel, the operating system runs in supervisor mode and the applications run in user mode. Other types of operating systems, like those with an exokernel or microkernel, do not necessarily share this behavior.
Some examples from the PC world:
- Linux, macOS and Windows are three operating systems that use supervisor/user mode. To perform specialized functions, user-mode code must perform a system call into supervisor mode, where trusted code of the operating system performs the needed task and returns execution to user space (see the sketch after this list). Additional code can be added into kernel space through the use of loadable kernel modules, but only by a user with the requisite permissions, as this code is not subject to the access control and safety limitations of user mode.
- DOS (for as long as no 386 memory manager such as EMM386 is loaded), as well as other simple operating systems and many embedded devices run in supervisor mode permanently, meaning that drivers can be written directly as user programs.
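A minimal sketch of such a transition on Linux/x86-64 (glibc assumed): both calls below cross from user mode into supervisor mode and back through the kernel's system-call entry point.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    /* The getpid() wrapper executes a SYSCALL instruction; the CPU
     * switches to ring 0, runs the kernel handler, returns to ring 3. */
    pid_t pid = getpid();

    /* The same transition via the generic syscall(2) interface. */
    long pid2 = syscall(SYS_getpid);

    printf("getpid() = %d, syscall(SYS_getpid) = %ld\n", (int)pid, pid2);
    return 0;
}
```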
Most processors have at least two different modes. The x86 processors have four privilege levels, corresponding to four rings. Programs that run in Ring 0 can do anything with the system, while code that runs in Ring 3 should be able to fail at any time without impact to the rest of the computer system. Rings 1 and 2 are rarely used, but could be configured with different levels of access.
In most existing systems, switching from user mode to kernel mode has an associated high cost in performance. It has been measured, on the basic request getpid, to cost 1000–1500 cycles on most machines. Of these just around 100 are for the actual switch (70 from user to kernel space, and 40 back); the rest is "kernel overhead".[11][12] In the L3 microkernel, the minimization of this overhead reduced the overall cost to around 150 cycles.[11]
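A hedged way to reproduce this kind of measurement (x86-64, GCC/Clang assumed; results vary widely with the CPU and with speculative-execution mitigations, so treat the numbers as illustrative only):

```c
#include <stdio.h>
#include <x86intrin.h>     /* __rdtsc() */
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    const int iters = 100000;
    unsigned long long start = __rdtsc();
    for (int i = 0; i < iters; i++)
        syscall(SYS_getpid);           /* raw syscall, no glibc caching */
    unsigned long long end = __rdtsc();
    /* Includes a little loop overhead, negligible next to the syscall. */
    printf("~%llu cycles per getpid syscall\n", (end - start) / iters);
    return 0;
}
```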
Maurice Wilkes wrote:[13]
... it eventually became clear that the hierarchical protection that rings provided did not closely match the requirements of the system programmer and gave little or no improvement on the simple system of having two modes only. Rings of protection lent themselves to efficient implementation in hardware, but there was little else to be said for them. [...] The attractiveness of fine-grained protection remained, even after it was seen that rings of protection did not provide the answer... This again proved a blind alley...
To gain performance and determinism, some systems place functions that would likely be viewed as application logic, rather than as device drivers, in kernel mode; security applications (access control, firewalls, etc.) and operating system monitors are cited as examples. At least one embedded database management system, eXtremeDB Kernel Mode, has been developed specifically for kernel mode deployment, to provide a local database for kernel-based application functions, and to eliminate the context switches that would otherwise occur when kernel functions interact with a database system running in user mode.[14]
Functions are also sometimes moved across rings in the other direction. The Linux kernel, for instance, injects into processes a vDSO section which contains functions that would normally require a system call, i.e. a ring transition. Instead of performing a syscall, these functions read data that the kernel shares with user space. This avoids the need for a ring transition and so is more lightweight than a syscall. The function gettimeofday can be provided this way.
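A small sketch (Linux/x86-64 with glibc assumed): the call below is normally satisfied entirely in user space by the vDSO, which can be confirmed by running the program under strace and observing that no gettimeofday syscall appears.

```c
#include <stdio.h>
#include <sys/time.h>

int main(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);   /* usually resolved in user space (vDSO) */
    printf("%ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec);
    return 0;
}
```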
Hypervisor mode
Recent CPUs from Intel and AMD offer x86 virtualization instructions for a hypervisor to control Ring 0 hardware access. Although they are mutually incompatible, both Intel VT-x (codenamed "Vanderpool") and AMD-V (codenamed "Pacifica") create a new "Ring −1" so that a guest operating system can run Ring 0 operations natively without affecting other guests or the host OS.
To assist virtualization, VT-x and SVM insert a new privilege level beneath Ring 0. Both add nine new machine code instructions that only work at "Ring −1", intended to be used by the hypervisor.[15]
Privilege level
A privilege level in the x86 instruction set controls the access of the program currently running on the processor to resources such as memory regions, I/O ports, and special instructions. There are four privilege levels, ranging from 0, which is the most privileged, to 3, which is the least privileged. Most modern operating systems use level 0 for the kernel/executive and level 3 for application programs. Any resource available to level n is also available to levels 0 to n, so the privilege levels are rings. When a less privileged process tries to access a more privileged resource, a general protection fault exception is reported to the OS.
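The current privilege level (CPL) is carried in the low two bits of the CS selector, so it can be inspected directly. A minimal sketch (x86-64, GCC/Clang inline assembly assumed), which prints 3 for an ordinary user-mode process:

```c
#include <stdio.h>

int main(void) {
    unsigned short cs;
    __asm__ ("mov %%cs, %0" : "=r"(cs));   /* read the CS selector */
    printf("CPL = %d\n", cs & 3);          /* low two bits = ring   */
    return 0;
}
```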
It is not necessary to use all four privilege levels. Current operating systems with wide market share, including Microsoft Windows, macOS, Linux, iOS and Android, mostly use a paging mechanism with only one bit to specify the privilege level as either Supervisor or User (the U/S bit). Windows NT uses this two-level system.[16] Programs running in real mode on the 8086 execute at level 0 (the highest privilege level), whereas virtual 8086 mode executes all programs at level 3.[17]
Potential future uses for the multiple privilege levels supported by the x86 ISA family include containerization and virtual machines. A host operating system kernel could use instructions with full privilege access (kernel mode), whereas applications running on the guest OS in a virtual machine or container could use the lowest level of privileges in user mode. The virtual machine and guest OS kernel could themselves use an intermediate level of instruction privilege to invoke and virtualize kernel-mode operations such as system calls from the point of view of the guest operating system.[18]
IOPL
The IOPL (I/O privilege level) flag is a flag found on all IA-32 compatible x86 CPUs. It occupies bits 12 and 13 in the FLAGS register. In protected mode and long mode, it shows the I/O privilege level of the current program or task. The current privilege level (CPL) of the task or program must be numerically less than or equal to the IOPL in order for the task or program to access I/O ports.
The IOPL can be changed using POPF(D) and IRET(D) only when the current privilege level is Ring 0.
Besides IOPL, the I/O permission bitmap in the TSS also takes part in determining the ability of a task to access an I/O port.
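On Linux, a privileged process can raise its IOPL through the iopl(2) system call. A hedged sketch (x86 glibc assumed; must run as root or with CAP_SYS_RAWIO, and the port read here is only an example):

```c
/* Raise IOPL to 3 so this user-space (ring 3) process may execute
 * IN/OUT instructions directly, then read one port as a demonstration. */
#include <stdio.h>
#include <sys/io.h>       /* iopl(), inb() -- glibc on x86 */

int main(void) {
    if (iopl(3) != 0) {           /* needs root / CAP_SYS_RAWIO */
        perror("iopl");
        return 1;
    }
    unsigned char v = inb(0x70);  /* CMOS index port, as an example */
    printf("port 0x70 = 0x%02x\n", v);
    return 0;
}
```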
Misc
In x86 systems, the x86 hardware virtualization (VT-x and SVM) is referred to as "ring −1", the System Management Mode is referred to as "ring −2", and the Intel Management Engine and AMD Platform Security Processor are sometimes referred to as "ring −3".[19]
Use of hardware features
Many CPU hardware architectures provide far more flexibility than is exploited by the operating systems that they normally run. Proper use of complex CPU modes requires very close cooperation between the operating system and the CPU, and thus tends to tie the OS to the CPU architecture. When the OS and the CPU are specifically designed for each other, this is not a problem (although some hardware features may still be left unexploited), but when the OS is designed to be compatible with multiple, different CPU architectures, a large part of the CPU mode features may be ignored by the OS. For example, the reason Windows uses only two levels (ring 0 and ring 3) is that some hardware architectures that were supported in the past (such as PowerPC or MIPS) implemented only two privilege levels.[5]
Multics was an operating system designed specifically for a special CPU architecture (which in turn was designed specifically for Multics), and it took full advantage of the CPU modes available to it. However, it was an exception to the rule. Today, this high degree of interoperation between the OS and the hardware is not often cost-effective, despite the potential advantages for security and stability.
Ultimately, the purpose of distinct operating modes for the CPU is to provide hardware protection against accidental or deliberate corruption of the system environment (and corresponding breaches of system security) by software. Only "trusted" portions of system software are allowed to execute in the unrestricted environment of kernel mode, and then, in paradigmatic designs, only when absolutely necessary. All other software executes in one or more user modes. If a processor generates a fault or exception condition in a user mode, in most cases system stability is unaffected; if a processor generates a fault or exception condition in kernel mode, most operating systems will halt the system with an unrecoverable error. When a hierarchy of modes exists (ring-based security), faults and exceptions at one privilege level may destabilize only the higher-numbered privilege levels. Thus, a fault in Ring 0 (the kernel mode with the highest privilege) will crash the entire system, but a fault in Ring 2 will only affect Rings 3 and beyond and Ring 2 itself, at most.
Transitions between modes are at the discretion of the executing thread when the transition is from a level of high privilege to one of low privilege (as from kernel to user modes), but transitions from lower to higher levels of privilege can take place only through secure, hardware-controlled "gates" that are traversed by executing special instructions or when external interrupts are received.
Microkernel operating systems attempt to minimize the amount of code running in privileged mode, for purposes of security and elegance, at some cost in performance.
See also
- Call gate (Intel)
- Memory segmentation
- Protected mode – available on x86-compatible 80286 CPUs and newer
- IOPL (CONFIG.SYS directive) – an OS/2 directive to run DLL code at ring 2 instead of at ring 3
- Segment descriptor
- Supervisor Call instruction
- System Management Mode (SMM)
- Principle of least privilege
Notes
References
- The reason Windows uses only two levels is that some hardware architectures that were supported in the past (such as Compaq Alpha and Silicon Graphics MIPS) implemented only two privilege levels.
- Gelas, Johan De. "Hardware Virtualization: the Nuts and Bolts". www.anandtech.com. Retrieved 13 March 2021.
- Intel 80386 Programmer's Reference
Further reading
- David T. Rogers (June 2003). A framework for dynamic subversion (PDF) (MSc). Naval Postgraduate School. hdl:10945/919.
- William J. Caelli (2002). "Relearning "Trusted Systems" in an Age of NIIP: Lessons from the Past for the Future". Archived from the original (PDF) on 20 April 2015.
- Haruna R. Isa; William R. Shockley; Cynthia E. Irvine (May 1999). "A Multi-threading Architecture for Multilevel Secure Transaction Processing" (PDF). Proceedings of the 1999 IEEE Symposium on Security and Privacy. Oakland, CA. pp. 166–179. hdl:10945/7198.
- Ivan Kelly (8 May 2006). "Porting MINIX to Xen" (PDF). Archived from the original (PDF) on 27 August 2006.
- Paul Barham; Boris Dragovic; Keir Fraser; Steven Hand; Tim Harris; Alex Ho; Rolf Neugebauer; Ian Pratt; Andrew Warfield (2003). "Xen and the Art of Virtualization" (PDF).
- Marcus Peinado; Yuqun Chen; Paul England; John Manferdelli. "NGSCB: A Trusted Open System" (PDF). Archived from the original (PDF) on 4 March 2005.
- Michael D. Schroeder; Jerome H. Saltzer (1972). "A Hardware Architecture for Implementing Protection Rings".
- "Intel Architecture Software Developer's Manual Volume 3: System Programming (Order Number 243192)" (PDF). Chapter 4 "Protection"; section 4.5 "Privilege levels". Archived from the original (PDF) on 19 February 2009.
- Tzi-cker Chiueh; Ganesh Venkitachalam; Prashant Pradhan (December 1999). "Integrating segmentation and paging protection for safe, efficient and transparent software extensions". Proceedings of the seventeenth ACM symposium on Operating systems principles. Section 3: Protection hardware features in Intel X86 architecture; subsection 3.1 Protection checks. doi:10.1145/319151.319161. ISBN 1581131402. S2CID 9456119.
- Takahiro Shinagawa; Kenji Kono; Takashi Masuda (17 May 2000). "Exploiting Segmentation Mechanism for Protecting Against Malicious Mobile Code" (PDF). Chapter 3 Implementation; section 3.2.1 Ring Protection. Archived from the original (PDF) on 10 August 2017. Retrieved 2 April 2018.
- Boebert, William Earl; R. Kain (1985). A Practical Alternative to Hierarchical Integrity Policies. 8th National Computer Security Conference.
- Gorine, Andrei; Krivolapov, Alexander (May 2008). "Kernel Mode Databases: A DBMS technology for high-performance applications". Dr. Dobb's Journal.
https://en.wikipedia.org/wiki/Protection_ring
In computer engineering, register windows are a feature which dedicates registers to a subroutine by dynamically aliasing a subset of internal registers to fixed, programmer-visible registers. Register windows are implemented to improve the performance of a processor by reducing the number of stack operations required for function calls and returns. One of the most influential features of the Berkeley RISC design, they were later implemented in instruction set architectures such as AMD Am29000, Intel i960, Sun Microsystems SPARC, and Intel Itanium.
General Operation
Several sets of registers are provided for the different parts of the program. Registers are deliberately hidden from the programmer to force several subroutines to share processor resources.
Rendering the registers invisible can be implemented efficiently; the CPU recognizes the movement from one part of the program to another during a procedure call. It is accomplished by one of a small number of instructions (prologue) and ends with one of a similarly small set (epilogue). In the Berkeley design, these calls would cause a new set of registers to be "swapped in" at that point, or marked as "dead" (or "reusable") when the call ends.
Application in CPUs
In the Berkeley RISC design, only eight registers out of a total of 64 are visible to the programs. The complete set of registers is known as the register file, and any particular set of eight as a window. The file allows up to eight procedure calls to have their own register sets. As long as the program does not call down chains longer than eight calls deep, the registers never have to be spilled, i.e. saved out to main memory or cache, which is a slow process compared to register access.
By comparison, the Sun Microsystems SPARC architecture provides simultaneous visibility into four sets of eight registers each. Three sets of eight registers each are "windowed". Eight registers (i0 through i7) form the input registers to the current procedure level. Eight registers (l0 through l7) are local to the current procedure level, and eight registers (o0 through o7) are the outputs from the current procedure level to the next level called. When a procedure is called, the register window shifts by sixteen registers, hiding the old input registers and old local registers and making the old output registers the new input registers. The common registers (old output registers and new input registers) are used for parameter passing. Finally, eight registers (g0 through g7) are globally visible to all procedure levels.
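A toy model of this overlap (plain C, not real SPARC semantics): consecutive windows are offset by sixteen registers in a circular register file, so the caller's output registers alias the callee's input registers.

```c
#include <stdio.h>

#define NWINDOWS    8
#define WINDOW_STEP 16                 /* windows are offset by 16 regs */
#define FILE_SIZE   (NWINDOWS * WINDOW_STEP)

static long regfile[FILE_SIZE];        /* the hidden physical registers */
static int  cwp = 0;                   /* current window pointer        */

/* Visible register r of window w: r in 0..23 (in0-7, local0-7, out0-7).
 * Windows overlap by 8, so window w's out registers (r = 16..23) are
 * window w+1's in registers (r = 0..7). */
static long *reg(int w, int r) {
    return &regfile[(w * WINDOW_STEP + r) % FILE_SIZE];
}

int main(void) {
    *reg(cwp, 16) = 42;                /* caller writes out0            */
    cwp++;                             /* "save": slide the window      */
    printf("callee's in0 = %ld\n", *reg(cwp, 0));   /* prints 42        */
    cwp--;                             /* "restore": slide it back      */
    return 0;
}
```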
The AMD 29000 improved the design by allowing the windows to be of variable size, which helps utilization in the common case where fewer than eight registers are needed for a call. It also separated the registers into a global set of 64, and an additional 128 for the windows. Similarly, the IA-64 (Itanium) architecture used variable-sized windows, with 32 global registers and 96 for the windows.
In the Infineon C166 architecture, most registers are simply locations in internal RAM which have the additional property of being accessible as registers. Of these, the addresses of the 16 general-purpose registers (R0-R15) are not fixed. Instead, the R0 register is located at the address pointed to by the "Context Pointer" (CP) register, and the remaining 15 registers follow sequentially thereafter.[1]
Register windows also provide an easy upgrade path. Since the additional registers are invisible to the programs, additional windows can be added at any time. For instance, the use of object-oriented programming often results in a greater number of "smaller" calls, which can be accommodated by increasing the number of windows from eight to sixteen. This was the approach used in the SPARC, which has included more register windows with newer generations of the architecture. The end result is fewer slow register window spill and fill operations because the register windows overflow less often.
Criticism
Register windows are not the only way to improve register performance. The group at Stanford University designing the MIPS saw the Berkeley work and decided that the problem was not a shortage of registers, but poor utilization of the existing ones. They instead invested more time in their compiler's register allocation, making sure it wisely used the larger set available in MIPS. This resulted in reduced complexity of the chip, with one half the total number of registers, while offering potentially higher performance in those cases where a single procedure could make use of the larger visible register space. In the end, with modern compilers, MIPS makes better use of its register space even during procedure calls.[citation needed]
References
- "Infineon C166 Family Instruction Set Manual" (PDF). Keil. Retrieved 2020-03-12.
- Frantzen, Mike; Shuey, Mike (2001). "StackGhost: Hardware Facilitated Stack Protection". Proceedings of the 10th Usenix Security Symposium. USENIX. pp. 55–66. Retrieved 27 August 2010.
- Magnusson, Peter (April 1997). "Understanding stacks and registers in the Sparc architecture(s)". Archived from the original on 24 December 2012. Retrieved 27 August 2010.
- Mueller, Frank. "setjmp/longjmp". Discussing the complex Sparc implementation due to windowing.
https://en.wikipedia.org/wiki/Register_window
Rekursiv was a computer processor designed by David M. Harland in the mid-1980s at a division of hi-fi manufacturer Linn Products. It was one of the few computer architectures intended to implement object-oriented concepts directly in hardware, a form of high-level language computer architecture. The Rekursiv operated directly on objects rather than bits, nibbles, bytes and words. Virtual memory was used as a persistent object store and unusually, the processor instruction set supported recursion (hence the name).
By the time the project had delivered its first implementation, new processors like the Sun SPARC and Intel 486 had surpassed its performance, and development was abandoned in 1988.
https://en.wikipedia.org/wiki/Rekursiv
https://en.wikipedia.org/wiki/Repeat_instruction
Scalar processors are a class of computer processors that process only one data item at a time. Typical data items include integers and floating point numbers.[1]
Classification
A scalar processor is classified as a single instruction, single data (SISD) processor in Flynn's taxonomy. The Intel 486 is an example of a scalar processor. It is to be contrasted with a vector processor where a single instruction operates simultaneously on multiple data items (and thus is referred to as a single instruction, multiple data (SIMD) processor).[2] The difference is analogous to the difference between scalar and vector arithmetic.
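The contrast can be made concrete in code. A short sketch (C with SSE2 intrinsics, assumed available on x86-64) in which the scalar loop performs one addition per instruction while the vector version adds four floats with a single instruction:

```c
#include <stdio.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4], d[4];

    for (int i = 0; i < 4; i++)      /* scalar: one add per iteration */
        c[i] = a[i] + b[i];

    __m128 va = _mm_loadu_ps(a);     /* SIMD: one instruction, 4 adds */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(d, _mm_add_ps(va, vb));

    printf("scalar: %g %g %g %g\n", c[0], c[1], c[2], c[3]);
    printf("simd:   %g %g %g %g\n", d[0], d[1], d[2], d[3]);
    return 0;
}
```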
The term scalar in computing dates to the 1970s and 1980s, when vector processors were first introduced. It was originally used to distinguish the older designs from the new vector processors.
Superscalar processor
A superscalar processor (such as the Intel P5) may execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Each functional unit is not a separate CPU core but an execution resource within a single CPU such as an arithmetic logic unit, a bit shifter, or a multiplier.[1] The Cortex-M7, like many consumer CPUs today, is a superscalar processor.[3]
Scalar data type
A scalar data type, or just scalar, is any non-composite value.
Generally, all basic primitive data types are considered scalar:
- The boolean data type (bool)
- Numeric types (int, the floating point types float and double)
- Character types (char and string)
A variable (that is, a named location containing a scalar data type) is sometimes referred to as a "scalar".
See also
References
- "Cortex-M7". Arm Developer. Arm Limited. Retrieved 2021-07-03.
https://en.wikipedia.org/wiki/Scalar_processor
In integrated circuit design, static core generally refers to a microprocessor (MPU) entirely implemented in static logic.[1] A static core MPU may be halted by stopping the system clock oscillator that drives it; it maintains its state and resumes processing at the point where it was stopped when the clock signal is restarted, as long as power continues to be applied. Static core MPUs are fabricated in the CMOS process and hence consume very little power when the clock is stopped, making them useful in designs in which the MPU remains in standby mode until needed and minimal loading of the power source (often a battery) is desirable during standby.[2]
In comparison, typical microprocessor designs, those without a static core, only refresh and present valid outputs on their pins during specific periods of the clock cycle. If the clock is slowed or stopped, the stored charge leaks out of the internal capacitances over time, and the outputs drift to a default state and are no longer valid. These designs have to run within a set range of clock frequencies to avoid this problem.
Static core microprocessors include the RCA 1802, Intel 80386EX, WDC W65C02S, WDC W65C816S and Freescale 683XX family.
Many low-power electronics systems are designed as fully static systems—such as, for example, the Psion Organiser, the TRS-80 Model 100, and the Galileo spacecraft. In such a fully static system, the processor has a static core and data is stored in static RAM, rather than dynamic RAM. Such design features allow the entire system to be "paused" indefinitely in a low power state, and then instantly resumed when needed.
References
- "Static core | Semantic Scholar". www.semanticscholar.org. Retrieved 2022-08-07.
See also
https://en.wikipedia.org/wiki/Static_core
Category:Clock signal
Pages in category "Clock signal"
The following 27 pages are in this category, out of 27 total.
https://en.wikipedia.org/wiki/Category:Clock_signal
In computer processors, the parity flag indicates whether the number of set bits in the binary representation of the result of the last operation is odd or even. It is normally a single bit in a processor status register.
For example, assume a machine where a set parity flag indicates even parity. If the result of the last operation were 26 (11010 in binary), the parity flag would be 0 since the number of set bits is odd. Similarly, if the result were 10 (1010 in binary) then the parity flag would be 1.
x86 processors
In x86 processors, the parity flag reflects the parity only of the least significant byte of the result, and is set if the number of one bits is even. According to the 80386 Intel manual, the parity flag is changed in the x86 processor family by the following instructions:
- All arithmetic instructions;
In conditional jumps, the parity flag is used: e.g. the JP instruction jumps to the given target when the parity flag is set, and the JNP instruction jumps if it is not set. The flags register may also be read directly with instructions such as PUSHF, which pushes it on the stack.
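To make the flag's behavior concrete, here is a hedged sketch (x86-64, GCC/Clang inline assembly assumed) that performs a byte addition and then captures PF with the SETP instruction, reproducing the two example results above:

```c
#include <stdio.h>

/* Add two bytes, then capture the parity flag: SETP stores 1 if PF is
 * set, i.e. the low byte of the result has an even number of one bits. */
static int parity_flag_after_add(unsigned char a, unsigned char b) {
    unsigned char pf;
    __asm__ ("addb %2, %1\n\t"
             "setp %0"
             : "=r"(pf), "+r"(a)
             : "r"(b)
             : "cc");
    return pf;
}

int main(void) {
    /* 5 + 5 = 10 = 0b1010: two one bits -> PF set (prints 1).   */
    printf("PF after 5+5:   %d\n", parity_flag_after_add(5, 5));
    /* 13 + 13 = 26 = 0b11010: three one bits -> PF clear (0).   */
    printf("PF after 13+13: %d\n", parity_flag_after_add(13, 13));
    return 0;
}
```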
One common reason to test the parity flag is to check an unrelated x87-FPU flag. The FPU has four condition flags (C0 to C3), but they can not be tested directly, and must instead be first copied to the flags register. When this happens, C0 is placed in the carry flag, C2 in the parity flag and C3 in the zero flag.[1] The C2 flag is set when e.g. incomparable floating point values (NaN or unsupported format) are compared with the FUCOM instructions.
References
- "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture". January 2011. pp. 97–98.
See also
https://en.wikipedia.org/wiki/Parity_flag
In computing, overclocking is the practice of increasing the clock rate of a computer to exceed that certified by the manufacturer. Commonly, operating voltage is also increased to maintain a component's operational stability at accelerated speeds. Semiconductor devices operated at higher frequencies and voltages increase power consumption and heat.[1] An overclocked device may be unreliable or fail completely if the additional heat load is not removed or power delivery components cannot meet increased power demands. Many device warranties state that overclocking or over-specification (running components beyond their rated specifications) voids any warranty, but some manufacturers allow overclocking as long as it is done (relatively) safely.[citation needed]
Overview
The purpose of overclocking is to increase the operating speed of a given component. Normally, on modern systems, the target of overclocking is increasing the performance of a major chip or subsystem, such as the main processor or graphics controller, but other components, such as system memory (RAM) or system buses (generally on the motherboard), are commonly involved. The trade-offs are an increase in power consumption (heat), fan noise (cooling), and shortened lifespan for the targeted components. Most components are designed with a margin of safety to deal with operating conditions outside of a manufacturer's control, such as ambient temperature and fluctuations in operating voltage. Overclocking techniques in general aim to trade away this safety margin by setting the device to run at the higher end of the margin, with the understanding that temperature and voltage must be more strictly monitored and controlled by the user. For example, operating temperature would need to be more strictly controlled with increased cooling, as the part will be less tolerant of increased temperatures at the higher speeds. Also, base operating voltage may be increased to compensate for unexpected voltage drops and to strengthen signalling and timing signals, as low-voltage excursions are more likely to cause malfunctions at higher operating speeds.
While most modern devices are fairly tolerant of overclocking, all devices have finite limits. Generally for any given voltage most parts will have a maximum "stable" speed where they still operate correctly. Past this speed, the device starts giving incorrect results, which can cause malfunctions and sporadic behavior in any system depending on it. While in a PC context the usual result is a system crash, more subtle errors can go undetected, which over a long enough time can give unpleasant surprises such as data corruption (incorrectly calculated results, or worse writing to storage incorrectly) or the system failing only during certain specific tasks (general usage such as internet browsing and word processing appear fine, but any application wanting advanced graphics crashes the system).
At this point, an increase in operating voltage of a part may allow more headroom for further increases in clock speed, but the increased voltage can also significantly increase heat output, as well as shorten the lifespan further. At some point, there will be a limit imposed by the ability to supply the device with sufficient power, the user's ability to cool the part, and the device's own maximum voltage tolerance before it achieves destructive failure. Overzealous use of voltage or inadequate cooling can rapidly degrade a device's performance to the point of failure, or in extreme cases outright destroy it.
The speed gained by overclocking depends largely upon the applications and workloads being run on the system, and what components are being overclocked by the user; benchmarks for different purposes are published.
Underclocking
Conversely, the primary goal of underclocking is to reduce power consumption and the resultant heat generation of a device, with the trade-offs being lower clock speeds and reductions in performance. Reducing the cooling requirements needed to keep hardware at a given operational temperature has knock-on benefits such as allowing fewer or slower fans for quieter operation and, in mobile devices, increasing battery life per charge. Some manufacturers underclock components of battery-powered equipment to improve battery life, or implement systems that detect when a device is operating under battery power and reduce clock frequency.
Underclocking and undervolting may be attempted on a desktop system to make it operate silently (such as for a home entertainment center) while potentially offering higher performance than current low-voltage processor offerings. This uses a "standard-voltage" part run at lower voltages (while attempting to keep the desktop speeds) to meet an acceptable performance/noise target for the build. This is also attractive because using a "standard-voltage" processor in a "low-voltage" application avoids paying the traditional price premium for an officially certified low-voltage version. However, as with overclocking, there is no guarantee of success, and the builder's time researching given system/processor combinations, and especially the time and tedium of performing many iterations of stability testing, need to be considered. The usefulness of underclocking (again like overclocking) is determined by processor offerings, prices, and availability at the specific time of the build. Underclocking is also sometimes used when troubleshooting.
Enthusiast culture
Overclocking has become more accessible, with motherboard makers offering overclocking as a marketing feature on their mainstream product lines. However, the practice is embraced more by enthusiasts than professional users, as overclocking carries a risk of reduced reliability, accuracy, and damage to data and equipment. Additionally, most manufacturer warranties and service agreements do not cover overclocked components or any incidental damage caused by their use. While overclocking can still be an option for increasing personal computing capacity, and thus workflow productivity for professional users, the importance of stability testing components thoroughly before employing them in a production environment cannot be overstated.
Overclocking offers several draws for enthusiasts. It allows testing of components at speeds not currently offered by the manufacturer, or at speeds only officially offered on specialized, higher-priced versions of the product. A general trend in the computing industry is that new technologies tend to debut in the high-end market first, then later trickle down to the performance and mainstream markets. If the high-end part only differs by an increased clock speed, an enthusiast can attempt to overclock a mainstream part to simulate the high-end offering. This can give insight into how over-the-horizon technologies will perform before they are officially available on the mainstream market, which can be especially helpful for users deciding whether to plan ahead to purchase or upgrade to the new feature when it is officially released.
Some hobbyists enjoy building, tuning, and "hot-rodding" their systems in competitive benchmarking competitions, competing with other like-minded users for high scores in standardized computer benchmark suites. Others will purchase a low-cost model of a component in a given product line and attempt to overclock that part to match a more expensive model's stock performance. Another approach is overclocking older components to attempt to keep pace with increasing system requirements and extend the useful service life of the older part, or at least delay purchasing new hardware solely for performance reasons. Another rationale for overclocking older equipment is that, even if overclocking stresses equipment to the point of failure earlier, little is lost, as it is already depreciated and would have needed to be replaced in any case.[2]
Components
Technically any component that uses a timer (or clock) to synchronize its internal operations can be overclocked. Most efforts for computer components, however, focus on specific components such as processors (CPUs), video cards, motherboard chipsets, and RAM. Most modern processors derive their effective operating speeds by multiplying a base clock (processor bus speed) by an internal multiplier within the processor (the CPU multiplier) to attain their final speed.
Computer processors generally are overclocked by manipulating the CPU multiplier if that option is available, but the processor and other components can also be overclocked by increasing the base speed of the bus clock. Some systems allow additional tuning of other clocks (such as a system clock) that influence the bus clock speed which, again, is multiplied by the processor to allow for finer adjustments of the final processor speed.
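The arithmetic is simple to illustrate; the numbers below are made up for the example:

```c
/* Toy calculation: effective CPU speed is base clock times multiplier,
 * so raising either knob raises the final frequency. */
#include <stdio.h>

int main(void) {
    double base_mhz = 100.0;                  /* bus/base clock (BCLK) */
    double mult     = 36.0;                   /* CPU multiplier        */
    printf("stock:         %.1f MHz\n", base_mhz * mult);   /* 3600.0 */
    printf("multiplier OC: %.1f MHz\n", base_mhz * 40.0);   /* 4000.0 */
    printf("BCLK OC:       %.1f MHz\n", 103.0 * mult);      /* 3708.0 */
    return 0;
}
```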
Most OEM systems do not expose to the user the adjustments needed to change processor clock speed or voltage in the BIOS of the OEM's motherboard, which precludes overclocking (for warranty and support reasons). The same processor installed on a different motherboard offering adjustments will allow the user to change them.
Any given component will ultimately stop operating reliably past a certain clock speed. Components will generally show some sort of malfunctioning behavior or other indication of compromised stability that alerts the user that a given speed is not stable, but there is always a possibility that a component will permanently fail without warning, even if voltages are kept within some pre-determined safe values. The maximum speed is determined by overclocking to the point of first instability, then accepting the last stable slower setting. Components are only guaranteed to operate correctly up to their rated values; beyond that different samples may have different overclocking potential. The end-point of a given overclock is determined by parameters such as available CPU multipliers, bus dividers, voltages; the user's ability to manage thermal loads, cooling techniques; and several other factors of the individual devices themselves such as semiconductor clock and thermal tolerances, interaction with other components and the rest of the system.
Considerations
There are several things to be considered when overclocking. First is to ensure that the component is supplied with adequate power at a voltage sufficient to operate at the new clock rate. Supplying the power with improper settings or applying excessive voltage can permanently damage a component.
In a professional production environment, overclocking is only likely to be used where the increase in speed justifies the cost of the expert support required, the possibly reduced reliability, the consequent effect on maintenance contracts and warranties, and the higher power consumption. If faster speed is required it is often cheaper when all costs are considered to buy faster hardware.
Cooling
All electronic circuits produce heat generated by the movement of electric current. As clock frequencies in digital circuits and the voltage applied increase, the heat generated by components running at the higher performance levels also increases. The relationship between clock frequency and thermal design power (TDP) is linear. However, there is a limit to the maximum frequency, called a "wall". To overcome this issue, overclockers raise the chip voltage to increase the overclocking potential. Voltage increases power consumption and consequently heat generation significantly (proportionally to the square of the voltage in a linear circuit, for example); this requires more cooling to avoid damaging the hardware by overheating. In addition, some digital circuits slow down at high temperatures due to changes in MOSFET device characteristics. Conversely, the overclocker may decide to decrease the chip voltage while overclocking (a process known as undervolting), to reduce heat emissions while performance remains optimal.
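A hedged back-of-the-envelope model of this relationship, using the common dynamic-power approximation P ≈ C·V²·f with entirely made-up numbers:

```c
/* Illustrative dynamic-power model, P = C * V^2 * f (C held fixed):
 * voltage enters quadratically, frequency linearly. Numbers invented. */
#include <stdio.h>

int main(void) {
    double C = 1e-9;                            /* effective capacitance, F */
    double stock = C * 1.20 * 1.20 * 3.6e9;     /* 1.20 V at 3.6 GHz */
    double oc    = C * 1.35 * 1.35 * 4.2e9;     /* 1.35 V at 4.2 GHz */
    printf("stock ~%.2f W, overclocked ~%.2f W (%.0f%% more)\n",
           stock, oc, 100.0 * (oc / stock - 1.0));   /* ~48%% more */
    return 0;
}
```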
Stock cooling systems are designed for the amount of power produced during non-overclocked use; overclocked circuits can require more cooling, such as by powerful fans, larger heat sinks, heat pipes and water cooling. Mass, shape, and material all influence the ability of a heatsink to dissipate heat. Efficient heatsinks are often made entirely of copper, which has high thermal conductivity, but is expensive.[3] Aluminium is more widely used; it has good thermal characteristics, though not as good as copper, and is significantly cheaper. Cheaper materials such as steel do not have good thermal characteristics. Heat pipes can be used to improve conductivity. Many heatsinks combine two or more materials to achieve a balance between performance and cost.[3]
Water cooling carries waste heat to a radiator. Thermoelectric cooling devices which actually refrigerate using the Peltier effect can help with high thermal design power (TDP) processors made by Intel and AMD in the early twenty-first century. Thermoelectric cooling devices create temperature differences between two plates by running an electric current through the plates. This method of cooling is highly effective, but itself generates significant heat elsewhere which must be carried away, often by a convection-based heatsink or a water cooling system.
Other cooling methods are forced convection and phase transition cooling which is used in refrigerators and can be adapted for computer use. Liquid nitrogen, liquid helium, and dry ice are used as coolants in extreme cases,[4] such as record-setting attempts or one-off experiments rather than cooling an everyday system. In June 2006, IBM and Georgia Institute of Technology jointly announced a new record in silicon-based chip clock rate (the rate a transistor can be switched at, not the CPU clock rate[5]) above 500 GHz, which was done by cooling the chip to 4.5 K (−268.6 °C; −451.6 °F) using liquid helium.[6] Set in November 2012, the CPU Frequency World Record is 9008.82 MHz as of December 2022.[7] These extreme methods are generally impractical in the long term, as they require refilling reservoirs of vaporizing coolant, and condensation can form on chilled components.[4] Moreover, silicon-based junction gate field-effect transistors (JFET) will degrade below temperatures of roughly 100 K (−173 °C; −280 °F) and eventually cease to function or "freeze out" at 40 K (−233 °C; −388 °F) since the silicon ceases to be semiconducting,[8] so using extremely cold coolants may cause devices to fail.
Submersion cooling, used by the Cray-2 supercomputer, involves sinking a part of the computer system directly into a chilled liquid that is thermally conductive but has low electrical conductivity. The advantage of this technique is that no condensation can form on components.[9] A good submersion liquid is Fluorinert made by 3M, which is expensive. Another option is mineral oil, but impurities such as those in water might cause it to conduct electricity.[9]
Amateur overclocking enthusiasts have used a mixture of dry ice and a solvent with a low freezing point, such as acetone or isopropyl alcohol.[10] This cooling bath, often used in laboratories, achieves a temperature of −78 °C (−108 °F).[11] However, this practice is discouraged due to its safety risks; the solvents are flammable and volatile, and dry ice can cause frostbite (through contact with exposed skin) and suffocation (due to the large volume of carbon dioxide generated when it sublimes).
Stability and functional correctness
As an overclocked component operates outside of the manufacturer's recommended operating conditions, it may function incorrectly, leading to system instability. Another risk is silent data corruption by undetected errors. Such failures might never be correctly diagnosed and may instead be incorrectly attributed to software bugs in applications, device drivers, or the operating system. Overclocked use may permanently damage components enough to cause them to misbehave (even under normal operating conditions) without becoming totally unusable.
A large-scale 2011 field study of hardware faults causing a system crash for consumer PCs and laptops showed a four to 20 times increase (depending on CPU manufacturer) in system crashes due to CPU failure for overclocked computers over an eight-month period.[12]
In general, overclockers claim that testing can ensure that an overclocked system is stable and functioning correctly. Although software tools are available for testing hardware stability, it is generally impossible for any private individual to thoroughly test the functionality of a processor.[13] Achieving good fault coverage requires immense engineering effort; even with all of the resources dedicated to validation by manufacturers, faulty components and even design faults are not always detected.
A particular "stress test" can verify only the functionality of the specific instruction sequence used in combination with the data and may not detect faults in those operations. For example, an arithmetic operation may produce the correct result but incorrect flags; if the flags are not checked, the error will go undetected.
To further complicate matters, in process technologies such as silicon on insulator (SOI), devices display hysteresis—a circuit's performance is affected by the events of the past, so without carefully targeted tests it is possible for a particular sequence of state changes to work at overclocked rates in one situation but not another even if the voltage and temperature are the same. Often, an overclocked system which passes stress tests experiences instabilities in other programs.[14]
In overclocking circles, "stress tests" or "torture tests" are used to check for correct operation of a component. These workloads are selected as they put a very high load on the component of interest (e.g. a graphically intensive application for testing video cards, or different math-intensive applications for testing general CPUs). Popular stress tests include Prime95, Everest, Superpi, OCCT, AIDA64, Linpack (via the LinX and IntelBurnTest GUIs), SiSoftware Sandra, BOINC, Intel Thermal Analysis Tool and Memtest86. The hope is that any functional-correctness issues with the overclocked component will manifest themselves during these tests, and if no errors are detected during the test, then the component is deemed "stable". Since fault coverage is important in stability testing, the tests are often run for long periods of time, hours or even days. An overclocked computer is sometimes described using the number of hours and the stability program used, such as "prime 12 hours stable".
Factors allowing overclocking
Overclockability arises in part due to the economics of the manufacturing processes of CPUs and other components. In many cases components are manufactured by the same process and tested after manufacture to determine their actual maximum ratings. Components are then marked with a rating chosen by the market needs of the semiconductor manufacturer. If manufacturing yield is high, more higher-rated components than required may be produced, and the manufacturer may mark and sell higher-performing components as lower-rated for marketing reasons. In some cases, the true maximum rating of the component may exceed even the highest-rated component sold. Many devices sold with a lower rating may behave in all ways as higher-rated ones, while in the worst case operation at the higher rating may be more problematic.
Notably, higher clock rates must always mean greater waste heat generation, as each switching cycle dumps charge to ground, and switching more often dissipates more power. In some cases, this means that the chief drawback of the overclocked part is that far more heat is dissipated than the maximums published by the manufacturer. Pentium architect Bob Colwell calls overclocking an "uncontrolled experiment in better-than-worst-case system operation".[15]
Measuring effects of overclocking
Benchmarks are used to evaluate performance, and they can become a kind of "sport" in which users compete for the highest scores. As discussed above, stability and functional correctness may be compromised when overclocking, and meaningful benchmark results depend on the correct execution of the benchmark. Because of this, benchmark scores may be qualified with stability and correctness notes (e.g. an overclocker may report a score, noting that the benchmark only runs to completion 1 in 5 times, or that signs of incorrect execution such as display corruption are visible while running the benchmark). A widely used test of stability is Prime95, which has built-in error checking that fails if the computer is unstable.
Using only the benchmark scores, it may be difficult to judge the difference overclocking makes to the overall performance of a computer. For example, some benchmarks test only one aspect of the system, such as memory bandwidth, without taking into consideration how higher clock rates in this aspect will improve the system performance as a whole. Apart from demanding applications such as video encoding, high-demand databases and scientific computing, memory bandwidth is typically not a bottleneck, so a great increase in memory bandwidth may be unnoticeable to a user depending on the applications used. Other benchmarks, such as 3DMark, attempt to replicate game conditions.
Manufacturer and vendor overclocking
Commercial system builders or component resellers sometimes overclock to sell items at higher profit margins. The seller makes more money by overclocking lower-priced components which are found to operate correctly and selling equipment at prices appropriate for higher-rated components. While the equipment will normally operate correctly, this practice may be considered fraudulent if the buyer is unaware of it.[original research?]
Overclocking is sometimes offered as a legitimate service or feature for consumers, in which a manufacturer or retailer tests the overclocking capability of processors, memory, video cards, and other hardware products. Several video card manufacturers now offer factory-overclocked versions of their graphics accelerators, complete with a warranty, usually at a price intermediate between that of the standard product and a non-overclocked product of higher performance.
It is speculated that manufacturers implement overclocking prevention mechanisms such as CPU multiplier locking to prevent users from buying lower-priced items and overclocking them. These measures are sometimes marketed as a consumer protection benefit, but are often criticized by buyers.
Many motherboards are sold, and advertised, with extensive facilities for overclocking implemented in hardware and controlled by BIOS settings.[16]
CPU multiplier locking
CPU multiplier locking is the process of permanently setting a CPU's clock multiplier. AMD CPUs are unlocked in early editions of a model and locked in later editions, but nearly all Intel CPUs are locked and recent[when?] models are very resistant to unlocking, to prevent overclocking by users. AMD ships unlocked CPUs with their Opteron, FX, all Ryzen desktop chips (except 3D variants) and Black Series line-up, while Intel uses the monikers "Extreme Edition" and "K-Series". Intel usually has one or two Extreme Edition CPUs on the market, as well as X-series and K-series CPUs analogous to AMD's Black Edition. AMD has the majority of their desktop range in a Black Edition.
Users usually unlock CPUs to allow overclocking, but sometimes to allow for underclocking in order to maintain front-side bus speed compatibility (on older CPUs) with certain motherboards. Unlocking generally invalidates the manufacturer's warranty, and mistakes can cripple or destroy a CPU. Locking a chip's clock multiplier does not necessarily prevent users from overclocking, as the speed of the front-side bus or PCI multiplier (on newer CPUs) may still be changed to provide a performance increase. AMD Athlon and Athlon XP CPUs are generally unlocked by connecting bridges (jumper-like points) on the top of the CPU with conductive paint or pencil lead. Other CPU models may require different procedures.
Increasing front-side bus or northbridge/PCI clocks can overclock locked CPUs, but this throws many system frequencies out of sync, since the RAM and PCI frequencies are modified as well.
One of the easiest ways to unlock older AMD Athlon XP CPUs was called the pin mod method, because it was possible to unlock the CPU without permanently modifying bridges. A user could simply put one wire (or a few more, for a new multiplier/Vcore) into the socket to unlock the CPU. More recently, with Intel's Skylake (6th-generation Core) processors, a loophole allowed the base clock of locked processors to be increased past 102.7 MHz, although certain features would then stop working. Intel had intended to block base clock (BCLK) overclocking of locked processors when designing the Skylake architecture, to prevent consumers from purchasing cheaper components and overclocking them to previously unseen heights (since the CPU's BCLK was no longer tied to the PCI buses); the loophole was later closed through BIOS updates.[original research?] All other unlocked processors from LGA1151 and LGA1151v2 (including 7th, 8th, and 9th generation) and BGA1440 allow for BCLK overclocking (as long as the OEM allows it), while locked processors from the 7th, 8th, and 9th generations cannot go past 102.7 MHz. 10th-generation locked processors, however, can reach 103 MHz[17] on the BCLK.
Advantages
This section possibly contains original research. (December 2011) |
- Higher performance in games, en-/decoding, video editing and system tasks at no additional direct monetary expense, but with increased electrical consumption and thermal output.
- System optimization: Some systems have "bottlenecks", where small overclocking of one component can help realize the full potential of another component to a greater percentage than when just the limiting hardware itself is overclocked. For instance, many motherboards with AMD Athlon 64 processors limit the clock rate of four modules of RAM to 333 MHz. However, the memory performance is computed by dividing the processor clock rate (which is a base number times a CPU multiplier; for instance, 1.8 GHz is most likely 9×200 MHz) by a fixed integer such that, at a stock clock rate, the RAM would run at a clock rate near 333 MHz. By manipulating how the processor clock rate is set (usually by adjusting the multiplier), it is often possible to overclock the processor a small amount, around 5–10%, and gain a small increase in RAM clock rate and/or reduction in RAM latency timings.
- It can be cheaper to purchase a lower performance component and overclock it to the clock rate of a more expensive component.
- Extending life on older equipment (through underclocking/undervolting).[original research?]
Disadvantages
General
This section possibly contains original research. (December 2011) |
- Higher clock rates and voltages increase power consumption, also increasing electricity cost and heat production. The additional heat increases the ambient air temperature within the system case, which may affect other components. The hot air blown out of the case heats the room it's in.
- Fan noise: High-performance fans running at maximum speed used for the required degree of cooling of an overclocked machine can be noisy, some producing 50 dB or more of noise. When maximum cooling is not required, in any equipment, fan speeds can be reduced below the maximum: fan noise has been found to be roughly proportional to the fifth power of fan speed; halving speed reduces noise by about 15 dB.[18] Fan noise can be reduced by design improvements, e.g. with aerodynamically optimized blades for smoother airflow, reducing noise to around 20 dB at approximately 1 metre[citation needed] or larger fans rotating more slowly, which produce less noise than smaller, faster fans with the same airflow. Acoustical insulation inside the case e.g. acoustic foam can reduce noise. Additional cooling methods which do not use fans can be used, such as liquid and phase-change cooling.
- An overclocked computer may become unreliable. For example: Microsoft Windows may appear to work with no problems, but when it is re-installed or upgraded, error messages may be received, such as a "file copy error" during Windows Setup.[19] Because installing Windows is very memory-intensive, decoding errors may occur when files are extracted from the Windows XP CD-ROM.
- The lifespan of semiconductor components may be reduced by increased voltages and heat.
- Warranties may be voided by overclocking.
Risks of overclocking
- Increasing the operation frequency of a component will usually increase its thermal output in a linear fashion, while an increase in voltage usually causes thermal power to increase quadratically.[20] Excessive voltages or improper cooling may cause chip temperatures to rise to dangerous levels, causing the chip to be damaged or destroyed.
- Exotic cooling methods used to facilitate overclocking such as water cooling are more likely to cause damage if they malfunction. Sub-ambient cooling methods such as phase-change cooling or liquid nitrogen will cause water condensation, which will cause electrical damage unless controlled; some methods include using kneaded erasers or shop towels to catch the condensation.
Limitations
Overclocking a component can only be of noticeable benefit if the component is on the critical path for a process, i.e. if it is a bottleneck. If disk access or the speed of an Internet connection limits the speed of a process, a 20% increase in processor speed is unlikely to be noticed; however, there are some scenarios where increasing the clock speed of a processor actually allows an SSD to be read and written to faster. Overclocking a CPU will not noticeably benefit a game when a graphics card's performance is the "bottleneck" of the game.
Graphics cards
Graphics cards can also be overclocked. There are utilities to achieve this, such as EVGA's Precision, RivaTuner, AMD Overdrive (on AMD cards only), MSI Afterburner, Zotac Firestorm, and the PEG Link Mode on Asus motherboards. Overclocking a GPU will often yield a marked increase in performance in synthetic benchmarks, usually reflected in game performance.[21] It is sometimes possible to see that a graphics card is being pushed beyond its limits before any permanent damage is done by observing on-screen artifacts or unexpected system crashes. It is common to run into one of those problems when overclocking graphics cards; both symptoms at the same time usually mean that the card is severely pushed beyond its heat, clock rate, and/or voltage limits. If seen when not overclocked, they indicate a faulty card. After a reboot, video settings are reset to the standard values stored in the graphics card firmware, and the maximum stable clock rate of that specific card has now been established.
Some overclockers apply a potentiometer to the graphics card to manually adjust the voltage (which usually invalidates the warranty). This allows for finer adjustments, as overclocking software for graphics cards can only go so far. Excessive voltage increases may damage or destroy components on the graphics card or the entire graphics card itself (practically speaking).
Alternatives
Flashing and unlocking can be used to improve the performance of a video card, without technically overclocking (but is much riskier than overclocking just through software).
Flashing refers to using the firmware of a different card with the same (or sometimes similar) core and compatible firmware, effectively making it a higher-model card; it can be difficult, and may be irreversible. Sometimes standalone software to modify the firmware files can be found, e.g. NiBiTor (the GeForce 6/7 series are well regarded in this respect), without using firmware from a better-model video card. For example, video cards with 3D accelerators (most, as of 2011) have two voltage and clock rate settings, one for 2D and one for 3D, but were designed to operate with three voltage stages, the third being somewhere between the aforementioned two, serving as a fallback when the card overheats or as a middle stage when going from 2D to 3D operation mode. Therefore, it can be wise to set this middle stage prior to "serious" overclocking, specifically because of this fallback ability; the card can drop down to this clock rate, reducing its efficiency by a few (or sometimes a few dozen, depending on the setting) percent and cool down, without dropping out of 3D mode (and afterwards return to the desired high-performance clock and voltage settings).
Some cards have abilities not directly connected with overclocking. For example, Nvidia's GeForce 6600GT (AGP flavor) has a temperature monitor used internally by the card, invisible to the user if standard firmware is used. Modifying the firmware can display a 'Temperature' tab.
Unlocking refers to enabling extra pipelines or pixel shaders. The 6800LE, the 6800GS and 6800 (AGP models only) were some of the first cards to benefit from unlocking. While these models have either 8 or 12 pipes enabled, they share the same 16x6 GPU core as a 6800GT or Ultra, but pipelines and shaders beyond those specified are disabled; the GPU may be fully functional, or may have been found to have faults which do not affect operation at the lower specification. GPUs found to be fully functional can be unlocked successfully, although it is not possible to be sure that there are undiscovered faults; in the worst case the card may become permanently unusable.
History
Overclocked processors first became commercially available in 1983,[citation needed] when AMD sold an overclocked version of the Intel 8088 CPU. In 1984, some consumers were overclocking IBM's version of the Intel 80286 CPU by replacing the clock crystal. Xeon W-3175X is the only Xeon with a multiplier unlocked for overclocking.[original research?]
See also
References
- "Alt+Esc | GTX 780 Overclocking Guide". Archived from the original on June 24, 2013. Retrieved June 18, 2013.
- Colwell, Bob (March 2004). "The Zen of Overclocking". Computer. 37 (3): 9–12. doi:10.1109/MC.2004.1273994. S2CID 21582410.
External links
- Media related to Overclocking at Wikimedia Commons
- OverClocked inside
- How to Overclock a PC, WikiHow
- Overclocking guide for the Apple iMac G4 main logic board
Overclocking and benchmark databases
- OC Database of all PC hardware for the past decade (applications, modifications and more)
- HWBOT: Worldwide Overclocking League – Overclocking competition and data
- Comprehensive CPU OC Database
- Segunda Convencion Nacional de OC: Overclocking Extremo by Imperio Gamer
- Tool for overclock
https://en.wikipedia.org/wiki/Overclocking