
  • In computing, FLOPS is a measure of computer performance, useful in fields of scientific

  • computation that make heavy use of floating-point calculations. For such cases it is a more

  • accurate measure than the generic instructions per second.

  • Since the final S stands for "second", conservative speakers consider "FLOPS" as both the singular

  • and plural of the term, although the singular "FLOP" is frequently encountered. Alternatively,

  • the singular FLOP is used as an abbreviation for "FLoating-point OPeration", and a flop

  • count is a count of these operations. In this context, "flops" is simply the plural rather

  • than a rate, which would then be "flop/s". The expression 1 flops is actually interpreted

  • as a rate of 1 flop per second (1 flop/s).

  • Computing

  • One can calculate theoretical peak FLOPS using this equation: FLOPS = sockets × (cores per socket) × (clock cycles per second) × (FLOPs per cycle).

  • Most microprocessors today can do 4 FLOPs per clock cycle. Therefore, a single-core

  • 2.5 GHz processor has a theoretical performance of 10 billion FLOPS = 10 GFLOPS.

  • Note: In this context, sockets refers to chip sockets on a motherboard, in other

  • words, how many computer chips are in use, with each chip having one or more cores on

  • it. This equation only applies to one very specific hardware architecture and it ignores

  • limits imposed by memory bandwidth and other constraints. In general, GigaFLOPS are not

  • determined by theoretical calculations such as this one; instead, they are measured by

  • benchmarks of actual performance/throughput. Because this equation ignores all sources

  • of overhead, in the real world, one will never get actual performance that is anywhere near

  • what this equation predicts.
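
  • As a rough illustration of the peak-FLOPS equation above, here is a minimal Python sketch that reproduces the single-core 2.5 GHz example from the text; the function and variable names are illustrative rather than taken from any benchmark tool, and real systems are measured with benchmarks instead of calculated this way.

```python
# Minimal sketch of the theoretical peak-FLOPS equation described above.
# All names here are illustrative; this ignores memory bandwidth and other
# real-world limits, exactly as the text warns.

def theoretical_peak_flops(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    return sockets * cores_per_socket * clock_hz * flops_per_cycle

# Single-core example from the text: 2.5 GHz x 4 FLOPs per cycle = 10 GFLOPS.
peak = theoretical_peak_flops(sockets=1, cores_per_socket=1,
                              clock_hz=2.5e9, flops_per_cycle=4)
print(f"{peak / 1e9:.1f} GFLOPS")  # prints: 10.0 GFLOPS
```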

  • Records

  • Single computer records

  • In late 1996, Intel's ASCI Red was the world's

  • first computer to achieve one TFLOPS and beyond. Sandia director Bill Camp said that ASCI Red

  • had the best reliability of any supercomputer ever built, and "was supercomputing's

  • high-water mark in longevity, price, and performance." NEC's SX-9 supercomputer was the world's first

  • vector processor to exceed 100 gigaFLOPS per single core.

  • For comparison, a handheld calculator performs relatively few FLOPS. A computer response

  • time below 0.1 second in a calculation context is usually perceived as instantaneous by a

  • human operator, so a simple calculator needs only about 10 FLOPS to be considered functional.

  • In June 2006, a new computer was announced by Japanese research institute RIKEN, the

  • MDGRAPE-3. The computer's performance tops out at one petaFLOPS, almost two times faster

  • than the Blue Gene/L, but MDGRAPE-3 is not a general purpose computer, which is why it

  • does not appear in the Top500.org list. It has special-purpose pipelines for simulating

  • molecular dynamics. By 2007, Intel Corporation unveiled the experimental

  • multi-core POLARIS chip, which achieves 1 TFLOPS at 3.13 GHz. The 80-core chip can raise this

  • result to 2 TFLOPS at 6.26 GHz, although the thermal dissipation at this frequency

  • exceeds 190 watts. On June 26, 2007, IBM announced the second

  • generation of its top supercomputer, dubbed Blue Gene/P and designed to continuously operate

  • at speeds exceeding one petaFLOPS. When configured to do so, it can reach speeds in excess of

  • three petaFLOPS. In June 2007, Top500.org reported the fastest

  • computer in the world to be the IBM Blue Gene/L supercomputer, measuring a peak of 596 teraFLOPS.

  • The Cray XT4 hit second place with 101.7 teraFLOPS. On October 25, 2007, NEC Corporation of Japan

  • issued a press release announcing its SX series model SX-9, claiming it to be the world's

  • fastest vector supercomputer. The SX-9 features the first CPU capable of a peak vector performance

  • of 102.4 gigaFLOPS per single core. On February 4, 2008, the NSF and the University

  • of Texas at Austin opened full scale research runs on an AMD/Sun supercomputer named Ranger,

  • the most powerful supercomputing system in the world for open science research, which

  • operates at a sustained speed of 0.5 petaFLOPS. On May 25, 2008, an American supercomputer

  • built by IBM, named 'Roadrunner', reached the computing milestone of one petaflops by

  • processing more than 1.026 quadrillion calculations per second. It headed the June 2008 and November

  • 2008 TOP500 list of the most powerful supercomputers. The computer is located at Los Alamos National

  • Laboratory in New Mexico, and the computer's name refers to the New Mexico state bird,

  • the Greater Roadrunner. In June 2008, AMD released the ATI Radeon HD 4800

  • series, which are reported to be the first GPUs to achieve one teraFLOPS scale. On August

  • 12, 2008 AMD released the ATI Radeon HD 4870X2 graphics card with two Radeon R770 GPUs totaling

  • 2.4 teraFLOPS. In November 2008, an upgrade to the Cray XT

  • Jaguar supercomputer at the Department of Energy’s Oak Ridge National Laboratory raised

  • the system's computing power to a peak 1.64 “petaflops,” or a quadrillion mathematical

  • calculations per second, making Jaguar the world’s first petaflops system dedicated

  • to open research. In early 2009 the supercomputer was named after a mythical creature, Kraken.

  • Kraken was declared the world's fastest university-managed supercomputer and sixth fastest overall in

  • the 2009 TOP500 list, which is the global standard for ranking supercomputers. In 2010

  • Kraken was upgraded, making it faster and more powerful.

  • In 2009, the Cray Jaguar performed at 1.75 petaFLOPS, beating the IBM Roadrunner for

  • the number one spot on the TOP500 list. In October 2010, China unveiled the Tianhe-I,

  • a supercomputer that operates at a peak computing rate of 2.5 petaflops.

  • As of 2010, the fastest six-core PC processor reaches 109 gigaFLOPS in double precision

  • calculations. GPUs are considerably more powerful. For example, Nvidia Tesla C2050 GPU computing

  • processors perform around 515 gigaFLOPS in double precision calculations, and the AMD

  • FireStream 9270 peaks at 240 gigaFLOPS. In single precision performance, Nvidia Tesla

  • C2050 computing processors perform around 1.03 teraFLOPS and the AMD FireStream 9270

  • cards peak at 1.2 teraFLOPS. Both Nvidia and AMD's consumer gaming GPUs may reach higher

  • FLOPS. For example, AMD's Hemlock XT 5970 reaches 928 gigaFLOPS in double precision

  • calculations with two GPUs on board and the Nvidia GTX 480 reaches 672 gigaFLOPS with

  • one GPU on board. On December 2, 2010, the US Air Force unveiled

  • a defense supercomputer made up of 1,760 PlayStation 3 consoles that can run 500 trillion floating-point

  • operations per second. In November 2011, it was announced that Japan

  • had achieved 10.51 petaflops with its K computer. It is still under development and software

  • performance tuning is currently underway. It has 88,128 SPARC64 VIIIfx processors in

  • 864 racks, with theoretical performance of 11.28 petaflops. It is named after the Japanese

  • word "kei", which stands for 10 quadrillion, corresponding to the target speed of 10 petaFLOPS.

  • On November 15, 2011, Intel demonstrated a single x86-based processor, code-named "Knights

  • Corner", sustaining more than a TeraFlop on a wide range of DGEMM operations. Intel emphasized

  • during the demonstration that this was a sustained TeraFlop, and that it was the first general

  • purpose processor to ever cross a TeraFlop. On June 18, 2012, IBM's Sequoia supercomputer

  • system, based at the U.S. Lawrence Livermore National Laboratory, reached 16 petaFLOPS,

  • setting the world record and claiming first place in the latest TOP500 list.

  • On November 12, 2012, the TOP500 list certified Titan as the world's fastest supercomputer

  • per the LINPACK benchmark, at 17.59 petaFLOPS. It was developed by Cray Inc. at the Oak Ridge

  • National Laboratory and combines AMD Opteron processors with Kepler-based NVIDIA Tesla

  • graphic processing unit technologies. On June 10, 2013, China's Tianhe-2 was ranked

  • the world's fastest with a record of 33.86 petaflops.

  • On April 8, 2014, AMD launched the R9 295X2, a dual R9 290X on a single PCB, with 11.6 TFLOPS.

  • Distributed computing records

  • Distributed computing uses the Internet to

  • link personal computers to achieve more FLOPS: Folding@home is sustaining over 20.7 native

  • petaFLOPS as of June 2014 or 43.1 x86 petaFLOPS. It is the first computing project of any kind

  • to cross the 1, 2, 3, 4, and 5 native petaFLOPS milestones. This level of performance is primarily

  • enabled by the cumulative effort of a vast array of powerful GPU and CPU units.

  • As of July 2014, the entire BOINC network averages about 5.6 petaFLOPS.

  • As of July 2014, SETI@Home, employing the BOINC software platform, averages 681 teraFLOPS.

  • As of July 2014, Einstein@Home, a project using the BOINC network, is crunching at

  • 492 teraFLOPS. As of July 2014, MilkyWay@Home, using the

  • BOINC infrastructure, computes at 471 teraFLOPS. As of July 2014, GIMPS is searching for Mersenne

  • primes and sustaining 173 teraFLOPS.

  • Future developments

  • In 2008, James Bamford's book The Shadow Factory reported that NSA told the Pentagon it would

  • need an exaflop computer by 2018. Given the current speed of progress, supercomputers

  • are projected to reach 1 exaFLOPS in 2019. Cray, Inc. announced in December 2009 a plan

  • to build a 1 EFLOPS supercomputer before 2020. Erik P. DeBenedictis of Sandia National Laboratories

  • theorizes that a zettaFLOPS computer is required to accomplish full weather modeling over a

  • two-week time span. Such systems might be built around 2030.

  • In India, ISRO and the Indian Institute of Science have stated that they plan to build

  • a 132.8 EFLOPS supercomputer by 2017, 100 times faster than any supercomputer ever planned.

  • They have estimated that the project would cost US $2 billion, which the state has budgeted.

  • Cost of computing

  • Hardware costs

  • The following is a list of examples of computers that demonstrates how drastically performance

  • has increased and price has decreased. The "cost per GFLOPS" is the cost for a set of

  • hardware that would theoretically operate at one billion floating-point operations per

  • second. During the era when no single computing platform was able to achieve one GFLOPS, this

  • table lists the total cost for multiple instances of a fast computing platform whose speed sums

  • to one GFLOPS. Otherwise, the least expensive computing platform able to achieve one GFLOPS

  • is listed.
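
  • As a hedged illustration of the methodology just described, the small Python sketch below computes a cost-per-GFLOPS figure from a hypothetical platform price and speed; the numbers are invented for the example and are not entries from the actual table.

```python
import math

def cost_per_gflops(unit_cost_usd, unit_gflops):
    """Cost of enough hardware to reach one GFLOPS, per the methodology above.

    If a single platform is slower than one GFLOPS, the costs of multiple
    instances are summed; otherwise the single platform's cost is used.
    """
    units_needed = max(1, math.ceil(1.0 / unit_gflops))
    return units_needed * unit_cost_usd

# Hypothetical values, not taken from the actual table:
print(cost_per_gflops(unit_cost_usd=500.0, unit_gflops=0.02))   # 50 units -> 25000.0
print(cost_per_gflops(unit_cost_usd=2000.0, unit_gflops=50.0))  # 1 unit   -> 2000.0
```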

  • The trend toward placing ever more transistors inexpensively on an integrated circuit follows

  • Moore's law. This trend explains the rising speed and falling cost of computer processing.

  • Operation costs

  • In terms of energy cost, according to the Green500

  • list, as of June 2011 the most efficient TOP500 supercomputer runs at 2097.19 MFLOPS per watt.

  • This translates to an energy requirement of 0.477 watts per GFLOPS; however, this energy

  • requirement will be much greater for less efficient supercomputers.
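
  • As a quick check of the conversion above, this short Python sketch turns an efficiency quoted in MFLOPS per watt into watts per GFLOPS (the helper name is just for illustration):

```python
# Convert MFLOPS-per-watt efficiency into watts per GFLOPS:
# 1 GFLOPS = 1000 MFLOPS, so watts/GFLOPS = 1000 / (MFLOPS per watt).

def watts_per_gflops(mflops_per_watt):
    return 1000.0 / mflops_per_watt

print(round(watts_per_gflops(2097.19), 3))  # 0.477, matching the figure above
```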

  • Hardware costs for low cost supercomputers may be less significant than energy costs

  • when running continuously for several years.

  • Floating-point operation and integer operation

  • FLOPS measures the computing ability of a computer. An example of a floating-point operation

  • is the calculation of mathematical equations; as such, FLOPS is a useful measure of supercomputer

  • performance. MIPS is used to measure the integer performance of a computer. Examples of integer

  • operation include data movement or value testing. MIPS as a performance benchmark is adequate

  • when the computer is used for database queries, word processing, spreadsheets, or to

  • run multiple virtual operating systems. Frank H. McMahon, of the Lawrence Livermore National

  • Laboratory, invented the terms FLOPS and MFLOPS so that he could compare the so-called supercomputers

  • of the day by the number of floating-point calculations they performed per second. This

  • was much better than using the prevalent MIPS to compare computers as this statistic usually

  • had little bearing on the arithmetic capability of the machine.

  • Fixed-point

  • These designations refer to the format used

  • to store and manipulate numeric representations of data without using a decimal point. Fixed-point

  • numbers are designed to represent and manipulate integers (positive and negative whole numbers); for

  • example, 16 bits, yielding up to 65,536 possible bit patterns that typically represent the

  • whole numbers from −32768 to +32767.
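
  • To make the 16-bit figures above concrete, here is a minimal Python sketch; the Q8.8-style scale factor of 1/256 is an assumption chosen for the example, not something specified in the text.

```python
# 16-bit two's-complement range, as quoted above.
BITS = 16
patterns = 2 ** BITS
lo, hi = -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1
print(patterns, lo, hi)          # 65536 -32768 32767

# Fixed-point stores a scaled integer; with an assumed step of 1/256,
# 3.14159 is held as the nearest multiple of 1/256.
scale = 256
stored = round(3.14159 * scale)  # 804
print(stored, stored / scale)    # 804 3.140625
```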

  • Floating-point

  • This is needed for very large or very small real numbers, or numbers requiring the use

  • of a decimal point. The encoding scheme used by the processor for floating-point numbers

  • is more complicated than for fixed-point. Floating-point representation is similar to

  • scientific notation, except everything is carried out in base two, rather than base

  • ten. The encoding scheme stores the sign, the exponent and the mantissa. While several

  • similar formats are in use, the most common is ANSI/IEEE Std. 754-1985. This standard

  • defines the format for 32-bit numbers called single precision, as well as 64-bit numbers

  • called double precision and longer numbers called extended precision. Floating-point

  • representations can support a much wider range of values than fixed-point, with the ability

  • to represent very small numbers and very large numbers.
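
  • As a concrete illustration of the sign/exponent/mantissa layout described above, the following Python sketch unpacks the bit fields of a 64-bit double precision value; the helper name is illustrative.

```python
import struct

def decode_double(x):
    """Split an IEEE 754 double into its sign, biased exponent and mantissa fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)     # 52-bit fraction field
    return sign, exponent, mantissa

sign, exponent, mantissa = decode_double(-6.25)
# -6.25 = -1.5625 * 2**2, so the unbiased exponent is 2 and the significand 1.5625.
print(sign, exponent - 1023, 1 + mantissa / 2**52)  # 1 2 1.5625
```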

  • Dynamic range and precision

  • The exponentiation inherent in floating-point

  • computation assures a much larger dynamic range (the largest and smallest numbers

  • that can be represented), which is especially important when processing data sets which

  • are extremely large or where the range may be unpredictable. As such, floating-point

  • processors are ideally suited for computationally intensive applications.
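
  • A quick numerical illustration of that dynamic-range advantage, comparing the largest 16-bit fixed-point integer with the limits of a double precision float as reported by Python:

```python
import sys

print(2 ** 15 - 1)         # 32767, largest 16-bit two's-complement integer
print(sys.float_info.max)  # ~1.7976931348623157e+308
print(sys.float_info.min)  # ~2.2250738585072014e-308 (smallest normalized double)
```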

  • See also

  • Gordon Bell Prize

  • Orders of magnitude
