Altera also has a third-party, industry-expert-endorsed performance benchmarking methodology, which is used to compare FPGA performance between families from a single FPGA vendor and with those of competitive solutions. This ensures a consistent benchmarking environment when testing Altera FPGAs and when comparing them to competitive FPGAs.
So, the "million-dollar question" is:
Why do Altera's FPGAs have higher performance, better utilization, and faster compiles compared to competing devices?
The key to high performance in Stratix III FPGAs is the area-efficient Adaptive Logic Module (ALM) and its MultiTrack routing architecture. Both of these topics will be considered in detail as follows:
Adaptive Logic Module (ALM) architecture
The Altera ALM consists of an 8-input fracturable LUT, two registers, and two adders, as shown in Fig 5. One ALM can implement any 6-input function or select 7-input functions, or it can be fractured into smaller lookup tables (LUTs) to implement two independent functions.

5. Adaptive Logic Module (ALM) block diagram.
The ALM is significantly more flexible and, as a result, is more area-efficient than the Xilinx Virtex-5 logic element (also called a LUT-flipflop pair). The Virtex-5 logic element consists of a basic 6-LUT, carry logic, and a single register as shown in Fig 6. In comparison, the combinational logic portion of the ALM has eight inputs and supports all 6-input functions plus many other combinations of smaller functions using its two outputs. The combinational logic portion of the Virtex-5 logic element, a basic 6-LUT, also has 64 bits of CRAM and two outputs like the ALM, but only contains 6 inputs and has a limited ability to implement more than one logic function.
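To make the "64 bits of CRAM" figure concrete, a k-input LUT can be modeled as a truth table holding 2^k configuration bits. The following is a generic sketch in Python, not a model of either vendor's circuitry; the helper names `make_lut` and `eval_lut` are hypothetical and exist only for this illustration.

```python
def make_lut(func, k):
    """Return the 2**k configuration bits of a k-input LUT for a Boolean function."""
    bits = []
    for index in range(2 ** k):
        # Bit i of the table index is the value of input i.
        inputs = [(index >> i) & 1 for i in range(k)]
        bits.append(int(bool(func(*inputs))))
    return bits


def eval_lut(bits, inputs):
    """Evaluate the LUT: the input values select one stored configuration bit."""
    index = sum(value << i for i, value in enumerate(inputs))
    return bits[index]


# One 6-input function occupies all 64 bits.
and6 = make_lut(lambda a, b, c, d, e, f: a & b & c & d & e & f, 6)

# Two 5-input functions need 32 bits each, so together they also fill 64 bits.
xor5 = make_lut(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e, 5)
and5 = make_lut(lambda a, b, c, d, e: a & b & c & d & e, 5)

print(len(and6), len(xor5) + len(and5))
```

The storage is the same in both cases. What differs between architectures is how many input pins are available to address it, which determines how independent the two smaller functions can be.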

6. Comparing the Stratix III ALM and the Virtex-5 LUT-Flipflop Pair.
Although the Virtex-5 basic 6-LUT has the ability to implement two smaller functions, it will usually be used only as a 6-LUT. Because the LUT only has six inputs, the required number of shared inputs places severe restrictions on the types of functions that can be combined. These restrictions make using the basic 6-LUT as two 5-LUTs a rare occurrence. In contrast, the two additional inputs in the ALM allow it to be used as two fully functional 5-LUTs, providing a significant area advantage. Table 6 gives the number of shared inputs required for a few combinations of functions.

Table 6. Stratix III ALM vs. Virtex-5 LUT flexibility.
For example, an Altera ALM can implement two independent 4-input functions (no inputs shared), while the Virtex-5 LUT requires three shared inputs. Fig 7 shows another example: the ALM can implement a 5-input and a 3-input function without any shared inputs, while the Virtex-5 LUT requires three shared inputs. The end result is that it is difficult to find functions that can be packed into a Virtex-5 LUT, so functions with fewer than six inputs end up being implemented in 6-LUT resources.
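The shared-input figures quoted above follow from a simple pin-counting argument, sketched here in Python. The model is a simplification: it assumes the ALM offers eight distinct inputs and that the Virtex-5 LUT, when used for two functions, offers five distinct inputs (the sixth being used to select between the two halves). That five-input figure and the function name `min_shared_inputs` are assumptions made for this illustration, and the model ignores any further restrictions on which function pairs each device supports.

```python
ALM_INPUTS = 8                   # from the article: 8-input fracturable LUT
VIRTEX5_DUAL_OUTPUT_INPUTS = 5   # assumption: distinct inputs in two-function mode


def min_shared_inputs(a, b, available_inputs):
    """Fewest inputs that an a-input and a b-input function must share
    so that their distinct inputs fit in the available input pins."""
    return max(0, a + b - available_inputs)


for a, b in [(4, 4), (5, 3)]:
    alm = min_shared_inputs(a, b, ALM_INPUTS)
    v5 = min_shared_inputs(a, b, VIRTEX5_DUAL_OUTPUT_INPUTS)
    print(f"{a}-input + {b}-input: ALM shares {alm}, Virtex-5 LUT shares {v5}")
```

Under these assumptions the sketch reproduces both examples in the text: zero shared inputs for the ALM and three for the Virtex-5 LUT, for both the 4+4 and the 5+3 cases.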

7. Implementing 5- and 3-input functions in a Stratix III ALM and a Virtex-5 LUT-Flipflop Pair.
As a result of the fracturable LUT, Stratix III ALMs can, on average, pack 1.8X more logic than the Virtex-5 LUT-FF pair.
MultiTrack routing architecture
The MultiTrack routing architecture used in Stratix III devices provides the connectivity between different clusters of logic blocks, and can be measured by the number of "hops" required to get from one logic array block (LAB) to another. The fewer the number of hops and more predictable the pattern, the better the performance and the easier it is for CAD tool optimization. The MultiTrack interconnect routing architecture provides more accessibility to all surrounding LABs with fewer connections, thus maximizing performance, reducing power, and minimizing area congestion for better logic packing. Considering only wires of length four for simplicity, Fig 8 shows the number of hops required to connect to LABs from a given LAB located at the location denoted by the gray box.
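The "hops" metric can be thought of as a shortest-path distance over the routing fabric. The Python sketch below counts how many blocks sit at each hop distance in a deliberately simplified toy grid. It assumes that each block drives wires of length four in the four compass directions and that a wire can be tapped at any block along its length. These assumptions, and the function name `labs_by_hops`, are illustrative only; the sketch does not model the actual MultiTrack interconnect and does not reproduce the article's connectivity figures.

```python
from collections import deque


def labs_by_hops(wire_length=4, max_hops=2):
    """Breadth-first search from a block at (0, 0) on an unbounded grid.
    Returns {hop_count: number of blocks first reached at that hop count}."""
    start = (0, 0)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        h = hops[(x, y)]
        if h == max_hops:
            continue  # do not expand beyond the requested hop limit
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            # Toy assumption: a wire can be tapped at every block along it.
            for step in range(1, wire_length + 1):
                nxt = (x + dx * step, y + dy * step)
                if nxt not in hops:
                    hops[nxt] = h + 1
                    queue.append(nxt)
    counts = {}
    for h in hops.values():
        if h > 0:
            counts[h] = counts.get(h, 0) + 1
    return counts


print(labs_by_hops())  # {1: 16, 2: 80} in this toy model
```

The point of the sketch is the shape of the measurement rather than the numbers: a richer and more regular set of wires increases the number of blocks reachable at each hop count, which is the property the connectivity comparison is based on.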

8. Altera's MultiTrack routing architecture.
Table 7 compares the connectivity of the Stratix III family with Virtex-5 in terms of the number of LABs/CLBs reachable in a given number of hops. In Stratix III devices, many more LABs (34) can be reached in one hop than CLBs in Virtex-5 devices. If the numbers are scaled by the greater efficiency of the ALM, the results are even more favorable to Stratix III devices. A LAB contains the equivalent of 25 4-LUT-based LEs, versus approximately 11 for a Virtex-5 CLB (using the 1.8X factor). Scaling the amount of logic reachable within a given number of hops by these factors makes the Stratix III connectivity advantage correspondingly greater in terms of logic capacity.

Table 7. Stratix III versus Virtex-5 connectivity capability.
Note:
(1) 1 ALM = 2.5 LEs and each LAB = 10 ALMs
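The LE-equivalent figures used for this scaling can be checked with a few lines of arithmetic, shown here as a Python sketch. The values of 2.5 LEs per ALM, 10 ALMs per LAB, the 1.8X packing factor, and 34 LABs in one hop come from the article. The figure of eight LUT-flipflop pairs per Virtex-5 CLB is an assumption added here; it is one reading that reproduces the "approximately 11" quoted in the text.

```python
LES_PER_ALM = 2.5            # from the table note
ALMS_PER_LAB = 10            # from the table note
ALM_PACKING_FACTOR = 1.8     # ALM vs. Virtex-5 LUT-FF pair, from the article
LUT_FF_PAIRS_PER_CLB = 8     # assumption, not stated in the article


def les_per_lab():
    """LE equivalents in one Stratix III LAB."""
    return ALMS_PER_LAB * LES_PER_ALM


def les_per_clb():
    """LE equivalents in one Virtex-5 CLB, derating each LUT-FF pair by 1.8X."""
    return LUT_FF_PAIRS_PER_CLB * LES_PER_ALM / ALM_PACKING_FACTOR


def reachable_les(blocks_reached, les_per_block):
    """Logic capacity reachable when a given number of blocks is in range."""
    return blocks_reached * les_per_block


print(les_per_lab())                    # 25.0
print(round(les_per_clb(), 1))          # 11.1
print(reachable_les(34, les_per_lab())) # 850.0 LE equivalents in one hop
```

The same `reachable_les` calculation can be applied to any row of the connectivity table by substituting the block count for that hop distance.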
Conclusion
As circuit designs that use FPGAs grow more complex, FPGA architectures must keep innovating while moving to smaller process nodes. Core architecture enhancements come "relatively" easily, but arriving at the optimal architecture is extremely difficult: it must strike the right tradeoffs between performance, increased logic capacity (density), and a routing architecture that allows the software to translate these benefits into reality.
The goal of OpenCores is unbiased and meaningful benchmarks that truly compare the hardware architectures and software design tool chains. Altera has created these benchmarks honestly and without bias, using comprehensive and representative customer designs.
This statement of purpose is, of course, very easy for us to say and – in fact – any company might use something like this as promotional hype and figure an angle to try to use it as a competitive advantage. In order to address this issue, Altera is making all aspects of these benchmarks available to anyone who wants to utilize them. This includes the RTL for the designs, the setup/constraint files for the tests, and so on.
The key benefits of OpenCore are that we tell potential customers exactly where the designs are, we make the files available, and we encourage customers to check the results for themselves. The results of the OpenCore Benchmarks show that the advantages seen in Stratix III FPGAs (performance, utilization, and compile times) increase as the design size increases. This is fundamentally due to the optimal architecture, which Quartus II software easily exploits.
As a final note, the results presented here are for our 65 nm Stratix III FPGA family, because this provided a fair comparison with the 65 nm Virtex-5 family. Moving forward to smaller process nodes, FPGA densities are more than 2X at 40 nm, and Altera's recently announced 40 nm Stratix IV FPGAs carry forward the validated advantages of the innovative ALM logic structure and MultiTrack routing architecture to deliver the highest performance, highest utilization, and lowest compile times.
Seyi Verma is the Senior High-End Technical Analysis Staff responsible for technical product analysis, FPGA architecture, and technology solutions for Altera's high-end FPGA product lines.
Seyi has been with Altera since 1998 and has also held Product Engineering positions at Altera, where he was responsible for the Configuration, Mercury, and Stratix Series product lines. His experience covers all aspects of programmable solutions, including FPGA testing, silicon debug, characterization, failure analysis, software, and tools. Seyi holds a BSEE from Rochester Institute of Technology, New York.