How to perform meaningful benchmarks on FPGAs from different vendors (Part 2)
Author: Seyi Verma, Altera Corporation   Source: original to this site   Updated: 2008-7-29
Performance advantages increase with design size
In Fig 2, the Y-axis shows the ratio of the fMAX achieved by Stratix III FPGAs to the fMAX achieved by Virtex-5 FPGAs. The X-axis shows the number of cores stamped for each of the seven OpenCore designs. Any data point above the 1.0 line indicates a Stratix III FPGA performance advantage. To increase the design size (and hence utilization), more stamps of each core were instantiated in the FPGA. As the number of stamps increases, the results show:

  • The fMAX ratio rises because performance degrades more rapidly in the nearest competing device. The performance advantage of the Stratix III FPGA grows to as much as 65%.
  • Quartus II software exploits the efficient architecture and superior routing interconnect of Stratix III FPGAs to reach the most logic elements with the fewest hops in high-performance applications.


Fig 2. Stratix III performance advantage increases with design size (utilization).
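The ratio on the Y-axis of Fig 2 is simply the Stratix III fMAX divided by the Virtex-5 fMAX at each stamp count, so a ratio of 1.65 corresponds to a 65% advantage. The following Python sketch computes that series; the function names and the fMAX values in the usage block are hypothetical placeholders, not the measured OpenCore results.

```python
def fmax_ratio(fmax_stratix_mhz: float, fmax_virtex_mhz: float) -> float:
    """Y-axis value of Fig 2: values above 1.0 favor Stratix III."""
    if fmax_stratix_mhz <= 0 or fmax_virtex_mhz <= 0:
        raise ValueError("fMAX values must be positive")
    return fmax_stratix_mhz / fmax_virtex_mhz


def advantage_percent(ratio: float) -> float:
    """Express a ratio as a percentage advantage, e.g. 1.65 -> 65%."""
    return (ratio - 1.0) * 100.0


def ratio_series(stratix: dict[int, float], virtex: dict[int, float]) -> dict[int, float]:
    """fMAX ratio per stamp count, only for stamp counts that compiled on both devices."""
    common = sorted(stratix.keys() & virtex.keys())
    return {n: fmax_ratio(stratix[n], virtex[n]) for n in common}


if __name__ == "__main__":
    # Hypothetical fMAX values in MHz keyed by stamp count; NOT measured data.
    stratix_iii = {1: 300.0, 4: 290.0, 16: 280.0}
    virtex_5 = {1: 280.0, 4: 240.0, 16: 200.0}
    for stamps, r in ratio_series(stratix_iii, virtex_5).items():
        print(f"{stamps:3d} stamps: ratio {r:.2f} ({advantage_percent(r):+.0f}%)")
```

Restricting the series to stamp counts that compiled on both devices avoids comparing a successful compile against a failed one.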

Utilization advantages
Fig 3 and Table 3 show the maximum number of cores that could be instantiated in each FPGA. In terms of utilization, the results show:

  • Stratix III FPGAs, on average, have a 46% utilization advantage over the nearest competing device.
  • Quartus II software maximizes utilization by implementing logic functions in the adaptive logic module (ALM), which is extremely efficient because the ALM is fracturable.


Fig 3. Stratix III FPGAs fit more logic on comparable devices.


Table 3. Maximum number of stamps instantiated and utilization.
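The utilization comparison reduces to the ratio of the maximum stamp counts for each design, averaged across designs. The article does not state which average produces the 46% figure, so this sketch (hypothetical names, placeholder inputs rather than Table 3 data) offers both a geometric mean, the usual choice for averaging ratios, and an arithmetic mean.

```python
import math


def stamp_ratio(max_stamps_stratix: int, max_stamps_virtex: int) -> float:
    """How many times more stamps of one core fit in the Stratix III device."""
    if max_stamps_stratix <= 0 or max_stamps_virtex <= 0:
        raise ValueError("stamp counts must be positive")
    return max_stamps_stratix / max_stamps_virtex


def average_advantage(ratios: list[float], geometric: bool = True) -> float:
    """Average utilization advantage across designs, in percent.

    Which mean the article uses is not stated; the geometric mean is the
    conventional choice for averaging ratios, so it is the default here.
    """
    if not ratios:
        raise ValueError("need at least one ratio")
    if geometric:
        mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    else:
        mean = sum(ratios) / len(ratios)
    return (mean - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical (Stratix III, Virtex-5) maximum stamp counts; NOT Table 3 data.
    pairs = [(12, 8), (30, 20), (6, 5)]
    ratios = [stamp_ratio(s, v) for s, v in pairs]
    print(f"geometric:  {average_advantage(ratios):.1f}%")
    print(f"arithmetic: {average_advantage(ratios, geometric=False):.1f}%")
```

The two means diverge when the per-design ratios are spread out, which is why it matters to know which one a quoted average uses.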

Table 4 shows the error codes received when attempting to increase the design size beyond the number of stamps shown in Table 3. Note that compilation for the nearest competing device often fails prematurely with "no route" errors.


Table 4. Error code for next core stamp.

Note:
(1) LABs = Logic Array Blocks.

Compile time advantages
Fig 4 and Table 5 compare compile times, with each design limited to the maximum number of cores that could fit in the nearest competing device. The results show that Stratix III FPGAs compile up to 9x faster than the nearest competing device.


Fig 4. Compile time comparison.


Table 5. Compile time comparison.
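Because the comparison is capped at the largest design that fits in the nearest competing device, each speedup should be computed at a matched stamp count. A sketch of that bookkeeping, assuming a hypothetical `CompileRun` record per compile; the 9x figure itself comes from Table 5 and is not reproduced here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompileRun:
    """One compile of one design (hypothetical record type)."""
    design: str
    stamps: int
    seconds: float


def matched_speedups(stratix: list[CompileRun], virtex: list[CompileRun]) -> dict[str, float]:
    """Compile-time speedup per design at the largest stamp count that
    compiled on the competing device (the cap used for Fig 4 and Table 5)."""
    largest: dict[str, CompileRun] = {}
    for run in virtex:
        best = largest.get(run.design)
        if best is None or run.stamps > best.stamps:
            largest[run.design] = run
    by_key = {(r.design, r.stamps): r for r in stratix}
    speedups: dict[str, float] = {}
    for design, v_run in largest.items():
        s_run = by_key.get((design, v_run.stamps))
        if s_run is not None:  # skip designs without a matched Stratix III run
            speedups[design] = v_run.seconds / s_run.seconds
    return speedups


if __name__ == "__main__":
    # Hypothetical compile times in seconds; NOT Table 5 data.
    s3 = [CompileRun("core_a", 4, 100.0), CompileRun("core_a", 8, 300.0)]
    v5 = [CompileRun("core_a", 2, 50.0), CompileRun("core_a", 4, 600.0)]
    print(matched_speedups(s3, v5))
```

Capping at the competitor's largest successful compile keeps the comparison like-for-like; comparing each device at its own maximum would fold design size into the compile-time figure.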

For details, see the OpenCore Stamping and Benchmarking Methodology (PDF).

    Altera also has a performance benchmarking methodology, endorsed by third-party industry experts, that is used to compare FPGA performance across families from a single FPGA vendor and against competing solutions. This ensures a consistent benchmarking environment both when testing Altera FPGAs and when comparing them with competing FPGAs.

    For more on this methodology, see:
    • Altera's FPGA Performance Benchmarking Methodology (White Paper)
    • Guidance for Accurately Benchmarking FPGAs (White Paper)


    The "million-dollar question" is...
    So the "million-dollar question" is: why do Altera's FPGAs deliver higher performance, better utilization, and faster compiles than competing devices?

    The keys to high performance in Stratix III FPGAs are the area-efficient Adaptive Logic Module (ALM) and the MultiTrack routing architecture. Both are considered in detail below.

    Adaptive Logic Module (ALM) architecture
    The Altera ALM consists of an 8-input fracturable LUT, two registers, and two adders, as shown in Fig 5. One ALM can implement any 6-input function or certain 7-input functions, or it can be fractured into smaller lookup tables (LUTs) to implement two independent functions.


    Fig 5. Adaptive Logic Module (ALM) block diagram.

    The ALM is significantly more flexible, and as a result more area-efficient, than the Xilinx Virtex-5 logic element (also called a LUT-flipflop pair). The Virtex-5 logic element consists of a basic 6-LUT, carry logic, and a single register, as shown in Fig 6. In comparison, the combinational logic portion of the ALM has eight inputs and supports all 6-input functions plus many other combinations of smaller functions using its two outputs. The combinational logic portion of the Virtex-5 logic element, a basic 6-LUT, has 64 bits of CRAM and two outputs like the ALM, but it has only six inputs and a limited ability to implement more than one logic function.


    Fig 6. Comparing the Stratix III ALM and the Virtex-5 LUT-flipflop pair.

    Although the Virtex-5 basic 6-LUT can implement two smaller functions, it is usually used only as a single 6-LUT. Because the LUT has only six inputs, two functions packed into it must share inputs, and the required number of shared inputs severely restricts which functions can be combined. These restrictions make using the basic 6-LUT as two 5-LUTs a rare occurrence. In contrast, the two additional inputs in the ALM allow it to be used as two fully functional 5-LUTs, providing a significant area advantage. Table 6 gives the number of shared inputs required for several combinations of functions.


    Table 6. Stratix III ALM vs. Virtex-5 LUT flexibility.

    For example, an Altera ALM can implement two independent 4-input functions (no inputs shared), while the Virtex-5 LUT requires three shared inputs. Fig 7 shows another example: the ALM can implement a 5-input and a 3-input function without any shared inputs, while the Virtex-5 LUT again requires three shared inputs. The end result is that it is difficult to find pairs of functions that can be packed into one Virtex-5 LUT, so functions with fewer than six inputs often end up occupying a full 6-LUT.


    Fig 7. Implementing 5- and 3-input functions in a Stratix III ALM and a Virtex-5 LUT-flipflop pair.
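The shared-input counts quoted above follow from a simple model: two functions with a and b inputs fit in one block if their distinct inputs (a + b minus the shared ones) do not exceed what the block offers. The sketch below assumes 8 inputs for the ALM and, for Virtex-5, a dual-output mode in which both functions draw on a common set of five inputs. It is a simplification that ignores the ALM's 6-input and 7-input modes, so it should not be read as a reproduction of Table 6; the function and constant names are hypothetical.

```python
from typing import Optional

ALM_INPUTS = 8       # 8-input fracturable LUT (from the text)
V5_DUAL_INPUTS = 5   # assumption: Virtex-5 dual-output mode uses 5 common inputs


def shared_inputs_needed(a: int, b: int, block_inputs: int, max_each: int = 5) -> Optional[int]:
    """Minimum inputs two functions (a and b inputs wide) must share to fit in
    one block under a simplified model; None if either is too wide to split."""
    if not (1 <= a <= max_each and 1 <= b <= max_each):
        return None
    # Distinct inputs (a + b - shared) must not exceed the block's input count.
    return max(0, a + b - block_inputs)


if __name__ == "__main__":
    for a, b in [(4, 4), (5, 3), (5, 4), (5, 5)]:
        alm = shared_inputs_needed(a, b, ALM_INPUTS)
        v5 = shared_inputs_needed(a, b, V5_DUAL_INPUTS)
        print(f"{a}-input + {b}-input: ALM shares {alm}, Virtex-5 LUT shares {v5}")
```

Under this model, two 5-input functions need two shared inputs in the ALM but all five in the Virtex-5 LUT, which illustrates why the 6-LUT is rarely split in practice.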

    Because of the fracturable LUT, a Stratix III ALM can, on average, pack 1.8x as much logic as a Virtex-5 LUT-FF pair.

    MultiTrack routing architecture
    The MultiTrack routing architecture used in Stratix III devices provides the connectivity between different clusters of logic blocks. This connectivity can be measured by the number of "hops" required to get from one logic array block (LAB) to another: the fewer the hops and the more predictable the pattern, the better the performance and the easier the optimization for CAD tools. The MultiTrack interconnect gives each LAB access to more of its surrounding LABs with fewer connections, thus maximizing performance, reducing power, and minimizing routing congestion for better logic packing. Considering only wires of length four for simplicity, Fig 8 shows the number of hops required to reach other LABs from a given LAB (the gray box).


    Fig 8. Altera's MultiTrack routing architecture.
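The hop metric in Fig 8 can be explored with a toy breadth-first search. This is a sketch under strong simplifying assumptions (a uniform grid of LABs, only horizontal and vertical wires of one length, and any LAB along a wire reachable from it); it does not model the real Stratix III or Virtex-5 interconnect and will not reproduce the counts in Table 7. The function names are hypothetical.

```python
from collections import deque


def hops_map(wire_len: int = 4, radius: int = 8) -> dict[tuple[int, int], int]:
    """Minimum hops from the origin LAB to every LAB within `radius`.

    Toy model (assumption): one hop uses a single horizontal or vertical
    wire that spans `wire_len` LABs and can drive any LAB it passes.
    """
    start = (0, 0)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            for step in range(1, wire_len + 1):
                nxt = (x + dx * step, y + dy * step)
                if max(abs(nxt[0]), abs(nxt[1])) > radius or nxt in dist:
                    continue
                dist[nxt] = dist[(x, y)] + 1  # BFS: first visit is the minimum
                queue.append(nxt)
    return dist


def reachable_within(dist: dict[tuple[int, int], int], hops: int) -> int:
    """Number of LABs (excluding the origin) reachable in at most `hops` hops."""
    return sum(1 for d in dist.values() if 0 < d <= hops)


if __name__ == "__main__":
    dist = hops_map(wire_len=4, radius=8)
    for h in (1, 2, 3):
        print(f"LABs within {h} hop(s): {reachable_within(dist, h)}")
```

Changing `wire_len` shows the basic tradeoff the text describes: longer wires let more blocks be reached in a given number of hops.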

    Table 7 compares the connectivity of the Stratix III family with Virtex-5 in terms of the number of LABs/CLBs reachable in a given number of hops. In Stratix III devices, 34 LABs can be reached in one hop, many more than the number of CLBs reachable in one hop in Virtex-5 devices. Scaling these counts by the greater efficiency of the ALM makes the results even more favorable to Stratix III devices: a LAB contains the equivalent of 25 4-LUT-based LEs, versus approximately 11 in a Virtex-5 CLB (using the 1.8x factor). Scaling the number of blocks reachable within a given number of hops by these factors shows a correspondingly greater advantage in the amount of logic that can be reached.


    Table 7. Stratix III versus Virtex-5 connectivity capability.

    Note:
    (1) 1 ALM = 2.5 LEs and each LAB = 10 ALMs
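The "25 versus approximately 11" comparison can be reproduced from the factors above. The sketch below additionally assumes that a Virtex-5 CLB holds eight LUT-FF pairs (two slices of four each); the helper name is hypothetical, and only the 34-LAB, one-hop figure from the text is used as input.

```python
LE_PER_ALM = 2.5            # note (1) to Table 7
ALMS_PER_LAB = 10           # note (1) to Table 7
ALM_PACKING_FACTOR = 1.8    # an ALM packs ~1.8x the logic of a Virtex-5 LUT-FF pair
LUTFF_PAIRS_PER_CLB = 8     # assumption: 2 slices x 4 LUT-FF pairs per Virtex-5 CLB

# LE equivalents per routing destination block.
le_per_lab = LE_PER_ALM * ALMS_PER_LAB                # 25.0
le_per_lutff_pair = LE_PER_ALM / ALM_PACKING_FACTOR   # about 1.39
le_per_clb = LUTFF_PAIRS_PER_CLB * le_per_lutff_pair  # about 11.1


def reachable_logic_le(blocks_reached: int, le_per_block: float) -> float:
    """Scale a count of reachable LABs/CLBs into reachable LE equivalents."""
    return blocks_reached * le_per_block


if __name__ == "__main__":
    print(f"LEs per LAB: {le_per_lab:.1f}, LEs per CLB: {le_per_clb:.1f}")
    # 34 LABs reachable in one hop (from the text), expressed in LE equivalents.
    print(f"Stratix III, one hop: {reachable_logic_le(34, le_per_lab):.0f} LE equivalents")
```

The Virtex-5 one-hop CLB count from Table 7 is not reproduced in this text, so it is left as an input to `reachable_logic_le` rather than filled in.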

    Conclusion
    As FPGA-based circuit designs grow more complex, FPGA architectures must keep innovating as process nodes shrink. Core architecture enhancements come "relatively" easily, but finding the optimal architecture, with the right tradeoffs among performance, increased logic capacity (density), and a routing architecture that lets software readily turn these benefits into real results, is extremely difficult.

    Unbiased, meaningful benchmarks that truly compare hardware architectures and software design tool chains are the goal of the OpenCore benchmarks. Altera created these benchmarks honestly and without bias, using comprehensive and representative customer designs.

    This statement of purpose is, of course, very easy for us to make; in fact, any company might use something like it as promotional hype and look for an angle to turn it into a competitive advantage. To address this concern, Altera is making all aspects of these benchmarks available to anyone who wants to use them, including the RTL for the designs, the setup and constraint files for the tests, and so on.

    The key benefits of OpenCore are that we tell potential customers exactly where the designs are, we make the files available, and we encourage customers to check the results for themselves. The results of the OpenCore benchmarks show that the advantages of Stratix III FPGAs in performance, utilization, and compile time increase as the design size increases. This is fundamentally due to the optimal architecture, which Quartus II software readily exploits.

    As a final note, the results presented here are for our 65-nm Stratix III FPGA family, because it provides a fair comparison with the 65-nm Virtex-5 family. At smaller process nodes, FPGA densities more than double at 40 nm, and Altera's recently announced 40-nm Stratix IV FPGAs carry forward the validated advantages of the innovative ALM logic structure and MultiTrack routing architecture to deliver the highest performance, highest utilization, and shortest compile times.

    Seyi Verma is a senior member of Altera's High-End Technical Analysis staff, responsible for technical product analysis, FPGA architecture, and technology solutions for Altera's high-end FPGA product lines.

    Seyi has been with Altera since 1998 and has also held product engineering positions there, where he was responsible for the Configuration, Mercury, and Stratix series product lines. His experience covers all aspects of programmable solutions, including FPGA testing, silicon debug, characterization, failure analysis, software, and tools. Seyi holds a BSEE from Rochester Institute of Technology, New York.


