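I have been writing a proposal lately, so Network-on-Chip Notes 3 has been slow to arrive; my apologies to readers. These notes mostly cover fundamentals and basic concepts, but getting the fundamentals straight matters a great deal. They also mostly look back at history, and the point of looking back is to look forward better. “To be ignorant of what happened before you were born is to remain a child forever.” You only really understand a discipline once you understand its history. Sometimes we have been walking for so long that we forget why we set out in the first place. This is something my current co-advisor, Professor Axel, keeps drilling into me; he has nothing but disdain for much of today's research that has “gone down the wrong path”. The point he stresses again and again is: do not get enthusiastic about solving problems that do not actually exist.
OK, enough preamble; back to the topic. Today's summary covers NoCs built on the MPSoC architecture, an architecture that sits between the previous two. As the earlier post said, a CMP is nothing more than an on-chip SMP, or a parallel computer with a DSM structure; its core idea is a single unified memory address space with a shared L2 or L3 cache. A running program accesses that unified memory, and the NoC is merely part of the memory system, replacing the traditional bus interconnect. This is the home turf of the traditional parallel computing and parallel machine community, and Dr. Wang has already written many excellent replies on it, so I will set it aside for now.
For an explanation of MPSoC, I took the text from the most highly cited article on IEEE, “Multiprocessor System-on-Chip (MPSoC) Technology”: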
MULTIPROCESSOR systems-on-chips (MPSoCs) have emerged in the past decade as
an important class of very large scale integration (VLSI) systems. An MPSoC is
a system-on-chip—a VLSI system that incorporates most or all the components
necessary for an application—that uses multiple programmable processors as
system components. MPSoCs are widely used in networking, communications, signal
processing, and multimedia among other applications.
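I won't explain it in detail, but note the last sentence: MPSoCs are widely used in networking, communications, signal processing, and multimedia. What fields are these? They are the fields that traditional high-performance embedded systems target. In other words, the people who build MPSoCs are the people who used to build embedded systems, not the people who build mainframes and parallel computers. An embedded system that exists in the form of a chip is simply an SoC, and an MPSoC is an SoC that contains multiple programmable processors. Unlike the homogeneous, symmetric multiprocessor structure of a CMP, the MPSoC grew out of embedded systems, so its processors are mostly heterogeneous. Embedded systems have long been custom systems designed for a dedicated purpose, and they usually combine several kinds of programmable devices according to what the computing task needs. Examples range from the FPGA+DSP+ARM “software-defined radio” platforms of some years ago, to the ARM+DSP mobile phones of the 2G era, to the big/little core pairing and coprocessors in today's high-end phones. So from the very beginning MPSoCs have consisted of multiple independent subsystems that divide the work and cooperate, with tasks assigned according to their different attributes, because that is how embedded systems have always been built.
Here are a few classic MPSoC diagrams: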
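The idea of assigning tasks to different kinds of processors according to their attributes can be sketched in a few lines of Python. This is only an illustration: the `AFFINITY` table, the task names, and the task kinds are made up here and do not describe any particular chip.

```python
# Hypothetical mapping from a task's kind to the processing element
# that suits it on an imagined FPGA+DSP+ARM platform.
AFFINITY = {
    "control": "ARM",
    "signal_processing": "DSP",
    "bit_level_streaming": "FPGA",
}


def assign(tasks):
    """Group (name, kind) tasks by the processing element suited to their kind."""
    plan = {}
    for name, kind in tasks:
        pe = AFFINITY[kind]
        plan.setdefault(pe, []).append(name)
    return plan


# Made-up workload for a software-defined-radio style system.
tasks = [
    ("protocol_stack", "control"),
    ("channel_filter", "signal_processing"),
    ("digital_downconversion", "bit_level_streaming"),
    ("user_interface", "control"),
]

plan = assign(tasks)
for pe, names in plan.items():
    print(pe, names)
```

In a real heterogeneous design this mapping is typically fixed at design time rather than computed at run time; the sketch only shows that the partitioning follows task attributes rather than treating all processors as interchangeable.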
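The one below is also a network processor, from Cisco. It has 16 clusters with 12 PEs each; each PE contains a basic RISC processor system, instructions are shared per cluster, and data goes through a dedicated switching unit inside the cluster. 16 × 12 makes 192 processors in total. Damn, after seeing this I almost lost all interest in doing any more research, haha.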
The structure of the applications for
which MPSoCs are designed is also important. Different applications have
different data flows that suggest multiple different types of architectures;
some of which are quite different than the multiprocessors designed for
scientific computing. The homogeneous architectural style is generally used for
data-parallel systems. Wireless base stations, in which the same algorithm is
applied to several independent data streams, is one example; motion estimation,
in which different parts of the image can be treated separately, is another.
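Lucent Daytona, reputedly the earliest MPSoC: four symmetric SPARC processors plus four SIMD processors, built for wireless base stations. This one still looks a lot like a CMP. Don't worry, next came this one: Philips Viper Nexperia
I have listed all these assorted MPSoC diagrams above; what is the point? The point is to compare them with CMPs. The earliest CMP is said to have been proposed in 1997 by L. Hammond, B. A. Nayfeh, and K. Olukotun. Commercial CMPs, however, actually appeared later than MPSoCs, because for people doing general-purpose computing a single core is the simplest programming model. Unless they have absolutely no choice, programmers used to application-level programming and programming “with objects” have no interest whatsoever in optimizing programs for the underlying hardware structure. MPSoCs are different. Their predecessor, the MPSoB (heh, System on Board; there is hardly such a thing as a true SoC anyway, since the final complete system still has to be built on a board), had long been a dedicated device assembled from all sorts of chips, plug-in parts, connectors, and even electromechanical assemblies. The long-suffering embedded programmers of the time naturally had to wrestle with all that painful hardware, and even their programming was programming against hardware operations. So to them, designing a heterogeneous multiprocessor chip in which every module is optimized for a different function was the natural thing to do.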
“Multiprocessor System-on-Chip (MPSoC) Technology” explains why multiple heterogeneous processors are put on an MPSoC as follows:
Applications like
multimedia and high-speed data communication not only require high levels of
performance but also require implementations to meet strict quantitative goals.
The term “high-performance computing” is traditionally used to describe applications like scientific
computing that require large volumes of computation but do not set out strict
goals about how long those computations should take. Embedded computing, in
contrast, implies real-time performance. In real-time systems, if the
computation is not done by a certain deadline, the system fails. If the
computation is done early, the system may not benefit (and in some pathological
schedules, finishing one task early may cause another task to be unacceptably
delayed). High-performance embedded computing is a very different problem from
high-performance scientific computing. Furthermore, these high-performance
systems must often operate within strict power and cost budgets. As a result,
MPSoC designers have repeatedly concluded that business-as-usual is not
sufficient to build platforms for high-performance embedded applications.
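The difference between “large volumes of computation” and “done by a certain deadline” can be made concrete with a small Python sketch. It is not from the paper: the task names, durations, and deadlines are invented, and it assumes a single processor running tasks back to back without preemption, all released at time 0.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    duration_ms: float  # hypothetical execution time
    deadline_ms: float  # hypothetical absolute deadline, measured from t = 0


def run_in_order(tasks):
    """Run tasks back to back; return (total time, names of tasks that missed)."""
    now = 0.0
    missed = []
    for t in tasks:
        now += t.duration_ms
        if now > t.deadline_ms:
            missed.append(t.name)
    return now, missed


tasks = [
    Task("video_frame", 12.0, 33.0),
    Task("audio_block", 3.0, 5.0),
    Task("packet_rx", 1.0, 20.0),
]

# Ordering A: as listed. Ordering B: earliest deadline first.
makespan_a, missed_a = run_in_order(tasks)
makespan_b, missed_b = run_in_order(sorted(tasks, key=lambda t: t.deadline_ms))

print(makespan_a, missed_a)
print(makespan_b, missed_b)
```

Both orderings finish the same work in the same total time, which is all a pure throughput metric would see. Only the deadline check tells them apart: in the listed order the audio block finishes after its deadline, while the earliest-deadline-first order meets every deadline. In the real-time sense the first schedule is a failure even though it is no slower.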
The paper goes on to describe the different challenges that MPSoCs and CMPs face: