As Heterogeneous Accelerated Computing Rises, the Focus Should Not Be on Compute Chips Alone
Original title: Why SYCL: Elephants in the SYCL Room
By James Reinders and Michael Wong
Excerpted from: https://www.hpcwire.com/2022/02/03/why-sycl-elephants-in-the-sycl-room/
Commentary — In the second of a series of guest posts on heterogeneous computing, James Reinders, who returned to Intel last year after a short “retirement,” follows up on his piece about how SYCL will contribute to a heterogeneous future for C++. He is joined by Michael Wong, of Codeplay Software Ltd., who is also the current SYCL committee chair. Together, they offer their responses to what might be called the ‘Elephants in the SYCL Room.’
评论——在第二个关于异构计算的一系列客座帖子中,James Reinders在短暂的“退休”后于去年回到了英特尔,他继续了讲述SYCL将如何为C++的异构未来做出贡献的文章。Codeplay Software Ltd.的Michael Wong也加入了他的行列,他也是现任SYCL委员会主席。他们一起对所谓的“SYCL房间里的大象”做出了回应。
The case for C++ programming, with SYCL bringing in full heterogeneous support, has been well articulated by people close to the SYCL specification, including in a recent article, “Considering a Heterogeneous Future for C++,” and in numerous other resources enumerated on sycl.tech. SYCL is a Khronos standard that introduces support for fully heterogeneous data parallelism to C++. While SYCL is not a cure-all, it is a solution to one aspect of a larger problem: How do we adequately enable full heterogeneous programming given the emerging explosion in hardware diversity?
熟悉 SYCL 规范的人已经很好地阐明了 SYCL 带来了全面异构支持的 C++ 编程案例,包括最近的一篇文章“考虑 C++ 的异构未来”以及 sycl.tech 上列举的许多其他资源。 SYCL 是一种 Khronos 标准,引入了对 C++ 的完全异构数据并行性的支持。 虽然 SYCL 并不是包治百病的灵丹妙药,但它是一个方面的解决方案:鉴于硬件多样性的爆炸式增长,我们如何充分启用完全异构编程?
In this article, we offer our perspective on key questions about SYCL, based on decades of work in this domain. These important questions are asked by software developers looking to understand if SYCL matters to them. Let’s face it: at some point, every major project has Elephants in the Room.[1] Successful projects address their elephants openly.
在本文中,我们基于我们在该领域工作了数十年的观点,提出了对 SYCL 关键问题的看法。 这些重要问题是由希望了解 SYCL 对他们是否重要的软件开发人员提出的。 让我们面对现实吧:在某些时候,每个重大项目都会有“房间里的大象”。[1] 成功的项目公开地解决了他们的问题。
Elephant 1: Aren’t GPUs enough? Do other accelerators really matter?
大象一:GPU 还不够吗? 其他加速器真的重要吗?
Valid questions exist about which accelerators will stay, and which will be a passing fad. For decades, different accelerators have come and gone while CPUs persist. Today, GPUs are present in the vast majority of computer systems. Writing our applications to leverage GPUs makes a lot of sense given their near ubiquity.
关于哪些加速器将继续存在、哪些将成为昙花一现,存在一些合理的问题。 几十年来,不同的加速器来了又去,而 CPU 却一直存在。 如今,GPU 出现在绝大多数计算机系统中。 鉴于 GPU 几乎无处不在,编写应用程序来利用 GPU 非常有意义。
As a result, one of the first elephant questions is whether we really need to generalize, i.e., do we need to be multiarchitecture and multivendor?
因此,首要问题之一是我们是否真的需要泛化,即我们是否需要多架构和多供应商?
Experts expect “dedicated or semi-dedicated hardware accelerators” to be a must-have feature for computing in this decade. They include researchers led by Prof. Masaaki Kondo in “White Paper on Next-Generation Advanced Computing Infrastructure” and Hennessy & Patterson in their paper “A New Golden Age for Computer Architecture”.
专家们预计,“专用或半专用硬件加速器”将成为这十年计算的必备功能。持这一观点的包括以近藤正明教授为首的研究人员(见《下一代高级计算基础设施白皮书》),以及 Hennessy 和 Patterson(见他们的论文“计算机架构的新黄金时代”)。
As long as we are talking about dedicated accelerators, why stop at GPUs? Optimizing for different types of accelerators is a great objective, but we don’t want to write different code for different types of accelerators. We believe that the industry will benefit from a standardized language that everyone can contribute to and collaborate on, that is not locked into a particular vendor, and that can evolve organically based on the requirements of its members and the public.
既然我们谈论的是专用加速器,为什么只停留在 GPU 上呢? 针对不同类型的加速器进行优化是一个伟大的目标,但我们不想为不同类型的加速器编写不同的代码。 我们相信,该行业将受益于标准化语言,每个人都可以做出贡献、进行协作,不会被锁定到特定的供应商,并且可以根据其成员和公众要求有机发展。
SYCL takes an interesting approach that allows us to use common code when we want and specialize when we want. In this way, SYCL embraces accelerators in general, leaving it to us, the developers, to decide when to write common cross-architecture code, and when we feel it is sufficiently advantageous to specialize code.
SYCL 采用了一种有趣的方法,允许我们在需要时使用通用代码,并在需要时进行专业化。 通过这种方式,SYCL 总体上拥抱加速器,让我们开发人员来决定何时编写通用的跨架构代码,以及何时我们认为专门化代码有足够的优势。
Its underlying programming model, SPMD (single program, multiple data), has been shown to be usable across many architectures. SPMD is how most programmers using Nvidia CUDA/OpenCL/SYCL think: writing code from the perspective of operating on one work-item and expecting it to run concurrently on most hardware, such that multiple work-items fill vector hardware lanes.
其底层编程模型 SPMD 已被证明可在多种体系结构中使用。 SPMD 是大多数使用 Nvidia CUDA/OpenCL/SYCL 的程序员的想法:从操作一个工作项的角度编写代码,并期望它在大多数硬件上同时运行,以便多个工作项填充矢量硬件通道。
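The SPMD idea described above can be sketched in plain ISO C++. This is only an illustration under stated assumptions: the names `saxpy_work_item`, `launch`, and `saxpy` are hypothetical and are not part of SYCL, CUDA, or OpenCL. The kernel is written from the point of view of a single work-item, and a stand-in `launch` function simply loops over the index space sequentially, where a real runtime would be free to spread the work-items across vector lanes or cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Kernel written from the perspective of ONE work-item:
// it only knows its own index i and touches only element i.
inline void saxpy_work_item(std::size_t i, float a, const float* x, float* y) {
    y[i] = a * x[i] + y[i];
}

// Hypothetical stand-in for a runtime launch (an assumption for illustration).
// A real SPMD runtime may run these work-items concurrently; this sketch
// runs them one after another.
template <typename Kernel>
void launch(std::size_t global_size, Kernel kernel) {
    for (std::size_t i = 0; i < global_size; ++i) {
        kernel(i);
    }
}

// Computes a * x + y element-wise by launching one work-item per element.
std::vector<float> saxpy(float a, const std::vector<float>& x, std::vector<float> y) {
    assert(x.size() == y.size());  // precondition: equal lengths
    const float* xs = x.data();
    float* ys = y.data();
    launch(x.size(), [=](std::size_t i) { saxpy_work_item(i, a, xs, ys); });
    return y;
}
```

Because each work-item writes only its own element, the loop in `launch` could be replaced by any concurrent schedule without changing the result, which is the property that lets the same kernel map onto very different hardware.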
SYCL offers a large degree of portability across vendors (e.g., many different sources of GPUs) as well as architecture (e.g., GPUs, FPGAs, ASICs).
SYCL 提供了跨供应商(例如,许多不同的 GPU 来源)以及架构(例如,GPU、FPGA、ASIC)的高度可移植性。
Elephant 2: Why not just use Nvidia CUDA?
大象2:为什么不直接使用Nvidia CUDA?
A vibrant GPU ecosystem is emerging thanks to competition from multiple GPU vendors. This is part of a trend toward more and more competition for accelerators in general. The installed base of CUDA applications that make use of Nvidia GPUs is poised to adapt over time to an open, multivendor, multiarchitecture software approach created to serve all vendors, not just Nvidia.
由于多个 GPU 供应商的竞争,一个充满活力的 GPU 生态系统正在兴起。 这是加速器竞争越来越激烈的趋势的一部分。 使用 Nvidia GPU 的 CUDA 应用程序的安装基础将能够随着时间的推移适应开放的、多供应商、多架构的软件方法,该方法旨在为所有供应商(而不仅仅是 Nvidia)提供服务。
While CUDA has earned a strong following given its value proposition and the strength of Nvidia GPUs in the ecosystem, there are increasing concerns regarding the lock-in that use of CUDA creates. Such concerns stem from the proprietary focus highlighted by these factors:
虽然 CUDA 因其价值主张和 Nvidia GPU 在生态系统中的实力而赢得了众多追随者,但人们越来越担心 CUDA 的使用造成的锁定。 这些担忧源于以下因素所强调的专有关注:
1. The definition of CUDA, along with its implementation and evolution, is managed by Nvidia and evolves specifically to serve Nvidia GPU product designs. Details of new features in CUDA are generally shielded from public view until Nvidia has both hardware and software to support them. As discussed more fully below, this control stifles innovations from other vendors.
2. The licensing for CUDA tools and libraries from Nvidia specifically states that they must be used to “develop applications only for use in systems with Nvidia GPUs.” Even “open source” from Nvidia includes licensing language restricting key parts in the same manner.
1. CUDA 的定义、其实现和发展由 Nvidia 管理,并专门为服务 Nvidia GPU 产品设计而发展。 CUDA 中新功能的详细信息通常不会公开,直到 NVIDIA 拥有支持它们的硬件和软件为止。 正如下面更全面讨论的,这种控制抑制了其他供应商的创新。
2. Nvidia 的 CUDA 工具和库的许可特别指出,它们必须用于“开发仅在具有 Nvidia GPU 的系统中使用的应用程序”。 即使是 Nvidia 的“开源”也包含以同样方式限制关键部分的许可语言。
Nvidia CUDA can claim credit for bringing accelerated computing to the masses using Nvidia GPUs. With the explosion of competition in the accelerator market, it could appear that CUDA has become a walled garden in an increasingly open and transparent world. The desire for an open, multivendor, multiarchitecture alternative to CUDA is not going away.
Nvidia CUDA 因使用 Nvidia GPU 为大众带来加速计算而享有盛誉。随着加速器市场竞争的爆发,CUDA 似乎已经成为一个日益开放和透明的世界中的围墙花园。对 CUDA 的开放、多供应商、多架构替代方案的渴望不会消失。
Elephant 3: Why not just use AMD HIP?
大象 3:为什么不直接使用 AMD HIP?
AMD Heterogeneous-Computing Interface for Portability (HIP) is a C++ dialect. AMD tools include a “HIPify tool” to help transform CUDA code into HIP. AMD states that “HIP code can run on AMD hardware (through the HCC compiler) or Nvidia hardware (through the NVCC compiler) with no performance loss compared with the original CUDA code.”
AMD 异构计算可移植接口 (HIP) 是一种 C++ 方言。 AMD 工具包括“HIPify 工具”,可帮助将 CUDA 代码转换为 HIP。 AMD 表示,“HIP 代码可以在 AMD 硬件(通过 HCC 编译器)或 Nvidia 硬件(通过 NVCC 编译器)上运行,与原始 CUDA 代码相比,不会有任何性能损失。”
HIP is a “follow CUDA” strategy, i.e., AMD develops an update to HIP as quickly as possible after Nvidia has released an update to its CUDA platform. The arguments in favor of HIP rest on the virtue of reusing a large CUDA codebase for AMD GPUs. Unfortunately, given the opaqueness of CUDA, no one can follow CUDA closely, promptly, or accurately enough. This leaves AMD no opportunity to expose unique AMD hardware innovation without forcing CUDA developers to change their code with #ifdefs for AMD GPUs.
HIP 是一种“跟随 CUDA”策略,即在 Nvidia 发布其 CUDA 平台更新后,AMD 尽快开发 HIP 更新。 支持 HIP 的论点是基于 AMD GPU 重用大型 CUDA 代码库的优点。 不幸的是,鉴于 CUDA 的不透明性,没有人能够太密切、及时或准确地跟踪 CUDA。 如果不迫使 CUDA 开发人员使用 AMD GPU 的 #ifdefs 更改代码,AMD 就没有机会展示独特的 AMD 硬件创新。
While AMD has created value with HIP for those that seek AMD GPUs as an alternative to Nvidia GPUs, it is not hard to want more. Imagine having a solution that can keep pace with the feature innovation and performance of CUDA!
We believe that innovation will flourish the most in an open field rather than in the shadows of a walled garden.
[Editor’s note: There is a SYCL implementation called hipSYCL that sits on top of HIP and targets AMD GPUs running ROCm and Nvidia GPUs.]
虽然 AMD 通过 HIP 为那些寻求 AMD GPU 作为 Nvidia GPU 替代品的人创造了价值,但想要更多并不难。 想象一下,拥有一个能够与 CUDA 的功能创新和性能保持同步的解决方案!我们相信,创新将在开放的领域而不是在围墙花园的阴影中蓬勃发展。
[编者注:有一个名为 hipSYCL 的 SYCL 实现,它构建在 HIP 之上,面向运行 ROCm 的 AMD GPU 以及 Nvidia GPU。]
Elephant 4: Why not just use OpenCL?
大象4:为什么不直接使用OpenCL?
OpenCL provides an open multivendor alternative, but at a lower layer of the software stack than SYCL or CUDA. SYCL grew out of a desire to bring the benefits of OpenCL’s open, multivendor, multiarchitecture approach to C++ by providing a standard C++ interface for heterogeneous parallel architectures. SYCL implementations often build on OpenCL, but as of SYCL 2020 they also have the flexibility to use other backends under the hood. SYCL delivers on the promise of OpenCL in a higher-productivity form through its C++ abstractions.
OpenCL 提供了一种开放的多供应商替代方案,但其软件堆栈层低于 SYCL 或 CUDA 提供的软件堆栈层。 SYCL 的诞生是为了通过为异构并行架构提供标准 C++ 接口来发挥 OpenCL 开放、多供应商、多架构方法的优势。 SYCL 实现通常使用 OpenCL 进行实现,但从 SYCL2020 开始,也可以灵活地在后台使用其他后端。 SYCL 通过其 C++ 抽象以更高的生产力形式兑现了 OpenCL 的承诺。
Elephant 5: Can’t we just use C++?
大象5:我们不能只使用C++吗?
Let’s start with the assumption that we want to program heterogeneous machines, we value portability, and we do not want to pay a penalty in performance for portability.
让我们首先假设我们想要对异构机器进行编程,我们重视可移植性,并且我们不想为可移植性付出性能上的代价。
We might answer “yes”: C++ is enough when you have SYCL support too. After all, C++ was built to be extended by template libraries like SYCL. SYCL adds no new keywords, but it does benefit from SYCL-aware C++ compilers to help with cross-compilation, fat binaries, and remote memories. Those are simply things C++ compilers have not historically made easy.
我们可能会回答“是”——当您也有 SYCL 支持时,C++ 就足够了。 毕竟,C++ 的构建是为了通过 SYCL 等模板库进行扩展。 SYCL 没有添加新的关键字,但它确实受益于 SYCL 感知的 C++ 编译器来帮助交叉编译、胖二进制文件和远程内存。 这些都是 C++ 编译器历史上并不容易做到的事情。
SYCL also offers a solution today, within standard C++, to address programming for full heterogeneous computing built on top of ISO C++. This includes device enumeration (info), defining work (kernels), submitting and coordinating work across devices (queue), and managing remote memories.
如今,SYCL 还在标准 C++ 中提供了一种解决方案,用于解决构建在 ISO C++ 之上的完全异构计算的编程问题。 这包括设备枚举(信息)、定义工作(内核)、跨设备提交和协调工作(队列)以及管理远程内存。
That brings us to “No”: the C++ standard does not define support for heterogeneous systems with disjoint (non-coherent) memories. Some think it will add that one day, and there is effort in that direction, but even those involved believe the current direction will take at least 10 years, and it is limited by the need for C++ to maintain backward compatibility with millions of lines of existing code. In fact, one of us (MW) has written papers urging C++ in that direction. The response from WG21 (ISO C++), understandable given the backward compatibility concerns, has been to start with parallel algorithms and executors and to add forward progress guarantees, instead of making radical surgical changes to the memory and addressing model. Therefore, if you are programming heterogeneous machines, the claim that “C++ is enough” is unlikely to hold. Some are trying to move in that direction, and that is the beauty of a competitive industry: we can see what will work out in the best interest of the market and consumers. However, what works today is “C++ plus SYCL” or “C++ plus CUDA” or “C++ plus OpenCL.”
这让我们得出“不”的结论——C++ 标准没有定义对具有不相交(非连贯)内存的异构系统的支持。 有些人认为有一天会添加这一点,并且正在朝着这个方向努力,但即使是那些参与其中的人也认为当前的方向至少需要 10 年,并且它受到 C++ 需要与数百万行现有代码保持向后兼容的限制。 事实上,我们中的一位 (MW) 已经撰写了论文,敦促 C++ 朝这个方向发展。 出于向后兼容性的考虑,WG21 (ISO C++) 的反应是从并行算法和执行器开始,并添加向前的进度保证,而不是对内存和寻址模型进行根本性的外科手术改变。 因此,如果您正在对异构机器进行编程,那么“C++ 就足够了”这一说法很可能站不住脚。 有些人试图朝这个方向前进,这就是竞争行业的美妙之处,我们可以看到什么将最符合市场和消费者的利益。 然而,今天立即起作用的是“C++ 加 SYCL”或“C++ 加 CUDA”或“C++ 加 OpenCL”。
The purpose of adding SYCL support to our C++ compilers and runtimes is to give C++ the full heterogeneous support that it does not offer today without SYCL. It is also a way to show how C++ can support heterogeneity in the future, as ISO standards tend to standardize best practices drawn from existing experience. We will show one such example below.
将 SYCL 支持添加到我们的 C++ 编译器和运行时中的目的是添加功能,以便 C++ 支持完整的异构支持,而如果没有 SYCL,C++ 目前无法提供这种支持。 这也是展示 C++ 如何支持未来异构性的一种方式,因为 ISO 标准倾向于标准化现有知识的最佳实践。 下面我们将展示一个这样的例子。
Elephant 6: Can SYCL queues make it into ISO C++?
大象6:SYCL队列可以进入ISO C++吗?
Queues are how SYCL assigns work to heterogeneous devices, including handing off data within complex memory systems (not necessarily unified and coherent).
队列是 SYCL 将工作分配给异构设备的方式,包括在复杂的内存系统(不一定是统一和一致的)内传递数据。
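To make the role of a queue more concrete, here is a toy model in plain ISO C++. It is deliberately NOT the SYCL API: `ToyQueue`, `copy_to_device`, `submit`, `wait`, `copy_to_host`, and `double_on_device` are hypothetical names invented for this sketch. It models only two ideas from the paragraph above: work is handed to a queue rather than run directly, and the device has its own memory, so data must be handed off explicitly in both directions. Real SYCL queues can also run work asynchronously and track dependencies, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model only (an assumption for illustration), not the SYCL API.
// The "device" owns memory that host code never touches directly.
class ToyQueue {
public:
    using Kernel = std::function<void(std::vector<float>&)>;

    // Explicit hand-off: copy host data into the device's private memory.
    void copy_to_device(const std::vector<float>& host) { device_memory_ = host; }

    // Enqueue work; nothing runs until wait() is called.
    void submit(Kernel kernel) { pending_.push_back(std::move(kernel)); }

    // Run all pending work in submission order, then empty the queue.
    void wait() {
        for (Kernel& kernel : pending_) {
            kernel(device_memory_);
        }
        pending_.clear();
    }

    // Explicit hand-off back: results are only meaningful after wait().
    std::vector<float> copy_to_host() const {
        assert(pending_.empty());
        return device_memory_;
    }

    // Number of submitted kernels that have not run yet.
    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<float> device_memory_;  // stands in for disjoint device memory
    std::vector<Kernel> pending_;       // work submitted but not yet executed
};

// Copy in, submit a kernel, wait, copy out.
std::vector<float> double_on_device(const std::vector<float>& host) {
    ToyQueue q;
    q.copy_to_device(host);
    q.submit([](std::vector<float>& mem) {
        for (float& v : mem) v *= 2.0f;
    });
    q.wait();
    return q.copy_to_host();
}
```

The host vector is never modified by the kernel. Only the copy held by the queue changes, which mirrors the disjoint-memory situation that ISO C++ does not model today.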
It is easy to speculate on whether a queue class belongs in C++ long-term, but such speculation is premature.
从长远来看,很容易推测一个队列类是否属于C++,但这种推测还为时过早。
Proposals for C++23 have included various constructs to direct execution to specific devices, including “std::execution” in P2300. We know C++23 will continue to rely on a unified global memory address space and will not support disjoint remote memories (complex memory systems).
C++23 的提案包括各种直接执行到特定设备的结构,包括 p2300 中的“std::execution”。 我们知道C++23将继续依赖统一的全局内存地址空间,并且不会支持不相交的远程内存(复杂的内存系统)。
It is easy to get caught up on syntax. Eventually, if C++ expands to include full heterogeneous support, the concepts embodied in SYCL queue will be needed. Until then, SYCL fills this void. Some important capabilities, such as parallel directives, and message passing, have remained independent standards (OpenMP and MPI). While it is possible C++ will not grow to include full heterogeneous support, we believe C++ will eventually add such support incrementally.
很容易陷入语法困境。 最终,如果 C++ 扩展到包括完整的异构支持,则将需要 SYCL 队列中体现的概念。 在此之前,SYCL 填补了这一空白。 一些重要的功能,例如并行指令和消息传递,仍然是独立的标准(OpenMP 和 MPI)。 虽然 C++ 可能不会发展到包含完整的异构支持,但我们相信 C++ 最终将逐步添加此类支持。
C++ aims to standardize established best practice instead of inventing new and unproven features. SYCL is therefore an important stepping stone, as one of the many feeders of “established best practice” into the intentionally slower-moving C++ standardization process.
C++ 的目标是标准化既定的最佳实践,而不是发明新的和未经验证的功能,因此 SYCL 是一个重要的踏脚石,作为“既定的最佳实践”进入故意缓慢发展的 C++ 标准化过程的众多馈送者之一。
As C++23 settles and C++26 is considered, the future of C++ for heterogeneous computing will begin to take shape, including its syntax, but a full solution will likely not emerge for another 5-10 years.
随着 C++23 的稳定和 C++26 的考虑,C++ 异构计算的未来将开始成形,包括语法,但完整的解决方案可能在未来 5-10 年内不会出现。
SYCL offers a solution today, within standard C++, to address programming for full heterogeneous computing. This includes device enumeration (info), defining work (kernels), submitting work to devices (queue), and managing remote memories.
SYCL 如今在标准 C++ 中提供了一种解决方案,用于解决完全异构计算的编程问题。 这包括设备枚举(信息)、定义工作(内核)、向设备提交工作(队列)以及管理远程内存。
Elephant 7: Who is behind SYCL? Is it really Open in the true sense of the word?
大象7:谁是SYCL的幕后推手? 它真的是真正意义上的开放吗?
We believe that open, international standards and Open Source Software (OSS) projects are good for everyone. When individuals from Intel and Codeplay get involved, we have found that they work hard to help develop and promote such standards and OSS – from WiFi, USB, PCIe to OpenMP, MPI, Fortran, C, C++, OpenCL, and SYCL.
我们相信开放的国际标准和开源软件 (OSS) 项目对每个人都有好处。 当英特尔和 Codeplay 的个人参与其中时,我们发现他们努力帮助开发和推广此类标准和 OSS——从 WiFi、USB、PCIe 到 OpenMP、MPI、Fortran、C、C++、OpenCL 和 SYCL。
Apple was the original force behind OpenCL, which began as a set of C interfaces at a fairly low level. SYCL originally grew out of efforts within OpenCL to consider higher level interfaces, specifically using C++. After multiple years of very open debates, SYCL was born. Codeplay has been instrumental in SYCL from the very beginning. Intel’s interest in SYCL grew after Intel both entered the FPGA market and announced that the Intel Xe architecture would include GPUs for compute. Intel is proud to be an active member in the SYCL committee, and an active contributor to implementations to support SYCL. SYCL is a community effort, and the homes of both authors of this article (Intel and Codeplay) are enthusiastic participants along with many others.
Apple 是 OpenCL 背后的原始力量,它最初是一组相当低级别的 C 接口。 SYCL 最初源于 OpenCL 内部考虑更高级别接口(特别是使用 C++)的努力。 经过多年的公开辩论,SYCL 诞生了。 Codeplay 从一开始就在 SYCL 中发挥了重要作用。 在进入 FPGA 市场并宣布英特尔 Xe 架构包含用于计算的 GPU 后,英特尔对 SYCL 的兴趣与日俱增。 英特尔很自豪能够成为 SYCL 委员会的积极成员,并为支持 SYCL 的实施做出积极贡献。 SYCL 是一项社区努力,本文的两位作者(Intel 和 Codeplay)以及许多其他人都是热情的参与者。
Elephant 8: I see a herd of elephants – why should I believe in SYCL?
大象8:我看到一群大象——我为什么要相信SYCL?
If you have not yet needed to program an application for multiple heterogeneous machines, you may not yet have felt the pain that explains why we are so excited about the prospects for SYCL. Questioning the need is quite logical.
如果您还不需要为多个异构机器编写应用程序,那么您可能还没有真正理解为什么我们对 SYCL 的前景如此兴奋。 质疑这种需求是非常合乎逻辑的。
There are many use cases for heterogeneous programming. In our CppCon 2021 tutorial, we taught programmers from large companies, small companies, and national labs how to offload high-throughput workloads to specialized accelerators.
异构编程有很多用例。 在我们的 CPPCON 2021 教程中,我们向来自大公司、小公司和国家实验室的程序员教授如何将高吞吐量工作负载卸载到专用加速器。
Based on many experiences like that, we have every reason to be confident that interest in SYCL will continue to grow at a rapid pace because of the need for C++ programming for heterogeneous platforms.
基于许多类似的经验,我们有充分的理由相信,由于异构平台对 C++ 编程的需求,对 SYCL 的兴趣将继续快速增长。
If you believe in the power of diversity of hardware and want to harness the impending explosion in architectural diversity, then SYCL is worth a look. Not only is it an open, multivendor, multiarchitecture play, it is the key one for C++ programmers (as detailed in “Considering a Heterogeneous Future for C++”).
如果您相信硬件多样性的力量并希望利用即将到来的架构多样性爆炸,那么 SYCL 值得一看。 它不仅是开放的、多供应商、多架构的游戏,而且是 C++ 程序员的关键(详见“考虑 C++ 的异构未来”)。
Open, Industry Standards are Critical to Enable High-Volume Markets
开放的行业标准对于实现大容量市场至关重要
New technology often starts as proprietary developments, which may be sufficient to enable niche applications and markets. But, as these niche applications grow into technology ecosystems, so does the need for competition and industry standardization to enable widespread adoption. Accelerated computing, for many years only a niche capability, has certainly emerged with the status of “here to stay.” Multiple factors contributed to this, and they are not all going away (power wall, IPC wall, memory wall).
新技术通常始于专有开发,这可能足以实现利基应用和市场。 但是,随着这些利基应用程序成长为技术生态系统,竞争和行业标准化的需求也随之增加,以实现广泛采用。 多年来,加速计算一直只是一种小众功能,但无疑已经以“长期存在”的状态出现。 造成这种情况的因素有很多,而且它们并不会全部消失(电源墙、IPC 墙、内存墙)。
SYCL and related efforts, like oneAPI, were introduced to bring open, industry standards to the historically proprietary universe of accelerated computing.
SYCL 和相关工作(例如 oneAPI)的推出是为了将开放的行业标准带入历史上专有的加速计算领域。
The biggest question is: how many influencers are eager to promote a move to standards, vs. how many are locked up by proprietary interests?
最大的问题是:有多少影响者渴望推动标准的发展,而有多少人被专有利益所束缚?
As the Cambrian explosion of novel computer architectures unfolds, the case for open, multivendor, multiarchitecture standards only grows stronger.
随着新型计算机架构的大爆炸的展开,开放、多供应商、多架构标准的需求只会变得更加强烈。
SYCL is an open standard that invites feedback and contributions from everyone to the standard and the open source projects to implement it. The shared goal by everyone involved is to unambiguously ensure paths to high performance for all accelerators in this exciting new golden age for computer architecture.
SYCL 是一个开放标准,邀请每个人对该标准以及实施该标准的开源项目提供反馈和贡献。 所有参与者的共同目标是明确确保所有加速器在这个令人兴奋的计算机架构新黄金时代实现高性能。
About the Authors
James Reinders believes the full benefits of the evolution to full heterogeneous computing will be best realized with an open, multivendor, multiarchitecture approach. Reinders rejoined Intel a year ago, specifically because he believes Intel can meaningfully help realize this open future. Reinders is an author (or co-author and/or editor) of ten technical books related to parallel programming; his latest book is about SYCL (it can be freely downloaded here).
Michael Wong is a Distinguished Engineer at Codeplay Software. He is a current Director and VP of ISOCPP Foundation, and a senior member of the C++ Standards Committee with more than 25 years of experience. He is a member of the C++ Directions Group. He chairs the WG21 SG19 Machine Learning and SG14 Games Development/Low Latency/Financials C++ groups and is the co-author of a number of C++/OpenMP/Transactional memory features including generalized attributes, user-defined literals, inheriting constructors, weakly ordered memory models, and explicit conversion operators. He has published numerous research papers and is the author of a book on C++11. He has been an invited speaker and keynote at numerous conferences. He is currently the editor of SG1 Concurrency TS and SG5 Transactional Memory TS. He is also the Chair of the SYCL standard and of all Programming Languages for the Standards Council of Canada. Previously, he was CEO of OpenMP, involved with taking OpenMP toward Accelerator support, and the Technical Strategy Architect responsible for moving IBM’s compilers to Clang/LLVM after leading IBM’s XL C++ compiler team.
[1] Elephants in the Room can be defined as important questions that are obvious, but no one mentions them because they make at least some persons uncomfortable.
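Since you have read this far, allow us a few remarks of our own.
From the perspective of (domestic) chip companies, there is neither the desire nor the willingness to consider that users may need to write applications for multiple heterogeneous machines. But this is what the market needs, and a revolutionary idea like this will only come from third parties.
I know that Codeplay was wholly acquired by Intel this year. But is there soil in China in which a company like that can survive? Companies doing foundational software R&D, such as PerfXLab (澎峰科技) and OneFlow (一流科技), are among the few sparks to appear in China in recent years. If even they cannot survive, what hope is there for China's computing industry? We also hope investors will not distort small, well-crafted software companies like these, but will help them instead, so that everyone can succeed together.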