TPU v5p for Training
v5e for cost-effectiveness
v5p for raw performance
https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer
Google Cloud Blog
Introducing Cloud TPU v5p and AI Hypercomputer | Google Cloud Blog
The new TPU v5p is a core element of AI Hypercomputer, which is tuned, managed, and orchestrated specifically for gen AI training and serving.
FP6 quantization for LLMs
https://www.linkedin.com/feed/update/urn:li:activity:7141243626730176512/
https://arxiv.org/pdf/2312.08583.pdf
Linkedin
Leon Song on LinkedIn: 2312.08583.pdf
Very proud that the team has just pushed out the algorithmic investigation into FP6 and its new quantization strategy for LLMs that tackles the quality challenges from the INT4 solutions. Very soon, my team will release an ultra fast FP6 kernel design on…
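For intuition on what an FP6 weight format buys you: here's a rough Python sketch that rounds values to a hypothetical 1-3-2 (sign/exponent/mantissa) FP6 grid. This is only an illustration of the precision/range trade-off, not the paper's actual format, scaling scheme, or kernel.

```python
import numpy as np

# Hypothetical e3m2 FP6 layout: 1 sign, 3 exponent, 2 mantissa bits.
EXP_BITS, MAN_BITS = 3, 2
BIAS = 2 ** (EXP_BITS - 1) - 1            # exponent bias = 3
MAX_EXP = (2 ** EXP_BITS - 2) - BIAS      # largest normal exponent = 3
MIN_EXP = 1 - BIAS                        # smallest normal exponent = -2

def fp6(x: float) -> float:
    """Round x to the nearest value representable in this FP6 format."""
    if x == 0.0:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    _, e = np.frexp(abs(x))               # abs(x) = m * 2**e, m in [0.5, 1)
    e -= 1                                # renormalize so mantissa is in [1, 2)
    if e < MIN_EXP:                       # subnormal range: fixed step size
        step = 2.0 ** (MIN_EXP - MAN_BITS)
        return sign * round(abs(x) / step) * step
    e = min(e, MAX_EXP)
    step = 2.0 ** (e - MAN_BITS)          # spacing of representable values here
    q = round(abs(x) / step) * step
    max_val = (2 - 2.0 ** -MAN_BITS) * 2.0 ** MAX_EXP   # = 14.0
    return sign * min(q, max_val)         # saturate instead of overflowing
```

With only 2 mantissa bits, `fp6(1.3)` lands on 1.25 and everything above 14.0 saturates; the FP6-LLM work argues this still preserves LLM quality better than INT4.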
Soft GPGPU for FPGA
https://arxiv.org/abs/2401.04261
arXiv.org
A Statically and Dynamically Scalable Soft GPGPU
Current soft processor architectures for FPGAs do not utilize the potential of the massive parallelism available. FPGAs now support many thousands of embedded floating point operators, and have...
The main FPGA news today is Intel PSG's new name: Altera
https://youtu.be/Mb34D4f5tc8?si=NTNt5hZxYdY7JaZe
YouTube
We are Altera. We are for the innovators.
Today we embark on an exciting journey as we transition to Altera, an Intel Company. In a world of endless opportunities and challenges, we are here to provide the flexibility needed by our ecosystem of customers and partners to pioneer and accelerate innovation.…
Collective Communications Library for FPGA?
https://arxiv.org/abs/2312.11742
arXiv.org
ACCL+: an FPGA-Based Collective Engine for Distributed Applications
FPGAs are increasingly prevalent in cloud deployments, serving as Smart NICs or network-attached accelerators. Despite their potential, developing distributed FPGA-accelerated applications remains...
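For a sense of what a collective engine like ACCL+ computes when it offloads MPI-style collectives to the FPGA: below is a plain-Python simulation of the classic ring all-reduce (reduce-scatter pass, then all-gather pass). This is a textbook algorithm sketch, not ACCL+'s actual implementation.

```python
import copy

def ring_allreduce(local):
    """Simulate ring all-reduce: local[r] holds rank r's n segments (lists)."""
    n = len(local)                        # number of ranks == number of segments
    # Reduce-scatter: segments circulate the ring, accumulating partial sums.
    # After n-1 steps, rank r owns the fully reduced segment (r + 1) % n.
    for s in range(n - 1):
        out = {r: copy.deepcopy(local[r][(r - s) % n]) for r in range(n)}
        for r in range(n):
            dst, seg = (r + 1) % n, (r - s) % n
            local[dst][seg] = [a + b for a, b in zip(local[dst][seg], out[r])]
    # All-gather: the finished segments circulate once more so every rank
    # ends up with the complete reduced vector.
    for s in range(n - 1):
        out = {r: copy.deepcopy(local[r][(r + 1 - s) % n]) for r in range(n)}
        for r in range(n):
            dst, seg = (r + 1) % n, (r + 1 - s) % n
            local[dst][seg] = out[r]
    return local

# Two ranks, two segments each: every rank ends with the elementwise sum.
result = ring_allreduce([[[1], [2]], [[10], [20]]])
```

Each rank sends and receives only one segment per step, which is why this pattern maps well onto fixed point-to-point links, whether NICs or FPGA-attached networks.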
Next gen facebook accelerator (MTIA v2)
https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/
Meta
Our next generation Meta Training and Inference Accelerator
We are sharing details of our next generation chip in our Meta Training and Inference Accelerator (MTIA) family. MTIA is a long-term bet to provide the most efficient architecture for Meta’s unique workloads.
FPGA startup (HyperAccel) claims LLM performance better than the NVIDIA A100
https://hc2023.hotchips.org/assets/program/posters/HC2023.hyperaccel.ai.Moon.Poster.pdf
for #math researchers
Numerical behavior of NVIDIA tensor cores @ PubMed:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959640/
source thread: https://twitter.com/rzidane360/status/1786958225419706683
PubMed Central (PMC)
Numerical behavior of NVIDIA tensor cores
We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and ...
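Context for what the paper measures: on Volta/Turing/Ampere, tensor cores take FP16 inputs but accumulate in FP32. A NumPy emulation (which only mimics the rounding coarsely, not the hardware's exact alignment and rounding modes the paper characterizes) shows why accumulator precision matters:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float16)
b = rng.standard_normal(4096).astype(np.float16)

# High-precision reference dot product.
ref = float(np.dot(a.astype(np.float64), b.astype(np.float64)))

# FP16 products accumulated in FP32 (tensor-core-like behavior).
acc32 = np.float32(0.0)
for x, y in zip(a, b):
    acc32 += np.float32(x * y)            # x * y is rounded to FP16 first

# FP16 products accumulated in FP16 (naive half precision).
acc16 = np.float16(0.0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + (x * y))

print(abs(float(acc32) - ref), abs(float(acc16) - ref))
```

The FP32 accumulator lands far closer to the reference, because FP16's ~3 decimal digits of precision lose low-order bits on every one of the 4096 additions.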
Google TPUv6 Trillium
https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus/
Google Cloud Blog
Introducing Trillium, sixth-generation TPUs | Google Cloud Blog
The new sixth-generation Trillium Tensor Processing Unit (TPU) makes it possible to train and serve the next generation of AI foundation models.
Education of Chip Designers at a Large Scale: A Proposal
https://ieeexplore.ieee.org/document/10584365
Exploring logic synthesis with Yosys
https://www.linkedin.com/posts/ashwinrajesh_a-guide-to-logic-synthesis-using-yosys-ugcPost-7221574339165396993-6sN5
56-page doc: https://drive.google.com/file/d/13ER2Jb7fj6pUIeCzoba837SHPWG-xX-Y/view
Linkedin
As a digital design student, have you ever wondered how the RTL code we write magically gets transformed into circuits? | Ashwin…
As a digital design student, have you ever wondered how the RTL code we write magically gets transformed into circuits?
Have you ever thought how those always blocks were synthesized into gates and LUTs, or even special blocks like BRAM and DSP blocks?
…