High Level Synthesis

Wednesday, October 4, 2023

Trends in Automated High-Level Synthesis Tools (Automatic High-Level Code Deployment)

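[Translator's note] This is a survey of recent design techniques in semiconductor design. English study should not be a picky diet either, so a paper like this is worth reading as general knowledge. It runs to about 30 pages and the body is fairly specialized, so I will read only excerpts from the abstract and the introduction. I will add my own notes on the technical terms that come up along the way.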

[Source] IEEE Access, Vol. 8, 2020, https://ieeexplore.ieee.org/abstract/document/9195872

---------------------------------------------------------------------------

Towards Automatic High-Level Code Deployment on Reconfigurable Platforms: A Survey of High-Level Synthesis Tools and Toolchains

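The title alone is long and hard to parse. The individual words are basic, but so much is omitted and compressed that, without knowing which journal published the paper, it is hard to guess what it is about. Google Translate produced this Korean rendering: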

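"재구성 가능한 플랫폼의 자동 고급 코드 배포를 향하여: 고급 합성 도구 및 도구 체인에 대한 조사" (roughly: "Towards automatic advanced code deployment on reconfigurable platforms: a survey of advanced synthesis tools and tool chains")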

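Leaving the quality of that translation for later, a few foreign words stand out. Does this mean that 'platform' (플랫폼), 'code' (코드) and 'chain' (체인) have become fully naturalized loanwords in Korean? If so, we should be able to use them properly. Given that this paper appeared in IEEE Access, a journal for electrical and electronics engineers,

'platform' means an FPGA (Field-Programmable Gate Array), a device on which logic elements are laid out in advance, and
'code' means source code that describes an algorithm in a programming language at a high level of abstraction. So I rendered the title as follows.

"On automated methods for implementing design code written at a high level of abstraction on a reconfigurable semiconductor substrate: a survey of high-level synthesis and the chain of tools that accompanies it"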

MOSTAFA W. NUMAN[1], BRADEN J. PHILLIPS[2], (Member, IEEE), GAVIN S. PUDDY[1], (Associate Member, IEEE), AND KATRINA FALKNER[1]
Corresponding author: Mostafa W. Numan (mostafa.numan@adelaide.edu.au)

[1] School of Computer Science, The University of Adelaide, Adelaide, SA 5005, Australia
[2] School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA 5005, Australia

This work was supported by the Maritime Division of the Defence Science and Technology Group, Australia.

(That is, the work was carried out with funding from the Maritime Division of Australia's Defence Science and Technology Group.)

---------------------------------------------------------------------

ABSTRACT


Heterogeneous computing systems with tightly coupled processors and reconfigurable logic blocks provide great scope to improve software performance by executing each section of code on the processor or custom hardware accelerator that best matches its requirements and the system optimisation goals.

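A heterogeneous computing system, in which general-purpose processors and reconfigurable logic are tightly coupled, achieves high performance by splitting the code between the processors and accelerators built from custom logic, so that each section runs where it best fits its requirements and the system's optimisation goals.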

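heterogeneous computing system: a computing system built from different kinds of processing elements

processors: general-purpose processors (CPUs)
reconfigurable logic blocks: custom compute logic (FPGA)

'tightly coupled' stresses the close coupling between the general-purpose processor and the reconfigurable custom logic.
'Reconfigurable' is used here in much the same sense as 'programmable'.
'improve software performance' is the goal of HLS.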

This article is motivated by the idea of a software tool that can automatically accomplish the task of deploying code, originally written for a conventional computer, to the processors and reconfigurable logic blocks in a heterogeneous system.

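This paper starts from the idea of a software tool that would automatically implement code, originally written to run on a conventional computer, on a heterogeneous system made up of general-purpose processors and reconfigurable logic.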

We undertake an extensive survey of high-level synthesis tools to determine how close we are to this vision, and to identify any capability gaps.

We surveyed high-level synthesis tools in depth to find out how close they have come to this vision and which capabilities are still missing.

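high-level synthesis (HLS): synthesis from a high level of abstraction
gaps: capability gaps, meaning what the tools still cannot do. Much of the shortfall comes from the difference in abstraction level between the ways an algorithm can be described.

General-purpose programming languages sit at a high level of abstraction.
Custom hardware is described at RTL (Register-Transfer Level), a low level of abstraction.

RTL (Register-Transfer Level): a description detailed down to clock cycles and bit widths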
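
To make "clock cycles and bit widths" concrete, here is a minimal sketch in plain Python. It is a software model only, not HDL, and the name `accumulate` and the 8-bit register width are assumptions chosen for illustration. Each loop iteration stands for one clock cycle, and the mask stands for the fixed width of the register.

```python
WIDTH = 8                  # assumed register width in bits
MASK = (1 << WIDTH) - 1    # 0xFF keeps only the low 8 bits


def accumulate(samples):
    """Model an 8-bit accumulator register, one addition per clock cycle."""
    acc = 0                # register value after reset
    trace = []             # register value observed after each clock edge
    for sample in samples:             # one loop iteration = one clock cycle
        acc = (acc + sample) & MASK    # adder output truncated to WIDTH bits
        trace.append(acc)
    return trace


print(accumulate([100, 100, 100]))  # [100, 200, 44]
```

The third value wraps around to 44 because 300 does not fit in 8 bits. A high-level language normally hides this kind of detail, while an RTL description has to state it explicitly.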

The survey is structured according to a new framework that clearly expresses the relationships between the many tools surveyed.

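The survey is organized around a new framework that makes the relationships among the many tools clear.

new framework: rather than ranking the tools or listing pros and cons, the survey identifies the characteristics of each tool and lays out how the tools relate to one another.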

We find that none of the existing tools can deploy general high-level code without manual intervention.

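We found that no existing tool can convert general high-level code into low-level RTL hardware without manual intervention.

Even HDLs (Hardware Description Languages), which are now mature, can only be synthesized from a restricted form (synthesizable subsets). It is no surprise that the same holds for C++, which sits at a higher level of abstraction.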

Logic synthesis from arbitrary high-level code remains an open problem with dynamic data structures, function pointers and recursion all presenting challenges.

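Logic synthesis from arbitrary high-level code is still an unsolved problem; dynamic data structures, function pointers and recursion all remain challenges.

dynamic data structures: structures whose size changes at run time, such as those built with memory allocation, linked lists and stack pointers.
function pointers: calling a function through a pointer variable that holds its address.
recursion: a function calling itself, as in fractal algorithms.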
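
As an illustration of why recursion is awkward for synthesis, the sketch below contrasts a recursive function with a hand-rewritten loop that has a fixed trip count. It is written in Python for brevity, and it shows the kind of manual restructuring the paper refers to, not the output of any particular HLS tool. The names `gcd_recursive` and `gcd_bounded` and the bound `MAX_STEPS` are assumptions; 64 steps is assumed to be enough for 32-bit inputs.

```python
def gcd_recursive(a, b):
    """Natural software form: the call depth depends on the input values."""
    return a if b == 0 else gcd_recursive(b, a % b)


MAX_STEPS = 64  # assumed static bound, picked by hand for 32-bit inputs


def gcd_bounded(a, b):
    """Rewritten form: a loop with a fixed trip count and no self-call."""
    for _ in range(MAX_STEPS):
        if b != 0:              # once b reaches 0 the remaining steps do nothing
            a, b = b, a % b
    return a


print(gcd_recursive(48, 18), gcd_bounded(48, 18))  # 6 6
```

The second form always runs the same number of iterations, so a tool could in principle map it to a fixed amount of hardware or a fixed number of cycles. The recursive form gives no such bound without further analysis.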

Other challenges include automating the tasks of code partitioning, optimisation and design space exploration.

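Other challenges include automating code partitioning, optimisation and design space exploration.

Recall the many machine-code generation options of a C++ compiler:

Optimising for code size or for execution speed
Inlining of code
Calling conventions
Data type conversion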
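
Design space exploration can be pictured as trying many such option settings and keeping the best one that fits the available resources. The sketch below does this exhaustively over two hypothetical knobs, a loop unroll factor and a pipelining switch. The cost model in `estimate`, the trip count and the area budget are all invented for illustration and do not come from the paper or from any real tool.

```python
from itertools import product

LOOP_TRIPS = 1024      # assumed loop trip count
AREA_BUDGET = 40       # assumed resource budget, arbitrary units


def estimate(unroll, pipelined):
    """Toy cost model: returns (latency in cycles, area in arbitrary units)."""
    cycles_per_iter = 1 if pipelined else 4
    latency = (LOOP_TRIPS // unroll) * cycles_per_iter
    area = 5 * unroll + (8 if pipelined else 0)
    return latency, area


def explore():
    """Try every combination and keep the fastest design within the budget."""
    best = None
    for unroll, pipelined in product([1, 2, 4, 8], [False, True]):
        latency, area = estimate(unroll, pipelined)
        if area > AREA_BUDGET:
            continue                       # does not fit, discard
        candidate = (latency, area, unroll, pipelined)
        if best is None or candidate < best:
            best = candidate
    return best


print(explore())  # (256, 28, 4, True)
```

With only eight combinations an exhaustive search is trivial. Real designs have far more knobs and each evaluation can take a long time, which is one reason automating this step is listed as a challenge.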

-------------------------------------------------------------------------

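The abstract alone raises a great many topics for discussion. The paper is long, but if you are interested in current semiconductor design techniques I recommend reading it in full.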

SECTION I. Introduction

For the last four decades, Moore's law and Dennard scaling have relentlessly delivered improvements in computing performance [1]. Since the early 2000s their impact has begun to wane and alternative ways to improve performance have begun to emerge.

Heterogeneous computing is a promising approach in which a group of processing nodes execute a workload in parallel. Given different kinds of nodes including multi-core CPUs, real-time processors, DSPs, GPUs, and accelerators on FPGAs or ASICs, the computing workload can be partitioned such that each part is executed on a processor that is well-matched to its requirements and the performance optimisation goals.

This article is concerned with the engineering task of writing software for a heterogeneous system and considers how close existing tools and technologies are to a fully automatic system in which high-level source code is partitioned and deployed to heterogeneous nodes with a minimum of human intervention. This is an ambitious scope so we constrain ourselves, in this article, to the task of deploying source code blocks onto custom FPGA logic.

It is possible, of course, to write software specifically for a particular heterogeneous system by manually partitioning tasks among the processors, and using the most appropriate programming language for each of the different processors. For example a Hardware Description Language (HDL) such as Verilog could be used for tasks executing on an FPGA, and CUDA for those on a GPU. An alternative, which has seen a great deal of research activity in recent years, is to use High-Level Synthesis (HLS) for generating hardware modules from code written in a High-Level Language (HLL) (such as C, C++ or Python).

There are benefits of using HLS instead of HDL so that the entire application is in a high-level language: simulation speed is generally faster; debugging is less difficult; it is easier to explore and evaluate design alternatives; and the high-level language may include features that cannot be easily expressed in a HDL [2].

Although current HLS tools do not always produce performance-optimised implementations, applications without stringent performance requirements can be more quickly and easily developed using HLS. HLS software developers do NOT necessarily need to be FPGA or HDL experts, and optimisation opportunities can be exposed to the designer that cannot be easily explored via HDL approaches. In some cases, a project that would not have been practical in HDL, given its complexity, limited time frame and small development team, can be feasible in HLS at a low performance cost compared to an HDL-based approach [3]–[5].

An HLS-based design for a heterogeneous system could be started from scratch, or use pre-existing code originally written for a conventional CPU. Either way, to effectively use current HLS technology, system developers require considerable knowledge and experience in the application domain, computer programming, and HLS design flow.

Deploying pre-existing code written for a conventional CPU onto a heterogeneous system with the aim of improving performance or efficiency is even more difficult. The code needs to be substantially restructured to be synthesisable, and to produce optimised hardware. This needs to be done for all the code: not just the application source code but also any library functions it uses. To date Automatic Code Deployment (ACD) tools capable of performing this challenging task without human intervention have been the subject of limited research (e.g. [6], [7]) but a considerable amount of work has been done on automating some of the more challenging, tedious and time-consuming steps in this process (e.g. [8]–[11]).

This article surveys recent toolchains and workflows for high-level synthesis to FPGA with a focus on technologies that might eventually be used for automatic code deployment. Section I introduces HLS and the motivation for heterogeneous computing based on HLS. Section II categorises different contemporary approaches to deploy compute-intensive code segments to FPGA hardware accelerators. The categories introduced in Section II are used to organise a thorough survey, in Sections III and IV, of approaches that take a candidate function expressed in a HLL and produce low-level HDL suitable for FPGA deployment. The arguments in these sections focus on contemporary HLS tools currently used in academia or industry; legacy tools are included in summary tables for completeness. Specification of a hypothetical tool for ACD to FPGA, as well as a brief summary of progress reported in the literature towards making HLS-based FPGA code deployment less dependent on human judgement and proficiency, are provided in Section V.

SECTION II. High-Level Code Deployment Approaches

FIGURE 1. Design flows for high-level code deployment.

SECTION III. Behavioural Approach for FPGA Synthesis

FIGURE 2. A generic HLS design flow.

TABLE 1 Currently Available Commercial HLS Tools

TABLE 2 Currently Available Academic HLS Tools

----------------------------------------------------------------------

The original paper briefly explains how each HLS tool works, so look it up there. It is worth noting which techniques are used.

SECTION VI. Conclusion

The motivation for this article was a vision of a software tool that could automatically deploy sections of code, originally written for a conventional CPU, to FPGA accelerators to achieve an implementation advantage, whether that be latency, throughput, energy or some other optimisation goal. How close are existing tools to delivering this vision, and where are the capability gaps?

To survey and evaluate existing tools we provided a classification of design flows in FIGURE 1. This classification has neatly expressed the relationships between many different hardware synthesis tools surveyed under three broad approaches: manual re-coding, behavioural synthesis, and dataflow synthesis, as well as variations of these.

Sections III and IV surveyed many commercial and research tools for code deployment, organised according to the framework in FIGURE 1. Wherever possible we have identified the pedigree of these tools and their relationship to other tools in the survey. For the more widely used or surveyed tools we have provided an overview of salient features and capabilities.

None of the existing tools are able to fulfil the vision of fully automatic deployment of general C/C++ code to a heterogeneous system of FPGAs and CPUs. Capability gaps include the generation of synthesisable HLS code from HLL code that uses pointers to pointers or functions, recursive functions, or dynamic memory allocations. Other challenges include efficient partitioning of the code, optimisation of generated hardware, and design space exploration. All of these are the subject of active research efforts as surveyed in Section V-B.

This work has also identified a trend that is somewhat orthogonal to the idea of automatic code deployment. There is currently significant work in approaches to express the application in a high level language that is more amenable for execution on diverse platforms. Examples include the growing proliferation of domain-specific languages or dataflow representations.

Wednesday, September 13, 2023

Is Semiconductor Design Today Different from 20 Years Ago?

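When the word 'semiconductor' shows up in the media, the words attached to it are mostly 'materials', 'process' and 'design'. 'Process' gets a lot of attention as news: 'such-and-such nanometer process', where a smaller number means a newer process. Semiconductor 'materials' also draw attention, with carbon semiconductors, superconducting semiconductors and the like billed as new materials. Articles about 'design', however, are rare, and even then design gets only a passing mention in pieces about 'workforce training' or 'system semiconductors'.

If 'process' is manufacturing technology, 'design' is the technology of implementing function. In other words, design is the work of drawing the blueprint that links function to manufacturing. As the list of things to do grows, the electronic circuit that performs them becomes complex, and not merely complex but overwhelmingly so. The CPU in a present-day PC is said to contain billions of transistors [reference]. Put plainly, manufacturing such a chip means drawing a blueprint made of billions (!) of parts. That is not a job for a person, nor one a person could do. So the parts are all standardized, and software called semiconductor design automation tools is brought in to place and route the enormous number of parts according to rules.

In modern semiconductor design, you write down what the chip should do as a text document, and automation software converts it into a circuit blueprint. This is no different from software development, where you write the task in a programming language and a compiler produces an executable. Today, just as in software development, you describe the task in a computer language and the 'compiler' (a synthesizer plus a place-and-route tool) draws the blueprint for you. How else could anyone produce the manufacturing blueprint for a chip with billions of transistors?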

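The month before last I got the chance to start teaching semiconductor design at a university. While looking for material for my lecture notes, I came across a series of semiconductor design tutorial articles from 20 years ago and read it. The title of the series is very appealing.

"Microprocessor Design: Just Follow Along" (마이크로프로세서 설계 무작정 따라하기) [link]

I had seen 'Just Follow Along' in the titles of computer how-to books and introductory coding books, but for semiconductor design it struck me as a bold choice. From the viewpoint of that time it may even have seemed absurd. The articles were aimed at least at university students majoring in electronics, but the tools (software) needed to follow along were extremely expensive and hard to get access to. Software development tools (compilers) were not cheap 20 years ago either, yet with some determination you could manage (?) to use them without great difficulty, whereas semiconductor design tools were very hard to obtain. I suspect that this scarcity of tools was one of the causes of today's shortage of semiconductor design engineers.

Conditions for semiconductor design today are beyond comparison with 20 years ago. Design tools we would not have dared to hope for are now available under free licenses, thanks to the generosity of their vendors. Open source has reached semiconductor design tools as well as software development tools. Simulators, compilers (synthesizers), place-and-route tools and layout editors are good enough not only for education but also for small and medium-scale chip design. Nothing now stands in the way of literally 'just following along'. On top of this, there is the welcome news that domestic institutes (ETRI and others) and university laboratories with experimental process lines are supporting chip fabrication for educational purposes. Abroad, chip design using FPGAs has already become a hobby, and interesting 'projects' are being published. Conditions are now good enough to 'just follow along' with semiconductor design.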

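One more thing:

I wish news footage about semiconductors would stop showing workers in cleanroom suits. It brings back the hard times of the pandemic. It may be a cleanroom, but people who look as if they were locked in a prison are neither appealing nor easy to watch. It makes the semiconductor industry look as if it consisted of 'process' alone. Instead of pale white and yellow tones and monotonous machinery, I would like to see lively screens (code listings, admittedly) and designers working freely. Couldn't they at least show waveforms from a simulation? I would like to see designers, men and women alike, presented as attractively as when the software industry is introduced.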

---------------------------------------------------------------

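A second thing:

'Semiconductor design' has grown in scale over 20 years, but its basic approach has not changed. 'Design' here means the process of converting a task (an algorithm) written in a hardware language into a digital circuit (a combination of transistors). This kind of design automation no longer attracts attention; it is taken for granted, for the same reason that building a compiler for a software language is now seen as ordinary rather than advanced. Now that the automation linking 'hardware document to silicon' has reached its peak, interest has moved to a higher level of automation: dropping the word 'hardware' from 'hardware document'. Design automation that turns a document written for hardware into a circuit has become commonplace.

Programming languages (for software) have evolved toward human language and make it easy to express algorithms. A great many people use them to put electronic circuits to work. To run an algorithm written in a programming language on a circuit, you use a tool called a compiler. It converts the human-readable document into a machine-readable one that can run on a general-purpose processor (such as a CPU or GPU). The problem is that the performance (processing speed) of these general-purpose processors is not satisfactory for applications that handle large volumes of data, such as artificial intelligence and machine learning. Taking in data that is generated from many sources at once requires a matching number of computers, which is a burden both economically and technically. So the aim now is to drop 'general-purpose' from the processor.

People want to model the human way of thinking as a neural network and implement it as an electronic circuit. They want to build a machine in which thousands (or tens of thousands) of CPUs are connected together, even if not as many as the neurons in the human brain. The usefulness of this kind of computing structure has already been demonstrated. But connecting many computers, each with a single general-purpose CPU, is a great deal of trouble. Hence the wish to build one computer with thousands of CPUs. Fortunately, the computing element of a neural network (the neuron) is far simpler than a do-everything general-purpose CPU. The goal is to put thousands of simple computing units, connected to one another, on a chip the size of a fingernail.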
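
How simple such a computing unit can be is easy to show. The sketch below models one artificial neuron as a multiply-accumulate followed by a ReLU activation, using integers only. The function names, the choice of ReLU and the sample weights are assumptions made for illustration, not something taken from a specific accelerator.

```python
def neuron(inputs, weights, bias):
    """One artificial neuron: multiply-accumulate followed by ReLU."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w          # one multiply-accumulate (MAC) per input
    return max(acc, 0)        # ReLU activation: negative sums become zero


def layer(inputs, weight_rows, biases):
    """A layer is a set of independent neurons sharing the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


print(layer([1, 2, 3], [[1, 0, -1], [2, 2, 2]], [0, -5]))  # [0, 7]
```

The neurons in `layer` do not depend on each other, so in hardware each could be a separate small circuit working at the same time. That independence is what makes thousands of simple units on one chip attractive.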

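Algorithms differ by application, and they are easy to write in a programming language. So automation tools have appeared that turn an algorithm written in a programming language into a semiconductor circuit. This is called high-level synthesis (HLS).

In any case, the electronic circuit that performs the computation is 'hardware'. Once built, this hardware cannot be changed. That is why a computer has a general-purpose CPU, made to be versatile, and performs each function by swapping programs. That part is called 'software'.

There is a semiconductor component (IC) called an FPGA. Its structure can be changed at the logic level. The transistors at the lowest level of the circuit are fixed, but the structure of the arithmetic and logic units built above them can be reconfigured at will. In other words, an FPGA is a component that can be turned into a processor optimized for a particular algorithm. It is programmable hardware: the hardware has become soft.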
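
One way to picture "fixed transistors, reconfigurable logic" is the look-up table (LUT), a basic building block of FPGA logic. The sketch below models a 2-input LUT in Python: the circuit (the function `lut`) never changes, and only the stored configuration bits decide which logic function it computes. The name `make_lut2` is an assumption for this sketch, and real devices use wider LUTs along with programmable routing.

```python
def make_lut2(config_bits):
    """Return a 2-input logic function defined by a 4-entry truth table."""
    assert len(config_bits) == 4

    def lut(a, b):
        return config_bits[(a << 1) | b]   # the inputs select one stored bit

    return lut


AND = make_lut2([0, 0, 0, 1])   # same structure, configured as AND
XOR = make_lut2([0, 1, 1, 0])   # same structure, configured as XOR

print([AND(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([XOR(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Loading a different list of bits is the software-like step. No transistor is added or removed, yet the element behaves as a different gate.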

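Research on combining HLS and FPGAs to run AI algorithms, written in a programming language, on this softened hardware is in full swing and looks likely to bear fruit soon.

If semiconductor design 20 years ago meant writing a 'document for hardware', the biggest difference today is that it has shed the 'for hardware' restriction and become the act of writing an 'ordinary document'.

Two years ago (2021) I took note of Xilinx's HLS tool because it looked impressive, and I never imagined it would turn out to be this useful.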

--------------------------------------

[References]
[1] AI_accelerator, https://en.wikipedia.org/wiki/AI_accelerator
[2] Xilinx Research & Open Source Projects, https://www.youtube.com/@xilinxresearchopensourcepr321
[3] HLS Programming with FPGAs, https://www.youtube.com/@youngkyuchoi4260
[4] FINN tutorial at FPGA'21, https://xilinx.github.io/finn/2021/01/27/finn-tutorial-fpga21.html
[5] High-Level Synthesis Tutorial Overview (Tutorial Description), https://hls-goodkook.blogspot.com/2021/08/1-tutorial-description.html

Thursday, August 10, 2023

Open Source HLS

3D raytraced game with open source C to FPGA toolchain

A look at CFlexHDL and PipelineC

https://blog.yosyshq.com/p/3d-raytracing/

Graphics demos implemented using PipelineC.

https://github.com/JulianKemmerer/PipelineC-Graphics

PipelineC

A C-like hardware description language (HDL) adding high level synthesis (HLS)-like automatic pipelining as a language construct/compiler feature.

https://github.com/JulianKemmerer/PipelineC

CflexHDL

Design digital circuits in C. Simulate really fast with a regular compiler!

https://github.com/suarezvictor/CflexHDL

LiteX

The LiteX framework provides a convenient and efficient infrastructure to create FPGA Cores/SoCs, to explore various digital design architectures and create full FPGA based systems.

https://github.com/enjoy-digital/litex

LLVM

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

https://github.com/llvm/llvm-project

MyHDL

From Python to Silicon

https://www.myhdl.org/

NNgen

A Fully-Customizable Hardware Synthesis Compiler for Deep Neural Network

https://github.com/NNgen/nngen

https://link.springer.com/chapter/10.1007/978-3-319-16214-0_42

Veriloggen

A Mixed-Paradigm Hardware Construction Framework

https://github.com/PyHDI/veriloggen