HDLCopilot: Hardware Design Library Querying with Natural Language

Manar Abdelatty, Brown University, School of Engineering, Providence, RI, USA, and Sherief Reda, Brown University, School of Engineering, Providence, RI, USA
Abstract.

Hardware design engineers routinely work with multiple Process Design Kits (PDKs) from various fabrication labs, each containing several standard cell libraries optimized for a specific metric such as speed, power, or density. These libraries include multiple views, such as liberty files for timing information, LEF files for abstract layout details, and technology LEF files for process design rules. Navigating this complex landscape to retrieve specific information about gates or design rules is often time-consuming and error-prone. To address this, we present HDLCopilot, an LLM-powered PDK query system that allows engineers to interact with PDKs in natural language, making information retrieval more accurate and efficient. HDLCopilot achieves an accuracy of 94.23% on an evaluation set composed of diverse and complex natural language queries. HDLCopilot positions itself as a powerful assistant in the hardware design process, enhancing productivity and reducing potential human errors.

LLM, RAG, PDK, Query, Retrieval, text-to-SQL, relational-database

1. Introduction

Figure 1. Overview of HDLCopilot. The framework converts a user question to a SQL query to retrieve relevant information from the PDK database, then uses it as context for an LLM to generate natural language answers.

At the core of semiconductor design workflows lies a critical component: the Process Design Kit (PDK). PDKs serve as a comprehensive library of building blocks that are used for synthesizing abstract circuit definitions into manufacturable chips. PDKs usually contain different standard cell libraries, each optimized for a specific metric, such as speed, density, or power. These libraries include detailed files on cell timing information at different process corners, physical layout data, and metal stack properties.

Traditionally, hardware engineers navigate this complex landscape manually, parsing through extensive library files containing thousands of cells, each with numerous attributes, in order to locate specific information relevant to their current design task. This manual process is not only time consuming but also prone to human errors. Therefore, there is a need for more automated and efficient tools to assist engineers in managing and utilizing PDKs effectively, potentially accelerating the design process, and enhancing accuracy.

Large Language Models (LLMs) have enhanced productivity in various engineering domains, including hardware design. They’ve shown promise in tasks like Verilog code generation, RTL code bug fixing, and EDA tool scripting (Thakur et al., 2022; Pearce et al., 2020; Blocklove et al., 2023; Lu et al., 2024; Liu et al., 2023b, c, a; Tsai et al., 2023; Qiu et al., 2024; Wu et al., 2024). However, their application to PDK management remains underexplored. LLMs could potentially enhance engineer-PDK interactions through natural language interfaces, automated data retrieval, and intelligent suggestions, accelerating the design process and reducing errors.

Currently, LLMs are not inherently aware of PDK specifics. Domain-adaptive pre-training could be one way of enhancing LLMs’ knowledge with PDK-specific data (Gururangan et al., 2020). However, it would require expensive and time-consuming pre-training of the LLM on a large-scale dataset of PDKs from different manufacturers and process nodes. Moreover, updating the PDK information would require retraining the LLM, making it impractical to keep the LLM aligned with the most recent PDK version.

In contrast, Retrieval Augmented Generation (RAG) offers a more flexible and maintainable solution. RAG enhances LLM capabilities by grounding their responses in external knowledge sources, which can be easily updated without retraining the base model (Lewis et al., 2020). RAG has been applied to various data types: unstructured (e.g., text documents), semi-structured (e.g., JSON), and fully structured (e.g., relational databases) (Wang et al., 2023; Li et al., 2024; Zhong et al., 2020). For unstructured data, RAG uses semantic similarity matching, while for semi-structured data, it combines semantic and structural information. With fully structured data, RAG transforms the retrieval into a text-to-SQL task, generating SQL queries based on user questions and database schemas, then using the query results as context for the LLM’s response.

Since PDKs generally follow a structured format, they are well-suited for conversion into relational databases. By transforming PDK data into a SQL database, we can leverage the benefits of SQL-based retrieval while maintaining the flexibility to update and expand the knowledge base without retraining the underlying LLM. The SQL-based retrieval also allows for human verification of the generated SQL queries, ensuring reliability and transparency in the information retrieval process.

In light of this, we introduce HDLCopilot, an LLM-powered PDK query system designed for interacting with Process Design Kits (PDKs) using natural language. HDLCopilot harnesses the power of RAG and text-to-SQL conversion to provide an intuitive, efficient, and accurate interface for hardware designers to access and utilize PDK information. Fig. 1 presents an overview of the HDLCopilot framework, illustrating the flow of converting natural language queries to SQL queries to retrieve relevant data from the PDK database.

Our contributions are summarized as follows:

  • We introduce HDLCopilot, a multi-agent collaborative framework designed to streamline interactions with Process Design Kits (PDKs) in natural language, offering a novel approach for enhancing hardware design engineers’ efficiency.

  • We propose a database schema for storing PDK information in relational tables, which facilitates easy and seamless integration with LLM-enabled applications. This structured approach allows LLMs to perform precise data retrieval by dynamically generating SQL queries based on natural language inputs.

  • Experimental results show that HDLCopilot can answer diverse and complex user questions with high precision across different libraries in the PDK, achieving an accuracy of 94.23%. HDLCopilot also demonstrates high capability in generating efficient SQL queries with an efficiency score of 98.07%.

This paper is organized as follows. Section 2 discusses related work. Section 3 provides an overview of the HDLCopilot framework. Section 4 presents experimental results to validate the utility of the HDLCopilot framework. Finally, Section 5 concludes the paper.

2. Related Work

In this section, we review relevant work in three key areas: LLMs for the hardware design process in Section 2.1, retrieval augmented generation in Section 2.2, and LLMs for text-to-SQL applications in Section 2.3.

2.1. LLMs for Hardware Design

LLMs have been extensively applied to various tasks in the hardware design process, from Verilog code generation and EDA scripting to RTL bug fixing. Thakur et al. (Thakur et al., 2023) introduced the first fine-tuned LLM for Verilog code generation. Wu et al. (Wu et al., 2024) introduced ChatEDA, demonstrating how LLMs can be used for EDA tool scripting and automation. Tsai et al. (Tsai et al., 2023) explored the use of LLMs for fixing RTL code bugs. Several studies have also explored LLMs for question-answering tasks in hardware design. For example, Liu et al. (Liu et al., 2023a) proposed ChipNeMo, an LLM fine-tuned for hardware design tasks including question answering.

2.2. Retrieval Augmented Generation (RAG)

Retrieval augmented generation has emerged as a powerful approach to enhance the performance and reliability of LLMs by connecting them to external knowledge sources. Lewis et al. (Lewis et al., 2020) introduced the RAG model, which combines a pre-trained neural retriever with a sequence-to-sequence model for open-domain question answering. Guu et al. (Guu et al., 2020) proposed REALM (Retrieval-Augmented Language Model Pre-Training), which integrates retrieval during model pre-training, showing improvements in open-domain question-answering tasks. Shuster et al. (Shuster et al., 2021) showed that retrieval can reduce LLM hallucination in knowledge-grounded dialogue tasks.

Figure 2. Overview of the HDLCopilot multi-agent collaborative framework. The framework consists of four LLM agents: (1) Dispatcher for routing user questions to the relevant standard cell library, library view, and operating conditions (2) Selector for choosing the relevant tables (3) SQL Generator for generating the SQL queries to retrieve the relevant information, and (4) Interpreter for formulating the answer based on the query result.

2.3. LLMs for Text-to-SQL

The transformation of natural language queries into SQL using LLMs has been a focus of recent research. Pourreza and Rafiei (Pourreza and Rafiei, 2024) proposed decomposing the user question into sub-questions and feeding the solutions of those sub-questions to the LLM to generate the final SQL query. Wang et al. (Wang et al., 2023) proposed a multi-agent collaborative framework for converting the user question to SQL by first semantically choosing relevant tables, then decomposing the query into multiple sub-queries.

While these works demonstrate the broad applicability of LLMs in hardware design and the potential of text-to-SQL in other domains, the application of text-to-SQL techniques for PDK management remains an underexplored area. HDLCopilot addresses this gap by combining the strengths of LLMs, text-to-SQL conversion, and structured PDK data to provide an efficient and accurate natural language interface for PDK queries.

3. Proposed HDLCopilot Framework

In Fig. 2, we present HDLCopilot, a multi-agent collaborative framework for streamlining interactions with Process Design Kits (PDKs) using natural language. HDLCopilot employs four specialized LLM agents, each designed to perform a specific function. This section first describes how the PDK files are converted to a relational database, then explains each agent’s functionality and purpose in detail.

3.1. LLM-Compatible PDK Database Schema

OperatingConditions: Condition_ID, Name, Voltage, Temperature, Process, Cell_Variant
Cells: Cell_ID, Name, Area, Cell_Leakage_Power, Footprint, Condition_ID
InputPins: Input_Pin_ID, Cell_ID, Name, Clock, Capacitance, Rise_Capacitance
OutputPins: Output_Pin_ID, Cell_ID, Name, Max_Transition, Max_Capacitance
TimingTables: Timing_Table_ID, Output_Pin_ID, Timing_Type, Index_2_Label
TimingTableValues: Timing_Value_ID, Timing_Table_ID, Index_1, Index_2, Value
Figure 3. Liberty schema for standard cell libraries, showing relationships between operating conditions, cells, pins, and timing tables.
Macros: MacroID, Name, Class, ForeignName, SizeWidth, Cell_Variant
Pins: PinID, MacroID, Name, Direction, Use, AntennaGateArea
PinPorts: PortID, PinID, Layer
PinPortRectangles: RectID, PortID, RectX1, RectY1, RectX2, RectY2
Obstructions: ObstructionID, MacroID, Layer
ObstructionRectangles: ObstructionRectID, ObstructionID, RectX1, RectY1, RectX2, RectY2
Figure 4. LEF schema for standard cell libraries, showing relationships between macros, pins, ports, and obstructions.
RoutingLayers: LayerID, Name, Typ, Direction, Resistance, Corner, Cell_Variant
AntennaDiffSideAreaRatios: RatioID, LayerID, Typ, X1, Y1, X2, Y2
CutLayers: LayerID, Name, Typ, Width, Resistance, Corner, Cell_Variant
AntennaDiffAreaRatios: RatioID, LayerID, Typ, X1, Y1, X2, Y2
Vias: ViaID, Name
ViaLayers: ViaLayerID, ViaID, LayerName, RectX1, RectY1, RectX2, RectY2
Figure 5. Tech LEF schema for process corners, showing relationships between routing layers, cut layers, vias, and antenna ratios.

To facilitate integration with LLM-based retrieval, we first convert the PDK files into a relational database. LLM agents can then retrieve information from the database by dynamically generating SQL queries based on the user input. We focus on three views in the PDK files: liberty, which stores timing information at different operating conditions; LEF, which stores abstract layout information for each cell; and technology LEF, which stores metal stack properties. We propose a schema for each view. These schemas are designed to store information across the different standard cell libraries and operating conditions within the PDK.

The liberty schema is shown in Fig. 3. This schema comprises tables capturing operating conditions, cell attributes, pin properties, and timing data. The LEF schema is shown in Fig. 4. This schema comprises tables that capture abstract physical information about various macros (cells), including cell dimensions, obstruction layers in cell layouts, and pin physical attributes such as antenna gate area and pin shapes. Fig. 5 depicts the schema for technology LEF files, which stores technology-specific information. This includes details about routing layers (such as preferred routing direction, width and spacing rules, and resistance), via layers, and their associated antenna ratios.
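As a rough illustration of how part of the liberty schema in Fig. 3 might be materialized, the following sketch creates three of its tables in SQLite. The table and column names are taken from Fig. 3; the column types, the primary and foreign key declarations, the choice of SQLite, and the helper names `create_liberty_db` and `table_names` are assumptions made for this example and are not specified in the paper.

```python
import sqlite3

# Subset of the liberty schema in Fig. 3.
# Column types and key constraints are assumptions for illustration only.
LIBERTY_DDL = """
CREATE TABLE OperatingConditions (
    Condition_ID INTEGER PRIMARY KEY,
    Name         TEXT,
    Voltage      REAL,
    Temperature  REAL,
    Process      REAL,
    Cell_Variant TEXT
);
CREATE TABLE Cells (
    Cell_ID            INTEGER PRIMARY KEY,
    Name               TEXT,
    Area               REAL,
    Cell_Leakage_Power REAL,
    Footprint          TEXT,
    Condition_ID       INTEGER REFERENCES OperatingConditions(Condition_ID)
);
CREATE TABLE OutputPins (
    Output_Pin_ID   INTEGER PRIMARY KEY,
    Cell_ID         INTEGER REFERENCES Cells(Cell_ID),
    Name            TEXT,
    Max_Transition  REAL,
    Max_Capacitance REAL
);
"""


def create_liberty_db(path=":memory:"):
    """Create a database holding the liberty tables defined above."""
    conn = sqlite3.connect(path)
    conn.executescript(LIBERTY_DDL)
    return conn


def table_names(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping `Cell_Variant` on the operating-conditions table, as Fig. 3 does, is what would let a single set of tables hold several standard cell libraries side by side.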

3.2. LLM Agents

HDLCopilot comprises four LLM agents: the Dispatcher, Selector, SQL-Generator, and Interpreter, which collaborate to ensure a reliable and accurate SQL generation pipeline.

Dispatcher: The main objective of the dispatcher agent is to route the user question to the appropriate standard cell library, library view, and operating conditions. Given the user question $\mathcal{Q}$, the available libraries $\mathcal{L}$, and the library views $\mathcal{V}$ in the PDK, the dispatcher selects the relevant library $\mathcal{L}'$, library view $\mathcal{V}'$, and operating conditions $\mathcal{C}'$ (if applicable). The function of the dispatcher agent is described in Eq. 1, where $f_{\text{dispatcher}}(\cdot \mid \mathcal{M})$ represents the LLM agent $\mathcal{M}$.

(1) $\{\mathcal{L}', \mathcal{V}', \mathcal{C}'\} = f_{\text{dispatcher}}(\mathcal{Q}, \mathcal{L}, \mathcal{V} \mid \mathcal{M})$

The selected library view $\mathcal{V}'$ is then used to filter the set of PDK tables $\mathcal{T}$, retaining only those relevant to the routed view, as described in Eq. 2. The selected tables $\mathcal{T}'$ are then passed to the selector agent, which performs further fine-grained table selection by choosing only the tables relevant to the user question.

(2) $\mathcal{T}' = \{t \in \mathcal{T} \mid \mathcal{V}'\}$
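The view-based filtering of Eq. 2 can be sketched as a simple lookup. The grouping below uses the table names shown in Figs. 3 to 5; the dictionary `TABLES_BY_VIEW`, the view keys, and the function name `filter_tables` are hypothetical names chosen for this example, not part of the described framework.

```python
# Hypothetical mapping from library view to table names (names from Figs. 3-5).
TABLES_BY_VIEW = {
    "liberty": [
        "OperatingConditions", "Cells", "InputPins",
        "OutputPins", "TimingTables", "TimingTableValues",
    ],
    "lef": [
        "Macros", "Pins", "PinPorts",
        "PinPortRectangles", "Obstructions", "ObstructionRectangles",
    ],
    "techlef": [
        "RoutingLayers", "CutLayers", "AntennaDiffSideAreaRatios",
        "AntennaDiffAreaRatios", "Vias", "ViaLayers",
    ],
}


def filter_tables(selected_view):
    """Eq. 2: keep only the tables that belong to the routed view V'."""
    if selected_view not in TABLES_BY_VIEW:
        raise ValueError(f"unknown view: {selected_view!r}")
    # Return a copy so callers cannot mutate the shared mapping.
    return list(TABLES_BY_VIEW[selected_view])
```

Because this step is deterministic, it needs no LLM call; only the choice of view in Eq. 1 does.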

Selector: The selector agent performs a more refined reduction of the tables. Given the user question $\mathcal{Q}$ and the schema description of the routed tables $\mathcal{T}'$, the selector narrows down the set of routed tables to only those most relevant to the user question. The function of the selector agent is described in Eq. 3. The main purpose of this table reduction is to simplify the text-to-SQL task by having the SQL-Generator examine only the relevant tables, which enhances the efficiency and accuracy of SQL generation.

(3) $\mathcal{T}'' = f_{\text{selector}}(\mathcal{Q}, \mathcal{T}' \mid \mathcal{M})$

SQL-Generator: The SQL-Generator serves as the core agent of the framework. Its primary function is to construct a SQL query that accurately retrieves the required information from the PDK database to address the user’s question. The generator employs the query decomposition approach proposed in (Pourreza and Rafiei, 2024; Wang et al., 2023), which breaks down the user question into smaller, manageable sub-questions. For each sub-question, the generator produces a corresponding sub-query. These sub-queries are then combined to form the final SQL query. This step-wise approach enhances accuracy and allows for handling complex user questions. The generator function is described in Eq. 4. It takes as input the user question $\mathcal{Q}$, the schema description of the selected tables $\mathcal{T}''$, the relevant standard cell library $\mathcal{L}'$, and the operating conditions $\mathcal{C}'$.

(4) $\mathcal{SQL} = f_{\text{generator}}(\mathcal{Q}, \mathcal{T}'', \mathcal{L}', \mathcal{C}' \mid \mathcal{M})$
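To make the idea of combining sub-queries concrete, the sketch below nests one sub-query inside another on a toy SQLite database. The table and column names follow the generated SQL shown in Figs. 7 and 8; the library name `lib_a`, all row values, and the helper `build_toy_db` are made up for illustration, and the strings are written by hand here rather than produced by an LLM.

```python
import sqlite3


def build_toy_db():
    """Tiny in-memory database; every row is a made-up illustration value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Operating_Conditions (
            Condition_ID INTEGER PRIMARY KEY,
            Temperature REAL, Voltage REAL, Cell_Variant TEXT);
        CREATE TABLE Cells (
            Cell_ID INTEGER PRIMARY KEY,
            Name TEXT, Cell_Leakage_Power REAL, Condition_ID INTEGER);
        INSERT INTO Operating_Conditions VALUES
            (1, 25.0, 1.8, 'lib_a'), (2, 100.0, 1.6, 'lib_a');
        INSERT INTO Cells VALUES
            (1, 'lib_a__inv_1', 0.002, 1),
            (2, 'lib_a__buf_1', 0.005, 1),
            (3, 'lib_a__inv_1', 0.040, 2);
    """)
    return conn


# Sub-question 1: which operating condition matches the requested corner?
SUB_QUERY_CORNER = (
    "SELECT Condition_ID FROM Operating_Conditions "
    "WHERE Temperature = 25.0 AND Voltage = 1.8 AND Cell_Variant = 'lib_a'"
)

# Sub-question 2 (final): which cell has the lowest leakage at that corner?
# The first sub-query is embedded as a nested SELECT.
FINAL_QUERY = (
    "SELECT Name, Cell_Leakage_Power FROM Cells "
    f"WHERE Condition_ID IN ({SUB_QUERY_CORNER}) "
    "ORDER BY Cell_Leakage_Power ASC LIMIT 1"
)
```

Each sub-query can be executed and inspected on its own, which is one way the human verification mentioned in the introduction could be carried out.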

Interpreter: The interpreter’s main role is to translate the raw database results into a coherent natural language response that directly addresses the user’s question. This agent processes the user question $\mathcal{Q}$ and the result $\mathcal{R}$ obtained from executing the generated SQL query on the PDK database $\mathcal{DB}$ (Eq. 5), and then formulates an output answer $\mathcal{O}$ in natural language, as formalized in Eq. 6.

(5) $\mathcal{R} = f_{\text{execute}}(\mathcal{SQL}, \mathcal{DB})$
(6) $\mathcal{O} = f_{\text{interpreter}}(\mathcal{Q}, \mathcal{R} \mid \mathcal{M})$
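The way Eqs. 1 to 6 chain together can be sketched as plain function composition. In the sketch below, each agent is passed in as a callable so that no prompt format or LLM API is assumed; `Route`, `run_pipeline`, and `demo` are hypothetical names, the stub agents in `demo` are hard-coded lambdas standing in for LLM calls, and the two rows reuse the `hd` and `lp` widths reported in Fig. 6.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """Output of the dispatcher (Eq. 1): library L', view V', corner C'."""
    library: str
    view: str
    corner: Optional[str]


def run_pipeline(question, conn, tables_by_view,
                 dispatcher, selector, generator, interpreter):
    route = dispatcher(question)                 # Eq. 1
    routed = tables_by_view[route.view]          # Eq. 2
    selected = selector(question, routed)        # Eq. 3
    sql = generator(question, selected, route)   # Eq. 4
    rows = conn.execute(sql).fetchall()          # Eq. 5
    return interpreter(question, rows)           # Eq. 6


def demo():
    """Run the pipeline with stub agents on a two-row toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Macros (Name TEXT, SizeWidth REAL, Cell_Variant TEXT)"
    )
    conn.executemany("INSERT INTO Macros VALUES (?, ?, ?)", [
        ("sky130_fd_sc_hd__mux4_1", 9.66, "sky130_fd_sc_hd"),
        ("sky130_fd_sc_lp__mux4_1", 10.08, "sky130_fd_sc_lp"),
    ])
    return run_pipeline(
        "Compare the width for the mux4_1 cell across all libraries.",
        conn,
        {"lef": ["Macros", "Pins"]},
        # Stub agents: fixed outputs in place of LLM calls.
        dispatcher=lambda q: Route(library="all", view="lef", corner=None),
        selector=lambda q, tables: [t for t in tables if t == "Macros"],
        generator=lambda q, tables, route: (
            "SELECT Cell_Variant, SizeWidth FROM Macros "
            "WHERE Name LIKE '%mux4_1' ORDER BY Cell_Variant"
        ),
        interpreter=lambda q, rows: "; ".join(f"{lib}: {w}" for lib, w in rows),
    )
```

Calling `demo()` returns the string `sky130_fd_sc_hd: 9.66; sky130_fd_sc_lp: 10.08`. Passing agents as callables also makes it straightforward to swap the base model per agent, as is done in the evaluation.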

4. Experimental Results

We conduct all experiments using OpenAI’s GPT models, accessed through their API: gpt-3.5-turbo-0125 for GPT-3.5, gpt-4-turbo-2024-04-09 for GPT-4, and gpt-4o-2024-05-13 for GPT-4o. For the Process Design Kit (PDK), we use the open-source SkyWater 130nm PDK (sky, [n. d.]). This PDK encompasses 6 Standard Cell Libraries (SCLs). First, we converted the PDK files to a database using our proposed schema. The resulting SkyWater database comprises 20 tables, 39,576 cell entries, and 4,986,160 entries for cell timing information at different process corners, with a total size of 7.2 GB. This large-scale database provides a robust platform for evaluating our framework.

4.1. Evaluation Set

To evaluate HDLCopilot, we created an evaluation set of 52 user questions with corresponding SQL queries. This set encompasses a diverse range of complexities, from simple single-table selections to complex multi-table joins with sub-queries and multiple conditions. As shown in Table 1, the set incorporates various SQL clauses, aggregation functions, and sub-queries, providing a comprehensive test of the framework’s SQL handling capabilities.

4.2. Evaluation Metrics

Table 1. Evaluation set statistics, showing the number of queries and clause occurrences in the set.
Statistic Total TechLef LEF Liberty
Total Queries 52 13 16 23
Clause Occurrences:
JOIN 7 - - 7
ORDER BY 13 2 3 8
WHERE 113 17 28 68
GROUP BY 2 - - 2
Aggregation Functions 48 6 1 41
Sub-queries 62 3 12 47
Average Length (chars) 222 130 137 333
Maximum Length (chars) 1075 202 258 1075

Following prior text-to-SQL work (Wang et al., 2023; Li et al., 2024; Zhong et al., 2020), we use the Execution Accuracy (EX) and Valid Efficiency Score (VES) to evaluate the performance of our proposed framework.

The Execution Accuracy (EX) quantifies the framework’s ability to generate SQL queries that produce correct results. It measures the proportion of questions in the evaluation set where the execution results match those of the ground truth queries. It is formally defined in Eq. 7, where $N$ is the number of questions in the evaluation set, $V_i$ is the set returned by the ground truth SQL query, and $\hat{V}_i$ is the set returned by the predicted SQL query. $\mathbb{1}(\cdot)$ is an indicator function that equals $1$ if the ground truth set and the predicted set are equal and $0$ otherwise.

(7) $EX = \frac{\sum_{i=1}^{N} \mathbb{1}(V_i, \hat{V}_i)}{N}, \quad \mathbb{1}(V_i, \hat{V}_i) = \begin{cases} 1, & \text{if } V_i = \hat{V}_i \\ 0, & \text{if } V_i \neq \hat{V}_i \end{cases}$
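Eq. 7 translates directly into a few lines of code. The sketch below assumes each query result is available as a list of row tuples, as a typical database driver would return; the function name `execution_accuracy` is chosen for this example.

```python
def execution_accuracy(gold_results, predicted_results):
    """Eq. 7: fraction of questions whose predicted result set equals the
    ground truth result set. Each result is a list of row tuples."""
    if len(gold_results) != len(predicted_results):
        raise ValueError("both lists must cover the same questions")
    if not gold_results:
        raise ValueError("evaluation set is empty")
    # Compare as sets, so row order does not affect the match.
    matches = sum(
        1 for gold, pred in zip(gold_results, predicted_results)
        if set(gold) == set(pred)
    )
    return matches / len(gold_results)
```

For a sense of scale, 49 matching results out of 52 questions would give an EX of 94.23%.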

The Valid Efficiency Score (VES) evaluates the correctly generated SQL queries by comparing their execution time against that of the ground truth SQL queries. VES is formally defined in Eq. 8, where $Y_i$ and $\hat{Y}_i$ denote the ground truth and predicted SQL queries, $R(\cdot)$ is the relative efficiency of the predicted SQL with respect to the ground truth SQL, and $E(\cdot)$ is the execution time of a SQL query in the database. The VES metric provides insights into both the correctness and the computational efficiency of the generated SQL queries.

(8) $VES = \frac{\sum_{i=1}^{N} \mathbb{1}(V_i, \hat{V}_i) \cdot R(Y_i, \hat{Y}_i)}{N}, \quad R(Y_i, \hat{Y}_i) = \sqrt{\frac{E(Y_i)}{E(\hat{Y}_i)}}$
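Eq. 8 can be sketched in the same style. The function below assumes the correctness flags and the measured execution times have already been collected for each question; how the times are measured (number of repetitions, caching, and so on) is not covered here, and the name `valid_efficiency_score` is chosen for this example.

```python
import math


def valid_efficiency_score(correct, gold_times, predicted_times):
    """Eq. 8: mean over all N questions of indicator * sqrt(E(Y) / E(Y_hat)).

    correct         -- booleans, True where the predicted result set matched
    gold_times      -- execution times of the ground truth queries
    predicted_times -- execution times of the predicted queries
    """
    n = len(correct)
    if n == 0 or len(gold_times) != n or len(predicted_times) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ok, t_gold, t_pred in zip(correct, gold_times, predicted_times):
        if not ok:
            continue  # incorrect queries contribute zero
        if t_gold <= 0 or t_pred <= 0:
            raise ValueError("execution times must be positive")
        total += math.sqrt(t_gold / t_pred)
    return total / n
```

Because the ratio exceeds 1 whenever a correct predicted query runs faster than its ground truth counterpart, the score is not capped at 100%.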

4.3. Main Results

First, we present three qualitative examples that showcase HDLCopilot’s capability in generating complex SQL queries, retrieving relevant information, and providing precise answers to user questions. Fig. 6 shows a user question that asks for a comparison of the width of the 4-input MUX across all libraries, which the framework answers correctly. This analysis can give designers immediate insight into how a specific cell’s footprint varies among library options, helping them determine which library is most suitable for their design requirements. Fig. 7 presents a cross-library comparison of leakage power in flip-flop cells. This analysis is useful for designers focusing on low-power applications, allowing them to quickly identify the most power-efficient cell for their specific needs. Fig. 8 showcases the framework’s ability to handle more sophisticated queries. In this example, the framework generates and executes a complex SQL query to compare the propagation delay of a 2-input MUX cell between two specific libraries. These three examples highlight the system’s ability to perform a diverse set of analyses that would be time-consuming if done manually.

User Question
Compare the width for the mux4_1 cell across all libraries.
Generated SQL
SELECT SizeWidth, CellVariant FROM Macros WHERE Name LIKE 'sky130_fd_sc_%__mux4_1'
HDLCopilot Answer
The width for the mux4_1 cell across different libraries is as follows:
- sky130_fd_sc_ls: 9.6   - sky130_fd_sc_hs: 9.6   - sky130_fd_sc_lp: 10.08
- sky130_fd_sc_ms: 9.6  - sky130_fd_sc_hd: 9.66
        
Figure 6. Cross-library comparison of cell width
User Question
What is the lowest leakage power flip flop across the high density, high density low leakage, and low power libraries?
Generated SQL
SELECT Name, MIN(Cell_Leakage_Power) AS Lowest_Leakage_Power
FROM Cells
WHERE Condition_ID IN (
    SELECT Condition_ID FROM Operating_Conditions
    WHERE Temperature = 25.0 AND Voltage = 1.8
      AND Cell_Variant IN ('sky130_fd_sc_hd', 'sky130_fd_sc_hdll', 'sky130_fd_sc_lp')
  )
  AND Name LIKE '%df%';
HDLCopilot Answer
The lowest leakage power flip flop across the high density,
high density low leakage, and low power libraries
is the flip flop cell named ’sky130_fd_sc_hdll__dfstp_1’
with a leakage power of 0.004486882.
Figure 7. Cross-library comparison of flip-flop cells power.
User Question
Compare the fall propagation delay and of the mux2_1 cell between the high-density and high-density-low-leakage libraries. Assume an output load of 0.0005 and input rise time of 0.01. Output the fall propagation delay from the related input pin S to the output pin for both libraries.
Generated SQL
SELECT
  (SELECT Value FROM Timing_Table_Values
   WHERE Timing_Table_ID = (
       SELECT Timing_Table_ID FROM Timing_Tables
       WHERE Output_Pin_ID = (
           SELECT Output_Pin_ID FROM Output_Pins
           WHERE Cell_ID = (
               SELECT Cell_ID FROM Cells
               WHERE Name = 'sky130_fd_sc_hd__mux2_1'
                 AND Condition_ID = (
                     SELECT Condition_ID FROM Operating_Conditions
                     WHERE Temperature = 25.0 AND Voltage = 1.8
                       AND Cell_Variant = 'sky130_fd_sc_hd')))
         AND Timing_Type = 'cell_fall' AND Related_Input_Pin = 'S')
     AND Index_1 = 0.01 AND Index_2 = 0.0005) AS HD_Fall_Propagation_Delay,
  (SELECT Value FROM Timing_Table_Values
   WHERE Timing_Table_ID = (
       SELECT Timing_Table_ID FROM Timing_Tables
       WHERE Output_Pin_ID = (
           SELECT Output_Pin_ID FROM Output_Pins
           WHERE Cell_ID = (
               SELECT Cell_ID FROM Cells
               WHERE Name = 'sky130_fd_sc_hdll__mux2_1'
                 AND Condition_ID = (
                     SELECT Condition_ID FROM Operating_Conditions
                     WHERE Temperature = 25.0 AND Voltage = 1.8
                       AND Cell_Variant = 'sky130_fd_sc_hdll')))
         AND Timing_Type = 'cell_fall' AND Related_Input_Pin = 'S')
     AND Index_1 = 0.01 AND Index_2 = 0.0005) AS HDLL_Fall_Propagation_Delay;
HDLCopilot Answer
The fall propagation delay of the mux2_1 cell between the
high density (HD) and high density low leakage (HDLL)
libraries with an output load of 0.0005 and input rise
time of 0.01 are as follows:

- High Density (HD) Library: 0.2685503
- High Density Low Leakage (HDLL): 0.2979633

Therefore, the fall propagation delay from the related input
pin S to the output pin for the HD library is 0.2685503 and
for the HDLL library is 0.2979633.
    
Figure 8. Cross library comparison of cell propagation delay.

We also present a quantitative assessment of the system’s performance on the 52 examples in our evaluation set. We first evaluate the accuracy of the dispatcher and the selector independently to determine the most reliable setup for these agents. We conduct evaluations both with and without few-shot examples. Table 2 shows that few-shot demonstrations generally improve the accuracy of both the GPT-3.5 and GPT-4 models. GPT-4o achieves the highest overall accuracy even without few-shot examples, showcasing its ability to comprehend and execute the task without additional context. The best setup is achieved with the GPT-4 and GPT-4o models using few-shot examples, with an overall dispatch accuracy of 99.35% across all three routing tasks and a table selection accuracy of 98.07%.

Table 2. Dispatcher and Selector agent accuracy on the evaluation set using different base GPT models.
Model     Setting        Dispatcher* SCL   Dispatcher* View   Dispatcher* Corner   Selector
GPT-3.5   w/ few-shot    98.07%            96.15%             95.65%               69.23%
GPT-3.5   w/o few-shot   78.85%            96.15%             61.04%               36.53%
GPT-4     w/ few-shot    98.07%            100.0%             100.0%               98.07%
GPT-4     w/o few-shot   94.23%            100.0%             97.83%               69.23%
GPT-4o    w/ few-shot    98.07%            100.0%             100.0%               98.07%
GPT-4o    w/o few-shot   98.07%            100.0%             100.0%               84.61%
  • * Dispatcher w/ few-shot configuration uses 6-shot examples for SCL dispatching and 3-shot examples for view and corner dispatching.

  • Selector w/ few-shot configuration uses 3-shot examples.

Table 3. Execution Accuracy (EX) and Valid Efficiency Score (VES), in percent, of the entire framework on the evaluation set.
Model     TechLEF EX   TechLEF VES   LEF EX   LEF VES   Liberty EX   Liberty VES   Overall EX   Overall VES
GPT-3.5   92.30        89.78         75.00    90.41     65.21        83.65         75.00        87.95
GPT-4     92.30        93.15         100.00   88.94     86.95        88.61         92.30        90.08
GPT-4o    92.30        93.62         100.00   98.31     91.30        101.57        94.23        98.58

Table 3 shows the Execution Accuracy (EX) and Valid Efficiency Score (VES) of the entire framework. GPT-4o demonstrates superior performance, achieving the highest overall EX of 94.23% and VES of 98.58%. The results indicate that liberty queries are harder than the LEF and technology LEF queries, mainly because liberty files contain more attributes and more diverse data types. Nonetheless, GPT-4o achieves a high execution accuracy of 91.30% on the liberty set, and its correct queries run on average faster than the hand-crafted ground truth SQL queries, as shown by its VES of 101.57%.

5. Conclusion

In this paper, we introduced HDLCopilot, an LLM-powered multi-agent collaborative framework designed to streamline interactions with Process Design Kits (PDKs) in natural language. To facilitate integration with LLM agents, the PDK information is first converted into a relational database, which the HDLCopilot agents query by generating SQL to retrieve relevant information. HDLCopilot also has the potential to integrate with other hardware design copilots to give LLMs PDK awareness.
