WireMap: FPGA Technology Mapping for Improved Routability
Stephen Jang, Xilinx Inc.
Billy Chan, Xilinx Inc.
Kevin Chung, Xilinx Inc.
Alan Mishchenko, UC Berkeley
Outline
• Motivation and algorithm overview
• Review of area recovery
• Algorithm details
• Results and summary
Motivation
• Generic: Cut-based mapping algorithms do well at minimizing logic level and area (LUT count)
  • Could we change cut-based mapping to improve the netlist for packing, placement, and routing?
• Specific 1: Fewer pin-to-pin connections should make the design easier to place and route
  • Could we come up with a mapping algorithm that minimizes the total number of connections in a design?
• Specific 2: Newer FPGAs allow two outputs per LUT
  • Could we produce a mapping that packs better into these dual-output LUTs?
Area Recovery Overview
• Perform delay-optimal mapping first
• Not all paths are critical
• Perform area recovery on non-critical paths
  • Consider all nodes with positive slack
  • For each node, look for a different cut that reduces area
• Area recovery heuristics
  • Area-flow (global view): chooses cuts with better logic sharing
  • Exact local area (local view): minimizes the number of LUTs by looking at one node at a time
• Both are important
Edge Recovery Overview
• Find a simple-to-compute metric to minimize edge count and create smaller LUTs
• Definition: Edge = pin-to-pin connection between LUTs
• Cut-based area recovery algorithms can be extended to minimize edges!
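The edge definition above can be made concrete with a minimal sketch. The netlist representation (a dictionary from LUT name to the signals on its input pins) and the function name are hypothetical, and counting pins driven by primary inputs as edges is an assumption, not something the slides state.

```python
# Hypothetical mapped netlist: LUT name -> signals on its input pins.
# Signals that are not keys (a, b, c, d) are primary inputs.
mapped_netlist = {
    "p": ["a", "b"],
    "x": ["p", "c", "d"],
}

def count_edges(netlist):
    """Count pin-to-pin connections: one edge per used LUT input pin.

    Assumption: pins driven by primary inputs are counted as edges too.
    """
    return sum(len(fanins) for fanins in netlist.values())

total_edges = count_edges(mapped_netlist)
print(total_edges)
```

Under this counting, a mapping built from smaller LUTs has fewer edges per LUT, which is why the same metric also favors smaller LUTs.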
Edge Flow Cost Functions
• Edge flow phase
  • Use edge flow to minimize global edge count
• Exact local edge phase
  • Exactly minimize edge count within MFFCs
WireMap Algorithm
• Input: And-Inverter Graph
• Compute K-feasible cuts for each node
• Compute best arrival time at each node
  • In topological order (from PI to PO)
  • Compute the depth of all cuts and choose the best one
• Perform area and edge recovery
  • Using area flow and edge flow
  • Using exact local area and exact local edge
• Choose the best cover
• Output: Mapped netlist
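The first two steps (cut enumeration and best arrival time) can be sketched as follows. This is a minimal illustration under a unit-delay LUT model, not the ABC implementation: the dictionary-based AIG, the node names, and both function names are assumptions, and dominated-cut filtering and cut-count limits are omitted.

```python
from itertools import product

def enumerate_cuts(aig, topo_order, k):
    """Return all K-feasible cuts per node.

    aig maps each AND node to its two fanins; nodes absent from aig
    are primary inputs. topo_order lists nodes from PIs to POs.
    """
    cuts = {}
    for node in topo_order:
        trivial = frozenset([node])
        if node not in aig:
            cuts[node] = {trivial}
            continue
        left, right = aig[node]
        # Merge every pair of fanin cuts and keep those with at most k leaves.
        merged = {a | b for a, b in product(cuts[left], cuts[right])
                  if len(a | b) <= k}
        merged.add(trivial)
        cuts[node] = merged
    return cuts

def best_arrival(aig, topo_order, cuts):
    """Unit-delay arrival time: best depth over all non-trivial cuts."""
    arrival = {}
    for node in topo_order:
        if node not in aig:
            arrival[node] = 0
            continue
        arrival[node] = min(
            1 + max(arrival[leaf] for leaf in cut)
            for cut in cuts[node]
            if cut != frozenset([node])
        )
    return arrival

# Hypothetical 3-node AIG: f = AND(p, q), p = AND(a, b), q = AND(c, d).
aig = {"p": ("a", "b"), "q": ("c", "d"), "f": ("p", "q")}
order = ["a", "b", "c", "d", "p", "q", "f"]
cuts = enumerate_cuts(aig, order, k=3)
arrival = best_arrival(aig, order, cuts)
```

With K = 3 the four-input cut {a, b, c, d} is rejected, so node f is left with the cuts {abq}, {pcd}, {pq}, and its trivial cut.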
Algorithm – Edge Flow
• Do delay-optimal mapping
• Compute slack at each node
• Do area recovery with area-flow
  • Visit nodes in topological order from PI to PO
  • Choose cuts that do not exceed the slack budget and have the smallest area-flow
  • If two cuts have the same area-flow, choose the cut with the lower edge-flow
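The selection rule above amounts to a lexicographic comparison over the cuts that fit the slack budget. The sketch below assumes the per-cut costs are already computed; the `CutCost` record, the `pick_cut` name, and all numbers are hypothetical and only illustrate the tie-break.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CutCost:
    leaves: frozenset
    arrival: float    # arrival time at the node if this cut is used
    area_flow: float
    edge_flow: float

def pick_cut(candidates, required_time):
    """Pick the feasible cut with the smallest area-flow.

    Ties in area-flow are broken by the lower edge-flow. Returns None
    if no cut meets the required time. Exact float equality is used for
    ties here; a real mapper may compare with a tolerance.
    """
    feasible = [c for c in candidates if c.arrival <= required_time]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c.area_flow, c.edge_flow))

# Illustrative numbers only.
candidates = [
    CutCost(frozenset("abq"), arrival=2, area_flow=1.5, edge_flow=4.0),
    CutCost(frozenset("pcd"), arrival=2, area_flow=1.5, edge_flow=3.5),
    CutCost(frozenset("pq"), arrival=3, area_flow=1.0, edge_flow=2.0),
]
tight = pick_cut(candidates, required_time=2)
relaxed = pick_cut(candidates, required_time=3)
```

With a required time of 2, the cut {pq} is excluded and the area-flow tie between {abq} and {pcd} is resolved by edge-flow. With a required time of 3, {pq} wins on area-flow alone.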
Algorithm – Exact Local Edges
• Runs after the optimization with area flow + edge flow described on the previous slide
• Do edge recovery with exact edges
  • Visit nodes in topological order from PI to PO
  • Among all cuts within the slack budget, choose the cut with the smallest area, and break ties in favor of the cut with fewer edges
• Note: Unlike edge-flow, no estimation is involved
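One common way to obtain exact local costs is reference counting: temporarily reference a candidate cut, count the LUTs (or edges) that become used only because of it, then undo the change. The sketch below follows that idea under stated assumptions. The data structures, names, and the small example are hypothetical, `refs` is assumed to hold fanout counts with the current node's own cut already dereferenced, and the slack check is omitted.

```python
def ref_cut(leaves, best_cut, refs, count_edges):
    """Reference a cut; return the LUTs (or edges) it newly brings in."""
    cost = len(leaves) if count_edges else 1
    for leaf in leaves:
        if leaf in best_cut:          # internal node, not a primary input
            if refs[leaf] == 0:       # leaf becomes used only through this cut
                cost += ref_cut(best_cut[leaf], best_cut, refs, count_edges)
            refs[leaf] += 1
    return cost

def deref_cut(leaves, best_cut, refs):
    """Undo ref_cut so the reference counts are restored."""
    for leaf in leaves:
        if leaf in best_cut:
            refs[leaf] -= 1
            if refs[leaf] == 0:
                deref_cut(best_cut[leaf], best_cut, refs)

def exact_costs(leaves, best_cut, refs):
    """Return (exact local area, exact local edges) of a candidate cut."""
    area = ref_cut(leaves, best_cut, refs, count_edges=False)
    deref_cut(leaves, best_cut, refs)
    edges = ref_cut(leaves, best_cut, refs, count_edges=True)
    deref_cut(leaves, best_cut, refs)
    return area, edges

# Hypothetical state: p, s, t are used elsewhere; q is not used yet.
best_cut = {"p": ("s", "t"), "q": ("c", "d"), "s": ("a", "b"), "t": ("b", "c")}
refs = {"p": 1, "q": 0, "s": 1, "t": 1}

cut_a = ("p", "e", "g")   # e and g are primary inputs
cut_b = ("s", "t", "q")
costs = {cut: exact_costs(cut, best_cut, refs) for cut in (cut_a, cut_b)}
chosen = min(costs, key=lambda cut: costs[cut])  # area first, then edges
```

Here `cut_a` costs one LUT and three edges because p is already in use, while `cut_b` also pulls in q and its two inputs, giving two LUTs and five edges.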
Experimental Setup
• Implemented WireMap in ABC
• Compared WireMap against two algorithms in ABC
  • Baseline: basic mapping with area recovery
  • Mapping with Structure Choices (MSC): mapping with area recovery for several netlists produced by synthesis
  • WireMap was implemented on top of MSC
• Used VPR to place and route the designs and to measure wirelength and critical path delay
  • Single-LUT cluster, single-length wire segment model
• Used SIS to pack single-output LUTs into dual-output LUTs using maximum cardinality matching
Results Summary
• MSC is superior to baseline mapping
  • Single-output LUT count reduced by 9.1%
  • Edge count reduced by 8.1%
  • Dual-output LUT count reduced by 7.7%
• WireMap leads to a further reduction in edges by 9.3% and in dual-output LUT count by 9.4% versus MSC
  • Single-output LUT count reduced by only 1.3% relative to MSC
• WireMap's reduction of edges and dual-output LUTs is not directly related to single-output LUT reduction
Comparison of Area Recovery and Area/Edge Recovery Flow
[Table: Mapping (K = 6); column groups: Area recovery, Area recovery, Area/Edge recovery]
• WireMap leads to a further reduction in edges by 9.3%
• WireMap leads to a dual-output LUT count reduction by 9.4%
Wirelength, Channel Width, and Critical Path Delay Comparison
• Wirelength was reduced by 8.5% vs. MSC
• Minimum channel width reduced by 6%
• Critical path delay reduced by 2.3%
[Table: column groups: Area recovery, Area recovery, Area/Edge recovery]
twl = total wire length, mcw = minimum channel width required to route in VPR, cpd = critical path delay with min channel width across the three implementations
WireMap Results – LUT Packing
LUT Distribution: MSC vs. WireMap (% of LUTs)

          LT2      LT3      LT4      LT5      LT6
MSC       4.71%    8.00%    15.87%   23.49%   47.93%
WireMap   10.12%   12.66%   17.89%   20.19%   39.14%

• WireMap increases the share of LT2, LT3, and LT4 and reduces the share of LT5 and LT6
• The histogram shows how the single-output LUT size distribution is affected, leading to a 9.4% reduction in dual-output LUT6s
Summary
• Presented cut-based structural mapping with minimization of the number of edges
• Extended area recovery to perform edge recovery
  • Area flow
  • Edge flow
  • Exact local area
  • Exact local edges
• Experimental results
  • Reduced the total number of pin-to-pin connections
  • Improved QoR after place-and-route
  • Improved packing by increasing the ratio of smaller LUTs
Backup Material
Technology Mapping
• Delay-optimal mapping
  • Delay-optimal mapping for all nodes
• Area recovery
  • Global area recovery
  • Local (exact) area recovery
[Figure: two mappings of node f with cut size K = 3. Cut {pqr} of node f has arrival time 3. Cut {stu} of node f has arrival time 2.]
Appendix - How to Measure Area?
• Suppose we use the naïve definition: Area(cut) = 1 + [Σ area(fanin)]
[Figure: cut {pcd} and cut {abq} on a network with outputs x and y, nodes p, q, r, and inputs a to f]
• Area of cut {pcd} = 1 + [1 + 0 + 0] = 2
• Area of cut {abq} = 1 + [0 + 0 + 1] = 2
• The naïve definition says both cuts are equally good in area
• The naïve definition ignores sharing due to multiple fanouts
Appendix - Area-flow
• area-flow(cut) = 1 + [Σ (area-flow(fanin) / fanout_num(fanin))]
[Figure: cut {pcd} and cut {abq} on a network with outputs x and y, nodes p, q, r, and inputs a to f]
• Area-flow of cut {pcd} = 1 + [1 + 0 + 0] = 2
• Area-flow of cut {abq} = 1 + [0/1 + 0/1 + ½] = 1.5
• Area-flow recognizes that cut {abq} is better
• Area-flow “correctly” accounts for sharing
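The area-flow formula can be checked against the two cuts on this slide with a few lines of Python. The per-leaf values are taken from the slide's arithmetic (p and q each have area-flow 1, q has two fanouts, primary inputs have area-flow 0); the function and dictionary names are illustrative.

```python
def area_flow(leaves, leaf_area_flow, fanout):
    """area-flow(cut) = 1 + sum(area-flow(fanin) / fanout_num(fanin))."""
    return 1 + sum(leaf_area_flow[leaf] / fanout[leaf] for leaf in leaves)

# Values read from the slide: inputs cost nothing, p and q cost one LUT each,
# and q is shared by two fanouts.
leaf_area_flow = {"a": 0, "b": 0, "c": 0, "d": 0, "p": 1, "q": 1}
fanout = {"a": 1, "b": 1, "c": 1, "d": 1, "p": 1, "q": 2}

af_pcd = area_flow("pcd", leaf_area_flow, fanout)
af_abq = area_flow("abq", leaf_area_flow, fanout)
print(af_pcd, af_abq)
```

Dividing by the fanout count spreads the cost of q over both of its users, which is what makes cut {abq} cheaper than cut {pcd} under this metric.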
Appendix - Exact Local Area
• Exact-local-area(cut) = 1 + [Σ exact-local-area(fanin with no other fanout)]
[Figure: node f with candidate cuts {pef} and {stq} over nodes p, q, s, t and inputs a to f]
• Cut {pef}
  • Area flow = 1 + [(.25 + .25 + 3)/2] = 2.75
  • Exact area = 1 + 0 (p is used elsewhere)
  • Exact area will choose this cut
• Cut {stq}
  • Area flow = 1 + [.25 + .25 + 1] = 2.5
  • Exact area = 1 + 1 = 2 (due to q)
  • Area flow will choose this cut
Example
• Exact-local-area(cut) = 1 + [Σ exact-local-area(fanin with no other fanout)]
[Figure: node f with candidate cuts {pef} and {stq} over nodes p, q, s, t and inputs a to f]
• Cut {pef}
  • Area flow = 1 + [(.25 + .25 + 3)/2] = 2.75
  • Edge flow = 3 + [1 + 1 + (2.5 + 2.5 + 2)/2] = 8.5
  • Exact area = 1 + 0 (p is used elsewhere)
  • Exact edge = 3 + 0 (p is not in the MFFC)
• Cut {stq}
  • Area flow = 1 + [.25 + .25 + 1] = 2.5
  • Edge flow = 3 + [2 + 4(0.25)] = 6
  • Exact area = 1 + 1 = 2 (due to q)
  • Exact edge = 3 + 2 = 5 (q is in the MFFC)
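The flow numbers in this example can be reproduced with one helper. The slides do not spell out the edge-flow formula, so the sketch assumes it mirrors area-flow with the cut's own leaf count replacing the constant 1: edge-flow(cut) = (number of leaves) + Σ edge-flow(fanin) / fanout_num(fanin). The per-leaf (flow, fanout) pairs are read off the slide's arithmetic, and all names are illustrative.

```python
def flow(base_cost, leaf_terms):
    """base_cost plus each leaf's flow divided by its fanout count."""
    return base_cost + sum(value / fanout for value, fanout in leaf_terms)

# Cut {pef}: p has two fanouts; terms are (flow of leaf, fanout of leaf).
pef_area = flow(1, [(0.25 + 0.25 + 3, 2)])
pef_edge = flow(3, [(1, 1), (1, 1), (2.5 + 2.5 + 2, 2)])

# Cut {stq}: s and t have four fanouts each, q has one.
stq_area = flow(1, [(1, 4), (1, 4), (1, 1)])
stq_edge = flow(3, [(2, 4), (2, 4), (2, 1)])

print(pef_area, pef_edge)  # 2.75 8.5
print(stq_area, stq_edge)  # 2.5 6.0
```

Both flow metrics prefer cut {stq}, while the exact metrics on the slide prefer cut {pef} (1 LUT and 3 edges versus 2 LUTs and 5 edges), which illustrates why the flow-based phase and the exact phase are run one after the other.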
Appendix - Tuning Mapping for Placement
• Placement-aware priority cost function
  • The total number of edges in a mapped network
• Advantages
  • Correlates with the total wire-length after placement
  • Easy to take into account during area recovery
• Treat “edges” as “area”, resulting in
  • Edge flow (similar to area flow)
  • Exact local edges (similar to exact local area)
• WireMap
  • New placement-aware mapping algorithm
Edge Recovery Overview
• Key: Find a simple-to-compute cut metric that minimizes edge count and creates more small LUTs
• Edge flow phase: Use the edge flow cost function to minimize the global edge count
• Exact edge phase: Use an optimal algorithm to minimize the edge count within MFFCs
Appendix – Additional VPR Results
• VPR results for a 4-LUT cluster (resembles a commercial FPGA SLICE structure)