Experimental Results of AMF-Placer 1.0

Apart from the fundamental mechanisms to support macro placement, some optional optimization techniques are evaluated:

Tech1: SA-based initial placement
Tech2: interconnection-density-aware pseudo net weight
Tech3: utilization-guided search of spreading window
Tech4: forgetting-rate-based cell spreading update
Tech5: progressive macro legalization

Existing open-source analytical FPGA placers do not support mixed-size placement of the aforementioned macros on UltraScale devices. For a comprehensive comparison, we therefore implemented a baseline placer that follows state-of-the-art solutions and has the following features:

quadratic placement, cell spreading, and clock region planning algorithms from RippleFPGA [1], and clock-aware initial clustering from [2]
resource demand adjustment and packing algorithms from extended UTPlaceF [3]
SA initial placement
necessary modifications to support macro placement, e.g., macro legalization/packing, but without Tech2 to Tech5 listed above
parallelized implementation

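The comparison data are shown below. HPWL and time(s) are raw values, while Rhpwl and Rtime are normalized ratios: judging from the values, Rhpwl is the HPWL relative to the proposed flow, and Rtime is the runtime relative to the fastest configuration for that benchmark. We will keep improving our implementation.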

	faceDetect				halfsqueezenet				optimsoc				minimap_GENE
	HPWL	Rhpwl	time(s)	Rtime	HPWL	Rhpwl	time(s)	Rtime	HPWL	Rhpwl	time(s)	Rtime	HPWL	Rhpwl	time(s)	Rtime
proposed	446908	1.000	105	1.054	339764	1.000	129	1.000	1771538	1.000	466	1.000	1275346	1.000	558	1.000
w/o tech1	479221	1.072	109	1.088	360567	1.061	132	1.023	9034564	5.100	871	1.870	5132073	4.024	1265	2.266
w/o tech2	487060	1.090	124	1.238	431386	1.270	240	1.857	1825819	1.031	543	1.164	1307249	1.025	626	1.121
w/o tech3	465208	1.041	131	1.315	491385	1.446	517	4.007	1841786	1.040	577	1.239	1790566	1.404	1019	1.826
w/o tech4	450199	1.007	100	1.000	351649	1.035	131	1.015	2658863	1.501	525	1.126	1291565	1.013	561	1.005
w/o tech5	511514	1.145	134	1.344	443765	1.306	138	1.067	1880431	1.061	546	1.172	1430330	1.122	708	1.268
baseline	794201	1.777	234	2.338	1865746	5.491	329	2.549	2231978	1.260	559	1.199	11886747	9.320	3539	6.339

	OpenPiton				digitRecognition				MemN2N				BLSTM_midDensity
	HPWL	Rhpwl	time(s)	Rtime	HPWL	Rhpwl	time(s)	Rtime	HPWL	Rhpwl	time(s)	Rtime	HPWL	Rhpwl	time(s)	Rtime
proposed	1139189	1.000	283	1.035	726929	1.000	254	1.097	801403	1.000	318	1.130	484650	1.000	326	1.241
w/o tech1	6218009	5.458	350	1.280	768796	1.058	256	1.105	963416	1.202	363	1.287	492194	1.016	285	1.083
w/o tech2	1157479	1.016	296	1.079	773957	1.065	272	1.174	862212	1.076	343	1.216	511861	1.056	296	1.126
w/o tech3	1159003	1.017	326	1.190	730071	1.004	321	1.389	916176	1.143	411	1.460	496540	1.025	364	1.383
w/o tech4	1212149	1.064	274	1.000	728094	1.002	231	1.000	822742	1.027	282	1.000	653010	1.347	307	1.169
w/o tech5	1142927	1.003	301	1.098	997207	1.372	244	1.053	923670	1.153	353	1.252	722495	1.491	263	1.000
baseline	1432535	1.258	297	1.086	1047885	1.442	298	1.288	1610425	2.010	547	1.942	907383	1.872	325	1.237
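The normalized columns can be reproduced from the raw ones. The following is a minimal sketch, assuming that Rhpwl divides by the HPWL of the proposed flow and that Rtime divides by the smallest runtime among the configurations; this rule is inferred from the table values, not taken from the placer's scripts. The function name `normalize` is hypothetical, and the input numbers are copied from the faceDetect columns.

```python
# Raw faceDetect results copied from the table above:
# configuration -> (HPWL, runtime in seconds)
face_detect = {
    "proposed": (446908, 105),
    "w/o tech1": (479221, 109),
    "w/o tech2": (487060, 124),
    "w/o tech3": (465208, 131),
    "w/o tech4": (450199, 100),
    "w/o tech5": (511514, 134),
    "baseline": (794201, 234),
}


def normalize(results, reference="proposed"):
    """Return {configuration: (Rhpwl, Rtime)} for one benchmark.

    Assumption (inferred from the table, not documented by the authors):
    Rhpwl is relative to the reference configuration, and Rtime is
    relative to the fastest configuration.
    """
    ref_hpwl = results[reference][0]
    best_time = min(seconds for _, seconds in results.values())
    return {
        name: (round(hpwl / ref_hpwl, 3), round(seconds / best_time, 3))
        for name, (hpwl, seconds) in results.items()
    }


ratios = normalize(face_detect)
for name, (r_hpwl, r_time) in ratios.items():
    print(f"{name:10s} Rhpwl={r_hpwl:.3f} Rtime={r_time:.3f}")
```

Because the runtimes in the table are rounded to whole seconds, the recomputed Rtime can differ from the published value in the third decimal place.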

The dominant algorithm of each stage in the proposed placement flow can be parallelized. The table below shows the speedup in placement runtime, relative to a single thread, as the number of threads changes.

#threads	Rosetta FaceDetect	SpooNN	OptimSoC	MiniMap2	OpenPiton	Rosetta DigitRecog	MemN2N	BLSTM
8 threads	2.17x	2.07x	2.50x	2.86x	2.63x	2.22x	2.61x	2.23x
4 threads	2.01x	1.96x	2.29x	2.57x	2.37x	2.04x	2.35x	1.97x
2 threads	1.56x	1.52x	1.64x	1.81x	1.65x	1.54x	1.68x	1.77x
1 thread	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x
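To put the speedups above in context, the sketch below derives the parallel efficiency (speedup divided by thread count) and the parallel fraction implied by Amdahl's law. Amdahl's law is only a rough model that we assume here for illustration; it is not part of the original evaluation. The helper names are hypothetical, and the numbers are copied from two columns of the table.

```python
# Speedups copied from the table above: benchmark -> {threads: speedup}
speedups = {
    "Rosetta FaceDetect": {1: 1.00, 2: 1.56, 4: 2.01, 8: 2.17},
    "MiniMap2": {1: 1.00, 2: 1.81, 4: 2.57, 8: 2.86},
}


def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling that is actually achieved."""
    return speedup / threads


def amdahl_parallel_fraction(speedup, threads):
    """Parallel fraction p implied by Amdahl's law S = 1 / ((1 - p) + p / n).

    Assumption: the runtime splits cleanly into a serial part and a
    perfectly parallel part, which is only an approximation.
    """
    if threads < 2:
        raise ValueError("need at least 2 threads to estimate the fraction")
    return (1 - 1 / speedup) / (1 - 1 / threads)


for name, by_threads in speedups.items():
    s8 = by_threads[8]
    eff = parallel_efficiency(s8, 8)
    frac = amdahl_parallel_fraction(s8, 8)
    print(f"{name}: efficiency at 8 threads = {eff:.2f}, "
          f"implied parallel fraction = {frac:.2f}")
```

Under this model, the flattening between 4 and 8 threads suggests that a substantial share of the runtime remains serial or memory-bound, although the table alone cannot distinguish between these causes.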

Below is a comparison of AMF-Placer placements (upper) and Vivado placements (lower): yellow for CARRY macros, red for MUX macros, green for BRAM macros, purple for DSP macros, and blue for LUTRAM macros. The device view is rotated 90 degrees to the left.

The placement congestion level of AMF-Placer is similar to that of Vivado. Below is a figure for the most congested benchmarks (MiniMap2 with PCIE banks and OpenPiton with DDR interface):

The Runtime Log of ICCAD-2021 Benchmarks

Because we keep developing AMF-Placer to meet more expectations from reviewers and users (especially for timing optimization, multi-SLR/die optimization, and more applications), we provide the runtime log files of the old version of the placer that generated the results in the ICCAD-2021 paper. We suggest opening the log files with VSCode, since it appears to highlight the different parts of the log files, which makes them easier to review.

The paper considered only wirelength improvement, whereas AMF-Placer now also considers timing and clock region impact during global placement, so the resulting placements are more practical (e.g., the delays of critical paths are reduced by 20-40%). We are happy to pave the way for people who target practical improvements, so please feel free to let us know your expectations or demands in a GitHub issue.

References in this page:

[1] C.-W. Pui, G. Chen, W.-K. Chow, K.-C. Lam, J. Kuang, P. Tu, H. Zhang, E. F. Young, and B. Yu, “RippleFPGA: A routability-driven placement for large-scale heterogeneous FPGAs,” in 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2016, pp. 1–8.

[2] J. Chen, Z. Lin, Y.-C. Kuo, C.-C. Huang, Y.-W. Chang, S.-C. Chen, C.-H. Chiang, and S.-Y. Kuo, “Clock-aware placement for large-scale heterogeneous FPGAs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 12, pp. 5042–5055, 2020.

[3] W. Li, S. Dhar, and D. Z. Pan, “UTPlaceF: A routability-driven FPGA placer with physical and congestion aware packing,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 4, pp. 869–882, 2017.