In High Performance Computing, communication among processes is a typical bottleneck for massively parallel scientific applications. The objective of this research is to develop a network interface card with dedicated offloading capabilities that can help large-scale simulations by reducing communication latency and improving scalability with the number of computing elements. In particular, this work deals with the development of a double-precision floating-point complex arithmetic unit with a parallel-pipelined architecture, in order to implement a massively parallel computing system tailored to the three-dimensional Fast Fourier Transform.