This paper presents BlendNet, a neural network architecture built on a novel building block called the Blend module, which performs binary convolutions in its main path and fixed-point convolutions in its skip path. Batch normalization layers are judiciously placed on both the main and skip paths inside each Blend module and between consecutive Blend modules. This paper also presents a compiler that maps BlendNet models, obtained by replacing some blocks/modules of existing vision neural network models with Blend modules, to FPGA devices, with the goal of minimizing end-to-end inference latency while achieving high output accuracy. BlendNet-20, derived from ResNet-20 and trained on the CIFAR-10 dataset, achieves 88.0% classification accuracy (0.8% higher than the state-of-the-art binary neural network) while taking only 0.38 ms to process each image (1.4x faster than the state of the art). Similarly, our BlendMixer model trained on the CIFAR-10 dataset achieves 90.6% accuracy (1.59% lower than the full-precision MLPMixer) with a 3.5x reduction in model size. Moreover, the reconfigurability of DSP blocks for performing 48-bit bitwise logic operations is exploited to achieve a low-power FPGA implementation. Our measurements show that the proposed implementation yields 2.5x lower power consumption.
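
To illustrate why 48-bit bitwise logic is useful for binary convolutions, the following is a minimal software sketch of the standard XNOR-popcount identity for dot products of +1/-1 vectors, with operands packed into 48-bit words. It assumes that the binary path uses this common formulation; the abstract does not state the exact operation mapped to the DSP blocks, and the names `WORD_BITS`, `pack_bits`, and `binary_dot` are hypothetical, introduced only for this example.

```python
WORD_BITS = 48  # assumed word width, chosen to mirror the 48-bit logic operations


def pack_bits(values):
    """Pack a list of +1/-1 values into 48-bit integer words (+1 -> 1, -1 -> 0)."""
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for offset, v in enumerate(values[start:start + WORD_BITS]):
            if v == 1:
                word |= 1 << offset
        words.append(word)
    return words


def binary_dot(a_words, w_words, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    matches = 0
    remaining = n
    for a, w in zip(a_words, w_words):
        valid = min(remaining, WORD_BITS)   # number of meaningful bits in this word
        mask = (1 << valid) - 1             # ignore padding bits in the last word
        xnor = ~(a ^ w) & mask              # bit is 1 where the two signs agree
        matches += bin(xnor).count("1")     # popcount
        remaining -= valid
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    w = [1, 1, -1, 1]
    print(binary_dot(pack_bits(a), pack_bits(w), len(a)))
```

Each loop iteration handles one 48-bit word with a single XOR, inversion, and popcount, which is the kind of wide bitwise work that replaces per-element multiply-accumulate operations in a binary convolution.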