A new version of Montgomery’s algorithm for modular multiplication of large integers and its implementation in hardware are presented. It has been designed to meet the predominant requirements of most modern devices: small chip area and low power consumption. The algorithm is superior to the original method by a factor of 2 with respect to both area and latency. The new method has a simple structure. It requires a small amount of precomputation and storage in order to reduce the number of necessary additions by a factor of 2.
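
To make the claim about additions concrete, the following minimal Python sketch contrasts the classic bit-serial (radix-2) Montgomery multiplication with a variant that precomputes the sum Y + M once. It is an illustration under stated assumptions, not the method of this paper: the abstract does not say what is precomputed, so the choice of Y + M, the function names `montgomery_classic` and `montgomery_precomputed`, and the addition counters are all hypothetical. Both functions assume an odd modulus M < 2^n, with X < 2^n and Y < M, and return X * Y * 2^(-n) mod M.

```python
def montgomery_classic(x, y, m, n):
    """Bit-serial Montgomery product x*y*2^(-n) mod m (m odd, x < 2^n, y < m).

    Returns (result, number of additions performed in the loop).
    """
    s = 0
    additions = 0
    for i in range(n):
        if (x >> i) & 1:      # add the multiplicand if bit i of x is set
            s += y
            additions += 1
        if s & 1:             # make s even so it can be halved exactly
            s += m
            additions += 1
        s >>= 1
    if s >= m:                # s < 2m holds here, so one subtraction suffices
        s -= m
    return s, additions


def montgomery_precomputed(x, y, m, n):
    """Same product, with at most one addition per iteration.

    ASSUMPTION: y + m is precomputed and stored; this is an illustrative
    choice, not confirmed by the source.
    """
    ym = y + m                # precomputed once, before the loop
    s = 0
    additions = 0
    for i in range(n):
        xi = (x >> i) & 1
        # Parity of s + xi*y decides whether m must be added as well.
        q = (s & 1) ^ (xi & y & 1)
        if xi and q:
            s += ym
            additions += 1
        elif xi:
            s += y
            additions += 1
        elif q:
            s += m
            additions += 1
        s >>= 1
    if s >= m:
        s -= m
    return s, additions


if __name__ == "__main__":
    x, y, m, n = 123, 45, 239, 8
    print(montgomery_classic(x, y, m, n))
    print(montgomery_precomputed(x, y, m, n))
```

In the classic loop an iteration can require two additions, whereas the variant selects one of 0, Y, M, or Y + M and performs at most one, at the cost of storing a single extra operand. Whether this matches the precomputation used in the presented method is an assumption.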