- shlAddMod (with assembly division) is already 4x slower than Montgomery Multiplication based. - constant-time division will be even slower - use montgomery-multiplication based conversion