* Add MULX/ADOX/ADCX assembly for squaring 4 limbs * Add squarings for 6 limbs * Use the new square assembly where relevant * Fix 32-bit register name and calling convention * typo * Disable MontRed ASM for 2 limbs or less