* Clean up code, part 1
* Get the best borrow code for the non-inlined subtraction #10
* Implement in-place subtraction in terms of subtraction #10
* Another unneeded proc removal/temporary step
* More cleanup
* Upgrade benchmark to Uint256
* Special case when divisor is less than halfSize: 2x speed 🔥 (still 4x slower than ttmath on Uint256)
* Division: special case if dividend can overflow. 10% improvement.
* Forgot to undo normalization (why did the test pass??)
* 1st part, special cases of fast division
* Change bitops, simplify bithacks to detect new fast division cases
* 25% speed increase. Within 3x of ttmath
* Reimplement multiplication with minimum allocation
* Fix call. Now only 2x slower than ttmath
* Prepare for optimizing comparison operators
* Comparison inlining and optimization. 25% speed increase. 50% slower than ttmath now 🔥
* Fix comparison, optimize one()
* Inline initMpUintImpl for another 20% speed. Only 20% slower than ttmath without ASM