basic functionality seems to work

Balazs Komuves 2026-05-03 20:29:09 +02:00
parent a72ce0e474
commit cd6b61fab8
GPG Key ID: F63B7AEF18435562
25 changed files with 15428 additions and 0 deletions

7
.gitignore vendored Normal file

@@ -0,0 +1,7 @@
.DS_Store
dist
dist-newstyle
tmp
*.a
*.o
*.hi

29
LICENSE Normal file

@@ -0,0 +1,29 @@
BSD 3-Clause License
Copyright (c) 2017 Christopher A. Taylor, and 2026 Logos
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

54
README.md Normal file

@@ -0,0 +1,54 @@
Leopard fast erasure coding library
-----------------------------------
This is a Haskell binding to the ["Leopard" erasure coding library](https://github.com/catid/leopard)
by Christopher A. Taylor.
### What's this about?
Erasure coding allows you to reconstruct redundantly encoded data even if some
pieces are missing. For example, if you encode a piece of data with 10-out-of-15
encoding (usually denoted by `K=10` and `N=15`), then the data is encoded into 15
pieces, and any 10 of them (together with their indices in 1..15) can reconstruct
the original data.
This is very useful, for example, when dealing with unreliable networks.
Leopard uses a [Reed-Solomon code](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)
over the binary fields `GF(2^8)` or `GF(2^16)`, together with low-level optimizations,
to achieve high performance.
Reed-Solomon codes also guarantee that any `K` out of the `N` pieces suffice to recover
the data, where those `K` pieces together have exactly the size of the original data
(however, you also need to know the index of each available piece among the `N`).
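
To make the idea concrete, here is a toy sketch in C++ of the simplest possible erasure code: a single XOR parity piece, that is, `N = K + 1`. This is not what Leopard does (Reed-Solomon codes handle any number `M` of parity pieces), and the names `encodeWithParity` and `recoverMissing` are made up for this illustration. Pieces are indexed from 0 here rather than from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Piece = std::vector<std::uint8_t>;

// Toy K-out-of-(K+1) encoder: appends one parity piece, the XOR of all data pieces.
// Hypothetical helper for illustration only; not part of Leopard or this binding.
std::vector<Piece> encodeWithParity(const std::vector<Piece>& data) {
    assert(!data.empty());
    Piece parity(data[0].size(), 0);
    for (const Piece& p : data) {
        assert(p.size() == parity.size());  // all pieces must have equal size
        for (std::size_t i = 0; i < p.size(); ++i) parity[i] ^= p[i];
    }
    std::vector<Piece> encoded = data;  // systematic: the first K pieces are the data
    encoded.push_back(parity);
    return encoded;
}

// Rebuilds the piece at index `missing` by XORing all the other pieces together.
// The content of encoded[missing] is never read, as if that piece had been lost.
Piece recoverMissing(const std::vector<Piece>& encoded, std::size_t missing) {
    assert(encoded.size() >= 2 && missing < encoded.size());
    const std::size_t len = encoded[missing == 0 ? 1 : 0].size();
    Piece out(len, 0);
    for (std::size_t j = 0; j < encoded.size(); ++j) {
        if (j == missing) continue;
        for (std::size_t i = 0; i < len; ++i) out[i] ^= encoded[j][i];
    }
    return out;
}
```

With three data pieces, `encodeWithParity` returns four pieces, and `recoverMissing(encoded, i)` rebuilds piece `i` from the other three, for any single `i`. Losing two pieces at once is already too much for this toy code, which is where Reed-Solomon codes come in.
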
### Standard notations
The encoding algorithm is called the "code". The original data is chunked into
`K >= 1` pieces. This is then encoded into `N > K` redundant pieces. The ratio
`rho = K / N < 1` is called the "rate" of the code. The expansion factor `1 / rho = N / K`
is the redundancy overhead. Leopard only supports `1/2 <= rho < 1`, that is,
the encoded data is at most twice the size of the original data.
Leopard uses a so-called "systematic code", which means that the first `K` pieces
are simply the original data. The notation `M = N - K` for the number of the remaining,
"parity" pieces is also standard.
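
The quantities above are simple arithmetic; the following C++ sketch just spells them out. `CodeParams` and `describeCode` are hypothetical names used only for this example, not part of Leopard or of this binding.

```cpp
#include <cassert>

// Derived quantities of a K-out-of-N code, in the notation of this README.
// Hypothetical type, for illustration only.
struct CodeParams {
    unsigned K;         // number of original pieces
    unsigned N;         // total number of encoded pieces
    unsigned M;         // number of parity pieces, M = N - K
    double   rate;      // rho = K / N
    double   overhead;  // expansion factor 1 / rho = N / K
};

CodeParams describeCode(unsigned K, unsigned N) {
    assert(K >= 1 && N > K);
    return CodeParams{K, N, N - K,
                      static_cast<double>(K) / N,
                      static_cast<double>(N) / K};
}
```

For the 10-out-of-15 example, `describeCode(10, 15)` gives `M = 5`, a rate of `2/3` and an expansion factor of `1.5`.
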
Internally, Leopard encodes `K` 8- or 16-bit words ("symbols") into `N` words. By
partitioning the original dataset into sets of `K` bytes (or 16-bit words), we can
trivially recover the above semantics.
### Limitations
Leopard itself has some limitations on the parameters:
- `K >= 2`
- `M <= K`
- `N = K + M <= 65536`
- the chunk size (in bytes) must be divisible by 64.
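
A caller may want to check these constraints before handing data to the library. The sketch below is one way to do that in C++; `ParamError`, `checkParams` and `roundUpChunkSize` are hypothetical helpers, not functions exported by Leopard or by this binding. It additionally assumes `M >= 1` (from `N > K` above) and a non-zero chunk size.

```cpp
#include <cstdint>

// Hypothetical error codes, for illustration only.
enum class ParamError {
    Ok, TooFewOriginal, NoParity, TooManyParity, TooManyPieces, BadChunkSize
};

// Checks the parameter limits listed above.
// K = number of original pieces, M = number of parity pieces.
ParamError checkParams(std::uint64_t K, std::uint64_t M, std::uint64_t chunkBytes) {
    if (K < 2)  return ParamError::TooFewOriginal;
    if (M < 1)  return ParamError::NoParity;        // assumption: N > K
    if (M > K)  return ParamError::TooManyParity;
    // M <= K holds here, so K + M cannot overflow once K <= 65536 is known
    if (K > 65536 || K + M > 65536) return ParamError::TooManyPieces;
    if (chunkBytes == 0 || chunkBytes % 64 != 0)    // zero size rejected by assumption
        return ParamError::BadChunkSize;
    return ParamError::Ok;
}

// Rounds a chunk size up to the next multiple of 64 bytes,
// e.g. to decide how much zero padding a chunk needs.
std::uint64_t roundUpChunkSize(std::uint64_t bytes) {
    return (bytes + 63) / 64 * 64;
}
```

For instance, `checkParams(10, 5, 64)` passes, while a chunk of 100 bytes is rejected until it is padded to `roundUpChunkSize(100)`, which is 128 bytes.
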
### Compatibility
I do not have much experience with linking C++ and Haskell. This was tested only
on a single ARM-based computer running macOS.

29
cpp/LICENSE Normal file

@@ -0,0 +1,29 @@
BSD 3-Clause License
Copyright (c) 2017, Christopher A. Taylor
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

472
cpp/LeopardCommon.cpp Normal file

@@ -0,0 +1,472 @@
/*
Copyright (c) 2017 Christopher A. Taylor. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Leopard-RS nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
#include "LeopardCommon.h"
#include <thread>
namespace leopard {
//------------------------------------------------------------------------------
// Runtime CPU Architecture Check
//
// Feature checks stolen shamelessly from
// https://github.com/jedisct1/libsodium/blob/master/src/libsodium/sodium/runtime.c
#if defined(HAVE_ANDROID_GETCPUFEATURES)
#include <cpu-features.h>
#endif
#if defined(LEO_TRY_NEON)
# if defined(IOS) && defined(__ARM_NEON__)
// Requires iPhone 5S or newer
# else
// Remember to add LOCAL_STATIC_LIBRARIES := cpufeatures
bool CpuHasNeon = false; // V6 / V7
bool CpuHasNeon64 = false; // 64-bit
# endif
#endif
#if !defined(LEO_TARGET_MOBILE)
#ifdef _MSC_VER
#include <intrin.h> // __cpuid
#pragma warning(disable: 4752) // found Intel(R) Advanced Vector Extensions; consider using /arch:AVX
#endif
#ifdef LEO_TRY_AVX2
bool CpuHasAVX2 = false;
#endif
bool CpuHasSSSE3 = false;
#define CPUID_EBX_AVX2 0x00000020
#define CPUID_ECX_SSSE3 0x00000200
static void _cpuid(unsigned int cpu_info[4U], const unsigned int cpu_info_type)
{
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_AMD64) || defined(_M_IX86))
__cpuid((int *) cpu_info, cpu_info_type);
#else //if defined(HAVE_CPUID)
cpu_info[0] = cpu_info[1] = cpu_info[2] = cpu_info[3] = 0;
# ifdef __i386__
__asm__ __volatile__ ("pushfl; pushfl; "
"popl %0; "
"movl %0, %1; xorl %2, %0; "
"pushl %0; "
"popfl; pushfl; popl %0; popfl" :
"=&r" (cpu_info[0]), "=&r" (cpu_info[1]) :
"i" (0x200000));
if (((cpu_info[0] ^ cpu_info[1]) & 0x200000) == 0) {
return; /* LCOV_EXCL_LINE */
}
# endif
# ifdef __i386__
__asm__ __volatile__ ("xchgl %%ebx, %k1; cpuid; xchgl %%ebx, %k1" :
"=a" (cpu_info[0]), "=&r" (cpu_info[1]),
"=c" (cpu_info[2]), "=d" (cpu_info[3]) :
"0" (cpu_info_type), "2" (0U));
# elif defined(__x86_64__)
__asm__ __volatile__ ("xchgq %%rbx, %q1; cpuid; xchgq %%rbx, %q1" :
"=a" (cpu_info[0]), "=&r" (cpu_info[1]),
"=c" (cpu_info[2]), "=d" (cpu_info[3]) :
"0" (cpu_info_type), "2" (0U));
# else
__asm__ __volatile__ ("cpuid" :
"=a" (cpu_info[0]), "=b" (cpu_info[1]),
"=c" (cpu_info[2]), "=d" (cpu_info[3]) :
"0" (cpu_info_type), "2" (0U));
# endif
#endif
}
#elif defined(LEO_USE_SSE2NEON)
bool CpuHasSSSE3 = true;
#endif // defined(LEO_TARGET_MOBILE)
void InitializeCPUArch()
{
#if defined(LEO_TRY_NEON) && defined(HAVE_ANDROID_GETCPUFEATURES)
AndroidCpuFamily family = android_getCpuFamily();
if (family == ANDROID_CPU_FAMILY_ARM)
{
if (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON)
CpuHasNeon = true;
}
else if (family == ANDROID_CPU_FAMILY_ARM64)
{
CpuHasNeon = true;
if (android_getCpuFeatures() & ANDROID_CPU_ARM64_FEATURE_ASIMD)
CpuHasNeon64 = true;
}
#endif
#if !defined(LEO_TARGET_MOBILE)
unsigned int cpu_info[4];
_cpuid(cpu_info, 1);
CpuHasSSSE3 = ((cpu_info[2] & CPUID_ECX_SSSE3) != 0);
#if defined(LEO_TRY_AVX2)
_cpuid(cpu_info, 7);
CpuHasAVX2 = ((cpu_info[1] & CPUID_EBX_AVX2) != 0);
#endif // LEO_TRY_AVX2
#ifndef LEO_USE_SSSE3_OPT
CpuHasSSSE3 = false;
#endif // LEO_USE_SSSE3_OPT
#ifndef LEO_USE_AVX2_OPT
CpuHasAVX2 = false;
#endif // LEO_USE_AVX2_OPT
#endif // LEO_TARGET_MOBILE
}
//------------------------------------------------------------------------------
// XOR Memory
void xor_mem(
void * LEO_RESTRICT vx, const void * LEO_RESTRICT vy,
uint64_t bytes)
{
#if defined(LEO_TRY_AVX2)
if (CpuHasAVX2)
{
LEO_M256 * LEO_RESTRICT x32 = reinterpret_cast<LEO_M256 *>(vx);
const LEO_M256 * LEO_RESTRICT y32 = reinterpret_cast<const LEO_M256 *>(vy);
while (bytes >= 128)
{
const LEO_M256 x0 = _mm256_xor_si256(_mm256_loadu_si256(x32), _mm256_loadu_si256(y32));
const LEO_M256 x1 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 1), _mm256_loadu_si256(y32 + 1));
const LEO_M256 x2 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 2), _mm256_loadu_si256(y32 + 2));
const LEO_M256 x3 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 3), _mm256_loadu_si256(y32 + 3));
_mm256_storeu_si256(x32, x0);
_mm256_storeu_si256(x32 + 1, x1);
_mm256_storeu_si256(x32 + 2, x2);
_mm256_storeu_si256(x32 + 3, x3);
x32 += 4, y32 += 4;
bytes -= 128;
};
if (bytes > 0)
{
const LEO_M256 x0 = _mm256_xor_si256(_mm256_loadu_si256(x32), _mm256_loadu_si256(y32));
const LEO_M256 x1 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 1), _mm256_loadu_si256(y32 + 1));
_mm256_storeu_si256(x32, x0);
_mm256_storeu_si256(x32 + 1, x1);
}
return;
}
#endif // LEO_TRY_AVX2
LEO_M128 * LEO_RESTRICT x16 = reinterpret_cast<LEO_M128 *>(vx);
const LEO_M128 * LEO_RESTRICT y16 = reinterpret_cast<const LEO_M128 *>(vy);
do
{
const LEO_M128 x0 = _mm_xor_si128(_mm_loadu_si128(x16), _mm_loadu_si128(y16));
const LEO_M128 x1 = _mm_xor_si128(_mm_loadu_si128(x16 + 1), _mm_loadu_si128(y16 + 1));
const LEO_M128 x2 = _mm_xor_si128(_mm_loadu_si128(x16 + 2), _mm_loadu_si128(y16 + 2));
const LEO_M128 x3 = _mm_xor_si128(_mm_loadu_si128(x16 + 3), _mm_loadu_si128(y16 + 3));
_mm_storeu_si128(x16, x0);
_mm_storeu_si128(x16 + 1, x1);
_mm_storeu_si128(x16 + 2, x2);
_mm_storeu_si128(x16 + 3, x3);
x16 += 4, y16 += 4;
bytes -= 64;
} while (bytes > 0);
}
#ifdef LEO_M1_OPT
void xor_mem_2to1(
void * LEO_RESTRICT x,
const void * LEO_RESTRICT y,
const void * LEO_RESTRICT z,
uint64_t bytes)
{
#if defined(LEO_TRY_AVX2)
if (CpuHasAVX2)
{
LEO_M256 * LEO_RESTRICT x32 = reinterpret_cast<LEO_M256 *>(x);
const LEO_M256 * LEO_RESTRICT y32 = reinterpret_cast<const LEO_M256 *>(y);
const LEO_M256 * LEO_RESTRICT z32 = reinterpret_cast<const LEO_M256 *>(z);
while (bytes >= 128)
{
LEO_M256 x0 = _mm256_xor_si256(_mm256_loadu_si256(x32), _mm256_loadu_si256(y32));
x0 = _mm256_xor_si256(x0, _mm256_loadu_si256(z32));
LEO_M256 x1 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 1), _mm256_loadu_si256(y32 + 1));
x1 = _mm256_xor_si256(x1, _mm256_loadu_si256(z32 + 1));
LEO_M256 x2 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 2), _mm256_loadu_si256(y32 + 2));
x2 = _mm256_xor_si256(x2, _mm256_loadu_si256(z32 + 2));
LEO_M256 x3 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 3), _mm256_loadu_si256(y32 + 3));
x3 = _mm256_xor_si256(x3, _mm256_loadu_si256(z32 + 3));
_mm256_storeu_si256(x32, x0);
_mm256_storeu_si256(x32 + 1, x1);
_mm256_storeu_si256(x32 + 2, x2);
_mm256_storeu_si256(x32 + 3, x3);
x32 += 4, y32 += 4, z32 += 4;
bytes -= 128;
};
if (bytes > 0)
{
LEO_M256 x0 = _mm256_xor_si256(_mm256_loadu_si256(x32), _mm256_loadu_si256(y32));
x0 = _mm256_xor_si256(x0, _mm256_loadu_si256(z32));
LEO_M256 x1 = _mm256_xor_si256(_mm256_loadu_si256(x32 + 1), _mm256_loadu_si256(y32 + 1));
x1 = _mm256_xor_si256(x1, _mm256_loadu_si256(z32 + 1));
_mm256_storeu_si256(x32, x0);
_mm256_storeu_si256(x32 + 1, x1);
}
return;
}
#endif // LEO_TRY_AVX2
LEO_M128 * LEO_RESTRICT x16 = reinterpret_cast<LEO_M128 *>(x);
const LEO_M128 * LEO_RESTRICT y16 = reinterpret_cast<const LEO_M128 *>(y);
const LEO_M128 * LEO_RESTRICT z16 = reinterpret_cast<const LEO_M128 *>(z);
do
{
LEO_M128 x0 = _mm_xor_si128(_mm_loadu_si128(x16), _mm_loadu_si128(y16));
x0 = _mm_xor_si128(x0, _mm_loadu_si128(z16));
LEO_M128 x1 = _mm_xor_si128(_mm_loadu_si128(x16 + 1), _mm_loadu_si128(y16 + 1));
x1 = _mm_xor_si128(x1, _mm_loadu_si128(z16 + 1));
LEO_M128 x2 = _mm_xor_si128(_mm_loadu_si128(x16 + 2), _mm_loadu_si128(y16 + 2));
x2 = _mm_xor_si128(x2, _mm_loadu_si128(z16 + 2));
LEO_M128 x3 = _mm_xor_si128(_mm_loadu_si128(x16 + 3), _mm_loadu_si128(y16 + 3));
x3 = _mm_xor_si128(x3, _mm_loadu_si128(z16 + 3));
_mm_storeu_si128(x16, x0);
_mm_storeu_si128(x16 + 1, x1);
_mm_storeu_si128(x16 + 2, x2);
_mm_storeu_si128(x16 + 3, x3);
x16 += 4, y16 += 4, z16 += 4;
bytes -= 64;
} while (bytes > 0);
}
#endif // LEO_M1_OPT
#ifdef LEO_USE_VECTOR4_OPT
void xor_mem4(
void * LEO_RESTRICT vx_0, const void * LEO_RESTRICT vy_0,
void * LEO_RESTRICT vx_1, const void * LEO_RESTRICT vy_1,
void * LEO_RESTRICT vx_2, const void * LEO_RESTRICT vy_2,
void * LEO_RESTRICT vx_3, const void * LEO_RESTRICT vy_3,
uint64_t bytes)
{
#if defined(LEO_TRY_AVX2)
if (CpuHasAVX2)
{
LEO_M256 * LEO_RESTRICT x32_0 = reinterpret_cast<LEO_M256 *> (vx_0);
const LEO_M256 * LEO_RESTRICT y32_0 = reinterpret_cast<const LEO_M256 *>(vy_0);
LEO_M256 * LEO_RESTRICT x32_1 = reinterpret_cast<LEO_M256 *> (vx_1);
const LEO_M256 * LEO_RESTRICT y32_1 = reinterpret_cast<const LEO_M256 *>(vy_1);
LEO_M256 * LEO_RESTRICT x32_2 = reinterpret_cast<LEO_M256 *> (vx_2);
const LEO_M256 * LEO_RESTRICT y32_2 = reinterpret_cast<const LEO_M256 *>(vy_2);
LEO_M256 * LEO_RESTRICT x32_3 = reinterpret_cast<LEO_M256 *> (vx_3);
const LEO_M256 * LEO_RESTRICT y32_3 = reinterpret_cast<const LEO_M256 *>(vy_3);
while (bytes >= 128)
{
const LEO_M256 x0_0 = _mm256_xor_si256(_mm256_loadu_si256(x32_0), _mm256_loadu_si256(y32_0));
const LEO_M256 x1_0 = _mm256_xor_si256(_mm256_loadu_si256(x32_0 + 1), _mm256_loadu_si256(y32_0 + 1));
const LEO_M256 x2_0 = _mm256_xor_si256(_mm256_loadu_si256(x32_0 + 2), _mm256_loadu_si256(y32_0 + 2));
const LEO_M256 x3_0 = _mm256_xor_si256(_mm256_loadu_si256(x32_0 + 3), _mm256_loadu_si256(y32_0 + 3));
_mm256_storeu_si256(x32_0, x0_0);
_mm256_storeu_si256(x32_0 + 1, x1_0);
_mm256_storeu_si256(x32_0 + 2, x2_0);
_mm256_storeu_si256(x32_0 + 3, x3_0);
x32_0 += 4, y32_0 += 4;
const LEO_M256 x0_1 = _mm256_xor_si256(_mm256_loadu_si256(x32_1), _mm256_loadu_si256(y32_1));
const LEO_M256 x1_1 = _mm256_xor_si256(_mm256_loadu_si256(x32_1 + 1), _mm256_loadu_si256(y32_1 + 1));
const LEO_M256 x2_1 = _mm256_xor_si256(_mm256_loadu_si256(x32_1 + 2), _mm256_loadu_si256(y32_1 + 2));
const LEO_M256 x3_1 = _mm256_xor_si256(_mm256_loadu_si256(x32_1 + 3), _mm256_loadu_si256(y32_1 + 3));
_mm256_storeu_si256(x32_1, x0_1);
_mm256_storeu_si256(x32_1 + 1, x1_1);
_mm256_storeu_si256(x32_1 + 2, x2_1);
_mm256_storeu_si256(x32_1 + 3, x3_1);
x32_1 += 4, y32_1 += 4;
const LEO_M256 x0_2 = _mm256_xor_si256(_mm256_loadu_si256(x32_2), _mm256_loadu_si256(y32_2));
const LEO_M256 x1_2 = _mm256_xor_si256(_mm256_loadu_si256(x32_2 + 1), _mm256_loadu_si256(y32_2 + 1));
const LEO_M256 x2_2 = _mm256_xor_si256(_mm256_loadu_si256(x32_2 + 2), _mm256_loadu_si256(y32_2 + 2));
const LEO_M256 x3_2 = _mm256_xor_si256(_mm256_loadu_si256(x32_2 + 3), _mm256_loadu_si256(y32_2 + 3));
_mm256_storeu_si256(x32_2, x0_2);
_mm256_storeu_si256(x32_2 + 1, x1_2);
_mm256_storeu_si256(x32_2 + 2, x2_2);
_mm256_storeu_si256(x32_2 + 3, x3_2);
x32_2 += 4, y32_2 += 4;
const LEO_M256 x0_3 = _mm256_xor_si256(_mm256_loadu_si256(x32_3), _mm256_loadu_si256(y32_3));
const LEO_M256 x1_3 = _mm256_xor_si256(_mm256_loadu_si256(x32_3 + 1), _mm256_loadu_si256(y32_3 + 1));
const LEO_M256 x2_3 = _mm256_xor_si256(_mm256_loadu_si256(x32_3 + 2), _mm256_loadu_si256(y32_3 + 2));
const LEO_M256 x3_3 = _mm256_xor_si256(_mm256_loadu_si256(x32_3 + 3), _mm256_loadu_si256(y32_3 + 3));
_mm256_storeu_si256(x32_3, x0_3);
_mm256_storeu_si256(x32_3 + 1, x1_3);
_mm256_storeu_si256(x32_3 + 2, x2_3);
_mm256_storeu_si256(x32_3 + 3, x3_3);
x32_3 += 4, y32_3 += 4;
bytes -= 128;
}
if (bytes > 0)
{
const LEO_M256 x0_0 = _mm256_xor_si256(_mm256_loadu_si256(x32_0), _mm256_loadu_si256(y32_0));
const LEO_M256 x1_0 = _mm256_xor_si256(_mm256_loadu_si256(x32_0 + 1), _mm256_loadu_si256(y32_0 + 1));
const LEO_M256 x0_1 = _mm256_xor_si256(_mm256_loadu_si256(x32_1), _mm256_loadu_si256(y32_1));
const LEO_M256 x1_1 = _mm256_xor_si256(_mm256_loadu_si256(x32_1 + 1), _mm256_loadu_si256(y32_1 + 1));
_mm256_storeu_si256(x32_0, x0_0);
_mm256_storeu_si256(x32_0 + 1, x1_0);
_mm256_storeu_si256(x32_1, x0_1);
_mm256_storeu_si256(x32_1 + 1, x1_1);
const LEO_M256 x0_2 = _mm256_xor_si256(_mm256_loadu_si256(x32_2), _mm256_loadu_si256(y32_2));
const LEO_M256 x1_2 = _mm256_xor_si256(_mm256_loadu_si256(x32_2 + 1), _mm256_loadu_si256(y32_2 + 1));
const LEO_M256 x0_3 = _mm256_xor_si256(_mm256_loadu_si256(x32_3), _mm256_loadu_si256(y32_3));
const LEO_M256 x1_3 = _mm256_xor_si256(_mm256_loadu_si256(x32_3 + 1), _mm256_loadu_si256(y32_3 + 1));
_mm256_storeu_si256(x32_2, x0_2);
_mm256_storeu_si256(x32_2 + 1, x1_2);
_mm256_storeu_si256(x32_3, x0_3);
_mm256_storeu_si256(x32_3 + 1, x1_3);
}
return;
}
#endif // LEO_TRY_AVX2
LEO_M128 * LEO_RESTRICT x16_0 = reinterpret_cast<LEO_M128 *> (vx_0);
const LEO_M128 * LEO_RESTRICT y16_0 = reinterpret_cast<const LEO_M128 *>(vy_0);
LEO_M128 * LEO_RESTRICT x16_1 = reinterpret_cast<LEO_M128 *> (vx_1);
const LEO_M128 * LEO_RESTRICT y16_1 = reinterpret_cast<const LEO_M128 *>(vy_1);
LEO_M128 * LEO_RESTRICT x16_2 = reinterpret_cast<LEO_M128 *> (vx_2);
const LEO_M128 * LEO_RESTRICT y16_2 = reinterpret_cast<const LEO_M128 *>(vy_2);
LEO_M128 * LEO_RESTRICT x16_3 = reinterpret_cast<LEO_M128 *> (vx_3);
const LEO_M128 * LEO_RESTRICT y16_3 = reinterpret_cast<const LEO_M128 *>(vy_3);
do
{
const LEO_M128 x0_0 = _mm_xor_si128(_mm_loadu_si128(x16_0), _mm_loadu_si128(y16_0));
const LEO_M128 x1_0 = _mm_xor_si128(_mm_loadu_si128(x16_0 + 1), _mm_loadu_si128(y16_0 + 1));
const LEO_M128 x2_0 = _mm_xor_si128(_mm_loadu_si128(x16_0 + 2), _mm_loadu_si128(y16_0 + 2));
const LEO_M128 x3_0 = _mm_xor_si128(_mm_loadu_si128(x16_0 + 3), _mm_loadu_si128(y16_0 + 3));
_mm_storeu_si128(x16_0, x0_0);
_mm_storeu_si128(x16_0 + 1, x1_0);
_mm_storeu_si128(x16_0 + 2, x2_0);
_mm_storeu_si128(x16_0 + 3, x3_0);
x16_0 += 4, y16_0 += 4;
const LEO_M128 x0_1 = _mm_xor_si128(_mm_loadu_si128(x16_1), _mm_loadu_si128(y16_1));
const LEO_M128 x1_1 = _mm_xor_si128(_mm_loadu_si128(x16_1 + 1), _mm_loadu_si128(y16_1 + 1));
const LEO_M128 x2_1 = _mm_xor_si128(_mm_loadu_si128(x16_1 + 2), _mm_loadu_si128(y16_1 + 2));
const LEO_M128 x3_1 = _mm_xor_si128(_mm_loadu_si128(x16_1 + 3), _mm_loadu_si128(y16_1 + 3));
_mm_storeu_si128(x16_1, x0_1);
_mm_storeu_si128(x16_1 + 1, x1_1);
_mm_storeu_si128(x16_1 + 2, x2_1);
_mm_storeu_si128(x16_1 + 3, x3_1);
x16_1 += 4, y16_1 += 4;
const LEO_M128 x0_2 = _mm_xor_si128(_mm_loadu_si128(x16_2), _mm_loadu_si128(y16_2));
const LEO_M128 x1_2 = _mm_xor_si128(_mm_loadu_si128(x16_2 + 1), _mm_loadu_si128(y16_2 + 1));
const LEO_M128 x2_2 = _mm_xor_si128(_mm_loadu_si128(x16_2 + 2), _mm_loadu_si128(y16_2 + 2));
const LEO_M128 x3_2 = _mm_xor_si128(_mm_loadu_si128(x16_2 + 3), _mm_loadu_si128(y16_2 + 3));
_mm_storeu_si128(x16_2, x0_2);
_mm_storeu_si128(x16_2 + 1, x1_2);
_mm_storeu_si128(x16_2 + 2, x2_2);
_mm_storeu_si128(x16_2 + 3, x3_2);
x16_2 += 4, y16_2 += 4;
const LEO_M128 x0_3 = _mm_xor_si128(_mm_loadu_si128(x16_3), _mm_loadu_si128(y16_3));
const LEO_M128 x1_3 = _mm_xor_si128(_mm_loadu_si128(x16_3 + 1), _mm_loadu_si128(y16_3 + 1));
const LEO_M128 x2_3 = _mm_xor_si128(_mm_loadu_si128(x16_3 + 2), _mm_loadu_si128(y16_3 + 2));
const LEO_M128 x3_3 = _mm_xor_si128(_mm_loadu_si128(x16_3 + 3), _mm_loadu_si128(y16_3 + 3));
_mm_storeu_si128(x16_3, x0_3);
_mm_storeu_si128(x16_3 + 1, x1_3);
_mm_storeu_si128(x16_3 + 2, x2_3);
_mm_storeu_si128(x16_3 + 3, x3_3);
x16_3 += 4, y16_3 += 4;
bytes -= 64;
} while (bytes > 0);
}
#endif // LEO_USE_VECTOR4_OPT
void VectorXOR_Threads(
const uint64_t bytes,
unsigned count,
void** x,
void** y)
{
#ifdef LEO_USE_VECTOR4_OPT
if (count >= 4)
{
int i_end = count - 4;
#pragma omp parallel for
for (int i = 0; i <= i_end; i += 4)
{
xor_mem4(
x[i + 0], y[i + 0],
x[i + 1], y[i + 1],
x[i + 2], y[i + 2],
x[i + 3], y[i + 3],
bytes);
}
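count %= 4;
// The loop above consumed (i_end + 4 - count) entries; skip past all of them
i_end += 4 - static_cast<int>(count);
x += i_end;
y += i_end;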
}
#endif // LEO_USE_VECTOR4_OPT
for (unsigned i = 0; i < count; ++i)
xor_mem(x[i], y[i], bytes);
}
void VectorXOR(
const uint64_t bytes,
unsigned count,
void** x,
void** y)
{
#ifdef LEO_USE_VECTOR4_OPT
if (count >= 4)
{
int i_end = count - 4;
for (int i = 0; i <= i_end; i += 4)
{
xor_mem4(
x[i + 0], y[i + 0],
x[i + 1], y[i + 1],
x[i + 2], y[i + 2],
x[i + 3], y[i + 3],
bytes);
}
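count %= 4;
// The loop above consumed (i_end + 4 - count) entries; skip past all of them
i_end += 4 - static_cast<int>(count);
x += i_end;
y += i_end;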
}
#endif // LEO_USE_VECTOR4_OPT
for (unsigned i = 0; i < count; ++i)
xor_mem(x[i], y[i], bytes);
}
} // namespace leopard

502
cpp/LeopardCommon.h Normal file

@@ -0,0 +1,502 @@
/*
Copyright (c) 2017 Christopher A. Taylor. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Leopard-RS nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
#pragma once
/*
TODO:
Mid-term:
+ Add compile-time selectable XOR-only rowops instead of MULADD
+ Look into 12-bit fields as a performance optimization
Long-term:
+ Evaluate the error locator polynomial based on fast polynomial interpolations in O(k log^2 k)
+ Look into getting EncodeL working so we can support larger recovery sets
+ Implement the decoder algorithm from {3} based on the Forney algorithm
*/
/*
FFT Data Layout:
We pack the data into memory in this order:
[Recovery Data (Power of Two = M)] [Original Data] [Zero Padding out to 65536]
For encoding, the placement is implied instead of actual memory layout.
For decoding, the layout is explicitly used.
*/
/*
Encoder algorithm:
The encoder is described in {3}. Operations are done in O(K Log M),
where K is the original data size, and M is up to twice the
size of the recovery set.
Roughly in brief:
Recovery = FFT( IFFT(Data_0) xor IFFT(Data_1) xor ... )
It walks the original data M chunks at a time performing the IFFT.
Each IFFT intermediate result is XORed together into the first M chunks of
the data layout. Finally the FFT is performed.
Encoder optimizations:
* The first IFFT can be performed directly in the first M chunks.
* The zero padding can be skipped while performing the final IFFT.
Unrolling is used in the code to accomplish both these optimizations.
* The final FFT can be truncated also if recovery set is not a power of 2.
It is easy to truncate the FFT by ending the inner loop early.
* The FFT operations can be unrolled two layers at a time so that instead
of writing the result of the first layer out and reading it back in for
the second layer, those interactions can happen in registers immediately.
*/
/*
Decoder algorithm:
The decoder is described in {1}. Operations are done in O(N Log N), where N is up
to twice the size of the original data as described below.
Roughly in brief:
Original = -ErrLocator * FFT( Derivative( IFFT( ErrLocator * ReceivedData ) ) )
Precalculations:
---------------
At startup initialization, FFTInitialize() precalculates FWT(L) as
described by equation (92) in {1}, where L = Log[i] for i = 0..Order,
Order = 256 or 65536 for FF8/16. This is stored in the LogWalsh vector.
It also precalculates the FFT skew factors (s_i) as described by
equation (28). This is stored in the FFTSkew vector.
For memory workspace N data chunks are needed, where N is a power of two
at or above M + K. K is the original data size and M is the next power
of two above the recovery data size. For example for K = 200 pieces of
data and 10% redundancy, there are 20 redundant pieces, which rounds up
to 32 = M. M + K = 232 pieces, so N rounds up to 256.
Online calculations:
-------------------
At runtime, the error locator polynomial is evaluated using the
Fast Walsh-Hadamard transform as described in {1} equation (92).
At runtime the data is explicitly laid out in workspace memory like this:
[Recovery Data (Power of Two = M)] [Original Data (K)] [Zero Padding out to N]
Data that was lost is replaced with zeroes.
Data that was received, including recovery data, is multiplied by the error
locator polynomial as it is copied into the workspace.
The IFFT is applied to the entire workspace of N chunks.
Since the IFFT starts with pairs of inputs and doubles in width at each
iteration, the IFFT is optimized by skipping zero padding at the end until
it starts mixing with non-zero data.
The formal derivative is applied to the entire workspace of N chunks.
This is a massive XOR loop that runs 4 columns in parallel for speed.
The FFT is applied to the entire workspace of N chunks.
The FFT is optimized by only performing intermediate calculations required
to recover lost data. Since it starts wide and ends up working on adjacent
pairs, at some point the intermediate results are not needed for data that
will not be read by the application. This optimization is implemented by
the ErrorBitfield class.
Finally, only recovered data is multiplied by the negative of the
error locator polynomial as it is copied into the front of the
workspace for the application to retrieve.
*/
/*
Finite field arithmetic optimizations:
For faster finite field multiplication, large tables are precomputed and
applied during encoding/decoding on 64 bytes of data at a time using
SSSE3 or AVX2 vector instructions and the ALTMAP approach from Jerasure.
Addition in this finite field is XOR, and a vectorized memory XOR routine
is also used.
*/
#include "leopard.h"
#include <stdint.h>
#ifdef _WIN32
#include <malloc.h>
#endif //_WIN32
#include <vector>
#include <atomic>
#include <memory>
#include <mutex>
#include <condition_variable>
//------------------------------------------------------------------------------
// Constants
// Enable 8-bit or 16-bit fields
#define LEO_HAS_FF8
#define LEO_HAS_FF16
// Enable using SIMD instructions
#define LEO_USE_SSSE3_OPT
#define LEO_USE_AVX2_OPT
// Avoid calculating final FFT values in decoder using bitfield
#define LEO_ERROR_BITFIELD_OPT
// Interleave butterfly operations between layer pairs in FFT
#define LEO_INTERLEAVE_BUTTERFLY4_OPT
// Optimize M=1 case
#define LEO_M1_OPT
// Unroll inner loops 4 times
#define LEO_USE_VECTOR4_OPT
// MacOS M1
#if defined(__aarch64__)
#define LEO_USE_SSE2NEON
#define LEO_TARGET_MOBILE
#endif
//------------------------------------------------------------------------------
// Debug
// Some bugs only repro in release mode, so this can be helpful
//#define LEO_DEBUG_IN_RELEASE
#if defined(_DEBUG) || defined(DEBUG) || defined(LEO_DEBUG_IN_RELEASE)
#define LEO_DEBUG
#ifdef _WIN32
#define LEO_DEBUG_BREAK __debugbreak()
#else
#define LEO_DEBUG_BREAK __builtin_trap()
#endif
#define LEO_DEBUG_ASSERT(cond) { if (!(cond)) { LEO_DEBUG_BREAK; } }
#else
#define LEO_DEBUG_BREAK ;
#define LEO_DEBUG_ASSERT(cond) ;
#endif
//------------------------------------------------------------------------------
// Windows Header
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#ifndef _WINSOCKAPI_
#define DID_DEFINE_WINSOCKAPI
#define _WINSOCKAPI_
#endif
#ifndef NOMINMAX
#define NOMINMAX
#endif
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0601 /* Windows 7+ */
#endif
#include <windows.h>
#endif
#ifdef DID_DEFINE_WINSOCKAPI
#undef _WINSOCKAPI_
#undef DID_DEFINE_WINSOCKAPI
#endif
//------------------------------------------------------------------------------
// Platform/Architecture
#ifdef _MSC_VER
#include <intrin.h>
#endif
#if defined(ANDROID) || defined(IOS)
#define LEO_TARGET_MOBILE
#endif // ANDROID
#if defined(__AVX2__) || (defined (_MSC_VER) && _MSC_VER >= 1900)
#define LEO_TRY_AVX2 /* 256-bit */
#include <immintrin.h>
#define LEO_ALIGN_BYTES 32
#else // __AVX2__
#define LEO_ALIGN_BYTES 16
#endif // __AVX2__
#if !defined(LEO_TARGET_MOBILE)
// Note: MSVC currently only supports SSSE3 but not AVX2
#include <tmmintrin.h> // SSSE3: _mm_shuffle_epi8
#include <emmintrin.h> // SSE2
#elif defined(LEO_USE_SSE2NEON)
#include "sse2neon/sse2neon.h"
#endif // LEO_TARGET_MOBILE
#if defined(HAVE_ARM_NEON_H)
#include <arm_neon.h>
#endif // HAVE_ARM_NEON_H
#if defined(LEO_TARGET_MOBILE)
#define LEO_ALIGNED_ACCESSES /* Inputs must be aligned to LEO_ALIGN_BYTES */
# if defined(HAVE_ARM_NEON_H)
// Compiler-specific 128-bit SIMD register keyword
#define LEO_M128 uint8x16_t
#define LEO_TRY_NEON
#elif defined(LEO_USE_SSE2NEON)
#define LEO_M128 __m128i
#else
#define LEO_M128 uint64_t
# endif
#else // LEO_TARGET_MOBILE
// Compiler-specific 128-bit SIMD register keyword
#define LEO_M128 __m128i
#endif // LEO_TARGET_MOBILE
#ifdef LEO_TRY_AVX2
// Compiler-specific 256-bit SIMD register keyword
#define LEO_M256 __m256i
#endif
// Compiler-specific C++11 restrict keyword
#define LEO_RESTRICT __restrict
// Compiler-specific force inline keyword
#ifdef _MSC_VER
#define LEO_FORCE_INLINE inline __forceinline
#else
#define LEO_FORCE_INLINE inline __attribute__((always_inline))
#endif
// Compiler-specific alignment keyword
// Note: Alignment only matters for ARM NEON where it should be 16
#ifdef _MSC_VER
#define LEO_ALIGNED __declspec(align(LEO_ALIGN_BYTES))
#else // _MSC_VER
#define LEO_ALIGNED __attribute__((aligned(LEO_ALIGN_BYTES)))
#endif // _MSC_VER
namespace leopard {
//------------------------------------------------------------------------------
// Runtime CPU Architecture Check
// Initialize CPU architecture flags
void InitializeCPUArch();
#if defined(LEO_TRY_NEON)
# if defined(IOS) && defined(__ARM_NEON__)
// Does device support NEON?
static const bool CpuHasNeon = true;
static const bool CpuHasNeon64 = true;
# else
// Does device support NEON?
// Remember to add LOCAL_STATIC_LIBRARIES := cpufeatures
extern bool CpuHasNeon; // V6 / V7
extern bool CpuHasNeon64; // 64-bit
# endif
#endif
#if !defined(LEO_TARGET_MOBILE)
# if defined(LEO_TRY_AVX2)
// Does CPU support AVX2?
extern bool CpuHasAVX2;
# endif
// Does CPU support SSSE3?
extern bool CpuHasSSSE3;
#elif defined(LEO_USE_SSE2NEON)
extern bool CpuHasSSSE3;
#endif // LEO_TARGET_MOBILE
//------------------------------------------------------------------------------
// Portable Intrinsics
// Returns highest bit index 0..31 where the first non-zero bit is found
// Precondition: x != 0
LEO_FORCE_INLINE unsigned LastNonzeroBit32(unsigned x)
{
#ifdef _MSC_VER
unsigned long index;
// Note: Ignoring result because x != 0
_BitScanReverse(&index, (uint32_t)x);
return (unsigned)index;
#else
// Note: Ignoring return value of 0 because x != 0
static_assert(sizeof(unsigned) == 4, "Assuming 32 bit unsigneds in LastNonzeroBit32");
return 31 - (unsigned)__builtin_clz(x);
#endif
}
// Returns next power of two at or above given value
// Precondition: n >= 2, because LastNonzeroBit32() requires n - 1 != 0
LEO_FORCE_INLINE unsigned NextPow2(unsigned n)
{
return 2UL << LastNonzeroBit32(n - 1);
}
//------------------------------------------------------------------------------
// XOR Memory
//
// This works for both 8-bit and 16-bit finite fields
// x[] ^= y[]
void xor_mem(
void * LEO_RESTRICT x, const void * LEO_RESTRICT y,
uint64_t bytes);
#ifdef LEO_M1_OPT
// x[] ^= y[] ^ z[]
void xor_mem_2to1(
void * LEO_RESTRICT x,
const void * LEO_RESTRICT y,
const void * LEO_RESTRICT z,
uint64_t bytes);
#endif // LEO_M1_OPT
#ifdef LEO_USE_VECTOR4_OPT
// For i = {0, 1, 2, 3}: x_i[] ^= y_i[]
void xor_mem4(
void * LEO_RESTRICT x_0, const void * LEO_RESTRICT y_0,
void * LEO_RESTRICT x_1, const void * LEO_RESTRICT y_1,
void * LEO_RESTRICT x_2, const void * LEO_RESTRICT y_2,
void * LEO_RESTRICT x_3, const void * LEO_RESTRICT y_3,
uint64_t bytes);
#endif // LEO_USE_VECTOR4_OPT
// x[] ^= y[]
void VectorXOR(
const uint64_t bytes,
unsigned count,
void** x,
void** y);
// x[] ^= y[] (Multithreaded)
void VectorXOR_Threads(
const uint64_t bytes,
unsigned count,
void** x,
void** y);
//------------------------------------------------------------------------------
// XORSummer
class XORSummer
{
public:
// Set the addition destination
LEO_FORCE_INLINE void Initialize(void* dest)
{
DestBuffer = dest;
Waiting = nullptr;
}
// Accumulate some source data
LEO_FORCE_INLINE void Add(const void* src, const uint64_t bytes)
{
#ifdef LEO_M1_OPT
if (Waiting)
{
xor_mem_2to1(DestBuffer, src, Waiting, bytes);
Waiting = nullptr;
}
else
Waiting = src;
#else // LEO_M1_OPT
xor_mem(DestBuffer, src, bytes);
#endif // LEO_M1_OPT
}
// Finalize in the destination buffer
LEO_FORCE_INLINE void Finalize(const uint64_t bytes)
{
#ifdef LEO_M1_OPT
if (Waiting)
xor_mem(DestBuffer, Waiting, bytes);
#endif // LEO_M1_OPT
}
protected:
void* DestBuffer;
const void* Waiting;
};
//------------------------------------------------------------------------------
// SIMD-Safe Aligned Memory Allocations
static const unsigned kAlignmentBytes = LEO_ALIGN_BYTES;
static LEO_FORCE_INLINE uint8_t* SIMDSafeAllocate(size_t size)
{
uint8_t* data = (uint8_t*)calloc(1, kAlignmentBytes + size);
if (!data)
return nullptr;
unsigned offset = (unsigned)((uintptr_t)data % kAlignmentBytes);
data += kAlignmentBytes - offset;
data[-1] = (uint8_t)offset;
return data;
}
static LEO_FORCE_INLINE void SIMDSafeFree(void* ptr)
{
if (!ptr)
return;
uint8_t* data = (uint8_t*)ptr;
unsigned offset = data[-1];
if (offset >= kAlignmentBytes)
{
LEO_DEBUG_BREAK; // Should never happen
return;
}
data -= kAlignmentBytes - offset;
free(data);
}
} // namespace leopard
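
The allocator at the end of this header over-allocates by `kAlignmentBytes`, advances the pointer to the next aligned address, and stores the original misalignment in the byte just before the returned pointer. Below is a minimal standalone sketch of that idea. The names `aligned_alloc_sketch`, `aligned_free_sketch`, and `kSketchAlign` are hypothetical, and the fixed 32-byte alignment is an assumption standing in for `LEO_ALIGN_BYTES`.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical stand-in for LEO_ALIGN_BYTES (32 with AVX2 enabled, else 16).
static const unsigned kSketchAlign = 32;

// Over-allocate, round up to the next aligned address, and remember the
// original misalignment in the byte just before the returned pointer.
inline uint8_t* aligned_alloc_sketch(size_t size)
{
    uint8_t* raw = static_cast<uint8_t*>(calloc(1, kSketchAlign + size));
    if (!raw)
        return nullptr;
    const unsigned offset = static_cast<unsigned>(
        reinterpret_cast<uintptr_t>(raw) % kSketchAlign);
    uint8_t* data = raw + (kSketchAlign - offset); // advances 1..kSketchAlign bytes
    data[-1] = static_cast<uint8_t>(offset);       // always inside the allocation
    return data;
}

// Recover the pointer that calloc returned from the stored offset and free it.
inline void aligned_free_sketch(void* ptr)
{
    if (!ptr)
        return;
    uint8_t* data = static_cast<uint8_t*>(ptr);
    const unsigned offset = data[-1];
    assert(offset < kSketchAlign); // a larger value means the header byte was overwritten
    free(data - (kSketchAlign - offset));
}
```

Because the pointer always advances by at least one byte, the header byte at `data[-1]` never falls outside the allocation, at the cost of up to `kSketchAlign` extra bytes per buffer.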

1799
cpp/LeopardFF16.cpp Normal file

File diff suppressed because it is too large

93
cpp/LeopardFF16.h Normal file

@ -0,0 +1,93 @@
/*
Copyright (c) 2017 Christopher A. Taylor. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Leopard-RS nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
#pragma once
#include "LeopardCommon.h"
#ifdef LEO_HAS_FF16
/*
16-bit Finite Field Math
This finite field contains 65536 elements and so each element is two bytes.
This library is designed for data that is a multiple of 64 bytes in size.
Algorithms are described in LeopardCommon.h
*/
namespace leopard { namespace ff16 {
//------------------------------------------------------------------------------
// Datatypes and Constants
// Finite field element type
typedef uint16_t ffe_t;
// Number of bits per element
static const unsigned kBits = 16;
// Finite field order: Number of elements in the field
static const unsigned kOrder = 65536;
// Modulus for field operations
static const ffe_t kModulus = 65535;
// LFSR Polynomial that generates the field elements
static const unsigned kPolynomial = 0x1002D;
//------------------------------------------------------------------------------
// API
// Returns false if the self-test fails
bool Initialize();
void ReedSolomonEncode(
uint64_t buffer_bytes,
unsigned original_count,
unsigned recovery_count,
unsigned m, // = NextPow2(recovery_count)
const void* const * const data,
void** work); // m * 2 elements
void ReedSolomonDecode(
uint64_t buffer_bytes,
unsigned original_count,
unsigned recovery_count,
unsigned m, // = NextPow2(recovery_count)
unsigned n, // = NextPow2(m + original_count)
const void* const * const original, // original_count elements
const void* const * const recovery, // recovery_count elements
void** work); // n elements
}} // namespace leopard::ff16
#endif // LEO_HAS_FF16

1940
cpp/LeopardFF8.cpp Normal file

File diff suppressed because it is too large

93
cpp/LeopardFF8.h Normal file

@ -0,0 +1,93 @@
/*
Copyright (c) 2017 Christopher A. Taylor. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Leopard-RS nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
#pragma once
#include "LeopardCommon.h"
#ifdef LEO_HAS_FF8
/*
8-bit Finite Field Math
This finite field contains 256 elements and so each element is one byte.
This library is designed for data that is a multiple of 64 bytes in size.
Algorithms are described in LeopardCommon.h
*/
namespace leopard { namespace ff8 {
//------------------------------------------------------------------------------
// Datatypes and Constants
// Finite field element type
typedef uint8_t ffe_t;
// Number of bits per element
static const unsigned kBits = 8;
// Finite field order: Number of elements in the field
static const unsigned kOrder = 256;
// Modulus for field operations
static const ffe_t kModulus = 255;
// LFSR Polynomial that generates the field elements
static const unsigned kPolynomial = 0x11D;
//------------------------------------------------------------------------------
// API
// Returns false if the self-test fails
bool Initialize();
void ReedSolomonEncode(
uint64_t buffer_bytes,
unsigned original_count,
unsigned recovery_count,
unsigned m, // = NextPow2(recovery_count)
const void* const * const data,
void** work); // m * 2 elements
void ReedSolomonDecode(
uint64_t buffer_bytes,
unsigned original_count,
unsigned recovery_count,
unsigned m, // = NextPow2(recovery_count)
unsigned n, // = NextPow2(m + original_count)
const void* const * const original, // original_count elements
const void* const * const recovery, // recovery_count elements
void** work); // n elements
}} // namespace leopard::ff8
#endif // LEO_HAS_FF8
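
Both field headers describe `kPolynomial` as the "LFSR polynomial that generates the field elements", with `kModulus = kOrder - 1`. The sketch below shows one way to read that: repeatedly multiply a state by x and reduce by the polynomial, then count the steps until the state returns to 1. The helper names `lfsr_step` and `lfsr_period` are hypothetical, and the actual table construction lives in the suppressed `.cpp` files and may differ.

```cpp
#include <assert.h>
#include <stdint.h>

// Multiply the state by x and reduce modulo the field polynomial whenever
// the result no longer fits in `bits` bits.
inline uint32_t lfsr_step(uint32_t state, uint32_t polynomial, unsigned bits)
{
    state <<= 1;
    if (state >> bits) // overflow bit is set
        state ^= polynomial;
    return state;
}

// Count the steps needed to return to the starting state 1. For a primitive
// polynomial this equals 2^bits - 1: every nonzero element is visited once.
inline uint32_t lfsr_period(uint32_t polynomial, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t state = 1;
    uint32_t steps = 0;
    do {
        state = lfsr_step(state, polynomial, bits);
        ++steps;
    } while (state != 1);
    return steps;
}
```

With the constants from the headers, `lfsr_period(0x11D, 8)` and `lfsr_period(0x1002D, 16)` should match the respective `kModulus` values, which is what lets a logarithm table index every nonzero element.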

4
cpp/build.sh Executable file

@ -0,0 +1,4 @@
#!/bin/bash
g++ -O3 -std=c++11 -c *.cpp
ar rcs libleopard.a *.o

347
cpp/leopard.cpp Normal file

@ -0,0 +1,347 @@
/*
Copyright (c) 2017 Christopher A. Taylor. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Leopard-RS nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
#include "leopard.h"
#include "LeopardCommon.h"
#ifdef LEO_HAS_FF8
#include "LeopardFF8.h"
#endif // LEO_HAS_FF8
#ifdef LEO_HAS_FF16
#include "LeopardFF16.h"
#endif // LEO_HAS_FF16
#include <string.h>
extern "C" {
//------------------------------------------------------------------------------
// Initialization API
static bool m_Initialized = false;
LEO_EXPORT int leo_init_(int version)
{
if (version != LEO_VERSION)
return Leopard_InvalidInput;
leopard::InitializeCPUArch();
#ifdef LEO_HAS_FF8
if (!leopard::ff8::Initialize())
return Leopard_Platform;
#endif // LEO_HAS_FF8
#ifdef LEO_HAS_FF16
if (!leopard::ff16::Initialize())
return Leopard_Platform;
#endif // LEO_HAS_FF16
m_Initialized = true;
return Leopard_Success;
}
//------------------------------------------------------------------------------
// Result
LEO_EXPORT const char* leo_result_string(LeopardResult result)
{
switch (result)
{
case Leopard_Success: return "Operation succeeded";
case Leopard_NeedMoreData: return "Not enough recovery data received";
case Leopard_TooMuchData: return "Buffer counts are too high";
case Leopard_InvalidSize: return "Buffer size must be a multiple of 64 bytes";
case Leopard_InvalidCounts: return "Invalid counts provided";
case Leopard_InvalidInput: return "A function parameter was invalid";
case Leopard_Platform: return "Platform is unsupported";
case Leopard_CallInitialize: return "Call leo_init() first";
}
return "Unknown";
}
//------------------------------------------------------------------------------
// Encoder API
LEO_EXPORT unsigned leo_encode_work_count(
unsigned original_count,
unsigned recovery_count)
{
if (original_count == 1)
return recovery_count;
if (recovery_count == 1)
return 1;
return leopard::NextPow2(recovery_count) * 2;
}
// recovery_data = parity of original_data (xor sum)
static void EncodeM1(
uint64_t buffer_bytes,
unsigned original_count,
const void* const * const original_data,
void* recovery_data)
{
memcpy(recovery_data, original_data[0], buffer_bytes);
leopard::XORSummer summer;
summer.Initialize(recovery_data);
for (unsigned i = 1; i < original_count; ++i)
summer.Add(original_data[i], buffer_bytes);
summer.Finalize(buffer_bytes);
}
LEO_EXPORT LeopardResult leo_encode(
uint64_t buffer_bytes, // Number of bytes in each data buffer
unsigned original_count, // Number of original_data[] buffer pointers
unsigned recovery_count, // Number of recovery_data[] buffer pointers
unsigned work_count, // Number of work_data[] buffer pointers, from leo_encode_work_count()
const void* const * const original_data, // Array of pointers to original data buffers
void** work_data) // Array of work buffers
{
if (buffer_bytes <= 0 || buffer_bytes % 64 != 0)
return Leopard_InvalidSize;
if (recovery_count <= 0 || recovery_count > original_count)
return Leopard_InvalidCounts;
if (!original_data || !work_data)
return Leopard_InvalidInput;
if (!m_Initialized)
return Leopard_CallInitialize;
// Handle k = 1 case
if (original_count == 1)
{
for (unsigned i = 0; i < recovery_count; ++i)
memcpy(work_data[i], original_data[0], buffer_bytes);
return Leopard_Success;
}
// Handle m = 1 case
if (recovery_count == 1)
{
EncodeM1(
buffer_bytes,
original_count,
original_data,
work_data[0]);
return Leopard_Success;
}
const unsigned m = leopard::NextPow2(recovery_count);
const unsigned n = leopard::NextPow2(m + original_count);
if (work_count != m * 2)
return Leopard_InvalidCounts;
#ifdef LEO_HAS_FF8
if (n <= leopard::ff8::kOrder)
{
leopard::ff8::ReedSolomonEncode(
buffer_bytes,
original_count,
recovery_count,
m,
original_data,
work_data);
}
else
#endif // LEO_HAS_FF8
#ifdef LEO_HAS_FF16
if (n <= leopard::ff16::kOrder)
{
leopard::ff16::ReedSolomonEncode(
buffer_bytes,
original_count,
recovery_count,
m,
original_data,
work_data);
}
else
#endif // LEO_HAS_FF16
return Leopard_TooMuchData;
return Leopard_Success;
}
//------------------------------------------------------------------------------
// Decoder API
LEO_EXPORT unsigned leo_decode_work_count(
unsigned original_count,
unsigned recovery_count)
{
if (original_count == 1 || recovery_count == 1)
return original_count;
const unsigned m = leopard::NextPow2(recovery_count);
const unsigned n = leopard::NextPow2(m + original_count);
return n;
}
static void DecodeM1(
uint64_t buffer_bytes,
unsigned original_count,
const void* const * original_data,
const void* recovery_data,
void* work_data)
{
memcpy(work_data, recovery_data, buffer_bytes);
leopard::XORSummer summer;
summer.Initialize(work_data);
for (unsigned i = 0; i < original_count; ++i)
if (original_data[i])
summer.Add(original_data[i], buffer_bytes);
summer.Finalize(buffer_bytes);
}
LEO_EXPORT LeopardResult leo_decode(
uint64_t buffer_bytes, // Number of bytes in each data buffer
unsigned original_count, // Number of original_data[] buffer pointers
unsigned recovery_count, // Number of recovery_data[] buffer pointers
unsigned work_count, // Number of buffer pointers in work_data[]
const void* const * const original_data, // Array of original data buffers
const void* const * const recovery_data, // Array of recovery data buffers
void** work_data) // Array of work data buffers
{
if (buffer_bytes <= 0 || buffer_bytes % 64 != 0)
return Leopard_InvalidSize;
if (recovery_count <= 0 || recovery_count > original_count)
return Leopard_InvalidCounts;
if (!original_data || !recovery_data || !work_data)
return Leopard_InvalidInput;
if (!m_Initialized)
return Leopard_CallInitialize;
// Check if not enough recovery data arrived
unsigned original_loss_count = 0;
unsigned original_loss_i = 0;
for (unsigned i = 0; i < original_count; ++i)
{
if (!original_data[i])
{
++original_loss_count;
original_loss_i = i;
}
}
unsigned recovery_got_count = 0;
unsigned recovery_got_i = 0;
for (unsigned i = 0; i < recovery_count; ++i)
{
if (recovery_data[i])
{
++recovery_got_count;
recovery_got_i = i;
}
}
if (recovery_got_count < original_loss_count)
return Leopard_NeedMoreData;
// Handle case original_loss_count = 0
// Checked before the k = 1 case so that a NULL recovery buffer is never read
if (original_loss_count == 0)
{
for(unsigned i = 0; i < original_count; i++)
memcpy(work_data[i], original_data[i], buffer_bytes);
return Leopard_Success;
}
// Handle k = 1 case (the original is lost, so at least one recovery buffer arrived)
if (original_count == 1)
{
memcpy(work_data[0], recovery_data[recovery_got_i], buffer_bytes);
return Leopard_Success;
}
// Handle m = 1 case
if (recovery_count == 1)
{
DecodeM1(
buffer_bytes,
original_count,
original_data,
recovery_data[0],
work_data[original_loss_i]);
return Leopard_Success;
}
const unsigned m = leopard::NextPow2(recovery_count);
const unsigned n = leopard::NextPow2(m + original_count);
if (work_count != n)
return Leopard_InvalidCounts;
#ifdef LEO_HAS_FF8
if (n <= leopard::ff8::kOrder)
{
leopard::ff8::ReedSolomonDecode(
buffer_bytes,
original_count,
recovery_count,
m,
n,
original_data,
recovery_data,
work_data);
}
else
#endif // LEO_HAS_FF8
#ifdef LEO_HAS_FF16
if (n <= leopard::ff16::kOrder)
{
leopard::ff16::ReedSolomonDecode(
buffer_bytes,
original_count,
recovery_count,
m,
n,
original_data,
recovery_data,
work_data);
}
else
#endif // LEO_HAS_FF16
return Leopard_TooMuchData;
return Leopard_Success;
}
} // extern "C"
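
The `m = 1` paths above (`EncodeM1` and `DecodeM1`) reduce to plain XOR parity: the single recovery buffer is the XOR of all originals, and a single lost original is the parity XORed with every surviving original. Here is a minimal standalone sketch of that round trip. It does not call the library, and `Buffer`, `parity_encode`, and `parity_decode` are hypothetical names.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <vector>

typedef std::vector<uint8_t> Buffer;

// recovery = original[0] ^ original[1] ^ ... (the m = 1 encoding)
inline Buffer parity_encode(const std::vector<Buffer>& originals)
{
    assert(!originals.empty());
    Buffer parity = originals[0];
    for (size_t i = 1; i < originals.size(); ++i) {
        assert(originals[i].size() == parity.size());
        for (size_t b = 0; b < parity.size(); ++b)
            parity[b] ^= originals[i][b];
    }
    return parity;
}

// Rebuild the single missing original: start from the parity buffer and XOR
// in every original that survived. A null entry marks the lost buffer,
// mirroring the NULL convention of leo_decode().
inline Buffer parity_decode(const std::vector<const Buffer*>& present,
                            const Buffer& parity)
{
    Buffer lost = parity;
    for (size_t i = 0; i < present.size(); ++i) {
        if (!present[i])
            continue; // skip the lost piece
        assert(present[i]->size() == lost.size());
        for (size_t b = 0; b < lost.size(); ++b)
            lost[b] ^= (*present[i])[b];
    }
    return lost;
}
```

This only recovers one loss. With two or more missing originals the parity buffer alone is not enough, which is where the Reed-Solomon paths take over.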

242
cpp/leopard.h Normal file

@ -0,0 +1,242 @@
/*
Copyright (c) 2017 Christopher A. Taylor. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Leopard-RS nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef CAT_LEOPARD_RS_H
#define CAT_LEOPARD_RS_H
/*
Leopard-RS
MDS Reed-Solomon Erasure Correction Codes for Large Data in C
Algorithms are described in LeopardCommon.h
Inspired by discussion with:
Sian-Jheng Lin <sjhenglin@gmail.com> : Author of {1} {3}, basis for Leopard
Bulat Ziganshin <bulat.ziganshin@gmail.com> : Author of FastECC
Yutaka Sawada <tenfon@outlook.jp> : Author of MultiPar
References:
{1} S.-J. Lin, T. Y. Al-Naffouri, Y. S. Han, and W.-H. Chung,
"Novel Polynomial Basis with Fast Fourier Transform
and Its Application to Reed-Solomon Erasure Codes"
IEEE Trans. on Information Theory, pp. 6284-6299, November, 2016.
{2} D. G. Cantor, "On arithmetical algorithms over finite fields",
Journal of Combinatorial Theory, Series A, vol. 50, no. 2, pp. 285-300, 1989.
{3} Sian-Jheng Lin, Wei-Ho Chung, "An Efficient (n, k) Information
Dispersal Algorithm for High Code Rate System over Fermat Fields,"
IEEE Commun. Lett., vol.16, no.12, pp. 2036-2039, Dec. 2012.
{4} Plank, J. S., Greenan, K. M., Miller, E. L., "Screaming fast Galois Field
arithmetic using Intel SIMD instructions." In: FAST-2013: 11th Usenix
Conference on File and Storage Technologies, San Jose, 2013
*/
// Library version
#define LEO_VERSION 2
// Tweak if the functions are exported or statically linked
//#define LEO_DLL /* Defined when building/linking as DLL */
//#define LEO_BUILDING /* Defined by the library makefile */
#if defined(LEO_BUILDING)
# if defined(LEO_DLL)
#define LEO_EXPORT __declspec(dllexport)
# else
#define LEO_EXPORT
# endif
#else
# if defined(LEO_DLL)
#define LEO_EXPORT __declspec(dllimport)
# else
#define LEO_EXPORT extern
# endif
#endif
#include <stdint.h>
#ifdef __cplusplus
extern "C" {
#endif
//------------------------------------------------------------------------------
// Initialization API
/*
leo_init()
Perform static initialization for the library, verifying that the platform
is supported.
Returns 0 on success and other values on failure.
*/
LEO_EXPORT int leo_init_(int version);
#define leo_init() leo_init_(LEO_VERSION)
//------------------------------------------------------------------------------
// Shared Constants / Datatypes
// Results
typedef enum LeopardResultT
{
Leopard_Success = 0, // Operation succeeded
Leopard_NeedMoreData = -1, // Not enough recovery data received
Leopard_TooMuchData = -2, // Buffer counts are too high
Leopard_InvalidSize = -3, // Buffer size must be a multiple of 64 bytes
Leopard_InvalidCounts = -4, // Invalid counts provided
Leopard_InvalidInput = -5, // A function parameter was invalid
Leopard_Platform = -6, // Platform is unsupported
Leopard_CallInitialize = -7, // Call leo_init() first
} LeopardResult;
// Convert Leopard result to string
LEO_EXPORT const char* leo_result_string(LeopardResult result);
//------------------------------------------------------------------------------
// Encoder API
/*
leo_encode_work_count()
Calculate the number of work_data buffers to provide to leo_encode().
The sum of original_count + recovery_count must not exceed 65536.
Returns the work_count value to pass into leo_encode().
Returns 0 on invalid input.
*/
LEO_EXPORT unsigned leo_encode_work_count(
unsigned original_count,
unsigned recovery_count);
/*
leo_encode()
Generate recovery data.
original_count: Number of original_data[] buffers provided.
recovery_count: Number of desired recovery data buffers.
buffer_bytes: Number of bytes in each data buffer.
original_data: Array of pointers to original data buffers.
work_count: Number of work_data[] buffers, from leo_encode_work_count().
work_data: Array of pointers to work data buffers.
The sum of original_count + recovery_count must not exceed 65536.
The recovery_count must not exceed original_count.
The buffer_bytes must be a multiple of 64.
Each buffer should have the same number of bytes.
Even the last piece must be rounded up to the block size.
Let buffer_bytes = The number of bytes in each buffer:
original_count = static_cast<unsigned>(
((uint64_t)total_bytes + buffer_bytes - 1) / buffer_bytes);
Or if the number of pieces is known:
buffer_bytes = static_cast<unsigned>(
((uint64_t)total_bytes + original_count - 1) / original_count);
Returns Leopard_Success on success.
* The first set of recovery_count buffers in work_data will be the result.
Returns other values on errors.
*/
LEO_EXPORT LeopardResult leo_encode(
uint64_t buffer_bytes, // Number of bytes in each data buffer
unsigned original_count, // Number of original_data[] buffer pointers
unsigned recovery_count, // Number of recovery_data[] buffer pointers
unsigned work_count, // Number of work_data[] buffer pointers, from leo_encode_work_count()
const void* const * const original_data, // Array of pointers to original data buffers
void** work_data); // Array of work buffers
//------------------------------------------------------------------------------
// Decoder API
/*
leo_decode_work_count()
Calculate the number of work_data buffers to provide to leo_decode().
The sum of original_count + recovery_count must not exceed 65536.
Returns the work_count value to pass into leo_decode().
Returns 0 on invalid input.
*/
LEO_EXPORT unsigned leo_decode_work_count(
unsigned original_count,
unsigned recovery_count);
/*
leo_decode()
Decode original data from recovery data.
buffer_bytes: Number of bytes in each data buffer.
original_count: Number of original_data[] buffers provided.
original_data: Array of pointers to original data buffers.
recovery_count: Number of recovery_data[] buffers provided.
recovery_data: Array of pointers to recovery data buffers.
work_count: Number of work_data[] buffers, from leo_decode_work_count().
work_data: Array of pointers to work data buffers.
Lost original/recovery data should be set to NULL.
The number of non-NULL recovery data buffers plus the number of non-NULL
original data buffers must be at least original_count in order to perform recovery.
Returns Leopard_Success on success.
Returns other values on errors.
*/
LEO_EXPORT LeopardResult leo_decode(
uint64_t buffer_bytes, // Number of bytes in each data buffer
unsigned original_count, // Number of original_data[] buffer pointers
unsigned recovery_count, // Number of recovery_data[] buffer pointers
unsigned work_count, // Number of buffer pointers in work_data[]
const void* const * const original_data, // Array of original data buffers
const void* const * const recovery_data, // Array of recovery data buffers
void** work_data); // Array of work data buffers
#ifdef __cplusplus
}
#endif
#endif // CAT_LEOPARD_RS_H
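
The header comments above give the rounding formula for `buffer_bytes` and refer to the two work-count helpers. The sketch below restates that arithmetic in portable C++ so the numbers can be checked without linking the library. All function names are hypothetical. The 64-byte rounding in `buffer_bytes_for` is my reading of "must be a multiple of 64", and the work-count functions mirror the logic shown in `leopard.cpp` in this diff.

```cpp
#include <assert.h>
#include <stdint.h>

// Portable next power of two at or above n, without compiler intrinsics.
inline unsigned next_pow2_sketch(unsigned n)
{
    assert(n >= 1 && n <= 0x80000000u);
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Bytes per buffer when total_bytes is split into original_count pieces,
// rounded up to a multiple of 64 as leo_encode() requires.
inline uint64_t buffer_bytes_for(uint64_t total_bytes, unsigned original_count)
{
    assert(original_count >= 1);
    const uint64_t per_piece = (total_bytes + original_count - 1) / original_count;
    const uint64_t rounded = (per_piece + 63) / 64 * 64;
    return rounded == 0 ? 64 : rounded; // a buffer_bytes of 0 is rejected
}

// Mirrors leo_encode_work_count() as shown in leopard.cpp.
inline unsigned encode_work_count_sketch(unsigned original_count, unsigned recovery_count)
{
    if (original_count == 1) return recovery_count;
    if (recovery_count == 1) return 1;
    return next_pow2_sketch(recovery_count) * 2;
}

// Mirrors leo_decode_work_count() as shown in leopard.cpp.
inline unsigned decode_work_count_sketch(unsigned original_count, unsigned recovery_count)
{
    if (original_count == 1 || recovery_count == 1) return original_count;
    const unsigned m = next_pow2_sketch(recovery_count);
    return next_pow2_sketch(m + original_count);
}
```

For example, splitting 1000 bytes into 10 originals gives 100 bytes per piece, which rounds up to 128. With 4 recovery buffers, the encoder would need 8 work buffers and the decoder 16.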

19
cpp/sse2neon/LICENSE Normal file

@ -0,0 +1,19 @@
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

67
cpp/sse2neon/Makefile Normal file

@ -0,0 +1,67 @@
ifndef CXX
override CXX = g++
endif
ifndef CROSS_COMPILE
processor := $(shell uname -m)
else # CROSS_COMPILE was set
CXX = $(CROSS_COMPILE)g++
CXXFLAGS += -static
LDFLAGS += -static
check_arm := $(shell echo | $(CROSS_COMPILE)cpp -dM - | grep " __ARM_ARCH " | cut -c20-)
ifeq ($(check_arm),8)
processor = aarch64
else ifeq ($(check_arm),7) # detect ARMv7-A only
processor = arm
else
$(error Unsupported cross-compiler)
endif
endif
EXEC_WRAPPER =
ifdef CROSS_COMPILE
EXEC_WRAPPER = qemu-$(processor)
endif
# Follow platform-specific configurations
ifeq ($(processor),$(filter $(processor),aarch64 arm64))
ARCH_CFLAGS = -march=armv8-a+fp+simd+crc
else ifeq ($(processor),$(filter $(processor),i386 x86_64))
ARCH_CFLAGS = -maes -mpclmul -mssse3 -msse4.2
else ifeq ($(processor),$(filter $(processor),arm armv7l))
ARCH_CFLAGS = -mfpu=neon
else
$(error Unsupported architecture)
endif
CXXFLAGS += -Wall -Wcast-qual -I. $(ARCH_CFLAGS) -std=gnu++14
LDFLAGS += -lm
OBJS = \
tests/binding.o \
tests/common.o \
tests/impl.o \
tests/main.o
deps := $(OBJS:%.o=%.o.d)
.SUFFIXES: .o .cpp
.cpp.o:
$(CXX) -o $@ $(CXXFLAGS) -c -MMD -MF $@.d $<
EXEC = tests/main
$(EXEC): $(OBJS)
$(CXX) $(LDFLAGS) -o $@ $^
check: tests/main
$(EXEC_WRAPPER) $^
indent:
@echo "Formatting files with clang-format.."
@if ! hash clang-format-12; then echo "clang-format-12 is required to indent"; fi
clang-format-12 -i sse2neon.h tests/*.cpp tests/*.h
.PHONY: clean check indent
clean:
$(RM) $(OBJS) $(EXEC) $(deps)
-include $(deps)

190
cpp/sse2neon/README.md Normal file

@ -0,0 +1,190 @@
# sse2neon
![Github Actions](https://github.com/DLTcollab/sse2neon/workflows/Github%20Actions/badge.svg?branch=master)
A C/C++ header file that converts Intel SSE intrinsics to Arm/Aarch64 NEON intrinsics.
## Introduction
`sse2neon` is a translator of Intel SSE (Streaming SIMD Extensions) intrinsics
to [Arm NEON](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon),
shortening the time needed to get a working Arm program that can then be used to
extract profiles and to identify hot paths in the code.
The header file `sse2neon.h` contains several of the functions provided by Intel
intrinsic headers such as `<xmmintrin.h>`, only implemented with NEON-based counterparts
to produce the exact semantics of the intrinsics.
## Mapping and Coverage
Header file | Extension |
---|---|
`<mmintrin.h>` | MMX |
`<xmmintrin.h>` | SSE |
`<emmintrin.h>` | SSE2 |
`<pmmintrin.h>` | SSE3 |
`<tmmintrin.h>` | SSSE3 |
`<smmintrin.h>` | SSE4.1 |
`<nmmintrin.h>` | SSE4.2 |
`<wmmintrin.h>` | AES |
`sse2neon` aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extensions.
It tries to provide NEON-equivalent implementations for all widely used SSE intrinsics.
Some SSE intrinsics map directly to a single NEON intrinsic. Others lack a 1-to-1
mapping, which means their equivalents are implemented using several NEON intrinsics.
For example, SSE intrinsic `_mm_loadu_si128` has a direct NEON mapping (`vld1q_s32`),
but SSE intrinsic `_mm_maddubs_epi16` has to be implemented with 13+ NEON instructions.
## Usage
- Put the file `sse2neon.h` into your source code directory.
- Locate the following SSE header files included in the code:
```C
#include <xmmintrin.h>
#include <emmintrin.h>
```
{p,t,s,n,w}mmintrin.h should be replaceable as well, but the coverage of these extensions might be limited.
- Replace them with:
```C
#include "sse2neon.h"
```
- Explicitly specify platform-specific options to gcc/clang compilers.
* On ARMv8-A 64-bit targets, you should specify the following compiler option: (Remove `crypto` and/or `crc` if your architecture does not support cryptographic and/or CRC32 extensions)
```shell
-march=armv8-a+fp+simd+crypto+crc
```
* On ARMv8-A 32-bit targets, you should specify the following compiler option:
```shell
-mfpu=neon-fp-armv8
```
* On ARMv7-A targets, you need to append the following compiler option:
```shell
-mfpu=neon
```
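If the same source file must keep building on x86 as well, one possible variation of the steps above is to select the header per target. This is a minimal sketch, assuming `sse2neon.h` sits next to the source file; the helper name `add4_epi32` is hypothetical and only serves to exercise a few SSE2 intrinsics.

```C
#include <stdint.h>

/* Native SSE2 header on x86, sse2neon.h on Arm targets. */
#if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM64)
#include "sse2neon.h"
#else
#include <emmintrin.h>
#endif

/* Add four 32-bit integers lane by lane with SSE2 intrinsics.
 * Unaligned loads and stores are used, so the arrays need no
 * special alignment. */
static void add4_epi32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *) a);
    __m128i vb = _mm_loadu_si128((const __m128i *) b);
    _mm_storeu_si128((__m128i *) out, _mm_add_epi32(va, vb));
}
```

The function body is identical on both targets; only the include differs, and on Arm the compiler options listed above still apply.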
## Compile-time Configurations
Considering the balance between correctness and performance, `sse2neon` recognizes the following compile-time configurations:
* `SSE2NEON_PRECISE_MINMAX`: Enable precise implementation of `_mm_min_ps` and `_mm_max_ps`. Enable it if you need results that are consistent in special cases such as NaN inputs.
* `SSE2NEON_PRECISE_DIV`: Enable precise implementation of `_mm_rcp_ps` and `_mm_div_ps` by additional Newton-Raphson iteration for accuracy.
* `SSE2NEON_PRECISE_SQRT`: Enable precise implementation of `_mm_sqrt_ps` and `_mm_rsqrt_ps` by additional Newton-Raphson iteration for accuracy.
* `SSE2NEON_PRECISE_DP`: Enable precise implementation of `_mm_dp_pd`. When the conditional bit is not set, the corresponding multiplication would not be executed.
The above are turned off by default, and you should define the corresponding macro(s) as `1` before including `sse2neon.h` if you need the precise implementations.
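As a rough illustration of the ordering requirement, the sketch below defines two of the macros before the include. It assumes `sse2neon.h` is on the include path when building for Arm and falls back to the native header on x86 so the snippet compiles on either; the helper name `min4_ps` is hypothetical.

```C
#if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM64)
/* The macros must be visible before sse2neon.h is processed. */
#define SSE2NEON_PRECISE_MINMAX 1
#define SSE2NEON_PRECISE_DIV 1
#include "sse2neon.h"
#else
/* On x86 the native intrinsics are used and the macros are not needed. */
#include <xmmintrin.h>
#endif

/* Lane-wise minimum of two groups of four floats. */
static void min4_ps(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_min_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

Passing the definition on the compiler command line (for example `-DSSE2NEON_PRECISE_MINMAX=1`) should have the same effect, since it also takes place before any header is read.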
## Run Built-in Test Suite
`sse2neon` provides a unified interface for developing test cases. These test
cases are located in the `tests` directory, and the input data is specified at
runtime. Run the test cases with the following command:
```shell
$ make check
```
You can specify a GNU toolchain for cross compilation as well.
[QEMU](https://www.qemu.org/) should be installed in advance.
```shell
$ make CROSS_COMPILE=aarch64-linux-gnu- check # ARMv8-A
```
or
```shell
$ make CROSS_COMPILE=arm-linux-gnueabihf- check # ARMv7-A
```
Check the details via [Test Suite for SSE2NEON](tests/README.md).
## Adoptions
Here is a partial list of open source projects that have adopted `sse2neon` for Arm/Aarch64 support.
* [Aaru Data Preservation Suite](https://www.aaru.app/) is a fully-featured software package to preserve all storage media from the very old to the cutting edge, as well as to give detailed information about any supported image file (whether from Aaru or not) and to extract the files from those images.
* [aether-game-utils](https://github.com/johnhues/aether-game-utils) is a collection of cross platform utilities for quickly creating small game prototypes in C++.
* [ALE](https://github.com/sc932/ALE), aka Assembly Likelihood Evaluation, is a tool for evaluating accuracy of assemblies without the need of a reference genome.
* [Apache Doris](https://doris.apache.org/) is a Massively Parallel Processing (MPP) based interactive SQL data warehouse for reporting and analysis.
* [Apache Impala](https://impala.apache.org/) provides lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.
* [Apache Kudu](https://kudu.apache.org/) completes Hadoop's storage layer to enable fast analytics on fast data.
* [ART](https://github.com/dinosaure/art) is an implementation in OCaml of [Adaptive Radix Tree](https://db.in.tum.de/~leis/papers/ART.pdf) (ART).
* [Async](https://github.com/romange/async) is a set of C++ primitives that allows efficient and rapid development in C++17 on GNU/Linux systems.
* [avec](https://github.com/unevens/avec) is a little library for using SIMD instructions on both x86 and Arm.
* [BEAGLE](https://github.com/beagle-dev/beagle-lib) is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages.
* [BitMagic](https://github.com/tlk00/BitMagic) implements compressed bit-vectors and containers (vectors) based on ideas of bit-slicing transform and Rank-Select compression, offering sets of methods to architect your applications to use HPC techniques to save memory (thus be able to fit more data in one compute unit) and improve storage and traffic patterns when storing data vectors and models in files or object stores.
* [bipartite_motif_finder](https://github.com/soedinglab/bipartite_motif_finder), also known as BMF (Bipartite Motif Finder), is an open source tool for finding co-occurrences of sequence motifs in genomic sequences.
* [Blender](https://www.blender.org/) is the free and open source 3D creation suite, supporting the entirety of the 3D pipeline.
* [Boo](https://github.com/AxioDL/boo) is a cross-platform windowing and event manager similar to SDL or SFML, with additional 3D rendering functionality.
* [CARTA](https://github.com/CARTAvis/carta-backend) is a new visualization tool designed for viewing radio astronomy images in CASA, FITS, MIRIAD, and HDF5 formats (using the IDIA custom schema for HDF5).
* [Catcoon](https://github.com/i-evi/catcoon) is a [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) implementation in C.
* [compute-runtime](https://github.com/intel/compute-runtime), the Intel Graphics Compute Runtime for oneAPI Level Zero and OpenCL Driver, provides compute API support (Level Zero, OpenCL) for Intel graphics hardware architectures (HD Graphics, Xe).
* [Cog](https://github.com/losnoco/Cog) is a free and open source audio player for macOS.
* [dab-cmdline](https://github.com/JvanKatwijk/dab-cmdline) provides entries for the functionality to handle Digital audio broadcasting (DAB)/DAB+ through some simple calls.
* [DISTRHO](https://distrho.sourceforge.io/) is an open-source project for Cross-Platform Audio Plugins.
* [EDGE](https://github.com/3dfxdev/EDGE) is an advanced OpenGL source port spawned from the DOOM engine, with focus on easy development and expansion for modders and end-users.
* [Embree](https://github.com/embree/embree) is a collection of high-performance ray tracing kernels. Its target users are graphics application engineers who want to improve the performance of their photo-realistic rendering application by leveraging Embree's performance-optimized ray tracing kernels.
* [emp-tool](https://github.com/emp-toolkit/emp-tool) aims to provide a benchmark for secure computation and to allow other researchers to experiment and extend.
* [Exudyn](https://github.com/jgerstmayr/EXUDYN) is a C++ based Python library for efficient simulation of flexible multibody dynamics systems.
* [FoundationDB](https://www.foundationdb.org) is a distributed database designed to handle large volumes of structured data across clusters of commodity servers.
* [gmmlib](https://github.com/intel/gmmlib) is the Intel Graphics Memory Management Library that provides device specific and buffer management for the Intel Graphics Compute Runtime for OpenCL and the Intel Media Driver for VAAPI.
* [iqtree2](https://github.com/iqtree/iqtree2) is an efficient and versatile stochastic implementation to infer phylogenetic trees by maximum likelihood.
* [IResearch](https://github.com/iresearch-toolkit/iresearch) is a cross-platform, high-performance document oriented search engine library written entirely in C++ with the focus on a pluggability of different ranking/similarity models.
* [kram](https://github.com/alecazam/kram) is a wrapper to several popular encoders to and from PNG/[KTX](https://www.khronos.org/opengles/sdk/tools/KTX/file_format_spec/) files with [LDR/HDR and BC/ASTC/ETC2](https://developer.arm.com/solutions/graphics-and-gaming/developer-guides/learn-the-basics/adaptive-scalable-texture-compression/single-page).
* [libCML](https://github.com/belosthomas/libCML) is a SLAM library and scientific tool, which includes a novel fast thread-safe graph map implementation.
* [libscapi](https://github.com/cryptobiu/libscapi) stands for the "Secure Computation API", providing reliable, efficient, and highly flexible cryptographic infrastructure.
* [libmatoya](https://github.com/matoya/libmatoya) is a cross-platform application development library, providing various features such as common cryptography tasks.
* [Loosejaw](https://github.com/TheHolyDiver/Loosejaw) provides deep hybrid CPU/GPU digital signal processing.
* [Madronalib](https://github.com/madronalabs/madronalib) enables efficient audio DSP on SIMD processors with readable and brief C++ code.
* [minimap2](https://github.com/lh3/minimap2) is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database.
* [MMseqs2](https://github.com/soedinglab/MMseqs2) (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets.
* [MRIcroGL](https://github.com/rordenlab/MRIcroGL) is a cross-platform tool for viewing NIfTI, DICOM, MGH, MHD, NRRD, AFNI format medical images.
* [N2](https://github.com/oddconcepts/n2o) is an approximate nearest neighbor algorithm library written in C++, providing a much faster search speed than other implementations when modeling large datasets.
* [nanors](https://github.com/sleepybishop/nanors) is a tiny, performant implementation of [Reed-Solomon codes](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction), capable of reaching multi-gigabit speeds on a single core.
* [niimath](https://github.com/rordenlab/niimath) is a general image calculator with superior performance.
* [NVIDIA GameWorks](https://developer.nvidia.com/gameworks-source-github) has been already used in a lot of games. These repositories are public on GitHub.
* [ofxNDI](https://github.com/leadedge/ofxNDI) is an [openFrameworks](https://openframeworks.cc/) addon to allow sending and receiving images over a network using the [NewTek](https://en.wikipedia.org/wiki/NewTek) Network Device Protocol.
* [OGRE](https://github.com/OGRECave/ogre) is a scene-oriented, flexible 3D engine written in C++ designed to make it easier and more intuitive for developers to produce games and demos utilising 3D hardware.
* [Olive](https://github.com/olive-editor/olive) is a free non-linear video editor for Windows, macOS, and Linux.
* [OpenXRay](https://github.com/OpenXRay/xray-16) is an improved version of the X-Ray engine, used in world famous S.T.A.L.K.E.R. game series by GSC Game World.
* [parallel-n64](https://github.com/libretro/parallel-n64) is an optimized/rewritten Nintendo 64 emulator made specifically for [Libretro](https://www.libretro.com/).
* [PFFFT](https://github.com/marton78/pffft) does 1D Fast Fourier Transforms, of single precision real and complex vectors.
* [pixaccess](https://github.com/oliverue/pixaccess) provides the abstractions for integer and float bitmaps, pixels, and aliased (nearest neighbor) and anti-aliased (bi-linearly interpolated) pixel access.
* [PlutoSDR Firmware](https://github.com/seanstone/plutosdr-fw) is the customized firmware for the [PlutoSDR](https://wiki.analog.com/university/tools/pluto) that can be used to introduce fundamentals of Software Defined Radio (SDR) or Radio Frequency (RF) or Communications as advanced topics in electrical engineering in a self-led or instructor-led setting.
* [Pygame](https://www.pygame.org) is cross-platform and designed to make it easy to write multimedia software, such as games, in Python.
* [R:RandomFieldsUtils](https://cran.r-project.org/web/packages/RandomFieldsUtils) provides various utilities that might be used in spatial statistics and elsewhere. (CRAN)
* [rkcommon](https://github.com/ospray/rkcommon) represents a common set of C++ infrastructure and CMake utilities used by various components of [Intel oneAPI Rendering Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/rendering-toolkit.html).
* [RPCS3](https://github.com/RPCS3/rpcs3) is the world's first free and open-source PlayStation 3 emulator/debugger, written in C++.
* [simd_utils](https://github.com/JishinMaster/simd_utils) is a header-only library implementing common mathematical functions using SIMD intrinsics.
* [SMhasher](https://github.com/rurban/smhasher) provides comprehensive Hash function quality and speed tests.
* [Spack](https://github.com/spack/spack) is a multi-platform package manager that builds and installs multiple versions and configurations of software.
* [srsLTE](https://github.com/srsLTE/srsLTE) is an open source SDR LTE software suite.
* [SSW](https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library) is a fast implementation of the [Smith-Waterman algorithm](https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm), which uses the SIMD instructions to parallelize the algorithm at the instruction level.
* [Surge](https://github.com/surge-synthesizer/surge) is an open source digital synthesizer.
* [XEVE](https://github.com/mpeg5/xeve) (eXtra-fast Essential Video Encoder) is an open sourced and fast MPEG-5 EVC encoder.
* [XMRig](https://github.com/xmrig/xmrig) is an open source CPU miner for [Monero](https://web.getmonero.org/) cryptocurrency.
## Related Projects
* [SIMDe](https://github.com/simd-everywhere/simde): fast and portable implementations of SIMD
intrinsics on hardware which doesn't natively support them, such as calling SSE functions on ARM.
* [CatBoost's sse2neon](https://github.com/catboost/catboost/blob/master/library/cpp/sse/sse2neon.h)
* [ARM\_NEON\_2\_x86\_SSE](https://github.com/intel/ARM_NEON_2_x86_SSE)
* [AvxToNeon](https://github.com/kunpengcompute/AvxToNeon)
* [sse2rvv](https://github.com/FeddrickAquino/sse2rvv): A C header file that converts Intel SSE intrinsics to RISC-V Vector intrinsics.
* [sse2msa](https://github.com/i-evi/sse2msa): A C/C++ header file that converts Intel SSE intrinsics to MIPS/MIPS64 MSA intrinsics.
* [POWER/PowerPC support for GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000) contains a series of headers simplifying porting x86\_64 code that makes explicit use of Intel intrinsics to powerpc64le (pure little-endian mode that has been introduced with the [POWER8](https://en.wikipedia.org/wiki/POWER8)).
- implementation: [xmmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/xmmintrin.h), [emmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/emmintrin.h), [pmmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/pmmintrin.h), [tmmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/tmmintrin.h), [smmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/smmintrin.h)
## Reference
* [Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/)
* [Arm Neon Intrinsics Reference](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
* [Neon Programmer's Guide for Armv8-A](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/neon-programmers-guide-for-armv8-a)
* [NEON Programmer's Guide](https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf)
* [qemu/target/i386/ops_sse.h](https://github.com/qemu/qemu/blob/master/target/i386/ops_sse.h): Comprehensive SSE instruction emulation in C. Ideal for semantic checks.
* [Porting Takua Renderer to 64-bit ARM- Part 1](https://blog.yiningkarlli.com/2021/05/porting-takua-to-arm-pt1.html)
* [Porting Takua Renderer to 64-bit ARM- Part 2](https://blog.yiningkarlli.com/2021/07/porting-takua-to-arm-pt2.html)
* [Comparing SIMD on x86-64 and arm64](https://blog.yiningkarlli.com/2021/09/neon-vs-sse.html)
* [Getting started with AWS Graviton](https://github.com/aws/aws-graviton-getting-started)
* [Port with SSE2Neon and SIMDe](https://developer.arm.com/documentation/102581/0200/Port-with-SSE2Neon-and-SIMDe)
* [Genomics: Optimizing the BWA aligner for Arm Servers](https://community.arm.com/arm-community-blogs/b/high-performance-computing-blog/posts/optimizing-genomics-and-the-bwa-aligner-for-arm-servers)
## Licensing
`sse2neon` is freely redistributable under the MIT License.

8801
cpp/sse2neon/sse2neon.h Normal file

File diff suppressed because it is too large Load Diff

84
hs-leopard.cabal Normal file
View File

@ -0,0 +1,84 @@
Cabal-Version: 2.4
Name: hs-leopard
Version: 0.0.1
Synopsis: Haskell bindings to the Leopard fast erasure coding library
Description: Haskell bindings to the Leopard fast erasure coding library.
License: BSD-3-Clause
License-files: LICENSE
Author: Balazs Komuves
Copyright: (c) 2026 Logos
Maintainer: balazs (at) free (dot) technology
Stability: Experimental
Category: Cryptography
Tested-With: GHC == 9.12.1
Build-Type: Simple
--------------------------------------------------------------------------------
extra-source-files: cpp/LeopardCommon.h
cpp/LeopardFF16.h
cpp/LeopardFF8.h
cpp/leopard.h
cpp/sse2neon/sse2neon.h
cpp/LICENSE
README.md
LICENSE
--------------------------------------------------------------------------------
source-repository head
type: git
location: https://github.com/logos-storage/hs-leopard
--------------------------------------------------------------------------------
Library
Build-Depends: base >= 4 && <5,
array >= 0.5 && < 0.6,
random >= 1.3 && < 1.4,
bytestring >= 0.12 && < 0.14
Exposed-Modules: Leopard
Leopard.Codec
Leopard.Example
Leopard.Binding
Leopard.Types
Leopard.Misc
Default-Language: Haskell2010
Default-Extensions: BangPatterns
Hs-Source-Dirs: src
Include-Dirs: cpp
CXX-Sources: cpp/LeopardCommon.cpp
cpp/LeopardFF16.cpp
cpp/LeopardFF8.cpp
cpp/leopard.cpp
Default-Extensions: ForeignFunctionInterface, CPP
ghc-options: -fwarn-tabs -fno-warn-unused-matches -fno-warn-name-shadowing -fno-warn-unused-imports
cc-options: -x c++
cxx-options: -O3 -std=c++11 -lm
extra-libraries: stdc++
--------------------------------------------------------------------------------
Executable testMain
build-depends: base >= 4 && < 5,
bytestring >= 0.12 && < 0.14,
hs-leopard
hs-source-dirs: test
main-is: testMain.hs
Default-Language: Haskell2010
--------------------------------------------------------------------------------

16
src/Leopard.hs Normal file
View File

@ -0,0 +1,16 @@
module Leopard where
--------------------------------------------------------------------------------
import Data.Bits
import Data.Word
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Leopard.Codec
import Leopard.Types
--------------------------------------------------------------------------------

258
src/Leopard/Binding.hs Normal file
View File

@ -0,0 +1,258 @@
-- | Note: This is an internal module; use @Leopard.Codec@ instead
{-# LANGUAGE ForeignFunctionInterface, CPP, Strict, ScopedTypeVariables #-}
module Leopard.Binding where
--------------------------------------------------------------------------------
import Data.Word
import Data.Array
import Data.Maybe
import Control.Monad
import Foreign.C
import Foreign.C.Types
import Foreign.Ptr
import Foreign.Storable
import Foreign.Marshal
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Leopard.Types
import Leopard.Misc
--------------------------------------------------------------------------------
-- * error handling
data LeopardResult
= Success -- ^ Operation succeeded
| NeedMoreData -- ^ Not enough recovery data received
| TooMuchData -- ^ Buffer counts are too high
| InvalidSize -- ^ Buffer size must be a multiple of 64 bytes
| InvalidCounts -- ^ Invalid counts provided
| InvalidInput -- ^ A function parameter was invalid
| Platform -- ^ Platform is unsupported
| CallInitialize -- ^ Call leo_init() first
deriving (Eq,Show)
instance Enum LeopardResult where
toEnum ( 0) = Success -- Operation succeeded
toEnum (-1) = NeedMoreData -- Not enough recovery data received
toEnum (-2) = TooMuchData -- Buffer counts are too high
toEnum (-3) = InvalidSize -- Buffer size must be a multiple of 64 bytes
toEnum (-4) = InvalidCounts -- Invalid counts provided
toEnum (-5) = InvalidInput -- A function parameter was invalid
toEnum (-6) = Platform -- Platform is unsupported
toEnum (-7) = CallInitialize -- Call leo_init() first
toEnum _ = error "invalid leopard error code"
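fromEnum Success = 0
fromEnum NeedMoreData = -1
fromEnum TooMuchData = -2
fromEnum InvalidSize = -3
fromEnum InvalidCounts = -4
fromEnum InvalidInput = -5
fromEnum Platform = -6
fromEnum CallInitialize = -7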
decodeLeopardResult :: LeopardResult -> Maybe String
decodeLeopardResult result = case result of
Success -> Nothing -- "Operation succeeded"
NeedMoreData -> Just "Not enough recovery data received"
TooMuchData -> Just "Buffer counts are too high"
InvalidSize -> Just "Buffer size must be a multiple of 64 bytes"
InvalidCounts -> Just "Invalid counts provided"
InvalidInput -> Just "A function parameter was invalid"
Platform -> Just "Platform is unsupported"
CallInitialize -> Just "Call leo_init() first"
--------------------------------------------------------------------------------
-- * C++ bindings
{-# NOINLINE initLeopard #-}
initLeopard :: IO ()
initLeopard = do
res <- cpp_leo_init leo_VERSION
if (res == 0)
then return ()
else fail "Leopard initialization failed"
withLeopard :: IO a -> IO a
withLeopard action = do
initLeopard
action
unsafeEncodeIOList :: ECParams -> [ByteString] -> IO (Either LeopardResult [ByteString])
unsafeEncodeIOList ecParams inputChunks = do
ei <- unsafeEncodeIO ecParams (arrayFromList inputChunks)
return $ case ei of
Left err -> Left err
Right arr -> Right (elems arr)
--------------------------------------------------------------------------------
-- | Takes @K@ input chunks, and returns @M = N - K@ parity chunks.
--
-- All chunks must have the same size, and that size must be a multiple of
-- 64 bytes, as the underlying `leopard` library requires.
--
{-# NOINLINE unsafeEncodeIO #-}
unsafeEncodeIO :: ECParams -> Array Int ByteString -> IO (Either LeopardResult (Array Int ByteString))
unsafeEncodeIO ecParams@(ECParams k n) inputChunks = do
let m = n - k
work_cnt <- cpp_leo_encode_work_count (fromIntegral k) (fromIntegral m)
when (work_cnt == 0) $ fail "encode: `leo_encode_work_count` claims invalid input"
let work_cnt_int = fromIntegral work_cnt :: Int
let nchunks = arrayLength inputChunks
let sizes = map B.length (elems inputChunks)
let mb_chunk_size = isUniformList sizes
unless (k == nchunks) $ fail "encode: we need exactly K input chunks"
unless (isJust mb_chunk_size) $ fail "encode: chunk size must be uniform"
let chunk_size = fromJust mb_chunk_size
unless (isDivisibleBy64 chunk_size) $ fail "encode: chunk size should be divisible by 64"
allocaArray nchunks $ \(porigs :: Ptr PtrWord8) -> do
flipZipWithM_ [0..] (elems inputChunks) $ \idx bs -> withByteString bs $ \len ptr -> pokeElemOff porigs idx ptr
allocaArrays (replicate work_cnt_int chunk_size) $ \(ptrs :: [PtrWord8]) -> do
allocaArray work_cnt_int $ \(pworks :: Ptr PtrWord8) -> do
flipZipWithM_ [0..] ptrs $ \idx ptr -> pokeElemOff pworks idx ptr
res <- cpp_leo_encode
(fromIntegral chunk_size) -- Number of bytes in each data buffer
(fromIntegral k) -- Number of original_data[] buffer pointers
(fromIntegral m) -- Number of recovery_data[] buffer pointers
(fromIntegral work_cnt) -- Number of work_data[] buffer pointers, from leo_encode_work_count()
porigs -- Array of pointers to original data buffers
pworks -- Array of work buffers
if res /= 0
then return (Left $ toEnum $ fromIntegral res)
else do
parityChunks <- forM [0..m-1] $ \j -> do
ptr <- peekElemOff pworks j
createByteString chunk_size ptr
return $ Right $ listArray (0,m-1) parityChunks
--------------------------------------------------------------------------------
unsafeDecodeIOList :: ECParams -> [Maybe ByteString] -> IO (Either LeopardResult [ByteString])
unsafeDecodeIOList ecParams mbChunks = do
ei <- unsafeDecodeIO ecParams (arrayFromList mbChunks)
return $ case ei of
Left err -> Left err
Right arr -> Right (elems arr)
{-# NOINLINE unsafeDecodeIO #-}
unsafeDecodeIO :: ECParams -> Array Int (Maybe ByteString) -> IO (Either LeopardResult (Array Int ByteString))
unsafeDecodeIO ecParams@(ECParams k n) mbChunks = do
let m = n - k
work_cnt <- cpp_leo_decode_work_count (fromIntegral k) (fromIntegral m)
when (work_cnt == 0) $ fail "decode: `leo_decode_work_count` claims invalid input"
let work_cnt_int = fromIntegral work_cnt :: Int
let nchunks = arrayLength mbChunks
let sizes = map B.length (catMaybes $ elems mbChunks)
let mb_chunk_size = isUniformList sizes
unless (n == nchunks) $ fail "decode: we need exactly N encoded chunks"
unless (isJust mb_chunk_size) $ fail "decode: chunk size must be uniform"
let chunk_size = fromJust mb_chunk_size
unless (isDivisibleBy64 chunk_size) $ fail "decode: chunk size should be divisible by 64"
let (origChunks,parityChunks) = splitAt k (elems mbChunks)
allocaArray k $ \(porigs :: Ptr PtrWord8) -> do
flipZipWithM_ [0..] origChunks $ \idx mb -> case mb of
Just bs -> withByteString bs $ \len ptr -> pokeElemOff porigs idx ptr
Nothing -> pokeElemOff porigs idx nullPtr
allocaArray m $ \(pparity :: Ptr PtrWord8) -> do
flipZipWithM_ [0..] parityChunks $ \idx mb -> case mb of
Just bs -> withByteString bs $ \len ptr -> pokeElemOff pparity idx ptr
Nothing -> pokeElemOff pparity idx nullPtr
allocaArrays (replicate work_cnt_int chunk_size) $ \(ptrs :: [PtrWord8]) -> do
allocaArray work_cnt_int $ \(pworks :: Ptr PtrWord8) -> do
flipZipWithM_ [0..] ptrs $ \idx ptr -> pokeElemOff pworks idx ptr
res <- cpp_leo_decode
(fromIntegral chunk_size) -- Number of bytes in each data buffer
(fromIntegral k) -- Number of original_data[] buffer pointers
(fromIntegral m) -- Number of recovery_data[] buffer pointers
(fromIntegral work_cnt) -- Number of work_data[] buffer pointers, from leo_decode_work_count()
porigs -- Array of pointers to original data buffers
pparity -- Array of recovery data buffers
pworks -- Array of work buffers
if res /= 0
then return (Left $ toEnum $ fromIntegral res)
else do
finalChunks <- forM [0..k-1] $ \j -> case origChunks!!j of
Just orig -> return orig
Nothing -> do
ptr <- peekElemOff pworks j
createByteString chunk_size ptr
return $ Right $ listArray (0,k-1) finalChunks
--------------------------------------------------------------------------------
type PtrWord8 = Ptr Word8
leo_VERSION :: CInt
leo_VERSION = 2
foreign import ccall "leo_init_" cpp_leo_init :: CInt -> IO CInt
foreign import ccall "leo_result_string" cpp_leo_result_string :: CInt -> IO CString
----------------------------------------
{-
LEO_EXPORT unsigned leo_encode_work_count(
unsigned original_count,
unsigned recovery_count);
-}
foreign import ccall "leo_encode_work_count" cpp_leo_encode_work_count :: CUInt -> CUInt -> IO CUInt
foreign import ccall "leo_decode_work_count" cpp_leo_decode_work_count :: CUInt -> CUInt -> IO CUInt
----------------------------------------
{-
LEO_EXPORT LeopardResult leo_encode(
uint64_t buffer_bytes, // Number of bytes in each data buffer
unsigned original_count, // Number of original_data[] buffer pointers
unsigned recovery_count, // Number of recovery_data[] buffer pointers
unsigned work_count, // Number of work_data[] buffer pointers, from leo_encode_work_count()
const void* const * const original_data, // Array of pointers to original data buffers
void** work_data); // Array of work buffers
-}
--
-- * `buffer_bytes` must be a multiple of 64
-- * Each buffer should have the same number of bytes.
-- * Even the last piece must be rounded up to the block size.
-- * The first set of recovery_count buffers in work_data will be the result.
--
foreign import ccall "leo_encode" cpp_leo_encode :: Word64 -> CUInt -> CUInt -> CUInt -> Ptr (Ptr a) -> Ptr (Ptr a) -> IO CInt
{-
LEO_EXPORT LeopardResult leo_decode(
uint64_t buffer_bytes, // Number of bytes in each data buffer
unsigned original_count, // Number of original_data[] buffer pointers
unsigned recovery_count, // Number of recovery_data[] buffer pointers
unsigned work_count, // Number of buffer pointers in work_data[]
const void* const * const original_data, // Array of original data buffers
const void* const * const recovery_data, // Array of recovery data buffers
void** work_data);
-}
foreign import ccall "leo_decode" cpp_leo_decode :: Word64 -> CUInt -> CUInt -> CUInt -> Ptr (Ptr a) -> Ptr (Ptr a) -> Ptr (Ptr a) -> IO CInt
--------------------------------------------------------------------------------

36
src/Leopard/Codec.hs Normal file
View File

@ -0,0 +1,36 @@
{-# LANGUAGE Strict #-}
module Leopard.Codec
( LeopardResult
,
)
where
--------------------------------------------------------------------------------
import Data.Bits
import Data.Word
import Data.Array
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Leopard.Binding
import Leopard.Types
import Leopard.Misc
--------------------------------------------------------------------------------
{-
{-# NOINLINE #-}
encodeIO :: ECParams -> ByteString -> IO EncodedData
encodeIO ecParams@(ECParams k n) input
let m = n - k
let orig_size = B.length input
let chunk_size_0 = ceilDiv orig_size k
let chunk_size = roundUpToMultipleOf 64 chunk_size_0
-}
--------------------------------------------------------------------------------

113
src/Leopard/Example.hs Normal file
View File

@ -0,0 +1,113 @@
module Leopard.Example where
--------------------------------------------------------------------------------
import Data.Word
import Data.Array
import Data.Maybe
import Control.Monad
import System.Random
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Leopard.Codec
import Leopard.Binding
import Leopard.Types
import Leopard.Misc
--------------------------------------------------------------------------------
init_ :: IO ()
init_ = initLeopard
--------------------------------------------------------------------------------
maxChunks :: Int
maxChunks = 20
exampleLowLevel :: IO ()
exampleLowLevel = void (exampleLowLevel' True)
testLowLevel :: Int -> IO Bool
testLowLevel howMany = do
oks <- replicateM howMany (exampleLowLevel' False)
return (and oks)
exampleLowLevel' :: Bool -> IO Bool
exampleLowLevel' doPrint = withLeopard $ do
k <- randomRIO (2,maxChunks)
m <- randomRIO (1,k)
let n = k + m
let ecp = ECParams
{ _ecK = k
, _ecN = n
}
-- let chunkSize = 64
chunkSize <- ((\x -> x * 64) <$> randomRIO (1,100))
exampleLowLevel'' ecp chunkSize doPrint
--------------------------------------------------------------------------------
exampleLowLevel'' :: ECParams -> Int -> Bool -> IO Bool
exampleLowLevel'' ecp@(ECParams k n) chunkSize doPrint = do
let m = n - k
when doPrint $ do
putStrLn "Leopard example (low level)"
putStrLn "---------------------------"
putStrLn $ "K = " ++ show k
putStrLn $ "N = " ++ show n
putStrLn $ "M = " ++ show m
putStrLn $ "chunk size = " ++ show chunkSize ++ " bytes"
origs <- replicateM k (randomByteString chunkSize)
parity <- failIfLeft =<< unsafeEncodeIOList ecp origs
let encoded = arrayFromList (origs ++ parity)
nbad <- randomRIO (0,m)
when doPrint $ putStrLn $ "#lost chunks = " ++ show nbad
partial <- elems <$> maskRandomly nbad encoded
let ngood = sum [ 1 | Just _ <- partial ]
unless (nbad + ngood == n) $ error "fatal: nbad + ngood /= N"
-- when doPrint $ print $ map isJust partial
decoded <- failIfLeft =<< unsafeDecodeIOList ecp partial
let ok = (origs == decoded)
when doPrint $ putStrLn $ "reconstruction successful = " ++ show ok
{-
when doPrint $ do
printChunks "original" origs
printChunks "parity" parity
printChunks "reconstructed" decoded
-}
return ok
--------------------------------------------------------------------------------
failIfLeft :: Either LeopardResult a -> IO a
failIfLeft (Left err) = fail (fromMaybe "unknown Leopard error" (decodeLeopardResult err))
failIfLeft (Right res) = return res
--------------------------------------------------------------------------------

printChunks :: String -> [ByteString] -> IO ()
printChunks title bss = do
  putStrLn ""
  putStrLn title
  putStrLn (replicate (length title) '-')
  flipZipWithM_ [0..] bss $ \idx bs -> do
    putStrLn $ " - " ++ show idx ++ ": " ++ byteStringToHexString bs

--------------------------------------------------------------------------------

155
src/Leopard/Misc.hs Normal file

@ -0,0 +1,155 @@
{-# LANGUAGE Strict #-}
module Leopard.Misc where
--------------------------------------------------------------------------------
import Data.Bits
import Data.Word
import Data.Array
import Control.Monad
import System.Random
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.Marshal
import Foreign.Storable
import Text.Printf
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
--------------------------------------------------------------------------------
-- * Integer logarithm

-- | Largest integer @k@ such that @2^k@ is smaller than or equal to @n@.
-- The input must be non-negative; the result is @-1@ for @n = 0@.
integerLog2' :: Integer -> Int
integerLog2' n = go n where
  go 0 = -1
  go k = 1 + go (shiftR k 1)

-- | Smallest integer @k@ such that @2^k@ is larger than or equal to @n@.
-- The input must be non-negative.
ceilingLog2' :: Integer -> Int
ceilingLog2' 0 = 0
ceilingLog2' n = 1 + go (n-1) where
  go 0 = -1
  go k = 1 + go (shiftR k 1)

integerLog2 :: Int -> Int
integerLog2 = integerLog2' . fromIntegral

ceilingLog2 :: Int -> Int
ceilingLog2 = ceilingLog2' . fromIntegral
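
-- Illustrative sanity check, not part of the original commit (the name
-- 'logSanityCheck' is hypothetical). It spells out a few hand-computed values
-- of the two logarithms above, including the edge cases at powers of two.
logSanityCheck :: Bool
logSanityCheck = and
  [ integerLog2 1 == 0     -- 2^0 = 1 <= 1
  , integerLog2 7 == 2     -- 2^2 = 4 <= 7 < 8
  , integerLog2 8 == 3     -- exact power of two
  , ceilingLog2 1 == 0     -- 2^0 = 1 >= 1
  , ceilingLog2 8 == 3     -- exact power of two
  , ceilingLog2 9 == 4     -- 2^3 = 8 < 9 <= 16
  ]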
--------------------------------------------------------------------------------
-- * Division

-- | @ceil( a / b )@, for positive @b@
ceilDiv :: Int -> Int -> Int
ceilDiv a b = div (a+b-1) b

isDivisibleBy64 :: Int -> Bool
isDivisibleBy64 n = (mod n 64 == 0)

-- | Rounds @x@ up to the nearest multiple of @size@ (the first argument)
roundUpToMultipleOf :: Int -> Int -> Int
roundUpToMultipleOf size x = size * (ceilDiv x size)
--------------------------------------------------------------------------------
-- * Bytestrings
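
-- | Splits a bytestring into pieces of @len@ bytes (the last piece can be shorter).
-- Assumes @len > 0@.
partitionBS :: Int -> ByteString -> [ByteString]
partitionBS len = go where
  go :: ByteString -> [ByteString]
  go bs = if B.null bs
    then []
    else B.take len bs : go (B.drop len bs)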
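
-- Illustrative check, not part of the original commit (the name
-- 'partitionExample' is hypothetical): 8 bytes cut into pieces of 3 bytes
-- give two full pieces and a shorter last one.
partitionExample :: Bool
partitionExample =
  partitionBS 3 (B.pack [1..8]) == [ B.pack [1,2,3], B.pack [4,5,6], B.pack [7,8] ]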

withByteString :: ByteString -> (Int -> Ptr Word8 -> IO a) -> IO a
withByteString bs@(BI.BS fptr len) action =
  withForeignPtr fptr $ \ptr -> action len ptr

createByteString :: Int -> Ptr Word8 -> IO ByteString
createByteString len src = BI.create len $ \tgt -> copyBytes tgt src len

randomByteString :: Int -> IO ByteString
randomByteString len = do
  xs <- replicateM len randomIO :: IO [Word8]
  return (B.pack xs)

byteStringToHexString :: ByteString -> String
byteStringToHexString = concatMap f . B.unpack where
  f :: Word8 -> String
  f = printf "%02x"
--------------------------------------------------------------------------------
-- * Arrays
arrayLength :: Array Int a -> Int
arrayLength arr = let (u,v) = bounds arr in v - u + 1
arrayFromList :: [a] -> Array Int a
arrayFromList xs = listArray (0,length xs - 1) xs
--------------------------------------------------------------------------------
-- * Random masks

-- | There will be @k@ @Nothing@-s in the resulting array
-- (so @k@ must not exceed the length of the array)
maskRandomly :: Int -> Array Int a -> IO (Array Int (Maybe a))
maskRandomly k arr = do
  mask <- randomBoolMask (arrayLength arr) k
  let (u,v) = bounds arr
  return $ listArray (u,v)
    [ if b then Just x else Nothing | (x,b) <- zip (elems arr) (elems mask) ]

-- | @randomBoolMask n k@ will give you @k@ falses and @(n-k)@ trues.
-- Precondition: @0 <= k <= n@, otherwise the rejection-sampling loop below
-- cannot finish successfully.
randomBoolMask :: Int -> Int -> IO (Array Int Bool)
randomBoolMask n k = go k trues where
  trues :: Array Int Bool
  trues = listArray (0,n-1) (replicate n True)
  go :: Int -> Array Int Bool -> IO (Array Int Bool)
  go 0 arr = return arr
  go k arr = do
    j <- randomRIO (0,n-1)
    case arr!j of
      True  -> go (k-1) (arr // [(j,False)])
      False -> go k arr
--------------------------------------------------------------------------------
-- * Marshal
allocaArrays :: Storable a => [Int] -> ([Ptr a] -> IO b) -> IO b
allocaArrays sizes action = go sizes [] where
  go []     ptrs = action (reverse ptrs)
  go (k:ks) ptrs = allocaArray k $ \ptr -> go ks (ptr : ptrs)
--------------------------------------------------------------------------------
-- * Monad
flipZipWithM_ :: Monad m => [a] -> [b] -> (a -> b -> m ()) -> m ()
flipZipWithM_ xs ys action = zipWithM_ action xs ys
--------------------------------------------------------------------------------
-- * Misc
-- | If all the elements of the input list are the same, then it returns that element.
-- The input list must be non-empty.
isUniformList :: Eq a => [a] -> Maybe a
isUniformList [] = error "isUniformList: empty input"
isUniformList (x0:x0s) = go x0s where
  go [] = Just x0
  go (u:us) = if u == x0
    then go us
    else Nothing

isUniformList_ :: Eq a => [a] -> a
isUniformList_ xs = case isUniformList xs of
  Just x  -> x
  Nothing -> error "isUniformList_: not a uniform list"
--------------------------------------------------------------------------------

62
src/Leopard/Types.hs Normal file

@ -0,0 +1,62 @@
{-# LANGUAGE Strict #-}
module Leopard.Types where
--------------------------------------------------------------------------------
import Data.Bits
import Data.Word
import Data.Array
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Leopard.Misc
--------------------------------------------------------------------------------
-- | Note: Because of a restriction of the underlying Leopard library, you should have
-- @K >= 2@, @K < N <= 2*K@ and @N <= 65536@ (see 'isValidECParams').
data ECParams = ECParams
  { _ecK :: Int -- ^ @K@ is the number of original chunks
  , _ecN :: Int -- ^ @N@ is the number of chunks after encoding
  }
  deriving (Eq,Show)

-- | Number of \"parity\" chunks
ecM :: ECParams -> Int
ecM params = _ecN params - _ecK params

isValidECParams :: ECParams -> Bool
isValidECParams (ECParams k n) = and
  [ k > 1
  , k <= 32768
  , k < n
  , n <= 2 * k
  ]
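
-- Illustrative check, not part of the original commit (the name
-- 'ecParamsExamples' is hypothetical): a few parameter choices, and whether
-- 'isValidECParams' accepts them.
ecParamsExamples :: Bool
ecParamsExamples = and
  [ isValidECParams (ECParams 4 6)                 -- K = 4 with M = 2 parity chunks
  , isValidECParams (ECParams 4 8)                 -- N = 2*K is still allowed
  , not (isValidECParams (ECParams 1 2))           -- K must be at least 2
  , not (isValidECParams (ECParams 4 4))           -- no parity chunk (K < N is required)
  , not (isValidECParams (ECParams 4 9))           -- N > 2*K
  , not (isValidECParams (ECParams 40000 50000))   -- K > 32768
  ]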
--------------------------------------------------------------------------------
data Encoding = Encoding
  { _ecParams     :: ECParams -- ^ the erasure coding parameters
  , _chunkSize    :: Int      -- ^ size of an EC chunk, in bytes
  , _origDataSize :: Int      -- ^ size of the original data; if not divisible by @K@, it can be smaller than @K x chunkSize@
  }
  deriving (Eq,Show)

isValidEncoding :: Encoding -> Bool
isValidEncoding (Encoding params@(ECParams k n) chunkSize dataSize) = and
  [ isValidECParams params
  , chunkSize == ceilDiv dataSize k
  , isDivisibleBy64 chunkSize
  ]
--------------------------------------------------------------------------------
data EncodedData = EncodedData
  { _encoding :: Encoding
  , _chunks   :: Array Int ByteString
  }
  deriving (Eq,Show)
--------------------------------------------------------------------------------

16
test/testMain.hs Normal file

@ -0,0 +1,16 @@
module Main where
--------------------------------------------------------------------------------
import Leopard.Codec
import Leopard.Example
--------------------------------------------------------------------------------
main :: IO ()
main = do
  exampleLowLevel
--------------------------------------------------------------------------------