Ethash with cpp

Last Updated on July 20, 2020

This is a rewrite of the official wiki page (REVISION 23) in C/C++. Basically all the code is copied from Here (release v0.3.0); the missing definitions can be found there. You can also treat this document as a guide for reading the cpp ethash project.

In short, compared with the Python version, the only difference is that the C++ version uses a “better” memory access pattern on the full_dataset (the DAG): it combines two adjacent 512-bit hash values into one hash1024. Accordingly, calc_dataset() (which writes to the full_dataset) and hashimoto() (which reads from the full_dataset) are modified.

Summary

Ethash is the planned PoW algorithm for Ethereum 1.0; it is the core algorithm used to mine an ETH block. The general route that the algorithm takes is as follows:

  1. There exists a seed that changes after every 30000 (EPOCH_LENGTH) blocks are mined.
  2. From the seed, one can compute a pseudorandom cache. All clients store the cache.
  3. From the cache, we can generate a dataset (often called the DAG), with the property that each item in the dataset depends on only a small number of items from the cache. Because calculating the dataset from the cache takes time, full clients and miners store the dataset to improve performance.
  4. Mining involves grabbing random slices of the dataset and hashing them together. Verification can be done with low memory by using the cache to regenerate the specific pieces of the dataset that you need, so light clients only need to store the cache.

The large dataset is updated once every 30000 blocks, so the vast majority of a miner’s effort will be reading the dataset, not making changes to it.

See https://github.com/ethereum/wiki/wiki/Ethash-Design-Rationale for design rationale considerations for this algorithm.

A naive ethash mining algorithm could be defined as follows:

// block_number: the block number of the block to be mined
// header_hash:  the header hash of the block to be mined
// target:       derived from the `difficulty` value of the system
int naive_mine(int block_number, hash256 header_hash, hash256 target)
{
    int epoch_number = calc_epoch_number(block_number);

    // By the way, given a seed, we could recover the epoch_number from it by loop-trying.
    hash256 seed = calculate_seed(epoch_number);
    int cache_item_num = ethash_calculate_light_cache_num_items(epoch_number);
    int dataset_item_num = ethash_calculate_full_dataset_num_items(epoch_number);

    std::vector<hash512> light_cache(cache_item_num);
    std::vector<hash1024> dataset(dataset_item_num);

    build_light_cache(light_cache.data(), light_cache.size(), seed);
    calc_dataset(dataset.data(), dataset.size(), light_cache.data(), light_cache.size());

    int nonce = 0;
    hash256 final_hash;
    do
    {
        ++nonce;
        final_hash = hash(dataset.data(), dataset.size(), header_hash, nonce);
    }
    while (!is_less_or_equal(final_hash, target));

    printf("a valid nonce found, work proved\n");
    return nonce;
}
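
The comparison helper is_less_or_equal is not defined in this document. A minimal sketch, assuming both values are interpreted as 256-bit big-endian unsigned integers (consistent with the endianness note below), could look like this:

// Hypothetical helper: compares two hash256 values interpreted as
// 256-bit big-endian unsigned integers, most significant byte first.
inline bool is_less_or_equal(const hash256& hash, const hash256& target) noexcept
{
    for (size_t i = 0; i < sizeof(hash.bytes); ++i)
    {
        if (hash.bytes[i] < target.bytes[i])
            return true;
        if (hash.bytes[i] > target.bytes[i])
            return false;
    }
    return true;  // all bytes equal
}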

Definitions

We employ the following definitions:


constexpr static int light_cache_init_size = 1 << 24;
constexpr static int light_cache_growth = 1 << 17;
constexpr static int light_cache_rounds = 3;
constexpr static int full_dataset_init_size = 1 << 30;
constexpr static int full_dataset_growth = 1 << 23;
constexpr static int full_dataset_item_parents = 256;
constexpr static int epoch_length = 30000;
constexpr static int num_dataset_accesses = 64;

union hash256
{
    uint64_t words[4];
    uint32_t hwords[8];
    uint8_t bytes[32];
};

union hash512
{
    uint64_t words[8];
    uint32_t half_words[16];
    uint8_t bytes[64];
};

// hash1024 is just a simple container which holds two hash512 values;
// it will be used when calculating the "mix hash".
union hash1024
{
    union hash512 hashes[2];
    uint64_t words[16];
    uint32_t hwords[32];
    uint8_t bytes[128];
};

int calc_epoch_number(int block_number)
{
    return block_number / epoch_length;
}

hash256 calculate_seed(int epoch_number) noexcept
{
    hash256 seed = {};
    for (int i = 0; i < epoch_number; ++i)
        seed = keccak256(seed);
    return seed;
}

A note regarding “SHA3” hashes described in this specification

Ethereum’s development coincided with the development of the SHA3 standard, and the
standards process made a late change in the padding of the finalized hash algorithm, so that Ethereum’s
“sha3_256” and “sha3_512” hashes are not standard sha3 hashes, but a variant often referred
to as “Keccak-256” and “Keccak-512” in other contexts. See discussion, e.g. here, here, or here.

Please keep that in mind as “sha3” hashes are referred to in the description of the algorithm below.

A note on endianness

There are many endianness conversions, big to little or little to big. This is because the “EVM” always uses BIG-ENDIAN while most x86 devices use LITTLE-ENDIAN.
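
The helpers fix_endianness and fix_endianness32 used throughout the code below are not defined in this document. As a rough sketch: on a little-endian host they are no-ops, while on a big-endian host they would byte-swap each word (fix_endianness32 swapping every 32-bit word of a hash). A hypothetical little-endian-only version might look like this:

// Hypothetical sketch for a little-endian host: values are already in the
// expected byte order, so the conversions are no-ops. On a big-endian host
// the words would have to be byte-swapped (e.g. with __builtin_bswap32/64).
inline uint32_t fix_endianness(uint32_t x) noexcept { return x; }
inline uint64_t fix_endianness(uint64_t x) noexcept { return x; }

inline hash256 fix_endianness32(const hash256& h) noexcept { return h; }
inline hash512 fix_endianness32(const hash512& h) noexcept { return h; }
inline hash1024 fix_endianness32(const hash1024& h) noexcept { return h; }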

Parameters

The parameters for Ethash’s cache and dataset depend on the block number. The cache size and dataset size both grow linearly; however, we always take the highest prime below the linearly growing threshold in order to reduce the risk of accidental regularities leading to cyclic behavior.

// calc the num of hash items in cache
int ethash_calculate_light_cache_num_items(int epoch_number) noexcept
{
    static constexpr int item_size = sizeof(hash512);
    static constexpr int num_items_init = light_cache_init_size / item_size;
    static constexpr int num_items_growth = light_cache_growth / item_size;
    static_assert(
        light_cache_init_size % item_size == 0, "light_cache_init_size not multiple of item size");
    static_assert(
        light_cache_growth % item_size == 0, "light_cache_growth not multiple of item size");

    int num_items_upper_bound = num_items_init + epoch_number * num_items_growth;
    int num_items = ethash_find_largest_prime(num_items_upper_bound);
    return num_items;
}

// calc the num of hash1024 items in the full dataset
int ethash_calculate_full_dataset_num_items(int epoch_number) noexcept
{
    static constexpr int item_size = sizeof(hash1024);
    static constexpr int num_items_init = full_dataset_init_size / item_size;
    static constexpr int num_items_growth = full_dataset_growth / item_size;
    static_assert(full_dataset_init_size % item_size == 0,
        "full_dataset_init_size not multiple of item size");
    static_assert(
        full_dataset_growth % item_size == 0, "full_dataset_growth not multiple of item size");

    int num_items_upper_bound = num_items_init + epoch_number * num_items_growth;
    int num_items = ethash_find_largest_prime(num_items_upper_bound);
    return num_items;
}
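
The helper ethash_find_largest_prime is not reproduced here (it lives in the same repository). A minimal sketch that scans downward with naive trial division, which is fast enough for the sizes involved, could be:

// Hypothetical sketch: returns the largest prime <= upper_bound.
inline bool is_prime_naive(int n) noexcept
{
    if (n < 2)
        return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0)
            return false;
    return true;
}

inline int ethash_find_largest_prime(int upper_bound) noexcept
{
    for (int n = upper_bound; n >= 2; --n)
        if (is_prime_naive(n))
            return n;
    return 0;  // no prime below 2; never reached for ethash sizes
}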

Tables of dataset and cache size values are provided in the wiki page.

Cache Generation

Now, we specify the function for producing a cache:

void build_light_cache(hash512* cache, int num_items, const hash256& seed) noexcept
{
    hash512 item = keccak512(seed.bytes, sizeof(seed));
    cache[0] = item;
    for (int i = 1; i < num_items; ++i)
    {
        item = keccak512(item);
        cache[i] = item;
    }

    for (int q = 0; q < light_cache_rounds; ++q)
    {
        for (int i = 0; i < num_items; ++i)
        {
            const uint32_t index_limit = static_cast<uint32_t>(num_items);

            // First index: first 4 bytes of the item as a little-endian integer.
            uint32_t t = fix_endianness(cache[i].half_words[0]);
            uint32_t v = t % index_limit;

            // Second index.
            uint32_t w = static_cast<uint32_t>(num_items + (i - 1)) % index_limit;

            // Pipelining functions returning structs gives small performance boost.
            cache[i] = keccak512(bitwise_xor(cache[v], cache[w]));
        }
    }
}

The cache production process performs light_cache_rounds (3) passes of Sergio Demian Lerner’s RandMemoHash algorithm from Strict Memory Hard Hashing Functions (2014). The output is a set of 64-byte keccak-512 values.

Data aggregation function

We use an algorithm inspired by the FNV hash in some cases as a non-associative substitute for XOR. Note that we multiply the prime with the full 32-bit input, in contrast with the FNV-1 spec which multiplies the prime with one byte (octet) in turn.


inline uint32_t fnv(uint32_t u, uint32_t v) noexcept
{
    return (u * 0x01000193) ^ v;
}

inline hash512 fnv(const hash512& u, const hash512& v) noexcept
{
    hash512 r;
    for (size_t i = 0; i < sizeof(r) / sizeof(r.half_words[0]); ++i)
        r.half_words[i] = fnv(u.half_words[i], v.half_words[i]);
    return r;
}

Please note that even though the yellow paper specifies fnv as v1*(FNV_PRIME ^ v2), all current implementations consistently use the definition above.
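
To make the distinction concrete, a small self-contained demo (the FNV_PRIME constant and both helper names are introduced here purely for illustration) can compare the two formulas on sample inputs:

#include <cstdint>
#include <cstdio>

constexpr uint32_t FNV_PRIME = 0x01000193;

// Definition used by the implementations (and by the code above).
inline uint32_t fnv_impl(uint32_t u, uint32_t v) { return (u * FNV_PRIME) ^ v; }

// Literal reading of the yellow paper formula v1 * (FNV_PRIME ^ v2).
inline uint32_t fnv_yellow_paper(uint32_t u, uint32_t v) { return u * (FNV_PRIME ^ v); }

int main()
{
    const uint32_t u = 0x12345678, v = 0x9abcdef0;
    std::printf("implementation: 0x%08x\n", fnv_impl(u, v));
    std::printf("yellow paper:   0x%08x\n", fnv_yellow_paper(u, v));
    // The two formulas generally give different results, so mixing them
    // would produce different DAGs and mix hashes.
    return 0;
}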

Full dataset calculation

Each sha512 value in the dataset is computed as follows:


hash512 calculate_dataset_item_single(
    const hash512* light_cache, const int64_t light_cache_num_items, uint32_t index0) noexcept
{
    const hash512* const cache = light_cache;

    static constexpr size_t num_half_words = sizeof(hash512) / sizeof(uint32_t);
    const int64_t num_cache_items = light_cache_num_items;

    const uint32_t init0 = static_cast<uint32_t>(index0);

    hash512 mix0 = cache[index0 % num_cache_items];
    mix0.half_words[0] ^= fix_endianness(init0);

    // Hash and convert to little-endian 32-bit words.
    mix0 = fix_endianness32(keccak512(mix0));

    for (uint32_t j = 0; j < full_dataset_item_parents; ++j)
    {
        uint32_t t0 = fnv(init0 ^ j, mix0.half_words[j % num_half_words]);
        int64_t parent_index0 = t0 % num_cache_items;
        mix0 = fnv(mix0, fix_endianness32(cache[parent_index0]));
    }

    // Convert 32-bit words back to bytes and hash.
    mix0 = keccak512(fix_endianness32(mix0));

    return mix0;
}

So the hash1024, which is a combination of two hash512 values, could be computed as:

hash1024 calculate_dataset_item(const hash512* light_cache, const int64_t light_cache_num_items, uint32_t index) noexcept
{
    hash512 mix0 = calculate_dataset_item_single(light_cache, light_cache_num_items, index * 2);
    hash512 mix1 = calculate_dataset_item_single(light_cache, light_cache_num_items, index * 2 + 1);
    return hash1024{ { mix0, mix1 } };
}

The entire dataset is then generated by:

void calc_dataset(hash1024* dataset, int dataset_item_nums, const hash512* light_cache, int light_cache_num_items)
{
    for (int i = 0; i < dataset_item_nums; ++i)
    {
        dataset[i] = calculate_dataset_item(light_cache, light_cache_num_items, i);
    }
}

Hash

Now, we specify the main “hashimoto”-like loop, where we aggregate data from the full dataset in order to produce our final value for a particular header and nonce. In the code below, header represents the SHA3-256 hash of the RLP representation of a truncated block header, that is, of a header excluding the fields mixHash and nonce. nonce is the eight bytes of a 64 bit unsigned integer in big-endian order.

inline hash512 hash_seed(const hash256& header_hash, uint64_t nonce) noexcept
{
    nonce = fix_endianness(nonce);
    uint8_t init_data[sizeof(header_hash) + sizeof(nonce)];
    std::memcpy(&init_data[0], &header_hash, sizeof(header_hash));
    std::memcpy(&init_data[sizeof(header_hash)], &nonce, sizeof(nonce));

    return keccak512(init_data, sizeof(init_data));
}

inline hash256 hash_kernel(
    hash1024 * dataset, uint32_t full_dataset_num_items,const hash512& seed) noexcept
{
    static constexpr size_t mix_hwords = sizeof(hash1024) / sizeof(uint32_t);
    const uint32_t index_limit = static_cast<uint32_t>(full_dataset_num_items);
    const uint32_t seed_init = fix_endianness(seed.half_words[0]);

    hash1024 mix{ { fix_endianness32(seed), fix_endianness32(seed) } };

    for (uint32_t i = 0; i < num_dataset_accesses; ++i)
    {
        const uint32_t p = fnv(i ^ seed_init, mix.hwords[i % mix_hwords]) % index_limit;
        const hash1024 newdata = fix_endianness32(dataset[p]);

        for (size_t j = 0; j < mix_hwords; ++j)
            mix.hwords[j] = fnv(mix.hwords[j], newdata.hwords[j]);
    }

    hash256 mix_hash;
    for (size_t i = 0; i < mix_hwords; i += 4)
    {
        const uint32_t h1 = fnv(mix.hwords[i], mix.hwords[i + 1]);
        const uint32_t h2 = fnv(h1, mix.hwords[i + 2]);
        const uint32_t h3 = fnv(h2, mix.hwords[i + 3]);
        mix_hash.hwords[i / 4] = h3;
    }

    return fix_endianness32(mix_hash);
}

inline hash256 hash_final(const hash512& seed, const hash256& mix_hash)
{
    uint8_t final_data[sizeof(seed) + sizeof(mix_hash)];
    std::memcpy(&final_data[0], seed.bytes, sizeof(seed));
    std::memcpy(&final_data[sizeof(seed)], mix_hash.bytes, sizeof(mix_hash));
    return keccak256(final_data, sizeof(final_data));
}

hash256 hash(hash1024 * dataset, uint32_t full_dataset_num_items, const hash256& header_hash, uint64_t nonce) noexcept
{
    const hash512 seed = hash_seed(header_hash, nonce);
    const hash256 mix_hash = hash_kernel(dataset, full_dataset_num_items, seed);
    return hash_final(seed,mix_hash);
}

Essentially, we maintain a hash1024 “mix”, and repeatedly fetch 128 bytes from the full dataset and use the fnv function to combine them with the mix. 128 bytes of sequential access are used so that each round of the algorithm always fetches a full page from RAM, minimizing translation lookaside buffer misses which ASICs would theoretically be able to avoid.

If the output of this algorithm is below the desired target, then the nonce is valid. Note that the extra application of sha3_256 at the end ensures that there exists an intermediate nonce which can be provided to prove that at least a small amount of work was done; this quick outer PoW verification can be used for anti-DDoS purposes. It also serves to provide statistical assurance that the result is an unbiased, 256 bit number.
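
As mentioned in the summary, a light client can verify a (header_hash, nonce) pair without the full dataset by regenerating only the dataset items it touches from the cache. A minimal sketch of such a function (a hypothetical hash_light built from the functions above, not part of the original code) could look like this:

// Hypothetical light verification: identical to hash(), except that each
// hash1024 dataset item is recomputed on the fly from the light cache
// instead of being read from a precomputed full dataset. The number of
// full-dataset items is still needed to bound the item indices.
hash256 hash_light(const hash512* light_cache, int light_cache_num_items,
    int full_dataset_num_items, const hash256& header_hash, uint64_t nonce) noexcept
{
    const hash512 seed = hash_seed(header_hash, nonce);

    static constexpr size_t mix_hwords = sizeof(hash1024) / sizeof(uint32_t);
    const uint32_t index_limit = static_cast<uint32_t>(full_dataset_num_items);
    const uint32_t seed_init = fix_endianness(seed.half_words[0]);

    hash1024 mix{ { fix_endianness32(seed), fix_endianness32(seed) } };

    for (uint32_t i = 0; i < num_dataset_accesses; ++i)
    {
        const uint32_t p = fnv(i ^ seed_init, mix.hwords[i % mix_hwords]) % index_limit;
        // Regenerate dataset item p from the cache instead of reading dataset[p].
        const hash1024 newdata =
            fix_endianness32(calculate_dataset_item(light_cache, light_cache_num_items, p));

        for (size_t j = 0; j < mix_hwords; ++j)
            mix.hwords[j] = fnv(mix.hwords[j], newdata.hwords[j]);
    }

    hash256 mix_hash;
    for (size_t i = 0; i < mix_hwords; i += 4)
    {
        const uint32_t h1 = fnv(mix.hwords[i], mix.hwords[i + 1]);
        const uint32_t h2 = fnv(h1, mix.hwords[i + 2]);
        const uint32_t h3 = fnv(h2, mix.hwords[i + 3]);
        mix_hash.hwords[i / 4] = h3;
    }

    return hash_final(seed, fix_endianness32(mix_hash));
}

A verifier would compare the result of such a function against the target (and, in a real client, also check the mixHash field of the block header against the intermediate mix hash) without ever materializing the multi-gigabyte dataset.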