NPU for bit mining

Just curious if anyone has leveraged the NPU for bit mining and what kind of results it gave.

Well, we were talking about this over on the Khadas Discord, and how one could harness its processing power, but the discussion reached a stalemate, as we didn’t know how to proceed with the topic.

Our first assumption was to go bare-bones and use something like OpenCL as a standard way of running the code, and then we thought of something like an NPU model that we could just run on it…

but all in all, both methods are complex, and the model method is hard, as no one has documented such a thing so far…

but in terms of pure performance, it depends on how many instruction cycles it takes to do one mining (hash) operation… (including things like converting INT8 to float, yada yada yada…)
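The cycles-per-hash reasoning above can be sketched as a back-of-envelope estimate. Note that `OPS_PER_HASH` and `OVERHEAD_FACTOR` below are made-up illustrative numbers, not measured values for this NPU or any real miner; only the 5 TOPS figure comes from the thread.

```python
# Rough best-case hash-rate estimate from advertised throughput and a
# guessed per-hash cost. OPS_PER_HASH and OVERHEAD_FACTOR are assumptions
# for illustration only, not measurements.

NPU_TOPS = 5            # advertised INT8 throughput: 5 * 10^12 ops/s
OPS_PER_HASH = 100_000  # assumed integer ops per double SHA-256 (guess)
OVERHEAD_FACTOR = 10    # assumed penalty for INT8<->float conversion, I/O, etc.

def peak_hashrate(tops: float, ops_per_hash: int, overhead: float) -> float:
    """Theoretical upper bound on hashes per second under the assumptions."""
    return tops * 1e12 / (ops_per_hash * overhead)

print(f"{peak_hashrate(NPU_TOPS, OPS_PER_HASH, OVERHEAD_FACTOR):,.0f} H/s")
```

Even with generous assumptions, the point stands: the conversion and data-movement overhead in the denominator dominates whether the raw TOPS figure translates into a useful hash rate.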

@wll1rah was the one who raised the discussion; maybe he can pour some more knowledge on the topic for you :slight_smile:

Thanks for the info. I am not a programmer, so it’s not something I would take on; I was just curious whether someone had taken a shot at it and, if so, what the results were :slight_smile:

With the kind of CPU power needed to complete a hash these days, and the NPU being, at least in theory, so powerful, I was just kind of daydreaming…

It could become a reality, so don’t worry,
we might crack the Da Vinci code and be able to use it in the very near future… :smile:


The driver for the NPU is closed source… The NPU is supposed to support OpenCL, but there is no documentation or driver for that. Some matrix operations are possible using the available API, but porting a miner to it might be a hard task.

As for performance: with proper drivers it should provide the full 5 TOPS (INT8 only). Since the NPU can’t handle floating point, that might be a huge disadvantage.
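To make the INT8-only limitation concrete, here is a textbook sketch of symmetric INT8 quantization: any float data must be mapped into 8-bit integers before an INT8-only accelerator can touch it, losing precision on the way back. This is a generic illustration, not Amlogic's actual quantization scheme; the `scale` value is an arbitrary example.

```python
# Generic symmetric INT8 quantization sketch. Shows the round-trip precision
# loss that an INT8-only accelerator imposes on float data. Not Amlogic's
# actual scheme; `scale` is an arbitrary illustrative value.

def quantize(x: float, scale: float) -> int:
    """Map a float to the INT8 range [-128, 127] with a symmetric scale."""
    q = round(x / scale)
    return max(-128, min(127, q))

def dequantize(q: int, scale: float) -> float:
    """Map an INT8 value back to float."""
    return q * scale

scale = 0.05                   # assumed scale, covering roughly [-6.4, 6.35]
x = 1.2345
q = quantize(x, scale)         # nearest representable INT8 step
x_back = dequantize(q, scale)  # precision is lost in the round trip
print(q, x_back)
```

For exact-bit workloads like hashing, this kind of lossy round trip is fatal: a hash is either bit-perfect or worthless, which is why the lack of native wide-integer or float support matters so much here.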

So that’s a bit of a bummer, but in true Amlogic form it amounts to promises of drivers and libraries that take way, way too long to be released…

Great hardware that we cannot make use of…

Amlogic’s hardware is good, very much on our side, but in terms of software we just got ambushed by proprietary blobs…
:bomb:

It’s still INT8-only operation… so you will be limited in terms of mining…

@Archangel1235 The driver for the NPU is open source; you can find it in fenix. The model-conversion tool is closed source.

@Frank what about OpenCL for the NPU ?

@Electr1 Maybe you can try to do it. It is not supported right now.


I doubt this somewhat :grin:

Haha, you’re cute,
but I’ll let my results do the talking :wink:

Ga-ga-ga, buddy, you’re still the best :stuck_out_tongue_winking_eye:

@Frank Ahh, I was mistaken… Are there any plans to implement things like batch inference? That would make the VIM3 much better than the Nano in AI workloads.


@Archangel1235 It can be considered later, but the current primary plan for the NPU is yolov4.


@Frank, thank you for the information. It’s really a dream of mine as well to get it going for mining. It’s possible in theory, but I was looking for the OpenCL documentation so I could at least start to study it.

@JustSumDad INT16-to-float is what you would likely use, since that’s the widest bit width the NPU supports. For an algorithm like Bitcoin’s, the software would have to pass the data through the NPU 16 times, and back through again another 16 times, to complete the 256d (double SHA-256) function of Bitcoin. That doesn’t include any of the other processing for the checks and shuffling done by the NPU either. The general method to speed things up with OpenCL is to run lots of threads at once, but I think that would only slow the NPU down due to the limited capabilities of that chip.

Just so you know, I’m not a programmer either, but I can read code and see how things are put together; I do testing work for some forks of Cryptonight algorithms.

As @Electr1 mentioned, the model method is likely harder to do, and I think that is because it’s closer to how an FPGA (Field-Programmable Gate Array) bitstream works for mining. In this approach you’d create a model that does what the bitstream normally would on the FPGA, and in place of the bitstream it would be processed through the NPU. The biggest hurdle for me is that a bitstream for an FPGA is written in machine code, which is very low-level, since an FPGA is essentially an empty processor with no instruction set, unlike our NPU. That means you have to figure out how to replicate the bitstream on the NPU and have it solve the hash correctly. It’s hard mainly because you have to figure out what model code will work for solving hashes.
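The "16 passes of 16-bit words" idea above can be sketched in plain Python. Bitcoin's proof of work is double SHA-256 (often written SHA-256d), whose 256-bit result can be viewed as sixteen 16-bit lanes, the width suggested for feeding through the NPU. Here `hashlib` stands in for the accelerator; this only shows the data layout, not an actual NPU implementation.

```python
# Sketch: Bitcoin's double SHA-256 and its 256-bit digest viewed as
# sixteen 16-bit lanes. hashlib stands in for the NPU; this demonstrates
# the data layout only, not an NPU port.
import hashlib

def sha256d(data: bytes) -> bytes:
    """Bitcoin's proof-of-work hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def to_16bit_words(digest: bytes) -> list[int]:
    """Split a 256-bit digest into sixteen 16-bit big-endian words."""
    return [int.from_bytes(digest[i:i + 2], "big") for i in range(0, 32, 2)]

digest = sha256d(b"hello")
words = to_16bit_words(digest)
print(len(words))  # 16 lanes, each in the range 0..65535
```

This makes the bookkeeping visible: every candidate hash must be split into 16-bit chunks on the way in and reassembled bit-perfectly on the way out, and any rounding anywhere in that pipeline produces a worthless hash.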
