VIM3 Crypto Currency Miner Build

I did test out the Odroid C4 on RandomX and got ~122H/s, which is about 1/4 of a GTX 1060 6GB. Power draw is ~3W at the wall. I wish Khadas would release a VIM3L with 4GB of RAM; I think it would kill off the RK3399 boards, not on cost but on price/performance for sure, especially in performance per watt.

Yes it would defo be interesting - any thoughts on a 4gb vim3l @Gouwa?!

4G VIM3L, interesting, but how does an 8G VIM3 sound? :smiley:
But anyway, 8G is not possible, as the memory address on the VIM3 is a max of 32 bits.
At least expect it in the VIM4 (if something like that comes out) :slight_smile:

8G is wasted memory for crypto mining! You need 4G to fit the RandomX algo in, though!

Odroid C4 on RandomX and got ~122H/s

4GB VIM3 on CPU does 226H/s, even not using all cores. Actually using all cores hurts performance (and consumption). The best setting seems to be "rx": [0, 1, 2, 3], i.e. two big cores (2,3) and two little cores (0,1). The reason probably is two L2 caches (each 512kB), one exclusively for big and one for little cores.
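For anyone wanting to try the same pinning, here is a minimal sketch of the config fragment described above. It assumes the "rx" list sits under the "cpu" section of xmrig's config.json; the helper name `rx_affinity_fragment` is made up for illustration, so check it against your own config before relying on it.

```python
import json


def rx_affinity_fragment(cores):
    """Build a "cpu" section with one RandomX thread pinned to each listed core.

    Hypothetical helper: the layout ("cpu" -> "rx") is an assumption about
    the xmrig config file, not something taken from this thread.
    """
    return {"cpu": {"enabled": True, "rx": sorted(cores)}}


# Two little cores (0, 1) and two big cores (2, 3), as in the post above.
fragment = rx_affinity_fragment([0, 1, 2, 3])
print(json.dumps(fragment, indent=2))
```

Dropping a core from the list (for example `[0, 1, 2]`) is an easy way to compare hash rate and power draw between settings.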

Probably on Odroid C4 you’ll also get better performance using just 3 or 2 cores out of 4. The cache should be comparable (only 512kB). The best way to get cache properties is a little program cache-info from here.

well yes but what if you wanted to do something else, it might have been useful, just saying

Oh yes definitely I agree!

The new RPi has a less powerful SoC and 8GB of RAM, the VIM3 has a more powerful SoC but only 4GB of RAM. It's like both these combos were made in Heaven :rofl:

RPi4 misses ARMv8 crypto-extensions, so there is no HW AES instruction and mining is a bit slower. I can get rx 104H/s on 3 cores.
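A quick way to check a board for the AES instructions is to look at the "Features" line in /proc/cpuinfo on ARM Linux. This is a rough sketch; the two sample lines are illustrative and were not captured from the boards in this thread.

```python
def has_hw_aes(cpuinfo_text):
    """Return True if any 'Features' line in the given text lists the 'aes' flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features" and "aes" in value.split():
            return True
    return False


# Illustrative feature lines (assumed, not real captures).
WITH_AES = "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32"
WITHOUT_AES = "Features\t: fp asimd evtstrm crc32 cpuid"

print(has_hw_aes(WITH_AES))
print(has_hw_aes(WITHOUT_AES))
```

On an actual board you would pass in the real file, e.g. `has_hw_aes(open("/proc/cpuinfo").read())`.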

yes that’s why i said ,

but I don’t understand why those extensions are not included. Maybe it’s because it uses a custom SoC?

The cache structure is different between armv8.2-a and armv8-a; I’ll try what you suggest. But the cache for the VIM3L, as well as the Odroid C4, is (or should be) L1, L2, and L3: the L2 is per core and independent, while the L3 is system-wide.

8GB of RAM for an SBC is wasted memory; even the 4GB RPi is more than enough. If they had wanted to boost actual performance they would have either added more L2 cache or added AES and/or crypto extensions. Same with the VIM3: it’s actually disappointing that it’s only got 512KB for the 4 A73 cores. It should have been a minimum of 1MB, ideally 2MB, and given the cost you can’t tell me there wasn’t room to add them.

Cache structure for the Odroid C4
$ ./cache-info
Max cache size (upper bound): 4194304 bytes
L1 instruction cache: 4 x 32 KB, 4-way set associative (128 sets), 64 byte lines, shared by 1 processors
L1 data cache: 4 x 32 KB, 4-way set associative (128 sets), 64 byte lines, shared by 1 processors
L2 data cache: 4 x 128 KB (exclusive), 4-way set associative (512 sets), 64 byte lines, shared by 1 processors
L3 data cache: 1 MB (exclusive), 16-way set associative (1024 sets), 64 byte lines, shared by 4 processors
It’s got an extra 512KB that the VIM3 doesn’t,
and they took the halfway point for the L2 cache.

wow, @wll1rah you really know your hardware :+1:

It would be interesting to see, if Odroid C4 is faster with 3 or 4 cores on rx. It seems that it has enough L3 for 4 threads, but maybe not, as the official xmrig optimization guide states rx needs 256kB L2 + 2MB L3 per thread.
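To make the rule of thumb concrete, here is a small sketch that applies the quoted 256kB L2 + 2MB L3 per thread to the totals cache-info printed for the Odroid C4. The function name and the "take the smaller of the two limits" reading are my own assumptions, not something from the xmrig guide.

```python
KB = 1024
MB = 1024 * KB


def threads_by_cache(l2_total, l3_total, l2_per_thread=256 * KB, l3_per_thread=2 * MB):
    """Number of threads whose L2 and L3 needs both fit, under the quoted rule."""
    return min(l2_total // l2_per_thread, l3_total // l3_per_thread)


# Odroid C4 totals from the cache-info output earlier in the thread.
c4_l2 = 4 * 128 * KB  # 4 x 128 KB L2
c4_l3 = 1 * MB        # 1 MB shared L3

print(threads_by_cache(c4_l2, c4_l3))  # 0: the L3 is smaller than one thread's 2 MB
```

Taken literally the rule gives 0 threads for the C4, so it clearly can't be the whole story given the board does hash; it mostly suggests the guide's numbers don't map cleanly onto these small ARM caches.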

BTW. the official Odroid C4 page says it has only 512kB L3. Maybe cache-info is wrong in this case?

I’m willing to wager that the CPU info is right over the website. Since the VIM3L has 1MB of L3, the chips are more or less the same. They also don’t list the 128KB of L2 cache that is for the GPU either. I’ve not tried any of the other setups yet, though. In the past, though, limiting the cores on the in-order parts hasn’t changed the hashrate on a per-thread basis. Also, the ARM architecture is RISC-based, so it requires more memory in general; however, if it sees that it will need the same function, it won’t flush that register but will flush the data register. Out-of-order execution is like branch prediction in x86_64 hardware and can funnel data and instructions forward to get tasks out of the pipeline faster.

RandomX
4 cores 127H/s unassigned but no measurable difference
3 cores 105H/s assigned cpu affinity
2 cores 85H/s unassigned
1 core 58H/s unassigned

Note: fastest results given; CFLAGS="-mtune=cortex-a55" was added to the cmake process.
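To see how the C4 numbers above scale, here is a short sketch that works out H/s per thread and the efficiency relative to the single-core result. The figures are the ones posted; the function names are just for illustration.

```python
# Odroid C4 RandomX results from the post above: cores -> H/s.
results = {4: 127, 3: 105, 2: 85, 1: 58}


def per_thread(results):
    """Hash rate divided by the number of cores used."""
    return {n: h / n for n, h in results.items()}


def scaling_efficiency(results):
    """Measured hash rate relative to perfect scaling of the 1-core result."""
    base = results[1]
    return {n: h / (n * base) for n, h in results.items()}


for n in sorted(results):
    print(f"{n} cores: {per_thread(results)[n]:.2f} H/s per thread, "
          f"{scaling_efficiency(results)[n]:.0%} of ideal scaling")
```

Per-thread throughput falls from 58 H/s on one core to about 32 H/s on four, so each extra core adds less than the one before it.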

Can’t really tell; the guides are useless for ARM hardware for the most part. The biggest thing to test is single versus double threaded, and that only applies to out-of-order cores; on in-order cores it has never been a benefit. Even though the hash rate is lower on the Odroid C4, it still beats the VIM3 on price and on performance per watt. RandomX on the C4 yields ~37 H/s per watt; the VIM3 yields about 29 H/s per watt. I’m using my own power numbers for my basic, which is 8 watts; if you can add yours that would be great. Price per hash of the C4 vs VIM3: the C4 is $1:2.54H/s and the VIM3 is $1:1.61H/s.
Using the NPU on the VIM3 would likely change this result drastically, but as that isn’t a factor at the moment, these are what we are left with.

Seems that Odroid C4 really has 1MB cache. I think the RandomX algorithm is implemented inside a virtual machine so the datasets are approximately the same size between x86_64 and arm64. For optimal speed, it should fit in caches, otherwise the speed suffers.

For the performance per Watt I get:

VIM3 4threads [0,1,2,3] 52.6 H/Ws, 4.3W, 226H/s
RPi4 3threads [0,1,2] 24.8 H/Ws, 4.2W, 104H/s
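The efficiency figures are just hash rate divided by power draw. A minimal sketch using the numbers above (the dictionary layout and function name are mine):

```python
def hashes_per_watt_second(hashrate_hs, power_w):
    """Efficiency in H/s per W, i.e. hashes per watt-second."""
    return hashrate_hs / power_w


# (hash rate in H/s, power in W) as reported in the post above.
boards = {
    "VIM3 4 threads [0,1,2,3]": (226, 4.3),
    "RPi4 3 threads [0,1,2]": (104, 4.2),
}

for name, (hashrate, power) in boards.items():
    print(f"{name}: {hashes_per_watt_second(hashrate, power):.1f} H/Ws")
```

Plugging in a different power reading for the same hash rate shows how much the result depends on how the power is measured.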

In case anyone is interested, I’ve started open sourcing the backplane for my miner. I have the PCB and firmware up there as of now. I will knock together a webshop and sell kits and populated boards when I’ve worked out pricing.