As a warm-up, let's consider memory as the main energy sink of the model. Quantizing your network is the go-to solution to reduce the memory footprint: down to 1 bit per parameter (a.k.a. a binarized network) as the utmost limit.
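To put numbers on that, here's a minimal sketch of the parameter-memory saving from binarization. The 10M parameter count is made up for illustration:

```python
# Hypothetical illustration: parameter memory at different bit widths.
def model_memory_bytes(n_params: int, bits_per_param: int) -> float:
    """Total parameter memory in bytes."""
    return n_params * bits_per_param / 8

n = 10_000_000  # illustrative 10M-parameter network
fp32 = model_memory_bytes(n, 32)    # 40 MB
binary = model_memory_bytes(n, 1)   # 1.25 MB
print(f"fp32: {fp32/1e6:.2f} MB, binarized: {binary/1e6:.2f} MB, "
      f"ratio: {fp32/binary:.0f}x")
```

A straight 32x reduction on the weights, before touching the architecture at all.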
To squeeze the consumption even further, you're then bound to tweak your architecture... but are you? If you take a look at the hardware side, memory energy consumption grows quadratically with the supply voltage. You could try to play with it.
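The quadratic dependence is the classic dynamic-energy relation E ∝ C·V². A tiny sketch (the capacitance value is a placeholder; it cancels out in the ratio):

```python
# Dynamic energy scales quadratically with supply voltage: E = C * V^2.
def dynamic_energy(v_dd: float, capacitance: float = 1.0) -> float:
    return capacitance * v_dd ** 2

nominal = dynamic_energy(1.0)
scaled = dynamic_energy(0.6)  # lower the supply voltage to 60%
print(f"energy at 0.6 V_dd: {scaled / nominal:.2f}x nominal")  # 0.36x
```

Dropping the supply voltage to 60% already cuts the energy to roughly a third.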
It's no free lunch though! The memory read error rate (0's read as 1's and vice versa) will undergo an exponential increase. Your network will thus have to make do with randomized parameters.
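Concretely, reading weights then looks like passing them through a bit-flip channel. A toy sketch below; the exponential error-rate model and its constant `k` are simplified placeholders, not the paper's device model:

```python
import math
import random

# Illustrative model: read error rate grows exponentially as V_dd drops.
def read_error_rate(v_dd: float, v_nominal: float = 1.0, k: float = 10.0) -> float:
    return min(1.0, math.exp(-k * (v_dd / v_nominal)))

# Each stored bit flips independently with probability p on read.
def noisy_read(bits, p, rng):
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(0)
weights = [1, 0, 1, 1, 0, 0, 1, 0]
p = read_error_rate(0.5)  # aggressive voltage scaling
print(f"flip probability: {p:.4f}")
print(noisy_read(weights, p, rng))
```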
You could use some safety mechanisms to ensure the read parameters are "the good ones" (e.g. error-correcting codes). But let's agree it's neither fun nor free (it requires additional hardware).
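To see why it isn't free, here's the simplest such mechanism sketched out: triple-redundancy storage with majority-vote decoding. Real memories would use e.g. Hamming or BCH codes; this toy version just makes the overhead visible:

```python
# Minimal error-correction sketch: store each bit 3 times (3x memory cost),
# decode by majority vote (extra read/compare logic in hardware).
def encode(bits):
    return [b for b in bits for _ in range(3)]

def decode(coded):
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

stored = encode([1, 0, 1])
stored[1] ^= 1                      # a single bit flips on read...
assert decode(stored) == [1, 0, 1]  # ...and is corrected, at 3x the storage
```

The correction works, but the 3x storage (and the decoding circuitry) eats straight into the energy budget you were trying to save.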
Or, you could try to optimize the network to work with these random parameters, and even better: try to maximize the amount of randomness. Based on recent works such as "Are All Layers Created Equal?" (Zhang, Bengio et al.), we propose to optimize the randomness per layer.
Thus we propose the Layerwise Noise Maximisation algorithm, which efficiently finds the "good amount" of randomness during gradient-descent-based training. We show networks can retain an interesting accuracy with a third of the original memory energy consumption.
(That's compared to a binary network with reliable memory.) Details are in the paper, which we'll present at AICAS 2020.
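For a feel of the setting, here's a rough sketch of per-layer noise on binarized weights: each layer's weights pass through their own bit-flip channel with flip probability p_l, and the idea is to push each p_l as high as the network tolerates. Everything below (the toy "layer", the example flip probabilities) is illustrative, not the paper's actual algorithm:

```python
import random

# Weights are in {-1, +1}; a read error flips the sign.
def flip(weights, p, rng):
    return [-w if rng.random() < p else w for w in weights]

# Toy forward pass: each "layer" is a dot product with a sign nonlinearity,
# and each layer has its own flip probability p_l.
def noisy_forward(x, layers, flip_probs, rng):
    for w, p in zip(layers, flip_probs):
        w_read = flip(w, p, rng)
        x = 1 if sum(wi * x for wi in w_read) >= 0 else -1
    return x

rng = random.Random(0)
layers = [[1, -1, 1], [1, 1, -1]]
flip_probs = [0.2, 0.05]  # hypothetical: some layers tolerate more noise
print(noisy_forward(1, layers, flip_probs, rng))
```

Training under such injected noise, with per-layer probabilities tuned during gradient descent, is the gist of what the algorithm automates.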
Interested? Have any questions? Let's chat!
Otherwise, merry Christmas!
Planning a neural network electricity diet for 2020? Look no further!