oh: 5) you didn't use bias=False for your Linear/Conv2d layer when using BatchNorm, or conversely forgot to include it for the output layer. This one won't make you fail silently, but the extra biases are spurious parameters
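A minimal PyTorch sketch of that pattern (the layer sizes here are just placeholders): the conv's bias is redundant because BatchNorm's learned shift does the same job, while the output layer, with no BatchNorm after it, keeps its bias.

import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias would be cancelled by BN's normalization anyway
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

head = nn.Linear(64, 10)  # no BatchNorm follows, so keep the default bias=True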
-
-
-
6) thinking view() and permute() are the same thing (& incorrectly using view)
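A quick illustration of the difference on a toy tensor: permute() actually moves data between dimensions, while view() just reinterprets the same row-major buffer with a new shape.

import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

print(x.permute(1, 0))  # [[0, 3], [1, 4], [2, 5]] -- a real transpose
print(x.view(3, 2))     # [[0, 1], [2, 3], [4, 5]] -- same shape as the transpose, wrong data layout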
End of conversation
New conversation -
-
-
As an ML noob, can you explain why you want to overfit a single batch first or point to more around that topic?
-
It's a very quick sanity test of your wiring; i.e. if you can't overfit a small amount of data you've got a simple bug somewhere
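A rough sketch of that check, assuming a hypothetical model, loss_fn, and train_loader are already defined: reuse one small batch over and over and make sure the loss can be driven close to zero.

import torch

# `model`, `loss_fn`, `train_loader` are placeholders for whatever you already have
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
inputs, targets = next(iter(train_loader))  # grab a single small batch and keep reusing it

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())
# if the loss won't go to ~0 on this one batch, there's a bug somewhere in the wiring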
-
it's by far the biggest "bang for the buck" trick out there, and almost no one uses it.
-
I'm really big on starting with a small model + a small amount of data & growing both together; I always find it really insightful
-
exactly. I like to start with the simplest possible sanity checks - e.g. also training on all zero data first to see what loss I get with the base output distribution, then gradually include more inputs and scale up the net, making sure I beat the previous thing each time.
-
All zero data is an interesting idea I haven't tried!
-
yep, it's happened to me a few times that I turn the data back on and get the same loss :) also, if this produces a nice/decaying loss curve, it usually indicates not-very-clever initialization. I sometimes like to tweak the final layer biases to be close to the base distribution.
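One way to do that bias tweak (a sketch with made-up class counts): set the final layer's bias to the log of the class priors, so that with near-zero weights the softmax output already matches the base label distribution.

import torch
import torch.nn as nn

class_counts = torch.tensor([900., 90., 10.])   # assumed label frequencies
priors = class_counts / class_counts.sum()

head = nn.Linear(128, 3)                        # hypothetical final layer
with torch.no_grad():
    head.bias.copy_(priors.log())               # softmax(bias) ≈ priors when the weights start near zero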
-
Would you ever be worried about overfitting when tweaking the final layer biases?
End of conversation
New conversation -
-
-
Nice list! Our ML lab wrote up a few practical tips for debugging neural networks: https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/
-
Do you think that we should still recommend the Adam optimizer after this paper from
@ICLR18? https://openreview.net/forum?id=ryQu7f-RZ -
I'd have to read it! Is it implemented in TensorFlow yet?
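For reference, the paper linked above proposes the AMSGrad variant of Adam; PyTorch later exposed it as a flag (a sketch, assuming a model is already defined -- whether TensorFlow had it at the time was the open question here).

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)  # AMSGrad variant from the linked paper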
End of conversation
New conversation -
-
-
This one's actually covered in 231n. For classification problems: not checking whether the loss starts at ln(n_classes) :) Super simple and useful sanity check.
-
Doesn't this assume the classes are evenly distributed?
-
Please correct me if I'm wrong, but I don't think this check depends on the ground-truth class distribution: at initialization the model predicts (roughly) uniformly over classes, so every example contributes -ln(1/n_classes) = ln(n_classes) to the loss regardless of its label. We know it holds when classes are evenly distributed, and I've worked out a case where all examples belong to class_1 in a 3-class problem. The check should still work! pic.twitter.com/z50FjGFBy9
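A small numerical check of that claim: with uniform (e.g. all-zero) logits, the cross-entropy loss is ln(n_classes) per example no matter how the labels are distributed.

import math
import torch
import torch.nn.functional as F

logits = torch.zeros(8, 3)                        # uniform predictions over 3 classes
skewed_labels = torch.zeros(8, dtype=torch.long)  # every example belongs to class 0

print(F.cross_entropy(logits, skewed_labels).item())  # ≈ 1.0986
print(math.log(3))                                    # 1.0986...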
-
Ahh right, I see it now, thanks for working this out!
End of conversation
New conversation -
-
-
5) you forgot that PyTorch's .view() reads from (and fills) the last dimension first, so you're sending the wrong input to the model but not getting an error, since the shape is right.
-
6) you forgot to convert to float() after comparing tensors, and summed ByteTensors, which overflow past 255 and wrap back to zero (should be fixed in newer PyTorch).
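A sketch of the fix: cast the comparison mask before summing (in older PyTorch the uint8 mask accumulated in uint8 and wrapped around past 255).

import torch

preds = torch.randint(0, 10, (1000,))
labels = torch.randint(0, 10, (1000,))

correct = preds.eq(labels)            # uint8/bool mask
num_correct = correct.float().sum()   # cast first, then sum safely
accuracy = num_correct / labels.numel()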
End of conversation
New conversation -
-
-
a) Not double-checking the learning rate --> an initial learning rate that is (far) too high, leading to "weird" results. b) bad image augmentation --> I've accidentally augmented (with a minor zoom in a loop) the data loaded in memory rather than a copy of it, leading to near-useless data
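A hypothetical numpy sketch of bug (b): repeatedly zooming the cached array itself, instead of a copy of the original, compounds the transform every epoch.

import numpy as np
from scipy.ndimage import zoom

image = np.random.rand(64, 64)
cache = image                                # the in-memory "dataset"

for epoch in range(5):
    augmented = zoom(cache, 1.05)[:64, :64]  # the "minor zoom"
    cache = augmented                        # bug: overwrites the stored data, so the zoom compounds each epoch
    # fix: always augment a fresh copy of the original, e.g. zoom(image, 1.05)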
-
Worst thing I ever did: I trained a detection algorithm with bounding boxes DRAWN on my training set
No wonder it converged so fast! -
haha! thanks for sharing -- I hope you found your mistake faster than I found mine
#sneaky -
TBH I found my mistake exactly 10 minutes after I started bragging to my co-workers
So I guess that's what we can consider "too late" :)
End of conversation