The training dynamics of the angle in the last layer are very different from those of the norm. The angle plateaus early in training, and later improvement in loss comes mostly from an increase in norm. Models that generalize better have better angular alignment.
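A rough sketch of how the two quantities can be separated, under an assumption the post does not spell out: that "angle" means the angle between a last-layer feature vector and its class weight vector, and "norm" means the length of that feature vector. The logit then factors as ||w|| · ||x|| · cos(θ). The helper name `decompose_logit` and the toy vectors are hypothetical, made up for illustration only.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def decompose_logit(feature, weight):
    """Split the logit w.x into (||x||, ||w||, cos(theta)).

    Assumption: 'angle' is the angle between the last-layer feature
    and the class weight vector. This is an illustrative reading,
    not a definition taken from the original post.
    """
    dot = sum(f * w for f, w in zip(feature, weight))
    f_norm, w_norm = norm(feature), norm(weight)
    cos_theta = dot / (f_norm * w_norm)
    return f_norm, w_norm, cos_theta


# Toy (hypothetical) feature and class weight vectors.
feature = [3.0, 4.0]
weight = [1.0, 0.0]
f_norm, w_norm, cos_theta = decompose_logit(feature, weight)
logit = f_norm * w_norm * cos_theta

# Scaling the feature grows the norm and the logit, but leaves the angle alone.
scaled = [2.0 * x for x in feature]
s_norm, _, s_cos = decompose_logit(scaled, weight)
scaled_logit = s_norm * w_norm * s_cos
```

In this toy case, doubling the feature doubles the logit while the cosine stays fixed, which is one way a loss can keep falling after the angle has stopped changing.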