You are logged into an old server. The uptime is 788 days. There are a lot of kernels here.
-
You take note of running services & containers, and of listening IP:port pairs, so that you can verify exactly the same ones are back up after reboot. You sanity-check the system time and save it to NVRAM (it affects fsck on bootup!). You check dmesg, and the status of filesystems and RAID arrays.
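
A minimal sketch of such a pre-reboot "saved game", assuming a Linux host where some of ss, systemctl and docker may be present. The function name snapshot and the output file names are hypothetical, not from the thread; tools that are missing are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper: snapshot DIR
# Writes one sorted text file per check into DIR so the files can be diffed later.
snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1

    # Which kernel is running right now.
    uname -r > "$dir/kernel.txt"

    # Listening TCP/UDP sockets as "proto local-address:port" (header line dropped).
    if command -v ss >/dev/null 2>&1; then
        ss -ltun 2>/dev/null | awk 'NR > 1 {print $1, $5}' | sort -u > "$dir/listening.txt"
    fi

    # Running services (systemd hosts only).
    if command -v systemctl >/dev/null 2>&1; then
        systemctl list-units --type=service --state=running --no-legend --plain 2>/dev/null \
            | awk '{print $1}' | sort > "$dir/services.txt"
    fi

    # Running containers (Docker hosts only).
    if command -v docker >/dev/null 2>&1; then
        docker ps --format '{{.Names}} {{.Image}}' 2>/dev/null | sort > "$dir/containers.txt"
    fi

    # Mounted filesystems and software RAID state, where the kernel exposes them.
    if [ -r /proc/mounts ]; then
        awk '{print $1, $2, $3}' /proc/mounts | sort > "$dir/mounts.txt"
    fi
    if [ -r /proc/mdstat ]; then
        cat /proc/mdstat > "$dir/mdstat.txt"
    fi
    return 0
}
```

You would call it as, say, snapshot /root/pre-reboot before the reboot and snapshot /root/post-reboot afterwards (both paths are just examples). PIDs are deliberately left out, since they change across a reboot and would only add noise to the comparison.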
-
If switching to a different kernel, you prepare a "safety net": bootloader's automatic fallback to the current kernel on a 2nd reboot, "panic=60" to trigger that reboot on new Linux kernel's panic, and "/sbin/shutdown -r 5 &" in rc.local in case new kernel's networking fails.
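
The rc.local part of that safety net can be sketched as below. The function names add_reboot_guard and remove_reboot_guard are hypothetical, and the sketch assumes an rc.local that does not end in "exit 0" (a line appended after "exit 0" would never run). The bootloader fallback and "panic=60" are left out because they depend on which bootloader the box has.

```shell
#!/bin/sh
# Hypothetical helpers for the "reboot again in 5 minutes unless I cancel" guard.
GUARD_LINE='/sbin/shutdown -r 5 &'

# add_reboot_guard FILE: append the guard line once (idempotent).
# Assumption: FILE has no trailing "exit 0" that would stop it from being reached.
add_reboot_guard() {
    rc=$1
    grep -qxF "$GUARD_LINE" "$rc" 2>/dev/null || printf '%s\n' "$GUARD_LINE" >> "$rc"
}

# remove_reboot_guard FILE: delete the guard line, keep everything else.
remove_reboot_guard() {
    rc=$1
    [ -f "$rc" ] || return 1
    tmp=$(mktemp) || return 1
    # grep -v exits non-zero when nothing is left to print; that is fine here.
    grep -vxF "$GUARD_LINE" "$rc" > "$tmp" || true
    # Overwrite in place so the file keeps its owner and mode.
    cat "$tmp" > "$rc"
    rm -f "$tmp"
}
```

Before the reboot you would run add_reboot_guard /etc/rc.local; once you are back in and have cancelled the pending shutdown, remove_reboot_guard /etc/rc.local takes the line out again.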
-
You possibly configure netconsole to a nearby server, or over the Internet if you specify the router's MAC address along with your logging machine's IP address. You manually shut down services, but not sshd yet. You manually remount filesystems read-only. You think twice.
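
For the netconsole step, a small sketch that builds the module parameter string. The helper name netconsole_param is hypothetical, and the interface, IP and MAC values in the usage note are placeholders. The source port and source IP are left empty so the kernel picks them; the target port defaults to 6666, netconsole's usual default.

```shell
#!/bin/sh
# Hypothetical helper: netconsole_param DEV TARGET_IP TARGET_MAC [TARGET_PORT]
# Prints a string in the kernel's documented form:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# TARGET_MAC is the logging machine's MAC on the same LAN, or the router's MAC
# when the logging machine is reached over the Internet.
netconsole_param() {
    dev=$1
    tgt_ip=$2
    tgt_mac=$3
    tgt_port=${4:-6666}
    printf 'netconsole=@/%s,%s@%s/%s\n' "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}
```

Loading it temporarily could then look like modprobe netconsole "$(netconsole_param eth0 192.0.2.10 00:11:22:33:44:55)", with something listening for UDP on the logging machine; the same string can also go on the new kernel's command line.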
-
If everything's OK, you "sync" and "reboot -f" (also syncs). If there are disk issues (sync might get stuck), you "reboot -nf" or (safer in terms of not trying to sync to possibly-stuck drives?) you "echo -en '\xfe' | dd of=/dev/port seek=100 bs=1 count=1" (surely everyone does).
-
Once back in, you "shutdown -c" and remove it from rc.local. You sanity-check the system: what kernel booted up, as well as the checks similar to those you made pre-reboot (including verifying the same services are back up, etc. - perhaps using "diff -u" against your saved game).
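
The "diff -u against your saved game" check might be sketched like this, assuming you saved one text file per check into a directory before the reboot and another set afterwards. The function name compare_snapshots and the directory layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare_snapshots BEFORE_DIR AFTER_DIR
# Runs diff -u on every *.txt file saved before the reboot against the file of
# the same name saved after it. Returns 0 only if nothing differs.
compare_snapshots() {
    before=$1
    after=$2
    status=0
    for f in "$before"/*.txt; do
        [ -e "$f" ] || continue      # no snapshot files at all
        name=${f##*/}
        # diff prints the differences; a missing "after" file also counts as a change.
        if ! diff -u "$f" "$after/$name"; then
            status=1
        fi
    done
    return "$status"
}
```

A kernel version file will of course differ when you switched kernels on purpose; what you are looking for is a service, container or listening port that quietly went missing.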
-
If you had enabled netconsole temporarily, you disable it. You make the new kernel the default. You stay around for a few hours doing other work yet checking your e-mail in case issues come up or are reported to you (and you had planned for this before starting to play the game).
-
Alternately: if you had to spend this much time rebooting a single node, you already lost the game.
-
That's a valid alternative, yes. Depends on which game it was supposed to be and ended up being, which in turn depends on scale and more.
-
Check if iDRAC/iLO/iRMC is available (and licensed), update it, check if the Java console works, reboot. Map a SystemRescueCd ISO as a virtual CD-ROM and fix the system, reboot. No iDRAC/iLO/iRMC? "system cannot be supported by IT remotely, on-premises visit required" :)