and then converted the int64 to a string of 1s and 0s to examine each bit position. When I looked at a generic sample, I saw very little deviation from 0.5 in how often each bit was set (meaning each bit had close to a 50% probability of being a 1 or a 0), which we would expect
if the bits were uniformly random. However, I then took a different sample where the video's published time had a specific second (the second would end in 1 or some other fixed value), which would make the sample representative of correlated timestamp values. When I ran the
distribution for each bit, I found bits well outside the expected 50/50 split, and those bits were always the same when using a sample with specific timestamps. This suggests that the YouTube IDs have structure that correlates with the published time. It
appears they took the binary data for the timestamp and placed its bits in specific positions, obfuscating it enough that the IDs appear random when you analyze a generic sample that isn't correlated with anything specific (like the published time). I need to get some
Replying to
Hi Jason, it's awesome that you share your "inquiry" steps out in the open. Once you have some kind of report, I'd be really happy to give you feedback or simply replicate it (I've done the same analysis for TikTok IDs and could "only" reduce the space down to 20k IDs per second)
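For anyone who wants to replicate the per-bit check described in the thread, here is a minimal sketch in Python. It is not the original author's code. It assumes the IDs have already been decoded to 64-bit integers, and it uses synthetic random data rather than real YouTube IDs. The function names, the 0.1 tolerance, and the two "pinned" bit positions are all hypothetical and chosen only for illustration.

```python
import random

BITS = 64  # assumption: each ID has already been decoded to a 64-bit integer


def to_bit_string(value: int, width: int = BITS) -> str:
    """Render an integer as a fixed-width string of 1s and 0s (two's complement)."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def bit_frequencies(values, width: int = BITS):
    """Fraction of values that have a 1 at each position (index 0 = most significant bit)."""
    counts = [0] * width
    total = 0
    for value in values:
        total += 1
        for i, ch in enumerate(to_bit_string(value, width)):
            if ch == "1":
                counts[i] += 1
    if total == 0:
        raise ValueError("need at least one value")
    return [c / total for c in counts]


def biased_positions(freqs, tolerance: float = 0.1):
    """Positions whose set-bit frequency differs from 0.5 by more than the tolerance."""
    return [i for i, f in enumerate(freqs) if abs(f - 0.5) > tolerance]


# Synthetic stand-in for a generic sample: every bit is uniformly random.
rng = random.Random(0)
generic = [rng.getrandbits(BITS) for _ in range(20000)]

# Synthetic stand-in for a timestamp-correlated sample: two hypothetical
# positions (5 and 40, counted from the most significant bit) are pinned,
# one always 1 and one always 0, while every other bit stays random.
SET_MASK = 1 << (BITS - 1 - 5)
CLEAR_MASK = ((1 << BITS) - 1) ^ (1 << (BITS - 1 - 40))
correlated = [(rng.getrandbits(BITS) | SET_MASK) & CLEAR_MASK for _ in range(20000)]

print(biased_positions(bit_frequencies(generic)))     # []
print(biased_positions(bit_frequencies(correlated)))  # [5, 40]
```

With real data, the generic list would hold a broad sample of decoded IDs and the correlated list only the IDs of videos whose published time shares the same second value. Positions that show up only in the second result would be candidates for carrying timestamp bits.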