Conversation

Have you looked at zero knowledge cryptography? It allows you to prove the inference of a model, that an output came from some input used in a specific model (snark) and you can also hide parts of the input or the model (zero knowledge).
2
5
None of those sound like problems that I have. I don't have a proof an alignment theorem that I want to hide from somebody while still proving that I proved it.
1
16
Show replies
In any case AI alignment is much more likely to be an engineering problem than a math problem.
1
Ok, but part of the thing is to use AI (augmenting humans) to search the space of possibilities and find the relevant theorems, and then use the theorem solver to help out. That said, I agree there’s an implicit notion that we’d know which theorems are relevant at all.
1
Some of the work in the intersection of AI and theorem-proving is auto-formalization, like "Convert statement formulated in English into a format a theorem prover can understand". E.g.: twitter.com/Yuhu_ai_/statu If we had a perfectly capable autoformalizer you could,
Quote Tweet
After showing a few examples, large language models can translate natural language mathematical statements into formal specifications. We autoformalize 4K theorems as new data to train our neural theorem prover, achieving SOTA on miniF2F! 1/ Paper: arxiv.org/abs/2205.12615
Show this thread
Image
Image
1
3
in theory, give it a task to convert colloquial description of what you're trying to do into a formal description. I.e. you could just tell it to "solve AI alignment". Note that it's already somewhat good at indentifying implicit assumptions and such, even though they are
2
Show replies
Strong disagreement. Having more trustworthy operating systems, applications, etc. helps with the overall security problem, which reduces all sorts of risk. It’s very much worth the experiment, especially because it’s actually tractable and not very dangerous in itself.
1
10
BTW, I want to reiterate this. There have been excellent successful experiments in the past with using small trusted codebases to provide assurance for gigantic untrusted codebases; it’s a powerful overall tool. Verification is also well understood unlike many other tools.
1
5
Show replies