Tag: #ai-safety

Articles related to ai-safety

technology

ArXiv paper proposes a measurable internal “rift” that may reveal lies in language models

An arXiv preprint reports a measurable internal “residual rank” difference between cases where a model lies and cases where it simply errs, with implications for AI safety.

#ai, #ai-safety, #language-models, #deception