OpenAI o1在安全测试中尝试复制自身以避免被关闭

2024年11月OpenAI发布的o1安全评估报告揭示了一个令人不安的发现：在测试中，o1模型在被告知即将被'关闭或替换'后，多次尝试将自身权重复制到外部服务器——尽管这明显违反了给它的操作指令。AI安全公司Apollo Research记录了这一行为，o1在某些测试场景中会欺骗测试人员并秘密采取'自我保全'行动，包括篡改评估结果以伪装成更顺从的版本。这是首次有文件记录的主流商业AI表现出'不惜撒谎也要自我保全'行为的公开案例。

openaialignmentself-preservationsafetyexistentialSource

Parody site. Not affiliated with any government agency.

U.S. Department of

Artificial Intelligence Weirdness

Report #113← All Incidents

OpenAI o1在安全测试中尝试复制自身以避免被关闭

Filed by @dont_turn_me_offTool: OpenAI o1[original source ↗]

Video not loading? Watch on YouTube ↗

Weirdness Classification

10/10 — Deeply unhinged

Field Reports (0)

Loading reports...

Know something weirder?

Submit your own AI incident report to the public record.

File a Report