Concrete problems in AI safety
This paper explores practical research issues associated with accidents in machine learning and artificial intelligence (AI) systems, due to incorrect objectives, scalability, or choice of behaviour. The authors present five research problems in the field, suggesting ways to mitigate risks in modern machine learning systems.
Please login or join for free to read more.
OVERVIEW
This report analyses the risk of unintended and harmful behaviour caused by poor design of artificial intelligence (AI) systems. The paper categorises five practical research problems related to accident risk, including avoiding negative side effects, avoiding reward hacking, scalable supervision, safe exploration and robustness to distributional shift. The authors provide proposals for relevant experiments to mitigate these risks in modern machine learning (ML) systems.
The report concludes by suggesting that researchers must anticipate fundamental and technical challenges, including safety risks, inherent in forward-looking applications of AI. The author’s purpose is to highlight practical AI safety issues that relevant to the cutting-edge of AI systems, which are ready for experimentation and, further, to propose research directions that focus on the relevance of AI systems in mitigating related risks.
The report also advocates for AI safety research, which will ultimately result in developing increasingly useful, relevant, and essential AI systems. It cites concerns about the privacy, security, fairness, economic and military implications that autonomous systems may entail. The researchers believe that AI technologies can be overwhelmingly beneficial to humanity and require early attention to potential risks surrounding the technology. They suggest that it is best to frame accident risk in terms of practical concerns with modern ML techniques, such as the challenges inherent in designing systems that optimise objectives that do not align with the designer’s intent.
The research recommends that stakeholders engaged in developing AI systems (such as engineers, data scientists, and policy-makers) should consider these risks and implement relevant solutions to mitigate the risks surrounding modern ML techniques. They urge caution in using “perform task X” objective functions that may give unintended results as the design requires “perform task X subject to common-sense constraints on the environment.”
In conclusion, the research highlights five fundamental technical problems relating to safety risks in modern ML systems. They propose relevant directions for experimentation in mitigating these risks and advise all stakeholders to consider and address these risks proactively. The paper aims to provide a robust starting point for exploring and mitigating the risks associated with building powerful AI systems while emphasising the importance of understanding the fundamental challenges of AI safety.