Abstract
The recent emergence of foundation model-based chatbots, such as ChatGPT (OpenAI, San Francisco, CA, USA), has showcased remarkable language mastery and intuitive comprehension capabilities. Despite significant efforts to identify and address the near-term risks associated with artificial intelligence (AI), our understanding of the existential threats it poses remains limited. Near-term risks stem from AI systems that already exist or are under active development with a clear trajectory towards deployment. Existential risks of AI can be an extension of the near-term risks studied by the fairness, accountability, transparency and ethics community, and are characterised by their capacity to threaten humanity’s long-term potential. In this paper, we delve into the ways AI can give rise to existential harm and explore potential risk mitigation strategies. This involves further investigation of critical domains, including AI alignment, overtrust in AI, AI safety, open-sourcing, the implications of AI for healthcare and the broader societal risks.
Introduction
For many of us, artificial intelligence (AI) already plays a significant role in several aspects of our lives and daily decision-making, ranging from the items we purchase to the treatment plan for our disease diagnosis. AI systems influence the outcome of recommender systems, search engines, healthcare services and drug discoveries, among others. With the rise of foundation models (eg, DALL-E, CLIP, GPT-4), the role of AI in our day-to-day lives will inevitably grow.1 Foundation model-based chatbots such as Gemini, Claude and ChatGPT demonstrate language mastery and a remarkable ability to understand at an intuitive level. They have successfully passed the US Medical Licensing Examination and Radiology Board-style examinations.
Despite the immense benefits of AI systems, their increasing capabilities and rapid adoption are not without risks. Near-term risks stem from AI systems that already exist or are under active development with a clear trajectory towards deployment.2 While near-term AI risks related to fairness, transparency, security and explainability are well explored,3 our understanding of the existential threats AI poses is limited. Existential risks of AI are defined as risks that endanger the long-term potential of humanity, potentially leading to its destruction.4 5 This definition includes risk factors: near-term risks that are not existential risks in themselves, but that can increase the probability of an existential catastrophe or reduce our ability to respond effectively to such a threat.4 Moreover, these risk factors can evolve and converge to eventually lead to the collapse of civilisations, dystopian possibilities and the destruction of desirable future development.5 In this paper, we discuss near-term AI risk factors, the ways they can lead to existential threats and potential risk mitigation strategies.
AI alignment and inequities
AI alignment refers to ensuring that an AI system behaves in accordance with the values of another entity, such as a human, an institution or humanity as a whole.6 Misalignment can be a result of goal misgeneralisation, which could occur when specifying the purpose of a system (outer misalignment) or when ensuring the system adopts the specification robustly (inner misalignment).7 Failing to align AI with shared human values can pose existential risks by exacerbating social disparities, creating power imbalances and locking in oppressive systems.8 However, given the existence of nuanced differences between diverse groups of people, aligning AI to human values is not a trivial task. Failure to align AI can lead to the eradication of cultures, dialects and human values. For instance, an algorithm designed to identify patients needing additional care in the US healthcare system inadvertently prioritised white patients over sicker black patients, demonstrating how misalignment can perpetuate existing inequities and cause real-world harm.9 Such misalignment has also been seen in AI diagnostic tools for medical imaging, where AI correctly diagnosed young patients at a lower rate10 or underdiagnosed low-income patients with Medicaid insurance at a higher rate.11 These shortcomings may limit timely access to care for marginalised groups, which could cascade into systemic disparities in health outcomes. When deployed at scale, such biased systems could erode public trust in healthcare, destabilise societal structures and ultimately spark public health crises or even civil unrest. It is imperative that the development of strategies to detect, evaluate and scrutinise the implicit goals and subgoals of AI systems is prioritised.
Strategies to assess the trade-offs between reward and morality can help set appropriate goals for AI systems.12 Scalable oversight—whereby human oversight is scaled to provide reliable supervision to AI systems through labels, reward signals and critiques—is imperative for improving AI alignment. However, for humans to provide better oversight of AI, the domain of human–computer interaction must advance and robust techniques for scalable oversight must be developed. Mechanistic interpretability13 and inverse reinforcement learning (RL)14 are other promising approaches that observe an agent’s behaviour and learn its objectives, values or rewards, enabling improved AI alignment.
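As one concrete illustration of how oversight signals such as preference labels can shape an AI system's objective, the sketch below fits a linear reward function from pairwise human preferences using a Bradley–Terry model. The function names, feature vectors and hyperparameters are illustrative assumptions for this sketch, not a method prescribed by the works cited above.

```python
import math


def learn_reward(prefs, dim, epochs=200, lr=0.5):
    """Fit linear reward weights from pairwise preferences.

    Each element of `prefs` is a pair (fa, fb) of feature vectors where a
    human judged outcome a preferable to outcome b. Under a Bradley-Terry
    model, P(a beats b) = sigmoid(w . (fa - fb)); we maximise the
    log-likelihood of the observed preferences by gradient ascent.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for fa, fb in prefs:
            diff = [x - y for x, y in zip(fa, fb)]
            score = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-score))  # model's P(a beats b)
            # Gradient of the log-likelihood pushes w towards preferred features.
            for i in range(dim):
                w[i] += lr * (1.0 - p) * diff[i]
    return w
```

For example, if the first feature marks honest behaviour and humans consistently prefer honest outcomes, the learnt weight on that feature becomes positive, steering the system towards it.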
Privacy and AI safety
Although technologies like public chatbots have built-in safety measures to prevent the generation of inappropriate content, it has been shown that adversarial prompts can circumvent the alignment of large language models (LLMs) and elicit clearly objectionable content.15 Model security is a growing concern, given that LLMs are vulnerable to data poisoning attacks.16 Sophisticated AI systems are capable of memorising personally identifiable information and reproducing it, which raises serious privacy concerns.17 Trojan attacks can cause LLMs to behave unexpectedly when interacting with database middleware.18 The recent Dinerstein v. Google case underscores these concerns, highlighting the challenges of safeguarding patient privacy even when de-identified electronic health records are used for AI development. The case demonstrated the potential for re-identification and unauthorised use of patient data, even when explicit identifiers are removed.19 Such privacy failures could enable large-scale surveillance or targeted exploitation, which malicious actors could weaponise to destabilise societal structures, erode public trust and, in extreme cases, facilitate bioterrorism or other forms of large-scale harm, ultimately presenting existential threats. To ensure safe AI operation in real-world settings, thoughtful design and careful data collection practices are required. Improvements to adversarial robustness, sycophancy mitigation, uncertainty estimation and transparency are required to steer AI systems away from harmful behaviours and prevent misuse. A comprehensive evaluation of agents’ adversarial robustness and the transparency of training data is necessary to expose knowledge gaps and system vulnerabilities.20
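To make the notion of adversarial robustness concrete, the toy sketch below shows how a small, deliberately chosen perturbation can flip the output of a simple linear classifier, in the spirit of the fast gradient sign method. It is a minimal pedagogical example, not an attack on an LLM; all names and values are illustrative.

```python
def predict(w, x):
    """Sign of the linear score w . x: returns +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def fgsm_perturb(x, w, y, eps):
    """One-step, sign-based adversarial perturbation of input x.

    Shifts every feature by eps in the direction that most decreases the
    score for the true label y (y is +1 or -1), mimicking the fast
    gradient sign method for a linear model.
    """
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]
```

A correctly classified input can be flipped by a perturbation of the same magnitude in every feature, which is why robustness must be evaluated against worst-case, not average-case, inputs.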
Overtrust in AI and automation bias
AI systems can increasingly generate convincing misinformation with high confidence; this can manipulate people, create fake news, encourage polarisation, create power imbalances and lock in oppressive systems.21 In the context of healthcare, overtrust in AI could have devastating ramifications, far exceeding individual harm. Blind reliance on AI systems for critical tasks, such as diagnosing diseases or recommending treatments, without adequate human oversight or verification, could lead to catastrophic errors on a massive scale.22 Methods for explainability can enable a better understanding of a system’s behaviour and help detect undesired objectives that lead to specification gaming. For example, we have observed unexpected behaviours of AI models, such as race detection from medical images.23 The co-occurrence of this phenomenon with higher rates of underdiagnosis of vulnerable racial groups by AI systems in medical imaging increases concerns around fairness.22 Explainability techniques can be leveraged to alleviate these concerns by revealing potential reasons for race detection from medical images.24 However, explainability should be approached with caution, as it can also lead to overtrust and misplaced confidence in AI explanations.25 This can amplify misinformation and conspiracy theories, particularly on social media, as seen during the COVID-19 pandemic. Public trust in political systems is essential for an effective response to global health crises like pandemics, including measures like quarantining, vaccination and mask-wearing. However, widespread misinformation undermines this trust, making it harder to combat future pandemics.3 Preventing the spread of misinformation is critical to ensuring a coordinated and effective response to such existential threats.
Hierarchical RL methods, which decompose the decision-making process into simpler subtasks, are a promising strategy to understand AI decisions and mitigate their associated existential risks.26 Methods to fact-check and understand these models are necessary. One possibility is watermarking generated output to distinguish humans from machines.27 Overtrust in AI is especially harmful in domains such as healthcare, where the use of AI for clinical decision-making, including medication administration or disease treatment, can threaten human life and set unrealistic expectations of patient outcomes. To elicit more truthful answers, it could be beneficial to have two AI systems debate each other to expose differing positions, with a third (possibly human) system judging the debate,28 or to use smaller (specialised) AI models to verify bigger ones. Red teaming, which involves simulating attacks on an AI model to test security, could also be used to adversarially probe an LLM for harmful outputs and subsequently update it to avoid such outputs.29
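One way watermark detection can work is sketched below: a hash-seeded 'green list' detector in the spirit of published watermarking schemes for language models. A watermarking generator would bias sampling towards 'green' tokens; the detector recomputes the same deterministic split and measures the excess. The tokenisation and the absence of a statistical threshold are simplifying assumptions for this sketch.

```python
import hashlib


def is_green(prev_token, token):
    """Deterministically assign roughly half of all (prev, token) bigrams
    to a hash-seeded 'green list' that generator and detector share."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return int(digest, 16) % 2 == 0


def green_fraction(tokens):
    """Fraction of green bigrams in a token sequence.

    Unwatermarked text should sit near 0.5; text from a watermarking
    generator that favours green tokens sits significantly above it.
    """
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    return sum(is_green(p, t) for p, t in pairs) / len(pairs)
```

In a full scheme, the detector would apply a significance test to the green fraction rather than eyeballing it, so that short texts are not misattributed by chance.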
Transparency and open-sourcing
Given the high resource consumption and associated costs, very few companies have the ability to train large foundation models. As a result, closed-sourcing these models can create a monopoly that puts power in the hands of a select few corporations. This concentration of power, combined with the opacity of proprietary AI models like ChatGPT and Google’s Gemini chatbots, creates a perilous environment in healthcare. The potential for profit-driven manipulation of AI tools raises the threat of patient exploitation, unethical practices and even the weaponisation of AI for bioterrorism.30 Moreover, the growing sophistication of AI in fields like molecular biology amplifies the risk of engineered pandemics, enabling malicious actors to design synthetic proteins and unleash devastating pathogens.31 The lack of transparency hinders our ability to scrutinise AI decision-making, allowing biases and errors to persist, leading to misdiagnoses, inappropriate treatments and the exacerbation of existing health disparities, ultimately eroding public trust and jeopardising the integrity of the healthcare system. By contrast, open-source technology developers share their work for everyone to use, improve or adapt as they see fit. Since models like Meta’s LLaMA have become widely available, the barrier to entry for implementing LLMs has dropped significantly, closing the gap between corporations and open-source models at a rapid pace. Open-source models are increasingly easier to customise and are more capable. A continued community-wide effort to open-source AI technologies is critical to promote transparency and prevent monopolisation. Alongside this, policymakers must create regulations that clearly define intellectual property and ownership, as well as legal recourse for the misuse of foundation models.
Societal risks of AI
The increased integration of AI technologies into our daily lives can lead to enfeeblement, whereby humanity loses the ability to self-govern and becomes completely dependent on machines. Although there is no consensus around the dangers of superhuman AI, a number of AI leaders have expressed concerns regarding the existential threat posed by AI, leading to a dystopian world where machines take over systems and override human control. These warnings have come from those who occupy positions of privilege and disregard the real (and more imminent) risks of AI encoding the structural inequities that permeate centuries-old legacy systems. The real dangers of AI arise not from its supposed superintelligence, but from its lack of true intelligence. For instance, adversarial attacks exploit the limitations of AI’s pattern-matching, while overtrust arises when users mistakenly attribute intelligence to these systems, leaving them vulnerable to manipulation. Moreover, economic displacement, driven by AI-enabled automation of tasks that once required human judgement, is poised to further restrict access to opportunities for communities that have historically been disempowered and oppressed, exacerbating disparities in critical areas like healthcare. Advanced AI systems pose significant threats to society, including exacerbating social injustice, destabilising institutions and distorting our shared reality. They could enable large-scale criminal activities and concentrate power in the hands of a few, potentially leading to increased global inequities, automated warfare, mass manipulation and pervasive surveillance. The development of autonomous AI systems that can pursue goals independently amplifies these risks and introduces new ones. Without proper safeguards, we risk losing control over these systems, potentially leading to a rapid escalation of harms such as cybercrime and social manipulation.
In the worst-case scenario, unchecked AI advancement could result in a catastrophic loss of life and environmental destruction.30 To prevent this, evaluations and risk assessments are required to understand how individuals, society, processes, institutions and power structures are changed and shifted by AI systems, in addition to the implementation of evidence-based risk mitigation strategies.
Future of AI in healthcare
There is immense potential for AI to improve healthcare and assist with clinical decision-making; this has already been demonstrated in several aspects of early diagnosis, risk stratification and treatment optimisation. However, the incorporation of AI into clinical care can pose risks to healthcare providers and cause over-reliance on AI systems. Impacts on critical workflows can make the healthcare system more vulnerable to technical failures or cyberattacks. Integration of AI into patient care can also further exacerbate accessibility and disparity gaps, particularly affecting individuals of lower socioeconomic status and marginalised and under-represented groups. Ultimately, AI systems are only as good as the data they are trained on; if the data are biased or lack diversity, the AI’s decisions will reflect this. Efforts to generate and curate diverse, high-quality, large-scale datasets, and to perform correspondingly robust evaluations, are critical to preventing adverse patient outcomes. Proactive governance frameworks are also essential to keep up with AI’s rapid evolution.1 30 This means establishing specialised institutions with the expertise and authority to enforce rigorous safety standards, conduct detailed risk assessments and require transparency.
Conclusion
Despite the profound benefits AI can provide to individuals and society, failure to proactively address the existential threats it poses can have catastrophic consequences with long-term harm to humanity. As the use of AI grows and its capabilities continue to advance,1 a collective effort by the AI community is needed to better understand the existential risks these systems pose and to develop strategies to ensure their responsible development and deployment.