
How Large Language Models Threaten Enterprise Security

Written by Lauren Haines | Sep 27, 2023 9:49:20 PM

Invasive lionfish, with their beautiful stripes and destructive appetites, can tell us a cautionary tale about the hazards of adopting a new element into an unprepared ecosystem. Imported from their native waters by aquarium enthusiasts, these insatiable predators found their way into the Atlantic Ocean, where they have decimated native species. Once celebrated as iconic pets, lionfish are now better known as a danger to ecosystem security.


Technology Imitates Nature

Large Language Models (LLMs), like lionfish, can wreak havoc in unprepared ecosystems. Promising breakthrough capabilities in natural language processing and generation, these models have swiftly found their way into a wide range of applications across industries. However, their strengths are a double-edged sword. Without proper safeguards, LLMs can jeopardize enterprise security, compromising digital ecosystems like an invasive species.

The introduction of lionfish into the Atlantic Ocean was a preventable disaster. Caution could have averted what Graham Maddocks, President of Ocean Support Foundation, a non-profit working to reduce the impact of invasive lionfish, calls “the Atlantic's most profound environmental crisis.” Likewise, organizations can avoid unleashing a ‘lionfish’ into their digital ecosystems by adopting safeguards against LLM enterprise security risks.


Understanding LLM Enterprise Security Risks

While environmentalists like Maddocks grapple with lionfish in the Atlantic Ocean, enterprise security experts face LLMs unleashed into digital ecosystems. For these professionals, the first step to prevent harm is understanding how it might occur. Lionfish, for instance, cause harm through over-predation and habitat destruction, whereas LLMs threaten enterprise security through external leaks, hallucinations, and internal exposure and misuse.


External Leaks

LLMs may share sensitive data, including intellectual property, with unauthorized parties. The massively popular, LLM-powered chatbot ChatGPT, for example, captures users’ chat history to train its model. This training method can lead to external leaks, where one user’s data resurfaces as output in response to another’s query.

Multinational electronics corporation Samsung discovered this risk the hard way. In May 2023, the company banned staff from using generative AI tools after engineers accidentally leaked internal source code by uploading it to ChatGPT. “Interest in generative AI platforms such as ChatGPT has been growing internally and externally,” Samsung told staff in an internal memo about the ban. “While this interest focuses on the usefulness and efficiency of these platforms, there are also growing concerns about security risks presented by generative AI.”

The threat of external leaks is not limited to chatbots like ChatGPT. Researchers at Robust Intelligence, an AI startup that helps businesses stress test their AI models, recently found that Nvidia’s “NeMo Framework,” which allows developers to work with a wide range of LLMs, can be manipulated into revealing sensitive data.
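
One practical safeguard against this kind of leak is to screen prompts for sensitive material before they ever reach an external LLM service. The sketch below is a minimal illustration of that idea in Python: the detection patterns and the submit_prompt helper are hypothetical placeholders rather than any vendor’s actual API, and a production deployment would lean on dedicated secret-scanning or data-loss-prevention tooling instead of a handful of regular expressions.

```python
import re

# Hypothetical patterns for content that should never leave the organization;
# a real deployment would use dedicated secret scanners or DLP tooling.
BLOCKED_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key material
    re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),             # AWS-style access key IDs
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                     # US SSN-like identifiers
]

def is_safe_to_send(prompt: str) -> bool:
    """Return False if the prompt appears to contain sensitive material."""
    return not any(pattern.search(prompt) for pattern in BLOCKED_PATTERNS)

def submit_prompt(prompt: str) -> str:
    """Placeholder for a call to an approved external LLM service."""
    raise NotImplementedError("Wire this to your organization's approved provider.")

def guarded_submit(prompt: str) -> str:
    """Screen a prompt before it crosses the organization's boundary."""
    if not is_safe_to_send(prompt):
        raise ValueError("Prompt blocked: possible sensitive data detected.")
    return submit_prompt(prompt)
```

Samsung’s ban is the blunt version of this control; a screening layer like the one above lets staff keep using external tools while keeping source code and credentials inside the firewall.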


Hallucinations

LLMs ‘hallucinate’ (i.e., produce false or fabricated information) for many reasons, including contradictory training data, limited contextual understanding, and lack of background knowledge. Malicious actors can even manufacture hallucinations through data poisoning: the manipulation of an LLM’s training data to produce harmful results.

"An LLM is only as good as the data it is trained on. If you don’t know what the model was trained on, you can’t count on the integrity of its output," warns Erik LaBianca, Chief Technology Officer at Synaptiq.

Researchers at security firm Vulcan Cyber demonstrated that hackers can exploit hallucinations to deliver malicious code into a development environment, compromising enterprise security. They observed that ChatGPT — “the fastest-growing consumer application in history” — hallucinates references to non-existent code packages. Hackers can create malware masquerading as these ‘fake’ packages and fool victims into downloading it.
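
A lightweight defense is to vet every package an LLM recommends against the official registry before installing it. The Python sketch below queries PyPI’s public JSON API (field names current as of this writing) and flags names that do not exist or that were first published only recently; the 30-day threshold is an illustrative assumption, and because attackers can register hallucinated names themselves, this screening reduces the risk rather than eliminating it.

```python
import json
from datetime import datetime, timezone
from urllib.error import HTTPError
from urllib.request import urlopen

def vet_suggested_package(name: str) -> str:
    """Screen an LLM-suggested dependency against the public PyPI registry."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urlopen(url, timeout=10) as resp:
            data = json.load(resp)
    except HTTPError:
        return "not on PyPI (possible hallucination)"
    # Attackers can register hallucinated names, so package age is a useful
    # extra signal: a very recent first upload deserves manual review.
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data.get("releases", {}).values()
        for f in files
    ]
    if not uploads:
        return "registered but has no released files (suspicious)"
    age_days = (datetime.now(timezone.utc) - min(uploads)).days
    flag = " -- review before installing" if age_days < 30 else ""
    return f"exists, first released {age_days} days ago{flag}"

# Example: screen the dependencies an LLM suggested before running `pip install`.
for pkg in ["requests", "totally-made-up-helper-lib"]:
    print(f"{pkg}: {vet_suggested_package(pkg)}")
```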


Internal Exposure & Misuse

Some organizations have adopted LLMs trained on internal data. Without strict access controls, these models can expose sensitive data to users who should not have access to it. For example, an LLM trained on internal HR data could reveal performance evaluations when queried by an employee from a different department.
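
In practice, this means an internal LLM application should enforce the same access controls as the data behind it before a query is ever answered. The Python sketch below shows a simple department-based check; the query_llm helper and the collection names are hypothetical, and a real system would integrate with existing identity and document-permission infrastructure.

```python
from dataclasses import dataclass

@dataclass
class User:
    username: str
    department: str

# Hypothetical policy mapping document collections to the departments allowed to query them.
COLLECTION_ACCESS = {
    "hr_performance_reviews": {"Human Resources"},
    "engineering_wiki": {"Engineering", "Human Resources"},
}

def query_llm(prompt: str, collection: str) -> str:
    """Placeholder for a retrieval-augmented call to an internal LLM."""
    raise NotImplementedError("Wire this to the internal model and document store.")

def authorized_query(user: User, prompt: str, collection: str) -> str:
    """Refuse to answer unless the user's department may access the collection."""
    allowed = COLLECTION_ACCESS.get(collection, set())
    if user.department not in allowed:
        raise PermissionError(
            f"{user.username} ({user.department}) may not query '{collection}'"
        )
    return query_llm(prompt, collection)

# Example: an engineer asking about HR reviews is refused before the LLM is ever called.
# authorized_query(User("jdoe", "Engineering"), "Summarize Q3 reviews", "hr_performance_reviews")
```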

Moreover, using LLMs without proper training or guidelines can lead to employees becoming overly reliant on AI, sidelining human expertise and intuition. This over-reliance can cause strategic missteps or amplify biases already present in the model's training data. For example, a federal judge recently imposed $5,000 fines on two lawyers and a law firm who blamed ChatGPT for their submission of fictitious legal research in an aviation injury claim.

"It's not just external threats that organizations need to worry about. A robust governance framework is equally critical for ensuring the safe, responsible usage of LLMs," says LaBianca.


Managing LLM Enterprise Security Risks

There is still hope for organizations that wish to leverage LLM capabilities while safeguarding their digital ecosystems. For one, emerging enterprise solutions present an opportunity for organizations to externalize security risks to third-party vendors through contractual guarantees (although they pose other challenges, such as vendor selection and trust). Alternatively, in-house LLM development grants organizations complete control over their data, as well as the opportunity to tailor their models to their unique operational and security needs.
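
For the in-house route, one common starting point is to serve an open-source model inside the organization’s own environment, so prompts and outputs never leave its control. The sketch below uses the open-source Hugging Face transformers library as one illustrative way to do this; the model named here is a small public stand-in for whatever model an organization has actually vetted and approved.

```python
# A minimal sketch of running a text-generation model locally with Hugging Face
# transformers, so prompts stay inside the organization's own infrastructure.
from transformers import pipeline

# "gpt2" is an illustrative stand-in; substitute an internally approved model.
generator = pipeline("text-generation", model="gpt2")

prompt = "Summarize our incident-response policy for new hires:"
result = generator(prompt, max_new_tokens=60, do_sample=False)

print(result[0]["generated_text"])
```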

While the enterprise security risks posed by LLMs should not be underestimated, they are not insurmountable. With a commitment to understanding both the capabilities and vulnerabilities of these models, organizations can leverage the transformative capabilities of LLMs without compromising their enterprise security.


About Synaptiq

Synaptiq is an AI and data science consultancy based in Portland, Oregon. We collaborate with our clients to develop human-centered products and solutions. We uphold a strong commitment to ethics and innovation. 

Contact us if you have a problem to solve, a process to refine, or a question to ask.

You can learn more about our story through our past projects, blog, or podcast.