Many organizations are looking for trusted advisors, and this applies to our beloved domain of cyber/information security. If you look at LinkedIn, many consultants present themselves as trusted advisors to CISOs or their teams.
This perhaps implies that nobody wants to hire an untrusted advisor. But if you think about it, modern LLM-powered chatbots and other GenAI applications are essentially untrusted advisors (RAG and fine-tuning notwithstanding).
Let’s think about the use cases where using an untrusted security advisor is quite effective and the risks are minimized.
To start, naturally intelligent humans remind us that any output of an LLM-powered application needs to be reviewed by a human with domain knowledge. While this advice has been spouted many times — with good reasons — unfortunately there are signs of people not paying attention. Here I will try to identify patterns and anti-patterns and some dependencies for success with untrusted advisors, in security and SOC specifically.
First, tasks involving ideation, creating ideas and refining them are very much a fit to the pattern. One of the inspirations for this blog was my eternal favorite read from years ago about LLMs “ChatGPT as muse, not oracle”. If you need a TLDR, you will see that an untrusted cybersecurity advisor can be used for the majority of muse use cases (give me ideas and inspiration! test my ideas!) and only for a limited number of oracle use cases (give me precise answers! tell me what to do!).
So let’s create new ideas. How would you approach securing something? What are some ideas for doing architecture in cases of X and Y constraints? What are some ideas for implementing controls given the infrastructure constraints? What are some of the ways to detect Z? All of these produce useful ideas that can be turned by experts into something great. Ultimately, they shorten time to value and they also create value.
A slightly more interesting use case is the Devil’s Advocate use case (this has been suggested by Gemini Brainstormer Gem during my ideation of this very post!). This implies testing ideas that humans come up with to identify limitations, problems, contradictions or other cases where these things may matter. I plan to do X with Y and this affects security, is this a good idea? What security will actually be reduced if I implement this new control? In what way is this new technology actually even more risky?
Making “what if” scenarios is another good one. After all, if the scenarios are incorrect, ill-fitting or risky, a human expert can reject them. No harm done! And if they’re useful, we again see shorter time to value (epic example of tabletops via GenAI)
Now think about all the testing use cases. Given the controls we have, how would you test X? This makes me think that perhaps GenAI will end up being more useful for the red team (or: red side of the purple team). The risks are low and the value is there.
Report drafting and data story-telling. By automating elements of data-centric story telling, GenAI can produce readable reports, freeing humans for more fun tasks. Furthermore, GenAI excels at identifying patterns. This enables the creation of compelling narratives that effectively communicate insights and risks. And, back to the untrusted advisor: it’s still essential to remember that experts should always review GenAI-generated content for accuracy and relevance (thanks for the reminder, Gemini!)
Summary — The Good:
- Ideation and Brainstorming: LLMs excel at generating ideas for security architectures, controls, and approaches. They can help overcome mental blocks and accelerate the brainstorming process.
- Devil’s Advocate: LLMs can challenge existing ideas, identify weaknesses, and highlight potential risks. This helps refine strategies and improve overall security posture.
- “What-if” Scenarios: LLMs can create various scenarios to test the effectiveness of security controls and identify vulnerabilities.
- Security Testing: LLMs can be valuable tools for testing, proposing simulated attacks and identifying weaknesses in defenses.
- Report drafting: LLMs can help you write reports that make sense and flow well.
On the other hand, let’s talk about the anti-patterns. It goes without saying that if it leads to deployment of controls, automated reconfiguration of things, or remediation that is not reviewed by a human expert, that’s a “hard no”.
Admittedly, any task that require sharing detailed knowledge of my environment is also on that “hard no” list (some bots leak, and leak a lot). I just don’t trust the untrusted advisor with my sensitive data. I also assume that some results will be inaccurate, but only a human domain expert will recognize when this is the case…
Summary — The Bad:
- Direct Control: Allowing LLMs to directly deploy controls, reconfigure systems, or automate remediation without human review is a major risk.
- Access to Sensitive Information: Avoid sharing detailed knowledge of your environment with an untrusted LLM (which is another way of saying “an LLM”).
Bridging the Trust Gap
The key to safely using LLM-powered “untrusted security advisor” for more use cases is to maintain a clear separation between their (untrusted) outputs and your (trusted) critical systems.
A human domain expert should always review and validate LLM-generated suggestions before implementation. This choice is obvious, but it is also a choice that promises to be unpopular with some environments. What are the alternatives, if any?
Alternatives and Considerations
While relying on non-expert human review or smaller, grounded LLMs might seem appealing, they ultimately don’t solve the trust issue. Clueless human review does not fix AI mistakes. Another AI may fix AI mistakes, or it may not…
Perhaps a promising approach involves using a series of progressively smaller and more grounded LLMs to filter and refine the initial untrusted output. Who knows … we live in fun times!
Agent-style valuation is another route (if an LLM wrote remediation code, I can run it in a test or simulated environment, and then decide what to do with it, perhaps automatically prompting the LLM to refine it until it works well).
But still: will you automatically act on it? No! So think real hard about the trust boundary between your “untrusted security advisor” and your environment! Perhaps we will eventually invent a semantic firewall for it?
Conclusion
LLMs can be powerful tools for security teams, but they must be used responsibly given lack of trust. By focusing on appropriate use cases and maintaining human oversight, organizations can leverage the benefits of LLMs while mitigating the risks.
Specifically, LLMs can be valuable “untrusted advisors” for cybersecurity, but only when used responsibly. Ideation, testing, and red teaming are excellent applications. However, direct control, access to sensitive data, and unsupervised deployment are off-limits. Human expertise remains essential for validating LLM outputs and ensuring safe integration with critical systems.
- LLMs can be valuable “untrusted advisors” for ideation and testing in cybersecurity.
- Human experts should always review and validate LLM output before implementation.
- LLMs should not (yet?) be used for tasks requiring high trust or detailed environmental knowledge.
- Striking the right balance between human expertise and AI assistance is crucial.
Thanks Gemini, Editor Gem, Brainstormer Gem and NotebookLM! 🙂
Related: