AI Security

Modulus performs independent assessments of AI and machine learning systems institutions use in production, measured against the standards courts, regulators, and auditors recognize.

AI changes how systems can fail. AI models learn statistical patterns from data, follow instructions written in plain language, and increasingly act on their own. Each of those properties is also an attack surface, and most AI systems fall outside what traditional application security testing can audit.

Modulus has built machine learning and natural language systems for nearly three decades, and holds foundational LLM patents. We help uncover vulnerabilities in your organization that you do not yet know about, and provide a clear plan to mitigate them.

A new attack surface

Conventional security testing assumes software does what its code says. A machine learning model does not work that way. Its behavior is learned from data, expressed through millions of parameters, and shaped at runtime by whatever input it receives. There is no line of code to read that tells you how the model will respond to an input no one anticipated.

That creates failure modes a code review or a penetration test will not surface. The data a model learned from can be tampered with. The model itself can be copied or reverse-engineered. The instructions it follows can be hijacked by ordinary-looking text. An AI security audit examines the system as a whole: the data, the model, its inputs, its outputs, and the actions it is permitted to take.

Where risk enters the AI lifecycle

Diagram of a machine learning workflow from data collection through model training to deployment

An AI system is a pipeline, not a single program. Data is collected and prepared, a model is trained or fine-tuned, the model is deployed behind an interface, and its output flows into other systems. Risk can enter at any stage, and a weakness early in the pipeline often becomes a larger problem once the system reaches production.

Mapping that full lifecycle is the first step of an audit. It shows where untrusted data enters, where a third party is trusted, where sensitive information is exposed, and where the model's output is acted on without review.

  • Training and fine-tuning data, where poisoning and backdoors are introduced
  • The model, which can be copied, inverted, or made to leak its training data
  • Inputs at runtime, the vector for prompt injection and evasion
  • Outputs, which become dangerous when passed unchecked into other systems
  • Integrations and agents, where the model is allowed to take real actions
  • Third-party models and data, which carry risk you did not create

Poisoned data and untrusted models

Training data poisoning plants malicious examples in the data a model learns from, so the model behaves normally almost everywhere and fails only on the inputs an attacker chooses. Published research has planted a working backdoor by altering a few hundred images out of three million, and has shown that a model trained to misbehave on a specific trigger can pass standard safety testing without revealing the flaw.

For a bank or an exchange, this matters most in the models that decide what is normal. A poisoned fraud, anti-money-laundering, or market-surveillance model can be shaped to overlook the precise activity an attacker intends to run. Because most institutions build on third-party base models, datasets, and libraries, provenance becomes part of the audit: what went into a model, where it came from, and whether it can be trusted.

Evasion: a change too small to see

An adversarial example is an input altered just enough to change a model's decision while looking unchanged to a person. The change is computed against the model itself, so the result is deliberate rather than random.

This is the attack that matters most for detection. A fraud, anti-money-laundering, or surveillance model exists to separate normal activity from abnormal, and an evasion attack is engineered to push the abnormal back across that line. The same technique has fooled image classifiers, malware detectors, and intrusion-detection systems in published research.

InputPerturbationAdversarial input+=imperceptible, shown magnifiedFraud · 96%Fraud · 96%Legitimate · 99%evades detectionBoth inputs are identical to a human. The model’s verdict is not.

How a Modulus audit works

An engagement begins by inventorying your AI systems and the data, models, and integrations behind them, then building a threat model specific to how each system is used. From there our team tests the system the way an adversary would, combining manual review with adversarial techniques drawn from published research and the MITRE ATLAS knowledge base.

Every finding is mapped to a recognized framework, rated by impact and likelihood, and paired with a concrete remediation step your engineers can act on. The result is not a checklist. It is a clear account of how your AI can be misused and what to change first.

  • Inventory and threat modeling of each AI system in scope
  • Review of training data, fine-tuning, and model provenance
  • Adversarial testing for prompt injection, evasion, and data leakage
  • Assessment of agent permissions, tool access, and output handling
  • Review of third-party model and data dependencies
  • We measure against OWASP, NIST AI RMF, and ISO/IEC 42001

The risks an audit covers

Modulus tests AI systems against the recognized catalog of large language model and machine learning risks, including the OWASP Top 10 for LLM Applications. Each item below is a documented failure mode, not a hypothetical.

Prompt injection

Crafted input, sometimes hidden inside a document or web page the model reads, overrides its instructions. A model cannot reliably tell a developer's commands from data, so any untrusted content it ingests is a potential entry point.

Sensitive data disclosure

Models trained or prompted on confidential data can reveal it, and staff pasting records into external tools send regulated data outside your control. Both break data-handling obligations a financial institution is held to.

Data and model poisoning

Tampered training data plants errors or hidden backdoors that activate only on specific inputs. A poisoned detection model can be made blind to the exact behavior an attacker plans to use.

Adversarial examples

Small, often imperceptible changes to an input cause a model to misclassify it. Evasion attacks target precisely the fraud, AML, and intrusion-detection systems whose job is to flag the abnormal.

Data leakage from the model

Model inversion and membership inference reconstruct training data or confirm that a specific record was used. A model trained on customer data can become a channel for exposing it.

Improper output handling

When model output is passed unchecked into code, a database query, or a browser, prompt injection upstream becomes SQL injection or code execution downstream. Text-to-SQL and text-to-code features make this common.

Excessive agency

An agent given broad permissions can be steered by manipulated output into taking real actions. The damage scales with what the agent is allowed to do, from reading data to moving money.

Supply chain risk

Base models, datasets, embeddings, and libraries pulled from third parties can arrive already compromised. Provenance is difficult to verify by inspection alone.

Hallucination

A model can state false information as fact. In disclosures, suitability assessments, or customer guidance, a fabricated figure or citation becomes a misstatement the institution, not the vendor, is liable for.

Scope, standards, and deliverables

What a Modulus AI security audit examines, the frameworks it maps to, and what your team receives.

What we examine

  • Large language model applications and assistants
  • Machine learning and detection models
  • Training, fine-tuning, and retrieval data pipelines
  • Prompts, embeddings, and vector stores
  • Agents, tools, and downstream integrations
  • Third-party and self-hosted model providers

Frameworks we map to

  • OWASP Top 10 for LLM Applications
  • NIST AI Risk Management Framework
  • MITRE ATLAS adversarial technique catalog
  • ISO/IEC 42001 AI management systems
  • EU AI Act risk tiers

What you receive

  • A threat model for each system in scope
  • Findings rated by impact and likelihood
  • Every finding mapped to a framework
  • A prioritized remediation roadmap
  • An executive summary for risk and compliance
  • An optional re-test after remediation

Models and platforms we assess

ChatGPT
Claude
Gemini
Llama
Mistral
Grok
DeepSeek
Hugging Face

Audit your AI.

Understand where your AI systems are exposed, and how to close the gaps, before they reach production. Get started, or arrange a call with our team.