
In-call MFA vs deepfakes — what actually catches the modern attacker

Deepfake voice cloning is now consumer-grade. Here is how Authenticator and Duo push challenges inside the call defeat the attack — and where the gaps still are.

Patrick Leonard

January 2026 · 8 min read · Founder & CEO

Voice cloning crossed the consumer-grade threshold in 2024. With a 30-second clip of someone's voice and free tools, an attacker targeting your organization can produce a convincing impersonation in under five minutes. The implication for the helpdesk is direct: you can't trust a voice, even one you recognize.

This is a structural shift, and it forces a structural answer. The answer is in-call MFA challenges that are bound to a device the real employee owns — not the voice on the line.

Why "ask security questions" stopped working

Tier-1 helpdesks have leaned on knowledge-based authentication (KBA) for decades: manager's name, last four of SSN, recent ticket numbers. None of these survive contact with modern reconnaissance. LinkedIn, the company's About page, leaked breach data, and a few minutes of OSINT cover most of them.

The attack model used to be: attacker doesn't have the answers. Modern model: attacker has all the answers. KBA is a noise floor — it filters out unsophisticated attempts but does nothing against the pros.

Why voiceprint matching alone has gaps

Some platforms claim to defeat deepfakes via voiceprint matching — comparing the caller's voice to a registered sample. We don't rely on this, and we don't recommend it as a primary defense. Two reasons:

Modern voice cloning models reproduce voiceprints well enough to pass static matching. The arms race is moving fast.
Voiceprint matching is a probabilistic check, not a binding check. It says "this voice probably matches" — not "this is definitely the registered employee on the registered device."

The right primitive is a device-bound challenge.
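The difference between a probabilistic check and a binding check can be sketched in a few lines. This is an illustration only: the threshold value and function names are hypothetical, and the HMAC over a nonce is a stand-in for whatever cryptographic binding the real authenticator app uses, which this article does not specify.

```python
import hashlib
import hmac
import secrets

SIMILARITY_THRESHOLD = 0.85  # hypothetical tuning value, not from the article


def voiceprint_check(similarity_score: float) -> bool:
    """Probabilistic: passes any voice that scores above the threshold."""
    return similarity_score >= SIMILARITY_THRESHOLD


def device_response(device_key: bytes, nonce: bytes) -> bytes:
    """What a device computes; it needs the key stored on that device."""
    return hmac.new(device_key, nonce, hashlib.sha256).digest()


def device_bound_check(enrolled_key: bytes, nonce: bytes, response: bytes) -> bool:
    """Binding: passes only if the responder holds the enrolled key."""
    expected = hmac.new(enrolled_key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


enrolled_key = secrets.token_bytes(32)  # assumed to live on the employee's phone
nonce = secrets.token_bytes(16)         # fresh for every call

# A good clone can score above the threshold...
clone_passes_voice = voiceprint_check(0.91)

# ...but without the device it can only answer with some other key.
attacker_key = secrets.token_bytes(32)
clone_passes_device = device_bound_check(
    enrolled_key, nonce, device_response(attacker_key, nonce)
)

# The real employee's device answers with the enrolled key.
real_passes_device = device_bound_check(
    enrolled_key, nonce, device_response(enrolled_key, nonce)
)
```

The voice check answers "how similar?"; the device check answers "does the responder hold the enrolled secret?", which a cloned voice cannot supply.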

"You can clone the voice. You can't clone the registered Authenticator app on the real employee's phone. The defense moves to a layer the attacker can't replicate."

The in-call MFA model

What we deploy: when a caller asks for a privileged action, the agent triggers a Microsoft Authenticator or Duo push to the registered device of the employee they claim to be. The caller has to approve from the real device, on the real account. If they can't, the privileged action does not execute.
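As a rough sketch, the gate described above might look like the following. `PushProvider`, `send_push`, and `handle_privileged_request` are hypothetical names invented for illustration; they are not the Microsoft Authenticator or Duo APIs, and a real integration would go through those vendors' own SDKs.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class PushProvider(Protocol):
    """Hypothetical wrapper around an Authenticator or Duo integration."""

    def send_push(self, employee_id: str, context: str) -> bool:
        """Return True only if the registered device approved the push."""
        ...


@dataclass
class Outcome:
    executed: bool
    reason: str


def handle_privileged_request(
    claimed_employee_id: str,
    action_name: str,
    provider: PushProvider,
    execute: Callable[[], None],
) -> Outcome:
    # The push targets the device registered to the claimed identity,
    # never a number or device supplied by the caller.
    context = f"Helpdesk call requests: {action_name}"
    if not provider.send_push(claimed_employee_id, context):
        return Outcome(False, "push not approved")
    execute()
    return Outcome(True, "push approved")


class FakeProvider:
    """Test double standing in for the real MFA service."""

    def __init__(self, approves: bool) -> None:
        self.approves = approves

    def send_push(self, employee_id: str, context: str) -> bool:
        return self.approves


performed: list[str] = []
denied = handle_privileged_request(
    "jdoe", "password reset", FakeProvider(False), lambda: performed.append("reset")
)
allowed = handle_privileged_request(
    "jdoe", "password reset", FakeProvider(True), lambda: performed.append("reset")
)
```

The design point is that `execute` is only reachable through an approved push; nothing the caller says on the line can reach it.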

What this defeats:

Voice clones — they don't have the device.
Recovered passwords from old breaches — they don't have the device.
SIM swaps: the push goes to the app on the registered device, not to a phone number, so hijacking the employee's number gains the attacker nothing.
Insider impersonation — they still need the registered device of the impersonated employee.

What it doesn't defeat (and where the gaps are):

Device theft. If the attacker has the actual phone and the unlock code, they can approve. This is a separate threat covered by phone-level controls (biometric unlock, device wipe).
MFA fatigue. Some employees approve any push reflexively. We mitigate with number-matching challenges where supported.
Push prompt phishing. Mitigated by requiring that the caller, not the agent, initiates the action, and by sending the prompt to the employee with context about the request.
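The number-matching mitigation for MFA fatigue can be illustrated with a small sketch. It assumes the agent reads a short code aloud on the call and the employee has to type that code into the app to approve; the function names are hypothetical, and real products implement this inside their own apps.

```python
import hmac
import secrets
from typing import Optional


def new_challenge_code() -> str:
    """Two-digit code (10 to 99) that the agent reads aloud to the caller."""
    return str(secrets.randbelow(90) + 10)


def push_approved(expected_code: str, entered_code: Optional[str]) -> bool:
    """Approve only if the employee typed the code spoken on the call.

    entered_code is None when the push was denied or timed out.
    """
    if entered_code is None:
        return False
    return hmac.compare_digest(expected_code.encode(), entered_code.encode())


code = new_challenge_code()
```

An employee who taps prompts reflexively but is not on the call has no way to know the code, so a reflexive approval fails closed.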

Fallback paths that don't break the model

What happens when the legitimate caller doesn't have their phone?

Secure one-time link to the registered email. The caller receives the link, opening it proves they control the registered email account, and the privileged action proceeds.
Photo ID via secure portal. A second tech does live video verification with a government ID.
Hand-off to the security team for elevated review. Always available. Always logged.

The point is: there is always a fallback that does not require trusting the voice. Every fallback is auditable and logged to the SIEM.
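One way to picture the fallback chain is as an ordered list of verifiers, none of which listens to the voice, with every attempt written to an audit log. This is a sketch under assumptions: the method names, the `helpdesk.audit` logger, and the JSON record shape are invented for illustration, and forwarding the log to a SIEM is left to the logging configuration.

```python
import json
import logging
from typing import Callable, Optional

# Hypothetical logger name; assumed to be forwarded to the SIEM elsewhere.
audit_log = logging.getLogger("helpdesk.audit")

Verifier = Callable[[str], bool]


def verify_with_fallbacks(
    employee_id: str, methods: list[tuple[str, Verifier]]
) -> Optional[str]:
    """Try each method in order and log every attempt.

    Returns the name of the method that verified the caller, or None,
    in which case the call is handed to the security team and no
    privileged action runs.
    """
    for name, verify in methods:
        ok = verify(employee_id)
        audit_log.info(
            json.dumps({"employee": employee_id, "method": name, "verified": ok})
        )
        if ok:
            return name
    audit_log.info(
        json.dumps({"employee": employee_id, "method": "security_handoff", "verified": False})
    )
    return None


# Stub verifiers standing in for the real checks.
methods = [
    ("push", lambda employee_id: False),        # phone unavailable
    ("email_link", lambda employee_id: True),   # one-time link was opened
    ("photo_id", lambda employee_id: True),     # not reached in this run
]
result = verify_with_fallbacks("jdoe", methods)
unverified = verify_with_fallbacks("jdoe", [("push", lambda employee_id: False)])
```

Because the chain stops at the first success and falls through to a hand-off, there is no path where an unverified caller gets the action by exhausting the options.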

The economics

The reason we structured the platform this way comes down to attacker economics. The cost of a voice clone fell from "several thousand dollars and a recording studio" in 2020 to "$0 and 5 minutes" in 2025. The cost of getting an Authenticator push approved on someone else's phone has stayed roughly the same: very high. We want the defense on the cheap-to-defend side of that gap, not the cheap-to-attack side.

The take-home

Don't try to detect deepfakes. Make the verification step independent of voice. The Authenticator/Duo push lives on a device the attacker doesn't have. That's the whole game.

Three customers caught vishing attempts in their first 30 days using exactly this protocol. None of them suspected anything from the voice. The push didn't approve. The attack ended at hello.

Ship a verified service desk in 30 days.

Book a 30-minute call with a solutions engineer who came out of an MSP service desk. Bring your stack. We'll model the impact with your numbers.
