Security and Supply Chain

When AI Learns to Forge Everything: How Image Generation Is Undermining Financial Security

Have you ever done facial verification on a banking app? You look into the camera, blink, nod, open your mouth, and the system confirms your identity before granting access. Behind this process lies a core assumption: the face in front of the camera is real and belongs to the phone’s owner.

That assumption is breaking down.

In February 2024, security firm Group-IB documented a trojan called GoldPickaxe. It disguised itself as a Thai government “digital pension” app, prompting users to record a facial video. During the recording, it asked users to perform various actions: blink, smile, turn their head, nod, open their mouth. It looked like identity verification, but it was actually harvesting facial data. After obtaining the video, attackers used AI face-swapping tools to generate a deepfake video, opened a banking app on their own phone, and played the forged video during the liveness detection step. The bank’s system accepted it. One victim lost approximately $40,000.

The same attack is now scaling. Between January and August 2025, Group-IB recorded 8,065 similar deepfake injection attacks at a single financial institution, all targeting production systems.

These cases point to a broader shift: AI image and video generation technologies, represented by GPT-Image-2 and real-time face-swapping tools, are systematically invalidating the security assumptions that the financial industry has relied on for years. These risks are already causing billions of dollars in losses today, and the industry’s response is only beginning.

The End of Visual Confirmation

The financial industry’s identity verification framework is built on layer after layer of visual confirmation. During account opening, your ID photo is examined, your face is checked against the ID, and you are verified as a living person in front of a camera. Each layer of confirmation corresponds to an assumption, and these assumptions are being dismantled one by one.

Liveness Detection: From “A Person Is in Front of the Camera” to “The Data Stream Comes from a Real Camera”

Liveness detection falls into three categories. The first is passive detection: the system analyzes features like skin texture and light reflection in the photo to determine whether it shows a real person. The second is active detection: the user is asked to perform specific actions (blink, turn their head), and the system confirms a live person is present through dynamic responses. The third is 3D detection: structured light or LiDAR sensors project an infrared dot pattern to capture three-dimensional depth information of the face.

The first two categories run on 2D cameras, and their security assumption is that the image captured by the camera originates from the real physical world. But when attackers use virtual camera software, this assumption is bypassed entirely. Virtual cameras disguise AI-generated deepfake video streams as hardware camera input, and the operating system and applications treat them as legitimate devices. Liveness detection systems check whether “the face in this image looks like a real person,” not whether “there is actually a person in front of the camera.”
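To make this blind spot concrete, here is a minimal Python sketch (assuming the OpenCV library; the device index and print output are illustrative). Nothing in the standard capture path reveals whether device 0 is physical silicon or an injection tool:

```python
# Minimal sketch of the application-side blind spot, assuming OpenCV
# (pip install opencv-python). Device index 0 is illustrative.
import cv2

def capture_frame(device_index: int = 0):
    """Grab one frame from whatever the OS registers at this index.

    The OS presents a virtual camera (e.g. software replaying a deepfake
    video) through exactly the same interface as physical hardware, so
    nothing in this code path can distinguish the two.
    """
    cap = cv2.VideoCapture(device_index)
    ok, frame = cap.read()
    cap.release()
    return frame if ok else None

frame = capture_frame()
if frame is not None:
    # A 2D liveness model consumes only these pixels. It answers "does
    # this image look like a live face?", never "is the source real?".
    print("got frame with shape", frame.shape)
```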

The growth rate of these injection attacks is striking. Data from security firm ROC shows that injection attacks grew 9x in 2024, and virtual camera exploits grew 28x.

The barrier to launching such attacks is already low. The open-source tool Deep Live Cam can generate real-time face-swapped video from a single photo, requires no model training, and is free to use. In other words, an attacker needs only one photo of their target from social media to mount an attack. In its January 2026 Cybercrime Atlas report, the World Economic Forum (WEF) tested 17 face-swapping tools and 8 injection tools and found that most could bypass standard KYC biometric verification. A synthetic facial image capable of passing standard detection can cost as little as $5.

3D detection offers significantly stronger security. Apple Face ID uses the TrueDepth camera to project over 30,000 infrared dots to reconstruct the three-dimensional structure of a face, with a probability of less than one in a million that a random person could unlock it. The problem, however, is that a large number of devices lack this hardware. UK consumer advocacy organization Which? has tested the facial unlock features of 208 phones since 2022 and found that 64% could be fooled by a printed photo. At its worst in 2024, this figure reached 72%, including flagships priced above $1,000 (such as the Samsung Galaxy S25 and Oppo Find X9 Pro). Although banking apps no longer accept Android’s 2D facial recognition as an authentication factor, the vulnerability of phone unlock still provides an entry point in the attack chain: an attacker can unlock the phone with a photo, intercept SMS verification codes, and initiate a password reset.
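Why depth hardware raises the bar can be shown with a toy example. The sketch below is a conceptual illustration in Python, not Apple’s algorithm: the depth values, millimeter units, and threshold are all assumptions. The key point is geometric: a printed photo is a plane, so its depth map has essentially no relief, while a real face has tens of millimeters of it.

```python
# Conceptual sketch only: depth maps, millimeter units, and the 10 mm
# threshold are illustrative assumptions, not any vendor's algorithm.
import numpy as np

def looks_three_dimensional(depth_map_mm: np.ndarray, min_relief_mm: float = 10.0) -> bool:
    """A real face has tens of millimeters of relief (nose vs. cheeks);
    a photo held up to the sensor is essentially a flat plane."""
    relief = np.percentile(depth_map_mm, 95) - np.percentile(depth_map_mm, 5)
    return relief > min_relief_mm

flat_photo = np.full((64, 64), 300.0)               # a plane ~30 cm away
real_face = 300.0 - 25.0 * np.random.rand(64, 64)   # up to 25 mm of relief

print(looks_three_dimensional(flat_photo))  # False: no depth variation
print(looks_three_dimensional(real_face))   # True
```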

IDs and Documents: What AI Can Forge

Liveness detection is just one step in identity verification. Before that comes an even more fundamental step: submitting ID photos. And AI’s ability to generate fake IDs has reached an entirely different level.

Sumsub’s 2025 report shows that synthetic identity document fraud grew 195% globally from Q1 2024 to Q1 2025, and 311% in North America. AI can replicate fonts, layouts, security watermarks, holograms, and other security features. Sepideh Rowland, a partner at fintech firm Klaros Group, demonstrated the process in an American Banker report: generating a realistic receipt with Microsoft Copilot required just a one-line prompt, and one additional instruction added creases and water stain effects. Her conclusion was that AI-generated financial and identity documents are “virtually impossible to detect.”

The problem goes beyond identity documents. Pay stubs, bank statements, and utility bills, the full set of documents used for income and address verification, can be forged just as easily. And there is an infrastructure-level gap here: no centralized database currently exists for verifying that an account number on a utility bill actually belongs to the corresponding utility company. You can generate a complete-looking power company bill with any address on it, and no database can automatically cross-check it.

These capabilities have already been productized. A tool called ProKYC sells for $629 per year, offering a complete three-step KYC bypass workflow: use AI to generate a fake ID, generate a deepfake selfie video that matches the ID photo, and inject it into the verification system via a virtual camera. A demo video showed it successfully passing the KYC process at crypto exchange Bybit. Eric Huber, head of adversarial intelligence at TD Bank, demonstrated how these tools work at an industry conference and noted that similar kits can cost as little as $300.

Synthetic identity fraud, which assembles a nonexistent person from fragments of real personal information (such as stolen Social Security numbers) combined with fabricated data, has already caused enormous financial losses. TransUnion’s data shows that as of the end of 2024, U.S. lenders faced over $3.3 billion in risk exposure from synthetic identities. In 2024, the U.S. Financial Crimes Enforcement Network (FinCEN) issued its first formal alert specifically addressing deepfake financial fraud, confirming a sustained increase in deepfake-related suspicious activity reports from financial institutions starting in 2023.

The Fundamental Dilemma of KYC

The design logic of traditional KYC (Know Your Customer) processes is to perform a one-time identity verification at account opening and then assume the person is real once they pass. This design has exposed two problems in the AI era.

The first problem is that the one-time check itself is no longer reliable. When IDs can be forged, faces can be synthesized, and liveness detection can be bypassed, passing KYC no longer equals identity authenticity.

The second problem may be more severe: Sumsub’s data shows that 76% of fraud occurs after the customer registration stage. This means that even if KYC successfully blocks some fake identities, once an attacker passes initial verification, there is virtually no ongoing identity re-confirmation mechanism. KYC is a static checkpoint, not a continuous process.

Checks and Transfers: Another Door Opened by AI

Beyond identity verification, AI image generation poses equally direct threats to payment and transaction security.

The Trust Gap in Mobile Deposit

U.S. banks allow users to deposit checks by taking a photo (mobile deposit). One traditional safeguard requires users to write a restrictive endorsement on the back of the check, such as “For Mobile Deposit Only at XXX Bank.” But this safeguard assumes the check is a real physical object.

AI can already generate check images convincing enough to pass mobile deposit review. Compliance security firm Secureframe disclosed a case from its own operations: two entirely fabricated paper checks were successfully deposited, even though the company had never lost a physical checkbook. The fraudsters used AI to generate check images realistic enough to pass the bank’s Positive Pay screening. Secureframe’s conclusion: AI does not need to produce perfect forgeries, just forgeries convincing enough to blend into the normal noise of operations.

The risk of duplicate deposits is even more immediate. In 2024, 65% of financial institutions reported check fraud through remote deposit channels, with related losses exceeding $400 million. A typical scenario involves a user first depositing a check via their phone app, then taking the same physical check to a teller window or ATM for a second deposit. Banks’ duplicate detection systems work reasonably well within the same institution and channel, but when fraudsters use different channels or different banks, detection difficulty increases significantly because most banks cannot view other banks’ deposit records in real time.
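A simplified sketch of why same-bank duplicates are catchable while cross-bank ones are not (the key fields and in-memory set are illustrative; real systems also use image hashing and clearinghouse data):

```python
# Toy sketch of same-institution duplicate detection. Field names and the
# in-memory set are illustrative; real systems add image hashing and more.
seen_deposits: set[tuple] = set()  # ONE bank's own deposit history

def accept_deposit(routing: str, account: str, check_no: str, amount_cents: int) -> bool:
    key = (routing, account, check_no, amount_cents)
    if key in seen_deposits:
        return False  # caught: same check presented twice at the same bank
    seen_deposits.add(key)
    return True

print(accept_deposit("021000021", "12345", "1001", 50000))  # True: first deposit
print(accept_deposit("021000021", "12345", "1001", 50000))  # False: duplicate

# The gap: a second presentment at a DIFFERENT bank consults a different
# institution's `seen_deposits` entirely, so nothing matches in real time.
```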

Mitek’s Check Fraud Defender includes a “check liveness” feature that can distinguish between a direct photo of a physical check and a photo of a screen displaying a check image, with 93.3% accuracy. But the mobile deposit channel has a fundamental weakness: security features that a teller could detect by physically touching the check, such as paper watermarks, thickness, and texture, are completely unverifiable in a phone photo.
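One family of cues behind screen-versus-paper discrimination can be illustrated with frequency analysis: re-photographing a display leaves periodic pixel-grid (moiré) energy in the image spectrum that paper texture lacks. The heuristic below, run on synthetic data, is an illustrative sketch and not Mitek’s actual technique:

```python
# Illustrative heuristic on synthetic data, not Mitek's actual method:
# a re-photographed screen adds a periodic pixel grid whose energy lands
# in the high spatial frequencies of the Fourier spectrum.
import numpy as np

def high_frequency_energy_ratio(gray: np.ndarray) -> float:
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    return float(spectrum[radius > min(h, w) / 4].sum() / spectrum.sum())

rng = np.random.default_rng(0)
paper = rng.normal(0.5, 0.02, (128, 128))                      # mild paper texture
screen = paper + 0.2 * np.sin(2 * np.pi * np.arange(128) / 3)  # 3-px display grid

print(high_frequency_energy_ratio(paper))   # small
print(high_frequency_energy_ratio(screen))  # noticeably larger
```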

Dark web check trading is also supplying raw materials for this type of fraud. Recorded Future’s research tracked 1.9 million stolen U.S. bank checks circulating across more than 700 Telegram channels in 2024, with stolen check images appearing on trading platforms within an average of 8 days of theft.

Voice Cloning: 3 Seconds of Audio Is Enough

AI’s impact on financial security extends to voice as well. The maturity of voice cloning technology has made identity confirmation over phone calls and video conferences equally unreliable.

In 2019, the CEO of a UK energy company received a call from someone claiming to be the CEO of its German parent company, demanding an urgent wire transfer within one hour. The CEO recognized the “boss’s” slight German accent and characteristic intonation and was convinced it was him speaking. He executed a transfer of 220,000 euros. The funds were subsequently moved from Hungary to Mexico and could not be recovered.

A January 2024 case in Hong Kong was an order of magnitude larger. A finance employee at the Hong Kong office of Arup, a global engineering consultancy, was invited to join a video conference where the CFO and several colleagues he recognized appeared on screen. Every one of them was a deepfake generated by AI in real time. Once convinced the participants were real, the employee executed 15 transfers totaling approximately $25.6 million in a single day.

A March 2025 case in Singapore demonstrated an evolution in attack strategy. The attackers proactively suggested a video conference to discuss a deal. The act of proactively offering video verification itself created a sense of trust. The company lost approximately $500,000 as a result.

Producing a convincing voice clone now requires approximately 3 seconds of sample audio.

What the Financial Industry Is Doing

The financial industry has begun deploying defenses across multiple layers, but each layer faces its own limitations.

3D Liveness Detection + Injection Attack Detection

The first line of defense against deepfakes is upgrading liveness detection. 3D liveness detection uses structured light or ToF sensors to read facial depth information, distinguishing a real 3D face from any flat-surface presentation at the physical level. Major KYC providers (FaceTec, Jumio, Identomat, and others) have all released ISO 30107-3 certified 3D liveness detection solutions. The combination of iBeta Level 2 certified 3D liveness detection and deepfake detection can achieve a false acceptance rate of one in a hundred million.

But 3D detection requires specific hardware. Among Android devices, only a handful of flagship models are equipped with structured light or ToF depth sensors. When a device does not support 3D detection, the system must fall back to 2D detection, and security decreases accordingly. This problem is particularly acute in emerging markets where Android’s market share exceeds 80%, which also happen to be regions with high rates of financial fraud.
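In practice this becomes a tiered policy decision at session start. The sketch below shows the shape of such a fallback; the tier names and step-up rule are assumptions for illustration, not any vendor’s actual logic:

```python
# Illustrative fallback policy; tier names and the step-up rule are
# assumptions, not any specific vendor's logic.
from enum import Enum

class LivenessTier(Enum):
    THREE_D = "3d_depth"              # structured light / ToF available
    TWO_D_PLUS_IAD = "2d_plus_injection_detection"
    TWO_D_ONLY = "2d_only"            # weakest: photo/deepfake exposure

def choose_tier(has_depth_sensor: bool, has_injection_detection: bool) -> LivenessTier:
    if has_depth_sensor:
        return LivenessTier.THREE_D
    if has_injection_detection:
        return LivenessTier.TWO_D_PLUS_IAD
    return LivenessTier.TWO_D_ONLY

def requires_step_up(tier: LivenessTier) -> bool:
    """Weaker tiers trigger extra checks (document re-scan, manual review)."""
    return tier is not LivenessTier.THREE_D

tier = choose_tier(has_depth_sensor=False, has_injection_detection=True)
print(tier.value, "-> step-up needed:", requires_step_up(tier))
```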

Injection attack detection is also becoming standard. These systems verify that input signals genuinely originate from a legitimate hardware camera, rather than from a virtual camera or other software injection. However, the current industry standard, ISO 30107-3, primarily targets physical presentation attacks (photos, masks) and has limited coverage of injection attacks.

Behavioral Biometrics: Not Who You Are, but How You Act

Behavioral biometrics focuses not on a user’s identity characteristics but on their operational habits. It continuously analyzes keystroke rhythm, mouse movement patterns, touchscreen pressure, device holding posture, and other signals to generate a dynamic risk score. Even if an attacker passes facial recognition, behavioral patterns that differ from the real account owner’s will still trigger anomaly alerts during account operation.
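As a toy illustration of one such signal, the sketch below scores keystroke cadence against an enrolled profile. The features, z-score comparison, and example timings are illustrative assumptions; production systems such as BioCatch fuse many signals with far richer models.

```python
# Toy sketch of one behavioral signal: keystroke cadence. The features
# and z-score comparison are illustrative, not any vendor's model.
import statistics

def rhythm_profile(intervals_ms: list[float]) -> tuple[float, float]:
    """Summarize a user's typing cadence as mean/stdev of inter-key gaps."""
    return statistics.mean(intervals_ms), statistics.stdev(intervals_ms)

def anomaly_score(profile: tuple[float, float], session_ms: list[float]) -> float:
    """Distance (in profile standard deviations) of this session's cadence."""
    mean, stdev = profile
    return abs(statistics.mean(session_ms) - mean) / stdev

enrolled = rhythm_profile([112, 98, 130, 105, 120, 101, 117])  # owner's history

print(anomaly_score(enrolled, [108, 115, 99, 122]))  # ~0.1: plausibly the owner
print(anomaly_score(enrolled, [45, 50, 42, 48]))     # ~5.8: a different typist
```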

A deepfake can forge a face, but it is far harder to simultaneously replicate the operational habits a person has accumulated over time.

BioCatch is the leading vendor in this space, deployed at over 50 financial institutions worldwide. One large bank identified approximately 1,000 “mule accounts” used to launder stolen funds within months of deployment; in 98% of those cases, behavioral analysis flagged the account before any existing system raised an alert.

The core value of behavioral biometrics lies in filling the post-authentication gap. Traditional biometrics makes a single determination at login and then stops, while behavioral analysis runs continuously throughout the entire session.

AI vs. AI: The Real-World Performance of Detection Tools

Detection of AI-generated content is another line of defense, but there is a significant gap between the performance vendors claim and actual effectiveness.

Commercial systems claim 96-98% accuracy in controlled environments. However, Mitek Systems’ testing showed that when external deepfake detection models are placed in real-world scenarios, false acceptance rates range from 60% to 90%, meaning most deepfake attacks are let through. Independent benchmarks show that the best commercial systems achieve 78% accuracy on video deepfakes and 89% on audio. The gap between laboratory accuracy and real-world performance can be 20 to 30 percentage points.
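The disconnect between “accuracy” and real-world protection follows from how the metrics compose: on a test set dominated by genuine samples, overall accuracy can stay high while most attacks still slip through. A minimal illustration with made-up numbers:

```python
# Made-up numbers illustrating how high accuracy and a high false
# acceptance rate (FAR) coexist when attacks are a minority class.
def false_acceptance_rate(labels: list[str], preds: list[str]) -> float:
    """Share of attack samples (labeled 'fake') classified as genuine."""
    attacks = [(l, p) for l, p in zip(labels, preds) if l == "fake"]
    return sum(p == "real" for _, p in attacks) / len(attacks)

# 90 genuine samples, all classified correctly; 10 attacks, 7 slip through.
labels = ["real"] * 90 + ["fake"] * 10
preds  = ["real"] * 90 + ["real"] * 7 + ["fake"] * 3

accuracy = sum(l == p for l, p in zip(labels, preds)) / len(labels)
print(f"accuracy: {accuracy:.0%}")                         # 93%
print(f"FAR: {false_acceptance_rate(labels, preds):.0%}")  # 70%
```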

This gap has specific technical causes. Compression artifacts introduced by video encoding can be confused with deepfake artifacts, leading to misclassification. Detection model generalization is also limited: models trained on GAN-generated deepfakes perform poorly on content generated by diffusion models. Each new generation of generative model requires detection models to be retrained on new data.

Google SynthID represents a different approach: embedding imperceptible digital watermarks at the moment AI content is generated. Google has watermarked over 10 billion pieces of content. However, SynthID only covers output from Google’s own models (Gemini, Imagen, etc.); content generated by ChatGPT, Stable Diffusion, and other tools carries no such watermark. Moreover, SynthID has already been partially reverse-engineered: using 200 solid black images and signal-processing analysis, one developer claimed roughly 90% accuracy in detecting the watermark and partial success in removing it.

Regulation Is Catching Up

Regulators around the world have begun to act. In November 2024, U.S. FinCEN issued its first deepfake financial fraud alert, listing 9 red flag indicators and requiring financial institutions to use dedicated keyword tags in suspicious activity reports. The U.S. Congress introduced H.R.1734, proposing to establish an AI security task force for financial services. Singapore’s MAS issued a deepfake-specific circular, requiring financial institutions to adopt layered detection technologies. The EU AI Act’s high-risk provisions will take full effect in mid-2026, with many identity verification systems classified as high-risk AI. China also published its first deepfake detection standard for the financial sector in September 2024, jointly developed by more than 10 institutions including ICBC and China Construction Bank.

But a time lag exists between regulatory progress and technological evolution. AI tools iterate on a monthly cycle, while regulatory frameworks typically require one to two years to adjust.

Offense vs. Defense: Who Is Ahead?

A key question is: between generation capability (offense) and detection capability (defense), which side holds the advantage?

Relying on human visual judgment is no longer viable. iProov’s 2025 study tested more than 8,000 participants and found that only 0.1% could correctly identify all deepfake content. Human identification accuracy for high-quality video deepfakes was just 24.5%. Sixty percent of participants believed they could tell real from fake, but that confidence was almost entirely unfounded. The German Federal Office for Information Security (BSI) put it more bluntly: “No human operator can be trained to identify real-time deepfakes.”

At the AI-versus-AI level, the defense currently maintains a narrow technical lead, but the advantage is shrinking. Broadcom/Symantec’s analysis notes that defenders have longer experience with AI applications and more mature model iteration infrastructure, but the attacker’s advantage lies in asymmetry: attackers only need to find one vulnerability, while defenders must secure every single link.

Gartner’s 2024 prediction serves as a useful reference point: by 2026, 30% of enterprises will consider standalone identity verification and authentication solutions no longer reliable. Identity verification itself will not be abandoned, but it must evolve from a single checkpoint to a multi-layered combination. The most likely equilibrium state is a shift from one-time verification to multi-signal continuous verification: a combination of 3D liveness detection, behavioral biometrics, AI content forensics, device fingerprinting, and transaction behavior analysis.

Looking at the loss data, the scale of this arms race is expanding rapidly. Deloitte forecasts that AI-driven financial fraud losses in the U.S. will grow from $12.3 billion in 2023 to $40 billion by 2027, a compound annual growth rate of 32%. The deepfake fraud detection market is expected to expand at a 42% annual growth rate, from $5.5 billion in 2023 to $15.7 billion in 2026. The defense market is growing faster in percentage terms, but the offense is still adding far more in absolute dollars: projected losses rise by nearly $28 billion over that window, versus roughly $10 billion in new detection spending.
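A quick arithmetic check of the detection-market figure, using only the values quoted above over the three years from 2023 to 2026:

\[
\left(\frac{15.7}{5.5}\right)^{1/3} \approx 1.42 \quad\Rightarrow\quad \mathrm{CAGR} \approx 42\%\ \text{per year}
\]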

What Ordinary People Can Do

The FBI and the American Bankers Association jointly released a deepfake prevention guide in September 2025. The most practical recommendations include the following.

When you receive an urgent request for a wire transfer, pause. Regardless of whether the request comes via phone, video, or email, independently contact the other party through a channel you already know (such as a phone number you have saved) to confirm. If the Arup employee had called the CFO’s known number before executing the transfers, the $25.6 million loss would not have occurred.

Consider establishing a verification code word with family members and colleagues. During phone or video calls, the code word provides a quick way to confirm whether the other party is a real person. This method is simple but effective, because an attacker driving a deepfake cannot anticipate a prearranged code word.

Reduce the exposure of personal information on social media. Your publicly available photos, videos, and voice recordings can all be used to train deepfake models. Voice cloning, again, requires only about 3 seconds of audio.

Treat your habitual trust in visual confirmation with healthy skepticism. In the Singapore case, the attackers proactively offered a video conference, exploiting precisely the trust people place in video calls. In the AI era, seeing does not equal verifying.


Data sources and reference links are embedded throughout the text. Primary sources include FinCEN FIN-2024-Alert004, the World Economic Forum Cybercrime Atlas Report (January 2026), Group-IB threat intelligence, TransUnion Synthetic Identity Research (September 2025), the Sumsub Identity Fraud Report 2025, Deloitte Center for Financial Services forecasts, the FBI/ABA joint guide, and benchmark testing from multiple independent security research organizations.