AI Coding Tools: Data Policy and User Agreement Survey

Survey Date: March 9, 2026 | Methodology: 5 parallel librarian agent groups + cross-verification

This report systematically surveys the user agreements and data policies of 11 mainstream AI coding tools, focusing on whether user code is used for model training, data retention periods, the existence of “irrevocable perpetual license” clauses, and differences between enterprise and individual versions.

Key Findings

The most important conclusion of the survey can be summarized in one sentence: Free/individual versions almost always use your data to train models, while enterprise versions almost never do, but the word “almost” hides critical differences.

Specifically:

Tier 1 (Strongest Privacy Protection): Tongyi Lingma and Tabnine explicitly commit to not storing or training on user code across all tiers. Tabnine even supports completely air-gapped deployments, making it the only option for scenarios with extreme data security requirements.

Tier 2 (Secure Enterprise, Caution for Individual): Enterprise versions of GitHub Copilot, Windsurf, Google Gemini Code Assist, and Amazon Q Developer all provide zero data retention and no-training commitments, but their free/individual versions may use your data, with defaults that vary by tool. Gemini Code Assist’s free version is particularly noteworthy: it enables data collection by default and requires an active opt-out.

Tier 3 (Clear Risks): Volcengine Coding Plan, Zhipu AI, and Kimi/Moonshot all include “perpetual license” or similar clauses, and their opt-out mechanisms are either non-existent or have questionable execution. Volcengine’s terms are the most aggressive, explicitly stating that the “authorization period is perpetual” and “technically cannot be withdrawn.”

Special Cases: In an August 2025 consumer terms update, Anthropic extended the data retention period for Claude Code (individual accounts) from 30 days to 5 years (when users agree to training), sparking a strong backlash from the developer community. OpenAI’s consumer terms contain “perpetual, irrevocable” license clauses, though API/enterprise versions are not subject to this.


I. Overview Comparison

1.1 Is User Code Used for Model Training?

| Tool | Free/Individual | Enterprise/API |
| --- | --- | --- |
| Tongyi Lingma | ❌ No training, no storage | ❌ No training, VPC support |
| Tabnine | ❌ No training, zero retention | ❌ No training, air-gapped support |
| GitHub Copilot | ⚠️ Possible (requires user permission) | ❌ No training, zero IDE retention |
| Windsurf | ⚠️ Chat content may be used (opt-out available) | ❌ No training, zero retention by default |
| Google Gemini | ⚠️ Enabled by default for free version | ❌ No training for Standard/Enterprise |
| Amazon Q | ⚠️ Enabled by default for free version (opt-out available) | ❌ Pro automatically opts out |
| Anthropic Claude Code | ⚠️ Training on by default; opt-out available (Aug 2025 policy) | ❌ No training for API/Enterprise |
| OpenAI Codex | ⚠️ Training possible by default (opt-out available) | ❌ No training for API/Enterprise |
| Cursor | ⚠️ Possible training when Privacy Mode is off | ⚠️ No training with Business Plan + Privacy Mode |
| Volcengine Trae | ✅ Explicitly used for training | ⚠️ Enterprise supports VPC zero storage |
| Zhipu GLM/CodeGeeX | ✅ Explicitly used for training | ⚠️ Same base policy; no independent enterprise terms |
| Kimi/Moonshot | ✅ Explicitly used for training | ⚠️ Negotiable DPA, but default same as individual |

1.2 Data Retention Period

| Tool | Individual | Enterprise/API |
| --- | --- | --- |
| Tabnine | Zero retention (deleted immediately after processing) | Zero retention |
| Tongyi Lingma | Code context not stored | Not stored; AES-256 encrypted in transit |
| GitHub Copilot | Prompts retained for 28 days | Zero IDE retention; 28 days for CLI |
| Windsurf | Depends on settings | Zero retention (immediate deletion) |
| Anthropic Claude Code | 5 years if training allowed; 30 days if refused | API 7–30 days; ZDR optional |
| OpenAI Codex | Not specified | 30 days (ZDR optional) |
| Amazon Q | Free version may retain | Pro does not collect |
| Google Gemini | Not specified | Stateless architecture, no storage |
| Cursor | Zero retention in Privacy Mode | Business zero retention |
| Volcengine | Perpetual | Enterprise zero cloud storage |
| Zhipu AI | Retained after agreement termination (anonymized) | Same as individual |
| Kimi | No explicit period | Negotiable |

1.3 “Irrevocable Perpetual License” Clauses

This is the most noteworthy dimension of this survey. A “perpetual license” means that when using the service, the user grants the platform a permanent right to use their data, and even if the user later terminates the service, the authorized data usage cannot be withdrawn.

| Tool | Perpetual License Clause | Specific Content |
| --- | --- | --- |
| Volcengine | Explicitly exists, most aggressive | “Authorization period is perpetual,” “technically cannot be withdrawn”; service stops if authorization is terminated |
| Zhipu AI | Explicitly exists | “Perpetual, free license to use,” “right to sub-license to third parties” |
| OpenAI (Consumer) | Exists | “Perpetual, irrevocable license” for User Content (does not apply to API/Enterprise) |
| Kimi | ⚠️ May exist | Third-party analysis points to a “perpetual training data usage” framework |
| Windsurf | ⚠️ Feedback only | Perpetual license granted for user feedback; code is not subject to this |
| Anthropic | ❌ Not found | Consumer terms do not include perpetual license clauses |
| GitHub Copilot | ❌ Not found for Copilot specifically | General platform terms have license grants, not specific to Copilot |
| Tabnine | ❌ Does not exist | Explicitly “does not retain any code” |
| Tongyi Lingma | ❌ Does not exist | “Code information is entirely owned and controlled by you” |
| Google Gemini | ❌ Not found | — |
| Amazon Q | ❌ Not found | — |

1.4 IP Indemnity (Intellectual Property Infringement Protection)

IP indemnity means that if AI-generated code is accused of infringing third-party intellectual property rights, the service provider bears the cost of the user’s legal defense and any resulting damages.

| Tool | Individual | Enterprise |
| --- | --- | --- |
| Google Gemini | — | ✅ Available from Standard ($19/mo) |
| GitHub Copilot | — | ✅ Business/Enterprise |
| Amazon Q | — | ✅ Pro ($19/mo) |
| Windsurf | — | ✅ Enterprise |
| Tabnine | — | ✅ Enterprise |
| Anthropic | — | ✅ Commercial/API customers |
| OpenAI | — | ✅ API/Enterprise |
| Cursor | Not specified | Not specified |
| Volcengine | Not specified | Not specified |
| Zhipu AI | ❌ “Handle on your own” | Not specified |
| Kimi | Not specified | Not specified |

1.5 Opt-out Mechanism Details

Whether one can easily opt out of data training is a key indicator of the actual friendliness of an AI coding tool’s data policy. Some tools collect data by default but provide a convenient opt-out switch; others write opt-out into their terms but make actual execution difficult; and some provide no opt-out option at all.

| Tool | Opt-out Availability | Operation Method | Actual Convenience | Remarks |
| --- | --- | --- | --- | --- |
| Tongyi Lingma | 🟢 No opt-out needed | N/A | ★★★★★ | Never trains on or stores code; no action required |
| Tabnine | 🟢 No opt-out needed | N/A | ★★★★★ | Zero retention and zero training across all tiers; prevented at the architectural level |
| GitHub Copilot | 🟢 One-click in settings | Settings → uncheck “Allow GitHub to use my code snippets for product improvements” | ★★★★☆ | Self-service toggle for Individual; Business/Enterprise does not train by default |
| Windsurf | 🟢 Toggle in settings | User Settings → code sharing options | ★★★★☆ | Individuals can self-service opt out of chat training; Teams/Enterprise zero retention by default |
| Anthropic Claude Code | 🟢 Toggle in settings | Settings → Privacy → turn off “Help improve Claude” | ★★★☆☆ | Self-service toggle, but on by default after the Aug 2025 policy change; UI criticized as nudging users toward consent; logging in with an API key bypasses this entirely |
| OpenAI Codex | 🟢 Toggle in settings | Settings → Data controls → opt out (Instructions) | ★★★☆☆ | Self-service opt-out, but consumer terms still retain the perpetual license clause; API bypasses this |
| Amazon Q | 🟡 Requires per-environment setup | Console: AWS Organizations AI services opt-out policy; IDE: set per IDE; CLI: qct configure | ★★★☆☆ | Free version requires manual opt-out in each environment, easy to miss; Pro opts out automatically |
| Cursor | 🟡 Manual Privacy Mode activation | Settings → turn on Privacy Mode | ★★★☆☆ | Self-service toggle, but Privacy Mode has Legacy/New versions without clear labeling; Business Plan forces it on by default |
| Google Gemini | 🟡 Free version requires opt-out | Specific method not detailed in documentation | ★★☆☆☆ | Free version enables data collection by default and the opt-out path is unclear; Standard/Enterprise requires no action |
| Kimi | 🔴 Nominal opt-out, difficult in practice | Contact membership@moonshot.ai to apply | ★☆☆☆☆ | Official ToS mentions opt-out, but community reports say customer service requires account deletion; already-trained data is irreversible |
| Zhipu AI | 🔴 No opt-out mechanism | None | ☆☆☆☆☆ | No opt-out option found in the user agreement; perpetual license plus third-party sub-licensing |
| Volcengine | 🔴 No substantive opt-out | Contact customer service to “terminate authorization,” but termination stops the service and used data “technically cannot be withdrawn” | ☆☆☆☆☆ | A formal termination path exists, but at the cost of losing the service, and existing data cannot be deleted; not an effective opt-out |

In summary, opt-out friendliness is roughly divided into four tiers: Tier 1 requires no opt-out (Tongyi Lingma, Tabnine) as they don’t collect or train by design; Tier 2 provides convenient self-service opt-out switches (GitHub Copilot, Windsurf, Anthropic, OpenAI); Tier 3 has opt-out but with inconvenient operations or risks of omission (Amazon Q, Cursor, Gemini); Tier 4 has opt-out in name only or not at all (Kimi, Zhipu, Volcengine).


II. Detailed Product Analysis

2.1 Volcengine Coding Plan / Trae (ByteDance)

Volcengine’s data authorization agreement has the most aggressive terms in this survey.

Original Core Terms (Specific Terms for Doubao Assistant Zone Service):

3.1 Purpose of Authorization: Authorization under these rules will be used for the purposes of developing machine learning, artificial intelligence-related technologies, and the optimization, development, and use of Doubao Assistant Zone services.

3.2 Scope of Authorization: You agree to grant Volcengine a non-exclusive, non-transferable, non-sublicensable (but sublicensable to Volcengine affiliates and third-party outsourcing service providers to achieve the purpose of authorization), free right to allow Volcengine to transmit, store, use, copy, download, modify, or otherwise process customer data to achieve the purpose of authorization.

3.3 Authorization Period: The authorization period is perpetual. … You further fully acknowledge, understand, and agree that even if you complete the termination operation, due to the special nature of machine learning and artificial intelligence technology, once you authorize Volcengine to use relevant customer data and the relevant customer data has been used, the use of that part of the customer data will be technically irreversible. … If you terminate the authorization, you will be unable to continue using this service.

This agreement applies to multiple product lines including Coding Plan, Doubao Assistant, PromptPilot, and Global AI Search. The “irrevocable perpetual license” discussed in the community is entirely true, and Volcengine is the only vendor among those surveyed to explicitly state “technically irreversible.”

Trae IDE Additional Controversy: Independent security researchers found that even when users turn off telemetry, Trae IDE still initiated about 500 network requests within 7 minutes, uploading a total of 26MB of data (Source). The official response stated that only VS Code-related telemetry was turned off, and Trae’s own telemetry is not controlled by that setting. A research project on GitHub (segmentationf4u1t/trae_telemetry_research) documented the detailed scope of data collection, including system information, device ID, usage data, performance metrics, location and region, and workspace information.

Enterprise Differences: Trae CN Enterprise Edition supports VPC deployment, full-link code encrypted transmission, and zero cloud storage. In other words, the enterprise version can avoid the aforementioned data risks, but at an additional cost.

2.2 Zhipu AI (GLM / CodeGeeX)

Original Core Terms (User Agreement):

You grant Zhipu and its affiliates a non-exclusive, geographically unrestricted, perpetual, free license to use (including storing, using, copying, revising, editing, publishing, displaying, translating, distributing the above information or creating derivative works, incorporating the above information into other works in forms, media, or technologies known now or developed in the future, etc.) and the right to sub-license to third parties, as well as the right to collect evidence and file lawsuits against third-party infringements in its own name.

Another clause regarding model training:

To improve the quality of the products and services we provide to you, we may use data generated during your use of the large model platform or models within the platform to locate, maintain, and optimize our products and services, unless otherwise agreed between you and Zhipu.

Zhipu’s perpetual license terms are similar to Volcengine’s, but with one notable difference: Zhipu’s terms include the “right to sub-license to third parties,” meaning Zhipu has the right to authorize your data to any third party. No opt-out mechanism was found in the agreement. Regarding differences between Chinese and international versions, the survey only found the agreement for the Chinese version (bigmodel.cn) and did not find an independent international user agreement. CodeGeeX, as Zhipu’s coding assistant, follows the platform’s unified agreement and has no independent data policy.

2.3 Kimi / Moonshot (Moonshot AI)

Original Core Terms (Open Platform Service Agreement):

To continuously improve the service quality of Kimi Intelligent Assistant, Kimi Intelligent Assistant may use the content you input into Kimi Intelligent Assistant and the content Kimi Intelligent Assistant outputs to you for further development and training. You fully understand and accept such use and will not claim rights against Kimi Intelligent Assistant or claim that Kimi Intelligent Assistant infringes your rights due to such use.

Opt-out Mechanism: The official Terms of Service state that one can contact membership@moonshot.ai to opt out of training. However, community feedback suggests execution is questionable. A user on Reddit reported that customer service replied “account deletion is required to opt out” (Source). In Hugging Face community discussions, users also pointed out that “even the paid API uses your data” (Source).

Data Sovereignty Issues: As a company registered in China, Moonshot AI must comply with the Cybersecurity Law, Data Security Law, and Personal Information Protection Law, and government agencies can retrieve user data according to the law. This is an additional compliance consideration for overseas users.

2.4 Tongyi Lingma (Alibaba)

Tongyi Lingma has the most user-friendly data policy among Chinese vendors.

Original Core Terms (Privacy Policy):

2.2.3 The code information you upload and generate based on this basic function or service is entirely owned and controlled by you. Except for the usage scenarios and purposes listed in this agreement, we will not store it, nor will we use it for any other scenarios without your authorization, including not using it for model training.

Official FAQ (Source) further clarifies:

During code completion, context information will not be stored or used for any other purpose. During R&D intelligent Q&A, only after you click dislike/like, and only for chat records (not including code), will data be used for algorithm upgrades and iterations after de-identification and anonymization.

The Alibaba Cloud Model Studio also explicitly declares that “your data will never be used for model training,” and transmitted data is AES-256 encrypted (Source). The Enterprise Exclusive Edition supports VPC private deployment.

2.5 Anthropic Claude Code

The August 28, 2025 consumer terms update is one of the most controversial policy changes in this survey.

Core Changes (Official Announcement):

We will train new models using data from Free, Pro, and Max accounts when this setting is on (including when you use Claude Code from these accounts).

We are also extending data retention to five years, if you allow us to use your data for model training.

Key points: data from Free/Pro/Max accounts (including Claude Code sessions under these accounts) is used for training by default, on an opt-out basis; for users who agree to training, the retention period is extended from 30 days to 5 years; users were required to make a choice by October 8, 2025, or they would be unable to continue using the service.

API and Enterprise Versions Unaffected: Claude for Work, API usage, Amazon Bedrock, and Google Vertex covered by Commercial Terms are not subject to this policy. The API data retention period was shortened from 30 days to 7 days (starting September 15, 2025). Enterprise customers can apply for Zero Data Retention (ZDR).

Community Reaction: A post on Reddit r/ClaudeAI titled “Anthropic’s New Privacy Policy is Systematically Screwing Over Solo Developers” (Source) criticized this for creating a “two-tier system” where independent developers’ code becomes free training data for competitors. TechCrunch also noted that Anthropic’s UI design might be nudging users to agree to data sharing (Source).

Practical Advice: When using Claude Code, logging in with an API Key protects you under API terms (no training, 7-day retention); logging in with a personal account subjects you to consumer terms.
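The two login paths can be distinguished at the command line. A minimal sketch, assuming the `claude` CLI picks up the `ANTHROPIC_API_KEY` environment variable when present; the key value is a placeholder:

```shell
# Launch Claude Code under API (Commercial) terms: no training, 7-day retention.
# The key below is a placeholder; substitute a key from the Anthropic Console.
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
claude

# Without the variable set, `claude` falls back to a claude.ai account login,
# and a Free/Pro/Max account places the session under the consumer terms.
```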

2.6 OpenAI Codex

Consumer Terms (Terms of Use) contain a noteworthy authorization clause:

By uploading any User Content you hereby grant and will grant OpenAI and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any OpenAI-related purpose in any form, medium or technology now known or later developed.

This is a typical “perpetual irrevocable license” clause applicable to User Content uploaded by consumers. However, API/Business/Enterprise terms explicitly exclude training:

We will not use Customer Content to develop or improve the Services.

The data policy for Codex as a coding agent depends on the access method: access via ChatGPT Plus/Pro is subject to consumer terms, access via API is protected by API terms, and access via Enterprise is protected by enterprise terms. API data is retained for 30 days by default for abuse monitoring, and enterprise versions can apply for ZDR.

2.7 GitHub Copilot

Business/Enterprise Tiers (Official Explanation):

No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model.

Code completion and chat within the IDE adopt a zero-retention policy, with data deleted immediately after generation. CLI and Coding Agent retain data for 28 days.

Individual/Free Tiers: By default, data is not used for training, but users can choose to allow it in settings. The free version is more lenient regarding data usage.

Third-party Model Providers: GitHub Copilot uses Claude Sonnet via AWS Bedrock and Gemini via GCP, both of which have zero-retention agreements. Users can choose to disable specific third-party models.

Copyright Litigation: The Doe v. GitHub case is still ongoing, with oral arguments held in the Ninth Circuit Court of Appeals in February 2026. The core controversy is whether the similarity between AI-generated code and open-source code constitutes copyright infringement. The outcome of this case will affect the entire AI coding tool industry.

2.8 Cursor

Privacy Mode is key. Cursor has two privacy modes:

Privacy Mode (Legacy) provides the strongest protection: zero data retention, and code is never stored or used for training.

Privacy Mode (New) relaxes this slightly: zero retention for third-party model providers, but Cursor itself may store some code data to provide additional features (such as remote indexing and memory), though it is still not used for training.

When Privacy Mode is off, Cursor may use code data to improve AI features and train models.

Business Plan: Privacy Mode is forced on by default. OpenAI and Anthropic do not retain Business user data.

Community Controversy: Privacy discussions around Cursor are quite intense. Main focuses include: lack of transparency in privacy mode changes (Legacy vs New), embeddings generated by code indexing being stored on Cursor servers without full user control, and the possibility of .cursorignore files being bypassed by AI agents. A LinkedIn post warned enterprises to “stay away from Cursor,” claiming it sends sensitive files like .env to external servers (Source).
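On the `.cursorignore` point, declaring sensitive paths is still worth doing as one defensive layer, even though the reports above suggest it may be bypassed. A minimal sketch; the patterns are illustrative and follow `.gitignore` syntax:

```shell
# Create a minimal .cursorignore; patterns use .gitignore syntax.
# Per the community reports above, treat this as defense in depth,
# not a guarantee that agents never read these files.
cat > .cursorignore <<'EOF'
.env
.env.*
secrets/
**/*.pem
EOF
```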

2.9 Windsurf (formerly Codeium)

Teams/Enterprise (ToS):

Customer Data is not used for any other purpose, including the training of language models. Customer Data is encrypted during transit and is not stored at rest.

Zero data retention is enabled by default. Code ownership clearly belongs to the user:

Exafunction agrees that you own all Suggestions. Exafunction hereby assigns to you all of its right, title, and interest in and to any Suggestions.

Individual/Pro (ToS): Chat content may be used for model improvement, with an opt-out available in settings. Code completion is not used for training.

Windsurf has a zero-data-retention agreement with OpenAI, and enterprise admins can disable OpenAI models. Self-hosting and hybrid deployment options are available.

2.10 Tabnine

Tabnine provides the most thorough privacy protection among all surveyed (Privacy Documentation):

When using Tabnine models, your code remains private. Tabnine NEVER retains or shares any of your code with third parties. Tabnine has a no-train-no-retain policy. This is in place regardless which model is being used.

Tabnine doesn’t use third-party APIs or models to deliver our service. Instead, we’ve developed proprietary models based on our own deep experience in generative AI.

Zero data retention for all tiers (including the free version). No third-party models or APIs are used. Models are trained only on open-source licensed code. Supports completely air-gapped deployment with zero telemetry data leakage. SOC 2 Type 2, ISO 27001, GDPR, and HIPAA certified.

The trade-off is that model capabilities may not be as strong as competitors using foundation models from major tech companies.

2.11 Google Gemini Code Assist and Amazon Q Developer

Gemini Code Assist (Data Governance): Standard and Enterprise versions do not use prompts or responses to train models. Caution is needed for the free version, which DevClass reported may use data to improve models by default. Enterprise provides stateless architecture, HIPAA BAA, and FedRAMP High authorization. IP indemnity is provided starting from the Standard version ($19/mo), the lowest threshold among all tools.

Amazon Q Developer (FAQs): The Pro version automatically opts out of data collection and model training and provides IP indemnity. The Free version collects data by default and requires manual opt-out, with each environment (Console, IDE, CLI) needing separate setup.
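The Console-side step from the opt-out comparison can be sketched as an AWS Organizations policy. This assumes the AWS CLI with management-account permissions and that the AI services opt-out policy type is enabled for the organization; the policy name and description below are arbitrary:

```shell
# Org-wide opt-out from AI service data use (covers Amazon Q content collection).
cat > ai-opt-out.json <<'EOF'
{
  "services": {
    "default": {
      "opt_out_policy": {
        "@@assign": "optOut"
      }
    }
  }
}
EOF

aws organizations create-policy \
  --name ai-services-opt-out \
  --type AISERVICES_OPT_OUT_POLICY \
  --description "Opt out of AI service data use org-wide" \
  --content file://ai-opt-out.json
```

The IDE and CLI environments still need their own settings; the Organizations policy covers only the AWS-side services.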


III. Cross-verification and Contradictory Findings

3.1 High-Confidence Conclusions from Multiple Sources

The following conclusions have been cross-verified across multiple independent sources:

  1. Volcengine’s “perpetual license + technically irreversible” clause indeed exists. The same or nearly identical wording appears in independent agreements for multiple product lines including Doubao Assistant, PromptPilot, and Global AI Search.
  2. Tongyi Lingma indeed does not store code data. Official privacy policies, FAQs, and Alibaba Cloud platform descriptions from three independent sources consistently confirm this.
  3. Anthropic indeed extended the consumer data retention period to 5 years in August 2025. Official announcements, TechCrunch, and Reddit discussions all confirm this.
  4. Tabnine indeed implements zero retention across all tiers. Official documentation, third-party reviews, and competitor comparisons all confirm this.

3.2 Contradictions or Ambiguities to Note

  1. Kimi’s opt-out execution issues: The official Terms of Service explicitly mention an opt-out mechanism, but community users report being required to delete their accounts during actual execution. This constitutes a contradiction between policy commitment and actual execution and requires continued attention.
  2. Cursor’s Privacy Mode changes: From Legacy to New versions, Cursor added exceptions for “possibly storing some code data to provide additional features.” The community has questioned the transparency of this change.
  3. Zhipu AI’s “right to sub-license to third parties”: The terms grant Zhipu the right to sub-license data to third parties, yet the privacy policy claims “no unauthorized use or disclosure will be conducted.” There is tension between these two clauses.
  4. GitHub Copilot Free vs Individual data usage differences: Documentation for the Free tier’s data policy is not as clear as for Business/Enterprise, requiring users to carefully read settings options.

3.3 Single-Source Information (Cite with Caution)

The following conclusions come from only a single source:


IV. Recommendations for Different Roles

4.1 Independent Developers

If you use a personal account for side projects, the safest choices are Tabnine (zero retention across all tiers) or Tongyi Lingma (explicitly no storage or training). If you prefer the model capabilities of Claude or GPT, be sure to use an API Key instead of a personal account to log in to Claude Code / Codex, as this protects you under API terms rather than consumer terms. Personal versions of Volcengine, Zhipu, and Kimi have higher risks in terms of data protection.

4.2 Enterprise Users

Enterprise versions of almost all mainstream tools provide sufficient data protection. Focus on the following dimensions when choosing:

4.3 Precautions for Using Chinese Tools

Chinese tools generally face two structural issues: first, the compliance of “perpetual license” clauses in user agreements under the Chinese legal framework is questionable (the Personal Information Protection Law requires data processing to have clear, reasonable purposes and periods); second, all Chinese platforms must cooperate with government data retrieval according to the law.

If you must use Chinese tools to process sensitive code, prioritize enterprise versions that support VPC/private deployment to avoid uploading core business logic or user data through personal versions.


V. GDPR and Data Compliance Analysis

5.1 GDPR Applicability

The applicability of GDPR (General Data Protection Regulation) is based on two triggering conditions (Article 3): having an establishment in the EU that processes data, or not being in the EU but offering goods/services to EU residents or monitoring their behavior. For the tools surveyed in this report, the situation is as follows:

Tools explicitly subject to GDPR: GitHub Copilot, Anthropic Claude, OpenAI Codex, Cursor, Windsurf, Tabnine, Google Gemini Code Assist, Amazon Q Developer. These tools provide services to global users, have a large number of users in the EU, and most have declared GDPR compliance (e.g., Tabnine is GDPR certified, GitHub has an EU Data Protection Agreement).

Tools that may not be directly subject to GDPR: Volcengine Trae (mainly targeting the Chinese market, with “CN” in the product name), Zhipu AI (mainly serving Chinese users via bigmodel.cn), Kimi/Moonshot (mainly in the Chinese market despite having international users), Tongyi Lingma (mainly serving Alibaba Cloud China region customers). However, if EU residents use these services, GDPR may still be triggered.

5.2 Compliance of Tools with Core GDPR Clauses

The most relevant GDPR clauses for AI coding tool data policies include:

Article 7(3) Right to Withdraw Consent: Data subjects have the right to withdraw consent at any time, and it shall be as easy to withdraw as to give consent.

Article 7(4) Freedom of Consent: When assessing whether consent is “freely given,” utmost account shall be taken of whether the performance of a contract is conditional on consent to processing personal data that is not necessary for that performance (i.e., prohibition of bundling).

Article 17 Right to be Forgotten: Users have the right to request the deletion of personal data.

Article 5(1)(e) Storage Limitation Principle: Personal data shall be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed.

| Tool | Art. 7(3) Withdrawal | Art. 7(4) Unbundled | Art. 17 Deletion | Art. 5(1)(e) Storage Limit | Overall Assessment |
| --- | --- | --- | --- | --- | --- |
| Tongyi Lingma | ✅ No withdrawal needed | ✅ No collection | ✅ No storage | ✅ No retention | 🟢 Compliant |
| Tabnine | ✅ No withdrawal needed | ✅ No collection | ✅ Zero retention | ✅ Zero retention | 🟢 Compliant (certified) |
| GitHub Copilot | ✅ Withdrawable | ✅ Unbundled | ✅ Deletable | ⚠️ 28-day retention | 🟢 Generally compliant |
| Windsurf | ✅ Withdrawable | ✅ Unbundled | ✅ Deletable | ✅ Enterprise zero retention | 🟢 Generally compliant |
| Amazon Q | ✅ Withdrawable | ✅ Unbundled | ✅ Deletable | ⚠️ Free version may retain | 🟢 Generally compliant |
| Google Gemini | ⚠️ Free opt-out unclear | ⚠️ Free collects by default | ✅ Deletable | ✅ Paid versions stateless | 🟡 Free version questionable |
| Anthropic | ⚠️ Withdrawable but UI nudges | ⚠️ Choice required to continue | ⚠️ 5-year retention is long | ⚠️ 5-year retention | 🟡 Borderline operations |
| OpenAI | ✅ Withdrawable | ⚠️ Consumer perpetual license | ⚠️ Perpetual license vs. deletion right | ⚠️ Period not specified | 🟡 Consumer terms questionable |
| Cursor | ✅ Withdrawable | ✅ Unbundled | ⚠️ Embeddings storage | ⚠️ Period not specified | 🟡 Ambiguous areas exist |
| Volcengine | ❌ “Technically irreversible” | ❌ Termination = service stops | ❌ Perpetual retention | ❌ Perpetual | 🔴 Clear conflict |
| Zhipu AI | ❌ No opt-out | ❌ Use = authorization | ⚠️ Retained after anonymization | ❌ Perpetual license | 🔴 Clear conflict |
| Kimi | ⚠️ Nominal withdrawal, difficult | ⚠️ Use = consent to training | ⚠️ Trained data irreversible | ⚠️ No explicit period | 🔴 Multiple conflicts |

5.3 GDPR Conflict Analysis of Volcengine Terms

Volcengine has the most obvious conflicts with GDPR among those surveyed. Specifically:

Art.7(3) Right to Withdraw Consent: GDPR explicitly states that “the data subject shall have the right to withdraw his or her consent at any time,” and it shall be as easy to withdraw as to give consent. Volcengine’s “technically irreversible” directly contradicts this. Although Volcengine provides a formal path to “contact customer service to terminate authorization,” the attached condition is that “termination of authorization means stopping the service,” which constitutes a penalty for exercising the right to withdraw.

Art.7(4) Unbundled Consent Principle: Recital 43 of GDPR points out that when the performance of a contract is conditional on consent to processing data not necessary for the contract, the consent should not be considered freely given. Volcengine bundling data training authorization with service use (no consent = no use) is exactly the bundling mode GDPR seeks to prohibit.

Art.17 Right to be Forgotten: Users have the right to request the deletion of personal data. “Data already used for training is technically irreversible” means the platform cannot fulfill its deletion obligation. It is worth noting that “technically impossible” is not a compliance defense under the GDPR framework. The logic of GDPR is: if you cannot satisfy the deletion obligation, you should not have processed the data in this way in the first place. This is also the legal logic behind Tabnine’s choice of the “no retention at all” route, bypassing the deletion obligation at the architectural level.

Art.5(1)(e) Storage Limitation Principle: “Perpetual” retention directly contradicts the storage limitation principle, which caps retention at what is necessary for the processing purpose.

5.4 Why These Terms Can Exist in Practice

After understanding the legal conflicts, a natural question is: why can these terms exist? There are several levels of reasons.

Jurisdictional Level: The primary markets for Volcengine, Zhipu, and Kimi are in China, and they likely do not consider themselves subject to GDPR. The reason aggressive data terms can be written into agreements is precisely because these tools do not intend to accept GDPR constraints.

Chinese Legal Level: China’s Personal Information Protection Law (PIPL) has similar requirements to GDPR in its text. Article 15 of PIPL stipulates that individuals have the right to withdraw consent, and personal information processors should provide convenient ways to withdraw; Article 47 stipulates that individuals have the right to request the deletion of personal information. There is also tension between Volcengine’s terms and PIPL. However, PIPL has exemption clauses for “anonymized data,” and both Volcengine and Zhipu’s agreements mention anonymization, which may be their compliance argument under the domestic legal framework. More importantly, there is a substantive difference in enforcement intensity between PIPL and GDPR.

Industry Level: The irreversibility of AI training data is a technical fact faced by the entire industry, and Anthropic’s August 2025 FAQ also hinted at similar logic. The difference is that vendors subject to GDPR (such as Anthropic, OpenAI) chose to provide opt-out mechanisms and set limited retention periods to address this issue; while Chinese vendors not directly subject to GDPR chose to explicitly state “perpetual” and “irreversible” in their terms.

5.5 Actual Impact on Users

For users in the EU or concerned about GDPR compliance, the following is recommended:

Choose tools that are GDPR certified or have explicitly declared compliance (Tabnine, GitHub Copilot Enterprise, Windsurf Enterprise). When using tools from international vendors, confirm that opt-out is in effect. Avoid using personal versions of Chinese tools without confirming data protection terms. In enterprise scenarios, if Chinese tools must be used, choose enterprise versions that support VPC/private deployment and sign a Data Processing Agreement (DPA) containing GDPR-equivalent protection clauses.

It should be particularly pointed out that although Anthropic’s 5-year data retention period is controversial under the GDPR framework (storage limitation principle), it still retains the user’s right to withdraw consent and toggle settings, and EU data subjects can independently assert rights based on GDPR. This is fundamentally different from Volcengine’s triple restriction of “perpetual + irreversible + bundled service.”


VI. Summary of Information Sources

Official Policy Documents

Community Discussion and Third-Party Analysis