PII Protection

NeuronEdge uses a dual-engine protection system combining 102 optimized regex patterns with NER-based machine learning to detect and redact 105+ types of personally identifiable information.

How Protection Works

1. Regex Engine (Primary)

102 optimized regex patterns run in Rust/WASM for sub-millisecond detection. Patterns are designed for high precision to minimize false positives while catching common PII formats like SSNs, credit cards, emails, and phone numbers.
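To illustrate the precision-first approach, here is a minimal Python sketch with four simplified, hypothetical patterns. These are not NeuronEdge's actual patterns, which run in Rust/WASM and are not listed on this page. The credit card pattern adds a Luhn checksum so that arbitrary digit runs of card length are not flagged:

```python
import re

# Illustrative patterns only (hypothetical): the production engine's
# 102 patterns are not published on this page.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right, subtracting 9 when it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_regex(text: str) -> list[tuple[str, int, int]]:
    """Return (entity_type, start, end) character spans, sorted by start."""
    spans = []
    for entity, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Extra validation trades a little recall for higher precision.
            if entity == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue
            spans.append((entity, m.start(), m.end()))
    return sorted(spans, key=lambda s: s[1])
```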

2. NER Engine (Supplementary)

Named Entity Recognition using Workers AI (Llama 3.2) detects contextual entities like names, organizations, and locations that don't follow fixed patterns. This catches PII that regex alone would miss.

3. Deduplication

Results from both engines are merged and deduplicated so that each PII instance is redacted exactly once, even when both engines flag the same text. This prevents double-replacement issues.
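The exact merge policy is not documented here. As a minimal sketch, assuming each detection is a character-offset span and that on overlap the longer span wins (with regex preferred over NER for identical spans), the deduplication step could look like this. Span and merge_spans are hypothetical names:

```python
from typing import NamedTuple


class Span(NamedTuple):
    entity: str   # e.g. "SSN", "PERSON"
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    source: str   # "regex" or "ner"


def merge_spans(regex_spans: list[Span], ner_spans: list[Span]) -> list[Span]:
    """Merge both engines' detections into non-overlapping spans.

    Assumed policy: when spans overlap, keep the longer one. On a length tie
    the earlier span wins, and at the same start the regex result wins.
    """
    candidates = sorted(
        regex_spans + ner_spans,
        # Earlier start first; at the same start, longer first, then regex first.
        key=lambda s: (s.start, -(s.end - s.start), s.source != "regex"),
    )
    merged: list[Span] = []
    for span in candidates:
        if merged and span.start < merged[-1].end:
            # Overlaps the last kept span: replace it only if strictly longer.
            if span.end - span.start > merged[-1].end - merged[-1].start:
                merged[-1] = span
            continue
        merged.append(span)
    return merged
```

Because the output spans never overlap, each piece of text is replaced at most once when the redaction is applied.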

Entity Categories

NeuronEdge supports 105+ entity types organized into 8 categories:

Identity

Personal identifiers and government-issued IDs

PERSON, SSN, PASSPORT, DRIVERS_LICENSE, DOB, NATIONAL_ID, TAX_ID

Contact

Contact information and addresses

EMAIL, PHONE, ADDRESS, ZIP_CODE, PO_BOX

Financial

Financial account numbers and payment data

CREDIT_CARD, BANK_ACCOUNT, IBAN, SWIFT, ROUTING_NUMBER, CVV

Medical

Protected health information (PHI)

MEDICAL_RECORD, HEALTH_PLAN, NPI, DEA_NUMBER, DIAGNOSIS, PRESCRIPTION

Location

Geographic and network location data

ADDRESS, COORDINATES, IP_ADDRESS, MAC_ADDRESS, GEOLOCATION

Technical

Secrets, credentials, and API keys

API_KEY, PASSWORD, AWS_KEY, GITHUB_TOKEN, JWT, PRIVATE_KEY

Organization

Organization names and business identifiers

ORG, GPE, COMPANY_ID, EIN, DUNS

Compliance

Regulatory-specific identifiers

GDPR_ID, CCPA_ID, HIPAA_ID
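The curl example under Configuring Protection passes individual entity types as a comma-separated list in X-NeuronEdge-Entities. This page does not say whether category names are accepted in that header, so a client that wants whole categories can expand them itself. A minimal Python sketch, where CATEGORIES and entities_header are hypothetical names and the mapping holds only the example types listed on this page, not the full catalog:

```python
# Example entity types copied from the category lists on this page. The full
# catalog has 105+ types, so this mapping is intentionally incomplete.
CATEGORIES: dict[str, list[str]] = {
    "identity": ["PERSON", "SSN", "PASSPORT", "DRIVERS_LICENSE", "DOB", "NATIONAL_ID", "TAX_ID"],
    "contact": ["EMAIL", "PHONE", "ADDRESS", "ZIP_CODE", "PO_BOX"],
    "financial": ["CREDIT_CARD", "BANK_ACCOUNT", "IBAN", "SWIFT", "ROUTING_NUMBER", "CVV"],
    "medical": ["MEDICAL_RECORD", "HEALTH_PLAN", "NPI", "DEA_NUMBER", "DIAGNOSIS", "PRESCRIPTION"],
    "location": ["ADDRESS", "COORDINATES", "IP_ADDRESS", "MAC_ADDRESS", "GEOLOCATION"],
    "technical": ["API_KEY", "PASSWORD", "AWS_KEY", "GITHUB_TOKEN", "JWT", "PRIVATE_KEY"],
    "organization": ["ORG", "GPE", "COMPANY_ID", "EIN", "DUNS"],
    "compliance": ["GDPR_ID", "CCPA_ID", "HIPAA_ID"],
}


def entities_header(*categories: str) -> str:
    """Expand category names into a comma-separated entity list, without duplicates."""
    selected: list[str] = []
    for name in categories:
        for entity in CATEGORIES[name.lower()]:
            # ADDRESS is listed under both Contact and Location; include it once.
            if entity not in selected:
                selected.append(entity)
    return ",".join(selected)
```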

Redaction Formats

Choose how detected PII is replaced before sending to the LLM provider:

Token Format

Available on all tiers.

Replaces PII with type-based placeholders. Simple, readable, and reversible. The LLM sees the entity type but not the value.
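The round trip behind this format can be sketched in Python. This is a minimal illustration, not NeuronEdge's implementation: it assumes detection has already produced (entity, start, end) character spans, and redact_tokens and restore_tokens are hypothetical names. It also assumes each placeholder stands for a single value; how the service tells apart two different names that would both become [PERSON] is not described on this page.

```python
def redact_tokens(text: str, spans: list[tuple[str, int, int]]) -> tuple[str, dict[str, str]]:
    """Replace each (entity, start, end) span with "[ENTITY]".

    Returns the redacted text plus a vault mapping each placeholder to its
    original value. The caller keeps the vault; only the text goes to the LLM.
    """
    parts: list[str] = []
    vault: dict[str, str] = {}
    cursor = 0
    for entity, start, end in sorted(spans, key=lambda s: s[1]):
        token = f"[{entity}]"
        # Limitation of this sketch: a second, different value of the same type
        # would overwrite the first vault entry.
        vault[token] = text[start:end]
        parts.append(text[cursor:start])
        parts.append(token)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), vault


def restore_tokens(response: str, vault: dict[str, str]) -> str:
    """Swap every placeholder in the LLM response back to its original value."""
    for token, original in vault.items():
        response = response.replace(token, original)
    return response
```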

```text
// Input
"My name is John Smith and my SSN is 123-45-6789"

// Sent to LLM
"My name is [PERSON] and my SSN is [SSN]"

// Response (restored)
"Hello John Smith, I see you provided your SSN..."
```

Hash Format

Available on Professional tier and above.

Generates deterministic hash-based placeholders. The same input always produces the same hash, enabling consistent references across conversations.

```text
// Input
"Contact John Smith at john@example.com"

// Sent to LLM
"Contact [HASH:a1b2c3d4] at [HASH:e5f6a7b8]"

// The same values always produce the same hashes,
// which is useful for multi-turn conversations
```
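NeuronEdge does not document which hash function it uses. The sketch below swaps in a keyed HMAC-SHA256 because values such as SSNs and phone numbers come from small value spaces, so an unkeyed hash could be reversed by hashing every candidate and comparing. The function name, the secret key, and the 8-character truncation are assumptions for illustration:

```python
import hashlib
import hmac


def hash_placeholder(value: str, key: bytes, length: int = 8) -> str:
    """Return a deterministic "[HASH:...]" placeholder for value (hypothetical helper).

    The same value and key always give the same placeholder, so a name mentioned
    in turn 1 and turn 5 maps to one token. Keying the hash means someone without
    the key cannot confirm a guess (for example, a specific SSN) by hashing it.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[HASH:{digest[:length]}]"
```

Consistency across conversations only holds while the same key is used. To restore responses, keep a dictionary from each placeholder to its original value; with only 8 hex characters (32 bits), collisions between different values become possible at scale, so a production system would need to detect them.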

Synthetic Format

Available on Professional tier and above.

Generates realistic fake data that maintains semantic meaning. The LLM sees believable placeholder data that looks real but isn't.

```text
// Input
"My name is John Smith, SSN 123-45-6789, email john@example.com"

// Sent to LLM
"My name is Sarah Johnson, SSN 987-65-4321, email sarah.j@demo.org"

// Response uses synthetic data, then restored to original
```
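Restoration only works if every fake value maps back to exactly one real value. The following is a minimal sketch of that bookkeeping, assuming hypothetical generators for three entity types; SyntheticVault and its name pools are illustrative and are not NeuronEdge's generator:

```python
import random
import re
from typing import Optional

# Hypothetical name pools; a real generator would cover every entity type.
FIRST_NAMES = ["Sarah", "Michael", "Priya", "Daniel"]
LAST_NAMES = ["Johnson", "Lee", "Patel", "Garcia"]


class SyntheticVault:
    """Keeps a consistent fake for each real value, plus the reverse mapping."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._fake_of: dict[str, str] = {}  # real value -> synthetic value
        self._real_of: dict[str, str] = {}  # synthetic value -> real value

    def _generate(self, entity: str) -> str:
        r = self._rng
        if entity == "PERSON":
            return f"{r.choice(FIRST_NAMES)} {r.choice(LAST_NAMES)}"
        if entity == "SSN":
            # SSNs with area numbers 900-999 are never issued, so these stay fake.
            return f"{r.randint(900, 999)}-{r.randint(10, 99)}-{r.randint(1000, 9999)}"
        if entity == "EMAIL":
            return f"user{r.randint(1000, 9999)}@example.org"
        raise ValueError(f"no synthetic generator for {entity}")

    def substitute(self, entity: str, value: str) -> str:
        """Return the synthetic stand-in for value, reusing it on later calls."""
        if value not in self._fake_of:
            for _ in range(100):
                fake = self._generate(entity)
                # Two real values must never share a fake, or restoration breaks.
                if fake not in self._real_of and fake != value:
                    break
            else:
                raise RuntimeError(f"ran out of unused synthetic {entity} values")
            self._fake_of[value] = fake
            self._real_of[fake] = value
        return self._fake_of[value]

    def restore(self, response: str) -> str:
        """Replace every synthetic value in the LLM response with the original."""
        if not self._real_of:
            return response
        # A single alternation (longest first) rewrites each match once, so
        # already-restored text is never rewritten a second time.
        pattern = re.compile("|".join(
            re.escape(f) for f in sorted(self._real_of, key=len, reverse=True)))
        return pattern.sub(lambda m: self._real_of[m.group(0)], response)
```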

Detection Modes

Detection modes trade speed against thoroughness:

| Mode | Engines | Typical latency | Use case |
|---|---|---|---|
| real-time | Regex only | <1 ms | Ultra-low latency, streaming |
| balanced | Regex + lightweight NER | ~5 ms | Recommended default |
| thorough | Regex + full NER | ~15 ms | Maximum accuracy |
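If your application has a per-request latency budget, one hypothetical way to choose a mode is to take the most thorough one whose typical latency fits. The figures are the approximate values from the detection-mode table, not guarantees, and pick_mode is a helper you would write yourself:

```python
# Approximate detection latencies from the mode table, most thorough first.
MODE_LATENCY_MS = [("thorough", 15.0), ("balanced", 5.0), ("real-time", 1.0)]


def pick_mode(budget_ms: float) -> str:
    """Return the most thorough mode whose typical latency fits the budget."""
    for mode, latency in MODE_LATENCY_MS:
        if latency <= budget_ms:
            return mode
    # Nothing fits: fall back to the fastest mode.
    return "real-time"
```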

Configuring Protection

Set redaction preferences per-request using headers:

```bash
curl -X POST https://api.neuronedge.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer ne_live_..." \
  -H "X-Provider-API-Key: sk-..." \
  -H "Content-Type: application/json" \
  -H "X-NeuronEdge-Format: synthetic" \
  -H "X-NeuronEdge-Mode: balanced" \
  -H "X-NeuronEdge-Entities: PERSON,SSN,EMAIL" \
  -d '{"model": "gpt-5.2", "messages": [...]}'
```

Or configure defaults in your policy to avoid per-request headers.
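For clients that don't shell out to curl, the same request can be built with Python's standard library. The endpoint, model, and X-NeuronEdge-* headers mirror the curl example; the helper name build_request, the environment variable names NEURONEDGE_API_KEY and OPENAI_API_KEY, and the sample message are hypothetical:

```python
import json
import os
import urllib.request

API_URL = "https://api.neuronedge.ai/v1/openai/chat/completions"


def build_request(messages, fmt="synthetic", mode="balanced",
                  entities=("PERSON", "SSN", "EMAIL")):
    """Build (but do not send) a chat completion request with per-request PII settings."""
    body = json.dumps({"model": "gpt-5.2", "messages": messages}).encode("utf-8")
    headers = {
        # Hypothetical env var names; substitute however you store secrets.
        "Authorization": f"Bearer {os.environ.get('NEURONEDGE_API_KEY', '')}",
        "X-Provider-API-Key": os.environ.get("OPENAI_API_KEY", ""),
        "Content-Type": "application/json",
        "X-NeuronEdge-Format": fmt,
        "X-NeuronEdge-Mode": mode,
        "X-NeuronEdge-Entities": ",".join(entities),
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

To send it, pass the result to `urllib.request.urlopen(req)` and parse the JSON body of the response. Building the request separately makes it easy to inspect the headers without network access.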