Streaming

NeuronEdge fully supports streaming responses with real-time PII protection on each token. Get the responsiveness of streaming while maintaining complete privacy protection.

How Streaming Works

1. Request with stream: true. Enable streaming by setting stream: true in your request body.

2. PII Protection. NeuronEdge redacts PII in your prompt, replacing it with placeholders, before forwarding the request to the provider.

3. SSE Token Streaming. The provider streams tokens back, and NeuronEdge checks each token for PII.

4. Real-time Restoration. Tokens containing redacted placeholders are restored with the original values before being forwarded to your client, as illustrated below.
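
For illustration, assuming placeholder tokens like [PERSON_1] and [EMAIL_1] (the exact placeholder format NeuronEdge uses may differ), the round trip looks like this:

text
Your prompt:        Tell me about John Smith at john@example.com
Sent to provider:   Tell me about [PERSON_1] at [EMAIL_1]
Provider streams:   ... [PERSON_1] can be reached at [EMAIL_1] ...
Your client sees:   ... John Smith can be reached at john@example.com ...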

Streaming Examples

OpenAI

bash
curl -X POST https://api.neuronedge.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer ne_live_your_api_key" \
  -H "X-Provider-API-Key: sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {"role": "user", "content": "Tell me about John Smith at john@example.com"}
    ],
    "stream": true
  }'
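
The -N flag disables curl's output buffering so tokens are printed as they arrive. If you prefer an SDK over raw curl, the same request can go through the official openai package by pointing baseURL at NeuronEdge. The key routing below mirrors the Anthropic example that follows; treat it as a sketch rather than a verified configuration:

typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  // The NeuronEdge key rides in the standard Authorization slot...
  apiKey: process.env.NEURONEDGE_API_KEY,
  baseURL: 'https://api.neuronedge.ai/v1/openai',
  // ...and the provider key travels in X-Provider-API-Key, as in the curl example.
  defaultHeaders: {
    'X-Provider-API-Key': process.env.OPENAI_API_KEY!,
  },
});

const stream = await openai.chat.completions.create({
  model: 'gpt-5.2',
  messages: [{ role: 'user', content: 'Tell me about John Smith at john@example.com' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}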

Anthropic

typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: 'https://api.neuronedge.ai/v1/anthropic',
  defaultHeaders: {
    'Authorization': `Bearer ${process.env.NEURONEDGE_API_KEY}`,
  },
});

const stream = anthropic.messages.stream({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Tell me about John Smith' }],
});

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}
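
If you also need the complete response after streaming, the SDK's MessageStream helper lets you await stream.finalMessage() once the loop finishes.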

SSE Response Format

Streaming responses use Server-Sent Events (SSE). Each event contains a JSON chunk with the token data:

text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
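
Events are separated by blank lines, as the SSE format requires, and the stream terminates with the literal data: [DONE] sentinel rather than a JSON chunk, so parsers need to special-case it.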

Browser Streaming

For browser-based applications, use the Fetch API with a ReadableStream:

typescript
async function streamChat(message: string) {
  const response = await fetch('https://api.neuronedge.ai/v1/openai/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ne_live_your_api_key',
      'X-Provider-API-Key': 'sk-your-openai-key',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-5.2',
      messages: [{ role: 'user', content: message }],
      stream: true,
    }),
  });

  if (!response.body) throw new Error('Response has no body to stream');

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters split across chunks intact
    buffer += decoder.decode(value, { stream: true });

    // A read() can end mid-line, so only process complete lines and
    // carry the remainder over to the next chunk
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;

      const data = line.slice(6);
      if (data === '[DONE]') return;

      try {
        const json = JSON.parse(data);
        const content = json.choices[0]?.delta?.content || '';
        // Append content to your UI
        document.getElementById('output')!.textContent += content;
      } catch {
        // Skip keepalive comments and other non-JSON lines
      }
    }
  }
}
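
The manual line buffering above is necessary because a single read() can end mid-event. If you would rather not hand-roll the SSE framing, a small library such as eventsource-parser can take care of it.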

Error Handling

Errors during streaming are returned as SSE events with an error payload:

text
data: {"error":{"code":"PROVIDER_ERROR","message":"Provider returned an error","status":500}}

data: [DONE]

Always handle both successful chunks and error events in your stream processing:

typescript
// `stream` here is any async iterable of parsed chunks,
// e.g. from an SDK or a parser built on the fetch example above
for await (const chunk of stream) {
  if ('error' in chunk) {
    console.error('Stream error:', chunk.error);
    break;
  }

  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}

Performance Considerations

Detection Mode

Use real-time detection mode for the lowest-latency streaming. This mode uses regex-only detection, which adds less than 1 ms per token.
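
How you select the mode depends on your NeuronEdge configuration. Purely as an illustration, if the mode were exposed as a per-request header (the header name below is hypothetical, not a documented API), it might look like this:

bash
# X-NeuronEdge-Detection-Mode is a hypothetical header shown for illustration
# only; consult your NeuronEdge settings for how detection modes are actually chosen.
curl -X POST https://api.neuronedge.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer ne_live_your_api_key" \
  -H "X-Provider-API-Key: sk-your-openai-key" \
  -H "X-NeuronEdge-Detection-Mode: realtime" \
  -H "Content-Type: application/json" \
  -N \
  -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'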

Token Buffering

NeuronEdge may buffer a few tokens to detect multi-token PII patterns. This adds minimal latency while ensuring complete protection.
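
For example, a redacted placeholder can itself arrive split across several provider tokens, so NeuronEdge holds those tokens back until the full pattern is visible before restoring it. The token boundaries and placeholder format here are illustrative:

text
Provider tokens:   "Contact " | "[EMA" | "IL_1]" | " today."
Client receives:   "Contact " | "john@example.com" | " today."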

Connection Keepalive

Keep your HTTP client connection alive for the duration of the stream. Set appropriate timeouts for long-running completions.
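
One portable way to enforce an overall deadline is AbortSignal.timeout(), which works with fetch in modern browsers and Node 18+. The five-minute budget below is an arbitrary example:

typescript
const response = await fetch('https://api.neuronedge.ai/v1/openai/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ne_live_your_api_key',
    'X-Provider-API-Key': 'sk-your-openai-key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    messages: [{ role: 'user', content: 'Write a detailed report' }],
    stream: true,
  }),
  // Aborts the request (and the in-flight stream) after 5 minutes
  signal: AbortSignal.timeout(5 * 60 * 1000),
});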

Best Practices

  • Use streaming for interactive chat UIs to improve perceived responsiveness
  • Handle connection drops gracefully by implementing reconnection logic
  • Set appropriate timeouts for your use case (some completions take minutes)
  • Use real-time mode for best streaming performance
  • Monitor X-NeuronEdge-Detection-Time-Ms to track latency impact (see the snippet below)
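
For example, with the fetch-based client shown earlier, the header can be read off the response before you start consuming the stream:

typescript
// `response` is the fetch Response from the browser streaming example above
const detectionMs = response.headers.get('X-NeuronEdge-Detection-Time-Ms');
if (detectionMs !== null) {
  console.log(`PII detection took ${detectionMs} ms`);
}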