Streaming

NeuronEdge fully supports streaming responses with real-time PII protection on each token. Get the responsiveness of streaming while maintaining complete privacy protection.

How Streaming Works

1. Request with stream: true. Enable streaming by setting stream: true in your request body.

2. PII Protection. NeuronEdge redacts PII in your prompt, replacing it with placeholders, before forwarding the request to the provider.

3. SSE Token Streaming. The provider streams tokens back, and NeuronEdge checks each token for PII.

4. Real-time Restoration. Tokens containing redacted placeholders are restored with the original values before being forwarded to your client, as illustrated below.
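
For illustration, assuming placeholder tokens like [PERSON_1] and [EMAIL_1] (the exact placeholder format NeuronEdge uses may differ), the round trip looks like this:

text
Your prompt:        Tell me about John Smith at john@example.com
Sent to provider:   Tell me about [PERSON_1] at [EMAIL_1]
Provider streams:   ... [PERSON_1] can be reached at [EMAIL_1] ...
Your client sees:   ... John Smith can be reached at john@example.com ...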

Streaming Examples

OpenAI

bash
curl -X POST https://api.neuronedge.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer ne_live_your_api_key" \
  -H "X-Provider-API-Key: sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {"role": "user", "content": "Tell me about John Smith at john@example.com"}
    ],
    "stream": true
  }'
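
The -N flag disables curl's output buffering so tokens are printed as they arrive. If you prefer an SDK over raw curl, the same request can go through the official openai package by pointing baseURL at NeuronEdge. The key routing below mirrors the Anthropic example that follows; treat it as a sketch rather than a verified configuration:

typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  // The NeuronEdge key rides in the standard Authorization slot...
  apiKey: process.env.NEURONEDGE_API_KEY,
  baseURL: 'https://api.neuronedge.ai/v1/openai',
  // ...and the provider key travels in X-Provider-API-Key, as in the curl example.
  defaultHeaders: {
    'X-Provider-API-Key': process.env.OPENAI_API_KEY!,
  },
});

const stream = await openai.chat.completions.create({
  model: 'gpt-5.2',
  messages: [{ role: 'user', content: 'Tell me about John Smith at john@example.com' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}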

Anthropic

typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: 'https://api.neuronedge.ai/v1/anthropic',
  defaultHeaders: {
    'Authorization': `Bearer ${process.env.NEURONEDGE_API_KEY}`,
  },
});

const stream = anthropic.messages.stream({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Tell me about John Smith' }],
});

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}
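
If you also need the complete response after streaming, the SDK's MessageStream helper lets you await stream.finalMessage() once the loop finishes.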

SSE Response Format

Streaming responses use Server-Sent Events (SSE). Each event contains a JSON chunk with the token data:

text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
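
Events are separated by blank lines, as the SSE format requires, and the stream terminates with the literal data: [DONE] sentinel rather than a JSON chunk, so parsers need to special-case it.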

Browser Streaming

For browser-based applications, use the Fetch API with a ReadableStream:

typescript
async function streamChat(message: string) {
  const response = await fetch('https://api.neuronedge.ai/v1/openai/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ne_live_your_api_key',
      'X-Provider-API-Key': 'sk-your-openai-key',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-5.2',
      messages: [{ role: 'user', content: message }],
      stream: true,
    }),
  });

  if (!response.body) throw new Error('Response has no body to stream');

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters split across chunks intact
    buffer += decoder.decode(value, { stream: true });

    // A read() can end mid-line, so only process complete lines and
    // carry the remainder over to the next chunk
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;

      const data = line.slice(6);
      if (data === '[DONE]') return;

      try {
        const json = JSON.parse(data);
        const content = json.choices[0]?.delta?.content || '';
        // Append content to your UI
        document.getElementById('output')!.textContent += content;
      } catch {
        // Skip keepalive comments and other non-JSON lines
      }
    }
  }
}
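
The manual line buffering above is necessary because a single read() can end mid-event. If you would rather not hand-roll the SSE framing, a small library such as eventsource-parser can take care of it.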

Error Handling

Errors during streaming are returned as SSE events with an error payload:

text
data: {"error":{"code":"PROVIDER_ERROR","message":"Provider returned an error","status":500}}

data: [DONE]

Always handle both successful chunks and error events in your stream processing:

typescript
// `stream` here is any async iterable of parsed chunks,
// e.g. from an SDK or a parser built on the fetch example above
for await (const chunk of stream) {
  if ('error' in chunk) {
    console.error('Stream error:', chunk.error);
    break;
  }

  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}

Performance Considerations

Detection Mode

Use real-time detection mode for the lowest-latency streaming. This mode uses regex-only detection, which adds less than 1 ms per token.
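
How you select the mode depends on your NeuronEdge configuration. Purely as an illustration, if the mode were exposed as a per-request header (the header name below is hypothetical, not a documented API), it might look like this:

bash
# X-NeuronEdge-Detection-Mode is a hypothetical header shown for illustration
# only; consult your NeuronEdge settings for how detection modes are actually chosen.
curl -X POST https://api.neuronedge.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer ne_live_your_api_key" \
  -H "X-Provider-API-Key: sk-your-openai-key" \
  -H "X-NeuronEdge-Detection-Mode: realtime" \
  -H "Content-Type: application/json" \
  -N \
  -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'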

Token Buffering

NeuronEdge may buffer a few tokens to detect multi-token PII patterns. This adds minimal latency while ensuring complete protection.
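
For example, a redacted placeholder can itself arrive split across several provider tokens, so NeuronEdge holds those tokens back until the full pattern is visible before restoring it. The token boundaries and placeholder format here are illustrative:

text
Provider tokens:   "Contact " | "[EMA" | "IL_1]" | " today."
Client receives:   "Contact " | "john@example.com" | " today."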

Connection Keepalive

Keep your HTTP client connection alive for the duration of the stream. Set appropriate timeouts for long-running completions.
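
One portable way to enforce an overall deadline is AbortSignal.timeout(), which works with fetch in modern browsers and Node 18+. The five-minute budget below is an arbitrary example:

typescript
const response = await fetch('https://api.neuronedge.ai/v1/openai/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ne_live_your_api_key',
    'X-Provider-API-Key': 'sk-your-openai-key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    messages: [{ role: 'user', content: 'Write a detailed report' }],
    stream: true,
  }),
  // Aborts the request (and the in-flight stream) after 5 minutes
  signal: AbortSignal.timeout(5 * 60 * 1000),
});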

Best Practices

  • Use streaming for interactive chat UIs to improve perceived responsiveness
  • Handle connection drops gracefully by implementing reconnection logic
  • Set appropriate timeouts for your use case (some completions take minutes)
  • Use real-time mode for best streaming performance
  • Monitor X-NeuronEdge-Detection-Time-Ms to track latency impact (see the snippet below)
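
For example, with the fetch-based client shown earlier, the header can be read off the response before you start consuming the stream:

typescript
// `response` is the fetch Response from the browser streaming example above
const detectionMs = response.headers.get('X-NeuronEdge-Detection-Time-Ms');
if (detectionMs !== null) {
  console.log(`PII detection took ${detectionMs} ms`);
}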