Streaming
NeuronEdge fully supports streaming responses with real-time PII protection on each token. Get the responsiveness of streaming while maintaining complete privacy protection.
How Streaming Works
1. Request with stream: true
Enable streaming by setting stream: true in your request body.
2. PII Protection
NeuronEdge redacts PII from your prompt before forwarding it to the provider.
3. SSE Token Streaming
The provider streams tokens back, and NeuronEdge checks each token for PII.
4. Real-time Restoration
Tokens containing redacted placeholders are restored with the original values before being forwarded to your client.
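As a concrete illustration of the round trip (the placeholder names below are assumed for the example; NeuronEdge's actual placeholder format may differ):
Client sends:       "Tell me about John Smith at john@example.com"
Provider receives:  "Tell me about [NAME_1] at [EMAIL_1]"
Provider streams:   "[NAME_1] can be reached at [EMAIL_1] ..."
Client receives:    "John Smith can be reached at john@example.com ..."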
Streaming Examples
OpenAI
curl -X POST https://api.neuronedge.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer ne_live_your_api_key" \
  -H "X-Provider-API-Key: sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {"role": "user", "content": "Tell me about John Smith at john@example.com"}
    ],
    "stream": true
  }'
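The -N flag disables curl's output buffering so tokens are printed as they arrive. The same request through the OpenAI Node SDK might look like the following sketch; it assumes the standard openai npm package, with your NeuronEdge key as the SDK's apiKey (sent as the Authorization bearer token) and the provider key passed via X-Provider-API-Key, mirroring the curl headers:
import OpenAI from 'openai';

// The SDK sends apiKey as "Authorization: Bearer ...", so the
// NeuronEdge key goes there and the provider key goes in a header.
const openai = new OpenAI({
  apiKey: process.env.NEURONEDGE_API_KEY,
  baseURL: 'https://api.neuronedge.ai/v1/openai',
  defaultHeaders: {
    'X-Provider-API-Key': process.env.OPENAI_API_KEY,
  },
});

const stream = await openai.chat.completions.create({
  model: 'gpt-5.2',
  messages: [{ role: 'user', content: 'Tell me about John Smith' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}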
Anthropic
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: 'https://api.neuronedge.ai/v1/anthropic',
  defaultHeaders: {
    'Authorization': `Bearer ${process.env.NEURONEDGE_API_KEY}`,
  },
});

const stream = await anthropic.messages.stream({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Tell me about John Smith' }],
});

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}
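Alternatively, the MessageStream returned by messages.stream() exposes event helpers (assuming the standard @anthropic-ai/sdk helper API), so the loop above can be replaced with:
// Print each text delta as it arrives, then wait for completion.
stream.on('text', (text) => process.stdout.write(text));
await stream.finalMessage(); // resolves once the stream completes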
SSE Response Format
Streaming responses use Server-Sent Events (SSE). Each event contains a JSON chunk with the token data:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702000000,"model":"gpt-5.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
The stream terminates with the literal [DONE] sentinel rather than a JSON chunk, so check for it before parsing.
Browser Streaming
For browser-based applications, use the Fetch API with a ReadableStream (the built-in EventSource API only supports GET requests, so it cannot send this POST body):
async function streamChat(message: string) {
  const response = await fetch('https://api.neuronedge.ai/v1/openai/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ne_live_your_api_key',
      'X-Provider-API-Key': 'sk-your-openai-key',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-5.2',
      messages: [{ role: 'user', content: message }],
      stream: true,
    }),
  });

  if (!response.body) throw new Error('Response has no body');
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } keeps partial multi-byte characters in the decoder.
    buffer += decoder.decode(value, { stream: true });
    // An SSE event may be split across network chunks, so only process
    // complete lines and keep the trailing partial line in the buffer.
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6);
      if (data === '[DONE]') return; // end of stream
      try {
        const json = JSON.parse(data);
        const content = json.choices[0]?.delta?.content || '';
        // Append content to your UI
        document.getElementById('output')!.textContent += content;
      } catch {
        // Skip non-JSON lines
      }
    }
  }
}
Error Handling
Errors during streaming are returned as SSE events with an error payload:
data: {"error":{"code":"PROVIDER_ERROR","message":"Provider returned an error","status":500}}
data: [DONE]
Always handle both successful chunks and error events in your stream processing:
for await (const chunk of stream) {
  if ('error' in chunk) {
    console.error('Stream error:', chunk.error);
    break;
  }
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
An error event can arrive mid-stream, after some chunks have already been delivered, so be prepared to handle partial output.
Performance Considerations
Detection Mode
Use real-time detection mode for the lowest-latency streaming. It uses regex-only detection, which adds less than 1 ms per token.
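How the mode is selected depends on your NeuronEdge configuration. If your deployment supports per-request selection, it might look like the sketch below; the X-NeuronEdge-Detection-Mode header name is hypothetical, used only for illustration, so check your configuration docs for the actual mechanism:
// Hypothetical per-request mode selection - the header name below is
// assumed for illustration and is not confirmed by this guide.
const response = await fetch('https://api.neuronedge.ai/v1/openai/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ne_live_your_api_key',
    'X-Provider-API-Key': 'sk-your-openai-key',
    'Content-Type': 'application/json',
    'X-NeuronEdge-Detection-Mode': 'real-time', // hypothetical header
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true,
  }),
});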
Token Buffering
NeuronEdge may buffer a few tokens to detect PII patterns that span multiple tokens (an email address, for example, often arrives split across several tokens). This adds minimal latency while ensuring complete protection.
Connection Keepalive
Keep your HTTP client connection alive for the duration of the stream. Set appropriate timeouts for long-running completions.
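For example, with the Fetch API you can bound the total stream duration with an AbortSignal; the 10-minute limit below is an arbitrary example value, so tune it to your longest expected completion:
const response = await fetch('https://api.neuronedge.ai/v1/openai/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ne_live_your_api_key',
    'X-Provider-API-Key': 'sk-your-openai-key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    messages: [{ role: 'user', content: 'Write a long report' }],
    stream: true,
  }),
  // Abort if the whole stream takes longer than 10 minutes (example value).
  // Requires a runtime with AbortSignal.timeout (modern browsers, Node 17.3+).
  signal: AbortSignal.timeout(10 * 60 * 1000),
});
// ... consume response.body as in the browser example above ...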
Best Practices
- Use streaming for interactive chat UIs to improve perceived responsiveness
- Handle connection drops gracefully by implementing reconnection logic (see the sketch below)
- Set appropriate timeouts for your use case (some completions take minutes)
- Use real-time mode for best streaming performance
- Monitor the X-NeuronEdge-Detection-Time-Ms response header to track latency impact
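A minimal reconnection sketch, assuming the streamChat function from the browser example above (the retry policy is illustrative, not part of NeuronEdge):
// Retry a dropped stream with exponential backoff (illustrative policy).
// Note that each retry restarts the completion from the beginning.
async function streamWithRetry(message: string, maxAttempts = 3): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await streamChat(message); // from the browser streaming example
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // Back off before reconnecting: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}
To track latency impact, read the header from the fetch response, e.g. response.headers.get('X-NeuronEdge-Detection-Time-Ms').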