Mock LLM API Guide

Create free LLM streaming endpoints that mimic OpenAI, Claude, and other AI providers. Build and test AI features without spending thousands on API calls.

Getting Started

Creating a mock LLM streaming endpoint takes less than 30 seconds. Follow these simple steps:

  1. Visit the LLM Mock Page

    Go to mockapi.dog/llm-mock. A unique 6-character code is automatically generated for your endpoint.

  2. Choose LLM Provider Profile

    Select which provider's response format to emulate:

    • OpenAI - Chat Completions API format (GPT-4, GPT-3.5)
    • Anthropic Claude - Claude streaming format
    • Generic Stream - Provider-agnostic token stream
    • Generic JSON - Simple JSON response (no streaming)
  3. Select Content Mode

    Choose how response content is generated:

    • Generated - Auto-generate LLM-like text (Chat, Technical, or Markdown style)
    • Static - Use your provided text exactly as is
    • Hybrid - Your text followed by generated continuation
  4. Configure Token Generation (Optional)

    For Generated or Hybrid modes, set minimum and maximum tokens (100-300 recommended); the generated length is chosen randomly between these values. Not needed for Static mode.

  5. Complete Verification & Save

    Complete the Turnstile verification, then click "Save Mock Endpoint". Your endpoint URL is automatically copied!

    https://abc123.mockapi.dog/v1/chat/completions

That's it! Start streaming immediately

Your endpoint is ready to use. Replace your OpenAI/Claude baseURL with your mock endpoint and start testing. No authentication or API keys required.
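To sanity-check the endpoint before wiring it into an SDK, you can send a single request with plain fetch. A minimal sketch; abc123 is the placeholder subdomain from step 5, and the request body mirrors the OpenAI chat format:

// Quick smoke test against your new mock endpoint.
// abc123 is the placeholder subdomain from step 5.
const res = await fetch('https://abc123.mockapi.dog/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  }),
});

console.log(res.status, res.headers.get('content-type'));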

The Cost Problem

Real LLM APIs are expensive. During development, testing, and prototyping, costs can quickly spiral out of control. Here's what you'd pay with real providers:

OpenAI GPT-4 (Expensive)

Input: $10 / 1M tokens
Output: $30 / 1M tokens

Example: Testing a chatbot with 1,000 conversations (avg. 500 tokens each) means roughly 500K tokens, which runs $20+ at these rates.

Anthropic Claude (Costly)

Input: $8 / 1M tokens
Output: $24 / 1M tokens

CI/CD Pipeline: Running tests 100 times per day = $300+/month

With MockAPI Dog: $0

Free streaming responses for development and testing. Save thousands during the development phase. Switch to real APIs only when you're ready for production.

Why Use LLM Mock API?

Save Money

Avoid spending thousands of dollars during development. Test your UI, streaming logic, and error handling without burning through API credits.

  • No API keys or billing setup required
  • Free requests during development
  • Perfect for indie developers and startups

Instant Testing

Test streaming responses, UI animations, and error states instantly. No waiting for real API calls or dealing with rate limits.

  • Configurable response speed and tokens
  • Test edge cases and error scenarios
  • No dependency on real provider uptime or quotas

Multiple Providers

Test your app with different LLM providers without managing multiple API keys. Switch between OpenAI, Claude, and generic formats effortlessly.

  • OpenAI-compatible endpoints
  • Anthropic Claude format support
  • Generic SSE streaming format

CI/CD Integration

Run automated tests in your CI/CD pipeline without worrying about API costs or rate limits. Test your AI features on every commit.

  • No authentication required
  • Consistent, predictable responses
  • Fast execution for quick feedback
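For instance, a CI test can stream from the mock endpoint and assert that content actually arrives. A minimal sketch; Vitest and the xyz789 endpoint URL are assumptions, not requirements:

// ci-llm.test.ts - a minimal sketch; Vitest and the URL are assumptions
import { describe, it, expect } from 'vitest';
import OpenAI from 'openai';

describe('chat streaming against the mock endpoint', () => {
  it('streams non-empty content', async () => {
    const openai = new OpenAI({
      baseURL: 'https://xyz789.mockapi.dog/llm', // your mock endpoint
      apiKey: 'dummy-api-key', // not checked by the mock
    });

    const stream = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: 'Hello!' }],
      stream: true,
    });

    let text = '';
    for await (const chunk of stream) {
      text += chunk.choices[0]?.delta?.content ?? '';
    }

    expect(text.length).toBeGreaterThan(0);
  });
});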

Supported Providers

MockAPI Dog supports streaming formats for popular LLM providers. Simply set your endpoint as the baseURL in your preferred SDK.

OpenAI Format

GPT-4, GPT-3.5

Compatible with the official OpenAI SDK. Supports streaming responses in the same format as GPT-4 and GPT-3.5-turbo.

Compatible Models:
gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o

Anthropic Format

Claude

Compatible with the Anthropic SDK. Supports streaming responses in the same format as Claude 3 Opus, Sonnet, and Haiku.

Compatible Models:
claude-3-opus, claude-3-sonnet, claude-3-haiku, claude-2

Generic SSE Format

Universal

Standard Server-Sent Events (SSE) format. Use with any streaming client or build your own custom integration.

Use Cases:
  • Custom LLM integrations
  • Testing EventSource implementations
  • Learning streaming protocols
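As a sketch of the EventSource case: note that EventSource can only issue GET requests, so this assumes your generic endpoint also responds to GET (the URL is a placeholder):

// Assumes the generic endpoint also responds to GET, since EventSource
// cannot send POST bodies. The URL below is a placeholder.
const source = new EventSource('https://xyz789.mockapi.dog/llm/stream');

source.onmessage = (event) => {
  if (event.data === '[DONE]') {
    source.close(); // end of stream
    return;
  }
  const json = JSON.parse(event.data);
  console.log(json.content);
};

source.onerror = () => {
  source.close(); // stop automatic reconnect attempts
};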

Content Modes

Choose how your mock LLM endpoint generates response content. Each mode offers different control over the streamed text.

Generated

Auto-generate LLM-like text in different styles. Choose from Chat (conversational tone), Technical (programming focused), or Markdown (formatted with lists and code blocks).

Best for: Realistic testing without writing custom content, UI animations, general prototyping

Static

Use your exact provided text as the response. The text streams exactly as written without any generation or modification.

Best for: Specific test scenarios, exact expected responses, edge case testing

Hybrid

Combines your provided text with auto-generated continuation. Your text streams first, followed by generated LLM-like content.

Best for: Controlled start with realistic continuation, testing partial responses

Text Styles for Generated Content

When using Generated or Hybrid modes, you can choose between Chat (conversational), Technical (programming-focused), or Markdown (includes formatting, lists, code blocks) styles.

Token Generation Settings

Fine-tune how your mock LLM endpoint generates and streams tokens to match your testing needs.

Token Count

Set the number of tokens (roughly equivalent to words) to generate. Useful for testing different response lengths.

Short Response: 50-100 tokens
Medium Response: 200-500 tokens
Long Response: 1000+ tokens

Streaming Speed

Control how fast tokens are streamed. Test your UI with different streaming speeds to ensure smooth animations.

Fast: ~50ms/token
Normal: ~100ms/token
Slow: ~200ms/token

Pro Tip

Test with different speeds to ensure your UI handles both fast and slow streaming gracefully. Real LLM APIs can vary significantly in response time.
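One way to verify this is to log the gap between chunks while streaming. A rough sketch with the OpenAI SDK; client setup matches the Code Examples section below:

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://xyz789.mockapi.dog/llm', // placeholder mock endpoint
  apiKey: 'dummy-api-key',
});

async function main() {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  });

  // Log the gap between chunks to see how your UI copes with
  // fast vs. slow endpoint configurations.
  let last = performance.now();
  for await (const chunk of stream) {
    const now = performance.now();
    const text = chunk.choices[0]?.delta?.content ?? '';
    console.log(`+${Math.round(now - last)}ms  ${text}`);
    last = now;
  }
}

main();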

Code Examples

Here's how to use your mock LLM endpoint with popular SDKs and libraries.

OpenAI SDK

Replace the baseURL with your mock endpoint. No API key required!

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://xyz789.mockapi.dog/llm',
  apiKey: 'dummy-api-key', // Mock endpoint doesn't check API keys
});

async function main() {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
}

main();

Anthropic SDK

Use with the Anthropic SDK by setting a custom baseURL.

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  baseURL: 'https://xyz789.mockapi.dog/llm',
  apiKey: 'dummy-api-key', // Mock endpoint doesn't check API keys
});

async function main() {
  const stream = await anthropic.messages.stream({
    model: 'claude-3-opus-20240229',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Hello!' }],
  });

  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      process.stdout.write(chunk.delta.text);
    }
  }
}

main();

Generic Fetch (SSE)

Use with vanilla JavaScript/TypeScript for maximum flexibility.

async function streamResponse() {
  const response = await fetch('https://xyz789.mockapi.dog/llm/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      prompt: 'Hello, world!',
      max_tokens: 500,
    }),
  });

  const reader = response.body?.getReader();
  if (!reader) throw new Error('Response body is not readable');
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;
        
        try {
          const json = JSON.parse(data);
          console.log(json.content);
        } catch (e) {
          // Skip invalid JSON
        }
      }
    }
  }
}

streamResponse();

It's that simple!

Just replace the baseURL and you're ready to go. Your existing code will work without modifications.

Real-World Use Cases

Chatbot Development

Build and test chatbot UIs without spending on API calls. Test message threading, streaming animations, and error handling.

  • Test streaming message animations
  • Verify conversation threading
  • Debug UI edge cases

Testing & QA

Run automated tests and manual QA without API costs. Test different response scenarios and edge cases consistently.

  • Automated E2E tests in CI/CD
  • Consistent test data
  • Fast test execution

Learning & Tutorials

Learn AI integration without spending money. Perfect for tutorials, courses, and educational content.

  • No API key setup for students
  • Free practice
  • Safe learning environment

MVPs & Demos

Build proof-of-concepts and demos without upfront costs. Show investors and stakeholders your vision before investing in production APIs.

  • Quick prototyping
  • Investor demos
  • Validate ideas cheaply

Advanced Features

Custom Headers

Add custom response headers to test CORS, authentication flows, and other header-based logic in your LLM integration.

Configurable Delays

Simulate network latency and slow streaming speeds to test loading states and timeout handling in your application.
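For example, pairing a slow-configured endpoint with a client-side timeout lets you exercise your abort and loading states. A sketch using the standard AbortSignal.timeout (the URL is a placeholder):

async function testTimeout() {
  try {
    // Point this at an endpoint configured with a slow streaming speed.
    const response = await fetch('https://xyz789.mockapi.dog/llm/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt: 'Hello!', max_tokens: 500 }),
      signal: AbortSignal.timeout(5000), // abort after 5 seconds
    });
    // ...read the stream as in the Generic Fetch example above
    console.log('Responded with status', response.status);
  } catch (err) {
    if (err instanceof DOMException && err.name === 'TimeoutError') {
      console.log('Request timed out: show your timeout UI here');
    } else {
      throw err;
    }
  }
}

testTimeout();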

Error Simulation

Test error handling by simulating rate limits, authentication errors, and streaming interruptions.
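How errors are triggered depends on your endpoint configuration, but handling them client-side looks the same as with a real provider. A sketch with the OpenAI SDK:

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://xyz789.mockapi.dog/llm', // placeholder mock endpoint
  apiKey: 'dummy-api-key',
});

async function main() {
  try {
    const stream = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: 'Hello!' }],
      stream: true,
    });
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content || '');
    }
  } catch (err) {
    if (err instanceof OpenAI.APIError && err.status === 429) {
      console.error('Rate limited: back off and retry');
    } else {
      throw err; // auth errors, interrupted streams, etc.
    }
  }
}

main();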

No Authentication

Mock endpoints don't require API keys or authentication. Perfect for CI/CD pipelines and public demos.

Troubleshooting

Streaming not working

Ensure you're using the correct provider format and that your client supports streaming. Check that you're reading the response as a stream, not as a complete response.

// Make sure to set stream: true
const stream = await openai.chat.completions.create({
  stream: true, // This is required!
  // ...
});

Response too fast/slow

Adjust the streaming speed in your endpoint configuration. Different speeds help test various network conditions and user experiences.

SDK compatibility issues

Make sure you're using a recent version of the SDK. Check the provider format matches your SDK (OpenAI SDK needs OpenAI format, Anthropic SDK needs Anthropic format).

CORS errors in browser

Mock endpoints are configured with permissive CORS headers. If you're still getting CORS errors, check your request headers and ensure you're not sending restricted headers.

Tips & Best Practices

Test with different speeds

Real LLM APIs vary in speed. Test your UI with both fast and slow streaming to ensure smooth user experience in all conditions.

Use environment variables

Store your baseURL in environment variables. Switch between mock and production APIs by changing a single variable.

# .env.development
OPENAI_BASE_URL=https://xyz789.mockapi.dog/llm

# .env.production
OPENAI_BASE_URL=https://api.openai.com/v1
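Then construct the client from that variable, so switching environments never touches code. A minimal sketch (the OPENAI_API_KEY fallback is illustrative):

import OpenAI from 'openai';

// The baseURL comes from the environment, so swapping between mock
// and production is a config change, not a code change.
const openai = new OpenAI({
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY ?? 'dummy-api-key', // mock ignores keys
});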

Test error scenarios

Don't just test happy paths. Use error simulation to test rate limits, network failures, and malformed responses.

LLM Development Workflow

Follow this workflow for efficient AI development:

  1. Build UI and streaming logic with mock endpoints
  2. Test thoroughly with different content modes and speeds
  3. Run automated tests in CI/CD with mock endpoints
  4. Switch to real API only for final integration testing
  5. Deploy with production API keys

Validate before production

Before switching to production APIs, validate your implementation with the real provider's API in a staging environment to catch any differences in behavior.

Glossary

LLM (Large Language Model)

AI models like GPT-4 and Claude that generate human-like text responses. Examples: OpenAI's GPT series, Anthropic's Claude, Google's Gemini.

Streaming API

An API that sends data in chunks rather than waiting for the complete response. Allows for real-time display of AI-generated text as it's being created.

Token

The basic unit of text in LLMs. Roughly equivalent to a word or word fragment. LLM pricing is typically based on token count.

SSE (Server-Sent Events)

A technology that allows servers to push data to clients in real-time. Used by LLM APIs to stream responses.
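On the wire, an SSE stream is plain text: each event is a "data:" line followed by a blank line. An illustrative payload in the shape the Generic Fetch example above parses (not an exact trace of any provider):

data: {"content": "Hello"}

data: {"content": " world"}

data: [DONE]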

baseURL

The base address for API requests. Replace this with your mock endpoint URL to redirect requests to MockAPI Dog instead of the real provider.

Provider

Companies that offer LLM APIs, such as OpenAI (GPT), Anthropic (Claude), Google (Gemini), etc.

Ready to Start Building?

Create your first mock LLM streaming endpoint in seconds. No signup, no credit card, no hassle. Start building AI features without spending thousands on API calls.