Rate Limits

The Monogoto API enforces rate limits to ensure fair usage and platform stability. Understanding rate limits — and building integrations that respect them — is essential for running reliable production services at scale.


Overview

Two independent rate-limiting systems are in effect:

Scope            Applied to                            Keyed by
API rate limit   All endpoints except /v1/auth/*       Authenticated User ID
Auth rate limit  /v1/auth/token and /v1/auth/refresh   Client IP address

Authentication requests and general API requests therefore count against separate budgets. Exhausting your API rate limit does not lock you out of refreshing your token, and exhausting the auth limit does not block other API calls made with a valid token.


Rate Limit Headers

Every API response includes headers that tell you exactly where you stand in the current window:

Header                 Type     Description
X-RateLimit-Limit      integer  Total requests allowed in the current window
X-RateLimit-Remaining  integer  Requests remaining before you hit the limit
Retry-After            integer  Seconds to wait before retrying (present only on 429 responses)

Example Response Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 843
Content-Type: application/json

Check these headers proactively in your integration. If X-RateLimit-Remaining drops near zero, slow down before you receive a 429.


Rate Limit Tiers

Standard API Endpoints

Plan        Requests / minute  Requests / hour
Standard    60                 1,000
Business    300                10,000
Enterprise  Custom             Custom

Contact support@monogoto.io to discuss volume requirements and enterprise limits.
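
The per-minute and per-hour figures interact: a client that sends at the full per-minute rate would use up the hourly allowance well before the hour ends. The sketch below works out the steady request spacing that satisfies both windows, assuming both limits apply to the same requests at the same time (the table does not say so explicitly). The names PLANS and minIntervalMs are illustrative only and are not part of the Monogoto API.

```javascript
// Plan figures copied from the tier table; Enterprise is omitted because its limits are custom.
const PLANS = {
  standard: { perMinute: 60, perHour: 1000 },
  business: { perMinute: 300, perHour: 10000 },
};

// Minimum spacing (in ms) between requests that stays within both windows
// when traffic is sent at a steady rate. Assumes both windows apply at once.
function minIntervalMs({ perMinute, perHour }) {
  return Math.max(60_000 / perMinute, 3_600_000 / perHour);
}

console.log(minIntervalMs(PLANS.standard)); // 3600 (the hourly limit is the binding one)
console.log(minIntervalMs(PLANS.business)); // 360
```

Under that assumption, the per-minute figure acts as a burst allowance, while sustained traffic is bounded by the hourly figure.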

Authentication Endpoints

The following endpoints are rate-limited independently, keyed by the client IP address, regardless of account plan:

Endpoint               Limit
POST /v1/auth/token    5 requests / 60 seconds per IP
POST /v1/auth/refresh  5 requests / 60 seconds per IP

Note: This means login/refresh calls from the same server will share an IP-level budget. If you run multiple worker processes, coordinate so that only one performs token management at a time.
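
One way to keep refresh traffic low inside a single Node.js process is to let concurrent callers share one in-flight refresh. The sketch below is a minimal illustration of that idea. singleFlight is a hypothetical helper, and refreshFn stands in for whatever function in your code calls POST /v1/auth/refresh. It does not coordinate across separate processes or hosts; that would need a shared lock or a dedicated token-manager process.

```javascript
// Wraps a refresh function so that concurrent callers share one in-flight call.
// `refreshFn` is a hypothetical caller-supplied async function that performs
// the actual token refresh and resolves to the new token.
function singleFlight(refreshFn) {
  let inFlight = null;
  return function refresh() {
    if (inFlight === null) {
      inFlight = Promise.resolve()
        .then(refreshFn)
        // Clear the slot on success or failure so a later call can refresh again.
        .finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}

// Usage with a stand-in refresh function (no network call is made here).
let refreshCalls = 0;
const refreshToken = singleFlight(async () => {
  refreshCalls += 1;
  return `token-${refreshCalls}`;
});

Promise.all([refreshToken(), refreshToken(), refreshToken()]).then((tokens) => {
  console.log(refreshCalls, tokens); // 1 [ 'token-1', 'token-1', 'token-1' ]
});
```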


Handling 429 Too Many Requests

When you exceed the rate limit, the API returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 47
Content-Type: application/json

{
  "status_code": 429,
  "message": "Rate limit exceeded. Retry after 47 seconds.",
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

To recover from a 429:

  1. Read the Retry-After response header (value in seconds).
  2. Wait at least that long before retrying.
  3. Apply exponential backoff with jitter if the 429 persists across retries.

The following helper implements these steps:

async function withRateLimitRetry(fn, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fn();

    if (res.status !== 429) return res;

    if (attempt === maxAttempts - 1) {
      throw new Error(`Rate limit exceeded after ${maxAttempts} attempts`);
    }

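    // Retry-After may be missing or non-numeric; fall back to 60s in either case
    const parsed = parseInt(res.headers.get('Retry-After') ?? '', 10);
    const retryAfter = Number.isNaN(parsed) ? 60 : parsed;
    // Exponential backoff: retryAfter * 2^attempt + random jitter (0–1s)
    const delay = retryAfter * 1000 * Math.pow(2, attempt) + Math.random() * 1000;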
    console.warn(`Rate limited. Retrying in ${(delay / 1000).toFixed(1)}s (attempt ${attempt + 1}/${maxAttempts})`);
    await new Promise(r => setTimeout(r, delay));
  }
}

Proactive Rate Limit Tracking

Rather than waiting for a 429, track the X-RateLimit-Remaining header on every response and throttle your requests when the budget runs low:

class RateLimitAwareClient {
  constructor(baseUrl, accessToken) {
    this.baseUrl = baseUrl;
    this.accessToken = accessToken;
    this.remaining = Infinity;
  }

  async fetch(path, options = {}) {
    // If budget is critically low, pause before sending
    if (this.remaining < 5) {
      console.warn(`Rate limit budget low (${this.remaining} remaining). Pausing 2s.`);
      await new Promise(r => setTimeout(r, 2000));
    }

    const res = await fetch(`${this.baseUrl}${path}`, {
      ...options,
      headers: {
        Authorization: `Bearer ${this.accessToken}`,
        ...options.headers,
      },
    });

    // Update budget from response headers (ignore missing or malformed values)
    const remaining = parseInt(res.headers.get('X-RateLimit-Remaining') ?? '', 10);
    if (!Number.isNaN(remaining)) this.remaining = remaining;

    return res;
  }
}

Bulk Operations

If you need to operate on many SIM cards at once, prefer bulk endpoints over individual per-resource calls. A single bulk request counts as one request against your rate limit, regardless of how many resources it modifies.

Available bulk operations are listed under the Things tag in the API Reference. Look for endpoints that accept an array of ICCIDs in the request body.

Example: Instead of 500 individual calls to activate SIM cards, a single bulk activation request uses 1 rate limit unit and completes faster due to reduced round-trip overhead.
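
If a list of ICCIDs is too long for one request, it can be split into batches, with each batch costing one rate-limit unit. The sketch below assumes a maximum batch size, which this page does not specify, and takes the actual bulk call as a parameter, because the endpoint path and payload shape depend on the operation. chunk, runInBatches, and sendBulk are illustrative names, not Monogoto API functions.

```javascript
// Splits a list into batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Sends each batch in turn. `sendBulk` is a hypothetical caller-supplied
// function that performs one bulk API request for an array of ICCIDs.
// `batchSize` is an assumption; check the API Reference for the real limit.
async function runInBatches(iccids, batchSize, sendBulk) {
  const results = [];
  for (const batch of chunk(iccids, batchSize)) {
    results.push(await sendBulk(batch)); // one rate-limit unit per batch
  }
  return results;
}
```

With 500 ICCIDs and an assumed batch size of 200, this issues 3 requests instead of 500.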


Avoiding Rate Limit Issues

Spread requests over time. Instead of firing all requests at once, use a queue with a configurable throughput ceiling. Libraries like p-limit (Node.js) or asyncio.Semaphore (Python) cap the number of requests in flight; pair them with a delay between requests if you need a strict requests-per-second ceiling.
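
A dependency-free way to spread requests is to give each task the next free time slot. The sketch below is one minimal approach, not a replacement for a full queue library. createThrottle and schedule are illustrative names, and the interval is chosen by the caller (for example, 3,600 ms corresponds to 1,000 requests per hour).

```javascript
// Returns a `schedule(task)` function that starts tasks at least
// `minIntervalMs` apart, in the order they were scheduled.
function createThrottle(minIntervalMs) {
  let nextSlot = 0; // earliest timestamp (ms) at which the next task may start
  return function schedule(task) {
    const now = Date.now();
    const startAt = Math.max(now, nextSlot);
    nextSlot = startAt + minIntervalMs;
    return new Promise((resolve, reject) => {
      setTimeout(() => {
        // Run the task and forward its result or error to the caller.
        Promise.resolve().then(task).then(resolve, reject);
      }, startAt - now);
    });
  };
}

// Usage (hypothetical `client`): space calls at least 3.6 s apart.
// const schedule = createThrottle(3600);
// const res = await schedule(() => client.fetch('/v1/some/path'));
```

This limits the start rate only; it does not cap how many slow requests may overlap.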

Cache responses where possible. Static or slowly-changing data (rate plans, SIM profiles, tag lists) can be cached locally for seconds or minutes, dramatically reducing your request volume.
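
A small time-to-live cache is often enough for this kind of data. The sketch below is a minimal in-memory version. TtlCache is an illustrative name, the loader is whatever function fetches the data in your integration, and the injectable clock exists only to make the class easy to test.

```javascript
// Minimal in-memory cache whose entries expire after `ttlMs` milliseconds.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, defaults to Date.now
    this.entries = new Map();
  }

  // Returns the cached value for `key`, or calls `loader()` and caches its result.
  async get(key, loader) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = await loader();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Usage (hypothetical `client` and path): cache a slowly-changing list for 60 s.
// const cache = new TtlCache(60_000);
// const plans = await cache.get('rate-plans', () =>
//   client.fetch('/v1/some/path').then((r) => r.json()));
```

Failed loads are not cached, so a loader that throws is retried on the next call.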

Use webhooks for state changes. Polling an endpoint every few seconds to detect a SIM status change wastes your rate limit budget. Where Monogoto offers webhook or event notifications, prefer those over polling.

Filter at the API level. Use query parameters to filter, sort, and paginate responses so you fetch only the data you need, rather than fetching everything and filtering client-side.


  • Error Reference — Full 429 error response format and general error handling patterns
  • Authentication — Auth endpoint rate limits and the refresh strategy to stay within them