Rate Limits

The Monogoto API enforces rate limits to ensure fair usage and platform stability. Understanding rate limits — and building integrations that respect them — is essential for running reliable production services at scale.

Overview

Two independent rate-limiting systems are in effect:

Scope	Applied to	Keyed by
API rate limit	All endpoints except `/v1/auth/*`	Authenticated User ID
Auth rate limit	`/v1/auth/token` and `/v1/auth/refresh`	Client IP address

This means authentication requests and general API requests count against separate budgets. Exhausting your API rate limit will not stop you from refreshing your token, and exhausting the auth rate limit will not block calls to other endpoints.

Rate Limit Headers

Every API response includes headers that tell you exactly where you stand in the current window:

Header	Type	Description
`X-RateLimit-Limit`	integer	Total requests allowed in the current window
`X-RateLimit-Remaining`	integer	Requests remaining before you hit the limit
`Retry-After`	integer	Seconds to wait before retrying — only present on `429` responses

Example Response Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 843
Content-Type: application/json

Check these headers proactively in your integration. If X-RateLimit-Remaining drops near zero, slow down before you receive a 429.
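A minimal Python sketch of such a proactive check follows. Only the two header names come from the table above; the helper names (`RateLimitStatus`, `read_rate_limit`) and the 10% threshold are illustrative assumptions, not part of the Monogoto API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]
    remaining: Optional[int]

    def should_slow_down(self, threshold: float = 0.1) -> bool:
        """True when the remaining share of the budget is at or below `threshold`."""
        # The 10% default is an arbitrary illustration; tune it for your workload.
        if self.limit is None or self.remaining is None or self.limit <= 0:
            return False  # headers missing or unusable: nothing to act on
        return self.remaining / self.limit <= threshold


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a header value, returning None if it is absent or not an integer."""
    try:
        return int(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None


def read_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    # Note: a plain dict lookup is case-sensitive; HTTP client libraries
    # usually expose a case-insensitive mapping for response headers.
    return RateLimitStatus(
        limit=_to_int(headers.get("X-RateLimit-Limit")),
        remaining=_to_int(headers.get("X-RateLimit-Remaining")),
    )
```

With the example headers above (limit 1000, remaining 843), `should_slow_down()` returns `False`; it becomes `True` once 100 or fewer requests remain.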

Rate Limit Tiers

Standard API Endpoints

Plan	Requests / minute	Requests / hour
Standard	60	1,000
Business	300	10,000
Enterprise	Custom	Custom

Contact support@monogoto.io to discuss volume requirements and enterprise limits.
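When sizing a steady workload, it can help to work out which of the two limits is the tighter one. The sketch below assumes the per-minute and per-hour limits are both enforced at the same time, which the table suggests but does not state outright; `min_sustained_interval` is a hypothetical helper name.

```python
def min_sustained_interval(per_minute: int, per_hour: int) -> float:
    """Smallest average gap, in seconds, between requests that stays
    within both the per-minute and the per-hour limit."""
    # Assumption: both limits apply simultaneously, so the stricter one wins.
    return max(60 / per_minute, 3600 / per_hour)


standard = min_sustained_interval(per_minute=60, per_hour=1_000)
business = min_sustained_interval(per_minute=300, per_hour=10_000)
print(f"Standard: one request every {standard:.2f}s")
print(f"Business: one request every {business:.2f}s")
```

Under that assumption, the hourly limit is the binding one for continuous traffic: a Standard plan can burst at 60 requests in a minute, but sustains only about one request every 3.6 seconds over a full hour.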

Authentication Endpoints

The following endpoints are rate-limited independently, keyed by the client IP address, regardless of account plan:

Endpoint	Limit
`POST /v1/auth/token`	5 requests / 60 seconds per IP
`POST /v1/auth/refresh`	5 requests / 60 seconds per IP

Note: All login and refresh calls from the same server share a single IP-level budget. If you run multiple worker processes, coordinate them so that only one performs token management at a time and the others reuse its token.
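One possible way to funnel token management through a single place is sketched below. `TokenManager` and the `fetch_token` callable (returning a token and its lifetime in seconds) are hypothetical names, not Monogoto APIs. The lock here only coordinates threads inside one process; separate worker processes would need a shared store or file lock instead.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class TokenManager:
    """Caches one access token and refreshes it shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],  # hypothetical: (token, ttl_seconds)
        refresh_margin: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._margin = refresh_margin
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Holding the lock ensures only one thread calls the auth endpoint;
        # the others wait and then reuse the freshly cached token.
        with self._lock:
            expiring = self._clock() >= self._expires_at - self._margin
            if self._token is None or expiring:
                token, ttl = self._fetch_token()
                self._token = token
                self._expires_at = self._clock() + ttl
            return self._token
```

Every caller asks `TokenManager.get()` for a token rather than calling the auth endpoints directly, so a burst of workers starting at once results in one auth request instead of many.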

Handling 429 Too Many Requests

When you exceed the rate limit, the API returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 47
Content-Type: application/json

{
  "status_code": 429,
  "message": "Rate limit exceeded. Retry after 47 seconds.",
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Recommended Retry Strategy

1. Read the Retry-After response header (value in seconds).
2. Wait at least that long before retrying.
3. Apply exponential backoff with jitter if the 429 persists across retries.

async function withRateLimitRetry(fn, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fn();

    if (res.status !== 429) return res;

    if (attempt === maxAttempts - 1) {
      throw new Error(`Rate limit exceeded after ${maxAttempts} attempts`);
    }

    const retryAfter = parseInt(res.headers.get('Retry-After') ?? '60', 10);
    // Exponential backoff: retryAfter * 2^attempt + random jitter (0–1s)
    const delay = retryAfter * 1000 * Math.pow(2, attempt) + Math.random() * 1000;
    console.warn(`Rate limited. Retrying in ${(delay / 1000).toFixed(1)}s (attempt ${attempt + 1}/${maxAttempts})`);
    await new Promise(r => setTimeout(r, delay));
  }
}

import random
import time


def with_rate_limit_retry(fn, max_attempts=4):
    """
    Calls fn() and retries on 429 using Retry-After + exponential backoff.
    fn must return a requests.Response object.
    """
    for attempt in range(max_attempts):
        resp = fn()
        if resp.status_code != 429:
            return resp

        if attempt == max_attempts - 1:
            raise RuntimeError(f"Rate limit exceeded after {max_attempts} attempts")

        # Fall back to 60 seconds if the Retry-After header is missing
        retry_after = int(resp.headers.get("Retry-After", 60))
        # Exponential backoff: retry_after * 2^attempt + random jitter (0-1s)
        delay = retry_after * (2 ** attempt) + random.uniform(0, 1)
        print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_attempts})")
        time.sleep(delay)

Proactive Rate Limit Tracking

Rather than waiting for a 429, track the X-RateLimit-Remaining header on every response and throttle your requests when the budget runs low:

class RateLimitAwareClient {
  constructor(baseUrl, accessToken) {
    this.baseUrl = baseUrl;
    this.accessToken = accessToken;
    this.remaining = Infinity;
  }

  async fetch(path, options = {}) {
    // If budget is critically low, pause before sending
    if (this.remaining < 5) {
      console.warn(`Rate limit budget low (${this.remaining} remaining). Pausing 2s.`);
      await new Promise(r => setTimeout(r, 2000));
    }

    const res = await fetch(`${this.baseUrl}${path}`, {
      ...options,
      headers: {
        Authorization: `Bearer ${this.accessToken}`,
        ...options.headers,
      },
    });

    // Update budget from response headers
    const remaining = res.headers.get('X-RateLimit-Remaining');
    if (remaining !== null) this.remaining = parseInt(remaining, 10);

    return res;
  }
}

Bulk Operations

If you need to operate on many SIM cards at once, prefer bulk endpoints over individual per-resource calls. A single bulk request counts as one request against your rate limit, regardless of how many resources it modifies.

Available bulk operations are listed under the Things tag in the API Reference. Look for endpoints that accept an array of ICCIDs in the request body.

Example: Instead of 500 individual calls to activate SIM cards, a single bulk activation request uses 1 rate limit unit and completes faster due to reduced round-trip overhead.
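If a bulk endpoint caps the number of ICCIDs per request, the list can be split into batches first. The sketch below shows only the batching step; the cap itself (`batch_size`) is an assumption to check against the API Reference, and the call that sends each batch is deliberately left out because the endpoint details are not given here.

```python
from typing import Iterator, List, Sequence


def chunked(iccids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `batch_size` ICCIDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(iccids), batch_size):
        yield list(iccids[start:start + batch_size])


# Placeholder ICCIDs for illustration only.
iccids = [f"8900000000000000{i:03d}" for i in range(500)]

# Hypothetical cap of 200 ICCIDs per bulk request.
batches = list(chunked(iccids, batch_size=200))
print(f"{len(iccids)} SIMs -> {len(batches)} bulk requests")
```

Each batch would then cost one rate limit unit, so even with a conservative cap the request count stays far below one call per SIM.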

Avoiding Rate Limit Issues

Spread requests over time. Instead of firing all requests at once, use a queue with a configurable throughput ceiling. Libraries like p-limit (Node.js) or asyncio.Semaphore (Python) cap the number of requests in flight; combine them with a minimum delay between requests to cap the rate as well.
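As a rough illustration in Python, the sketch below combines `asyncio.Semaphore` (a cap on concurrent requests) with a small pacing helper (a cap on how often requests start). `AsyncPacer` and `run_paced` are hypothetical names, and the interval is something you would derive from your plan's limits.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Sequence


class AsyncPacer:
    """Hands out start slots spaced at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self._interval = interval
        self._next_slot = 0.0

    async def wait(self) -> None:
        # No await occurs while reserving a slot, so tasks cannot interleave here.
        now = time.monotonic()
        delay = max(0.0, self._next_slot - now)
        self._next_slot = max(now, self._next_slot) + self._interval
        if delay > 0:
            await asyncio.sleep(delay)


async def run_paced(
    jobs: Sequence[Callable[[], Awaitable]],
    interval: float,
    max_concurrent: int,
) -> List:
    """Run jobs with spaced start times and a cap on jobs in flight."""
    pacer = AsyncPacer(interval)
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(job: Callable[[], Awaitable]):
        await pacer.wait()        # rate: wait for the next start slot
        async with semaphore:     # concurrency: limit jobs in flight
            return await job()

    # gather() returns results in the same order as `jobs`.
    return await asyncio.gather(*(run_one(job) for job in jobs))
```

Each job is a zero-argument callable returning an awaitable, for example a wrapper around one API call; `asyncio.run(run_paced(jobs, interval=1.0, max_concurrent=5))` would start at most one request per second.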

Cache responses where possible. Static or slowly-changing data (rate plans, SIM profiles, tag lists) can be cached locally for seconds or minutes, dramatically reducing your request volume.
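A small time-to-live cache is often enough for this. The sketch below is one possible shape; `TTLCache` and the `fetch` callable are illustrative names, and an appropriate TTL depends on how stale each kind of data is allowed to be.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keeps fetched values for `ttl_seconds` before fetching again."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no API request is made
        value = fetch()      # expired or missing: one API request
        self._entries[key] = (now + self._ttl, value)
        return value
```

Wrapping a lookup as `cache.get_or_fetch("rate_plans", load_rate_plans)`, where `load_rate_plans` is your own function that calls the API, means repeated lookups inside the TTL window cost no rate limit units.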

Use webhooks for state changes. Polling an endpoint every few seconds to detect a SIM status change wastes your rate limit budget. Where Monogoto offers webhook or event notifications, prefer those over polling.

Filter at the API level. Use query parameters to filter, sort, and paginate responses so you fetch only the data you need, rather than fetching everything and filtering client-side.

Error Reference — Full 429 error response format and general error handling patterns
Authentication — Auth endpoint rate limits and the refresh strategy to stay within them