Rate Limits
The Monogoto API enforces rate limits to ensure fair usage and platform stability. Understanding rate limits — and building integrations that respect them — is essential for running reliable production services at scale.
Overview
Two independent rate-limiting systems are in effect:
| Scope | Applied to | Keyed by |
|---|---|---|
| API rate limit | All endpoints except `/v1/auth/*` | Authenticated User ID |
| Auth rate limit | `/v1/auth/token` and `/v1/auth/refresh` | Client IP address |
This means authentication requests and general API requests count against separate budgets. Exhausting your API rate limit will not lock you out of obtaining or refreshing a token, and vice versa.
Rate Limit Headers
Every API response includes headers that tell you exactly where you stand in the current window:
| Header | Type | Description |
|---|---|---|
| `X-RateLimit-Limit` | integer | Total requests allowed in the current window |
| `X-RateLimit-Remaining` | integer | Requests remaining before you hit the limit |
| `Retry-After` | integer | Seconds to wait before retrying; only present on 429 responses |
Example Response Headers
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 843
Content-Type: application/json
```
Check these headers proactively in your integration. If X-RateLimit-Remaining drops near zero, slow down before you receive a 429.
Rate Limit Tiers
Standard API Endpoints
| Plan | Requests / minute | Requests / hour |
|---|---|---|
| Standard | 60 | 1,000 |
| Business | 300 | 10,000 |
| Enterprise | Custom | Custom |
Contact support@monogoto.io to discuss volume requirements and enterprise limits.
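The two windows interact. On the Standard plan, 60 requests/minute permits short bursts, but 1,000 requests/hour averages out to about 16.7 requests per minute (one request every 3.6 seconds). When traffic is spread evenly, the stricter of the two implied spacings is what you can actually sustain. The calculation below is a planning sketch that assumes evenly paced requests; this guide does not state whether the windows are fixed or sliding, so treat the results as estimates.

```python
# Plan limits from the table above; Enterprise limits are custom and omitted.
PLAN_LIMITS = {
    "Standard": {"per_minute": 60, "per_hour": 1_000},
    "Business": {"per_minute": 300, "per_hour": 10_000},
}


def min_interval_seconds(per_minute: int, per_hour: int) -> float:
    """Smallest even spacing between requests that stays under both limits."""
    # Each window implies its own minimum spacing; the stricter (larger) one wins.
    return max(60 / per_minute, 3600 / per_hour)


for plan, limits in PLAN_LIMITS.items():
    interval = min_interval_seconds(limits["per_minute"], limits["per_hour"])
    print(f"{plan}: one request every {interval:.2f}s "
          f"(~{60 / interval:.1f} requests/minute sustained)")
```

This prints a spacing of 3.60 s for Standard and 0.36 s for Business: in both cases the hourly limit, not the per-minute one, governs long-running jobs.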
Authentication Endpoints
The following endpoints are rate-limited independently, keyed by the client IP address, regardless of account plan:
| Endpoint | Limit |
|---|---|
| `POST /v1/auth/token` | 5 requests / 60 seconds per IP |
| `POST /v1/auth/refresh` | 5 requests / 60 seconds per IP |
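Note: Because these limits are keyed by IP address, all token and refresh calls from the same server (or from several servers that share one outbound IP, such as behind a NAT gateway) draw on a single budget. If you run multiple worker processes, coordinate them so that only one performs token management at a time.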
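Within a single process, one way to ensure that only one worker talks to the auth endpoints at a time is to put a lock around token refresh (a "single-flight" refresh). This is a minimal sketch assuming an asyncio application. `TokenManager` and the `fetch_token` callback are hypothetical names: you would implement `fetch_token` yourself against `POST /v1/auth/token` or `POST /v1/auth/refresh`, and its `(token, expires_in)` return shape is an assumption about what your auth call hands back. Coordinating separate OS processes or hosts needs a shared mechanism instead, such as a file lock or a dedicated token service.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical callback: calls the auth endpoint and returns
# (access_token, expires_in_seconds).
FetchToken = Callable[[], Awaitable[Tuple[str, float]]]


class TokenManager:
    """Shares one access token across all tasks in a process.

    An asyncio.Lock ensures that at most one task calls the auth endpoints at
    a time; concurrent callers wait, then reuse the freshly fetched token.
    """

    def __init__(self, fetch_token: FetchToken, refresh_margin: float = 60.0):
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin  # refresh this many seconds early
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self._lock = asyncio.Lock()

    def _is_fresh(self) -> bool:
        return (self._token is not None
                and time.monotonic() < self._expires_at - self._refresh_margin)

    async def get_token(self) -> str:
        if self._is_fresh():
            return self._token
        async with self._lock:
            # Re-check: another task may have refreshed while we waited.
            if not self._is_fresh():
                token, expires_in = await self._fetch_token()
                self._token = token
                self._expires_at = time.monotonic() + expires_in
            return self._token
```

Create the manager inside your running event loop and have every task call `await manager.get_token()` before a request. Ten concurrent callers holding an expired token then produce a single auth request rather than ten, which keeps you well inside the 5-per-60-seconds limit.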
Handling 429 Too Many Requests
When you exceed the rate limit, the API returns:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 47
Content-Type: application/json

{
  "status_code": 429,
  "message": "Rate limit exceeded. Retry after 47 seconds.",
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
```
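The API documents `Retry-After` as a number of seconds. HTTP itself (RFC 9110) also allows the header to carry an HTTP-date, so a defensive client can accept both forms. The hypothetical helper below, `parse_retry_after`, is not part of any Monogoto SDK; it falls back to a 60-second default (the same default the retry examples in this guide use) when the header is missing or cannot be parsed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], default: float = 60.0,
                      now: Optional[datetime] = None) -> float:
    """Return how many seconds a Retry-After header value asks us to wait.

    Accepts the delay-seconds form the API documents ("47") and, defensively,
    the HTTP-date form HTTP also permits. Returns `default` when the header is
    missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError here
        return default
    if retry_at.tzinfo is None:  # HTTP-dates are always expressed in GMT
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative delay.
    return max(0.0, (retry_at - now).total_seconds())
```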
Recommended Retry Strategy
- Read the `Retry-After` response header (value in seconds)
- Wait at least that long before retrying
- Apply exponential backoff with jitter if the `429` persists across retries
```javascript
// Retries fn() on HTTP 429, honoring Retry-After plus exponential backoff with jitter.
// fn must return a Fetch API Response.
async function withRateLimitRetry(fn, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fn();
    if (res.status !== 429) return res;
    if (attempt === maxAttempts - 1) {
      throw new Error(`Rate limit exceeded after ${maxAttempts} attempts`);
    }
    // Fall back to 60s if the header is missing or not an integer
    const parsed = parseInt(res.headers.get('Retry-After') ?? '', 10);
    const retryAfter = Number.isNaN(parsed) ? 60 : parsed;
    // Exponential backoff: retryAfter * 2^attempt + random jitter (0–1s)
    const delay = retryAfter * 1000 * 2 ** attempt + Math.random() * 1000;
    console.warn(`Rate limited. Retrying in ${(delay / 1000).toFixed(1)}s (attempt ${attempt + 1}/${maxAttempts})`);
    await new Promise((r) => setTimeout(r, delay));
  }
}
```
```python
import random
import time


def with_rate_limit_retry(fn, max_attempts=4):
    """
    Calls fn() and retries on 429 using Retry-After + exponential backoff.
    fn must return a requests.Response object.
    """
    for attempt in range(max_attempts):
        resp = fn()
        if resp.status_code != 429:
            return resp
        if attempt == max_attempts - 1:
            raise RuntimeError(f"Rate limit exceeded after {max_attempts} attempts")
        # Fall back to 60s if the header is missing or not an integer
        try:
            retry_after = int(resp.headers.get("Retry-After", 60))
        except (TypeError, ValueError):
            retry_after = 60
        # Exponential backoff: retry_after * 2^attempt + random jitter (0-1s)
        delay = retry_after * (2 ** attempt) + random.uniform(0, 1)
        print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_attempts})")
        time.sleep(delay)
```

Proactive Rate Limit Tracking
Rather than waiting for a 429, track the X-RateLimit-Remaining header on every response and throttle your requests when the budget runs low:
```javascript
class RateLimitAwareClient {
  constructor(baseUrl, accessToken) {
    this.baseUrl = baseUrl;
    this.accessToken = accessToken;
    this.remaining = Infinity;
  }

  async fetch(path, options = {}) {
    // If budget is critically low, pause before sending
    if (this.remaining < 5) {
      console.warn(`Rate limit budget low (${this.remaining} remaining). Pausing 2s.`);
      await new Promise((r) => setTimeout(r, 2000));
    }

    const res = await fetch(`${this.baseUrl}${path}`, {
      ...options,
      headers: {
        Authorization: `Bearer ${this.accessToken}`,
        ...options.headers,
      },
    });

    // Update budget from response headers
    const remaining = res.headers.get('X-RateLimit-Remaining');
    if (remaining !== null) this.remaining = parseInt(remaining, 10);
    return res;
  }
}
```
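A Python counterpart of the same idea is sketched below. `RateLimitBudget` is a hypothetical helper; its `low_water=5` and `pause_seconds=2.0` defaults simply mirror the thresholds in the JavaScript client and are not values recommended by the API.

```python
import time
from typing import Mapping, Optional


class RateLimitBudget:
    """Tracks the rate limit headers across responses and decides when to pause."""

    def __init__(self, low_water: int = 5, pause_seconds: float = 2.0):
        self.low_water = low_water          # pause when fewer requests remain
        self.pause_seconds = pause_seconds
        self.limit: Optional[int] = None
        self.remaining: Optional[int] = None  # unknown until the first response

    def update(self, headers: Mapping[str, str]) -> None:
        """Record the budget reported by one response; ignore malformed values."""
        for name, attr in (("X-RateLimit-Limit", "limit"),
                           ("X-RateLimit-Remaining", "remaining")):
            value = headers.get(name)
            if value is not None and value.strip().isdecimal():
                setattr(self, attr, int(value))

    def pause_needed(self) -> float:
        """Seconds to wait before the next request (0.0 if the budget is healthy)."""
        if self.remaining is not None and self.remaining < self.low_water:
            return self.pause_seconds
        return 0.0

    def wait_if_needed(self) -> None:
        delay = self.pause_needed()
        if delay:
            time.sleep(delay)
```

Call `wait_if_needed()` before each request and `update(resp.headers)` after it. A `requests.Response.headers` object can be passed directly and matches header names case-insensitively; a plain `dict` needs the exact casing shown above.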
Bulk Operations
If you need to operate on many SIM cards at once, prefer bulk endpoints over individual per-resource calls. A single bulk request counts as one request against your rate limit, regardless of how many resources it modifies.
Available bulk operations are listed under the Things tag in the API Reference. Look for endpoints that accept an array of ICCIDs in the request body.
Example: Instead of 500 individual calls to activate SIM cards, a single bulk activation request uses 1 rate limit unit and completes faster due to reduced round-trip overhead.
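To put the example in numbers: on the Standard plan, 500 single-SIM calls would take at least 500 / 60 ≈ 8.3 minutes at the per-minute limit and consume half of the 1,000-request hourly budget. If a bulk endpoint caps how many ICCIDs one request may carry, you can still group your list into batches. The helper below is a minimal sketch; `MAX_BATCH_SIZE = 100` is a placeholder, not a documented Monogoto limit, so check the endpoint's schema in the API Reference.

```python
from typing import Iterator, List, Sequence

# Hypothetical cap: replace with the bulk endpoint's documented maximum, if any.
MAX_BATCH_SIZE = 100


def batched(iccids: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ICCIDs, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(iccids), size):
        yield list(iccids[start:start + size])
```

With 500 ICCIDs and a batch size of 100, this yields five batches, i.e. five requests against your rate limit instead of 500.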
Avoiding Rate Limit Issues
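Spread requests over time. Instead of firing all requests at once, send them through a queue with a configurable throughput ceiling. Libraries like p-limit (Node.js) or asyncio.Semaphore (Python) cap how many requests are in flight at once; to also cap requests per minute, add a minimum delay between request starts.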
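As a sketch of pacing plus a concurrency cap in Python, the hypothetical `Throttle` class below pairs an `asyncio.Semaphore` (the in-flight limit) with a lock that spaces request starts at least `1 / rate_per_second` seconds apart. It is not a library API. To stay under the Standard plan's hourly limit, for example, you might pass `rate_per_second=1000 / 3600` (about 0.28 requests per second).

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class Throttle:
    """Caps both in-flight requests and the rate at which requests start."""

    def __init__(self, rate_per_second: float, max_concurrency: int = 5):
        self._interval = 1.0 / rate_per_second
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self._lock = asyncio.Lock()
        self._next_start = 0.0  # monotonic time of the next free start slot

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        async with self._semaphore:
            # The lock hands out start slots one at a time, `interval` apart.
            async with self._lock:
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = max(now, self._next_start) + self._interval
            return await job()
```

Create the `Throttle` inside your running event loop (for example at the top of `async def main()`) and wrap each call as `await throttle.run(lambda: call_api(...))`, where `call_api` stands in for your own request function.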
Cache responses where possible. Static or slowly-changing data (rate plans, SIM profiles, tag lists) can be cached locally for seconds or minutes, dramatically reducing your request volume.
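A small time-to-live cache is often enough for this. The sketch below is a hypothetical, single-threaded helper: the key names (for example `"rate_plans"`) and the `fetch` callable that performs the real API request are placeholders for your own code, and the right TTL depends on how stale your application can tolerate the data being.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Memoizes fetch results for `ttl_seconds` to avoid repeat API calls."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no API request made
        value = fetch()      # expired or missing: one API request
        self._entries[key] = (now + self.ttl_seconds, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a cached entry, e.g. after you modify the underlying resource."""
        self._entries.pop(key, None)
```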
Use webhooks for state changes. Polling an endpoint every few seconds to detect a SIM status change wastes your rate limit budget. Where Monogoto offers webhook or event notifications, prefer those over polling.
Filter at the API level. Use query parameters to filter, sort, and paginate responses so you fetch only the data you need, rather than fetching everything and filtering client-side.
Related Guides
- Error Reference: Full `429` error response format and general error handling patterns
- Authentication: Auth endpoint rate limits and the refresh strategy to stay within them