
Async generation

Models that advertise supports_async: true MAY return 202 Accepted to a POST /generate request instead of 200.

The 202 response

  • Sets Location: /generate/jobs/{id} pointing at the poll endpoint.
  • Sets Retry-After: <seconds> as a polling hint.
  • Has an empty body, or a small JSON object:
    { "job_id": "abc123", "submitted_at": "2025-04-25T11:32:18Z" }
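A client can collect everything a 202 carries in one step. This is a minimal sketch that assumes headers arrive as a plain dict (names compared case-insensitively) and the body as raw bytes. `AcceptedJob` and `parse_accepted` are hypothetical names, not part of the spec.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceptedJob:
    # Hypothetical container for the fields a 202 from POST /generate carries.
    location: str                 # poll URL, e.g. /generate/jobs/{id}
    retry_after: Optional[int]    # polling hint in seconds, if sent
    job_id: Optional[str] = None  # present only when the body is non-empty
    submitted_at: Optional[str] = None


def parse_accepted(headers: dict, body: bytes) -> AcceptedJob:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    retry = h.get("retry-after")
    job = AcceptedJob(
        location=h["location"],
        retry_after=int(retry) if retry is not None else None,
    )
    # The body is either empty or a small JSON object.
    if body.strip():
        payload = json.loads(body)
        job.job_id = payload.get("job_id")
        job.submitted_at = payload.get("submitted_at")
    return job
```

Only Location is treated as mandatory here; a missing Retry-After or an empty body simply leaves the corresponding fields as None.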

Polling

GET /generate/jobs/{id}
Status        Meaning
200 OK        Generation complete; body is the glTF document
202 Accepted  Still running; honor Retry-After
4xx / 5xx     Failed; body is the standard error envelope
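The status table maps directly onto a small classifier. This sketch uses hypothetical names (`PollOutcome`, `classify_poll`); raising on codes the table does not list is an assumption, not something the spec prescribes.

```python
from enum import Enum


class PollOutcome(Enum):
    DONE = "done"        # 200 OK: body is the glTF document
    PENDING = "pending"  # 202 Accepted: wait Retry-After, then poll again
    FAILED = "failed"    # 4xx / 5xx: body is the standard error envelope


def classify_poll(status: int) -> PollOutcome:
    # Map the status of GET /generate/jobs/{id} onto the status table.
    if status == 200:
        return PollOutcome.DONE
    if status == 202:
        return PollOutcome.PENDING
    if 400 <= status <= 599:
        return PollOutcome.FAILED
    # Assumption: anything outside the table is a protocol violation.
    raise ValueError(f"unexpected poll status {status}")
```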

Servers SHOULD retain completed job results for at least 5 minutes. After expiry, polling returns 404 Not Found.
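On the server side, retention can be modelled as a timestamp recorded at completion and checked on every poll. This is a hypothetical in-memory sketch (`Job`, `JobStore`, and the injectable clock are illustrative, and a real server would likely persist jobs). It keeps results for exactly the 5-minute minimum; retaining them longer is equally valid.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

RETENTION_SECONDS = 5 * 60  # spec minimum: keep completed results >= 5 minutes


@dataclass
class Job:
    result: Optional[bytes] = None        # glTF document once complete
    completed_at: Optional[float] = None  # clock reading when result was stored


class JobStore:
    """Hypothetical in-memory job store illustrating retention and expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._jobs: Dict[str, Job] = {}
        self._clock = clock  # injectable so expiry can be tested

    def submit(self, job_id: str) -> None:
        self._jobs[job_id] = Job()

    def complete(self, job_id: str, result: bytes) -> None:
        job = self._jobs[job_id]
        job.result, job.completed_at = result, self._clock()

    def poll(self, job_id: str) -> int:
        """Return the status GET /generate/jobs/{id} would answer with."""
        job = self._jobs.get(job_id)
        if job is None:
            return 404  # unknown, or already expired and dropped
        if job.completed_at is None:
            return 202  # still running
        if self._clock() - job.completed_at > RETENTION_SECONDS:
            del self._jobs[job_id]  # past retention: drop the result
            return 404
        return 200
```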

Servers MAY support cancelling a pending job: the client sends DELETE /generate/jobs/{id}, and the server responds 204 No Content.
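A client-side cancellation call might look like the stdlib-only sketch below. `cancel_request` and `cancel_job` are hypothetical helpers, and `base_url` is assumed to be the server root. Treating any HTTP error (for example 404 for an unknown or expired job, or 405 from a server without cancellation support) as "not cancelled" is an assumption the spec does not state.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def cancel_request(base_url: str, job_id: str) -> urllib.request.Request:
    # Build DELETE /generate/jobs/{id} against the assumed server root.
    url = urljoin(base_url, f"/generate/jobs/{job_id}")
    return urllib.request.Request(url, method="DELETE")


def cancel_job(base_url: str, job_id: str) -> bool:
    """Return True only if the server acknowledged with 204 No Content."""
    req = cancel_request(base_url, job_id)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 204
    except urllib.error.HTTPError:
        # Assumed: 404 (unknown/expired) or 405 (unsupported) means not cancelled.
        return False
```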

Client requirements

Async is OPTIONAL in v1.0, but clients MUST be prepared for POST /generate to return either a synchronous 200 or an asynchronous 202. A 202 can only occur when the model advertises supports_async: true.

A reasonable client loop:

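import time

import httpx


def generate_error(resp: httpx.Response) -> Exception:
    # Placeholder: turn the standard error envelope into an exception.
    return RuntimeError(f"generation failed: {resp.status_code} {resp.text}")


def generate(client: httpx.Client, req: dict) -> httpx.Response:
    # client must have base_url set so "/generate" and a relative Location resolve.
    resp = client.post("/generate", json=req)
    if resp.status_code == 200:
        return resp  # synchronous: body is the glTF document
    if resp.status_code == 202:
        job_url = resp.headers["Location"]  # /generate/jobs/{id}
        delay = int(resp.headers.get("Retry-After", 2))
        while True:
            time.sleep(delay)
            poll = client.get(job_url)
            if poll.status_code == 200:
                return poll  # job complete
            if poll.status_code == 202:
                # Still running: always use the latest hint.
                delay = int(poll.headers.get("Retry-After", delay))
                continue
            raise generate_error(poll)  # 4xx/5xx, including 404 after expiry
    raise generate_error(resp)

Don't poll faster than Retry-After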

Servers SHOULD send a Retry-After header on every 202. The value may grow or shrink as the queue moves; clients SHOULD use the latest value rather than a fixed interval.
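Using "the latest value" boils down to recomputing the delay from each response and falling back to the previous one only when no usable hint arrives. This is a hypothetical helper (`next_delay`). This spec uses the seconds form; also accepting the HTTP-date form, which HTTP itself permits for Retry-After, is an added assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def next_delay(headers: Mapping[str, str], previous: float,
               now: Optional[datetime] = None) -> float:
    """Seconds to wait before the next poll, preferring the newest Retry-After."""
    value = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    if value is None:
        return previous  # no new hint: keep the last delay we were given
    value = value.strip()
    if value.isdigit():
        return float(value)  # the form this spec uses: delay in seconds
    # Assumption: also accept an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return previous  # unparseable hint: fall back to the last delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat zone-less dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Returning the previous delay on a missing or malformed header keeps the loop from suddenly polling faster than the server last asked.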