An Exhaustive Analysis of the OpenAI Codex "Selected Model is at Capacity" Error

Disclaimer: This report is for informational purposes only and does not constitute professional software engineering, billing, or financial advice.

Executive Summary

This summary covers the origins of the OpenAI Codex "Selected model is at capacity. Please try a different model" error, its impact on workflows, and the prevailing online community discourse. The interruption first appeared as a graphical glitch in mid-February 2026 before escalating to a hard operational block. The error is widely resented because the agent thread breaks mid-execution without gracefully pausing, permanently dropping the conversational context accumulated over hours of work. Consequently, online community discussions are dominated by financial frustration, as users report large quota drains without successful code generation, and by the active exchange of mitigation strategies.

The core elements of this phenomenon include:

Initial Emergence: The earliest reported instances of the "Selected model is at capacity" error emerged in mid-February 2026 as a graphical interface glitch, but escalated into hard operational blocks by late March 2026.
Peak Disruptions: Massive, documented service degradations occurred throughout April and culminated in a major, acknowledged outage spanning June 16 to June 18, 2026.
Core User Frustration: The error destroys active development workflows by abruptly dropping context threads, forcing users to manually restart complex, multi-hour coding sessions.
Financial Discrepancy: Developers utilizing premium tiers (such as the $200/month Pro plan) report that server-side capacity rejections still incorrectly consume their limited account quotas.
Community Workarounds: Users have mitigated these interruptions by manually downgrading reasoning levels (e.g., from xHigh to High), utilizing the simple "continue" text prompt, or purchasing priority queue access.

The sudden appearance of the "⚠ Selected model is at capacity. Please try a different model" message has become one of the most disruptive hurdles for developers relying on AI-assisted coding tools. When a complex, multi-layered workflow is abruptly terminated by a server-side capacity limitation, the resulting loss of context and productivity can be immensely frustrating. This report provides a comprehensive, academic synthesis of when this issue began, the underlying mechanics of the error, the widespread community discourse surrounding it, and the strategies developers are employing to bypass the disruption.

The Genesis and Timeline of the Capacity Error

Understanding the origin of this error requires tracking user telemetry, GitHub issue logs, and official service status reports. The emergence of the capacity error was not a single, catastrophic event, but rather a gradual degradation of service stability that correlated with the rollout of more advanced, compute-intensive models like GPT-5.4 and GPT-5.5.

Early Anomalies: February to March 2026

The specific phrasing, "Selected model is at capacity. Please try a different model", can be traced back to mid-February 2026. However, its initial manifestation was highly peculiar. On February 12, 2026, users reported an issue on the OpenAI Codex GitHub repository (Issue #11635) regarding the Codex Desktop Application [cite: 1]. Users utilizing the GPT-5.3-Codex model observed a persistent banner displaying the capacity warning. Strangely, this early iteration functioned as a "ghost" error; despite the warning banner instructing users to try a different model, the system continued to successfully process messages and generate code without interruption [cite: 1]. At this stage, it appeared to be a desynchronization between the graphical user interface and the backend server health checks.

The situation deteriorated rapidly by late March 2026. The "ghost" error transitioned into a hard operational block. On March 25, 2026, the community forums and platforms like Reddit saw a surge of developers reporting that their requests were being completely halted. Users utilizing the GPT-5.4 model, specifically on the higher reasoning effort settings (such as xHigh), reported receiving the error immediately upon their first prompt of the day, completely unrelated to their personal account usage limits [cite: 2].

The Escalation: April 2026

By April 2026, the frequency of the error reached a critical mass, severely impacting professional developers. On April 23 and 24, users on the OpenAI Developer Community forums reported that long-running workflows—sometimes executing for hours—were stopping suddenly with no prior warning [cite: 3]. The error message was often buried within large outputs or manifested as stream disconnections [cite: 3].

Shortly afterward, GitHub Issue #19583 was opened on April 25, 2026, detailing a session where a developer used GPT-5.5 xHigh to test an application interface and log errors. The agent successfully operated for over 3 hours before abruptly terminating with the capacity error, rendering the user unable to continue with the selected model [cite: 4, 5]. At this point, the community consensus shifted: this was no longer a localized glitch, but a systemic bottleneck in OpenAI's infrastructure failing to support the compute demands of sustained, autonomous agent workflows.

The June 2026 Outage Crisis

The instability peaked in the middle of June 2026. Between June 16 and June 18, OpenAI's official status API formally acknowledged the issue, noting "Partial System Degradation" and explicitly listing the "Codex 'Selected Model is at Capacity' Error" as a recognized incident [cite: 6, 7, 8]. Both OpenAI's own public status API and third-party uptime monitoring services such as Pagerly logged the outage duration at over three hours on June 16 [cite: 9, 10].

Despite official reports indicating that mitigations were applied and the impacted services had fully recovered by mid-day on June 18 [cite: 8, 9], community feedback contradicted this. On June 24 and 25, users continued to flood GitHub (e.g., Issue #30008) reporting that both the Codex Desktop App and the Command Line Interface (CLI—a text-based way to interact with the software) were still repeatedly throwing the capacity error [cite: 11].

Architectural Flaws and Workflow Disruption

To understand why this specific error is so infuriating for developers, one must examine the architecture of how AI coding agents operate and where the failure occurs within the pipeline.

The Catastrophic Loss of Context

AI coding agents like Codex are not simple question-and-answer bots; they build "state" or "context" over time. When a developer asks Codex to analyze a large repository, propose architectural changes, and execute them, the software bundles entire worktrees, file systems, and previous conversational history into a payload (the packaged data transmitted over the network to the server). That payload can be very large: Codex offers a context window of 400,000 tokens, while the standard API allows up to 1 million tokens [cite: 12, 13].

When the server rejects this payload due to capacity limits, the Codex application handles the rejection poorly. According to detailed bug reports (such as Issue #22277), the agent thread breaks mid-execution [cite: 14]. Because the software does not gracefully pause, the context accumulated by the agent is permanently dropped and the workflow execution state becomes corrupted [cite: 14]. For a developer who has spent hours guiding the AI to isolate a bug or map a codebase, this means their work is entirely erased. They must begin the process again from scratch, manually re-uploading files and re-establishing the conversational context.

The Lack of Pre-Flight Checks

A major point of contention within the engineering community is the lack of "pre-flight checks" in the Codex application suite. In traditional cloud architecture, a client application will ping a server via an API (Application Programming Interface, the software bridge allowing the client and server to communicate) to verify that it has the available bandwidth or compute power to process a massive request before actually sending the heavy payload.

Codex does not appear to utilize this standard practice. Users have noted that the application fails to check server health prior to initiating heavy tasks [cite: 14]. Consequently, the agent runs blindly forward until it collides with the server's load balancer (a system that distributes incoming network traffic across multiple servers), resulting in mid-pipeline crashes [cite: 14]. Developers have heavily lobbied OpenAI to implement a backoff algorithm (an error-handling strategy that exponentially delays retries, much like calling a busy restaurant and waiting longer between each subsequent redial to avoid overwhelming the line). This algorithm is directly relevant to stabilizing the Codex application because it would prevent the client from repeatedly hammering an overloaded server, allowing the backend time to recover while gracefully holding the user's context in a paused state until bandwidth becomes available [cite: 14, 15].
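The two ideas above, a pre-flight check and an exponential backoff, can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not Codex's actual client logic: `CapacityError`, `send`, and `is_healthy` are hypothetical names, and the "full jitter" randomization is a common variant of backoff rather than anything the source describes.

```python
import random
import time
from typing import Callable, Optional


class CapacityError(Exception):
    """Hypothetical stand-in for a 'model is at capacity' rejection."""


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 5) -> list:
    """Upper bound for each wait: base * 2**n, clipped at cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def send_with_backoff(
    send: Callable[[dict], str],
    payload: dict,
    is_healthy: Optional[Callable[[], bool]] = None,
    attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try to send payload up to `attempts` times without discarding it."""
    delays = backoff_delays(base, cap, attempts)
    for attempt, upper in enumerate(delays):
        # Pre-flight check: a cheap probe decides whether the heavy payload
        # is sent at all on this attempt.
        if is_healthy is None or is_healthy():
            try:
                return send(payload)
            except CapacityError:
                pass  # the payload object is untouched, so the context survives
        if attempt < attempts - 1:
            # Full jitter: wait a random time up to the exponential bound so
            # that many clients do not retry in lockstep.
            sleep(random.uniform(0, upper))
    raise CapacityError(f"still at capacity after {attempts} attempts")
```

The key design point is that the payload stays in client memory between attempts, so a rejected request costs a delay rather than the accumulated context. The `sleep` parameter is injectable only so the function can be exercised without real waiting.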

The Financial Impact and Quota Controversies

Perhaps the most universally criticized aspect of the "Selected model is at capacity" error is how it interacts with user billing and token quotas. OpenAI utilizes token-based pricing and strict usage limits to govern how much compute power a single user can consume.

Paying for Server Failures

For enterprise and power users, access to advanced models requires substantial financial investment, such as the $200 per month ChatGPT Pro subscription [cite: 14, 16]. These premium tiers come with strict usage limits. For instance, the Plus tier at $20/month permits an estimated 15-80 Codex messages per 5-hour window on GPT-5.5. The Pro 5x tier at $100/month permits 75-400 messages, and the Pro 20x tier at $200/month permits 300-1,600 messages per 5-hour window [cite: 16, 17, 18]. API users are additionally charged $5 per million input tokens and $30 per million output tokens on standard endpoints, scaling up to $30 and $180 respectively for Pro processing [cite: 12, 19].
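To make the per-token rates concrete, the following sketch computes what a single request would cost at the prices quoted above. The rates are copied from the text; the function name, the tier labels, and the example token counts are illustrative assumptions.

```python
from decimal import Decimal

# USD per million tokens, as quoted in the paragraph above.
PRICES = {
    "standard": {"input": Decimal("5"), "output": Decimal("30")},
    "pro": {"input": Decimal("30"), "output": Decimal("180")},
}
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> Decimal:
    """Cost in USD of one request at the quoted per-million-token rates."""
    price = PRICES[tier]
    total = Decimal(input_tokens) * price["input"] + Decimal(output_tokens) * price["output"]
    return total / MILLION


# A full 400,000-token context with no output: 400000 * 5 / 1e6 = 2 USD.
full_context_standard = request_cost(400_000, 0)
# The same context plus 10,000 output tokens at the Pro processing rates.
full_context_pro = request_cost(400_000, 10_000, tier="pro")
```

At these rates the input side dominates for large-context agent requests, which is why repeated resubmission of a full context is expensive wherever it is metered.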

Logically, if a server is too busy to process a request and returns an error (akin to a 503 Service Unavailable HTTP status), the user should not be charged for that transaction. However, telemetry from the Codex community indicates a severe bug in the billing logic. When the agents send massive context payloads and the server rejects the completion due to capacity, the usage limits are still deducted from the user's Pro account [cite: 11, 14].

This has resulted in devastating quota drain. Users report losing their expensive limits for zero actual output [cite: 14]. In some instances, developers noted that a single, looping capacity error burned through 30% of their 5-hour token quota before they could manually terminate the process [cite: 20]. In severe cases, users reported that the bug consumed 25% of their entire weekly quota in just one day [cite: 11]. From the user's perspective, this creates an infuriating paradox: they are financially penalized for OpenAI's infrastructural shortcomings.

The Diagnostic Ambiguity

Adding to the frustration is the extreme vagueness of the error message itself. When a developer receives the alert, it is impossible to determine the true cause of the blockage. The community has highlighted the need for a clear distinction between account exhaustion and server saturation [cite: 15].

When the error occurs, the user-facing interface often still displays that the developer has plenty of remaining usage in their 5-hour and weekly allowances [cite: 15]. For developers using the Codex CLI for governed, review-heavy engineering work, the generic message creates operational ambiguity [cite: 21, 22]. Users are left guessing whether they need to switch to an entirely different model family, lower their requested reasoning effort, check their personal billing limits, or simply wait for a global incident mitigation [cite: 21, 22].
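The distinction the community is asking for can be expressed as a small decision procedure. The sketch below is hypothetical: the `Snapshot` fields are assumed inputs (the remaining allowances shown in the interface and whether the status page lists an incident), not fields of any real Codex API.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    ACCOUNT_EXHAUSTED = "account quota exhausted"
    GLOBAL_INCIDENT = "provider-wide incident"
    MODEL_SATURATED = "selected model or reasoning tier saturated"


@dataclass(frozen=True)
class Snapshot:
    five_hour_remaining: float  # fraction of the 5-hour allowance left, 0.0 to 1.0
    weekly_remaining: float     # fraction of the weekly allowance left, 0.0 to 1.0
    incident_reported: bool     # whether the status page lists an active incident


ADVICE = {
    Cause.ACCOUNT_EXHAUSTED: "wait for the allowance to reset or check billing limits",
    Cause.GLOBAL_INCIDENT: "wait for the incident to be mitigated",
    Cause.MODEL_SATURATED: "lower the reasoning effort or switch model",
}


def diagnose(snapshot: Snapshot) -> Cause:
    """Pick the most likely cause from what the user can observe."""
    if min(snapshot.five_hour_remaining, snapshot.weekly_remaining) <= 0.0:
        return Cause.ACCOUNT_EXHAUSTED
    if snapshot.incident_reported:
        return Cause.GLOBAL_INCIDENT
    # Quota remains and no incident is listed, so the block is most likely
    # specific to the selected model or reasoning tier.
    return Cause.MODEL_SATURATED
```

An error message that carried this classification directly would remove the guesswork described above.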

Refunds and Token Reimbursements

The most immediate, logical follow-up question for developers experiencing this massive quota drain is: How do I get my money or tokens back? Unfortunately, the answer has incited further community outrage. OpenAI does not currently offer an automated token-refund mechanism for transactions that end in a capacity rejection [cite: 23, 24].

Users must manually contact billing support to reclaim erroneously consumed quotas, but developers report these requests are frequently denied or left unaddressed for days. Online forums show developers explicitly demanding token refunds for "aborted runs" or "unaccepted code" due to the capacity drops [cite: 24, 25]. Because OpenAI does not actively reimburse these lost tokens, a widespread community consensus has emerged advising developers to "vote with their wallets" by either demanding a full subscription refund or migrating to competitors like Anthropic's Claude, where users report greater server stability and transparent context management [cite: 25].

Community Suspicions and the "Acceleration Mode" Debate

When a proprietary, closed-source system begins failing systematically, the user base often fills the communication vacuum with speculation. The sudden, aggressive throttling of models like GPT-5.4 and GPT-5.5 has led to extensive debate regarding OpenAI's resource allocation strategies.

Shadow Downgrades and Inference Costs

Some developers suspect that the capacity errors are a mechanism to force users onto less capable, cheaper models. The error text explicitly commands the user to "try a different model." Users have noted that when they switch from GPT-5.4 xHigh to a lower reasoning tier, the blockage magically disappears [cite: 2, 5].

Within engineering forums, there is speculation that during periods of high computational demand, OpenAI dynamically reroutes requests to lower-precision serving pools (such as transitioning from high-precision FP16 calculations to more efficient but less accurate INT8 or INT4 quantization) or alternative kernel paths (the low-level execution routes within the GPU computing hardware) to reduce inference costs [cite: 20]. When the high-precision infrastructure (required for xHigh reasoning) becomes bottlenecked, the system simply ejects the user rather than queuing them, essentially strong-arming developers into accepting lower-quality code generation if they wish to continue working immediately. Furthermore, the retirement of highly reliable older models (like GPT-5.3-Codex) from standard ChatGPT accounts has left users feeling trapped, as the suggested replacement models feel heavier, consume more tokens, and are more prone to capacity outages [cite: 26].
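For readers unfamiliar with the precision trade-off behind this speculation, the sketch below shows in general terms how rounding error grows as bit width shrinks. It illustrates simple symmetric quantization on made-up sample values and says nothing about how OpenAI actually serves its models.

```python
def quantize(values, bits):
    """Symmetric quantization to signed integers with the given bit width."""
    qmax = 2 ** (bits - 1) - 1            # 127 for INT8, 7 for INT4
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0  # avoid dividing by zero for all-zero input
    return [round(v / scale) for v in values], scale


def max_error(values, bits):
    """Largest absolute difference after a quantize/dequantize round trip."""
    quantized, scale = quantize(values, bits)
    return max(abs(v - q * scale) for v, q in zip(values, quantized))


# Made-up sample weights, used only to compare the two bit widths.
weights = [0.013, -0.42, 0.37, 0.9, -1.0, 0.255]
int8_error = max_error(weights, 8)
int4_error = max_error(weights, 4)
```

With only 15 representable levels, INT4 loses small values such as 0.013 entirely, whereas INT8 keeps every value within half a quantization step of roughly 0.004.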

The "Fast Pass" Controversy

The most cynical, yet widely discussed, element of the capacity crisis revolves around OpenAI's monetization strategies. Some users have discovered that the capacity error can be bypassed entirely by utilizing specific paid features.

A prominent community discussion highlighted that when a user is locked out of GPT-5.4 xHigh due to capacity constraints, they can bypass the waitlist by enabling "acceleration mode" [cite: 27]. This system effectively allows users to pay an additional premium for priority access, leapfrogging the standard request queue [cite: 27]. Developers have likened this to purchasing a "fast pass" at an amusement park—a brutally simple economic filter that ensures those willing to pay extra avoid the capacity outages entirely [cite: 27]. This has generated significant ill will, as users already paying $200 a month feel they are being artificially throttled to extract further microtransactions.

Workarounds, Mitigation Strategies, and Alternatives

In the absence of a permanent structural fix from OpenAI, the developer community has crowdsourced various strategies to maintain their productivity and prevent total workflow collapse.

Tactics for Bypassing the Capacity Error

The community has developed several functional, though imperfect, methods to navigate around the capacity blockages. The following step-by-step guides detail the most common approaches:

1. Reasoning Effort Downgrades: The most common immediate fix is manually adjusting the cognitive load requested from the model. If a developer hits a wall using GPT-5.5 xHigh (the extra-high reasoning setting), they can frequently resume work by dropping the setting to High or Medium [cite: 2, 21]. While this solves the connectivity issue, it compromises the quality of the output, often resulting in code that is more generic or less performant [cite: 28].

Step 1: Locate the model selector and reasoning effort dropdown in the Codex UI toolbar.
Step 2: Click the current active setting (e.g., xHigh).
Step 3: Select a lower tier, such as High or Medium.
Step 4: Re-submit the prompt to continue the workflow under the newly selected reasoning constraints.

2. The "Continue" Prompt: Curiously, the capacity block is sometimes temporary or localized to the specific automated agent request. Several users have reported that when the Codex application throws the capacity error mid-thread, manually typing the simple text prompt "continue" into the chat interface can force the system to bypass the block and resume generating code in the exact same thread [cite: 29]. The mechanism is not documented. One explanation offered in the discussion is that typing "continue" triggers a localized cache retrieval or routes the request to a different, less saturated API endpoint, bypassing the stalled primary server node [cite: 29].

Step 1: Click directly into the chat interface input box at the bottom of the interrupted thread where the error occurred.
Step 2: Manually type the exact phrase "continue".
Step 3: Press Enter or submit the prompt to bypass the blockage.

3. CLI Fallback Mechanics: For power users operating in the Command Line Interface, developers have requested and attempted to implement automated fallback scripts. Rather than failing the session outright, these scripts are designed to catch the capacity error and automatically retry the prompt with a lower reasoning tier (e.g., "xhigh unavailable — retry with high") [cite: 21, 22].
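The fallback behavior described in the third tactic can be sketched as a small wrapper. This is an illustration only: `run` is a hypothetical callable standing in for whatever invokes the model, `CapacityError` is an assumed exception type, and no real Codex CLI flag or API is implied.

```python
class CapacityError(Exception):
    """Hypothetical stand-in for a 'model is at capacity' rejection."""


# Ordered from most to least demanding; the tier names follow the text above.
DEFAULT_TIERS = ("xhigh", "high", "medium")


def run_with_fallback(run, prompt, tiers=DEFAULT_TIERS, log=print):
    """Try each reasoning tier in order and return (tier_used, result)."""
    for index, tier in enumerate(tiers):
        try:
            return tier, run(prompt, tier)
        except CapacityError:
            if index + 1 < len(tiers):
                log(f"{tier} unavailable; retrying with {tiers[index + 1]}")
    raise CapacityError(f"all tiers at capacity: {', '.join(tiers)}")
```

Returning the tier that was actually used matters for review-heavy work, since the caller can record that a result came from a lower reasoning setting than the one requested.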

Migrating to Alternative Ecosystems

Because Codex serves as a central pillar in many modern software development workflows, its unreliability has prompted developers to explore competitor platforms and alternative client architectures [cite: 6].

To avoid the specific ecosystem bottlenecks of OpenAI's first-party applications, many developers have shifted to third-party Integrated Development Environments (IDEs) and IDE extensions [cite: 6]. For example, editors like Cursor allow users to plug into multiple different foundational models. If OpenAI's API is at capacity, users can switch the backend to Anthropic's Claude (such as the Opus 4.5 model, which developers praise for complex execution and layout design) or Google's Gemini 3 Pro, all without leaving their text editor or losing their local file context [cite: 30, 31].

Open-source alternatives like OpenCode also offer similar multi-provider flexibility, allowing users to route their requests to whichever cloud provider is currently stable or even fall back to locally hosted AI models running on the user's own hardware, completely bypassing cloud capacity limits [cite: 6, 32]. By integrating with local hosting software like Ollama or Unsloth, developers report running local models such as the Qwen 3.5 27b Q3_XXS quantization, Qwen 3.5 35b, or OpenCode Zen's Big Pickle model on their own machines, which keeps code off third-party servers and removes the dependence on cloud capacity [cite: 33, 34, 35].
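The multi-provider pattern described in this section comes down to keeping the conversation history on the client and trying backends in order. The sketch below is a minimal illustration: `FailoverSession`, `ProviderUnavailable`, and the backend callables are hypothetical and do not correspond to the actual interfaces of Cursor, OpenCode, or Ollama.

```python
from typing import Callable, List, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error for any backend that cannot serve the request."""


# A backend takes the full message history and returns a reply.
Backend = Callable[[List[str]], str]


class FailoverSession:
    """Keeps history client-side so it survives a switch between providers."""

    def __init__(self, backends: List[Tuple[str, Backend]]):
        self.backends = backends        # tried in order, e.g. cloud first, local last
        self.history: List[str] = []

    def ask(self, prompt: str) -> Tuple[str, str]:
        """Return (backend_name, reply) from the first backend that answers."""
        messages = self.history + [prompt]
        for name, backend in self.backends:
            try:
                reply = backend(messages)
            except ProviderUnavailable:
                continue  # try the next backend with the same messages
            self.history = messages + [reply]
            return name, reply
        raise ProviderUnavailable("no configured backend accepted the request")
```

Because the history is only updated after a successful reply, a failed attempt leaves the session exactly as it was, which is the property the first-party client is reported to lack.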

Comparison of Core Ecosystems

The following table contextualizes how the primary competing tools stack up against one another in light of these capacity disruptions.

Feature	OpenAI (Codex / ChatGPT)	Cursor (AI Code Editor)	OpenCode (CLI / Desktop)
Functional Scope	First-party desktop, web, and CLI interface locked strictly to the OpenAI model family (GPT-5.4, GPT-5.5).	Deep IDE integration supporting multiple cloud providers (OpenAI, Anthropic Claude, Gemini, Grok).	Open-source multi-provider client supporting cloud APIs and locally hosted models via Unsloth/Ollama.
Current Price/Cost	Tiers range from Plus ($20/mo) to Pro 5x ($100/mo) and Pro 20x ($200/mo) [cite: 16].	Varied plans; allows "Bring Your Own Key" (API usage) allowing flexible dollar-for-token spending [cite: 36].	Free core app; costs vary based on exact API usage or hardware running local models (e.g., Qwen 3.5) [cite: 6].
Availability	Highly susceptible to "Selected model is at capacity" outages; vulnerable to single-point-of-failure cloud dependency.	High availability through redundancy; users simply toggle to a competitor's model if OpenAI goes down.	No dependence on cloud capacity when connected to local models (e.g., Qwen 3.5 27b); uptime is limited only by the user's own hardware.
Real-World Context	Excellent proprietary logic, but severely flawed workflow retention when server limits are hit.	The current industry favorite for uninterrupted multi-model switching inside the text editor [cite: 36].	Ideal for developers requiring strict data privacy, offline functionality, or zero API rate limits [cite: 35].

Future Outlook: Server Expansion and Architectural Shifts

Looking forward, the persistence of the "Selected model is at capacity" error suggests that OpenAI's infrastructure is currently locked in a transitional phase. As autonomous agent capabilities (such as the computer use and multi-step execution seen in GPT-5.5) require far more compute than simple chatbot interactions, current hardware allocations are clearly struggling to keep pace with enterprise demand.

To permanently resolve this, a dual-pronged architectural shift is anticipated. First, massive hardware expansions, potentially leveraging the next generation of GPU infrastructure, will be required to increase absolute capacity ceilings. Second, and perhaps more importantly, the industry is likely to pivot toward sophisticated "routing" architectures involving Small Language Models (SLMs). By employing highly efficient, localized models to handle basic code formatting and syntax checks, the heavy, high-precision servers (handling xHigh GPT-5.5 requests) can be strictly reserved for complex architectural reasoning. Until these load-balancing innovations are fully realized and deployed, the capacity error will likely remain an intermittent reality for power users.
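The routing idea in the paragraph above can be illustrated with a very small dispatcher. Everything here is hypothetical: the task categories and the pool names "small-model" and "large-model" are placeholders for illustration, not a description of any deployed system.

```python
from collections import Counter

# Task kinds assumed cheap enough for a small language model.
LIGHT_TASKS = frozenset({"format", "lint", "syntax_check"})


def route(task_kind: str) -> str:
    """Send light tasks to the small pool and everything else to the large pool."""
    return "small-model" if task_kind in LIGHT_TASKS else "large-model"


def load_by_pool(task_kinds) -> Counter:
    """Count how many tasks land on each pool for a given workload."""
    return Counter(route(kind) for kind in task_kinds)
```

For a workload of `["format", "lint", "architecture_review", "format"]`, three of the four tasks would stay off the high-precision pool, which is the capacity relief the routing approach aims for.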

Conclusion

The "Selected model is at capacity" error is far more than a minor server hiccup; it represents a fundamental clash between the immense computational demands of modern AI coding agents and the physical limitations of current server infrastructure. What began in early 2026 as a graphical glitch has evolved into a daily operational hazard for software engineers.

The frustration expressed by the community is entirely justified. When a tool designed to exponentially increase developer productivity instead becomes a source of lost context, corrupted workflows, and wasted financial resources, trust in the platform erodes. Until OpenAI can implement resilient software architectures—such as pre-flight capacity checks, graceful workflow pausing, and accurate token reimbursement—developers must rely on defensive engineering tactics, model downgrading, and multi-provider IDEs to ensure their workflows remain uninterrupted.
