Anyone who has used the Poe app frequently on iOS over the last few months may have come across an annoying message far sooner than expected: a rate limit warning. For many premium users, this came as a surprise. After all, they’d paid for a subscription to get increased access to powerful AI models such as GPT-4 or Claude, only to find themselves locked out mid-session. This issue sparked a deep dive by developers, leading to a revelation about unnecessarily triggered rate limits and a full token audit that corrected the core issue.

TL;DR

Many iOS Poe users experienced premature rate limits, especially on premium plans offering access to high-tier language models. It was discovered that these limits were caused not by misuse, but by incorrect token accounting behind the scenes. The root cause stemmed from discrepancies in token usage tracking mechanisms. After a rigorous internal audit, the Poe development team corrected the faulty logic and eliminated what have now been dubbed “phantom limits.”

The Mystery of Sudden Rate Limits

Throughout late 2023 and early 2024, Poe app users increasingly reported rate limit messages that were seemingly out of sync with their expected usage. Many posted screenshots on social platforms and forums, baffled by being locked out of their conversations barely a few prompts in. This wasn’t just an inconvenience — for some, it rendered the service nearly unusable for large spans of time.

The symptoms included:

  • Receiving rate limit warnings unexpectedly within a few messages
  • Users on the highest-tier subscriptions getting locked out before the day was halfway through
  • Analytics on the user side showing far less interaction than the cap supposedly allowed

At first, customer support offered common-sense suggestions: restart the app, check for updates, or log out and back in again. But as reports grew and some users shared detailed logs of their usage, one thing became clear — the problem wasn’t on the client side.

How Rate Limits Are Supposed to Work

To understand what went wrong, it helps to know how rate limiting in Poe typically operates. There are essentially two layers:

  1. Request Count Limits: The number of responses you can request in a given time frame (e.g., 100 requests per day for GPT-4).
  2. Token Usage Limits: The amount of data — measured in tokens — processed by the model (both input and output). Premium models like GPT-4 can have a cap such as 500K tokens per day.
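The two layers can be sketched as a minimal gate that a request must pass before it is served. The caps below are illustrative only (the article's example figures for a premium GPT-4 tier); the real thresholds and enforcement code are Poe-internal:

```python
from dataclasses import dataclass

# Illustrative caps only, echoing the article's example numbers.
MAX_REQUESTS_PER_DAY = 100
MAX_TOKENS_PER_DAY = 500_000

@dataclass
class DailyUsage:
    requests: int = 0
    tokens: int = 0

def may_proceed(usage: DailyUsage, estimated_tokens: int) -> bool:
    """Check both layers before serving a request."""
    if usage.requests + 1 > MAX_REQUESTS_PER_DAY:
        return False  # layer 1: request-count limit
    if usage.tokens + estimated_tokens > MAX_TOKENS_PER_DAY:
        return False  # layer 2: token-usage limit
    return True

def record(usage: DailyUsage, tokens_used: int) -> None:
    """Tally a completed request against the daily window."""
    usage.requests += 1
    usage.tokens += tokens_used
```

Note that the token layer depends entirely on `tokens_used` being counted correctly, which is exactly where things went wrong.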

While most users are familiar with the concept of daily usage caps, what was less apparent was how complex tracking token usage can be. Tokens don’t just include what the user types — they also include any system prompts, prior conversation history that gets included for context, and the response size itself.
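A rough sketch of what actually counts toward the input side of a request, using a crude characters-per-token heuristic as a stand-in for a real tokenizer (real counts differ by model):

```python
def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)

def billed_input_tokens(system_prompt: str, history: list[str], user_message: str) -> int:
    """Everything sent to the model counts, not just the new message."""
    total = rough_token_count(system_prompt) + rough_token_count(user_message)
    for turn in history:  # prior turns are re-sent as context on every request
        total += rough_token_count(turn)
    return total
```

The model's reply is then billed on top of this input count, so a short question against a long conversation can cost far more than it appears to.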

This complexity turned out to be the Achilles’ heel of the system.

A Bug in the Machine

Investigating engineers eventually discovered that on iOS, and crucially only on iOS, an anomaly in the token accounting system was grossly inflating the token count recorded for each interaction, especially in chats with multiple turns of back-and-forth. Due to faulty token aggregation logic, tokens were sometimes being counted twice: once for the current message and again for previously stored system-level context.

The result?

A user might have thought they’d used only 5,000 tokens in a session, but the internal server was recording it as closer to 50,000.


Each inflated tally pushed the account’s recorded usage toward its daily cap far faster than actual usage warranted, ultimately triggering the rate limit restrictions prematurely; hence the term “phantom limits.” Unsuspecting users believed they were messaging responsibly, while the backend was panicking over a token count many times higher than reality.
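The server code isn’t public, but the shape of the bug as described can be illustrated with a toy tally, where the buggy path charges stored context tokens inside the request total and then again on their own:

```python
def buggy_tally(context: int, message: int, reply: int) -> int:
    # Bug: context is charged inside the request total *and* again separately.
    request_total = context + message
    return request_total + context + reply

def fixed_tally(context: int, message: int, reply: int) -> int:
    # Fix: every token is counted exactly once.
    return context + message + reply

def session_total(turns: list[tuple[int, int]], tally) -> int:
    """Accumulate usage over a multi-turn chat; context grows each turn."""
    context, total = 0, 0
    for message, reply in turns:
        total += tally(context, message, reply)
        context += message + reply  # conversation history carried forward
    return total
```

With illustrative numbers (ten turns of 100-token prompts and 400-token replies), a user summing only their visible messages would count 5,000 tokens, while the buggy server tally records 50,000, matching the kind of gap described above; the corrected tally lands at 27,500 once legitimately re-sent context is included.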

Running the Token Audit

Recognizing the scale of the issue, Poe’s development and infrastructure teams initiated a full-scale audit of token traffic. This was more than a bug fix: it meant combing through days of analytics and backend logs and comparing actual token usage with the values that had been recorded.

The audit revealed several key issues:

  • Token Inflation from Context Interleaving: Older parts of the conversation were being duplicated on the server during certain reply generations, doubling the input token count.
  • Misalignment Between iOS and Web Logic: The iOS client encoded and packaged conversations differently, which caused inconsistencies when compared with the web and Android counterparts.
  • Poor Visibility of Token Impact in UI: Users weren’t adequately informed about how their messages translated into token consumption, hindering self-regulation.

Once the discrepancies were identified, engineers patched the token accounting layer to ensure tokens were tallied correctly and didn’t multiply unexpectedly. More importantly, a secondary token verification layer was added to cross-check usage before applying account-level limits.
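The article doesn’t describe the verification layer’s internals, so the following is a hypothetical sketch of what “cross-check usage before applying account-level limits” could look like: enforce the cap only on figures the two counters agree on, and otherwise fall back to the independently verified recount rather than locking the account out on a possibly inflated tally.

```python
def should_rate_limit(primary: int, verified: int, daily_cap: int,
                      tolerance: float = 0.05) -> bool:
    """Hypothetical cross-check: only enforce limits the counters agree on."""
    if abs(primary - verified) > tolerance * max(verified, 1):
        # Counters disagree: trust the independently verified figure
        # instead of blocking on a possibly inflated primary tally.
        return verified >= daily_cap
    return primary >= daily_cap
```

The function name, tolerance, and fallback policy are all assumptions for illustration; the point is that a second, independent count turns a silent accounting bug into a detectable disagreement.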

The Rollout of Fixes — And the Rebound

Poe quietly rolled out the backend patches in waves in late March 2024. Within a week, reports of unexpected rate limits dropped by over 80%. By April, the feedback loop had stabilized, and more users were able to complete full sessions without interruption.


In parallel, developers also introduced a more transparent display of token usage in the Poe app interface, allowing users to monitor how much they’d consumed every day relative to their plan’s limits. This empowered users to better understand what “500K tokens” really translated to in practical use — often hundreds of replies for typical-sized conversations.
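As a back-of-the-envelope check on that claim, assuming a typical exchange (prompt, re-sent context, and reply) runs on the order of 1,500 tokens, a 500K daily cap does indeed work out to hundreds of replies. The average exchange size here is an assumption; real costs vary with conversation length:

```python
def estimated_replies(daily_cap: int, avg_tokens_per_exchange: int = 1_500) -> int:
    # Assumed average exchange size; real per-reply costs vary widely.
    return daily_cap // avg_tokens_per_exchange
```

Under that assumption, a 500K cap yields roughly 333 exchanges per day.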

User Reactions and Trust Rebuilding

The community response to the fix was largely positive, though it didn’t come without some residual frustration. Many users expressed satisfaction that the real issue was found and resolved — but some remained disappointed that such a basic flaw persisted for so long.

In response, Poe’s development team committed to stronger diagnostic tooling and more aggressive monitoring of model usage anomalies moving forward, along with:

  • A monthly transparency report for premium users, detailing any systemic incidents affecting usage
  • A new API-based debug option for users encountering limits, to submit compact logs automatically
  • More visual feedback in the UI when nearing token or request thresholds

These changes, along with the fix itself, have largely restored user confidence and cemented Poe’s commitment to reliability as more users rely on it for long-form writing, research, or customer support automation.

Lessons and Looking Ahead

The token audit saga revealed much about the challenges of operating at scale with AI models that meter resources by the token. Some of the biggest takeaways include:

  • Granular usage tracking needs cross-platform consistency
  • Internal monitoring must model real-world behavior, not just edge cases
  • User trust demands visibility into consumption

For Poe and similar platforms, such audits are likely to become cyclical processes — not just for bug hunting, but as essential tuning mechanisms as AI model pricing, compute costs, and usage evolve.

And for anyone using Poe today: if you were once haunted by rate limits that came out of nowhere, you can now breathe easier — the ghosts of phantom limits have been exorcised.
