I built an AI-powered fantasy football assistant with a natural language chat interface.

Not separate tools you have to choose between.

A conversational system where you ask questions in plain English, and the AI automatically routes to the right tool, fetches the data it needs, and gives you actionable analysis.

Ask "Who should I start this week?" and it:

  • Detects you want lineup optimization

  • Pulls your current roster from ESPN's API

  • Fetches your opponent's lineup for context

  • Analyzes matchups and projections

  • Returns specific recommendations with reasoning

Ask "What trades can I make?" and it:

  • Switches to trade analysis mode

  • Scans all team rosters in your league

  • Identifies teams with complementary needs

  • Generates realistic multi-player trade proposals

The key innovation was removing the friction.

No dropdown menus, no selecting which feature you want.

Just chat, and the system figures out what you need.

The Architecture

This required three main components working together:

1. ESPN API Reverse Engineering

ESPN doesn't offer a documented public API.

I reverse-engineered the undocumented endpoints its fantasy web app uses to pull live data.

2. AI Agent System with Tool Routing

The chat interface doesn't just pass your message to GPT-4.

It intelligently routes to specialized tools based on intent:

  • Lineup optimization

  • Trade analysis

  • Waiver wire recommendations

  • Player comparisons

  • Deep research with web search

Each function has its own data requirements and prompting strategy.

3. Rate Limiting Infrastructure

This is deployed publicly, where anyone can use it.

I implemented a $10/hour spending cap to prevent runaway OpenAI costs:

  • Pre-request cost estimation

  • Real-time usage tracking

  • Automatic request blocking when approaching limit

  • Per-feature cost breakdown in the dashboard

Part 1: How to Reverse Engineer Any API

This process works for ESPN, DoorDash, Netflix, LinkedIn, whatever.

The steps are always the same.

Step 1: Open Developer Tools

Press F12 or right-click and hit "Inspect", then go to the Network tab.

My process:

  1. Clear out old requests

  2. Filter to "Fetch/XHR" requests

  3. Navigate to the page with data you want

  4. Watch what requests show up

For ESPN, I found:

lm-api-reads.fantasy.espn.com/apis/v3/games/ffl/seasons/2025/segments/0/leagues/{LEAGUE_ID}

Step 2: Analyze the Request

Click on the request to see everything.

Headers show authentication methods, required custom headers, browser info.

URL Parameters reveal filtering, pagination, sorting options.

Request Body (for POST) shows expected data format.

Step 3: Extract Authentication

Most APIs require authentication.

Common methods:

Cookie-Based: Look in DevTools under Application → Cookies. Copy the values.

Token-Based: Check for Authorization headers like Authorization: Bearer <token>.

API Keys: Sometimes in URL parameters or custom headers.

Store these in environment variables; never commit them to version control.
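
For ESPN, the cookies that matter are espn_s2 and SWID (you only need them for private leagues). A minimal sketch of keeping them out of the codebase, assuming you export them as environment variables:

import os

# Values copied from DevTools → Application → Cookies, exported as env vars
ESPN_COOKIES = {
    "espn_s2": os.environ["ESPN_S2"],
    "SWID": os.environ["SWID"],   # formatted like "{XXXXXXXX-XXXX-...}"
}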

Step 4: Replicate the Request

Make the same request programmatically.

Key things:

  • Match all headers the browser sends

  • Include authentication

  • Use correct HTTP method

  • Structure request bodies exactly as expected

I use Python's requests library with a Session to maintain cookies, add custom headers, and make authenticated GET requests.
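­
Here's roughly what that looks like, reusing the cookies from Step 3 (the league ID is a placeholder, and the "view" parameters are how ESPN selects which blocks of data come back; this is a sketch, not the exact wrapper):

import requests

LEAGUE_ID = "123456"   # placeholder; use your own league ID
BASE_URL = (
    "https://lm-api-reads.fantasy.espn.com/apis/v3/games/ffl/"
    f"seasons/2025/segments/0/leagues/{LEAGUE_ID}"
)

session = requests.Session()
session.cookies.update(ESPN_COOKIES)    # the dict built in Step 3
session.headers.update({
    "Accept": "application/json",
    "User-Agent": "Mozilla/5.0",         # mirror what the browser sends
})

resp = session.get(BASE_URL, params={"view": ["mRoster", "mMatchup"]})
resp.raise_for_status()
league = resp.json()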

Step 5: Parse the Response

APIs use nested data structures and numeric IDs.

ESPN uses numeric IDs everywhere:

  • Position ID 2 = QB

  • Team ID 1 = Atlanta

  • Lineup slot 20 = Bench

I built mapping dictionaries (sketched below) by:

  • Parsing league settings dynamically

  • Hardcoding static values (NFL teams don't change)

  • Cross-referencing multiple endpoints
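
A simplified version of those mappings (the IDs shown are examples; verify them against your own league's settings response):

# Static lookups, hardcoded once (NFL teams and lineup slots don't change)
PRO_TEAMS = {1: "ATL", 2: "BUF", 3: "CHI"}     # ...continue for all 32 teams
LINEUP_SLOTS = {20: "Bench", 23: "FLEX"}       # ...plus the rest of your slots

def slot_name(slot_id: int) -> str:
    """Translate a numeric lineup-slot ID into something readable."""
    return LINEUP_SLOTS.get(slot_id, f"slot_{slot_id}")

# Position and scoring mappings come from the league settings response
# ("mSettings" view), which describes what your specific league actually uses.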

Step 6: Handle Production Concerns

This is where the real work begins.

Rate Limiting: I limit to 30 requests/minute for ESPN.

Caching: League settings cached all season, projections for 15 minutes.

Error Handling: Retry logic with exponential backoff, graceful failures.

Response Validation: Check fields exist before accessing, handle partial data during live games.
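
The retry logic is the piece most worth showing. A minimal sketch built on the requests session from Step 4 (the retry counts and function name are illustrative):

import time
import requests

def get_with_retry(session, url, params=None, retries=3):
    """GET with exponential backoff; fail gracefully instead of crashing."""
    for attempt in range(retries):
        try:
            resp = session.get(url, params=params, timeout=10)
            resp.raise_for_status()
            return resp.json()
        except (requests.RequestException, ValueError):
            if attempt == retries - 1:
                return None              # caller decides how to handle missing data
            time.sleep(2 ** attempt)     # back off: 1s, 2s, 4s...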

Part 2: The Chat Interface Architecture

The hard part wasn't reverse engineering ESPN's API.

It was building a chat system that intelligently routes to the right tools.

The Problem with Tool Selection

You can't just pass every message to GPT-4 and let it figure out what to do.

That's expensive and slow.

You need intelligent pre-routing based on user intent.

How I Built the Router

The chat handler analyzes the incoming message for keywords and context:

User: "Who should I start this week?"
→ Detects: "start" + "week" 
→ Routes to: Lineup Optimizer
→ Data needed: My roster + opponent roster + projections
User: "What trades can I make for a running back?"
→ Detects: "trade" + position mention
→ Routes to: Trade Analyzer
→ Data needed: All league rosters + positional needs
User: "Should I pick up Player X?"
→ Detects: "pick up" + player name
→ Routes to: Waiver Wire Analyzer
→ Data needed: Available players + my team needs

Each route triggers different ESPN API calls to gather only the necessary data.
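
Under the hood it's closer to a keyword pass than a full LLM call. A stripped-down sketch (the real router uses more keywords plus conversation context):

def route_intent(message: str) -> str:
    """Cheap pre-routing before any model gets involved."""
    text = message.lower()
    if "trade" in text:
        return "trade_analyzer"
    if any(kw in text for kw in ("start", "sit", "lineup")):
        return "lineup_optimizer"
    if any(kw in text for kw in ("pick up", "waiver", "drop")):
        return "waiver_analyzer"
    return "general_chat"               # fall back to a plain LLM answer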

The Data Collection Phase

Once intent is detected, the system makes multiple ESPN API calls:

For lineup optimization:

  1. Get current NFL week

  2. Fetch my team's roster with projections

  3. Get opponent's roster and projections

  4. Pull injury statuses

  5. Retrieve matchup context

For trade analysis:

  1. Fetch all team rosters in the league

  2. Get season-long projections

  3. Calculate positional needs per team

  4. Pull team records and playoff positioning

This happens in parallel where possible to minimize latency.
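
A sketch of the parallel fetch for the lineup case, using a thread pool (the client methods here stand in for my ESPN wrapper and aren't its real names):

from concurrent.futures import ThreadPoolExecutor

def collect_lineup_data(client, week):
    """Fire the independent ESPN calls concurrently, then wait for all of them."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            "my_roster": pool.submit(client.get_my_roster, week),
            "opponent_roster": pool.submit(client.get_opponent_roster, week),
            "injuries": pool.submit(client.get_injury_statuses, week),
            "matchup": pool.submit(client.get_matchup, week),
        }
        return {name: f.result() for name, f in futures.items()}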

The AI Layer

After data collection, I transform everything into clean, structured prompts.

Raw ESPN data is a mess for LLMs:

Nested objects, numeric IDs, inconsistent fields.

I built a transformation layer that:

  • Flattens nested structures

  • Translates numeric IDs to readable names

  • Formats specifically for context windows

  • Strips unnecessary fields to save tokens

The AI receives formatted data like:

Your Roster:
- Josh Allen (QB, BUF) - Proj: 22.5 pts - Status: Healthy - vs MIA
- Christian McCaffrey (RB, SF) - Proj: 18.3 pts - Status: Questionable - vs LAR

Opponent's Roster:
- Patrick Mahomes (QB, KC) - Proj: 24.1 pts - Status: Healthy - vs DEN

Instead of raw JSON with nested objects.
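
The formatting step itself is boring but important. A sketch of how a cleaned-up player record becomes one prompt line (the dict keys are my flattened field names, not ESPN's):

def format_player(p: dict) -> str:
    """One readable line per player, built from already-flattened fields."""
    return (f"- {p['name']} ({p['position']}, {p['team']}) "
            f"- Proj: {p['projection']:.1f} pts "
            f"- Status: {p['status']} - vs {p['opponent']}")

def format_roster(title: str, players: list[dict]) -> str:
    return f"{title}:\n" + "\n".join(format_player(p) for p in players)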

Context Window Optimization

Fitting everything into the context window required careful engineering.

For a start/bench decision, the agent needs:

  • My full roster (15 players with stats)

  • Opponent's roster (15 players with stats)

  • League scoring settings

  • Current lineup configuration

  • Matchup analysis

That's 3000+ tokens before the AI responds.

I optimized by:

  • Removing redundant information

  • Abbreviating field names

  • Including only relevant stats

  • Using token-efficient formats

This made the difference between agents that work and agents that hit context limits.

The Specialized Agents

Different questions need different analysis approaches.

Lineup Optimizer: Considers variance, not just projections. A high-variance player might be better if you're projected to lose.

Trade Constructor: Scans all league teams to find complementary needs. Generates specific 1-for-1, 2-for-1, or 2-for-2 proposals with reasoning for both sides.

Waiver Wire Analyzer: Filters to only available players, ranks by upside and fit, focuses on weak positions.

Deep Research Agent: Makes web searches for injury reports, weather, defensive matchups. Slower (30-60 seconds) but gives analysis you can't get from projections alone.

Each agent has custom system prompts and output formats.
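
In practice that's just a lookup from route to prompt. An abbreviated sketch (the wording here is illustrative, not the production prompts):

SYSTEM_PROMPTS = {
    "lineup_optimizer": (
        "You are a fantasy football lineup analyst. Weigh projections and "
        "variance; favor high-variance players when the user is the underdog. "
        "Return start/sit calls with one line of reasoning each."
    ),
    "trade_analyzer": (
        "You construct realistic trades. Only propose deals that plausibly "
        "help both teams, and explain the value for each side."
    ),
    "waiver_analyzer": (
        "You rank available free agents by upside and roster fit, "
        "prioritizing the user's weakest positions."
    ),
}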

Part 3: The Production Engineering

Building a demo is easy.

Making it work reliably in production is hard.

Problem 1: Rate Limiting for Cost Control

This is deployed publicly, so I needed protection against runaway costs.

I implemented a $10/hour spending cap:

Before each request:

  • Estimate token usage based on data size

  • Check current hourly spending from session state

  • Block request if it would exceed limit

  • Return 429 error with clear message

After each request:

  • Record actual token usage

  • Calculate real cost (input + output tokens × pricing)

  • Update session state with timestamp

  • Clean up usage records older than 1 hour

This prevents $500 surprise bills if someone spams the API.
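
The whole mechanism fits in a few functions. A simplified sketch (the cap, pricing, and storage details are illustrative; the real version lives in session state with a per-feature breakdown):

import time

HOURLY_CAP_USD = 10.00
usage_log = []   # (timestamp, cost) pairs; the real app keeps this in session state

def spent_last_hour() -> float:
    cutoff = time.time() - 3600
    usage_log[:] = [(ts, c) for ts, c in usage_log if ts > cutoff]   # drop old records
    return sum(c for _, c in usage_log)

def within_budget(estimated_cost: float) -> bool:
    """Checked before each request; a False here becomes a 429 upstream."""
    return spent_last_hour() + estimated_cost <= HOURLY_CAP_USD

def record_usage(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> None:
    """Called after each request with the actual token counts."""
    usage_log.append((time.time(), input_tokens * price_in + output_tokens * price_out))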

Problem 2: Handling ESPN's Data Inconsistencies

ESPN's API sometimes returns partial data, especially during live games.

I had to handle:

  • Missing player projections

  • Incomplete roster entries

  • Null injury statuses

  • Different league scoring formats

  • Edge cases like bye weeks

The wrapper validates every field before accessing it and fails gracefully when data is missing.
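
Most of that validation is unglamorous defensive access. A sketch of the pattern (the field names reflect ESPN's payload as I've seen it; treat them as examples):

def safe_projection(player_entry: dict) -> float | None:
    """Walk the nested stats defensively; return None instead of raising."""
    for stat in player_entry.get("stats", []):
        value = stat.get("appliedTotal")
        if value is not None:
            return float(value)
    return None   # caller shows "no projection" rather than crashing mid-game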

Problem 3: Making It Fast Enough to Be Usable

Nobody wants to wait 30 seconds for a lineup decision.

Optimizations I made:

  • Aggressive caching (settings cached all season, projections for 15 minutes)

  • Parallel ESPN API calls where possible

  • Request batching using ESPN's "view" system

  • Context window optimization to reduce AI processing time

Most decisions now feel instant despite making multiple API calls and AI requests.
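
The caching layer is nothing fancy, just a TTL check in front of the fetch. A minimal sketch (keys, TTLs, and the espn_client name are illustrative):

import time

_cache = {}   # key -> (expires_at, value)

def cached(key, ttl_seconds, fetch):
    """Return a fresh cached value if one exists, otherwise fetch and store it."""
    now = time.time()
    if key in _cache and _cache[key][0] > now:
        return _cache[key][1]
    value = fetch()
    _cache[key] = (now + ttl_seconds, value)
    return value

# Example usage: settings cached for a day, projections for 15 minutes
# settings = cached("settings", 86400, espn_client.get_settings)
# projections = cached(f"proj_{week}", 900, lambda: espn_client.get_projections(week))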

Problem 4: Maintaining Conversation Context

The chat interface needs to remember previous messages.

I use Streamlit session state to persist:

  • Full conversation history

  • Previous tool calls

  • User preferences mentioned in chat

  • Cost tracking across the session

This lets you ask follow-up questions like "What about if I trade for a WR instead?" and the system knows you're continuing the trade analysis conversation.
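
The Streamlit side of that is only a few lines. A sketch of the pattern (handle_message is a hypothetical stand-in for the router, data collection, and LLM call):

import streamlit as st

if "messages" not in st.session_state:
    st.session_state.messages = []       # full conversation history
    st.session_state.usage_log = []      # cost tracking for the rate limiter

for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

prompt = st.chat_input("Ask about your team...")
if prompt:
    st.session_state.messages.append({"role": "user", "content": prompt})
    # the full history goes along with the new message, so follow-ups keep context
    reply = handle_message(st.session_state.messages)   # hypothetical helper
    st.session_state.messages.append({"role": "assistant", "content": reply})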

The Technical Stack

Backend:

  • FastAPI for REST endpoints

  • Custom ESPN API wrapper

  • Rate limiter with cost tracking

Frontend:

  • React/TypeScript

  • Real-time chat interface

  • Usage/cost dashboard

  • Quick action buttons

AI:

  • OpenAI for reasoning

  • Structured prompts per agent type

  • Context window optimization

  • Multi-agent routing system

Three Key Takeaways

  1. Every web app exposes its APIs in the browser: The Network tab shows everything. Frontend apps have to make API calls, so you can see exactly what they're doing.

  2. Chat interfaces need intelligent routing: Don't just pass every message to the LLM. Pre-route based on intent, then gather only the necessary data and use specialized prompts.

  3. Production requires engineering: Rate limiting, caching, error handling, and cost control aren't nice-to-haves. They're what separates demos from systems people actually use.

Disclaimer: This post describes techniques for accessing your own data through undocumented APIs for personal projects. Use responsibly, respect rate limits, and don't access data you don't have permission to view. This is for educational purposes and personal use only.
