Using an LLM API in Your App
You've used a chat assistant in a browser, and now you want one inside your own app — to summarize, to answer, to draft. Somewhere along the way the phrase "call the LLM API" showed up, and it sounds like it needs a research lab and a GPU farm. It doesn't.
Here's the part nobody says out loud: a hosted language model is reached the exact same way as any other web service. You send an HTTP request, you get a response back. If you've ever called a weather API or a payments API, you already know 90% of this. The model is the unusual part; the calling is ordinary. This guide installs that mental model first, then walks you through what actually costs money, and finally the habits that keep a real feature from embarrassing you in production.
⏭️ New to the idea of an API at all? Read What an API Actually Is first — this guide assumes you're comfortable with the idea of one program asking another for something over HTTP.
How to read this
- Want it to finally make sense? Read in order. We start with the request/response shape (it really is just an API call), then cover tokens and cost so the bill never surprises you, then the reliability habits that separate a demo from a feature.
- Already calling the model and hitting walls? Jump to Phase 3: Building Reliably — non-determinism, hallucinations, timeouts, retries, and asking for structured output.
The phases
- It's Just an API Call — an LLM API is a normal HTTP request. You POST a list of messages (system, user), you get back generated text. The annotated request and response, provider-neutral.
- Tokens, Context & Cost — what a token is, the context window (the model's limited short-term memory), why you pay per token, and why long conversation histories cost more and can overflow. Plus streaming for responsiveness.
- Building Reliably — the model is non-deterministic, it can be confidently wrong, it can be slow, and it can fail. How to handle errors, timeouts, and retries; how to ask for structured output; and how not to ship a foot-gun.
This guide deliberately stops at how to call the thing well. Getting the model to actually do what you want — writing the instructions — is its own craft, covered in Prompt Engineering, Honestly.