Using an LLM API in Your App

You've used a chat assistant in a browser, and now you want one inside your own app — to summarize, to answer, to draft. Somewhere along the way the phrase "call the LLM API" showed up, and it sounds like it needs a research lab and a GPU farm. It doesn't.

Here's the part nobody says out loud: a hosted language model is reached the exact same way as any other web service. You send an HTTP request, you get a response back. If you've ever called a weather API or a payments API, you already know 90% of this. The model is the unusual part; the calling is ordinary. This guide installs that mental model first, then walks you through what actually costs money, and finally the habits that keep a real feature from embarrassing you in production.

⏭️ New to the idea of an API at all? Read What an API Actually Is first — this guide assumes you're comfortable with the idea of one program asking another for something over HTTP.

How to read this

Want it to finally make sense? Read in order. We start with the request/response shape (it really is just an API call), then cover tokens and cost so the bill never surprises you, then the reliability habits that separate a demo from a feature.
Already calling the model and hitting walls? Jump to Phase 3: Building Reliably — non-determinism, hallucinations, timeouts, retries, and asking for structured output.

The phases

It's Just an API Call — an LLM API is a normal HTTP request. You POST a list of messages (system, user), you get back generated text. The annotated request and response, provider-neutral.
Tokens, Context & Cost — what a token is, the context window (the model's limited short-term memory), why you pay per token, and why long conversation histories cost more and can overflow. Plus streaming for responsiveness.
Building Reliably — the model is non-deterministic, it can be confidently wrong, it can be slow, and it can fail. How to handle errors, timeouts, and retries; how to ask for structured output; and how not to ship a foot-gun.

This guide deliberately stops at how to call the thing well. Getting the model to actually do what you want — writing the instructions — is its own craft, covered in Prompt Engineering, Honestly.