# Structured Output: Getting JSON, Tables, and Data from AI
One of the most practical prompt engineering skills is coaxing structured, parseable output from language models. Whether you need JSON for an API, CSV for a spreadsheet, or formatted tables for a report, getting AI to produce consistently structured data requires specific prompting techniques and validation strategies.
## Why Structured Output Matters
In production applications, AI outputs feed into downstream systems: databases, APIs, user interfaces, analytics pipelines. These systems expect data in specific formats. A chatbot might need JSON to populate a form. An analytics tool might need structured categories. An automation might need precisely formatted action items. If the AI output is even slightly malformed, the entire pipeline breaks.
## JSON Output Techniques
To get reliable JSON from language models, provide the exact schema you expect. Do not just say "return JSON"; specify every field, its type, and whether it is required. Include a complete example of the expected output structure in your prompt.
Effective prompting looks like: "Extract the following information and return it as valid JSON matching this exact schema: {name: string, email: string, phone: string | null, interests: string[]}. If a field cannot be determined from the input, use null. Do not include any text before or after the JSON object."
The instruction to avoid text before and after the JSON is critical. Models love to add conversational wrappers like "Here is the JSON:" which break automated parsing.
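Even with that instruction in place, a defensive parser is cheap insurance. The following is a minimal sketch, not a definitive implementation: `extract_json_object` is a hypothetical helper name, and it assumes the response contains a single JSON object, possibly surrounded by chatter.

```python
import json


def extract_json_object(text: str) -> dict:
    """Return the first JSON object that can be decoded from text.

    Scans for each '{' and tries to decode from there, so a wrapper such as
    'Here is the JSON:' before the object (or a sign-off after it) is ignored.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model response")
```

Treat this as a fallback rather than a replacement for the prompt instruction: a response that needs rescuing is still a signal that the prompt could be tightened.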
## Handling Complex Nested Structures
For deeply nested JSON, provide the complete structure as an example rather than describing it in words. The model follows structural examples much more reliably than verbal descriptions of structure. Show one complete example of the desired output with all nesting levels, optional fields, and array structures represented.
When dealing with variable-length arrays or conditional fields, document these cases explicitly: "The items array may contain 0-N items. Each item must have at minimum a 'name' and 'price' field. Optional fields include 'description' and 'category'."
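The documented rules for the `items` array can be mirrored in a small checker on the application side. This is a sketch using only the standard library; the field types (`price` as a number, the rest as strings) are assumptions, since the prompt above only names the fields.

```python
from typing import Any

# Assumed types: the prompt names these fields but does not state their types.
REQUIRED_ITEM_FIELDS = {"name": str, "price": (int, float)}
OPTIONAL_ITEM_FIELDS = {"description": str, "category": str}


def validate_items(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload passed."""
    items = payload.get("items")
    if not isinstance(items, list):
        return ["'items' must be an array"]
    problems = []
    for i, item in enumerate(items):  # zero items is valid (0-N)
        if not isinstance(item, dict):
            problems.append(f"items[{i}] must be an object")
            continue
        for field, expected in REQUIRED_ITEM_FIELDS.items():
            if field not in item:
                problems.append(f"items[{i}] is missing required field '{field}'")
            # bool is a subclass of int in Python, so reject it explicitly.
            elif isinstance(item[field], bool) or not isinstance(item[field], expected):
                problems.append(f"items[{i}].{field} has the wrong type")
        for field, expected in OPTIONAL_ITEM_FIELDS.items():
            if field in item and not isinstance(item[field], expected):
                problems.append(f"items[{i}].{field} has the wrong type")
    return problems
```

Returning a list of problems instead of raising on the first one makes the result easy to feed back to the model in a retry prompt.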
## Table and CSV Generation
For tabular data, specify column headers, data types for each column, and formatting rules. "Generate a comparison table with these exact columns: Feature, Product A, Product B, Winner. Use markdown table syntax. The Winner column should contain only 'A', 'B', or 'Tie'."
For CSV output, be explicit about delimiters, quoting rules, and header rows: "Output as CSV with comma delimiters. Quote any fields containing commas. Include a header row. Do not include any text before or after the CSV data."
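On the receiving side, Python's `csv` module handles the quoting rules, so the application only needs to check the header and the row shape. The sketch below borrows the comparison columns from the table prompt above as an assumption; `parse_comparison_csv` is a hypothetical helper name.

```python
import csv
import io

# Assumed columns, taken from the comparison-table prompt for illustration.
EXPECTED_HEADER = ["Feature", "Product A", "Product B", "Winner"]
VALID_WINNERS = {"A", "B", "Tie"}


def parse_comparison_csv(text: str) -> list[dict[str, str]]:
    """Parse model-generated CSV and reject rows that break the stated rules."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {rows[0] if rows else None}")
    records = []
    for row_no, row in enumerate(rows[1:], start=1):
        if len(row) != len(EXPECTED_HEADER):
            raise ValueError(
                f"row {row_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
            )
        record = dict(zip(EXPECTED_HEADER, row))
        if record["Winner"] not in VALID_WINNERS:
            raise ValueError(f"row {row_no}: invalid Winner {record['Winner']!r}")
        records.append(record)
    return records
```

A field count mismatch is the usual symptom of an unquoted comma, which is why the quoting instruction in the prompt matters.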
## Using Function Calling and Tool Use
Modern AI APIs offer structured output through function calling or tool use mechanisms. Instead of hoping the model produces valid JSON in free text, you define a function schema and the model fills in the parameters. This is generally more reliable than free-text JSON generation because the request is built around the schema, and some providers can constrain generation so that the output must conform to it.
OpenAI, Anthropic, and Google all offer function calling or structured output modes. How strictly the schema is enforced depends on the provider and the mode you select, so check whether yours guarantees compliance or only makes it more likely. Use these mechanisms whenever available rather than relying on prompt-based structuring alone.
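As a rough illustration, the contact fields from the earlier prompt example can be written as a JSON Schema and attached to a tool definition. The wrapper below is deliberately provider-neutral and hypothetical: each provider nests the schema under its own key names, so consult the relevant API reference for the exact request shape. `parse_tool_arguments` is also an assumed helper, not part of any SDK.

```python
import json

# JSON Schema for the contact fields used in the earlier prompt example.
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": ["string", "null"]},
        "interests": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email", "phone", "interests"],
    "additionalProperties": False,
}

# Hypothetical, provider-neutral tool definition. Real APIs use their own
# key names for the schema and may wrap this object differently.
EXTRACT_CONTACT_TOOL = {
    "name": "extract_contact",
    "description": "Record contact details found in the input text.",
    "parameters": CONTACT_SCHEMA,
}


def parse_tool_arguments(raw_arguments: str) -> dict:
    """Decode tool-call arguments and confirm the required keys are present."""
    args = json.loads(raw_arguments)
    missing = [key for key in CONTACT_SCHEMA["required"] if key not in args]
    if missing:
        raise ValueError(f"tool call is missing fields: {missing}")
    return args
```

Keeping a check like `parse_tool_arguments` in place is worthwhile even when the provider advertises schema enforcement, because it costs little and catches configuration mistakes.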
## Validation and Error Recovery
Never trust AI-structured output without validation. Implement schema validation that checks every response before processing it. When validation fails, retry with a more explicit prompt or error message. A common pattern: attempt parsing, on failure send the malformed output back to the model with instructions to fix it.
Build retry logic with escalating specificity. First attempt uses the standard prompt. If it fails, the retry includes the specific error and the malformed output. If that fails, a final attempt provides an even more constrained prompt with the exact template to fill in.
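The three-step escalation might be sketched as follows. `call_model` is a hypothetical callable that takes a prompt string and returns the model's reply as a string; substitute whatever client call your application uses. The retry wording is illustrative only.

```python
import json
from typing import Callable


def get_json_with_retries(
    call_model: Callable[[str], str], base_prompt: str, template: str
) -> dict:
    """Try up to three times, making the prompt more constrained each time."""
    prompt = base_prompt
    last_error = None
    for attempt in range(3):
        reply = call_model(prompt)
        try:
            parsed = json.loads(reply)
            if not isinstance(parsed, dict):
                raise ValueError("top-level value is not a JSON object")
            return parsed
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            last_error = exc
        if attempt == 0:
            # Second attempt: show the model its own output and the error.
            prompt = (
                f"{base_prompt}\n\nYour previous reply could not be parsed.\n"
                f"Error: {last_error}\nPrevious reply:\n{reply}\n"
                "Return only the corrected JSON object."
            )
        else:
            # Final attempt: fall back to a fill-in-the-blanks template.
            prompt = (
                f"Fill in this exact template and return nothing else:\n{template}\n\n"
                f"Original request:\n{base_prompt}"
            )
    raise RuntimeError(f"no valid JSON after 3 attempts: {last_error}")
```

Capping the attempts matters: each retry costs a model call, and an input that fails three increasingly constrained prompts is usually better routed to a fallback path or a human.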
## Consistent Enumeration Values
When outputs need specific enum values (categories, statuses, types), list all valid options explicitly and instruct the model to use only these exact strings. "Categorize each item as exactly one of: 'electronics', 'clothing', 'food', 'furniture', 'other'. Use lowercase. Do not create new categories."
Without this constraint, models will create variations ("Electronics" vs "electronics" vs "electronic devices") that break downstream processing.
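A matching check on the application side could look like the sketch below. The choice to forgive case and surrounding whitespace while rejecting anything else is an assumption about what your pipeline can tolerate; a stricter system might reject "Electronics" outright.

```python
VALID_CATEGORIES = {"electronics", "clothing", "food", "furniture", "other"}


def check_category(value: object) -> str:
    """Return the canonical category string or raise ValueError."""
    if not isinstance(value, str):
        raise ValueError(f"category must be a string, got {type(value).__name__}")
    # Forgive harmless formatting drift, but never guess at a new label.
    normalized = value.strip().lower()
    if normalized not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {value!r}")
    return normalized
```

With this in place, "Electronics" is accepted and normalized, while "electronic devices" is rejected rather than silently mapped to a guess.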
## Handling Uncertainty in Structured Data
Define how the model should represent uncertainty within your structure. Should missing data be null, empty string, "unknown", or omitted entirely? Inconsistent uncertainty representation causes parsing issues. Be explicit: "Use null for any field where the information is not available in the source text. Never use empty strings or placeholder text."
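Because models do not always follow the convention, it can help to normalize on the way in. The sketch below converts the common stand-ins for "missing" into `None`; the placeholder list is an assumption and should be adjusted to what your model actually emits.

```python
# Assumed set of stand-ins for "missing"; extend it based on real outputs.
PLACEHOLDERS = {"", "unknown", "n/a", "none", "null"}


def normalize_missing(record: dict, fields: list[str]) -> dict:
    """Return a copy of record where every expected field exists and
    missing values are represented uniformly as None."""
    cleaned = {}
    for field in fields:
        value = record.get(field)  # an omitted key becomes None
        if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            value = None
        cleaned[field] = value
    return cleaned
```

One caveat: a blanket placeholder list can erase legitimate values (a product genuinely named "None", for instance), so apply it only to fields where those strings cannot be real data.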
## Batch Processing
When processing multiple items into structured output, decide between one response containing all items versus one response per item. Batch responses are faster but risk partial failures corrupting the entire output. Per-item processing is more resilient but slower and more expensive.
For batch approaches, instruct the model to produce a JSON array and validate each element independently. If the array is malformed, you can often salvage individual valid elements rather than retrying the entire batch.
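One way to salvage elements from a malformed array is to scan for objects that still decode on their own. This is a sketch under a simplifying assumption: items are flat objects. If a broken item contains nested objects, those inner objects may be picked up as if they were items, which is why the `is_valid` predicate (a caller-supplied, hypothetical check) filters the results.

```python
import json
from typing import Callable


def salvage_objects(text: str) -> list[dict]:
    """Collect every JSON object that can still be decoded from text."""
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return found
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # this object is broken; keep scanning
            continue
        found.append(value)
        pos = end  # resume after the object that was just decoded


def salvage_batch(text: str, is_valid: Callable[[dict], bool]) -> list[dict]:
    """Keep only the salvaged objects that pass per-element validation."""
    return [obj for obj in salvage_objects(text) if is_valid(obj)]
```

Comparing the number of salvaged elements with the number of inputs tells you which items still need a retry.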
## Testing Structured Output Prompts
Test your structured output prompts with varied inputs, including edge cases: empty inputs, extremely long inputs, ambiguous data, and inputs in unexpected formats. Build a test suite that verifies both the structure (valid JSON, correct fields) and the content (accurate extraction, correct categorization) of outputs.
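A small harness along these lines can separate structural failures from content failures. It reuses the contact fields from the earlier prompt example; `run_prompt` is a hypothetical callable that sends input text through your prompt and returns the raw model reply.

```python
import json

REQUIRED_FIELDS = {"name", "email", "phone", "interests"}


def check_output(raw: str, expected: dict) -> list[str]:
    """Return failures for one reply, labelled as structure or content."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"structure: invalid JSON ({exc.msg})"]
    if not isinstance(data, dict):
        return ["structure: top-level value is not an object"]
    failures = []
    if set(data) != REQUIRED_FIELDS:
        failures.append(f"structure: fields are {sorted(data)}")
    for field, want in expected.items():
        if data.get(field) != want:
            failures.append(
                f"content: {field} is {data.get(field)!r}, expected {want!r}"
            )
    return failures


def run_suite(run_prompt, cases):
    """Run (input_text, expected) cases and map failing inputs to failures."""
    report = {}
    for text, expected in cases:
        failures = check_output(run_prompt(text), expected)
        if failures:
            report[text] = failures
    return report
```

The `cases` list is where the edge cases belong: an empty string, a very long input, an ambiguous record, each paired with the output you expect.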
## Production Patterns
In production, combine prompt-based structuring with API-level constraints (function calling), application-level validation (schema checking), and error recovery (retry logic). This defense-in-depth approach ensures that your application handles structured output failures gracefully rather than crashing on the inevitable malformed response.