OpenAI JSONL Format Guide
Everything you need to know about the JSONL format used by OpenAI for fine-tuning models and the Batch API. Includes format specifications, code examples, and common pitfalls.
Last updated: February 2026
What is the OpenAI JSONL Format?
OpenAI uses JSONL (JSON Lines) as the standard file format for fine-tuning datasets and Batch API requests. Each line in the file is a complete, independent JSON object: no wrapping array, no commas between lines.
This format is chosen because it allows efficient streaming and line-by-line processing. Each training example or API request can be validated independently, and files can be processed without loading the entire dataset into memory.
Understanding the exact format requirements is critical. Even small formatting errors, such as a trailing comma or a missing field, will cause the entire file to be rejected.
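The line-by-line processing described above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the helper name `iter_jsonl` is hypothetical, and it skips blank lines on the assumption that they carry no data.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[tuple[int, dict]]:
    """Yield (line_number, record) pairs, reading one line at a time."""
    for line_number, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # blank line: nothing to parse
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the offending line instead of failing without context.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        yield line_number, record


# An in-memory stream stands in for an open file here.
sample = io.StringIO('{"a": 1}\n{"b": 2}\n')
records = list(iter_jsonl(sample))
print(records)  # [(1, {'a': 1}), (2, {'b': 2})]
```

Because the function is a generator, only one line is held in memory at a time, which is the property that makes JSONL practical for large datasets.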
Fine-tuning JSONL Format
For chat model fine-tuning (GPT-4o, GPT-4o-mini, GPT-3.5 Turbo), each line must contain a "messages" array with the conversation turns.
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of France?"},{"role":"assistant","content":"The capital of France is Paris."}]}
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"2+2 equals 4."}]}
Required Fields
- messages: Array of message objects (required)
- role: One of "system", "user", or "assistant" (required)
- content: The text content of the message (required)
The system message is optional but recommended. Each conversation must have at least one user message and one assistant message. The assistant message is what the model learns to generate.
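The field rules above can be checked per line before uploading. The following is a rough sketch that covers only the rules stated in this guide; the function name `check_example` and its messages are hypothetical, and OpenAI's own validation may apply further checks.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def check_example(line: str) -> list[str]:
    """Return the problems found in one fine-tuning line (empty list if none)."""
    try:
        example = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(example, dict):
        return ["line is not a JSON object"]
    messages = example.get("messages")
    if not isinstance(messages, list):
        return ['missing "messages" array']

    problems = []
    roles = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: not an object")
            continue
        role = message.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            problems.append(f"message {i}: invalid or missing role")
        if "content" not in message:
            problems.append(f"message {i}: missing content")
        roles.append(role)
    # Each conversation needs at least one user and one assistant message.
    for required in ("user", "assistant"):
        if required not in roles:
            problems.append(f"no {required} message")
    return problems


good = '{"messages":[{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"2+2 equals 4."}]}'
bad = '{"messages":[{"role":"user"}]}'
print(check_example(good))  # []
print(check_example(bad))   # ['message 0: missing content', 'no assistant message']
```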
Batch API JSONL Format
The Batch API uses a different JSONL format where each line is an API request with a custom ID.
{"custom_id":"request-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello, how are you?"}]}}
{"custom_id":"request-2","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"What is the weather today?"}]}}
Required Fields
- custom_id: A unique identifier for each request (required)
- method: HTTP method, typically "POST" (required)
- url: API endpoint path (required)
- body: The request body, same as a regular API call (required)
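A Batch API file with these four fields can be generated from a list of prompts. The sketch below is illustrative only: `build_batch_lines` and `find_duplicate_ids` are hypothetical helper names, and the `request-N` ID scheme simply mirrors the sample above.

```python
import json


def build_batch_lines(prompts: list[str], model: str = "gpt-4o-mini") -> list[str]:
    """Build one Batch API request line per prompt."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{index}",  # must be unique within the file
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


def find_duplicate_ids(lines: list[str]) -> set[str]:
    """Return every custom_id that appears more than once."""
    seen, duplicates = set(), set()
    for line in lines:
        custom_id = json.loads(line)["custom_id"]
        if custom_id in seen:
            duplicates.add(custom_id)
        seen.add(custom_id)
    return duplicates


batch_lines = build_batch_lines(["Hello, how are you?", "What is the weather today?"])
batch_text = "\n".join(batch_lines) + "\n"  # ready to write with encoding="utf-8"
print(batch_text, end="")
print(find_duplicate_ids(batch_lines))  # set()
```

Checking for duplicate IDs before upload is worthwhile because `custom_id` is the only way to match each result back to its request.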
Format Requirements
Follow these rules to ensure your JSONL file is accepted by OpenAI:
- Each line must be valid JSON, with no syntax errors
- Each line must be a JSON object (starts with '{' and ends with '}')
- Fine-tuning files must include a "messages" array in each line
- Each message must have both "role" and "content" fields
- Valid roles are: "system", "user", and "assistant"
- File must be UTF-8 encoded without BOM (Byte Order Mark)
- No trailing commas, comments, or extra whitespace between lines
- Empty lines are allowed and will be ignored
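Several of the file-level rules above (encoding, BOM, one JSON object per line) can be checked on the raw bytes before upload. This is a minimal sketch under those stated rules; `check_jsonl_bytes` is a hypothetical name, and the check does not replace OpenAI's server-side validation.

```python
import json

UTF8_BOM = b"\xef\xbb\xbf"


def check_jsonl_bytes(data: bytes) -> list[str]:
    """Return file-level problems found in raw JSONL bytes (empty list if none)."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 BOM")
        data = data[len(UTF8_BOM):]  # keep checking the rest of the file
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # empty lines are skipped
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(value, dict):
            problems.append(f"line {line_number}: not a JSON object")
    return problems


# A BOM, a bare array, and a trailing comma: three separate problems.
sample = b'\xef\xbb\xbf{"messages": []}\n[1, 2]\n{"a": 1,}\n'
for problem in check_jsonl_bytes(sample):
    print(problem)
```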
Common Mistakes
These are the most frequent errors when creating OpenAI JSONL files:
Using a JSON array instead of JSONL
Wrong: wrapping all objects in [ ]. JSONL files must have one object per line with no wrapping array.
Wrong:
[{"messages":[...]}, {"messages":[...]}]
Correct:
{"messages":[...]}
{"messages":[...]}
Missing required fields
Every message must have both "role" and "content". Omitting either will cause validation to fail.
Wrong:
{"messages":[{"role":"user"}]}
Correct:
{"messages":[{"role":"user","content":"Hello"}]}
Trailing commas in JSON
JSON does not allow trailing commas after the last item in an array or object.
Wrong:
{"messages":[{"role":"user","content":"Hi",}]}
Correct:
{"messages":[{"role":"user","content":"Hi"}]}
BOM characters or wrong encoding
Save your file as UTF-8 without BOM. Some text editors add invisible BOM characters that break JSON parsing.
Wrong:
\uFEFF{"messages":[...]}
Correct:
{"messages":[...]}
Code Examples
Here's how to create OpenAI JSONL files programmatically:
Python:
import json

training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is JSONL?"},
            {"role": "assistant", "content": "JSONL (JSON Lines) is a text format where each line is a valid JSON object."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "How do I fine-tune a model?"},
            {"role": "assistant", "content": "Prepare a JSONL file with training examples, then use the OpenAI fine-tuning API."},
        ]
    },
]

# Write one JSON object per line, UTF-8 encoded, each followed by a newline.
with open("training.jsonl", "w", encoding="utf-8") as f:
    for entry in training_data:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

print(f"Created training.jsonl with {len(training_data)} examples")
JavaScript (Node.js):
const fs = require('fs');

const trainingData = [
  {
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is JSONL?' },
      { role: 'assistant', content: 'JSONL (JSON Lines) is a text format where each line is a valid JSON object.' },
    ]
  },
  {
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'How do I fine-tune a model?' },
      { role: 'assistant', content: 'Prepare a JSONL file with training examples, then use the OpenAI fine-tuning API.' },
    ]
  },
];

// Serialize each example onto its own line and end the file with a newline.
const jsonl = trainingData.map(d => JSON.stringify(d)).join('\n');
fs.writeFileSync('training.jsonl', jsonl + '\n', 'utf-8');

console.log(`Created training.jsonl with ${trainingData.length} examples`);
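
Two of the common mistakes listed earlier, a wrapping JSON array and a leading BOM, can often be repaired mechanically. The Python sketch below shows one possible approach; `array_text_to_jsonl` is a hypothetical helper, and it assumes the input is a single JSON array of objects.

```python
import json


def array_text_to_jsonl(text: str) -> str:
    """Convert the text of a JSON array into JSONL text, one element per line."""
    text = text.lstrip("\ufeff")  # drop a leading BOM if one was decoded
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    return "".join(json.dumps(item, ensure_ascii=False) + "\n" for item in items)


wrapped = '\ufeff[{"messages": []}, {"messages": []}]'
print(array_text_to_jsonl(wrapped), end="")
# {"messages": []}
# {"messages": []}
```

When reading from disk, opening the file with `encoding="utf-8-sig"` makes Python strip a BOM automatically, so the `lstrip` call only matters for text obtained some other way.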