JSONL in Python: Read, Write & Parse

A complete guide to working with JSONL (JSON Lines) files in Python. Learn to read, write, parse, and stream JSONL data using built-in modules, pandas, and high-performance libraries.

Last updated: February 2026

Why Python for JSONL?

Python is the most popular language for working with JSONL files, and for good reason. Its built-in json module handles JSON parsing out of the box, file iteration is memory-efficient by default, and the ecosystem offers powerful libraries like pandas and orjson for specialized workflows. Whether you are processing machine learning datasets, application logs, or API responses, Python makes JSONL handling straightforward.

JSONL (JSON Lines) stores one JSON object per line, making it ideal for streaming, append-only logging, and processing large datasets without loading everything into memory. Python's line-by-line file reading aligns perfectly with this format. In this guide, you will learn three approaches to reading JSONL, two approaches to writing it, and how to handle files that are too large to fit in memory.
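The append-only property is easy to see in code. A minimal sketch (the file name `events.jsonl` and the event payload are placeholders):

```python
import json

# Appending to a JSONL log: each write adds one complete line,
# so existing records are never touched or re-read.
event = {"event": "login", "user": "alice"}
with open('events.jsonl', 'a', encoding='utf-8') as f:
    f.write(json.dumps(event, ensure_ascii=False) + '\n')
```

Because every record is a self-contained line, concurrent readers can safely tail the file while a writer appends to it.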

Reading JSONL Files in Python

There are several ways to read JSONL files in Python, each suited to different use cases. The standard json module works for most scenarios, pandas is convenient for tabular analysis, and generators are best for large files.

The simplest approach uses Python's built-in json module. Open the file, iterate line by line, and parse each line with json.loads(). This loads all records into a list in memory.

Basic Reading with json Module
import json

records = []
with open('data.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # Skip empty lines
            records.append(json.loads(line))

print(f'Loaded {len(records)} records')
print(records[0])

If your JSONL data is tabular (same keys in every record), pandas can read it directly into a DataFrame with a single function call. This is the fastest way to start analyzing structured JSONL data.

Reading with pandas
import pandas as pd

# Read entire file into a DataFrame
df = pd.read_json('data.jsonl', lines=True)
print(df.head())
print(f'Shape: {df.shape}')

# For large files, read in chunks
chunks = pd.read_json('large.jsonl', lines=True, chunksize=10000)
for chunk in chunks:
    # Process each chunk (DataFrame)
    print(f'Chunk shape: {chunk.shape}')

For files that are too large to fit in memory, use a generator function. This yields one record at a time, keeping memory usage constant regardless of file size. This is the recommended pattern for production data pipelines.

Generator Pattern for Large Files
import json
from typing import Iterator, Any

def read_jsonl(path: str) -> Iterator[dict[str, Any]]:
    """Read a JSONL file lazily, yielding one record at a time."""
    with open(path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f'Skipping invalid JSON at line {line_num}: {e}')

# Process records one at a time
for record in read_jsonl('large_data.jsonl'):
    process(record)  # Only one record in memory at a time

Writing JSONL Files in Python

Writing JSONL files is straightforward: serialize each record as a JSON string and append a newline character. The key rule is one JSON object per line, with no trailing commas or wrapping array.

Use json.dumps() to serialize each record, then write it followed by a newline. Set ensure_ascii=False to preserve Unicode characters like Chinese, Japanese, or emoji in the output.

Basic Writing with json Module
import json

records = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]

with open('output.jsonl', 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

print(f'Wrote {len(records)} records to output.jsonl')

If your data is already in a pandas DataFrame, use to_json() with orient='records' and lines=True to export it directly as JSONL. This is the inverse of pd.read_json() with lines=True.

Writing with pandas
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
])

# Write DataFrame to JSONL
df.to_json('output.jsonl', orient='records', lines=True, force_ascii=False)
print(f'Wrote {len(df)} records to output.jsonl')

Python Libraries for JSONL

Python offers several JSON parsing libraries with different performance characteristics. Choosing the right one depends on your file size and performance requirements.

json (stdlib)

Built-in

Python's built-in json module requires no installation and works everywhere. It is sufficient for most JSONL workloads and supports all standard JSON types. Performance is adequate for files up to a few hundred MB.

orjson

Fastest

orjson is the fastest Python JSON library, written in Rust. It provides 2-10x faster parsing and serialization compared to the standard json module. It outputs bytes instead of strings and natively supports dataclasses, datetime, numpy, and UUID types.
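To see how orjson slots into a JSONL reading loop, here is a minimal sketch with a stdlib fallback for environments where orjson is not installed (the `parse_line` helper is illustrative, not part of either library):

```python
import json

try:
    import orjson  # pip install orjson

    def parse_line(line: str) -> dict:
        # orjson.loads accepts str or bytes and returns Python objects
        return orjson.loads(line)
except ImportError:
    def parse_line(line: str) -> dict:
        # Stdlib fallback with the same call shape
        return json.loads(line)

record = parse_line('{"id": 1, "name": "Alice"}')
print(record['name'])  # prints "Alice"
```

Note that on the serialization side orjson.dumps() returns bytes, so you must call .decode('utf-8') (or open the output file in binary mode) before writing lines to a text-mode file.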

ujson

Fast

ujson (UltraJSON) is a C-based JSON library that is 2-5x faster than the standard json module. Its API is nearly identical to the built-in json module, making it a drop-in replacement. A good middle ground between compatibility and speed.
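Because the API mirrors the stdlib, swapping ujson in can be a one-line change at the import. A minimal sketch with a stdlib fallback for environments where ujson is absent:

```python
import json

try:
    import ujson as jsonlib  # pip install ujson; API mirrors the stdlib json module
except ImportError:
    jsonlib = json  # fall back to the built-in module

# The rest of the code calls jsonlib.loads / jsonlib.dumps as usual
lines = ['{"id": 1}', '{"id": 2}']
records = [jsonlib.loads(line) for line in lines]
print(records)
```

This aliasing pattern keeps the speedup optional: the code runs unchanged whether or not the faster library is available.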

Streaming Large JSONL Files

When processing JSONL files that are gigabytes in size, you need a streaming approach that reads, transforms, and writes data in batches. This keeps memory usage constant and provides progress tracking.

Streaming Large JSONL Files
import json

def process_large_jsonl(
    input_path: str,
    output_path: str,
    batch_size: int = 1000,
) -> int:
    """Stream-process a large JSONL file in batches."""
    processed = 0
    batch: list[dict] = []
    with open(input_path, 'r', encoding='utf-8') as fin, \
         open(output_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Transform the record
            record['processed'] = True
            batch.append(record)
            if len(batch) >= batch_size:
                for r in batch:
                    fout.write(json.dumps(r, ensure_ascii=False) + '\n')
                processed += len(batch)
                batch.clear()
                print(f'\rProcessed {processed} records...', end='')
        # Write remaining records
        for r in batch:
            fout.write(json.dumps(r, ensure_ascii=False) + '\n')
        processed += len(batch)
    print(f'\nDone. Processed {processed} records total.')
    return processed

# Usage
process_large_jsonl('input.jsonl', 'output.jsonl', batch_size=5000)

This pattern uses constant memory by processing records in fixed-size batches. The batch_size parameter controls the trade-off between memory usage and I/O efficiency. For most systems, batches of 1,000 to 10,000 records work well. The progress indicator helps monitor long-running jobs.

Try Our Free JSONL Tools

Don't want to write code? Use our free online tools to view, validate, and convert JSONL files right in your browser.


Frequently Asked Questions
