JSONL in Python: Read, Write & Parse
A complete guide to working with JSONL (JSON Lines) files in Python. Learn to read, write, parse, and stream JSONL data using built-in modules, pandas, and high-performance libraries.
Last updated: February 2026
Why Python for JSONL?
Python is the most popular language for working with JSONL files, and for good reason. Its built-in json module handles JSON parsing out of the box, file iteration is memory-efficient by default, and the ecosystem offers powerful libraries like pandas and orjson for specialized workflows. Whether you are processing machine learning datasets, application logs, or API responses, Python makes JSONL handling straightforward.
JSONL (JSON Lines) stores one JSON object per line, making it ideal for streaming, append-only logging, and processing large datasets without loading everything into memory. Python's line-by-line file reading aligns perfectly with this format. In this guide, you will learn three approaches to reading JSONL, two approaches to writing it, and how to handle files that are too large to fit in memory.
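To make the format concrete, here is a minimal sketch showing what JSONL data looks like and why each line stands alone (the field names are invented for the example):

```python
import json

# Three records in JSONL form: one JSON object per line, no wrapping array
jsonl_text = (
    '{"event": "login", "user": "alice"}\n'
    '{"event": "click", "user": "bob"}\n'
    '{"event": "logout", "user": "alice"}\n'
)

# Each line parses independently, which is what makes streaming
# and append-only logging possible
records = [json.loads(line) for line in jsonl_text.splitlines() if line]
print(records[0]["event"])  # login
```

Because no line depends on any other, a producer can append new records without rewriting the file, and a consumer can start parsing from any line boundary.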
Reading JSONL Files in Python
There are several ways to read JSONL files in Python, each suited to different use cases. The standard json module works for most scenarios, pandas is convenient for tabular analysis, and generators are best for large files.
The simplest approach uses Python's built-in json module. Open the file, iterate line by line, and parse each line with json.loads(). This loads all records into a list in memory.
import json

records = []
with open('data.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # Skip empty lines
            records.append(json.loads(line))

print(f'Loaded {len(records)} records')
print(records[0])
If your JSONL data is tabular (same keys in every record), pandas can read it directly into a DataFrame with a single function call. This is the fastest way to start analyzing structured JSONL data.
import pandas as pd

# Read entire file into a DataFrame
df = pd.read_json('data.jsonl', lines=True)
print(df.head())
print(f'Shape: {df.shape}')

# For large files, read in chunks
chunks = pd.read_json('large.jsonl', lines=True, chunksize=10000)
for chunk in chunks:
    # Process each chunk (DataFrame)
    print(f'Chunk shape: {chunk.shape}')
For files that are too large to fit in memory, use a generator function. This yields one record at a time, keeping memory usage constant regardless of file size. This is the recommended pattern for production data pipelines.
import json
from typing import Iterator, Any

def read_jsonl(path: str) -> Iterator[dict[str, Any]]:
    """Read a JSONL file lazily, yielding one record at a time."""
    with open(path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f'Skipping invalid JSON at line {line_num}: {e}')

# Process records one at a time
for record in read_jsonl('large_data.jsonl'):
    process(record)  # Only one record in memory at a time
Writing JSONL Files in Python
Writing JSONL files is straightforward: serialize each record as a JSON string and append a newline character. The key rule is one JSON object per line, with no trailing commas or wrapping array.
Use json.dumps() to serialize each record, then write it followed by a newline. Set ensure_ascii=False to preserve Unicode characters like Chinese, Japanese, or emoji in the output.
import json

records = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]

with open('output.jsonl', 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

print(f'Wrote {len(records)} records to output.jsonl')
If your data is already in a pandas DataFrame, use to_json() with orient='records' and lines=True to export it directly as JSONL. This is the inverse of pd.read_json() with lines=True.
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
])

# Write DataFrame to JSONL
df.to_json('output.jsonl', orient='records', lines=True, force_ascii=False)
print(f'Wrote {len(df)} records to output.jsonl')
Python Libraries for JSONL
Python offers several JSON parsing libraries with different performance characteristics. Choosing the right one depends on your file size and performance requirements.
json (stdlib)
Built-in. Python's built-in json module requires no installation and works everywhere. It is sufficient for most JSONL workloads and supports all standard JSON types. Performance is adequate for files up to a few hundred MB.
orjson
Fastest. orjson is the fastest Python JSON library, written in Rust. It provides 2-10x faster parsing and serialization compared to the standard json module. It outputs bytes instead of strings and natively supports dataclasses, datetime, numpy, and UUID types.
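A minimal sketch of using orjson for JSONL parsing, with a fallback to the stdlib json module when orjson is not installed; note that orjson.dumps() returns bytes, so it is decoded here for text-mode writes:

```python
import json

try:
    import orjson  # pip install orjson

    def loads(s):
        return orjson.loads(s)

    def dumps(obj) -> str:
        # orjson.dumps() returns bytes; decode before writing to a text file
        return orjson.dumps(obj).decode('utf-8')
except ImportError:
    # Fall back to the stdlib so the code runs without orjson
    loads, dumps = json.loads, json.dumps

line = '{"id": 1, "name": "Alice"}'
record = loads(line)
serialized = dumps(record)
print(serialized)
```

The wrapper functions keep the rest of a pipeline identical regardless of which backend is available; only the decode step is orjson-specific.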
ujson
Fast. ujson (UltraJSON) is a C-based JSON library that is 2-5x faster than the standard json module. Its API is nearly identical to the built-in json module, making it a drop-in replacement. A good middle ground between compatibility and speed.
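Because the API mirrors the stdlib, swapping in ujson is typically a one-line change at import time. A sketch with a fallback so the code runs either way:

```python
try:
    import ujson as jsonlib  # pip install ujson
except ImportError:
    import json as jsonlib  # stdlib fallback; same loads/dumps call shape

lines = [
    '{"id": 1, "name": "Alice"}',
    '{"id": 2, "name": "Bob"}',
]
records = [jsonlib.loads(line) for line in lines]
print(len(records))
```

Importing under an alias like this lets you switch backends in one place without touching the parsing code.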
Streaming Large JSONL Files
When processing JSONL files that are gigabytes in size, you need a streaming approach that reads, transforms, and writes data in batches. This keeps memory usage constant and provides progress tracking.
import json

def process_large_jsonl(
    input_path: str,
    output_path: str,
    batch_size: int = 1000,
) -> int:
    """Stream-process a large JSONL file in batches."""
    processed = 0
    batch: list[dict] = []
    with open(input_path, 'r') as fin, \
         open(output_path, 'w') as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Transform the record
            record['processed'] = True
            batch.append(record)
            if len(batch) >= batch_size:
                for r in batch:
                    fout.write(json.dumps(r) + '\n')
                processed += len(batch)
                batch.clear()
                print(f'\rProcessed {processed} records...', end='')
        # Write remaining records
        for r in batch:
            fout.write(json.dumps(r) + '\n')
        processed += len(batch)
    print(f'\nDone. Processed {processed} records total.')
    return processed

# Usage
process_large_jsonl('input.jsonl', 'output.jsonl', batch_size=5000)
This pattern uses constant memory by processing records in fixed-size batches. The batch_size parameter controls the trade-off between memory usage and I/O efficiency. For most systems, batches of 1,000 to 10,000 records work well. The progress indicator helps monitor long-running jobs.
Try Our Free JSONL Tools
Don't want to write code? Use our free online tools to view, validate, and convert JSONL files right in your browser.