How to Handle Large JSONL Files (1GB+)

Strategies and best practices for processing gigabyte-scale JSONL data efficiently

Last updated: February 2026

Why Large JSONL Files Need Special Handling

When JSONL files grow beyond a few hundred megabytes, loading them entirely into memory becomes impractical. A 1GB JSONL file with complex nested objects can consume 3-5GB of RAM when parsed into Python dictionaries or JavaScript objects. This can crash your application or bring your system to a halt.

The key advantage of JSONL over regular JSON is that it can be processed line by line. Each line is an independent JSON document, which means you never need to load the entire file. This streaming capability is what makes JSONL the preferred format for large datasets in machine learning, log analysis, and data engineering.
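That line-at-a-time property can be captured in a small generator. A minimal sketch (the function name `iter_jsonl` is an assumption for illustration, not a standard API):

```python
import json
from typing import Any, Iterator

def iter_jsonl(filepath: str) -> Iterator[Any]:
    """Lazily yield one parsed record per line.

    Only the current line is held in memory, so memory use
    stays constant no matter how large the file is.
    """
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because the generator yields records as it reads, downstream code can filter or transform them without ever materializing the whole file.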

Stream Reading Strategies

The fundamental approach to handling large JSONL files is to read them line by line, processing each record independently. Here are implementations in popular languages and tools.

Python's file iteration is inherently memory-efficient. The for loop reads one line at a time from disk, keeping memory usage constant regardless of file size.

Python
import json

def process_large_jsonl(filepath: str) -> int:
    """Process a large JSONL file line by line."""
    count = 0
    errors = 0
    with open(filepath, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
                # Process your record here
                count += 1
            except json.JSONDecodeError as e:
                errors += 1
                print(f'Line {line_num}: {e}')
    print(f'Processed {count} records, {errors} errors')
    return count

Node.js readline interface provides an efficient way to process files line by line using streams, keeping memory usage minimal even for multi-gigabyte files.

Node.js
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

async function processLargeJsonl(filepath) {
  const rl = createInterface({
    input: createReadStream(filepath, 'utf-8'),
    crlfDelay: Infinity,
  });
  let count = 0;
  for await (const line of rl) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const record = JSON.parse(trimmed);
      // Process your record here
      count++;
    } catch (err) {
      console.error(`Parse error: ${err.message}`);
    }
  }
  console.log(`Processed ${count} records`);
}

Unix command-line tools are perfect for quick inspection and processing of large JSONL files without writing any code.

Command Line Tools
# Count lines in a JSONL file
wc -l data.jsonl
# View first 10 records
head -n 10 data.jsonl
# View last 5 records
tail -n 5 data.jsonl
# Pretty-print first record
head -n 1 data.jsonl | jq .
# Filter records with jq
jq -c 'select(.age > 30)' data.jsonl
# Extract specific fields
jq -c '{name, email}' data.jsonl

Memory Management Techniques

Beyond basic line-by-line reading, these techniques help you process large JSONL files more efficiently.

Process records in batches of 1,000-10,000 to balance memory usage with processing efficiency. This is especially useful when writing to databases or making API calls.

Batch Processing
import json
from typing import Iterator

def read_jsonl_batches(
    filepath: str,
    batch_size: int = 5000
) -> Iterator[list]:
    batch = []
    with open(filepath, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines instead of crashing on them
                continue
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

# Usage
for batch in read_jsonl_batches('large.jsonl'):
    # Insert batch into database
    db.insert_many(batch)

Monitor memory usage during processing to catch issues early and tune your batch size.

Memory Monitoring
import json
import psutil
import os

def process_with_monitoring(filepath: str):
    process = psutil.Process(os.getpid())
    with open(filepath, 'r') as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            # Process record
            if i % 100000 == 0:
                mem = process.memory_info().rss / 1024 / 1024
                print(f'Line {i:,}: {mem:.1f} MB')

Splitting Large JSONL Files

Sometimes you need to split a large JSONL file into smaller pieces for parallel processing, uploading to services with size limits, or distributing work across machines.

The Unix split command is the fastest way to split a JSONL file. It works directly with lines, making it perfect for JSONL.

Using the split Command
# Split into files of 100,000 lines each
split -l 100000 data.jsonl chunk_
# Split into files of roughly 100MB each without cutting a record mid-line (GNU split)
# Note: split -b splits on byte boundaries and can break a JSON record in half
split -C 100m data.jsonl chunk_
# Add .jsonl extension to split files
for f in chunk_*; do mv "$f" "$f.jsonl"; done
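Once the file is split into chunks, those chunks can be processed in parallel. A hedged sketch using Python's multiprocessing (the chunk-counting task and the `chunk_*.jsonl` pattern are assumptions for illustration):

```python
import glob
import json
from multiprocessing import Pool

def count_records(chunk_path: str) -> int:
    """Count valid JSON records in a single chunk file."""
    n = 0
    with open(chunk_path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises on invalid JSON
                n += 1
    return n

def process_chunks_in_parallel(pattern: str = 'chunk_*.jsonl', workers: int = 4) -> int:
    """Fan the chunk files out to a worker pool and combine the results."""
    paths = sorted(glob.glob(pattern))
    with Pool(workers) as pool:
        counts = pool.map(count_records, paths)
    return sum(counts)
```

Each worker reads its own chunk independently, so total throughput scales with the number of cores as long as disk I/O keeps up.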

For more control over splitting logic, such as splitting by a field value or balancing output sizes, use a short Python script.

Python Script
import json

def split_jsonl(input_path: str, lines_per_file: int = 100000):
    file_num = 0
    line_count = 0
    out_file = None
    with open(input_path, 'r') as f:
        for line in f:
            if line_count % lines_per_file == 0:
                if out_file:
                    out_file.close()
                file_num += 1
                out_file = open(f'part_{file_num:04d}.jsonl', 'w')
            out_file.write(line)
            line_count += 1
    if out_file:
        out_file.close()
    print(f'Split into {file_num} files')
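To split by a field value instead of by line count, the same pattern works with one output file per value. A sketch (the field name, the output naming scheme, and the `unknown` fallback are assumptions):

```python
import json

def split_jsonl_by_field(input_path: str, field: str) -> dict:
    """Route each record to an output file named after its value for `field`."""
    outputs = {}  # field value -> open file handle
    counts = {}   # field value -> number of records written
    try:
        with open(input_path, 'r', encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                key = str(json.loads(line).get(field, 'unknown'))
                if key not in outputs:
                    outputs[key] = open(f'{field}_{key}.jsonl', 'w', encoding='utf-8')
                    counts[key] = 0
                outputs[key].write(line + '\n')
                counts[key] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

Keeping one handle open per distinct value avoids reopening files on every record; if the field has very high cardinality, consider closing handles least-recently-used to stay under the OS file-descriptor limit.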

Compression Strategies

JSONL files compress extremely well because JSON text has high redundancy. Compression can reduce file sizes by 70-90%, saving storage and speeding up transfers.

Python's gzip module transparently handles compressed JSONL files. The .gz extension is a convention that tools recognize automatically.

Reading and Writing Gzipped JSONL
import gzip
import json

# Reading gzipped JSONL
with gzip.open('data.jsonl.gz', 'rt', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)
        # Process record

# Writing gzipped JSONL
with gzip.open('output.jsonl.gz', 'wt', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')

Compression Comparison

Typical compression ratios for a 1GB JSONL file with mixed data:

gzip: 70-80% reduction (1GB to 200-300MB), widely supported

zstd: 75-85% reduction (1GB to 150-250MB), faster decompression

lz4: 60-70% reduction (1GB to 300-400MB), fastest speed

No compression: Fastest access, best for frequent random reads
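To measure the ratio on your own data, gzip from the standard library is enough for a quick check (zstd and lz4 require third-party packages such as `zstandard` and `lz4`; the helper below is an illustrative sketch, not a benchmark):

```python
import gzip
import json

def gzip_ratio(records: list) -> float:
    """Return the fraction of size saved by gzip-compressing JSONL text."""
    raw = ''.join(json.dumps(r) + '\n' for r in records).encode('utf-8')
    compressed = gzip.compress(raw)
    return 1 - len(compressed) / len(raw)
```

Repeated keys and similar record structures are exactly the redundancy gzip exploits, which is why JSONL with a consistent schema lands at the high end of the ranges above.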

Processing Large Files in the Browser

jsonl.co is designed to handle JSONL files up to 1GB+ directly in your browser. It uses streaming and Web Workers to process files locally without uploading them to any server.

This means your data stays private and you get instant results without waiting for uploads. The viewer can display millions of records with virtual scrolling, and all conversion tools support streaming for large files.

Try Our Free JSONL Tools

View, validate, and convert large JSONL files right in your browser. No uploads, no file size limits, 100% private.


Frequently Asked Questions
