How to Handle Large JSONL Files (1GB+)
Strategies and best practices for processing gigabyte-scale JSONL data efficiently
Last updated: February 2026
Why Large JSONL Files Need Special Handling
When JSONL files grow beyond a few hundred megabytes, loading them entirely into memory becomes impractical. A 1GB JSONL file with complex nested objects can consume 3-5GB of RAM when parsed into Python dictionaries or JavaScript objects. This can crash your application or bring your system to a halt.
The key advantage of JSONL over regular JSON is that it can be processed line by line. Each line is an independent JSON document, which means you never need to load the entire file. This streaming capability is what makes JSONL the preferred format for large datasets in machine learning, log analysis, and data engineering.
Stream Reading Strategies
The fundamental approach to handling large JSONL files is to read them line by line, processing each record independently. Here are implementations in popular languages and tools.
Python's file iteration is inherently memory-efficient. The for loop reads one line at a time from disk, keeping memory usage constant regardless of file size.
import json

def process_large_jsonl(filepath: str) -> int:
    """Process a large JSONL file line by line."""
    count = 0
    errors = 0
    with open(filepath, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
                # Process your record here
                count += 1
            except json.JSONDecodeError as e:
                errors += 1
                print(f'Line {line_num}: {e}')
    print(f'Processed {count} records, {errors} errors')
    return count
Node.js readline interface provides an efficient way to process files line by line using streams, keeping memory usage minimal even for multi-gigabyte files.
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

async function processLargeJsonl(filepath) {
  const rl = createInterface({
    input: createReadStream(filepath, 'utf-8'),
    crlfDelay: Infinity,
  });

  let count = 0;
  for await (const line of rl) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const record = JSON.parse(trimmed);
      // Process your record here
      count++;
    } catch (err) {
      console.error(`Parse error: ${err.message}`);
    }
  }
  console.log(`Processed ${count} records`);
}
Unix command-line tools are perfect for quick inspection and processing of large JSONL files without writing any code.
# Count lines in a JSONL file
wc -l data.jsonl

# View first 10 records
head -n 10 data.jsonl

# View last 5 records
tail -n 5 data.jsonl

# Pretty-print the first record
head -n 1 data.jsonl | jq .

# Filter records with jq
jq -c 'select(.age > 30)' data.jsonl

# Extract specific fields (quote the braces so the shell passes them to jq)
jq -c '{name, email}' data.jsonl
Memory Management Techniques
Beyond basic line-by-line reading, these techniques help you process large JSONL files more efficiently.
Process records in batches of 1,000-10,000 to balance memory usage with processing efficiency. This is especially useful when writing to databases or making API calls.
import json
from typing import Iterator

def read_jsonl_batches(filepath: str,
                       batch_size: int = 5000) -> Iterator[list]:
    batch = []
    with open(filepath, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines instead of crashing on them
                continue
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

# Usage
for batch in read_jsonl_batches('large.jsonl'):
    # Insert batch into database
    db.insert_many(batch)
Monitor memory usage during processing to catch issues early and tune your batch size.
import json
import os

import psutil

def process_with_monitoring(filepath: str):
    process = psutil.Process(os.getpid())
    with open(filepath, 'r') as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            # Process record
            if i % 100000 == 0:
                mem = process.memory_info().rss / 1024 / 1024
                print(f'Line {i:,}: {mem:.1f} MB')
Splitting Large JSONL Files
Sometimes you need to split a large JSONL file into smaller pieces for parallel processing, uploading to services with size limits, or distributing work across machines.
The Unix split command is the fastest way to split a JSONL file. It works directly with lines, making it perfect for JSONL.
# Split into files of 100,000 lines each
split -l 100000 data.jsonl chunk_

# Split into files of roughly 100MB each; -C keeps lines intact,
# whereas -b splits at byte boundaries and can cut a record in half
split -C 100m data.jsonl chunk_

# Add .jsonl extension to split files
for f in chunk_*; do mv "$f" "$f.jsonl"; done
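Once a file is split, the chunks can be processed in parallel. A minimal sketch using Python's multiprocessing module, with a record-counting worker standing in for whatever per-chunk work you actually need (it assumes the chunks were renamed with a .jsonl extension as above):

```python
import glob
import json
from multiprocessing import Pool

def count_records(path: str) -> int:
    """Worker: parse one chunk and count its records (stand-in for real work)."""
    count = 0
    with open(path, 'r') as f:
        for line in f:
            if line.strip():
                json.loads(line)
                count += 1
    return count

if __name__ == '__main__':
    chunks = sorted(glob.glob('chunk_*.jsonl'))
    with Pool() as pool:  # one worker process per CPU core by default
        totals = pool.map(count_records, chunks)
    print(f'Total records across {len(chunks)} chunks: {sum(totals)}')
```

Because each chunk is handled by an independent process, memory usage is bounded per chunk rather than per file.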
For more control over the splitting logic, such as splitting by a field value or keeping output sizes balanced, a short Python script works well.
def split_jsonl(input_path: str, lines_per_file: int = 100000):
    file_num = 0
    line_count = 0
    out_file = None
    with open(input_path, 'r') as f:
        for line in f:
            if line_count % lines_per_file == 0:
                if out_file:
                    out_file.close()
                file_num += 1
                out_file = open(f'part_{file_num:04d}.jsonl', 'w')
            out_file.write(line)
            line_count += 1
    if out_file:
        out_file.close()
    print(f'Split into {file_num} files')
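The script above splits purely by line count. Splitting by a field value instead means keeping one open file handle per distinct key. A sketch, assuming each record carries a `category` field (a hypothetical name; substitute your own key):

```python
import json

def split_by_field(input_path: str, field: str = 'category'):
    """Write each record to an output file named after its value for `field`."""
    handles = {}
    try:
        with open(input_path, 'r') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                key = str(json.loads(line).get(field, 'unknown'))
                if key not in handles:
                    handles[key] = open(f'{field}_{key}.jsonl', 'w')
                handles[key].write(line + '\n')
    finally:
        for handle in handles.values():
            handle.close()
```

Note that this keeps one file open per distinct key, so it suits low-cardinality fields; for thousands of distinct values you would want to close and reopen handles in append mode.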
Compression Strategies
JSONL files compress extremely well because JSON text has high redundancy. Compression can reduce file sizes by 70-90%, saving storage and speeding up transfers.
Python's gzip module transparently handles compressed JSONL files. The .gz extension is a convention that tools recognize automatically.
import gzip
import json

# Reading gzipped JSONL
with gzip.open('data.jsonl.gz', 'rt', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)
        # Process record

# Writing gzipped JSONL
with gzip.open('output.jsonl.gz', 'wt', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')
Compression Comparison
Typical compression ratios for a 1GB JSONL file with mixed data:
gzip: 70-80% reduction (1GB to 200-300MB), widely supported
zstd: 75-85% reduction (1GB to 150-250MB), faster decompression
lz4: 60-70% reduction (1GB to 300-400MB), fastest speed
No compression: Fastest access, best for frequent random reads
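Before committing to a codec, you can estimate where your data falls in these ranges by compressing a sample of the file in memory. A sketch using only the standard-library gzip module (zstd and lz4 need third-party packages):

```python
import gzip

def estimate_gzip_ratio(filepath: str, sample_lines: int = 10000) -> float:
    """Estimate gzip's space savings by compressing a sample of the file."""
    raw = bytearray()
    with open(filepath, 'rb') as f:
        for i, line in enumerate(f):
            if i >= sample_lines:
                break
            raw.extend(line)
    if not raw:
        return 0.0
    compressed = gzip.compress(bytes(raw))
    return 1.0 - len(compressed) / len(raw)
```

A result of 0.75 suggests the full file would shrink by roughly 75%. If the data is not homogeneous, sampling from several offsets in the file gives a more reliable estimate than the first lines alone.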
Processing Large Files in the Browser
jsonl.co is designed to handle JSONL files of 1GB and beyond directly in your browser. It uses streaming and Web Workers to process files locally without uploading them to any server.
This means your data stays private and you get instant results without waiting for uploads. The viewer can display millions of records with virtual scrolling, and all conversion tools support streaming for large files.
Try Our Free JSONL Tools
View, validate, and convert large JSONL files right in your browser. No uploads, no file size limits, 100% private.