JSONL Compression: gzip vs zstd vs Brotli

A practical guide to compressing JSONL files. Compare compression ratios, speed benchmarks, and learn when to use gzip, zstd, or Brotli for your data pipelines, cloud storage, and web delivery.

Last updated: February 2026

Why Compress JSONL Files?

JSONL files grow fast. A single day of application logs can produce gigabytes of line-delimited JSON, and machine learning datasets routinely reach tens of gigabytes. Without compression, you pay more for storage, transfers take longer, and I/O becomes the bottleneck in your data pipeline. Compression is not optional at scale; it is a fundamental part of working with JSONL data efficiently.

The good news is that JSONL compresses exceptionally well. Because JSON is repetitive text with recurring keys, delimiters, and structural patterns, compression algorithms can exploit this redundancy to achieve 5x to 15x size reduction. The challenge is choosing the right algorithm for your use case: gzip offers universal compatibility, zstd delivers the best speed-to-ratio tradeoff, and Brotli achieves the highest compression for static assets. This guide compares all three with real benchmarks, working code examples, and clear recommendations.

Compression Algorithms Overview

Three algorithms dominate the JSONL compression landscape. Each uses different strategies and is optimized for different scenarios. Understanding their tradeoffs helps you make the right choice for your specific workload.

gzip (DEFLATE)

Universal

The universal standard. gzip has been around since 1992 and is supported everywhere: every programming language, every operating system, every cloud provider, and every web browser. It uses the DEFLATE algorithm, combining LZ77 and Huffman coding. While not the fastest or most efficient, its ubiquity makes it the safe default choice when compatibility matters most.

Zstandard (zstd)

Recommended

Developed by Facebook in 2016, zstd is the modern workhorse of data compression. It compresses and decompresses significantly faster than gzip while achieving similar or better ratios. Zstd also supports dictionary compression, which is especially powerful for JSONL files where every line shares the same key structure. It is the best choice for data pipelines and real-time processing.

Brotli

Best Ratio

Created by Google, Brotli achieves the highest compression ratios among the three, especially at maximum compression levels. It uses a combination of LZ77, Huffman coding, and a built-in static dictionary of common web content. Brotli excels at compressing JSONL for HTTP delivery and static storage, but its compression speed at high levels is notably slower than gzip or zstd.

Head-to-Head Comparison

The following table summarizes the key differences between gzip, zstd, and Brotli across the metrics that matter most when compressing JSONL files. These are general characteristics at default settings; actual performance varies with data and compression level.

| Metric | gzip | zstd | Brotli |
| --- | --- | --- | --- |
| Compression Ratio | Good (5-8x) | Very Good (6-10x) | Excellent (7-12x) |
| Compression Speed | Moderate | Fast | Slow to Moderate |
| Decompression Speed | Moderate | Very Fast | Fast |
| CPU Usage | Moderate | Low to Moderate | High (at max level) |
| Browser Support | All browsers | Chrome 123+, Firefox 126+ | All modern browsers |
| Streaming Support | Yes (native) | Yes (native) | Limited |

Benchmark Results: 100 MB JSONL File

To give concrete numbers, here are benchmark results from compressing a 100 MB JSONL file containing application log records. Each record has 12 fields including timestamps, log levels, message strings, and nested metadata objects. Tests were run on an AMD Ryzen 7 with 32 GB RAM and NVMe storage.

| Algorithm & Level | Compressed Size | Ratio | Compress Time | Decompress Time |
| --- | --- | --- | --- | --- |
| gzip (level 6) | 14.2 MB | 7.0x | 2.8s | 0.9s |
| gzip (level 9) | 13.1 MB | 7.6x | 8.4s | 0.9s |
| zstd (level 1) | 15.1 MB | 6.6x | 0.3s | 0.3s |
| zstd (level 3) | 12.8 MB | 7.8x | 0.6s | 0.3s |
| Brotli (level 6) | 11.5 MB | 8.7x | 3.2s | 0.5s |
| Brotli (level 11) | 9.8 MB | 10.2x | 42.1s | 0.4s |

Benchmarks are representative of typical JSONL log data. Results vary depending on field cardinality, value entropy, and record structure. Files with highly repetitive keys and low-entropy values (such as log levels or status codes) compress better than those with unique high-entropy strings.
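The entropy effect is easy to demonstrate with the standard library. The sketch below (field names and values are invented for illustration) compares gzip ratios for records with low-entropy values against records containing random tokens:

```python
import gzip
import json
import os

def gzip_ratio(lines):
    """Return original_size / compressed_size for a list of JSONL lines."""
    raw = "\n".join(lines).encode("utf-8")
    return len(raw) / len(gzip.compress(raw))

# Low-entropy records: repeated keys, small set of values (like log levels)
low = [json.dumps({"level": "info", "status": 200, "service": "api"})
       for _ in range(1000)]

# High-entropy records: same keys, but unique random values per line
high = [json.dumps({"level": "info", "status": 200,
                    "token": os.urandom(16).hex()})
        for _ in range(1000)]

print(f"low-entropy ratio:  {gzip_ratio(low):.1f}x")
print(f"high-entropy ratio: {gzip_ratio(high):.1f}x")
```

The low-entropy file compresses dramatically better because the compressor's window is full of near-identical lines; the random tokens leave only the shared keys to exploit.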

Compression Code Examples

Here are practical examples for compressing and decompressing JSONL files in Python, Node.js, and from the command line. Each example shows how to work with all three algorithms.

Python has built-in gzip support. For zstd and Brotli, install the pyzstd and brotli packages. All three follow the same pattern: open a compressed file handle, then read or write JSONL lines through it.

Python: gzip, zstd & Brotli
import gzip
import json

# records: an iterable of dicts to serialize (assumed defined elsewhere)

# === gzip (built-in) ===
# Write compressed JSONL
with gzip.open('data.jsonl.gz', 'wt', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

# Read compressed JSONL
with gzip.open('data.jsonl.gz', 'rt', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)

# === zstd (pip install pyzstd) ===
import pyzstd

# Write compressed JSONL
with pyzstd.open('data.jsonl.zst', 'wt', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

# Read compressed JSONL
with pyzstd.open('data.jsonl.zst', 'rt', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)

# === Brotli (pip install brotli) ===
import brotli

# Compress an entire JSONL file
with open('data.jsonl', 'rb') as f:
    raw = f.read()
compressed = brotli.compress(raw, quality=6)
with open('data.jsonl.br', 'wb') as f:
    f.write(compressed)

# Decompress
with open('data.jsonl.br', 'rb') as f:
    raw = brotli.decompress(f.read())
for line in raw.decode('utf-8').splitlines():
    record = json.loads(line)

Node.js includes built-in support for both gzip and Brotli through the zlib module. For zstd, use an npm package such as @aspect-build/zstd or fzstd (newer Node.js releases also expose experimental native zstd streams in zlib). The stream-based API is ideal for processing large JSONL files without loading them entirely into memory.

Node.js: zlib gzip & Brotli
import { createReadStream, createWriteStream } from 'fs';
import { createGzip, createGunzip, createBrotliCompress,
         createBrotliDecompress } from 'zlib';
import { createInterface } from 'readline';
import { pipeline } from 'stream/promises';

// === gzip compress ===
await pipeline(
  createReadStream('data.jsonl'),
  createGzip({ level: 6 }),
  createWriteStream('data.jsonl.gz')
);

// === gzip decompress & parse ===
const gunzip = createGunzip();
const rl = createInterface({
  input: createReadStream('data.jsonl.gz').pipe(gunzip),
});
for await (const line of rl) {
  if (line.trim()) {
    const record = JSON.parse(line);
    // process record
  }
}

// === Brotli compress ===
await pipeline(
  createReadStream('data.jsonl'),
  createBrotliCompress(),
  createWriteStream('data.jsonl.br')
);

// === Brotli decompress & parse ===
const br = createBrotliDecompress();
const rl2 = createInterface({
  input: createReadStream('data.jsonl.br').pipe(br),
});
for await (const line of rl2) {
  if (line.trim()) {
    const record = JSON.parse(line);
  }
}

Command-line tools are the fastest way to compress JSONL files. gzip is pre-installed on all Unix systems. Install zstd and brotli via your package manager for the other two algorithms.

Command Line: gzip, zstd & Brotli
# === gzip ===
# Compress (-k keeps the original file)
gzip -k data.jsonl # -> data.jsonl.gz
gzip -9 -k data.jsonl # max compression
# Decompress
gzip -d data.jsonl.gz
# or: gunzip data.jsonl.gz
# === zstd ===
# Install: brew install zstd / apt install zstd
zstd data.jsonl # -> data.jsonl.zst
zstd -3 data.jsonl # level 3 (default)
zstd --fast data.jsonl # fastest compression
# Decompress
zstd -d data.jsonl.zst
# or: unzstd data.jsonl.zst
# === Brotli ===
# Install: brew install brotli / apt install brotli
brotli data.jsonl # -> data.jsonl.br
brotli -q 6 data.jsonl # quality 6
brotli -q 11 data.jsonl # max compression
# Decompress
brotli -d data.jsonl.br
# === Piping with jq ===
# Compress filtered JSONL
jq -c 'select(.level == "error")' data.jsonl | gzip > errors.jsonl.gz
# Decompress and count lines
zstd -dc data.jsonl.zst | wc -l

Cloud Storage Compression Strategies

When storing JSONL files in cloud object storage, compression reduces both storage costs and transfer time. Most cloud providers support transparent decompression for gzip and Brotli through their CDN layers, but the upload and storage strategies differ.

Upload compressed JSONL to S3 with the correct Content-Encoding header. S3 stores the compressed bytes, and CloudFront can serve them with automatic decompression. For data lake workloads, tools like AWS Athena and Spark natively read gzip and zstd compressed JSONL.

AWS S3 with Compression
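The pattern above can be sketched with the standard library plus boto3. Bucket and key names are placeholders, and the upload call itself needs AWS credentials, so it is shown commented out:

```python
import gzip
import json

def to_gzip_jsonl(records):
    """Serialize records to JSONL and gzip the result."""
    data = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
    return gzip.compress(data.encode("utf-8"))

payload = to_gzip_jsonl([{"level": "info", "msg": "started"}])

# Upload with the Content-Encoding header so CloudFront and HTTP clients
# can decompress transparently (requires boto3 and AWS credentials):
# import boto3
# boto3.client("s3").put_object(
#     Bucket="my-data-bucket",          # placeholder bucket name
#     Key="logs/data.jsonl.gz",         # placeholder key
#     Body=payload,
#     ContentEncoding="gzip",
#     ContentType="application/x-ndjson",
# )
```

Setting ContentEncoding at upload time matters: S3 stores only the compressed bytes, and downstream consumers rely on that header to know the object needs decompression.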

Google Cloud Storage supports gzip transcoding. When you upload a gzip-compressed object with the Content-Encoding: gzip header, GCS can serve the decompressed version automatically when clients send Accept-Encoding: gzip. For BigQuery imports, use gzip-compressed JSONL directly.

Google Cloud Storage with Compression
from google.cloud import storage
import gzip
import json

client = storage.Client()
bucket = client.bucket('my-data-bucket')

# Upload gzip-compressed JSONL
def upload_compressed(records, blob_name):
    blob = bucket.blob(f'{blob_name}.jsonl.gz')
    blob.content_encoding = 'gzip'
    blob.content_type = 'application/x-ndjson'
    data = '\n'.join(
        json.dumps(r, ensure_ascii=False) for r in records
    ).encode('utf-8')
    blob.upload_from_string(
        gzip.compress(data),
        content_type='application/x-ndjson',
    )

# BigQuery: load compressed JSONL directly
# bq load --source_format=NEWLINE_DELIMITED_JSON \
#   my_dataset.my_table gs://bucket/data.jsonl.gz schema.json

Best Practices: When to Use Which Algorithm

There is no single best compression algorithm. The right choice depends on whether you prioritize storage size, processing speed, compatibility, or a balance of all three. Here are clear recommendations for common JSONL use cases.

Archival & Cold Storage

Use Brotli (quality 9-11) or zstd (level 19+) for maximum compression.

Compression time matters less for archival. You compress once and decompress rarely. Brotli at quality 11 can achieve 10x+ compression on JSONL data, significantly reducing long-term storage costs.
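The compress-once tradeoff can be checked with the standard library's gzip levels, used here as a stand-in for Brotli and zstd maximum levels (which are not in the stdlib). The payload is synthetic and the timings are machine-dependent:

```python
import gzip
import json
import time

# Synthetic archival payload: 50,000 repetitive log records
data = "".join(
    json.dumps({"level": "info", "service": "api", "status": 200, "n": i % 10})
    + "\n"
    for i in range(50_000)
).encode("utf-8")

# Higher levels trade compression time for smaller output
for level in (1, 6, 9):
    start = time.perf_counter()
    out = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    print(f"gzip -{level}: {len(out):>8} bytes, "
          f"{len(data) / len(out):.1f}x, {elapsed:.3f}s")
```

For archival data the slow-compress column is paid once, while the storage savings accrue for the lifetime of the object.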

Real-time Data Pipelines

Use zstd (level 1-3) for the best speed-to-ratio tradeoff.

In streaming pipelines (Kafka, Kinesis, Flink), compression and decompression speed directly affect throughput and latency. Zstd at level 1 compresses faster than gzip while achieving better ratios. Its dictionary mode is ideal for JSONL with fixed schemas.
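zstd's dictionary mode is not in the Python standard library, but the same idea can be illustrated with zlib's preset-dictionary support: seed the compressor and decompressor with the shared key structure so each short record no longer pays full price for its keys. The schema below is a made-up example:

```python
import json
import zlib

# Preset dictionary: the key structure every record shares
zdict = b'{"timestamp": "", "level": "", "service": "", "message": ""}'

record = json.dumps({
    "timestamp": "2026-02-01T12:00:00Z",
    "level": "info",
    "service": "api",
    "message": "request handled",
}).encode("utf-8")

# Compress a single short record with and without the preset dictionary
plain = zlib.compress(record)
comp = zlib.compressobj(zdict=zdict)
with_dict = comp.compress(record) + comp.flush()

# Decompression must supply the same dictionary
decomp = zlib.decompressobj(zdict=zdict)
assert decomp.decompress(with_dict) == record

print(len(record), len(plain), len(with_dict))
```

Real zstd dictionaries work the same way but are trained from sample data; packages such as pyzstd expose dictionary training, and the gains are largest when individual records are small relative to their shared key structure.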

Web Delivery & APIs

Use Brotli for static files, gzip as fallback for maximum compatibility.

All modern browsers support Brotli via Accept-Encoding: br. CDNs like Cloudflare and CloudFront can automatically compress with Brotli. Use gzip as fallback for older clients. Zstd browser support is growing but not yet universal.

ETL & Batch Processing

Use gzip for maximum compatibility, or zstd for better performance.

Most data tools (Spark, Athena, BigQuery, pandas) support gzip natively. Zstd support is improving rapidly. If your toolchain supports zstd, prefer it for 3-5x faster compression with comparable ratios.
