How do I read a JSONL file in Python?

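Open the file, iterate over it line by line, and parse each line with json.loads(). For large files, use a generator so you never load the whole file into memory. For tabular data, you can also use pandas: pd.read_json('file.jsonl', lines=True).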

How do I write a JSONL file in Python?

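Open the file in write mode and write each record as a JSON string followed by a newline: f.write(json.dumps(record) + '\n'). Pass ensure_ascii=False to keep Unicode characters unescaped, and open the file with encoding='utf-8'. With pandas, use df.to_json('file.jsonl', orient='records', lines=True).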

How do I convert JSON to JSONL in Python?

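Load your JSON array with json.load(), then write each element on its own line: for item in data: f.write(json.dumps(item) + '\n'). If your JSON file has an array at the top level, this converts it to JSONL, with each array element becoming one line.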
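A minimal sketch of that conversion as a reusable function follows. The name json_array_to_jsonl and the file names are illustrative assumptions. Because json.load() reads the entire array at once, this approach assumes the source JSON fits in memory; a very large array would need a streaming JSON parser instead.

```python
import json
import os
import tempfile


def json_array_to_jsonl(json_path: str, jsonl_path: str) -> int:
    """Convert a file whose top level is a JSON array into JSONL (one element per line).

    Hypothetical helper for illustration; returns the number of lines written.
    """
    with open(json_path, 'r', encoding='utf-8') as fin:
        data = json.load(fin)  # the whole array is loaded into memory here
    if not isinstance(data, list):
        raise ValueError('expected a top-level JSON array')
    with open(jsonl_path, 'w', encoding='utf-8') as fout:
        for item in data:
            # One element per line, no commas between lines, no enclosing brackets
            fout.write(json.dumps(item, ensure_ascii=False) + '\n')
    return len(data)


# Usage: create a small JSON array file in a temporary directory, then convert it
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'data.json')
dst = os.path.join(tmp_dir, 'data.jsonl')
with open(src, 'w', encoding='utf-8') as f:
    json.dump([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], f)

count = json_array_to_jsonl(src, dst)
print(f'Wrote {count} lines to {dst}')
```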

What is the fastest JSONL library in Python?

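orjson is generally the fastest Python JSON library, typically 2-10x faster than the standard json module. Its dumps() returns bytes rather than str, and it natively serializes dataclasses; numpy arrays are supported through the OPT_SERIALIZE_NUMPY option. ujson is another fast alternative whose API mirrors the standard json module.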

How do I process large JSONL files in Python?

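Use a generator pattern to handle one record at a time without loading the entire file. Open the file and iterate over it with a for loop; file objects are buffered and yield one line at a time, so Python handles this efficiently. For batch processing, accumulate records in a list and process them in batches of 1,000-10,000 records.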
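The batching idea can be sketched with itertools.islice, which pulls a fixed number of records from a lazy generator. The function name read_jsonl_batches and the default batch size are illustrative assumptions, not a standard API.

```python
import json
import os
import tempfile
from itertools import islice
from typing import Any, Iterator


def read_jsonl_batches(path: str, batch_size: int = 1000) -> Iterator[list[Any]]:
    """Yield lists of up to batch_size parsed records, skipping blank lines (hypothetical helper)."""
    with open(path, 'r', encoding='utf-8') as f:
        # Lazy generator: a line is only read and parsed when islice asks for it
        records = (json.loads(line) for line in f if line.strip())
        while True:
            batch = list(islice(records, batch_size))
            if not batch:
                break
            yield batch


# Usage: write 2,500 small records, then read them back in batches of 1,000
path = os.path.join(tempfile.mkdtemp(), 'events.jsonl')
with open(path, 'w', encoding='utf-8') as f:
    for i in range(2500):
        f.write(json.dumps({"id": i}) + '\n')

sizes = [len(batch) for batch in read_jsonl_batches(path, batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

Only one batch is held in memory at a time, so peak memory scales with batch_size rather than with file size.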

How do I parse JSONL with pandas?

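Use pd.read_json('file.jsonl', lines=True) to read a JSONL file straight into a DataFrame. For large files, pass the chunksize parameter, e.g. pd.read_json('file.jsonl', lines=True, chunksize=10000), to process the data in batches. This returns a reader object that yields one DataFrame per chunk when iterated.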

Python JSONL: Reading, Writing, and Parsing

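A complete guide to working with JSONL (JSON Lines) files in Python. Learn how to read, write, parse, and stream JSONL data using built-in modules, pandas, and high-performance libraries.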

Last updated: February 2026

Why Use Python for JSONL?

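Python is one of the most popular languages for working with JSONL files, and for good reason. Its built-in json module parses JSON out of the box, iterating over a file is memory-efficient by default, and the ecosystem offers powerful libraries such as pandas and orjson for specialized workflows. Whether you are handling machine learning datasets, application logs, or API responses, Python makes JSONL processing straightforward.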

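JSONL (JSON Lines) stores one JSON object per line, which makes it well suited to streaming, append-only logging, and working with large datasets without loading everything into memory. Python's line-by-line file reading fits this format naturally. In this guide, you will learn three ways to read JSONL, two ways to write it, and how to handle large files that do not fit in memory.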
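To make the append-only logging use case concrete, here is a small sketch: each event is appended as one line in mode 'a', so earlier lines are never rewritten. The helper name append_event and the field names are hypothetical, and the sketch assumes a single writer process (concurrent writers would need their own coordination).

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_event(path: str, event: dict) -> None:
    """Append one record as a single JSONL line (hypothetical helper)."""
    # Mode 'a' creates the file if it is missing and always writes at the end
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(event, ensure_ascii=False) + '\n')


log_path = os.path.join(tempfile.mkdtemp(), 'app_log.jsonl')
for level, msg in [('INFO', 'started'), ('WARN', 'disk 90% full')]:
    append_event(log_path, {
        'ts': datetime.now(timezone.utc).isoformat(),
        'level': level,
        'msg': msg,
    })

with open(log_path, encoding='utf-8') as f:
    print(sum(1 for _ in f))  # 2
```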

Reading JSONL Files in Python

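There are several ways to read JSONL files in Python, each suited to a different use case. The standard json module covers most scenarios, pandas is convenient for tabular analysis, and generators are best for large files.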

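The simplest approach is Python's built-in json module. Open the file, iterate over it line by line, and parse each line with json.loads(). This loads all records into a list in memory.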

Basic reading with the json module

import json

records = []
with open('data.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # Skip empty lines
            records.append(json.loads(line))

print(f'Loaded {len(records)} records')
print(records[0])

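If your JSONL data is tabular (every record has the same keys), pandas can read it into a DataFrame with a single function call. This is the quickest way to start analyzing structured JSONL data.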

Reading with pandas

import pandas as pd

# Read entire file into a DataFrame
df = pd.read_json('data.jsonl', lines=True)
print(df.head())
print(f'Shape: {df.shape}')

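# For large files, read in chunks; using the reader as a context manager
# (pandas 1.2+) closes the underlying file when the block exits
with pd.read_json('large.jsonl', lines=True, chunksize=10000) as reader:
    for chunk in reader:
        # Process each chunk (a DataFrame of up to 10,000 rows)
        print(f'Chunk shape: {chunk.shape}')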

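For files that do not fit in memory, use a generator function. It yields one record at a time, so memory usage stays constant regardless of file size. This is the recommended pattern for production data pipelines.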

Generator pattern for large files

import json
from typing import Iterator, Any

def read_jsonl(path: str) -> Iterator[dict[str, Any]]:
    """Read a JSONL file lazily, yielding one record at a time."""
    with open(path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f'Skipping invalid JSON at line {line_num}: {e}')

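def process(record: dict[str, Any]) -> None:
    """Hypothetical placeholder: replace with your own per-record logic."""
    print(record)

# Process records one at a time
for record in read_jsonl('large_data.jsonl'):
    process(record)  # Only one record in memory at a time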

Writing JSONL Files in Python

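Writing a JSONL file is simple: serialize each record to a JSON string and append a newline. The key rule is one JSON object per line, with no trailing commas and no enclosing array.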
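One way to check output against that rule is a small line-by-line validator. This is a sketch with a hypothetical name, find_invalid_lines; it flags lines that fail to parse (a trailing comma shows up as an "Extra data" error) and, to match the one-object-per-line rule above, lines that parse to something other than an object. Note that the JSON Lines format itself allows any JSON value on a line, so the object check is a stricter assumption.

```python
import json
import os
import tempfile


def find_invalid_lines(path: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for each non-blank line that breaks the
    one-JSON-object-per-line rule (hypothetical helper)."""
    problems = []
    with open(path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            if not line.strip():
                continue
            try:
                value = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((line_num, f'invalid JSON: {e.msg}'))
                continue
            if not isinstance(value, dict):
                problems.append((line_num, f'expected an object, got {type(value).__name__}'))
    return problems


# Usage: a file with a trailing comma, an array wrapper, and one valid line
path = os.path.join(tempfile.mkdtemp(), 'check.jsonl')
with open(path, 'w', encoding='utf-8') as f:
    f.write('{"id": 1},\n')   # trailing comma
    f.write('[{"id": 2}]\n')  # record wrapped in an array
    f.write('{"id": 3}\n')    # valid

problems = find_invalid_lines(path)
for line_num, reason in problems:
    print(line_num, reason)
```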

Serialize each record with json.dumps(), then write it followed by a newline. Set ensure_ascii=False to preserve Unicode characters such as Chinese, Japanese, or emoji in the output.
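The short comparison below shows what ensure_ascii changes. With the default (True), non-ASCII characters are written as \uXXXX escapes; with False they are written as-is, which is why the file should be opened with encoding='utf-8'. Both forms decode back to the same Python object. The sample record is illustrative.

```python
import json

record = {"city": "台北", "note": "café ☕"}

escaped = json.dumps(record)                   # default: ensure_ascii=True
kept = json.dumps(record, ensure_ascii=False)  # keeps non-ASCII characters as-is

print(escaped)  # non-ASCII characters appear as \uXXXX escapes
print(kept)     # {"city": "台北", "note": "café ☕"}

# Either form parses back to the original record
assert json.loads(escaped) == json.loads(kept) == record
```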

Basic writing with the json module

import json

records = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]

with open('output.jsonl', 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

print(f'Wrote {len(records)} records to output.jsonl')

If your data is already in a pandas DataFrame, use to_json() with orient='records' and lines=True to export it directly as JSONL. This is the inverse of pd.read_json() with lines=True.

Writing with pandas

import pandas as pd

df = pd.DataFrame([
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
])

# Write DataFrame to JSONL
df.to_json('output.jsonl', orient='records', lines=True, force_ascii=False)

print(f'Wrote {len(df)} records to output.jsonl')

JSONL Libraries for Python

Python offers several JSON parsing libraries with different performance characteristics. Choose one based on your file sizes and performance requirements.

json (standard library)

Built-in

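Python's built-in json module needs no installation and is available everywhere. It handles most JSONL workloads and supports all standard JSON types. For files up to a few hundred MB, its performance is usually sufficient.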
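Values outside the standard JSON types (datetime, Decimal, UUID, and so on) make json.dumps() raise TypeError unless you pass a default= callback. The sketch below uses a hypothetical to_jsonable helper; converting Decimal to a string rather than a float is an assumption chosen to preserve exact precision.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID


def to_jsonable(obj):
    """Fallback for types json.dumps cannot serialize on its own (hypothetical helper)."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, (Decimal, UUID)):
        return str(obj)  # string form keeps Decimal precision exact
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')


record = {
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "created": datetime(2026, 2, 1, 12, 30),
    "price": Decimal("19.99"),
}

line = json.dumps(record, default=to_jsonable, ensure_ascii=False)
print(line)
# {"id": "12345678-1234-5678-1234-567812345678", "created": "2026-02-01T12:30:00", "price": "19.99"}
```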

orjson

Fastest

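orjson is generally the fastest Python JSON library and is written in Rust. Compared with the standard json module, it typically parses and serializes 2-10x faster. Its dumps() returns bytes instead of str, and it natively serializes dataclasses, datetime, and UUID values; numpy arrays are supported through the OPT_SERIALIZE_NUMPY option.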
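Because orjson works with bytes, a natural fit is to open JSONL files in binary mode. The sketch below uses orjson when it is installed (a third-party package) and falls back to the standard json module otherwise, so it runs either way. The helper names dumps_line and loads_line are hypothetical; orjson.OPT_APPEND_NEWLINE is the orjson option that appends the trailing newline.

```python
import json
import os
import tempfile
from typing import Any

try:
    import orjson  # third-party: pip install orjson
except ImportError:
    orjson = None  # fall back to the standard library below


def dumps_line(record: Any) -> bytes:
    """Serialize one record as a UTF-8 JSONL line, newline included (hypothetical helper)."""
    if orjson is not None:
        # orjson.dumps returns bytes; OPT_APPEND_NEWLINE adds the trailing b"\n"
        return orjson.dumps(record, option=orjson.OPT_APPEND_NEWLINE)
    return (json.dumps(record, ensure_ascii=False) + '\n').encode('utf-8')


def loads_line(line: bytes) -> Any:
    """Parse one JSONL line; both orjson.loads and json.loads accept bytes."""
    line = line.strip()
    return orjson.loads(line) if orjson is not None else json.loads(line)


# Usage: write and read in binary mode, since lines are already bytes
path = os.path.join(tempfile.mkdtemp(), 'fast.jsonl')
with open(path, 'wb') as f:
    for i in range(3):
        f.write(dumps_line({"id": i, "name": f"user{i}"}))

with open(path, 'rb') as f:
    records = [loads_line(line) for line in f if line.strip()]
print(records[-1])  # {'id': 2, 'name': 'user2'}
```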

ujson

Fast

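ujson (UltraJSON) is a C-based JSON library that is roughly 2-5x faster than the standard json module. Its dumps() and loads() functions mirror the built-in module's, so it can often serve as a drop-in replacement, although not every keyword argument of the standard module is supported. It strikes a good balance between compatibility and speed.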
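A common drop-in pattern is to import ujson under the name json and fall back to the standard module when it is not installed. This sketch only relies on dumps(), loads(), and ensure_ascii, which both modules provide; if you use other keyword arguments, check that ujson supports them before swapping it in.

```python
try:
    import ujson as json  # third-party C implementation: pip install ujson
except ImportError:
    import json  # standard library fallback with the same dumps/loads names

# The rest of the code is unchanged whichever module was imported
line = json.dumps({"id": 1, "name": "Alice"}, ensure_ascii=False)
record = json.loads(line)
print(record["name"])  # Alice
```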

Streaming Large JSONL Files

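When working with JSONL files that are several gigabytes in size, you need a streaming approach that reads, transforms, and writes data in batches. This keeps memory usage constant and lets you track progress.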

Stream-processing a large JSONL file in batches

import json

def process_large_jsonl(
    input_path: str,
    output_path: str,
    batch_size: int = 1000
) -> int:
    """Stream-process a large JSONL file in batches."""
    processed = 0
    batch: list[dict] = []

    # Explicit UTF-8 so behavior does not depend on the platform's default encoding
    with open(input_path, 'r', encoding='utf-8') as fin, \
         open(output_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Transform the record
            record['processed'] = True
            batch.append(record)

            if len(batch) >= batch_size:
                # Write the full batch in a single call, one JSON object per line
                fout.writelines(json.dumps(r, ensure_ascii=False) + '\n' for r in batch)
                processed += len(batch)
                batch.clear()
                # flush=True so the in-place progress line shows up immediately
                print(f'\rProcessed {processed} records...', end='', flush=True)

        # Write remaining records
        fout.writelines(json.dumps(r, ensure_ascii=False) + '\n' for r in batch)
        processed += len(batch)

    print(f'\nDone. Processed {processed} records total.')
    return processed

# Usage
process_large_jsonl('input.jsonl', 'output.jsonl', batch_size=5000)

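This pattern uses constant memory by processing records in fixed-size batches. The batch_size parameter controls the trade-off between memory usage and I/O efficiency; for most systems, batches of 1,000 to 10,000 records work well. The progress indicator helps you monitor long-running jobs.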
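The progress indicator above reports a record count. If you would rather show a percentage, one option is to compare bytes read against the file size. The sketch below reads in binary mode so that len(line) is an exact byte count (in text mode, f.tell() is disabled while iterating over the file). The name count_with_progress and the report interval are hypothetical, and json.loads stands in for your real per-record work.

```python
import json
import os
import tempfile


def count_with_progress(path: str, report_every: int = 100_000) -> int:
    """Count records while reporting progress as a percentage of bytes read (hypothetical helper)."""
    total_bytes = os.path.getsize(path)
    bytes_read = 0
    count = 0
    with open(path, 'rb') as f:  # binary mode: len(line) is the exact byte count
        for line in f:
            bytes_read += len(line)
            if not line.strip():
                continue
            json.loads(line)  # json.loads accepts bytes; replace with real processing
            count += 1
            if count % report_every == 0:
                pct = 100 * bytes_read / total_bytes
                print(f'\r{pct:5.1f}% ({count} records)', end='', flush=True)
    print(f'\r100.0% ({count} records)')
    return count


# Usage: a small file with 250 records, reporting every 100 records
path = os.path.join(tempfile.mkdtemp(), 'big.jsonl')
with open(path, 'w', encoding='utf-8') as f:
    for i in range(250):
        f.write(json.dumps({"id": i}) + '\n')

n = count_with_progress(path, report_every=100)
```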

Try Our Free JSONL Tools

Don't want to write code? Use our free online tools to view, validate, and convert JSONL files directly in your browser.


Work with JSONL Files Online

View, validate, and convert JSONL files up to 1 GB directly in your browser. No uploads, 100% private.
