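Python IO Programming

Python File Reading and Writing

This chapter covers the core of file reading and writing in Python: working with text and binary files, character-encoding issues, common file open modes, and best practices.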

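1. Basic File Operations

Python operates on files through the built-in open() function. Working with a file usually involves three steps: opening it, reading from or writing to it, and closing it. The following is a simple example of writing and then reading a text file.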

# Write a text file
with open('example.txt', 'w') as file:
    file.write('你好，世界！\n')
    file.write('这是第二行内容。')

# Read the text file
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)

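open(file, mode): file is the file path, and mode specifies the operation mode (for example 'r' for reading, 'w' for writing, 'a' for appending).
with statement: closes the file automatically when the block exits, even if an exception is raised, which avoids resource leaks. It is the recommended approach; without with, you must call file.close() yourself.

Output: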

你好，世界！
这是第二行内容。
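
To make the comparison between with and a manual file.close() concrete, here is a minimal sketch. The scratch path demo_path is hypothetical and exists only for this illustration; the try/finally form is roughly what with does on your behalf.

```python
import os
import tempfile

# Hypothetical scratch path, used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), 'manual_close.txt')

# Manual management: close() sits in finally so it runs even if write() raises.
file = open(demo_path, 'w', encoding='utf-8')
try:
    file.write('你好，世界！\n')
finally:
    file.close()
closed_after_manual = file.closed

# The with statement performs the same cleanup automatically.
with open(demo_path, 'r', encoding='utf-8') as file:
    content = file.read()
closed_after_with = file.closed

print(closed_after_manual, closed_after_with)
```

Both forms leave the file closed; the with form is shorter and harder to get wrong.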

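2. File Open Modes

The open mode determines how a file is read or written and whether it is treated as text or as binary data. Common modes include:

Mode	Description
`r`	Read-only (default); raises FileNotFoundError if the file does not exist
`w`	Write-only; truncates the file if it exists, creates it otherwise
`a`	Append; writes go to the end of the file, which is created if it does not exist
`rb`	Binary read-only
`wb`	Binary write-only; truncates or creates the file
`ab`	Binary append

Example: appending content to a file.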

with open('example.txt', 'a') as file:
    file.write('\n追加一行新内容。')

with open('example.txt', 'r') as file:
    print(file.read())

Output:

你好，世界！
这是第二行内容。
追加一行新内容。
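
The difference between 'w' and 'a' is easy to overlook, so the following sketch contrasts them on a throwaway file. The path and the strings written are assumptions made for the illustration.

```python
import os
import tempfile

# Hypothetical scratch file for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')

with open(path, 'a', encoding='utf-8') as f:  # 'a' keeps the existing content
    f.write('second\n')

with open(path, 'r', encoding='utf-8') as f:
    after_append = f.read()

with open(path, 'w', encoding='utf-8') as f:  # 'w' truncates before writing
    f.write('third\n')

with open(path, 'r', encoding='utf-8') as f:
    after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```

After the second 'w' open, the earlier two lines are gone, which is why 'a' is the safer choice for logs and other accumulating files.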

3. Ways to Read File Content

Python offers several ways to read a file, each suited to different situations.

# Read the entire file at once
with open('example.txt', 'r') as file:
    content = file.read()
    print('Entire file content:')
    print(content)

# Read all lines into a list
with open('example.txt', 'r') as file:
    lines = file.readlines()
    print('List of lines:')
    print(lines)

# Read line by line
with open('example.txt', 'r') as file:
    print('Line by line:')
    for line in file:
        print(line.strip())  # strip() removes surrounding whitespace, including the trailing newline

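file.read(): reads the entire file into one string; suitable for small files.
file.readlines(): returns a list of all lines, each still ending with its newline; suitable when every line needs processing and the file fits in memory.
for line in file: iterates lazily, one line at a time; memory-efficient and suitable for large files.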
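
The three approaches return differently shaped results. The sketch below runs them against the same throwaway file (the path and sample text are assumptions) and uses rstrip('\n') rather than strip(), so that only the newline is removed and indentation is preserved.

```python
import os
import tempfile

# Hypothetical sample file; the second line has deliberate padding.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('alpha\n  beta  \ngamma')

with open(path, 'r', encoding='utf-8') as f:
    whole = f.read()  # one string

with open(path, 'r', encoding='utf-8') as f:
    as_list = f.readlines()  # list of lines, newlines kept

with open(path, 'r', encoding='utf-8') as f:
    # Lazy iteration; rstrip('\n') drops only the newline.
    stripped = [line.rstrip('\n') for line in f]

print(repr(whole))
print(as_list)
print(stripped)
```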

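4. Character Encoding Issues

File operations often involve character encoding, especially when handling non-ASCII characters such as Chinese. When no encoding is given, Python uses the platform's locale encoding (for example, it may be gbk on Chinese-language Windows, while on Linux it is usually utf-8). If the encoding used to read a file does not match the one it was written with, the result may be a UnicodeDecodeError or silently garbled text.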
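
As a rough illustration of the mismatch described above, the following sketch prints the locale's preferred encoding (which is what open() typically falls back to when encoding is omitted) and then deliberately reads a GBK-encoded file as UTF-8. The scratch path is hypothetical.

```python
import locale
import os
import tempfile

# The encoding the current platform would pick by default.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical scratch file written as GBK.
path = os.path.join(tempfile.mkdtemp(), 'gbk.txt')
with open(path, 'w', encoding='gbk') as f:
    f.write('你好')

# Reading the same bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

print(mismatch_detected)
```

A mismatch does not always raise; some byte sequences happen to be valid in both encodings and decode to the wrong characters instead.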

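4.1 Opening a File with an Explicit Encoding

Use the encoding parameter to state the encoding explicitly. UTF-8 is the recommended general-purpose encoding.

# Write the file with UTF-8 encoding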
with open('chinese.txt', 'w', encoding='utf-8') as file:
    file.write('你好，欢迎学习Python文件操作！')

# Read the file with UTF-8 encoding
with open('chinese.txt', 'r', encoding='utf-8') as file:
    print(file.read())

Output:

你好，欢迎学习Python文件操作！

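4.2 Handling Encoding Errors

If a file's encoding is unknown or does not match, the errors parameter controls how undecodable bytes are handled.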

# Ignore decoding errors
with open('chinese.txt', 'r', encoding='ascii', errors='ignore') as file:
    print(file.read())  # Undecodable bytes are dropped

# Replace decoding errors
with open('chinese.txt', 'r', encoding='ascii', errors='replace') as file:
    print(file.read())  # Undecodable bytes are replaced with �

errors='ignore': silently drops bytes that cannot be decoded.
errors='replace': substitutes a placeholder (� when decoding) for bytes that cannot be decoded.
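
Both settings discard information, which the following sketch makes visible by decoding a UTF-8 file as ASCII. The file name and its content are assumptions chosen so that the result is easy to inspect.

```python
import os
import tempfile

# Hypothetical scratch file: two Chinese characters (6 UTF-8 bytes) plus ASCII text.
path = os.path.join(tempfile.mkdtemp(), 'mixed.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('你好 Python')

with open(path, 'r', encoding='ascii', errors='ignore') as f:
    ignored = f.read()   # the six non-ASCII bytes vanish

with open(path, 'r', encoding='ascii', errors='replace') as f:
    replaced = f.read()  # each non-ASCII byte becomes U+FFFD

print(len(ignored), len(replaced))
```

Because the original characters cannot be recovered afterwards, these handlers are best kept for cases where some loss is acceptable, such as scanning logs.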

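4.3 Detecting a File's Encoding

The third-party libraries chardet and charset-normalizer can detect a file's encoding. Detection is a statistical guess and can be unreliable for short inputs.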

import charset_normalizer

# Detect the file's encoding
with open('chinese.txt', 'rb') as file:
    result = charset_normalizer.detect(file.read())
    print(result)  # Prints encoding information, e.g. {'encoding': 'utf-8', ...}
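
When installing a detection library is not an option, one possible standard-library alternative is to try a short list of candidate encodings in order. This is a different technique from statistical detection, and the helper name read_with_fallback and the candidate list are assumptions for this sketch. Strict encodings such as UTF-8 should come first, because permissive ones accept almost any byte sequence.

```python
import os
import tempfile


def read_with_fallback(path, encodings=('utf-8', 'gbk')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            with open(path, 'r', encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f'none of {encodings} could decode {path}')


# Hypothetical demo files, one per encoding.
tmp_dir = tempfile.mkdtemp()
gbk_path = os.path.join(tmp_dir, 'gbk.txt')
utf8_path = os.path.join(tmp_dir, 'utf8.txt')
with open(gbk_path, 'w', encoding='gbk') as f:
    f.write('你好，世界')
with open(utf8_path, 'w', encoding='utf-8') as f:
    f.write('你好，世界')

gbk_text, gbk_used = read_with_fallback(gbk_path)
utf8_text, utf8_used = read_with_fallback(utf8_path)
print(gbk_used, utf8_used)
```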

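5. Working with Binary Files

Binary files (such as images and videos) must be opened in a binary mode (rb, wb, and so on). Binary mode involves no character encoding and works directly with bytes.

# Copy an image file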
with open('input.jpg', 'rb') as source:
    data = source.read()

with open('output.jpg', 'wb') as destination:
    destination.write(data)

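Binary mode suits non-text files such as images, audio, video, and compressed archives.
Using rb and wb sidesteps character-encoding problems entirely.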
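
Because binary mode returns raw bytes, it can be used to inspect a file's leading bytes. The sketch below checks for the byte sequence FF D8 FF that JPEG files begin with; the helper name looks_like_jpeg and the fake files are assumptions for illustration, and this check is only a quick heuristic, not a validation of the image.

```python
import os
import tempfile

JPEG_MAGIC = b'\xff\xd8\xff'  # leading bytes of a JPEG file


def looks_like_jpeg(path):
    """Return True if the file starts with the JPEG signature."""
    with open(path, 'rb') as f:
        header = f.read(len(JPEG_MAGIC))  # bytes, not str
    return header == JPEG_MAGIC


# Hypothetical demo files: one with a JPEG-like header, one plain text.
tmp_dir = tempfile.mkdtemp()
fake_jpeg = os.path.join(tmp_dir, 'fake.jpg')
plain_text = os.path.join(tmp_dir, 'note.txt')
with open(fake_jpeg, 'wb') as f:
    f.write(JPEG_MAGIC + b'\x00' * 8)
with open(plain_text, 'wb') as f:
    f.write(b'hello')

jpeg_result = looks_like_jpeg(fake_jpeg)
text_result = looks_like_jpeg(plain_text)
print(jpeg_result, text_result)
```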

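6. Handling Large Files

For large files, reading line by line or in chunks keeps memory use bounded instead of loading the whole file at once.

6.1 Reading a Large Text File Line by Line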

with open('large_file.txt', 'r', encoding='utf-8') as file:
    for line in file:
        print(line.strip())  # Process each line
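
As one possible use of line-by-line iteration, the following sketch counts the lines containing a keyword without ever holding more than one line in memory. The helper name count_matching_lines, the keyword, and the generated log file are assumptions for this example.

```python
import os
import tempfile


def count_matching_lines(path, keyword, encoding='utf-8'):
    """Count lines containing keyword, reading one line at a time."""
    with open(path, 'r', encoding=encoding) as f:
        return sum(1 for line in f if keyword in line)


# Hypothetical log file: every tenth line is marked ERROR.
path = os.path.join(tempfile.mkdtemp(), 'app.log')
with open(path, 'w', encoding='utf-8') as f:
    for i in range(1000):
        status = 'ERROR' if i % 10 == 0 else 'ok'
        f.write(f'line {i} {status}\n')

error_count = count_matching_lines(path, 'ERROR')
print(error_count)
```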

6.2 Reading a Binary File in Chunks

chunk_size = 1024  # Read 1 KB at a time
with open('large_file.bin', 'rb') as source:
    with open('copy_file.bin', 'wb') as destination:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            destination.write(chunk)

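read(chunk_size): reads at most the given number of bytes per call and returns an empty bytes object at end of file; suitable for copying or processing large files.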
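
Chunked reading is useful for more than copying. As a sketch, the function below computes a SHA-256 digest of a file chunk by chunk; the name sha256_of_file and the 64 KB chunk size are assumptions, and iter(callable, sentinel) is used as a compact form of the while loop.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file without loading it into memory all at once."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Keep calling f.read(chunk_size) until it returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical payload, larger than one chunk, to compare both approaches.
payload = os.urandom(200_000)
path = os.path.join(tempfile.mkdtemp(), 'payload.bin')
with open(path, 'wb') as f:
    f.write(payload)

chunked = sha256_of_file(path)
direct = hashlib.sha256(payload).hexdigest()
print(chunked == direct)
```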

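7. File Paths and Cross-Platform Compatibility

Path separators differ between operating systems (Windows uses \, Linux and macOS use /). Building paths with the os or pathlib module is more reliable than concatenating strings.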

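import os

# Cross-platform path
file_path = os.path.join('data', 'example.txt')

# Create the data directory if needed; otherwise open() raises FileNotFoundError
os.makedirs('data', exist_ok=True)

# Write the file
with open(file_path, 'w', encoding='utf-8') as file:
    file.write('跨平台路径示例')

# Using pathlib
from pathlib import Path
path = Path('data') / 'example.txt'
with path.open('r', encoding='utf-8') as file:
    print(file.read())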

os.path.join: builds a path using the separator of the current platform.
pathlib.Path: the more modern, object-oriented approach; recommended.
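
pathlib can also create directories and read or write a file in a single call. The following sketch shows that under the assumption of a temporary base directory; the nested folder names are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical base directory so the example leaves the working tree alone.
base = Path(tempfile.mkdtemp())
target = base / 'data' / 'nested' / 'example.txt'

# Create missing parent directories; no error if they already exist.
target.parent.mkdir(parents=True, exist_ok=True)

# write_text and read_text open and close the file internally.
target.write_text('跨平台路径示例', encoding='utf-8')
text = target.read_text(encoding='utf-8')

print(target.name, target.suffix, target.parent.name)
```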

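8. Exception Handling

File operations can raise exceptions, for example when a file does not exist or permissions are insufficient. Use try-except to catch them.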

try:
    with open('non_existent.txt', 'r', encoding='utf-8') as file:
        print(file.read())
except FileNotFoundError:
    print('File not found!')
except PermissionError:
    print('No permission to access the file!')
except UnicodeDecodeError:
    print('File encoding error!')

Common exceptions:

FileNotFoundError: the file does not exist.
PermissionError: no permission to operate on the file.
UnicodeDecodeError: the encoding does not match.
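
FileNotFoundError and PermissionError are both subclasses of OSError, so a helper that only needs a fallback value can catch the base class. The sketch below is one possible shape for such a helper; the name read_text_or_default and its parameters are assumptions, not a standard API.

```python
import os
import tempfile


def read_text_or_default(path, default='', encoding='utf-8'):
    """Return the file's text, or default if it cannot be read or decoded."""
    try:
        with open(path, 'r', encoding=encoding) as f:
            return f.read()
    except (OSError, UnicodeDecodeError) as exc:
        # OSError covers FileNotFoundError, PermissionError and similar errors.
        print(f'could not read file: {type(exc).__name__}')
        return default


# Hypothetical path that is guaranteed not to exist.
missing = os.path.join(tempfile.mkdtemp(), 'non_existent.txt')
result = read_text_or_default(missing, default='fallback')
print(result)
```

Catching the specific subclasses, as in the example above, remains preferable when each failure needs a different message or recovery step.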

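9. Best Practices

Always use the with statement: files are closed automatically, preventing resource leaks.
Specify the encoding explicitly: for text files, encoding='utf-8' is recommended.
Handle large files carefully: read line by line or in chunks to avoid memory problems.
Handle exceptions: catch the errors that can occur and give user-friendly messages.
Use cross-platform paths: build file paths with os.path or pathlib.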
