File I/O Examples¶
This section presents three mini-projects that synthesize multiple I/O concepts: pathlib for path management, with for safe resource handling, error handling for robustness, and csv/json for structured data.
Mental Model
These examples show how individual I/O skills combine into real programs. Each mini-project follows the same pattern: open input, process data, write output, handle errors. Seeing the full pipeline from reading to writing helps you connect isolated concepts into practical workflows.
1. Config-Driven Text Processing¶
This mini-project reads a JSON configuration file that specifies an input file, an output file, and a processing mode. It then processes the input text file according to the config and writes a summary report.
Configuration file¶
Create a file called config.json with the following contents:
json
{
"input_file": "source.txt",
"output_file": "report.txt",
"mode": "words"
}
The mode field accepts "words" (count words) or "lines" (count lines).
Input file¶
Create a file called source.txt with some sample text:
text
Python makes file handling straightforward.
The with statement ensures files are closed properly.
Structured formats like JSON and CSV are widely used.
Processing script¶
```python import json from pathlib import Path
def load_config(config_path): """Load and validate a JSON configuration file.""" path = Path(config_path)
if not path.exists():
raise FileNotFoundError(f"Config file not found: {path}")
with open(path) as f:
config = json.load(f)
required_keys = {"input_file", "output_file", "mode"}
missing = required_keys - config.keys()
if missing:
raise KeyError(f"Missing config keys: {missing}")
if config["mode"] not in ("words", "lines"):
raise ValueError(f"Invalid mode: {config['mode']}")
return config
def process_file(input_path, mode): """Read a text file and return a count based on the mode.""" path = Path(input_path)
if not path.exists():
raise FileNotFoundError(f"Input file not found: {path}")
with open(path) as f:
text = f.read()
if mode == "words":
return len(text.split())
else:
return len(text.strip().splitlines())
def write_report(output_path, input_name, mode, count): """Write a summary report to the output file.""" path = Path(output_path)
with open(path, "w") as f:
f.write(f"Report for: {input_name}\n")
f.write(f"Mode: {mode}\n")
f.write(f"Count: {count}\n")
Main workflow¶
config = load_config("config.json")
count = process_file(config["input_file"], config["mode"]) write_report(config["output_file"], config["input_file"], config["mode"], count)
print(f"Report written to {config['output_file']}") ```
Running this script produces a file called report.txt:
text
Report for: source.txt
Mode: words
Count: 18
The workflow ties together JSON parsing, text file reading, and file writing. The load_config function validates the configuration before processing begins, so errors are caught early.
2. Safe File Copy with Pathlib¶
This mini-project copies all .txt files from a source folder to a destination folder. It uses pathlib throughout, creates the destination directory if needed, and skips files that already exist to avoid accidental overwrites.
```python from pathlib import Path
def copy_txt_files(source_dir, dest_dir, overwrite=False): """Copy all .txt files from source_dir to dest_dir.
Parameters
----------
source_dir : str or Path
Directory to search for .txt files.
dest_dir : str or Path
Directory to copy files into. Created if it does not exist.
overwrite : bool
If False (default), skip files that already exist in dest_dir.
Returns
-------
dict
Counts of files copied and skipped.
"""
src = Path(source_dir)
dst = Path(dest_dir)
if not src.is_dir():
raise FileNotFoundError(f"Source directory not found: {src}")
# Create destination directory (and parents) if needed
dst.mkdir(parents=True, exist_ok=True)
copied = 0
skipped = 0
for src_file in sorted(src.glob("*.txt")):
dst_file = dst / src_file.name
if dst_file.exists() and not overwrite:
print(f"Skipped (already exists): {dst_file.name}")
skipped += 1
continue
with open(src_file) as f_in:
content = f_in.read()
with open(dst_file, "w") as f_out:
f_out.write(content)
print(f"Copied: {src_file.name}")
copied += 1
return {"copied": copied, "skipped": skipped}
Main workflow¶
results = copy_txt_files("source_folder", "backup_folder")
print(f"\nDone. Copied: {results['copied']}, Skipped: {results['skipped']}") ```
Sample output when the source folder contains three files and one already exists in the backup:
``` Copied: notes.txt Skipped (already exists): readme.txt Copied: todo.txt
Done. Copied: 2, Skipped: 1 ```
Key points:
Path.glob("*.txt")returns only matching files without manual filtering.mkdir(parents=True, exist_ok=True)safely creates any missing directories.- Checking
dst_file.exists()before writing prevents silent data loss. - Reading and writing with
withensures files are always closed properly.
3. CSV to JSON Summary¶
This mini-project reads a CSV file containing numeric data, validates and cleans the values, computes summary statistics, and saves the result as a pretty-printed JSON file.
Input CSV¶
Create a file called sales.csv:
csv
product,quantity,price
Widget A,10,2.50
Widget B,25,1.75
Widget C,,3.00
Widget D,15,
Widget E,30,4.50
Some rows have missing values. The script handles these gracefully.
Processing script¶
```python import csv import json from pathlib import Path
def load_csv(csv_path): """Load a CSV file and return a list of dictionaries.""" path = Path(csv_path)
if not path.exists():
raise FileNotFoundError(f"CSV file not found: {path}")
with open(path, newline="") as f:
reader = csv.DictReader(f)
return list(reader)
def parse_numeric(value, field_name): """Convert a string to a float, returning None for missing or invalid values.""" if value is None or value.strip() == "": return None
try:
return float(value)
except ValueError:
print(f"Warning: invalid value '{value}' in field '{field_name}'")
return None
def compute_summary(rows, field): """Compute min, max, and average for a numeric field.""" values = []
for row in rows:
num = parse_numeric(row.get(field, ""), field)
if num is not None:
values.append(num)
if not values:
return {"field": field, "count": 0, "min": None, "max": None, "average": None}
return {
"field": field,
"count": len(values),
"min": min(values),
"max": max(values),
"average": round(sum(values) / len(values), 2),
}
def save_summary(summary, output_path): """Save a summary dictionary as pretty-printed JSON.""" path = Path(output_path)
with open(path, "w") as f:
json.dump(summary, f, indent=2)
print(f"Summary saved to {path}")
Main workflow¶
rows = load_csv("sales.csv")
summary = { "source_file": "sales.csv", "total_rows": len(rows), "statistics": [ compute_summary(rows, "quantity"), compute_summary(rows, "price"), ], }
save_summary(summary, "sales_summary.json") ```
Running this script produces sales_summary.json:
json
{
"source_file": "sales.csv",
"total_rows": 5,
"statistics": [
{
"field": "quantity",
"count": 4,
"min": 10.0,
"max": 30.0,
"average": 20.0
},
{
"field": "price",
"count": 4,
"min": 1.75,
"max": 4.5,
"average": 2.94
}
]
}
The script skips rows with missing values rather than crashing. The count field in each summary block shows how many valid values were used, making it clear when data was excluded.
Summary¶
These mini-projects demonstrate how multiple I/O concepts work together in realistic workflows:
pathlibfor portable path construction, directory creation, and file discoverywithstatements for safe, exception-proof file handlingjsonfor reading configuration and writing structured outputcsv.DictReaderfor parsing tabular data into dictionaries- Validation and error handling to catch problems before they propagate
Each project combines several of these techniques, reflecting the way file I/O is used in practice: rarely in isolation, usually as part of a larger data pipeline.
Exercises¶
Exercise 1.
Write a program that creates a file called names.txt containing five names (one per line), then reads the file back and prints each name in uppercase.
Solution to Exercise 1
```python
Write names¶
with open("names.txt", "w") as f: for name in ["Alice", "Bob", "Carol", "Dave", "Eve"]: f.write(name + "\n")
Read and print in uppercase¶
with open("names.txt") as f: for line in f: print(line.strip().upper()) ```
Output:
ALICE
BOB
CAROL
DAVE
EVE
strip() removes the trailing newline from each line before calling upper().
Exercise 2.
Predict the output of the following program. What does the get method do when the key is not found?
```python counts = {} words = ["the", "cat", "sat", "on", "the", "mat", "the"]
for word in words: counts[word] = counts.get(word, 0) + 1
print(counts) ```
Solution to Exercise 2
Output:
{'the': 3, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
counts.get(word, 0) returns the current count for word if it exists, or 0 if the key is not yet in the dictionary. Adding 1 and assigning back increments the count. This is a standard word-counting pattern.
Exercise 3. Write a program that reads a CSV string (not a file) and converts it into a list of dictionaries. Use the first row as headers.
python
csv_text = """name,age,city
Alice,30,Paris
Bob,25,London
Carol,35,Tokyo"""
Solution to Exercise 3
```python csv_text = """name,age,city Alice,30,Paris Bob,25,London Carol,35,Tokyo"""
lines = csv_text.strip().split("\n") headers = lines[0].split(",") records = []
for line in lines[1:]: values = line.strip().split(",") record = {} for h, v in zip(headers, values): record[h] = v records.append(record)
print(records) ```
Output:
[{'name': 'Alice', 'age': '30', 'city': 'Paris'},
{'name': 'Bob', 'age': '25', 'city': 'London'},
{'name': 'Carol', 'age': '35', 'city': 'Tokyo'}]
The first line provides the keys. Each subsequent line is split and zipped with the headers to form a dictionary.
Exercise 4.
Explain why the with statement is preferred over manually calling f.close(). Describe a scenario where forgetting to close a file could cause problems.
Solution to Exercise 4
The with statement is preferred because it guarantees the file is closed when the block exits, even if an exception occurs. For example:
python
with open("data.txt") as f:
data = f.read()
# If an error occurs here, the file is still closed
Without with, you must call f.close() manually:
python
f = open("data.txt")
data = f.read()
f.close() # Forgotten if an exception occurs above
A scenario where this causes problems: if a program writes to a file without closing it, the data may remain in a memory buffer and never be flushed to disk. If the program crashes before close() is called, the file may be empty or incomplete. On some operating systems, an unclosed file also holds a lock that prevents other programs from accessing it.
Exercise 5.
Write a program that uses the json module to save a dictionary to a file and then load it back. Verify that the loaded data is equal to the original.
```python import json
original = {"name": "Alice", "scores": [90, 85, 92], "active": True} ```
Solution to Exercise 5
```python import json
original = {"name": "Alice", "scores": [90, 85, 92], "active": True}
Save to file¶
with open("data.json", "w") as f: json.dump(original, f)
Load from file¶
with open("data.json") as f: loaded = json.load(f)
print(loaded) print(original == loaded) ```
Output:
{'name': 'Alice', 'scores': [90, 85, 92], 'active': True}
True
json.dump serializes the dictionary to JSON format and writes it to the file. json.load reads the file and deserializes the JSON back into a Python dictionary. The == comparison confirms the round-trip preserves the data.