Path Handling (pathlib)¶
Python provides the pathlib module for working with file system paths. It offers a modern, object-oriented interface that eliminates the most common mistakes programmers make when handling paths: hard-coded separators, fragile string concatenation, and scripts that break when run from a different working directory.
flowchart TD
A[pathlib.Path]
A --> B[file path operations]
A --> C[filesystem queries]
````
!!! tip "Mental Model"
`pathlib.Path` treats file paths as objects, not raw strings. Instead of gluing strings together with `/` or `\\`, you build paths with the `/` operator and ask the Path object questions like "do you exist?" or "what is your extension?" This eliminates platform-specific bugs and makes path manipulation readable and safe.
---
## 1. Creating Paths
```python
from pathlib import Path
p = Path("data.txt")
A relative path like Path("data.txt") is resolved against the current
working directory at the time of use. This is fine in interactive sessions,
but scripts are often launched from a different directory than the one
containing the script file.
2. Script-Relative Paths¶
A common bug: a script opens data.txt assuming it sits next to the script,
but the user runs it from another directory and gets FileNotFoundError.
```python
Wrong -- depends on where the user runs the script¶
p = Path("data.txt")
Right -- always relative to the script's own directory¶
script_dir = Path(file).parent p = script_dir / "data.txt" ```
__file__ is a built-in variable that holds the path to the currently
executing script. Path(__file__).parent gives the directory that contains
the script, regardless of where the user launched it from.
3. Absolute Paths with .resolve()¶
.resolve() converts any path -- relative or containing .. segments --
into an absolute, canonical path.
python
p = Path("data") / ".." / "data" / "file.txt"
print(p) # data/../data/file.txt
print(p.resolve()) # /home/alice/project/data/file.txt
This is useful for logging, error messages, and comparing two paths that might refer to the same file through different relative routes.
4. Checking File Existence¶
python
p.exists()
p.is_file()
p.is_dir()
5. Joining Paths¶
python
p = Path("data") / "file.txt"
The / operator replaces fragile string concatenation like
"data" + "/" + "file.txt", which breaks on Windows where the separator
is \.
6. Creating Directories¶
python
output = Path("results") / "2024" / "january"
output.mkdir(parents=True, exist_ok=True)
parents=True creates intermediate directories (like mkdir -p in the
shell). exist_ok=True silences the error if the directory already exists.
7. Finding Files with .glob() and .rglob()¶
.glob() searches a single directory for files matching a pattern.
.rglob() searches recursively through all subdirectories.
```python data = Path("data")
All .csv files in data/¶
for csv_file in data.glob("*.csv"): print(csv_file)
All .csv files in data/ and every subdirectory¶
for csv_file in data.rglob("*.csv"): print(csv_file) ```
Common patterns:
| Pattern | Matches |
|---|---|
"*.txt" |
all .txt files in the directory |
"**/*.txt" |
all .txt files recursively (same as .rglob("*.txt")) |
"data_*" |
files whose names start with data_ |
8. Reading and Writing¶
```python p = Path("hello.txt")
p.write_text("Hello") print(p.read_text()) ```
9. Iterating Through Directories¶
```python p = Path("data")
for item in p.iterdir(): print(item) ```
10. Advantages of pathlib¶
Each advantage below addresses a real failure case.
Problem 1 -- Wrong working directory.
A script opens "config.json" assuming the user runs it from the project
root. When someone runs it from their home directory, the file is not found.
```python
Fragile: depends on working directory¶
config = open("config.json")
Robust: always relative to the script¶
config_path = Path(file).parent / "config.json" config = open(config_path) ```
Problem 2 -- Hard-coded separators.
A Windows developer writes "data\\results\\output.csv". The path is
meaningless on macOS or Linux.
```python
Breaks on other platforms¶
path = "data\results\output.csv"
Works everywhere¶
path = Path("data") / "results" / "output.csv" ```
Problem 3 -- Fragile string concatenation.
Building paths with + is error-prone: a missing / silently produces
the wrong path.
```python
Bug: "datafile.txt" instead of "data/file.txt"¶
path = "data" + "file.txt"
Correct by construction¶
path = Path("data") / "file.txt" ```
11. Summary¶
Key ideas:
Path(__file__).parentanchors paths to the script's location.resolve()produces an absolute, canonical path.mkdir(parents=True, exist_ok=True)safely creates directory trees.glob()and.rglob()find files by pattern- the
/operator replaces string concatenation and hard-coded separators - methods like
.read_text(),.write_text(),.exists(), and.is_file()keep path operations on a single object
Exercises¶
Exercise 1.
pathlib.Path uses the / operator for path joining. Explain why this works:
```python from pathlib import Path
p = Path("data") / "subfolder" / "file.txt" print(p) print(type(p)) ```
What Python mechanism allows / to join paths? Why is this preferred over string concatenation ("data" + "/" + "subfolder" + "/" + "file.txt")?
Solution to Exercise 1
Output (on a Unix-like system):
text
data/subfolder/file.txt
<class 'PosixPath'>
The / operator works because Path defines the __truediv__ method, which Python calls when / is used. This is operator overloading -- the same mechanism that lets + mean addition for numbers and concatenation for strings.
Path("/") is preferred over string concatenation because:
- Platform independence:
Pathuses the correct separator (/on Unix,\on Windows) automatically. - Type safety: the result is a
Pathobject with path-specific methods, not a raw string. - Readability:
Path("data") / "file.txt"is cleaner thanos.path.join("data", "file.txt"). - Validation:
Pathunderstands path semantics (e.g., resolving.and..).
Exercise 2.
Path objects provide methods that combine multiple os module calls. Compare:
```python import os from pathlib import Path
os module approach¶
if os.path.exists("data.txt") and os.path.isfile("data.txt"): size = os.path.getsize("data.txt")
pathlib approach¶
p = Path("data.txt") if p.exists() and p.is_file(): size = p.stat().st_size ```
What advantage does the object-oriented pathlib approach have? Why does Path have methods like .read_text() and .write_text() when open() already exists?
Solution to Exercise 2
The object-oriented approach groups all path-related operations on a single object. Instead of calling free functions from os.path and passing the path string each time, you call methods on the Path object.
Advantages:
- Discoverability: all path operations are methods on the
Pathobject, so IDE autocompletion shows available operations. - Chaining:
Path("dir").mkdir(parents=True, exist_ok=True)reads naturally. - Consistency: one object carries the path state, reducing the chance of passing the wrong string to a function.
.read_text() and .write_text() exist for convenience:
```python
Instead of three lines:¶
with open("data.txt") as f: content = f.read()
One line:¶
content = Path("data.txt").read_text() ```
They are shorthand for the common pattern of "open, read/write, close" when you want the entire file content. For line-by-line processing or binary data, open() with with is still needed.
Exercise 3. A programmer writes platform-specific path code:
python
path = "C:\\Users\\alice\\data\\file.txt" # Windows only
Explain why this fails on macOS/Linux. Show how pathlib.Path solves the cross-platform problem. What does Path.home() return, and why is it useful for writing portable code?
Solution to Exercise 3
"C:\\Users\\alice\\data\\file.txt" is a Windows-specific path. On macOS/Linux, this is treated as a single directory name containing backslashes, not as a directory hierarchy. The file would not be found.
pathlib.Path solves this:
python
from pathlib import Path
p = Path.home() / "data" / "file.txt"
Path.home() returns the current user's home directory as a Path object:
- Windows:
C:\Users\alice - macOS:
/Users/alice - Linux:
/home/alice
The / operator joins paths using the platform-appropriate separator. This code works identically on all platforms, eliminating hardcoded paths and OS-specific separators.