What Is the __pycache__ Folder in Python?

What Is the __pycache__ Folder in Python?
by:
blow post content copied from  Real Python
click here to view original post


When you develop a self-contained Python script, you might not notice anything unusual about your directory structure. However, as soon as your project becomes more complex, you’ll often decide to extract parts of the functionality into additional modules or packages. That’s when you may start to see a __pycache__ folder appearing out of nowhere next to your source files in seemingly random places:

project/
│
├── mathematics/
│   │
│   ├── __pycache__/
│   │
│   ├── arithmetic/
│   │   ├── __init__.py
│   │   ├── add.py
│   │   └── sub.py
│   │
│   ├── geometry/
│   │   │
│   │   ├── __pycache__/
│   │   │
│   │   ├── __init__.py
│   │   └── shapes.py
│   │
│   └── __init__.py
│
└── calculator.py

Notice that the __pycache__ folder can be present at different levels in your project’s directory tree when you have multiple subpackages nested in one another. At the same time, other packages or folders with your Python source files may not contain this mysterious cache directory.

You may encounter a similar situation after you clone a remote Git repository with a Python project and run the underlying code. So, what causes the __pycache__ folder to appear, and for what purpose?

Take the Quiz: Test your knowledge with our interactive “What Is the __pycache__ Folder in Python?” quiz. You’ll receive a score upon completion to help you track your learning progress:


Interactive Quiz

What Is the __pycache__ Folder in Python?

In this quiz, you'll have the opportunity to test your knowledge of the __pycache__ folder, including when, where, and why Python creates these folders.

In Short: It Makes Importing Python Modules Faster

Even though Python is an interpreted programming language, its interpreter doesn’t operate directly on your Python code, which would be very slow. Instead, when you run a Python script or import a Python module, the interpreter compiles your high-level Python source code into bytecode, which is an intermediate binary representation of the code.

This bytecode enables the interpreter to skip recurring steps, such as lexing and parsing the code into an abstract syntax tree and validating its correctness every time you run the same program. As long as the underlying source code hasn’t changed, Python can reuse the intermediate representation, which is immediately ready for execution. This saves time, speeding up your script’s startup time.

Remember that while loading the compiled bytecode from __pycache__ makes Python modules import faster, it doesn’t affect their execution speed!

Why bother with bytecode at all instead of compiling the code straight to the low-level machine code? While machine code is what executes on the hardware, providing the ultimate performance, it’s not as portable or quick to produce as bytecode.

Machine code is a set of binary instructions understood by your specific CPU architecture, wrapped in a container format like EXE, ELF, or Mach-O, depending on the operating system. In contrast, bytecode provides a platform-independent abstraction layer and is typically quicker to compile.

Python uses local __pycache__ folders to store the compiled bytecode of imported modules in your project. On subsequent runs, the interpreter will try to load precompiled versions of modules from these folders, provided they’re up-to-date with the corresponding source files. Note that this caching mechanism only gets triggered for modules you import in your code rather than executing as scripts in the terminal.

In addition to this on-disk bytecode caching, Python keeps an in-memory cache of modules, which you can access through the sys.modules dictionary. It ensures that when you import the same module multiple times from different places within your program, Python will use the already imported module without needing to reload or recompile it. Both mechanisms work together to reduce the overhead of importing Python modules.

Next, you’re going to find out exactly how much faster Python loads the cached bytecode as opposed to compiling the source code on the fly when you import a module.

How Much Faster Is Loading Modules From Cache?

The caching happens behind the scenes and usually goes unnoticed since Python is quite rapid at compiling the bytecode. Besides, unless you often run short-lived Python scripts, the compilation step remains insignificant when compared to the total execution time. That said, without caching, the overhead associated with bytecode compilation could add up if you had lots of modules and imported them many times over.

To measure the difference in import time between a cached and uncached module, you can pass the -X importtime option to the python command or set the equivalent PYTHONPROFILEIMPORTTIME environment variable. When this option is enabled, Python will display a table summarizing how long it took to import each module, including the cumulative time in case a module depends on other modules.

Suppose you had a calculator.py script that imports and calls a utility function from a local arithmetic.py module:

Python calculator.py
from arithmetic import add

add(3, 4)

The imported module defines a single function:

Python arithmetic.py
def add(a, b):
    return a + b

As you can see, the main script delegates the addition of two numbers, three and four, to the add() function imported from the arithmetic module.

The first time you run your script, Python compiles and saves the bytecode of the module you imported into a local __pycache__ folder. If such a folder doesn’t already exist, then Python automatically creates one before moving on. Now, when you execute your script again, Python should find and load the cached bytecode as long as you didn’t alter the associated source code.

Read the full article at https://realpython.com/python-pycache/ »


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]


May 13, 2024 at 07:30PM
Click here for more details...

=============================
The original post is available in Real Python by
this post has been published as it is through automation. Automation script brings all the top bloggers post under a single umbrella.
The purpose of this blog, Follow the top Salesforce bloggers and collect all blogs in a single place through automation.
============================

Salesforce