Python News Roundup: July 2024

Post content copied from Real Python.

Summer isn’t all holidays and lazy days at the beach. Over the last month, two important players in the data science ecosystem released new major versions. NumPy published version 2.0, which comes with several improvements but also some breaking changes. At the same time, Polars reached its version 1.0 milestone and is now considered production-ready.

PyCon US was hosted in Pittsburgh, Pennsylvania, in May. The conference is an important meeting spot for the community and sparked some new ideas and discussions. You can read about some of these in the PSF’s coverage of the Python Language Summit, and watch some of the videos posted from the conference.

Dive in to learn more about the most important Python news from the last month.

NumPy Version 2.0

NumPy is a foundational package in the data science space. The library provides in-memory N-dimensional arrays and many functions for fast operations on those arrays.

Many libraries in the ecosystem use NumPy under the hood, including pandas, SciPy, and scikit-learn. The NumPy package has been around for close to twenty years and has played an important role in the rising popularity of Python among data scientists.

The new version 2.0 of NumPy is an important milestone, which adds an improved string type, cleans up the library, and improves performance. However, it comes with some changes that may affect your code.

The biggest breaking changes happen in the C-API of NumPy. Typically, this won’t affect you directly, but it can affect other libraries that you rely on. The community has rallied strongly and most of the bigger packages already support NumPy 2.0. You can check NumPy’s table of ecosystem support for details.

One of the main reasons for using NumPy is that the library can do fast and convenient array operations. For a simple example, the following code calculates square numbers:

>>> numbers = range(10)
>>> [number**2 for number in numbers]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

>>> import numpy as np
>>> numbers = np.arange(10)
>>> numbers**2
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

First, you use range() and a list comprehension to calculate the first ten square numbers in pure Python. Then, you repeat the calculation with NumPy. Note that you don’t need to explicitly spell out the loop. NumPy handles that for you under the hood.

Furthermore, the NumPy version will be considerably faster, especially for bigger arrays of numbers. One of the secrets to this speed is that NumPy arrays are limited to having one data type, while a Python list can be heterogeneous. One list can contain elements as different as integers, floats, strings, and even nested lists. That’s not possible in a NumPy array.
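The speed difference and the single-data-type rule can both be checked with a short script. This is a minimal sketch, not a rigorous benchmark: the array size `n` and the repetition count are arbitrary choices made for illustration, and the timings you see will depend on your machine.

```python
import timeit

import numpy as np

n = 10_000  # arbitrary size chosen for illustration
py_numbers = range(n)
np_numbers = np.arange(n)

# Time ten runs of each approach; exact numbers vary between machines
list_time = timeit.timeit(lambda: [number**2 for number in py_numbers], number=10)
numpy_time = timeit.timeit(lambda: np_numbers**2, number=10)
print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy array:        {numpy_time:.4f} s")

# A NumPy array holds a single data type: mixing an int and a float
# converts every element to float64
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```

Both approaches compute the same values. Only the container and the place where the loop runs differ.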

Improved String Handling

Because NumPy requires all elements to be of the same type and to take up the same number of bytes in memory, it can quickly find and work with individual elements. One downside to this has been that strings can be awkward to work with:

>>> words = np.array(["numpy", "python"])
>>> words
array(['numpy', 'python'], dtype='<U6')

>>> words[1] = "monty python"
>>> words
array(['numpy', 'monty '], dtype='<U6')

You first create an array consisting of two strings. Note that NumPy automatically detects that the longest string is six characters long, so it sets aside space for each string to be six characters long. The 6 in the data type string, <U6, indicates this.
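The reserved width can also be inspected on the data type itself. The following is a small sketch that assumes a standard NumPy install, where each character in a fixed-width Unicode string takes four bytes:

```python
import numpy as np

words = np.array(["numpy", "python"])

# "U" is the fixed-width Unicode kind; 6 characters * 4 bytes = 24 bytes per element
print(words.dtype.kind)      # U
print(words.dtype.itemsize)  # 24

# Assigning a longer string keeps only the first six characters
words[1] = "monty python"
print(len(words[1]))         # 6
```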

Next, you try to replace the second string with a longer string. Unfortunately, only the first six characters are stored since that’s how much space NumPy has set aside for each string in this array. There are ways to work around these limitations, but in NumPy 2.0, you can take advantage of variable-length strings instead.
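As a hedged illustration of both options, the sketch below first shows one possible workaround, reserving a wider fixed width up front, and then the variable-length string type, np.dtypes.StringDType, which is only available in NumPy 2.0 and later. The version check is an addition for this sketch so that it also runs on older installs; it isn't part of the original example.

```python
import numpy as np

# Workaround on any NumPy version: reserve a wider fixed width up front
wide = np.array(["numpy", "python"], dtype="U12")
wide[1] = "monty python"  # 12 characters, so it just fits
print(wide)

# NumPy 2.0 and later: variable-length strings via StringDType
if int(np.__version__.split(".")[0]) >= 2:
    words = np.array(["numpy", "python"], dtype=np.dtypes.StringDType())
    words[1] = "monty python"  # stored in full, no truncation
    print(words)
```

The fixed-width workaround only helps if you know the longest string in advance, while the variable-length type removes that constraint.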


July 08, 2024 at 07:30PM