
Python Async Requests: Getting URLS Concurrently via HTTP(S)
by: Emily Rosemary Collins

As a Python developer, you may often deal with making HTTP requests to interact with APIs or to retrieve information from web pages. By default, these requests can be slow and block your program’s execution, making your code less efficient.

This is where Python’s async requests come to the rescue. Asynchronous HTTP requests allow your program to continue executing other tasks while waiting for the slower request operations to complete, improving your code’s overall performance and response time significantly.

The core of this non-blocking approach in Python relies on the asyncio and aiohttp libraries, which provide the tools to perform HTTP operations efficiently and asynchronously. Using these libraries, you can build powerful async HTTP clients that handle multiple requests concurrently without stalling your program’s main thread.

Incorporating Python async requests into your projects can help you tackle complex web scraping scenarios, handling tasks like rate limiting and error recovery.
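
To give a taste of what that looks like, here is a minimal sketch, assuming aiohttp and placeholder URLs, that caps concurrency with an asyncio.Semaphore (a simple form of rate limiting) and retries a failed request once:

import asyncio
import aiohttp

async def fetch(session, semaphore, url, retries=1):
    # The semaphore caps how many requests are in flight at once
    async with semaphore:
        for attempt in range(retries + 1):
            try:
                async with session.get(url) as response:
                    return await response.text()
            except aiohttp.ClientError:
                if attempt == retries:
                    raise  # give up after the last retry

async def main():
    semaphore = asyncio.Semaphore(5)  # at most 5 concurrent requests
    urls = [f"https://example.com/page/{i}" for i in range(20)]  # placeholder URLs
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch(session, semaphore, url) for url in urls),
            return_exceptions=True,
        )
    print(f"Fetched {len(results)} results")

asyncio.run(main())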

First Things First: Understanding Asynchronous Requests

Basic Principles of Asynchronous Requests

🐍🐍🐍 Asynchronous requests play a crucial role in improving the efficiency of your code when dealing with network tasks.

When you send an asynchronous request, your program can continue executing other tasks without waiting for the request to complete.

This is possible because of the async/await syntax in Python, which allows you to write asynchronous code more easily. In essence, this keyword pair breaks down asynchronous code into smaller, manageable pieces to provide better readability and maintainability.

Here’s a brief explanation of async and await:

  • async: declares a function as a coroutine, one that can be suspended and resumed.
  • await: suspends the coroutine until the awaited operation completes, handing control back to the event loop in the meantime.

Here’s a simple example showcasing the async/await syntax:

import asyncio

async def example_async_function():
    print("Task is starting")
    await asyncio.sleep(1)
    print("Task is complete")

async def main():
    task = asyncio.create_task(example_async_function())
    await task

asyncio.run(main())

Synchronous vs Asynchronous Requests

When working with network requests, it’s important to understand the difference between synchronous and asynchronous requests.

👉 Synchronous requests involve waiting for the response of each request before proceeding, and it’s a typical way to handle requests in Python. However, this can lead to slower execution times, especially when dealing with numerous requests or slow network responses.

👉 Asynchronous requests allow you to send multiple requests at the same time, without waiting for their individual responses. This means your program can continue with other tasks while the requests are being processed, significantly improving performance in network-intensive scenarios.

Here’s a basic comparison between synchronous and asynchronous requests:

  • Synchronous Requests:
    • Send a request and wait for its response
    • Block the execution of other tasks while waiting
    • Can cause delays if there are many requests or slow network responses
  • Asynchronous Requests:
    • Send multiple requests concurrently
    • Don’t block the execution of other tasks while waiting for responses
    • Improve performance in network-heavy scenarios

For example, the popular requests library in Python handles synchronous requests, while libraries like aiohttp handle asynchronous requests. If you’re working with multiple network requests in your code, it’s highly recommended to implement async/await for optimal efficiency and performance.
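
To make the difference concrete, here is a rough timing sketch, with placeholder URLs, comparing five requests made sequentially with requests against the same five made concurrently with aiohttp:

import asyncio
import time

import aiohttp
import requests

URLS = ["https://example.com"] * 5  # placeholder URLs

def fetch_all_sync():
    # Each call blocks until its response arrives
    return [requests.get(url).status_code for url in URLS]

async def fetch_one(session, url):
    async with session.get(url) as response:
        return response.status

async def fetch_all_async():
    # All requests are in flight at the same time
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_one(session, url) for url in URLS))

start = time.perf_counter()
fetch_all_sync()
print(f"Synchronous: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
asyncio.run(fetch_all_async())
print(f"Asynchronous: {time.perf_counter() - start:.2f}s")

On a typical connection, the asynchronous version finishes in roughly the time of the slowest single request rather than the sum of all of them.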

Python and Asyncio

Understanding Asyncio

Asyncio is a library introduced in Python 3.4 that has evolved rapidly since, with major usability improvements arriving through Python 3.7 (such as asyncio.run()). It provides a foundation for writing asynchronous code using the async/await syntax. With asyncio, you can do concurrent programming in Python, making your code more efficient and responsive.

The library is structured around coroutines, an approach that allows concurrent execution of multiple tasks within an event loop. A coroutine is a function, declared with async def, that can suspend and resume its execution, much like a generator. By leveraging coroutines, you can run multiple tasks concurrently without threading or multiprocessing.

Asyncio makes use of futures to represent the results of computations that may not have completed yet. Using the async def syntax, you can create coroutines that perform asynchronous tasks, like making HTTP requests or handling I/O operations.
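
To illustrate, here is a small sketch of a future in action; set_after and the one-second delay are illustrative choices:

import asyncio

async def set_after(future, delay, value):
    # Simulate a computation whose result arrives later
    await asyncio.sleep(delay)
    future.set_result(value)

async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()  # a placeholder for a pending result
    asyncio.create_task(set_after(future, 1, "done"))
    print(await future)  # suspends here until the result is set

asyncio.run(main())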

Using Asyncio in Python

To utilize asyncio in your Python projects, your code must incorporate the asyncio library. The primary method of executing asynchronous tasks is by using an event loop. In Python 3.7 and later, you can use asyncio.run() to create and manage the event loop for you.

With asyncio, you can declare a function as a coroutine by using the async keyword. To call a coroutine, use the await keyword, which allows the coroutine to yield control back to the event loop and continue with other tasks.

Here’s an example of using asyncio:

import asyncio

async def greet(name, delay):
    await asyncio.sleep(delay)
    print(f"Hello, {name}!")

async def main():
    task1 = asyncio.create_task(greet("Alice", 1))
    task2 = asyncio.create_task(greet("Bob", 2))

    await task1
    await task2

asyncio.run(main())

In the example above, we created two asyncio tasks with asyncio.create_task() (the modern replacement for asyncio.ensure_future()) and scheduled them on the event loop. When await is encountered, the coroutine is suspended, and the event loop can switch to another task. This continues until all tasks in the event loop are complete.

Now let’s get to the meat. 🥩👇

Using the Requests Library for Synchronous HTTP Requests

The requests library is a popular choice for making HTTP requests in Python. However, it’s primarily designed for synchronous operations, which means it may not be the best choice for handling asynchronous requests.

To make a simple synchronous GET request using the requests library, you would do the following:

import requests

response = requests.get('https://api.example.com/data')
print(response.content)

While the requests library is powerful and easy to use, it doesn’t natively support asynchronous requests. This can be a limitation when you have to make multiple requests concurrently to improve performance and reduce waiting time.

Asynchronous HTTP Requests with HTTPX

HTTPX is a fully featured HTTP client for Python, providing both synchronous and asynchronous APIs. With support for HTTP/1.1 and HTTP/2, it is a modern alternative to the popular Python requests library.

Why Use HTTPX?

HTTPX offers improved efficiency, performance, and additional features compared to other HTTP clients. Its interface is similar to requests, making it easy to switch between the two libraries. Moreover, HTTPX supports asynchronous HTTP requests, allowing your application to perform better in scenarios with numerous concurrent tasks.
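
To illustrate how familiar the synchronous interface feels, here is a minimal sketch (the URL is a placeholder):

import httpx

response = httpx.get("https://api.example.com/data")
print(response.status_code)
print(response.text)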

HTTPX Asynchronous Requests

To leverage the asynchronous features of HTTPX, you can use the httpx.AsyncClient class. This enables you to make non-blocking HTTP requests using Python’s asyncio library. Asynchronous requests can provide significant performance benefits and enable the use of long-lived network connections, such as WebSockets.

Here is an example to demonstrate how async requests can be made using httpx.AsyncClient:

import httpx
import asyncio

async def fetch(url):
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.text

async def main():
    urls = ['https://www.google.com', 'https://www.example.com']
    tasks = [fetch(url) for url in urls]
    contents = await asyncio.gather(*tasks)
    for content in contents:
        print(content[:1000])  # Print the first 1000 characters of each response

asyncio.run(main())

Here’s a breakdown of the code:

  1. fetch: This asynchronous function fetches the content of a given URL.
  2. main: This asynchronous function initializes the tasks to fetch content from a list of URLs and then gathers the results.
  3. asyncio.run(main()): This runs the main asynchronous function.

The code will fetch the content of the URLs in urls concurrently and print the first 1000 characters of each response. Adjust as needed for your use case!

Managing Sessions and Connections

Session Management in Async Requests

When working with asynchronous requests in Python, you can use sessions to manage connections. The aiohttp.ClientSession class is designed to handle multiple requests and maintain connection pools.

To get started, create an instance of the aiohttp.ClientSession class:

import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        ...  # Your asynchronous requests go here

Using the async with statement ensures that the session is properly closed when the block is exited. Within the async with block, you can send multiple requests using the same session object. This is beneficial if you are interacting with the same server or service, as it can reuse connections and reduce overhead, as the sketch below shows.
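
For instance, a sketch like the following (the endpoint paths are assumptions) sends several requests through one session so the underlying connections can be reused:

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # All three requests share the session's connection pool
        for path in ("/users", "/orders", "/products"):
            async with session.get(f"https://api.example.com{path}") as response:
                print(path, response.status)

asyncio.run(main())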

Connection Management with TCPConnector

Besides sessions, one way to manage connections is by using the aiohttp.TCPConnector class. The TCPConnector class helps in controlling the behavior of connections, such as limiting the number of simultaneous connections, setting connection timeouts, and configuring SSL settings.

Here is how you can create a custom TCPConnector and use it with your ClientSession:

import aiohttp

async def main():
    connector = aiohttp.TCPConnector(limit=10, ssl=True)
    async with aiohttp.ClientSession(connector=connector) as session:
        ...  # Your asynchronous requests go here

In this example, the TCPConnector limits the number of concurrent connections to 10 and enables certificate verification on SSL connections to ensure secure communication.

Implementing Concurrency and Threading

Concurrency in Async Requests

Concurrency means overlapping the execution of multiple tasks to make your Python programs faster and more efficient. It is especially useful for I/O-bound tasks, where waiting for external resources can slow down your program.

One way to achieve concurrency in Python is by using asyncio. This module, built specifically for asynchronous I/O operations, allows you to use async and await keywords to manage concurrent execution of tasks without the need for threads or processes.

For example, to make multiple HTTP requests concurrently, you can use an asynchronous library like aiohttp. Combined with asyncio, your code might look like this:

import aiohttp
import asyncio

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    urls = ['https://example.com', 'https://another.example.com']
    tasks = [fetch(url) for url in urls]
    responses = await asyncio.gather(*tasks)

asyncio.run(main())

Threading in Async Requests

Another way to implement concurrency in Python is by using threads. Threading is a technique that allows your code to run concurrently by splitting it into multiple lightweight threads of execution. The threading module provides features to create and manage threads easily.

For instance, if you want to use threads to make multiple HTTP requests simultaneously, you can employ the ThreadPoolExecutor from the concurrent.futures module combined with the requests library:

import requests
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    response = requests.get(url)
    return response.text

def main():
    urls = ['https://example.com', 'https://another.example.com']
    with ThreadPoolExecutor(max_workers=len(urls)) as executor:
        responses = list(executor.map(fetch, urls))

main()

In this example, the ThreadPoolExecutor creates a pool of worker threads that execute the fetch function concurrently. The number of threads is determined by the length of the urls list, ensuring that all requests are handled in parallel.

Working with URLs in Async Requests

When managing and manipulating URLs in async requests, you might need to handle various tasks such as encoding parameters, handling redirects, and constructing URLs properly. Thankfully, Python provides the urllib.parse module for handling URL manipulations.

For instance, you may want to add query parameters to a URL. To do this, you can use the urllib.parse.urlencode function:

from urllib.parse import urlencode

base_url = "https://api.example.com/data?"
params = {"key1": "value1", "key2": "value2"}

url = base_url + urlencode(params)
# https://api.example.com/data?key1=value1&key2=value2

After constructing the URL with query parameters, you can pass it to an async fetch coroutine, such as the aiohttp-based fetch defined earlier:

import asyncio

async def main():
    url = base_url + urlencode(params)
    data = await fetch(url)
    print(data)

asyncio.run(main())

By properly handling URLs and leveraging async requests, you can efficiently fetch data in Python while maintaining a clear and organized code structure.

Handling Errors and Timeouts

Error Handling in Async Requests

When working with asynchronous requests in Python, it’s important to properly handle errors and exceptions that might occur. To do this, you can use the try and except statements. When a request fails or encounters an error, the exception will be caught in the except block, allowing you to handle the error gracefully.

For example, when using the asyncio and aiohttp libraries, you might structure your request and error handling like this:

import asyncio
import aiohttp

async def fetch_url(url):
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                data = await response.text()
                return data
    except Exception as e:
        print(f"An error occurred while fetching {url}: {str(e)}")
        return None

urls = ["https://example.com", "https://another.example.com"]

async def main():
    return await asyncio.gather(*[fetch_url(url) for url in urls])

results = asyncio.run(main())

In this example, if an exception is encountered during the request, the error message will be printed and the function will return None, allowing your program to continue processing other URLs.
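
Alternatively, if you let exceptions propagate out of the coroutine instead of swallowing them, asyncio.gather(..., return_exceptions=True) collects failures alongside successful results. A minimal sketch, with placeholder URLs:

import asyncio
import aiohttp

async def fetch_url(url):
    # No try/except: exceptions propagate up to gather
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main(urls):
    results = await asyncio.gather(
        *(fetch_url(url) for url in urls),
        return_exceptions=True,  # failures come back as exception objects
    )
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            print(f"{url} failed: {result!r}")
        else:
            print(f"{url} returned {len(result)} characters")

asyncio.run(main(["https://example.com", "https://another.example.com"]))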

Managing Timeouts in Async Requests

Managing timeouts in async requests is crucial to ensure requests don’t run indefinitely, consuming resources and blocking progress in your program. Setting timeouts can help prevent long waits for unresponsive servers or slow connections.

To set a timeout for your async requests, you can use the asyncio.wait_for() function. This function takes a coroutine object and a timeout value as its arguments and will raise asyncio.TimeoutError if the timeout is reached.

Here’s an example using the asyncio and aiohttp libraries:

import asyncio
import aiohttp

async def fetch_url(url, timeout):
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                data = await asyncio.wait_for(response.text(), timeout=timeout)
                return data
    except asyncio.TimeoutError:
        print(f"Timeout reached while fetching {url}")
        return None
    except Exception as e:
        print(f"An error occurred while fetching {url}: {str(e)}")
        return None

urls = ["https://example.com", "https://another.example.com"]

async def main():
    return await asyncio.gather(*[fetch_url(url, 5) for url in urls])

results = asyncio.run(main())

In this example, reading each response body times out after 5 seconds; on a timeout, the function prints a message and returns None. This way, your program can continue processing other URLs after encountering a timeout without getting stuck in an endless wait. Note that asyncio.wait_for() here bounds only the response.text() call, not the time spent connecting.
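
If you want a timeout that covers the whole request, connecting included, aiohttp provides its own timeout support. Here is a minimal sketch using aiohttp.ClientTimeout, with a placeholder URL:

import asyncio
import aiohttp

async def fetch_url(url):
    timeout = aiohttp.ClientTimeout(total=5)  # bounds the entire request
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(url) as response:
                return await response.text()
    except asyncio.TimeoutError:
        print(f"Timeout reached while fetching {url}")
        return None

asyncio.run(fetch_url("https://example.com"))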

Frequently Asked Questions

How do I send async HTTP requests in Python?

To send asynchronous HTTP requests in Python, you can use a library like aiohttp. This library lets you make HTTP requests with the async and await keywords, which have been part of the language since Python 3.5 and reserved keywords since Python 3.7. To start, install aiohttp and then write asynchronous functions that send your requests.
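
After installing it (pip install aiohttp), a minimal request might look like this sketch, where the URL is a placeholder:

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com/data") as response:
            print(await response.text())

asyncio.run(main())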

Which library should I use for asyncio in Python requests?

While the popular Requests library doesn’t support asyncio natively, you can use alternatives like aiohttp or httpx that were designed specifically for asynchronous programming. Both aiohttp and httpx allow you to utilize Python’s asyncio capabilities while providing a simple and familiar API similar to Requests.

What are the differences between aiohttp and requests?

The main differences between aiohttp and Requests lie in their approach to concurrency. aiohttp was built to work with Python’s asyncio library and uses asynchronous programming to allow for concurrent requests. On the other hand, Requests is a regular, synchronous HTTP library, which means it doesn’t inherently support concurrent requests or asynchronous programming.

How can I call multiple APIs asynchronously in Python?

By using an async-enabled HTTP library like aiohttp, you can call multiple APIs asynchronously in your Python code. First, define separate async functions for the API calls you want to make, and then use the asyncio.gather() function to combine and execute these functions concurrently. This allows you to perform several API calls at once, reducing the overall time to process the requests.

What is the use of async with statement in Python?

The async with statement in Python is the asynchronous counterpart of the regular with statement, which manages resources such as file handles or network connections. In an async context, async with lets you enter a context manager whose setup and teardown are themselves awaitable, so resources can be acquired and cleaned up using asynchronous operations.
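
Under the hood, async with drives an object with __aenter__ and __aexit__ coroutine methods. A minimal sketch with a made-up AsyncResource class:

import asyncio

class AsyncResource:
    async def __aenter__(self):
        await asyncio.sleep(0)  # e.g. open a connection asynchronously
        print("resource acquired")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)  # e.g. close the connection asynchronously
        print("resource released")

async def main():
    async with AsyncResource():
        print("using resource")

asyncio.run(main())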

When should I use asynchronous programming in Python?

Asynchronous programming in Python is beneficial when you’re working with I/O-bound tasks, such as network requests, web scraping, or file operations. By using async techniques, you can execute these tasks concurrently, thus reducing the overall execution time and improving performance. However, for CPU-bound tasks, using Python’s built-in multiprocessing module or regular multi-threading might be more suitable.

🐍 Recommended: Python Async Function
