Python Become a Master, Chapter 61

Advanced Level Exercises


Exercise 1: File Parsing

Concepts:

  • File I/O
  • Regular expressions

Description: Write a Python script that reads a text file and extracts all URLs that are present in the file. The output should be a list of URLs.

Solution:

import re

# Open the file for reading
with open('input_file.txt', 'r') as f:
    # Read the file contents
    file_contents = f.read()

    # Use a regular expression to extract URLs
    urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)

# Print the list of URLs
print(urls)
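The pattern in the solution keeps any punctuation that directly follows a URL, such as a sentence-ending period. A minimal sketch of one way to tidy the matches is shown here; the simpler pattern, the `extract_urls` helper, and the sample text are illustrative assumptions, not part of the original solution.

```python
import re

# Simplified pattern (an assumption): a scheme followed by anything up to
# whitespace or an angle bracket.
URL_PATTERN = re.compile(r'https?://[^\s<>]+')


def extract_urls(text):
    """Return URLs in order of appearance, without trailing punctuation or duplicates."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip('.,;:!?)')  # drop punctuation that belongs to the sentence
        if url not in urls:
            urls.append(url)
    return urls


# Hypothetical sample text standing in for the contents of input_file.txt
sample = ("Docs: https://example.com/docs. Mirror: http://test.org/a?b=1, "
          "and again https://example.com/docs")
print(extract_urls(sample))
```

The same function can be applied to `file_contents` after reading the file.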

Exercise 2: Data Analysis

Concepts:

  • File I/O
  • Data manipulation
  • Pandas library

Description: Write a Python script that reads a CSV file containing sales data and calculates the total sales revenue for each product category.

Solution:

import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()

# Print the total revenue for each product category
print(total_revenue)
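To see what the group-and-sum step computes, the same aggregation can be sketched with only the standard library. The column names come from the solution above; the sample rows and the `revenue_by_category` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for sales_data.csv
SAMPLE_CSV = """Product Category,Sales Revenue
Books,12.50
Toys,30.00
Books,7.50
"""


def revenue_by_category(csv_file):
    """Sum 'Sales Revenue' per 'Product Category' from an open CSV file."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row['Product Category']] += float(row['Sales Revenue'])
    return dict(totals)


totals = revenue_by_category(io.StringIO(SAMPLE_CSV))
print(totals)
```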

Exercise 3: Web Scraping

Concepts:

  • Web scraping
  • Requests library
  • Beautiful Soup library
  • CSV file I/O

Description: Write a Python script that scrapes the title and price of all products listed on an e-commerce website and stores them in a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the target URL
url = 'https://www.example.com/products'

# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Make a GET request to the website
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content using Beautiful Soup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find all product titles and prices
    titles = [title.get_text(strip=True) for title in soup.find_all('h3', class_='product-title')]
    prices = [price.get_text(strip=True) for price in soup.find_all('div', class_='product-price')]

    # Zip the titles and prices together
    data = list(zip(titles, prices))

    # Write the data to a CSV file with headers
    with open('product_data.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Product Title', 'Price'])  # Add headers
        writer.writerows(data)

    print("Scraping completed. Data saved to 'product_data.csv'.")
else:
    print(f"Failed to retrieve the webpage. Status code: {response.status_code}")

Exercise 4: Multithreading

Concepts:

  • Multithreading
  • Requests library
  • Threading library

Description: Write a Python script that uses multithreading to download multiple images from a URL list simultaneously.

Solution:

import requests
import threading

# URL list of images to download
url_list = [
    'https://www.example.com/image1.jpg',
    'https://www.example.com/image2.jpg',
    'https://www.example.com/image3.jpg',
]

# Function to download an image from a URL
def download_image(url):
    response = requests.get(url)
    with open(url.split('/')[-1], 'wb') as f:
        f.write(response.content)

# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
    thread = threading.Thread(target=download_image, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
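The same fan-out can also be written with a thread pool, which caps the number of worker threads and collects return values. The sketch below swaps in `concurrent.futures` for the manual thread list used in the solution; `fetch` is a hypothetical stand-in that only derives the file name, so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Stand-in for download_image: return the file name the download would write."""
    return url.split('/')[-1]


url_list = [
    'https://www.example.com/image1.jpg',
    'https://www.example.com/image2.jpg',
    'https://www.example.com/image3.jpg',
]

# map() runs fetch in worker threads but yields results in input order
with ThreadPoolExecutor(max_workers=3) as pool:
    file_names = list(pool.map(fetch, url_list))

print(file_names)
```

Replacing `fetch` with the real `download_image` function gives the same behavior as the solution, with the pool handling thread start-up and joining.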

Exercise 5: Machine Learning

Concepts:

  • Machine learning
  • Scikit-learn library

Description: Write a Python script that trains a machine learning model on a dataset and uses it to predict the output for new data.

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42
)

# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)

In this exercise, we first read a dataset into a pandas dataframe. Then, we split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module. We train a linear regression model on the training data using the LinearRegression class from the sklearn.linear_model module. Finally, we use the trained model to predict the output for the testing data and evaluate the model performance using the mean squared error metric.
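The mean squared error is the average of the squared differences between actual and predicted values. A minimal sketch of that calculation on plain lists follows; the function name and the sample numbers are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
mse = mean_squared_error(y_true, y_pred)  # squared errors are 1, 0 and 4
print("Mean squared error:", mse)
```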

Exercise 6: Natural Language Processing

Concepts:

  • Natural Language Processing
  • Sentiment Analysis
  • NLTK library

Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pre-trained NLP model.

Solution:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Ensure the VADER lexicon is downloaded
nltk.download('vader_lexicon')

# Read the text file into a string
with open('input_file.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Create a SentimentIntensityAnalyzer object
sid = SentimentIntensityAnalyzer()

# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)

# Print the sentiment scores
print(scores)

In this exercise, we first read a text file into a string. Then, we create a SentimentIntensityAnalyzer object from the nltk.sentiment.vader module. We use the polarity_scores method of the SentimentIntensityAnalyzer object to perform sentiment analysis on the text and get a dictionary of sentiment scores.
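The returned dictionary holds `neg`, `neu`, `pos`, and `compound` entries. One common way to turn the compound score into a label is sketched below; the `label_sentiment` helper is hypothetical, the 0.05 cut-off is the conventional VADER threshold rather than something the exercise prescribes, and the example scores are invented.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER-style score dictionary to 'positive', 'negative' or 'neutral'."""
    compound = scores['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Hypothetical output of sid.polarity_scores(text)
example_scores = {'neg': 0.0, 'neu': 0.4, 'pos': 0.6, 'compound': 0.7}
print(label_sentiment(example_scores))
```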

Exercise 7: Web Development

Concepts:

  • Web Development
  • Flask framework
  • File Uploads

Description: Write a Python script that creates a web application using the Flask framework that allows users to upload a file and performs some processing on the file.

Solution:

import os

from flask import Flask, render_template, request
from werkzeug.utils import secure_filename

app = Flask(__name__)

# Set the path for file uploads
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Ensure the upload directory exists
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

# Route for the home page
@app.route('/')
def index():
    return render_template('index.html')

# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
    if 'file' not in request.files:
        return 'No file part', 400

    file = request.files['file']

    if file.filename == '':
        return 'No selected file', 400

    # Sanitize the client-supplied name so it cannot point outside the uploads folder
    filename = secure_filename(file.filename)
    if filename == '':
        return 'Invalid file name', 400

    # Save the file to the uploads folder
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))

    return 'File uploaded successfully'

if __name__ == '__main__':
    app.run(debug=True)

In this exercise, we first import the Flask module and create a Flask application. We set up a route for the home page that returns an HTML template. We set up a route for file uploads that receives an uploaded file and saves it to a designated uploads folder. We can perform processing on the uploaded file inside the upload function.
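A typical first processing step inside the upload function is to reject file types the application does not expect. The check below is a minimal sketch; `ALLOWED_EXTENSIONS` and `allowed_file` are hypothetical names, and the whitelist is only an example.

```python
# Hypothetical whitelist of extensions the application is willing to accept
ALLOWED_EXTENSIONS = {'txt', 'csv', 'png', 'jpg'}


def allowed_file(filename):
    """Return True if the file name ends in one of the whitelisted extensions."""
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


print(allowed_file('report.CSV'))
print(allowed_file('script.exe'))
```

Inside `upload`, a call such as `allowed_file(file.filename)` could return a 400 response before the file is saved.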

Exercise 8: Data Visualization

Concepts:

  • Data Visualization
  • Matplotlib library
  • Candlestick Charts

Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
import mplfinance as mpf

# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])
df.set_index('Date', inplace=True)  # Set Date as index

# Plot the candlestick chart using mplfinance
mpf.plot(df, type='candle', style='charles', title='Stock Market Data', ylabel='Price')

# Display the chart
plt.show()

In this exercise, we first read a CSV file containing stock market data into a pandas dataframe, parsing the Date column as datetimes and setting it as the index, because mplfinance expects a datetime index along with Open, High, Low, and Close columns. We plot the candlestick chart using the plot function from the mplfinance module with type='candle', and set the style, title, and y-axis label through its arguments. mplfinance normally displays the chart itself, so the final call to the show function from the matplotlib.pyplot module acts only as a safeguard.

Exercise 9: Machine Learning

Concepts:

  • Machine Learning
  • Scikit-learn library

Description: Write a Python script that reads a dataset containing information about different types of flowers and trains a machine learning model to predict the type of a flower based on its features.

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')

# Check for missing values
if df.isnull().sum().sum() > 0:
    df = df.dropna()  # Drop rows with missing values

# Define feature columns and target column
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train a logistic regression model on the training data
model = LogisticRegression(solver='saga', max_iter=5000)  # Increased iterations & changed solver
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

In this exercise, we first read a dataset containing information about different types of flowers into a pandas dataframe and drop any rows with missing values. We split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module, then standardize the features with StandardScaler, fitting it on the training set only. We train a logistic regression model on the training data using the LogisticRegression class from the sklearn.linear_model module. Finally, we use the trained model to predict the output for the testing data and evaluate the model performance using the accuracy score metric.
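The scaler in the solution is fitted on the training data and then reused, unchanged, on the test data, so no information from the test set leaks into preprocessing. A standard-library sketch of that fit-then-transform split for a single feature is shown here; the helper names and numbers are illustrative assumptions.

```python
import statistics


def fit_scaler(values):
    """Learn the mean and population standard deviation from training values."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical values of one feature
train = [4.0, 5.0, 6.0]
test = [5.0, 7.0]

mean, std = fit_scaler(train)               # statistics come from the training set only
scaled_train = transform(train, mean, std)
scaled_test = transform(test, mean, std)    # the test set reuses the training statistics
print(scaled_train, scaled_test)
```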

Exercise 10: Data Analysis

Concepts:

  • Data Analysis
  • Recommendation Systems
  • Collaborative Filtering
  • Surprise library

Description: Write a Python script that reads a CSV file containing customer purchase data and generates a recommendation system that recommends products to customers based on their purchase history.

Solution:

import pandas as pd
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split

# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')

# Ensure that the dataset has no missing values
df = df.dropna(subset=['customer_id', 'product_id', 'rating'])

# Convert the pandas dataframe to a Surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)

# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)

# Use the model to predict the output for the testing data
predictions = model.test(testset)

# Evaluate the model performance using the root mean squared error metric
rmse = accuracy.rmse(predictions)
print("RMSE:", rmse)

# Recommend products to customers based on their purchase history
customer_ids = df['customer_id'].unique()
product_ids = df['product_id'].unique()

recommendations = {}

for customer_id in customer_ids:
    purchased_products = set(df[df['customer_id'] == customer_id]['product_id'].values)
    potential_recommendations = []

    for product_id in product_ids:
        if product_id not in purchased_products:
            pred = model.predict(customer_id, product_id)
            potential_recommendations.append((product_id, pred.est))

    # Sort by predicted rating and take the top 5 recommendations
    top_recommendations = sorted(potential_recommendations, key=lambda x: x[1], reverse=True)[:5]
    recommendations[customer_id] = top_recommendations

# Display recommendations
for customer, recs in recommendations.items():
    print(f"Customer {customer} recommended products: {recs}")

In this exercise, we first read a CSV file containing customer purchase data into a pandas dataframe. We convert the pandas dataframe to a Surprise dataset using the Reader and Dataset classes from the surprise module. We split the data into training and testing sets using the train_test_split function from the surprise.model_selection module. We train an SVD model on the training data using the SVD class from the surprise module. We use the trained model to predict the output for the testing data and evaluate the model performance using the root mean squared error metric. Finally, we recommend products to each customer by predicting a rating for every product the customer has not yet purchased and keeping the five highest-rated ones.
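The final ranking step can be isolated from the model. The sketch below assumes the predicted ratings are already available in a dictionary and uses `heapq.nlargest` in place of a full sort; the `top_n` helper and the sample ratings are hypothetical.

```python
import heapq


def top_n(predicted_ratings, purchased, n=5):
    """Return the n highest-rated (product, rating) pairs the customer has not bought."""
    candidates = (
        (product, rating)
        for product, rating in predicted_ratings.items()
        if product not in purchased
    )
    return heapq.nlargest(n, candidates, key=lambda item: item[1])


# Hypothetical predicted ratings for one customer
predicted = {'A': 4.2, 'B': 3.1, 'C': 4.8, 'D': 2.0}
purchased = {'C'}
print(top_n(predicted, purchased, n=2))
```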

Exercise 11: Computer Vision

Concepts:

  • Computer Vision
  • Object Detection
  • OpenCV library
  • Pre-trained models

Description: Write a Python script that reads an image and performs object detection on the image using a pre-trained object detection model.

Solution:

import cv2
import numpy as np

# Read the image file
img = cv2.imread('image.jpg')

# Check if the image is loaded correctly
if img is None:
    raise FileNotFoundError("Error: Image file not found or unable to load.")

# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')

# Prepare the input image for the model
blob = cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False)
model.setInput(blob)

# Perform object detection
output = model.forward()

# Loop through detected objects and draw bounding boxes
h, w, _ = img.shape  # Get image dimensions
for detection in output[0, 0, :, :]:
    confidence = float(detection[2])
    if confidence > 0.5:
        x1 = int(detection[3] * w)
        y1 = int(detection[4] * h)
        x2 = int(detection[5] * w)
        y2 = int(detection[6] * h)

        # Draw bounding box with label and confidence score
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        label = f'Confidence: {confidence:.2f}'
        cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# Display the image with detections
cv2.imshow('Object Detection', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this exercise, we first read an image file into a NumPy array using the imread function from the cv2 module of OpenCV. We load a pre-trained object detection model using the readNetFromTensorflow function from the cv2.dnn module. We set the input image to the model and perform object detection using the setInput and forward methods of the model object. Finally, we loop through the detected objects and draw bounding boxes around them using the rectangle function from the cv2 module.
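Each detection row stores its box corners as fractions of the image size, which is why the solution multiplies them by the width and height. The sketch below isolates that conversion and additionally clamps the result to the image bounds; the `to_pixel_box` helper and the sample row are hypothetical.

```python
def to_pixel_box(detection, width, height):
    """Convert normalized corners at indices 3..6 to pixel coordinates inside the image."""
    def clamp(value, limit):
        return max(0, min(int(value), limit - 1))

    x1 = clamp(detection[3] * width, width)
    y1 = clamp(detection[4] * height, height)
    x2 = clamp(detection[5] * width, width)
    y2 = clamp(detection[6] * height, height)
    return x1, y1, x2, y2


# Hypothetical detection row; index 2 is the confidence, 3..6 are left, top, right, bottom
detection = [0, 1, 0.9, 0.25, 0.5, 0.75, 1.2]
box = to_pixel_box(detection, width=400, height=200)
print(box)
```

Clamping is an addition to the original logic: it keeps a box that overshoots the frame, like the bottom edge in this sample, drawable.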

Exercise 12: Natural Language Processing

Concepts:

  • Natural Language Processing
  • Topic Modeling
  • Latent Dirichlet Allocation
  • Gensim library

Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel

# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
    text = f.readlines()

# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]

# Tokenize the text into words
tokens = [line.split() for line in text]

# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)

# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]

# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)

# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
    print(topic)

In this exercise, we first read a text file into a list of strings. We preprocess the text by removing newlines, converting to lowercase, and tokenizing into words using the split method. We create a dictionary of words and their frequency and create a bag-of-words representation of the text using the doc2bow method of the dictionary object. We train an LDA model on the corpus using the LdaModel class from the gensim.models module. Finally, we print the topics and their associated words using the print_topics method of the model object.
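A bag-of-words document is a list of (word id, count) pairs. The idea can be sketched with the standard library as follows; this illustrates the representation only, and Gensim may assign different ids to the same words. The `to_bow` helper and the sample sentences are made up.

```python
from collections import Counter

# Hypothetical lines of text, one document per line
docs = ["the cat sat", "the dog sat on the mat"]
tokens = [doc.lower().split() for doc in docs]

# Assign each distinct word an integer id in order of first appearance
vocabulary = {}
for doc in tokens:
    for word in doc:
        vocabulary.setdefault(word, len(vocabulary))


def to_bow(doc):
    """Return sorted (word_id, count) pairs for one tokenized document."""
    counts = Counter(vocabulary[word] for word in doc)
    return sorted(counts.items())


corpus = [to_bow(doc) for doc in tokens]
print(vocabulary)
print(corpus)
```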

Exercise 13: Web Scraping

Concepts:

  • Web Scraping
  • Beautiful Soup library
  • Requests library
  • CSV file handling

Description: Write a Python script that scrapes a website for product information and saves the information to a CSV file.

Solution:

import csv
import sys

import requests
from bs4 import BeautifulSoup

# Define the URL of the website to scrape
url = 'https://www.example.com/products'

# Add headers to mimic a browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Send a request to the website
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code != 200:
    print(f"Failed to fetch data. Status Code: {response.status_code}")
    sys.exit(1)  # Stop with a non-zero exit code

# Parse the HTML content of the response using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')

# Write the product information to a CSV file
with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Price', 'Description'])

    for listing in listings:
        name = listing.find('h3')
        price = listing.find('span', class_='price')
        description = listing.find('p')

        # Extract text safely, handling missing elements
        name = name.get_text(strip=True) if name else 'N/A'
        price = price.get_text(strip=True) if price else 'N/A'
        description = description.get_text(strip=True) if description else 'N/A'

        writer.writerow([name, price, description])

print("Scraping completed. Data saved to 'products.csv'.")

In this exercise, we first define the URL of the website to scrape and send a request to the website using the get function from the requests module. We parse the HTML content of the response using Beautiful Soup and find all the product listings on the page using the find_all method. We write the product information to a CSV file using the csv module.

Exercise 14: Big Data Processing

Concepts:

  • Big Data Processing
  • PySpark
  • Data Transformations
  • Aggregation
  • Parquet file format

Description: Write a PySpark script that reads a CSV file containing customer purchase data, performs some data transformations and aggregation, and saves the results to a Parquet file.

Solution:

import os

from pyspark.sql import SparkSession

# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()

# Verify that the file exists before reading (optional but useful)
if not os.path.exists('customer_purchases.csv'):
    raise FileNotFoundError("Error: The file 'customer_purchases.csv' does not exist.")

# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)

# Perform some data transformations
df = df.filter((df['purchase_date'] >= '2020-01-01') & (df['purchase_date'] <= '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')

# Group by customer and calculate total spending
df = df.groupBy('customer_id').sum('price').withColumnRenamed('sum(price)', 'total_spent')

# Save the results to a Parquet file
df.write.mode('overwrite').parquet('customer_spending.parquet')

print("Processing completed. Data saved to 'customer_spending.parquet'.")

In this exercise, we first create a SparkSession object using the SparkSession class from the pyspark.sql module. We read a CSV file containing customer purchase data into a Spark DataFrame using the read.csv method. We perform some data transformations on the DataFrame using the filter, select, and groupBy methods. Finally, we save the results to a Parquet file using the write.parquet method.

Exercise 15: DevOps

Concepts:

  • DevOps
  • Fabric library

Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.

Solution:

import getpass
import shutil

from fabric import Connection

# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = getpass.getpass("Enter SSH password: ")  # Secure password entry

# Define the path to the web application on the local machine and the remote server
local_path = '/path/to/local/app'
remote_path = '/path/to/remote/app'

# Connection.put() in Fabric 2 transfers a single file and has no recursive option,
# so pack the application directory into one archive first
archive = shutil.make_archive('app', 'gztar', root_dir=local_path)

# Create a connection to the remote server
c = Connection(host=host, user=user, connect_kwargs={'password': password})

# Ensure the remote directory exists
c.run(f'mkdir -p {remote_path}')

# Upload the archive to the remote server
c.put(archive, f'{remote_path}/app.tar.gz')

# Change to the application directory
with c.cd(remote_path):
    # Unpack the uploaded files and remove the archive
    c.run('tar -xzf app.tar.gz && rm app.tar.gz')

    # Install required dependencies
    c.run('sudo apt-get update && sudo apt-get install -y python3-pip')
    c.run('pip3 install -r requirements.txt')

    # Start the web application in the background
    c.run('nohup python3 app.py > app.log 2>&1 &', pty=False)

print("Deployment completed successfully.")

In this exercise, we first define the host and user credentials for the remote server. We define the path to the web application on the local machine and the remote server. We create a connection to the remote server using the Connection class from the fabric module. We upload the local files to the remote server using the put method of the connection object. We install any required dependencies on the remote server using the run method of the connection object. Finally, we start the web application on the remote server using the run method.

Exercise 16: Reinforcement Learning

Concepts:

  • Reinforcement Learning
  • Q-Learning
  • OpenAI Gym library

Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.

Solution:

import time

import gym
import numpy as np

# Create the FrozenLake environment for training
env = gym.make("FrozenLake-v1", is_slippery=True)

# Initialize the Q-table
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set hyperparameters
alpha = 0.8  # Learning rate
gamma = 0.95  # Discount factor
epsilon = 0.1  # Exploration probability
num_episodes = 2000  # Training episodes

# Train the agent using Q-learning
for episode in range(num_episodes):
    state, _ = env.reset()
    done = False

    while not done:
        # Choose action using epsilon-greedy policy
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()  # Random action (exploration)
        else:
            action = np.argmax(Q[state, :])  # Best action from Q-table

        # Take the action; step() reports termination and truncation separately (Gym 0.26+)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Update Q-value using the Bellman equation
        Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))

        # Move to the next state
        state = next_state

# Test the agent; render_mode="ansi" makes render() return a printable text frame
test_env = gym.make("FrozenLake-v1", is_slippery=True, render_mode="ansi")
state, _ = test_env.reset()
done = False
print("\nTesting trained agent:\n")

while not done:
    action = np.argmax(Q[state, :])
    next_state, reward, terminated, truncated, _ = test_env.step(action)
    done = terminated or truncated

    # Render the environment
    print(test_env.render())
    time.sleep(0.5)  # Pause for visibility

    state = next_state

print("\nGame Over!")

In this exercise, we first create an OpenAI Gym environment for the game using the make function from the gym module. We define the Q-table for the agent as a NumPy array and set the hyperparameters for the Q-learning algorithm. We train the agent using the Q-learning algorithm by looping through a specified number of episodes and updating the Q-table based on the rewards and next states. Finally, we test the agent by playing the game using the Q-table and visualizing the game using the render method.
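The Q-table update blends the old estimate with the observed reward plus the discounted value of the best next action. A minimal sketch of a single update on a plain nested list is shown here; the `q_update` helper and the tiny three-state table are illustrative, while the hyperparameter defaults match the solution.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.8, gamma=0.95):
    """Apply one Q-learning update in place and return the new Q-value."""
    best_next = max(q_table[next_state])
    q_table[state][action] = (
        (1 - alpha) * q_table[state][action]
        + alpha * (reward + gamma * best_next)
    )
    return q_table[state][action]


# Hypothetical table with three states and two actions, all values starting at zero
q_table = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

# Reaching the goal from state 1 yields a reward of 1
first = q_update(q_table, state=1, action=0, reward=1.0, next_state=2)

# A later move from state 0 into state 1 earns no reward but inherits discounted value
second = q_update(q_table, state=0, action=1, reward=0.0, next_state=1)
print(first, second)
```

The second update shows how reward information propagates backwards through the table even when the immediate reward is zero.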

Exercise 17: Time Series Analysis

Concepts:

  • Time Series Analysis
  • Data Preprocessing
  • Data Visualization
  • ARIMA model
  • Statsmodels library

Description: Write a Python script that reads a CSV file containing time series data, performs some data preprocessing and visualization, and fits a time series model to the data.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')

# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Forward-fill missing values before resampling
if df.isnull().values.any():
    df = df.ffill()

# Ensure the column name is correct
target_col = df.columns[0]  # Assuming first column is the time series value

# Resample the data to a monthly frequency
# (pandas 2.2 and later spell this alias 'ME')
df = df.resample('M').mean()

# Plot the time series data
plt.figure(figsize=(10, 5))
plt.plot(df.index, df[target_col], label="Time Series")
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Visualization")
plt.legend()
plt.grid()
plt.show()

# Fit an ARIMA model
model = sm.tsa.ARIMA(df[target_col].dropna(), order=(1, 1, 1))  # Use dropna() to avoid errors
results = model.fit()

# Print the model summary
print(results.summary())

In this exercise, we first read a CSV file containing time series data into a pandas dataframe. We convert the date column to a datetime object and set it as the index. We fill any missing values using forward fill and then resample the data to a monthly frequency. We visualize the data using the plot function from the matplotlib.pyplot module. Finally, we fit an ARIMA model to the data using the ARIMA class from the statsmodels.api module and print the summary of the model using the summary method of the results object.

Exercise 18: Computer Networking

Concepts:

  • Computer Networking
  • TCP/IP Protocol
  • Socket Programming

Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.

Solution:

import socket

# Define the host and port for the server
host = 'localhost'
port = 12345

# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the host and port
s.bind((host, port))

# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)

# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)

# Send data to the client
conn.sendall(b'Hello, client!')

# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())

# Close the connection and the listening socket
conn.close()
s.close()

In this exercise, we first define the host and port for the server. We create a socket object using the socket function from the socket module and bind the socket to the host and port using the bind method. We listen for incoming connections using the listen method and accept a client connection using the accept method, which returns a connection object and the address of the client. We send data to the client using the sendall method of the connection object and receive data from the client using the recv method. Finally, we close the connection using the close method.
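To exercise the server, a matching client is needed. The sketch below is one possible counterpart and is not part of the original solution: it connects, reads the greeting, sends a reply, and closes. The `run_client` name and the reply message are hypothetical.

```python
import socket


def run_client(host='localhost', port=12345, message=b'Hello, server!'):
    """Connect to the server, read its greeting, send a reply, and return the greeting."""
    with socket.create_connection((host, port), timeout=5) as sock:
        greeting = sock.recv(1024)   # the server sends first
        sock.sendall(message)        # then waits for the reply
    return greeting.decode()

# With the server from the solution already running in another terminal:
#     print('Received:', run_client())
```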

Exercise 19: Data Analysis and Visualization

Concepts:

  • Data Analysis
  • Data Visualization
  • PDF Report Generation
  • Pandas library
  • Matplotlib library
  • ReportLab library

Description: Write a Python script that reads a CSV file containing sales data for a retail store, performs some data analysis and visualization, and saves the results to a PDF report.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import os

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Calculate the total sales by category and month
totals = df.groupby(['category', 'month'])['sales'].sum()

# Get unique categories
categories = df['category'].unique()

# Create subplots dynamically based on the number of categories
fig, axes = plt.subplots(nrows=len(categories), ncols=1, figsize=(8.5, 11))

# Ensure `axes` is always iterable (even if there's only one category)
if len(categories) == 1:
    axes = [axes]

# Plot total sales by category and month
for i, category in enumerate(categories):
    totals.loc[category].plot(ax=axes[i], kind='bar', title=f"Category: {category}")
    axes[i].set_ylabel("Sales")

plt.tight_layout()
plt.savefig('sales_plot.png')  # Save the figure
plt.close(fig)  # Close to free memory

# Create a PDF report
pdf_filename = 'sales_report.pdf'
c = canvas.Canvas(pdf_filename, pagesize=letter)

# Add title and description
c.setFont("Helvetica-Bold", 16)
c.drawString(50, 750, 'Sales Report')

c.setFont("Helvetica", 12)
c.drawString(50, 730, 'Total Sales by Category and Month')

# Add the image to the PDF if it exists
if os.path.exists('sales_plot.png'):
    c.drawImage('sales_plot.png', 50, 450, width=500, height=300)

# Save and close the PDF
c.showPage()
c.save()

print(f"Report saved as {pdf_filename}")

In this exercise, we first read a CSV file containing sales data for a retail store into a pandas dataframe. We calculate the total sales by category and month using the groupby and sum methods. We plot the total sales for each category as a bar chart on its own subplot using the pandas plot method, which draws with Matplotlib, and save the figure to a PNG file. Finally, we generate a PDF report using the Canvas class from the reportlab.pdfgen.canvas module, adding the text with drawString and the saved plot with drawImage.

Exercise 20: Machine Learning

Concepts:

  • Machine Learning
  • Convolutional Neural Networks
  • Keras library
  • MNIST dataset

Description: Write a Python script that trains a machine learning model to classify images of handwritten digits from the MNIST dataset.

Solution:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize the pixel values and reshape the data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Define the CNN model
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),
    layers.Dense(128, activation='relu'),  # Added a fully connected layer
    layers.Dropout(0.5),  # Prevent overfitting
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), batch_size=64)

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)

In this exercise, we first load the MNIST dataset using the load_data function from the keras.datasets.mnist module. We normalize the pixel values and reshape the data using NumPy. We define a convolutional neural network model using the Sequential class and various layers from the layers module of Keras. We compile the model using the compile method with the Adam optimizer and sparse categorical crossentropy loss function. We train the model using the fit method and evaluate the model on the test data using the evaluate method.
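It can help to trace how the image size shrinks through the network before it reaches the Flatten layer. The arithmetic below is a sketch that assumes the layer settings used in the solution: 'same' padding with stride 1 keeps the spatial size, and each 2x2 max pool halves it, rounding down. The helper names are hypothetical.

```python
def conv_same(size):
    """A stride-1 convolution with 'same' padding leaves the spatial size unchanged."""
    return size


def max_pool(size, pool=2):
    """A pool x pool max pooling layer with default settings floors the division."""
    return size // pool


size = 28
shapes = []
for filters in (32, 64, 128):
    size = max_pool(conv_same(size))
    shapes.append((size, size, filters))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)
print(flattened)
```

The flattened length is the number of inputs each unit of the first Dense layer receives.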

Exercise 21: Natural Language Processing

Concepts:

  • Natural Language Processing
  • Text Preprocessing
  • Text Representation
  • Topic Modeling
  • Latent Dirichlet Allocation
  • Gensim library

Description: Write a Python script that uses natural language processing techniques to analyze a corpus of text data and extract useful insights.

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download required resources
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # Tokenizer data required by recent NLTK releases

# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')

# Handle missing values
df['text'] = df['text'].fillna('')

# Define stop words and clean text
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    tokens = word_tokenize(text.lower())  # Tokenization & lowercasing
    return [word for word in tokens if word.isalnum() and word not in stop_words]  # Remove punctuation & stopwords

df['cleaned_text'] = df['text'].apply(preprocess_text)

# Create a document-term matrix
texts = df['cleaned_text'].tolist()
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train LDA model
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)

# Print topics and top words for each
for topic_id, words in lda_model.show_topics(num_topics=num_topics, formatted=False):
    print(f'Topic {topic_id}:', ', '.join(word for word, _ in words))

# Convert topic distributions into a structured DataFrame
topic_dists = [{f"Topic_{topic}": prob for topic, prob in lda_model.get_document_topics(doc, minimum_probability=0)} for doc in corpus]
topic_df = pd.DataFrame(topic_dists)

# Merge topic distributions with original data
df = pd.concat([df, topic_df], axis=1)

# Save the results
df.to_csv('text_data_topics.csv', index=False)
print("Saved processed data to 'text_data_topics.csv'.")

In this exercise, we first read a corpus of text data into a pandas dataframe and replace missing values with empty strings. We load the English stop-word list with stopwords.words from the nltk.corpus module, then tokenize, lowercase, and filter each document with a list comprehension, applied through the apply method of pandas. We build a bag-of-words corpus from the cleaned text using gensim's Dictionary class and its doc2bow method. We perform topic modeling with latent Dirichlet allocation (LDA) using the LdaModel class, print the top words for each topic, and extract the topic distribution for each document. Finally, we save the results to a CSV file using the to_csv method of pandas.
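
The bag-of-words step can look opaque, so here is a small standard-library sketch of the idea behind Dictionary and doc2bow: map each distinct token to an integer id, then describe every document as (token_id, count) pairs. The sample documents and the helper name to_bow are hypothetical, and gensim may assign ids in a different order, so this illustrates the representation rather than reproducing gensim's implementation.

```python
from collections import Counter

# Hypothetical, already-preprocessed documents (lists of tokens)
texts = [
    ["python", "data", "analysis", "python"],
    ["topic", "model", "data"],
]

# Step 1: assign an integer id to every distinct token, in order of first appearance
token2id = {}
for doc in texts:
    for token in doc:
        if token not in token2id:
            token2id[token] = len(token2id)

# Step 2: represent each document as sorted (token_id, count) pairs
def to_bow(doc):
    counts = Counter(token2id[token] for token in doc)
    return sorted(counts.items())

corpus = [to_bow(doc) for doc in texts]
print(token2id)
print(corpus)
```

Word order is discarded in this representation; LDA only sees how often each token occurs in each document.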

Exercise 22: Web Scraping

Concepts:

  • Web Scraping
  • HTML Parsing
  • BeautifulSoup library
  • CSV File I/O

Description: Write a Python script that scrapes data from a website using the BeautifulSoup library and saves it to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL to scrape
url = 'https://www.example.com'

# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Send a GET request
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code != 200:
    print(f"Error: Unable to fetch data (Status Code: {response.status_code})")
    raise SystemExit(1)  # Stop the script with a non-zero exit code

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Extract data
data = []
for item in soup.find_all('div', class_='item'):
    name_tag = item.find('h3')
    price_tag = item.find('span', class_='price')

    # Extract text safely, handling missing elements
    name = name_tag.get_text(strip=True) if name_tag else 'N/A'
    price = price_tag.get_text(strip=True) if price_tag else 'N/A'

    data.append([name, price])

# Save to CSV
csv_filename = 'data.csv'
with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])  # Add headers
    writer.writerows(data)

print(f"Scraping completed. Data saved to '{csv_filename}'.")

In this exercise, we first define the URL to scrape, fetch the page with the requests library, and stop early if the response status is not 200. We parse the HTML content using the BeautifulSoup library and extract the data using the find_all method of the soup object and the find method of each item, substituting 'N/A' when a name or price element is missing. Finally, we save the data to a CSV file using the csv module.
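
To experiment with the extraction logic offline, the sketch below runs the same "find each item, fall back to 'N/A'" idea against an inline HTML string. It deliberately swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages; the sample markup and the ItemParser class are hypothetical, and the parser assumes item blocks are not nested inside one another.

```python
from html.parser import HTMLParser

# Hypothetical markup: the second item has no price element
SAMPLE_HTML = """
<div class="item"><h3>Widget</h3><span class="price">$9.99</span></div>
<div class="item"><h3>Gadget</h3></div>
"""

class ItemParser(HTMLParser):
    """Collect [name, price] rows from <div class="item"> blocks."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished [name, price] rows
        self._current = None  # fields of the item being parsed, if any
        self._field = None    # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "item" in classes:
            # Start a new item with defaults for missing elements
            self._current = {"name": "N/A", "price": "N/A"}
        elif self._current is not None:
            if tag == "h3":
                self._field = "name"
            elif tag == "span" and "price" in classes:
                self._field = "price"

    def handle_data(self, data):
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("h3", "span"):
            self._field = None
        elif tag == "div" and self._current is not None:
            self.rows.append([self._current["name"], self._current["price"]])
            self._current = None

parser = ItemParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
print(rows)
```

The resulting rows have the same shape as the data list in the solution, so they could be passed to csv.writer in the same way.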

Exercise 23: Database Interaction

Concepts:

  • Database Interaction
  • SQLite database
  • SQL queries
  • SQLite3 module

Description: Write a Python script that interacts with a database to retrieve and manipulate data.

Solution:

import sqlite3

# Connect to the database
conn = sqlite3.connect('example.db')

# Create a cursor object
c = conn.cursor()

# Execute an SQL query to create a table
c.execute('''CREATE TABLE IF NOT EXISTS customers
             (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')

# Execute an SQL query to insert data into the table
c.execute("INSERT INTO customers (name, email, phone) VALUES ('John Smith', 'john@example.com', '555-1234')")

# Execute an SQL query to retrieve data from the table
c.execute("SELECT * FROM customers")
rows = c.fetchall()
for row in rows:
    print(row)

# Execute an SQL query to update data in the table
c.execute("UPDATE customers SET phone='555-5678' WHERE name='John Smith'")

# Execute an SQL query to delete data from the table
c.execute("DELETE FROM customers WHERE name='John Smith'")

# Commit the changes to the database
conn.commit()

# Close the database connection
conn.close()

In this exercise, we first connect to an SQLite database using the connect function from the sqlite3 module. We create a cursor object using the cursor method of the connection object and execute SQL queries using the execute method of the cursor object. We retrieve data from the table using the fetchall method and print the results. We update data in the table using the UPDATE statement and delete data from the table using the DELETE statement. Finally, we commit the changes to the database and close the connection.
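
When the values come from variables or user input rather than literals, it is safer to pass them through ? placeholders than to build the SQL string by hand. The sketch below shows that pattern with an in-memory database so it leaves no file behind; the second customer row is a made-up sample, and using the connection as a context manager is one reasonable way to handle commits, not the only one.

```python
import sqlite3

# An in-memory database keeps this sketch self-contained (no file is created)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
)

# Hypothetical sample rows
new_customers = [
    ("John Smith", "john@example.com", "555-1234"),
    ("Jane Doe", "jane@example.com", "555-9876"),
]

# "with conn" commits on success and rolls back if an exception is raised
with conn:
    # The ? placeholders let sqlite3 bind the values instead of string formatting
    conn.executemany(
        "INSERT INTO customers (name, email, phone) VALUES (?, ?, ?)",
        new_customers,
    )
    conn.execute(
        "UPDATE customers SET phone = ? WHERE name = ?",
        ("555-5678", "John Smith"),
    )

rows = conn.execute("SELECT name, phone FROM customers ORDER BY id").fetchall()
print(rows)
conn.close()
```

Note that the with block manages the transaction only; the connection still has to be closed explicitly.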

Exercise 24: Parallel Processing

Concepts:

  • Parallel Processing
  • Multiprocessing
  • Process Pool
  • CPU-bound tasks

Description: Write a Python script that performs a time-consuming computation using parallel processing to speed up the computation.

Solution:

import time
import multiprocessing

# Define a CPU-bound function: sum the integers below num with an explicit loop
def compute(num):
    result = 0
    for i in range(num):
        result += i
    return result

if __name__ == '__main__':
    # Create a process pool with the number of CPUs available
    num_cpus = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(num_cpus)

    # Generate a list of numbers to compute
    num_list = [10000000] * num_cpus

    # Compute the results using parallel processing
    start_time = time.time()
    results = pool.map(compute, num_list)

    # Close the pool properly
    pool.close()
    pool.join()

    end_time = time.time()

    # Print the results and computation time
    print('Results:', results)
    print('Computation time:', end_time - start_time, 'seconds')

In this exercise, we first define a CPU-bound function that takes a long time to compute. We then create a process pool using the Pool function from the multiprocessing module with the number of CPUs available. We generate a list of numbers to compute and compute the results using the map method of the process pool. Finally, we print the results and computation time.
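
The solution hands every worker the same input. A common variation is to split one large job into chunks, let each process handle a chunk, and combine the partial results. The sketch below illustrates that under simple assumptions: the helper names split_range and partial_sum are hypothetical, and the workload (summing integers) is chosen only because the answer is easy to check.

```python
import multiprocessing

def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks

def partial_sum(bounds):
    """CPU-bound work for one chunk: sum the integers in [start, stop)."""
    start, stop = bounds
    total = 0
    for i in range(start, stop):
        total += i
    return total

if __name__ == '__main__':
    n = 1_000_000
    chunks = split_range(n, multiprocessing.cpu_count())
    # The pool is shut down automatically when the with block exits
    with multiprocessing.Pool() as pool:
        total = sum(pool.map(partial_sum, chunks))
    # Compare against the closed-form sum of 0..n-1
    print(total == n * (n - 1) // 2)
```

The worker functions live at module level so that they can be pickled and sent to the child processes, and the __main__ guard keeps those children from starting pools of their own when they import the script.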

Exercise 25: Image Processing

Concepts:

  • Image Processing
  • Pillow library
  • Image Manipulation
  • Image Filtering

Description: Write a Python script that performs basic image processing operations on an image file.

Solution:

from PIL import Image, ImageFilter
import os

# Define image paths
input_path = 'example.jpg'
output_path = 'processed.jpg'

# Check if the input file exists
if not os.path.exists(input_path):
    raise FileNotFoundError(f"Error: The file '{input_path}' was not found.")

try:
    # Open the image file using a context manager
    with Image.open(input_path) as image:
        # Display the original image (optional, may not work in all environments)
        image.show()

        # Resize the image
        image = image.resize((500, 500))

        # Convert the image to grayscale
        image = image.convert('L')

        # Apply a Gaussian blur filter
        image = image.filter(ImageFilter.GaussianBlur(radius=2))

        # Save the processed image to a file
        image.save(output_path)

        # Display the processed image
        image.show()
        print(f"Processed image saved as '{output_path}'.")

except Exception as e:
    print(f"An error occurred: {e}")

In this exercise, we first open an image file using the Image class from the Pillow library. We resize the image using the resize method and convert it to grayscale using the convert method with the 'L' mode. We apply a Gaussian blur filter using the filter method with the GaussianBlur class from the ImageFilter module. Finally, we save the processed image to a file using the save method and display it using the show method.
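
For intuition about the grayscale step: Pillow's documentation describes the conversion from a color image to 'L' mode as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula to a few hand-picked pixels in plain Python. The function name and sample pixels are hypothetical, and Pillow's internal rounding may differ slightly, so this approximates the idea rather than replacing convert.

```python
def to_grayscale(pixel):
    """Approximate the RGB -> 'L' conversion with the ITU-R 601-2 luma weights."""
    r, g, b = pixel
    return round(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000)

# Hypothetical sample pixels
pixels = {"white": (255, 255, 255), "black": (0, 0, 0), "red": (255, 0, 0)}
gray = {name: to_grayscale(rgb) for name, rgb in pixels.items()}
print(gray)
```

Green carries the largest weight and blue the smallest, which is why a pure green pixel ends up much brighter in grayscale than a pure blue one.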
