Python Become a Master
Chapter 62

Advanced Level Exercises

Section 2 of 2

Exercise 26: Machine Learning

Concepts:

Machine Learning

Scikit-Learn library

Data Preprocessing

Feature Engineering

Model Training

Model Evaluation

Description: Write a Python script that uses machine learning techniques to train a model and make predictions on new data.

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Read the data into a pandas dataframe (all feature columns are assumed to be numeric)
df = pd.read_csv('data.csv')

# Ensure the target column exists
if 'target' not in df.columns:
    raise ValueError("Error: 'target' column not found in dataset.")

# Check for missing values
if df.isnull().sum().sum() > 0:
    print("Warning: Missing values detected. Filling numeric columns with their mean values.")
    # numeric_only=True avoids a TypeError on text columns in pandas 2.x
    df = df.fillna(df.mean(numeric_only=True))  # Alternatively, df.dropna() removes rows with NaN values

# Split the data into features and labels
X = df.drop(columns=['target'])
y = df['target']

# Split the data into training and testing sets (stratify keeps the class proportions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale the data using standardization (fit on the training set only to avoid data leakage)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a logistic regression model with class balancing
model = LogisticRegression(random_state=42, class_weight='balanced')
model.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

# Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')  # 'weighted' also supports multi-class
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print evaluation metrics
print('Accuracy:', round(accuracy, 4))
print('Precision:', round(precision, 4))
print('Recall:', round(recall, 4))
print('F1 score:', round(f1, 4))

In this exercise, we first read a dataset into a pandas dataframe, check that it has a target column, and fill any missing values with column means. We split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module. We standardize the features with the StandardScaler class from the sklearn.preprocessing module, fitting the scaler on the training set only so that no information from the test set leaks into training. We train a logistic regression model using the LogisticRegression class from the sklearn.linear_model module and make predictions on the test set. Finally, we evaluate the model with accuracy, precision, recall, and F1 score, computed by the corresponding functions in the sklearn.metrics module.
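The exercise description mentions predictions on new data, while the solution only predicts the test set. A minimal sketch of that last step follows, assuming synthetic data from make_classification in place of data.csv and two hypothetical "new" rows. It wraps the scaler and model in a scikit-learn Pipeline so the scaling learned during training is reapplied automatically at prediction time.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data.csv (assumption: 4 numeric features, binary target)
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# The pipeline fits the scaler on the training data only and reuses it when predicting
model = make_pipeline(StandardScaler(),
                      LogisticRegression(random_state=42, class_weight='balanced'))
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Hypothetical "new data": unseen rows with the same 4 features, in the same order
new_samples = np.array([[0.5, -1.2, 0.3, 2.0],
                        [-1.0, 0.8, -0.4, 0.1]])
new_predictions = model.predict(new_samples)
new_probabilities = model.predict_proba(new_samples)

print('Test accuracy:', round(test_accuracy, 4))
print('Predicted classes:', new_predictions)
print('Class probabilities:', new_probabilities.round(3))
```

Because the pipeline is a single object, it can be saved and reloaded as one unit, which avoids the common mistake of scaling new data with a freshly fitted scaler.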

Exercise 27: Web Development

Concepts:

Web Development

Flask framework

HTML templates

Routing

HTTP methods

Form handling

Description: Write a Python script that creates a web application using the Flask framework.

Solution:

from flask import Flask, render_template, request

app = Flask(__name__)

# Define a route for the home page
# (home.html and contact.html must be in a 'templates' folder next to this script)
@app.route('/')
def home():
    return render_template('home.html')

# Define a route for the contact page
@app.route('/contact', methods=['GET', 'POST'])
def contact():
    if request.method == 'POST':
        name = request.form['name']
        email = request.form['email']
        message = request.form['message']
        # TODO: Process the form data
        return 'Thanks for contacting us!'
    else:
        return render_template('contact.html')

if __name__ == '__main__':
    app.run(debug=True)

In this exercise, we first import the Flask class from the flask module and create a new Flask application. We define routes for the home page and the contact page with the route decorator, and use the render_template function to render the home.html and contact.html templates, which Flask looks up in a templates folder next to the script. The contact route accepts both GET and POST requests: a GET request shows the form, while a POST request reads the submitted fields from request.form. Finally, we start the development server using the run method.
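Routes like these can be exercised without starting a server or writing templates. The sketch below uses Flask's built-in test client against a simplified contact route; the plain-text responses and the form values are assumptions made so the example runs on its own.

```python
from flask import Flask, request

app = Flask(__name__)

# Simplified contact route (assumption: returns plain text instead of rendering
# contact.html, so no templates folder is needed)
@app.route('/contact', methods=['GET', 'POST'])
def contact():
    if request.method == 'POST':
        name = request.form['name']
        return f'Thanks for contacting us, {name}!'
    return 'Contact form goes here'

# The test client sends requests directly to the app, without a running server
client = app.test_client()
get_response = client.get('/contact')
post_response = client.post('/contact', data={'name': 'Ada',
                                              'email': 'ada@example.com',
                                              'message': 'Hi'})
# A missing form field makes request.form['name'] fail, which Flask turns into 400 Bad Request
missing_field = client.post('/contact', data={})

print(get_response.status_code, get_response.get_data(as_text=True))
print(post_response.status_code, post_response.get_data(as_text=True))
print(missing_field.status_code)
```

The last request shows why request.form.get('name') with a default can be preferable when a field is optional.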

Exercise 28: Data Streaming

Concepts:

Data Streaming

Kafka

PyKafka library

Stream Processing

Description: Write a Python script that streams data from a source and processes it in real-time.

Solution:

from pykafka import KafkaClient
import json

# Kafka broker configuration
KAFKA_BROKER = 'localhost:9092'
TOPIC_NAME = 'test'

consumer = None  # Defined up front so the finally block can check it safely

try:
    # Connect to Kafka broker
    client = KafkaClient(hosts=KAFKA_BROKER)

    # Get a reference to the topic
    topic = client.topics[TOPIC_NAME]

    # Create a consumer
    consumer = topic.get_simple_consumer()

    print(f"Connected to Kafka broker at {KAFKA_BROKER}, consuming messages from topic '{TOPIC_NAME}'...")

    # Process messages in real-time
    for message in consumer:
        if message is not None:
            try:
                data = json.loads(message.value.decode('utf-8'))  # Decode bytes, then parse JSON
                print("Received message:", data)

                # TODO: Process the data in real-time

            except (UnicodeDecodeError, json.JSONDecodeError) as e:
                print(f"Error decoding message: {e} - Raw message: {message.value}")

except Exception as e:
    print(f"Kafka connection error: {e}")

finally:
    if consumer is not None:
        consumer.stop()  # Ensure the consumer is properly stopped
        print("Kafka consumer stopped.")

In this exercise, we first connect to a Kafka broker using the KafkaClient class from the pykafka library. We get a reference to a topic and create a consumer for it using the get_simple_consumer method. We iterate over the consumer to process messages as they arrive, decoding each message's value attribute (raw bytes) as UTF-8 and parsing it with the json.loads function; messages that are not valid JSON are reported and skipped. A finally block stops the consumer when the script exits.
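The decode-and-parse step can be pulled out of the consumer loop into a plain function, which makes it testable without a running broker. A minimal sketch, where parse_message is a hypothetical helper name and the message values are made-up examples of what a producer might send:

```python
import json

def parse_message(raw_value):
    """Decode a raw message value (bytes) into a Python object, or return None if invalid.

    Hypothetical helper that mirrors the decode/parse step of the consumer loop.
    """
    if raw_value is None:
        return None
    try:
        return json.loads(raw_value.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        print(f"Skipping invalid message: {e}")
        return None

# Simulated message values, as they might arrive from the broker
incoming = [
    b'{"sensor": "t1", "value": 21.5}',
    b'not json',          # invalid JSON
    b'\xff\xfe',          # invalid UTF-8
    b'{"sensor": "t2", "value": 19.0}',
]
parsed = [p for p in (parse_message(v) for v in incoming) if p is not None]
print(parsed)
```

Inside the real loop, the body would then reduce to calling parse_message(message.value) and handling the result.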

Exercise 29: Natural Language Processing

Concepts:

Natural Language Processing

NLTK library

Tokenization

Stemming

Stop Words Removal

Description: Write a Python script that performs natural language processing tasks on a text corpus.

Solution:

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

# Download NLTK data (recent NLTK releases load the tokenizer from 'punkt_tab' instead of 'punkt')
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

# Load the text corpus
with open('corpus.txt', 'r', encoding='utf-8') as f:
    corpus = f.read()

# Tokenize the corpus
tokens = word_tokenize(corpus)

# Remove stop words (the comparison is case-insensitive)
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in tokens if token.lower() not in stop_words]

# Stem the tokens
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(token) for token in filtered_tokens]

# Print the first 10 tokens at each stage
print('Original tokens:', tokens[:10])
print('Filtered tokens:', filtered_tokens[:10])
print('Stemmed tokens:', stemmed_tokens[:10])

In this exercise, we first download the necessary data from the NLTK library using the nltk.download function. We load a text corpus from a file and tokenize the corpus using the word_tokenize function from the nltk.tokenize module. We remove stop words using the stopwords corpus from the NLTK library and stem the tokens using the PorterStemmer class from the nltk.stem module. Finally, we print the results for the original, filtered, and stemmed tokens.
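word_tokenize and the stopwords corpus both depend on downloaded NLTK data. For a quick offline check of the same pipeline, a sketch can swap in NLTK's RegexpTokenizer (which needs no models) and a small hand-picked stop-word set; that set is an assumption standing in for stopwords.words('english'). PorterStemmer works without downloads.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

text = "The cats were running quickly through the gardens."

# \w+ keeps runs of word characters and drops punctuation; no downloaded models needed
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(text)

# Hypothetical mini stop-word list (stand-in for stopwords.words('english'))
stop_words = {'the', 'were', 'through', 'a', 'an', 'and'}
filtered = [t for t in tokens if t.lower() not in stop_words]

# Stemming strips suffixes by rule, so stems need not be dictionary words
stemmer = PorterStemmer()
stemmed = [stemmer.stem(t) for t in filtered]

print(tokens)
print(filtered)
print(stemmed)
```

Unlike word_tokenize, this tokenizer discards punctuation entirely, which may or may not be what a given task needs.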

Exercise 30: Distributed Systems

Concepts:

Distributed Systems

Pyro library

Remote Method Invocation

Client-Server Architecture

Description: Write a Python script that implements a distributed system using the Pyro library.

Solution:

import Pyro4

# Define a remote object class
@Pyro4.expose
class MyObject:
    def method1(self, arg1):
        return f"Processed method1 with argument: {arg1}"

    def method2(self, arg2):
        return f"Processed method2 with argument: {arg2}"

# Start the server
if __name__ == '__main__':
    # Locate the name server (it must already be running, e.g. started with: python -m Pyro4.naming)
    ns = Pyro4.locateNS()

    # Create a Pyro daemon
    daemon = Pyro4.Daemon()

    # Register the remote object with the daemon
    uri = daemon.register(MyObject)

    # Register the object with the name server
    ns.register('myobject', uri)

    print(f"MyObject is now available. URI: {uri}")

    # Run the server loop
    daemon.requestLoop()

In this exercise, we first define a remote object class and mark it with the expose decorator from the Pyro4 library, so that its two methods can be invoked remotely by a client. We create a Pyro4 daemon and register the class with it using the daemon's register method, which returns a URI for the object. We locate an already running name server with the locateNS function (it does not start one) and register the URI there under the name 'myobject'. Finally, we start the server's event loop using the requestLoop method of the daemon.

Exercise 31: Data Visualization

Concepts:

Data Visualization

Plotly library

Line Chart

Scatter Chart

Bar Chart

Heatmap

Subplots

Description: Write a Python script that creates interactive visualizations of data using the Plotly library.

Solution:

import plotly.graph_objs as go
import pandas as pd
from plotly.subplots import make_subplots

# Load the data
df = pd.read_csv('data.csv')

# Ensure 'quarter' is a string (for the heatmap y-axis)
df['quarter'] = df['quarter'].astype(str)

# Create traces
trace1 = go.Scatter(x=df['year'], y=df['sales'], mode='lines', name='Sales')
trace2 = go.Scatter(x=df['year'], y=df['profit'], mode='markers', name='Profit')
trace3 = go.Bar(x=df['year'], y=df['expenses'], name='Expenses')
trace4 = go.Heatmap(x=df['year'], y=df['quarter'], z=df['revenue'], colorscale='Viridis', name='Revenue')

# Create a 2x2 grid of subplots
fig = make_subplots(rows=2, cols=2, subplot_titles=('Sales', 'Profit', 'Expenses', 'Revenue'))

# Place each trace in its subplot cell
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)
fig.add_trace(trace3, row=2, col=1)
fig.add_trace(trace4, row=2, col=2)

# Update layout for better visualization
fig.update_layout(title='Financial Performance', height=800, width=1000)

# Display the chart
fig.show()

In this exercise, we first load a dataset into a pandas dataframe. We create line, scatter, bar, and heatmap traces using the Scatter, Bar, and Heatmap classes from the plotly.graph_objs module. We create a 2x2 grid of subplots with the make_subplots function from the plotly.subplots module and place each trace in its cell with the add_trace method. Finally, we set the title and figure size with the update_layout method and display the interactive chart with the show method.
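Passing three 1-D columns to go.Heatmap works, but pivoting the data into a quarter-by-year grid first makes the layout explicit and lets you decide how duplicate year/quarter rows are combined. A small sketch, assuming data.csv has year, quarter, and revenue columns like the hypothetical rows below:

```python
import pandas as pd

# Hypothetical rows in the shape data.csv is assumed to have
df = pd.DataFrame({
    'year':    [2021, 2021, 2022, 2022],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'revenue': [100, 120, 130, 150],
})

# One row per quarter, one column per year; duplicate year/quarter pairs are summed
revenue_grid = df.pivot_table(index='quarter', columns='year',
                              values='revenue', aggfunc='sum')
print(revenue_grid)

# The grid maps directly onto a heatmap trace:
# go.Heatmap(x=revenue_grid.columns, y=revenue_grid.index, z=revenue_grid.values)
```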

Exercise 32: Data Engineering

Concepts:

Data Engineering

SQLite

Pandas library

Data Transformation

Data Integration

Description: Write a Python script that processes data from multiple sources and stores it in a database.

Solution:

import sqlite3
from contextlib import closing

import pandas as pd

# Load data from multiple sources into pandas DataFrames
# (read_excel needs an Excel engine such as openpyxl installed)
df1 = pd.read_csv('data1.csv')
df2 = pd.read_excel('data2.xlsx')
df3 = pd.read_json('data3.json')

# Keep only the expected columns in a fixed order (missing ones are added as NaN)
expected_columns = ['date', 'amount', 'description']  # Adjust based on actual datasets
df1 = df1.reindex(columns=expected_columns)
df2 = df2.reindex(columns=expected_columns)
df3 = df3.reindex(columns=expected_columns)

# Source-specific cleaning & transformation
df2['amount'] = df2['amount'].astype(float) / 100  # Assumes source 2 stores amounts in cents
df3['description'] = df3['description'].astype(str).str.upper()  # Ensure consistency

# Merge DataFrames into one
df = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

# Parse dates for every source, not only the first (invalid dates become NaT)
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Fill remaining missing values with defaults
df = df.fillna({'amount': 0, 'description': 'UNKNOWN'})

# Store the data in a SQLite database
db_file = 'mydb.db'
table_name = 'mytable'

# closing() guarantees the connection is closed; sqlite3's own "with" block only commits
with closing(sqlite3.connect(db_file)) as conn:
    df.to_sql(table_name, conn, if_exists='replace', index=False)
    conn.commit()

print(f"Data successfully saved to SQLite table '{table_name}' in '{db_file}'.")

In this exercise, we first load data from three sources (CSV, Excel, and JSON files) into pandas dataframes using the read_csv, read_excel, and read_json functions. We align every dataframe to the same set of columns with reindex, then transform the data with functions such as to_datetime, str.upper, and arithmetic operations. We combine the dataframes into one using the concat function and fill remaining missing amounts and descriptions with defaults. Finally, we store the result in a SQLite database using the to_sql method of the pandas dataframe.
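The whole load, align, transform, and store flow can be tried without any files by using small in-memory dataframes and an in-memory SQLite database. A sketch under stated assumptions: the two sources, their values, and the "amounts in cents" rule for the second source are all hypothetical.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Two hypothetical sources with slightly different columns
source_a = pd.DataFrame({'date': ['2024-01-05', 'not a date'], 'amount': [12.5, 3.0]})
source_b = pd.DataFrame({'date': ['2024-02-10'], 'amount': [1999], 'description': ['refund']})

expected_columns = ['date', 'amount', 'description']
source_a = source_a.reindex(columns=expected_columns)  # adds an all-NaN 'description' column
source_b = source_b.reindex(columns=expected_columns)
source_b['amount'] = source_b['amount'] / 100          # assumption: source_b stores cents

combined = pd.concat([source_a, source_b], ignore_index=True)
combined['date'] = pd.to_datetime(combined['date'], errors='coerce')  # 'not a date' -> NaT
combined = combined.fillna({'description': 'UNKNOWN'})

# ':memory:' keeps the database in RAM, so nothing is written to disk
with closing(sqlite3.connect(':memory:')) as conn:
    combined.to_sql('transactions', conn, index=False)
    stored = pd.read_sql_query('SELECT * FROM transactions', conn)

print(stored)
```

Reading the table back is a cheap way to confirm that types and missing values survived the round trip (the invalid date is stored as NULL).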

Exercise 33: Natural Language Generation

Concepts:

Natural Language Generation

Markov Chains

NLTK library

Text Corpus

Description: Write a Python script that generates text using natural language generation techniques.

Solution:

import nltk
import random
import os

# Download necessary NLTK resources (recent NLTK releases use 'punkt_tab')
nltk.download('punkt')
nltk.download('punkt_tab')

# Define corpus file
corpus_file = 'corpus.txt'

# Ensure the corpus file exists
if not os.path.exists(corpus_file):
    raise FileNotFoundError(f"Error: The file '{corpus_file}' was not found.")

# Load the text corpus
with open(corpus_file, 'r', encoding='utf-8') as f:
    corpus = f.read()

# Tokenize the corpus and lowercase every token, so lookups below are case-insensitive
tokens = [token.lower() for token in nltk.word_tokenize(corpus)]

# Build a dictionary of word transitions (Markov chain)
chain = {}
for i in range(len(tokens) - 1):
    word1 = tokens[i]
    word2 = tokens[i + 1]
    if word1 in chain:
        chain[word1].append(word2)
    else:
        chain[word1] = [word2]

# Generate text using the Markov chain
start_word = random.choice(list(chain.keys()))
sentence = [start_word.capitalize()]

while len(sentence) < 100:  # Limit by word count
    last_word = sentence[-1].lower()  # Chain keys are lowercase
    if last_word in chain:
        next_word = random.choice(chain[last_word])
        sentence.append(next_word)
    else:
        break  # Stop if there are no next words

# Print the generated text
print(' '.join(sentence))

In this exercise, we first download the necessary tokenizer data from the NLTK library using the nltk.download function. We load a text corpus from a file and tokenize it using the word_tokenize function from the nltk library. We build a dictionary that maps each word to the list of words that follow it in the corpus (a first-order Markov chain). To generate text, we start from a randomly chosen word and repeatedly pick a random successor of the last word, so successors that occur more often in the corpus are more likely to be chosen. We stop when the text reaches 100 words or when the last word has no recorded successor. Finally, we print the generated text.
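The same technique fits in two small functions that need only the standard library. This sketch uses a tiny hypothetical token list instead of corpus.txt and a seeded random.Random instance so that runs are reproducible; the function names are illustrative.

```python
import random
from collections import defaultdict

def build_chain(tokens):
    """Map each word to the words that follow it (duplicates kept, so common pairs are likelier)."""
    chain = defaultdict(list)
    for current_word, next_word in zip(tokens, tokens[1:]):
        chain[current_word].append(next_word)
    return chain

def generate(chain, start_word, max_words, rng):
    """Walk the chain from start_word until max_words is reached or a word has no successor."""
    words = [start_word]
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return words

tokens = "the cat sat on the mat and the dog sat on the rug".split()
chain = build_chain(tokens)
rng = random.Random(7)  # seeded generator for reproducible output
words = generate(chain, 'the', 12, rng)
print(' '.join(words))
```

Here "rug" has no successor, so any walk that reaches it ends early, just like the break in the exercise's loop.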

Exercise 34: Machine Learning

Concepts:

Machine Learning

Scikit-learn library

Decision Tree Classifier

Model Training

Model Evaluation

Description: Write a Python script that trains a machine learning model using the scikit-learn library.

Solution:

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the iris dataset
iris = datasets.load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target)

# Train a decision tree classifier with hand-picked hyperparameters that limit tree growth
clf = DecisionTreeClassifier(max_depth=4, min_samples_split=5, random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', round(accuracy, 4))
print('\nClassification Report:\n', classification_report(y_test, y_pred, target_names=iris.target_names))

# Feature importance analysis
feature_importances = dict(zip(iris.feature_names, clf.feature_importances_))
print("\nFeature Importances:", feature_importances)

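In this exercise, we first load the iris dataset with the load_iris function from scikit-learn. We split the data into stratified training and testing sets using the train_test_split function. We train a decision tree classifier using the DecisionTreeClassifier class and the fit method, limiting the tree's depth (max_depth=4) and the minimum number of samples needed to split a node (min_samples_split=5) to reduce overfitting. We generate predictions with the predict method, measure accuracy with the accuracy_score function from the sklearn.metrics module, print a per-class classification report, and list each feature's importance from the trained tree's feature_importances_ attribute.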
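The solution fixes max_depth=4 and min_samples_split=5 by hand. One way to choose them from the data is a cross-validated grid search on the training set, sketched below with GridSearchCV; the candidate values are an illustrative assumption, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Candidate values (assumption: a small illustrative grid)
param_grid = {'max_depth': [2, 3, 4, None], 'min_samples_split': [2, 5, 10]}

# 5-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print('Best parameters:', search.best_params_)
print('Mean CV accuracy:', round(search.best_score_, 4))
print('Test accuracy:', round(search.score(X_test, y_test), 4))
```

After fitting, search refits the best configuration on the full training set, so search.score and search.predict use that final model.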

Exercise 35: Computer Vision

Concepts:

Computer Vision

OpenCV library

Image Loading

Image Filtering

Image Segmentation

Description: Write a Python script that performs computer vision tasks on images using the OpenCV library.

Solution:

import cv2
import os

# Load an image safely
image_path = 'image.jpg'
if not os.path.exists(image_path):
    raise FileNotFoundError(f"Error: '{image_path}' not found.")

img = cv2.imread(image_path)
if img is None:  # imread returns None instead of raising when a file cannot be decoded
    raise ValueError(f"Error: '{image_path}' could not be read as an image.")

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply a 5x5 median filter to reduce noise
filtered = cv2.medianBlur(gray, 5)

# Apply adaptive thresholding (11x11 neighbourhood, constant 2)
thresh = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)

# Apply a morphological closing to fill small gaps
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# Find contours (OpenCV 4 returns 2 values, OpenCV 3 returns 3)
contours_info = cv2.findContours(closed, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
contours = contours_info[0] if len(contours_info) == 2 else contours_info[1]

# Draw contours on a copy so the original image stays unchanged
contoured = img.copy()
cv2.drawContours(contoured, contours, -1, (0, 0, 255), 2)

# Save the processed images
cv2.imwrite('output_contours.jpg', contoured)
cv2.imwrite('output_thresholded.jpg', thresh)
cv2.imwrite('output_closed.jpg', closed)

print("Processing complete. Images saved as 'output_contours.jpg', 'output_thresholded.jpg', and 'output_closed.jpg'.")

# Display the images (comment out if running on a headless system)
cv2.imshow('Original', img)
cv2.imshow('Contours', contoured)
cv2.imshow('Thresholded', thresh)
cv2.imshow('Closed', closed)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this exercise, we first load an image using the imread function from the OpenCV library. We convert the image to grayscale using the cvtColor function and reduce noise with a median filter using the medianBlur function. We binarize the image with the adaptiveThreshold function, which computes a separate threshold for each pixel neighbourhood, and clean up the result with a morphological closing built from the getStructuringElement and morphologyEx functions. We find contours in the processed image using the findContours function and draw them over the color image using the drawContours function. Finally, we save the results with the imwrite function and display them with the imshow function.

Exercise 36: Network Programming

Concepts:

Network Programming

Socket library

Client-Server Architecture

Protocol Implementation

Description: Write a Python script that communicates with a remote server using the socket library.

Solution:

import socket

# Define the server address and port number
host = 'localhost'
port = 12345

# Create a TCP socket object (IPv4, stream)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

try:
    # Connect to the server
    s.connect((host, port))

    # Send data to the server; sendall keeps sending until every byte is written
    s.sendall(b'Hello, server!')

    # Receive up to 1024 bytes from the server
    data = s.recv(1024)
finally:
    # Close the socket even if an error occurred
    s.close()

# Print the received data
print('Received:', data.decode())

In this exercise, we first create a TCP socket object using the socket class from the socket library. We define the address and port number of the server we want to connect to; a server must already be listening there, otherwise connect raises ConnectionRefusedError. We connect to the server using the connect method of the socket object, send a message, and receive up to 1024 bytes of reply using the recv method. Finally, we close the socket using the close method and print the decoded reply.
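The client needs something listening on the other end. A minimal sketch of a matching server follows, assuming a simple echo protocol (the server sends back whatever it receives). It runs the server in a background thread and binds to port 0 so the operating system picks a free port, instead of the hard-coded 12345.

```python
import socket
import threading

def run_echo_server(server_sock):
    """Accept a single client and send back whatever it receives (hypothetical echo protocol)."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

# Port 0 lets the OS choose a free port; getsockname() reports which one
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
host, port = server.getsockname()

thread = threading.Thread(target=run_echo_server, args=(server,))
thread.start()

# Client side, following the steps of the exercise
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect((host, port))
    client.sendall(b'Hello, server!')
    reply = client.recv(1024)

thread.join()
server.close()
print('Received:', reply.decode())
```

Because listen() is called before the client connects, the connection is queued even if the server thread has not reached accept() yet.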

Exercise 37: Cloud Computing

Concepts:

Cloud Computing

Heroku

Flask

Web Application Deployment

Description: Write a Python script that deploys a Flask web application to the Heroku cloud platform.

Solution:

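import os

from flask import Flask

# Create a Flask application
app = Flask(__name__)

# Define a route
@app.route('/')
def hello():
    return 'Hello, world!'

# Run the application (local development only; on Heroku a production server such as gunicorn runs the app)
if __name__ == '__main__':
    # Heroku passes the port in the PORT environment variable; default to 5000 locally
    port = int(os.environ.get('PORT', 5000))
    app.run(host='0.0.0.0', port=port, debug=True)  # Set debug=True only for development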

In this exercise, we create a simple Flask application that defines a single route and run it locally with the run method of the Flask object (Flask must be installed first, for example with pip install flask). The script itself does not deploy anything: to deploy the application to Heroku, the project also needs a requirements.txt file listing its dependencies and a Procfile that tells Heroku how to start the web process. We then follow Heroku's instructions to create an app and push the code to the Git remote that Heroku provides.
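A minimal sketch of the two extra files, assuming the script is saved as app.py at the project root and that gunicorn serves it in production (the package list is illustrative; pin versions for a real deployment). First, requirements.txt:

```text
flask
gunicorn
```

Then a file named Procfile, with no extension:

```text
web: gunicorn app:app
```

In app:app, the first name is the module (app.py) and the second is the Flask object inside it. When the PORT environment variable is set, as it is on Heroku, gunicorn binds to 0.0.0.0 on that port by default, so the if __name__ == '__main__' block is not used in production.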

Exercise 38: Natural Language Processing

Concepts:

Natural Language Processing

spaCy library

Named Entity Recognition

Text Processing

Description: Write a Python script that performs named entity recognition on text using the spaCy library.

Solution:

import spacy

# Ensure the model is installed before running the script:
# Run: python -m spacy download en_core_web_sm

# Load the English language model
try:
    nlp = spacy.load('en_core_web_sm')
except OSError:
    raise OSError("spaCy model 'en_core_web_sm' not found. Run 'python -m spacy download en_core_web_sm' and try again.")

# Define some text to process
text = 'Barack Obama was born in Hawaii.'

# Process the text
doc = nlp(text)

# Extract named entities from the text
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Display results (entity_text avoids shadowing the input variable 'text')
if entities:
    print("\nNamed Entities Found:")
    for entity_text, label in entities:
        print(f" - {entity_text}: {label}")
else:
    print("\nNo named entities found in the text.")

In this exercise, we first load the English language model en_core_web_sm using the spacy.load function, raising a clear error if the model has not been downloaded. We define some text and process it by calling the loaded pipeline, nlp, on it, which returns a Doc object. We extract named entities from the Doc's ents attribute and print the text and label (label_) of each entity.

Exercise 39: Deep Learning

Concepts:

Deep Learning

TensorFlow library

Convolutional Neural Network

Model Training

Model Evaluation

Description: Write a Python script that trains a deep learning model using the TensorFlow library.

Solution:

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping

# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize the pixel values to the range [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0

# Hold out the last 5,000 training images for validation, so early stopping never sees the test set
val_images, val_labels = train_images[-5000:], train_labels[-5000:]
train_images, train_labels = train_images[:-5000], train_labels[:-5000]

# Data augmentation to reduce overfitting
# (ImageDataGenerator is deprecated in recent TensorFlow releases; Keras preprocessing layers replace it)
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)

# Define the model architecture
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),  # Randomly drops units during training to reduce overfitting
    layers.Dense(10, activation='softmax')  # Softmax outputs class probabilities
])

# Compile the model (labels are integers 0-9, so the sparse loss is used)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Stop training once validation loss has not improved for 3 epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model with data augmentation
model.fit(datagen.flow(train_images, train_labels, batch_size=64),
          validation_data=(val_images, val_labels),
          epochs=30, callbacks=[early_stopping])

# Evaluate the model on the untouched test set
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('Test accuracy:', round(test_acc * 100, 2), '%')

In this exercise, we first load the CIFAR-10 dataset using the load_data function from TensorFlow's Keras datasets module and normalize the pixel values by dividing them by 255.0. We set up data augmentation (random rotations, shifts, and horizontal flips) with ImageDataGenerator. We define a convolutional neural network using the Sequential class and layers such as Conv2D, MaxPooling2D, Flatten, Dense, and Dropout. We compile the model with the Adam optimizer and sparse categorical cross-entropy loss, then train it with the fit method, using an EarlyStopping callback that halts training once the validation loss stops improving. Finally, we evaluate the model on the test set with the evaluate method and print the test accuracy.

Exercise 40: Data Analysis

Concepts:

Data Analysis

Pandas library

Data Cleaning

Data Manipulation

Data Visualization

Description: Write a Python script that analyzes data using the pandas library.

Solution:

import pandas as pd
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv('data.csv')

# Convert 'date' column to datetime format (invalid dates become NaT)
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)

# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')

# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)

# Compute total sales per transaction
df['total_sales'] = df['price'] * df['quantity']

# Set date as index for resampling
df.set_index('date', inplace=True)

# Sum sales per calendar month. Only the numeric column is resampled, so text columns
# cannot break .sum(); 'MS' labels each month by its first day and, unlike 'M',
# is not deprecated in recent pandas releases.
monthly_sales = df['total_sales'].resample('MS').sum()

# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales.values, marker='o', linestyle='-')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.grid()
plt.xticks(rotation=45)
plt.tight_layout()  # Keep the rotated labels inside the figure
plt.show()

In this exercise, we first load data from a CSV file using the read_csv function from the pandas library. We clean the data by converting the date, price, and quantity columns to proper types (invalid entries become missing values) and removing rows with missing values using the dropna method. We compute the total sales for each transaction, set the date as the index, and aggregate the sales by month with the resample method. Finally, we plot the monthly totals using the plot function from the matplotlib library.
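resample only works on a DatetimeIndex, which is why the script calls set_index('date') first. The sketch below checks the monthly aggregation on a few hypothetical transactions and shows the equivalent groupby with pd.Grouper, which leaves the date as an ordinary column.

```python
import pandas as pd

# Hypothetical transactions spanning two months
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-03', '2024-01-20', '2024-02-11', '2024-02-28']),
    'price': [10.0, 5.0, 8.0, 2.5],
    'quantity': [2, 4, 1, 4],
})
df['total_sales'] = df['price'] * df['quantity']

# 'MS' buckets rows by calendar month, labelled with the first day of the month
monthly_sales = df.set_index('date')['total_sales'].resample('MS').sum()
print(monthly_sales)

# Same result without changing the index
monthly_via_groupby = df.groupby(pd.Grouper(key='date', freq='MS'))['total_sales'].sum()
print(monthly_via_groupby)
```

January totals 10.0*2 + 5.0*4 = 40.0 and February totals 8.0*1 + 2.5*4 = 18.0, which makes the output easy to check by hand.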

Exercise 41: Data Science

Concepts:

Data Science

NumPy library

pandas library

Matplotlib library

Data Cleaning

Data Manipulation

Data Visualization

Description: Write a Python script that performs data analysis on a dataset using the NumPy, pandas, and Matplotlib libraries.

Solution:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv('data.csv')

# Convert 'date' column to datetime format (invalid dates become NaT)
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)

# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')

# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)

# Compute total sales per transaction
df['total_sales'] = df['price'] * df['quantity']

# Set date as index for resampling
df.set_index('date', inplace=True)

# Sum sales per calendar month ('MS' = month start; 'M' is deprecated in recent pandas)
monthly_sales = df['total_sales'].resample('MS').sum()

# Analyze the data
print('Total Sales:', round(df['total_sales'].sum(), 2))
print('Average Price:', round(df['price'].mean(), 2))
print('Median Quantity:', df['quantity'].median())

# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales.values, marker='o', linestyle='-', color='b', label='Total Sales')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.legend()
plt.grid()
plt.xticks(rotation=45)
plt.tight_layout()  # Keep the rotated labels inside the figure
plt.show()

In this exercise, we first load data from a CSV file using the read_csv function from the pandas library. We clean the data by converting the date, price, and quantity columns to proper types and removing rows with missing values using the dropna method. We compute the total sales for each transaction and aggregate them by month with the resample method. We perform some basic analysis by calculating the total sales, the average price per transaction, and the median quantity. Finally, we plot the monthly totals using the plot function from the matplotlib library.
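The script imports NumPy but never uses it. One place it fits naturally is the "average price": df['price'].mean() counts every transaction once, whereas a quantity-weighted average counts every unit sold once. A small sketch with hypothetical values:

```python
import numpy as np

# Hypothetical transactions: one unit at 10.0, three units at 20.0
prices = np.array([10.0, 20.0])
quantities = np.array([1, 3])

simple_mean = prices.mean()                             # each transaction counts once
weighted_mean = np.average(prices, weights=quantities)  # each unit sold counts once
total_sales = np.sum(prices * quantities)

print(simple_mean, weighted_mean, total_sales)
```

Here the plain mean is 15.0, while the weighted mean is (10*1 + 20*3) / 4 = 17.5, equal to total sales divided by units sold. Which one to report depends on the question being asked.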

Exercise 42: Machine Learning

Concepts:

Machine Learning

scikit-learn library

Support Vector Machines

Model Training

Model Evaluation

Description: Write a Python script that trains a machine learning model using the scikit-learn library.

Solution:

import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the iris dataset
iris = datasets.load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

# Standardize the data (SVMs perform better with scaled data)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Support Vector Machine classifier
clf = svm.SVC(kernel='linear', C=1.0, random_state=42)
clf.fit(X_train_scaled, y_train)

# Predict the labels
y_pred = clf.predict(X_test_scaled)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}\n')

# Print detailed evaluation metrics
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

In this exercise, we first load the iris dataset with the load_iris function from scikit-learn and split it into stratified training and testing sets with the train_test_split function. Because SVMs are sensitive to the scale of the features, we standardize them with StandardScaler, fitting it on the training set only. We train a support vector machine classifier using the SVC class with a linear kernel, predict the test labels, and evaluate the classifier with the accuracy_score function, followed by a per-class classification report and a confusion matrix.
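A single train/test split gives one accuracy number that depends on which samples happened to land in the test set. A sketch of a more stable comparison uses 5-fold cross-validation to compare the linear kernel against an RBF kernel; the choice of kernels and C=1.0 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

results = {}
for kernel in ['linear', 'rbf']:
    # The scaler sits inside the pipeline, so it is refit on each training fold (no leakage)
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    scores = cross_val_score(model, X, y, cv=5)
    results[kernel] = scores
    print(f'{kernel}: mean accuracy {scores.mean():.4f} (+/- {scores.std():.4f})')
```

For classifiers, cross_val_score with an integer cv uses stratified folds, matching the stratify=iris.target choice in the solution.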

Exercise 43: Web Scraping

Concepts:

Web Scraping

BeautifulSoup library

HTML Parsing

Data Extraction

Description: Write a Python script that scrapes data from a website using the BeautifulSoup library.

Solution:

import sys

import requests
from bs4 import BeautifulSoup

# Define the target URL
url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'

# Send a User-Agent header, since some sites block requests without one
headers = {'User-Agent': 'Mozilla/5.0'}

# Fetch the HTML content of the website (the timeout prevents hanging indefinitely)
response = requests.get(url, headers=headers, timeout=10)

# Check if the request was successful
if response.status_code != 200:
    print(f"Error: Unable to fetch the page (Status Code: {response.status_code})")
    sys.exit(1)

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the page title (soup.title is None if the page has no <title> tag)
title = soup.title.string if soup.title else 'N/A'
print(f"\nPage Title: {title}\n")

# Extract all valid links
base_url = 'https://en.wikipedia.org'
links = []

for link in soup.find_all('a', href=True):  # Only <a> tags that have an href attribute
    href = link.get('href')

    # Convert relative Wikipedia links to absolute URLs
    if href.startswith('/wiki/'):
        full_url = base_url + href
        links.append(full_url)
    elif href.startswith('http'):  # Keep absolute external links
        links.append(href)

# Print the first 10 links for brevity
print("Extracted Links:")
for link_url in links[:10]:  # Limit output for readability
    print(link_url)

print(f"\nTotal Links Found: {len(links)}")

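In this exercise, we first fetch the HTML content of a Wikipedia page using the get function from the requests library, sending a User-Agent header and stopping if the response status is not 200. We parse the HTML with the BeautifulSoup class from the bs4 package. We read the page title through the soup.title attribute and collect links with the find_all method, converting relative /wiki/ links into absolute URLs and keeping absolute external links.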
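Parsing logic can be tested offline by feeding BeautifulSoup a hard-coded HTML string. This sketch also swaps the manual base_url + href concatenation for the standard library's urllib.parse.urljoin, which resolves relative, absolute, and fragment links against the page URL in one call. The HTML snippet is hypothetical, and unlike the exercise's filter, this version keeps fragment links (resolved to the page itself).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page
html = """
<html><head><title>Sample page</title></head>
<body>
  <a href="/wiki/Guido_van_Rossum">Guido</a>
  <a href="https://www.python.org/">Python.org</a>
  <a href="#History">History</a>
  <a>No href</a>
</body></html>
"""
page_url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'

soup = BeautifulSoup(html, 'html.parser')
title = soup.title.string

# href=True skips the <a> tag without an href; urljoin makes every link absolute
links = [urljoin(page_url, a['href']) for a in soup.find_all('a', href=True)]

print(title)
print(links)
```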

Exercise 44: Database Programming

Concepts:

Database Programming

SQLite library

Data Retrieval

Data Manipulation

Description: Write a Python script that interacts with a database using the SQLite library.

Solution:

import sqlite3
from contextlib import closing

# closing() closes the connection when the block ends; the inner "with conn"
# commits on success or rolls back on error (it does NOT close the connection)
with closing(sqlite3.connect('data.db')) as conn:
    with conn:
        cursor = conn.cursor()

        # Create a table (if it doesn't exist)
        cursor.execute('''CREATE TABLE IF NOT EXISTS users (
                              id INTEGER PRIMARY KEY AUTOINCREMENT,
                              name TEXT NOT NULL,
                              age INTEGER NOT NULL);''')

        # Insert data into the table (parameterized queries prevent SQL injection)
        # Note: every run of the script inserts these two rows again
        cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('John Doe', 30))
        cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('Jane Doe', 25))

        # Retrieve data from the table
        cursor.execute('SELECT * FROM users')
        users = cursor.fetchall()  # Fetch all rows

        print("\nUsers in database:")
        for user in users:
            print(user)

        # Update data using a parameterized query
        cursor.execute("UPDATE users SET age = ? WHERE name = ?", (35, 'John Doe'))

        # Delete data using a parameterized query
        cursor.execute("DELETE FROM users WHERE name = ?", ('Jane Doe',))

print("\nDatabase operations completed successfully.")

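In this exercise, we first connect to a SQLite database using the connect function from Python's built-in sqlite3 module. We create a table and insert rows with SQL statements, passing values through ? placeholders (parameterized queries) instead of formatting them into the SQL string. We then retrieve the rows with a SELECT statement and print them, update one row, and delete another. Finally, the changes are committed. Using a connection as a context manager (with) commits the transaction on success and rolls it back on an exception, but it does not close the connection, so call conn.close() when you are finished with it.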
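To try the same create/insert/update/delete sequence without writing a file to disk, here is a minimal sketch against an in-memory database (':memory:'). It also shows executemany for batch inserts and contextlib.closing to guarantee the connection is closed; the data mirrors the exercise.

```python
import sqlite3
from contextlib import closing

# ':memory:' creates a throwaway database that lives only as long as the connection
with closing(sqlite3.connect(':memory:')) as conn:
    with conn:  # commit on success, roll back on error
        conn.execute('''CREATE TABLE users (
                            id INTEGER PRIMARY KEY AUTOINCREMENT,
                            name TEXT NOT NULL,
                            age INTEGER NOT NULL)''')
        # executemany runs the same parameterized INSERT once per tuple
        conn.executemany("INSERT INTO users (name, age) VALUES (?, ?)",
                         [('John Doe', 30), ('Jane Doe', 25)])
        conn.execute("UPDATE users SET age = ? WHERE name = ?", (35, 'John Doe'))
        conn.execute("DELETE FROM users WHERE name = ?", ('Jane Doe',))

    remaining = conn.execute("SELECT name, age FROM users ORDER BY id").fetchall()
    print(remaining)  # → [('John Doe', 35)]
```

Here closing() handles the close that the connection's own context manager does not.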

Exercise 45: Cloud Computing

Concepts:

Cloud Computing

Flask library

Boto3 library

Web Application Deployment

Description: Write a Python script that deploys a web application to the AWS cloud platform using the Flask and Boto3 libraries.

Solution:

import os

import boto3
from flask import Flask

# Create a Flask application
app = Flask(__name__)

# AWS S3 configuration
AWS_BUCKET_NAME = 'my-bucket'
AWS_REGION = 'us-east-1'  # Change to your region

# Upload function for AWS S3
def upload_to_s3(file_name, bucket_name, object_name=None):
    """Upload a file to S3 and return True on success."""
    try:
        # Credentials come from the environment, ~/.aws/credentials, or an IAM role
        s3 = boto3.client('s3', region_name=AWS_REGION)
        object_name = object_name or file_name  # Default object name

        # Upload file
        s3.upload_file(file_name, bucket_name, object_name)
        print(f"File '{file_name}' uploaded successfully to S3 bucket '{bucket_name}'.")
        return True
    except Exception as e:
        print(f"Error uploading to S3: {e}")
        return False

# Define a route
@app.route('/')
def hello():
    return 'Hello, world! Flask is running!'

# Run the application
if __name__ == '__main__':
    # Upload a file to S3 before starting Flask (optional)
    if os.path.exists('app.py'):
        upload_to_s3('app.py', AWS_BUCKET_NAME)

    # Run the Flask development server; keep debug off when binding to 0.0.0.0,
    # because the interactive debugger would be reachable from the network
    app.run(host='0.0.0.0', port=5000, debug=False)

In this exercise, we assume that Flask and Boto3 are installed (for example with pip install flask boto3) and that AWS credentials are configured for Boto3. We create a simple Flask application that defines a single route. Before starting the development server, we use the upload_file method of a Boto3 S3 client to copy the application's source file to an S3 bucket. Storing the file in S3 does not by itself run the application: this is only a basic example, and deploying a web application to AWS involves many additional steps, such as creating an EC2 instance, setting up a load balancer, configuring security groups, and more.
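In production, a Flask app is usually served by a WSGI server rather than app.run(), because a Flask application object is a WSGI callable. To show what that interface looks like, here is a standard-library-only sketch using wsgiref. The names hello_app and call_app are hypothetical, and this is not how Flask implements routing internally; it only mimics the single '/' route from the exercise.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A bare WSGI app roughly equivalent to the Flask route '/' above."""
    path = environ.get('PATH_INFO', '/')
    if path == '/':
        status, body = '200 OK', b'Hello, world! Flask is running!'
    else:
        status, body = '404 Not Found', b'Not Found'
    start_response(status, [('Content-Type', 'text/plain; charset=utf-8'),
                            ('Content-Length', str(len(body)))])
    return [body]


def call_app(app, path):
    """Invoke a WSGI app directly, the way a WSGI server would, and capture the result."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], body


print(call_app(hello_app, '/'))
print(call_app(hello_app, '/missing'))
```

The server's job is to build environ from each HTTP request and send back whatever status, headers, and body the callable produces.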

Exercise 46: Natural Language Processing

Concepts:

Natural Language Processing

NLTK library

Tokenization

Part-of-Speech Tagging

Named Entity Recognition

Description: Write a Python script that performs natural language processing on text data using the NLTK library.

Solution:

import nltk

# Download the required NLTK resources.
# NLTK 3.9+ uses the '*_tab' / '*_eng' resource names; older releases use the names without the suffix.
for resource in ['punkt', 'punkt_tab',
                 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng',
                 'maxent_ne_chunker', 'maxent_ne_chunker_tab',
                 'words']:
    nltk.download(resource, quiet=True)

# Load the text data
text = '''Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. The company's hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, the Apple Watch smartwatch, the Apple TV digital media player, and the HomePod smart speaker. Apple's software includes the macOS and iOS operating systems, the iTunes media player, the Safari web browser, and the iLife and iWork creativity and productivity suites. Its online services include the iTunes Store, the iOS App Store, and Mac App Store, Apple Music, and iCloud.'''

# Tokenize the text
tokens = nltk.word_tokenize(text)

# Perform part-of-speech tagging
pos_tags = nltk.pos_tag(tokens)

# Perform named entity recognition (returns an nltk.Tree)
ne_tags = nltk.ne_chunk(pos_tags)

# Extract named entities, grouped by type
named_entities = {}

for chunk in ne_tags:
    if hasattr(chunk, 'label'):  # Entities are subtrees; other tokens are plain (word, tag) tuples
        entity_type = chunk.label()  # Entity type (e.g., ORGANIZATION, PERSON, GPE)
        entity_name = ' '.join(c[0] for c in chunk)  # Join the words in the entity
        named_entities.setdefault(entity_type, []).append(entity_name)

# Print structured named entities
print("\nNamed Entities Found:")
for entity_type, names in named_entities.items():
    # dict.fromkeys() removes duplicates while keeping first-seen order
    print(f"{entity_type}: {', '.join(dict.fromkeys(names))}")

In this exercise, we first load some text data. We split it into tokens with NLTK's word_tokenize function and assign a part-of-speech tag to each token with pos_tag. We then run named entity recognition with ne_chunk, which returns a tree in which recognized entities appear as labeled subtrees and all other tokens remain plain (word, tag) tuples. We use hasattr(chunk, 'label') to pick out the subtrees, read the entity type (for example ORGANIZATION, PERSON, or GPE) with the label() method, join the subtree's words into the entity name, and print the entities grouped by type.
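Chunk trees like the one ne_chunk returns can also be flattened into IOB tags (B- begins an entity, I- continues it, O is outside), for example with nltk.chunk.tree2conlltags. The sketch below groups IOB-tagged tokens back into entities using only the standard library. The token triples are hypothetical, hand-written values in that (word, pos, iob) shape, not real NLTK output, and group_entities is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical (word, pos, iob) triples; illustrative values, not real NLTK output
iob_tokens = [
    ('Apple', 'NNP', 'B-ORGANIZATION'),
    ('Inc.', 'NNP', 'I-ORGANIZATION'),
    ('is', 'VBZ', 'O'),
    ('headquartered', 'VBN', 'O'),
    ('in', 'IN', 'O'),
    ('Cupertino', 'NNP', 'B-GPE'),
    (',', ',', 'O'),
    ('California', 'NNP', 'B-GPE'),
]


def group_entities(tokens):
    """Merge B-/I- tagged runs into entity strings, grouped by entity type."""
    entities = defaultdict(list)
    current_words, current_type = [], None
    for word, _pos, iob in tokens:
        # An I- tag of the same type extends the open entity
        if iob.startswith('I-') and iob[2:] == current_type:
            current_words.append(word)
            continue
        # Any other tag closes the open entity (if there is one)
        if current_words:
            entities[current_type].append(' '.join(current_words))
        if iob == 'O':
            current_words, current_type = [], None
        else:  # 'B-X', or an 'I-X' that does not continue the open entity
            current_words, current_type = [word], iob[2:]
    if current_words:
        entities[current_type].append(' '.join(current_words))
    return dict(entities)


print(group_entities(iob_tokens))
```

The B- tag is what keeps "Cupertino" and "California" apart as two entities even though both are GPEs.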

Exercise 47: Big Data

Concepts:

Big Data

PySpark

Apache Spark

Data Processing

MapReduce

Description: Write a PySpark script that processes data using the Spark framework.

Solution:

import re

from pyspark.sql import SparkSession

# Initialize a Spark session and get its underlying SparkContext
spark = SparkSession.builder.appName('WordCount').getOrCreate()
sc = spark.sparkContext

# Load the text data as an RDD of lines
text = sc.textFile('data.txt')

# Process and count words
word_counts = (
    text.flatMap(lambda line: line.split())  # Split on any whitespace
    .map(lambda word: re.sub(r'\W+', '', word.lower()))  # Remove punctuation & normalize to lowercase
    .filter(lambda word: word)  # Remove empty strings
    .map(lambda word: (word, 1))  # Convert words into (word, 1) pairs
    .reduceByKey(lambda a, b: a + b)  # Sum occurrences of each word
)

# Collect and print results (collect() brings every pair to the driver, so keep results small)
for word, count in word_counts.collect():
    print(word, count)

# Stop the Spark session (this also stops its SparkContext)
spark.stop()

In this exercise, we first create a SparkSession with SparkSession.builder and obtain its SparkContext through spark.sparkContext. We load the text data as an RDD of lines with the textFile method. We split each line into words, lowercase them and strip non-word characters, drop empty strings, and count the occurrences of each word using the flatMap, map, filter, and reduceByKey methods. We bring the word counts back to the driver with the collect method and print them. Finally, we shut Spark down with the stop method.
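To see the data flow of this MapReduce-style pipeline without a Spark installation, the same stages can be sketched in plain Python on a small in-memory list of lines (a hypothetical sample). Each step is commented with the Spark operation it stands in for; this runs on a single machine and does not reproduce how Spark partitions and shuffles the work.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical input, standing in for the lines of data.txt
lines = ["Spark makes big data simple.", "Big data, big ideas!"]

# flatMap: one line -> many words
words = chain.from_iterable(line.split() for line in lines)
# map: lowercase and strip non-word characters
normalized = (re.sub(r'\W+', '', word.lower()) for word in words)
# filter: drop empty strings
cleaned = (word for word in normalized if word)
# map: word -> (word, 1)
pairs = ((word, 1) for word in cleaned)
# reduceByKey: sum the counts for each key
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

print(dict(counts))
```

In Spark, reduceByKey also combines counts locally within each partition before shuffling, which is why it is preferred over grouping all pairs first.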

Exercise 48: Cybersecurity

Concepts:

Cybersecurity

Scapy library

Network Analysis

Packet Sniffing

Description: Write a Python script that performs security analysis on a network using the Scapy library.

Solution:

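from scapy.all import sniff, TCP, conf

# Don't put the interface into promiscuous mode: only traffic to or from this host
# (plus broadcast/multicast) is captured. Capturing packets still requires
# root/administrator privileges (or CAP_NET_RAW on Linux).
conf.sniff_promisc = False

SYN = 0x02  # Bit mask of the SYN flag in the TCP header

# Define a packet handler function
def packet_handler(packet):
    if packet.haslayer(TCP) and packet[TCP].flags & SYN:  # Check for SYN flag
        print('SYN packet detected:', packet.summary())

# Start the packet sniffer with exception handling
try:
    print("Sniffing TCP packets (Press Ctrl+C to stop)...")
    sniff(prn=packet_handler, filter='tcp', store=False, count=100)  # Capture only 100 packets
except PermissionError:
    print("Error: packet capture requires root/administrator privileges.")
except KeyboardInterrupt:
    print("\nPacket sniffing stopped by user.")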

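In this exercise, we use the Scapy library to watch network traffic for TCP SYN packets, which open TCP connections and which SYN port scans send in large numbers. We define a packet handler function that sniff calls for each captured packet. The handler checks whether the packet has a TCP layer and whether the SYN bit (0x02) is set in its flags; if so, it prints a message together with a summary of the packet. SYN-ACK replies also have this bit set, so they are reported too. The sniff call applies the BPF filter 'tcp', does not keep captured packets in memory, and stops after 100 packets. Packet capture normally requires root or administrator privileges.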
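The flag test works because TCP stores its control flags as bits in byte 13 of the header. As a standard-library sketch of that bit check (independent of Scapy), the code below packs two minimal 20-byte TCP headers with struct and inspects their flags. The helper names and port numbers are hypothetical, and the checksum is left at 0 because the headers are never sent.

```python
import struct

# TCP flag bit masks (RFC 9293)
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def build_header(src_port, dst_port, flags):
    """Build a minimal 20-byte TCP header (seq/ack/checksum zeroed) for illustration."""
    data_offset = 5 << 4  # 5 x 32-bit words, no options
    return struct.pack('!HHIIBBHHH', src_port, dst_port, 0, 0,
                       data_offset, flags, 65535, 0, 0)


def tcp_flags(header: bytes) -> int:
    """Return the flag bits from a raw TCP header (byte 13 holds CWR..FIN)."""
    return header[13]


syn = build_header(51000, 80, SYN)
syn_ack = build_header(80, 51000, SYN | ACK)

for name, hdr in [('SYN', syn), ('SYN-ACK', syn_ack)]:
    flags = tcp_flags(hdr)
    # "Pure" SYN: SYN set and ACK clear, i.e. a connection attempt rather than a reply
    print(name, 'SYN set:', bool(flags & SYN), 'pure SYN:', (flags & (SYN | ACK)) == SYN)
```

Masking with SYN | ACK and comparing to SYN is one way to report only connection attempts and skip the SYN-ACK replies.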

Exercise 49: Machine Learning

Concepts:

Machine Learning

Scikit-learn library

Model Training

Cross-Validation

Grid Search

Description: Write a Python script that trains a machine learning model using the scikit-learn library.

Solution:

from sklearn import datasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Load the dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Define the hyperparameter grid
param_grid = {
    'knn__n_neighbors': [1, 3, 5, 7, 9, 11, 15],
    'knn__weights': ['uniform', 'distance']
}

# Create a pipeline with feature scaling and KNN
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier())
])

# Perform a grid search with stratified cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid_search = GridSearchCV(pipeline, param_grid, cv=cv, n_jobs=-1, scoring='accuracy')
grid_search.fit(X, y)

# Print best hyperparameters and accuracy score
print('Best Hyperparameters:', grid_search.best_params_)
print(f'Best Accuracy Score: {grid_search.best_score_:.4f}')

In this exercise, we use the scikit-learn library to tune a machine learning model. We load the Iris dataset with the load_iris function from the datasets module and split it into features (X) and target labels (y). We define the hyperparameters to search over in the param_grid dictionary; the knn__ prefix routes each parameter to the pipeline step named knn. We build a Pipeline that standardizes the features with StandardScaler before passing them to a KNeighborsClassifier, so the scaler is fitted only on the training folds of each split. We run a grid search with GridSearchCV using 5-fold stratified cross-validation (StratifiedKFold), and print the best hyperparameters and the best mean cross-validated accuracy through the best_params_ and best_score_ attributes.
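To make concrete what GridSearchCV automates, here is a dependency-free sketch of grid search over k for a k-nearest-neighbors classifier with k-fold cross-validation. It is a simplification, not scikit-learn's implementation: the folds are plain shuffled folds (not stratified), there is no feature scaling, and the two-cluster dataset is hypothetical toy data.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_indices(n, n_splits, seed=42):
    """Shuffle indices once, then deal them into n_splits roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_splits] for f in range(n_splits)]


def cross_val_accuracy(X, y, k, n_splits=5):
    """Mean accuracy over the folds; each fold is used once as the validation set."""
    scores = []
    for fold in k_fold_indices(len(X), n_splits):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        correct = sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated 2-D clusters
rng = random.Random(0)
X = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
     + [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)])
y = [0] * 20 + [1] * 20

# Grid search: score every candidate k and keep the best mean CV accuracy
results = {k: cross_val_accuracy(X, y, k) for k in [1, 3, 5, 7]}
best_k = max(results, key=results.get)
print('CV accuracy per k:', results)
print('Best k:', best_k)
```

GridSearchCV does the same loop over every combination in param_grid, and with a Pipeline it refits the scaler inside each training fold.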

Exercise 50: Computer Vision

Concepts:

Computer Vision

OpenCV library

Image Processing

Object Detection

Description: Write a Python script that performs image processing using the OpenCV library.

Solution:

import os

import cv2

# Define paths
image_path = 'image.jpg'
# The pip-installed opencv-python package ships the pre-trained Haar cascades in cv2.data.haarcascades
cascade_path = os.path.join(cv2.data.haarcascades, 'haarcascade_frontalface_default.xml')

# Ensure the image file exists
if not os.path.exists(image_path):
    raise FileNotFoundError(f"Error: Image file '{image_path}' not found.")

# Load the image (imread returns None instead of raising if the file cannot be decoded)
img = cv2.imread(image_path)
if img is None:
    raise ValueError(f"Error: '{image_path}' could not be read as an image.")

# Convert the image to grayscale (Haar cascades work on single-channel images)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Load the face detection cascade classifier
face_cascade = cv2.CascadeClassifier(cascade_path)

# Ensure the cascade file is loaded properly
if face_cascade.empty():
    raise FileNotFoundError(f"Error: Haar Cascade XML file '{cascade_path}' not found or failed to load.")

# Detect faces in the image; each detection is an (x, y, w, h) bounding box
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5, minSize=(30, 30))
print(f"Faces detected: {len(faces)}")

# Draw rectangles around detected faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Display the image with detected faces until a key is pressed
cv2.imshow('Detected Faces', img)
try:
    cv2.waitKey(0)
finally:
    # Always close the window, even if waiting is interrupted
    cv2.destroyAllWindows()

In this exercise, we use the OpenCV library to detect faces in an image. We load the image with the imread function and convert it to grayscale with cvtColor. We create a face detector with the CascadeClassifier class and a pre-trained Haar cascade file, then call its detectMultiScale method: scaleFactor=1.2 shrinks the image by 20% at each step of the multi-scale search, minNeighbors=5 keeps only detections supported by several overlapping candidate windows (higher values give fewer but more reliable detections), and minSize ignores faces smaller than 30 x 30 pixels. We draw a rectangle around each detected face with the rectangle function and display the result with the imshow, waitKey, and destroyAllWindows functions.
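To get a feel for what scaleFactor and minSize control, the sketch below lists the face sizes a multi-scale sliding-window search would cover for a given image. It assumes a 24 x 24 base detector window (an assumption about the cascade, not read from the XML file), and it grows the window instead of shrinking the image, which covers roughly the same face sizes as OpenCV's image pyramid. The pyramid_scales helper is hypothetical and ignores the minNeighbors grouping step.

```python
def pyramid_scales(img_w, img_h, window=(24, 24), scale_factor=1.2, min_size=(30, 30)):
    """List the detector window sizes a multi-scale search would try (assumed 24x24 base)."""
    sizes = []
    scale = 1.0
    while True:
        w, h = round(window[0] * scale), round(window[1] * scale)
        if w > img_w or h > img_h:
            break  # the window no longer fits inside the image
        if w >= min_size[0] and h >= min_size[1]:
            sizes.append((w, h))  # minSize filters out the smallest scales
        scale *= scale_factor
    return sizes


sizes = pyramid_scales(200, 200)
print(len(sizes), sizes)
```

A smaller scaleFactor (for example 1.05) adds many more intermediate sizes, which finds more faces but makes detection slower.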