Scenario: HR Analytics for a Software Company
Problem Statement:
A software company wants to improve its employee retention rate and boost overall employee satisfaction.
How Python Libraries Can Help:
NumPy and Pandas:
Data Cleaning and Preparation:
1. Load employee data from various sources (e.g., HRIS, performance reviews) into Pandas DataFrames.
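When the data arrives as several exports, the usual first step is to join them on a shared key. Below is a minimal sketch. It assumes both sources carry a hypothetical `EmployeeID` column, and the small in-memory frames stand in for the real files:

```python
import pandas as pd

def merge_sources(hris: pd.DataFrame, reviews: pd.DataFrame) -> pd.DataFrame:
    """Join HRIS records with performance reviews on a hypothetical EmployeeID key.

    A left join keeps every employee from the HRIS export, even those with no review yet.
    validate='one_to_one' raises a MergeError if either side has duplicate IDs.
    """
    return hris.merge(reviews, on='EmployeeID', how='left', validate='one_to_one')

# Illustrative in-memory data standing in for, e.g., pd.read_csv('hris.csv');
# EmployeeID and the other column names are hypothetical
hris = pd.DataFrame({
    'EmployeeID': [101, 102, 103],
    'Department': ['Engineering', 'Sales', 'Engineering'],
    'Age': [29, 41, 35],
})
reviews = pd.DataFrame({
    'EmployeeID': [101, 103],
    'PerformanceRating': [4, 3],
})

employees = merge_sources(hris, reviews)
print(employees)
```

Employee 102 has no review, so their `PerformanceRating` comes out as NaN. That is exactly the kind of gap the cleaning step then has to handle.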
Exploratory Data Analysis (EDA):
1. Calculate summary statistics (e.g., mean, median, standard deviation) using NumPy.
2. Visualize data distributions, correlations, and trends using Matplotlib or Seaborn.
Example code:
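```python
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv('employee_data.csv')

# Clean data: forward-fill missing values
# (fillna(method='ffill') is deprecated in recent pandas; ffill() replaces it)
df = df.ffill()
# A leading NaN cannot be forward-filled, so drop rows still missing Age before casting
df = df.dropna(subset=['Age'])
df['Age'] = df['Age'].astype(int)

# Exploratory Data Analysis
print(df.describe())
# Since pandas 2.0, corr() raises on text columns (e.g., Department) unless numeric_only=True
print(df.corr(numeric_only=True))
```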
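`describe()` reports these statistics for every column at once. The NumPy calculation mentioned under EDA can also be done directly on a single column. The sketch below assumes a numeric column such as `Salary` with hypothetical values, and it uses NaN-aware functions so that missing values are skipped:

```python
import numpy as np

def summarize(values) -> dict:
    """Mean, median, and sample standard deviation of a numeric column, ignoring NaNs."""
    arr = np.asarray(values, dtype=float)
    return {
        'mean': float(np.nanmean(arr)),
        'median': float(np.nanmedian(arr)),
        # ddof=1 gives the sample standard deviation, matching pandas' describe()
        'std': float(np.nanstd(arr, ddof=1)),
    }

# Hypothetical salaries with one missing value
salaries = [52000, 61000, np.nan, 58000, 75000]
stats = summarize(salaries)
print(stats)
```

On the loaded data, the equivalent call would be `summarize(df['Salary'])`.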
Matplotlib and Seaborn:
● Visualizing Employee Attrition: Create bar charts to compare attrition rates across departments or job roles.
Use heatmaps to visualize correlations between variables like tenure, salary, and satisfaction.
● Performance Analysis:
Use box plots to compare performance ratings across different demographic groups.
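Here is a sketch of the performance box plot using pandas' matplotlib-backed `DataFrame.boxplot`. The `AgeBand` and `PerformanceRating` columns and their values are hypothetical placeholders for whatever demographic and rating fields the HR data actually contains:

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_ratings_by_group(df: pd.DataFrame, group_col: str,
                          rating_col: str = 'PerformanceRating'):
    """Draw one box per group showing the spread of performance ratings."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # pandas draws the boxes with matplotlib; `by` splits the rating column into groups
    df.boxplot(column=rating_col, by=group_col, ax=ax, grid=False)
    ax.set_title(f'{rating_col} by {group_col}')
    ax.set_xlabel(group_col)
    ax.set_ylabel(rating_col)
    fig.suptitle('')  # remove pandas' automatic "Boxplot grouped by ..." title
    return fig

# Hypothetical sample data; column names are placeholders
sample = pd.DataFrame({
    'AgeBand': ['<30', '<30', '30-45', '30-45', '45+', '45+'],
    'PerformanceRating': [3, 4, 4, 5, 3, 4],
})
fig = plot_ratings_by_group(sample, 'AgeBand')
plt.show()
```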
Example code:
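```python
import matplotlib.pyplot as plt
import seaborn as sns

# Attrition by Department (headcounts of leavers vs. stayers)
sns.countplot(x='Department', hue='Attrition', data=df)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Correlation matrix of the numeric columns only (text columns would raise in pandas >= 2.0)
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.show()
```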
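The count plot above shows headcounts, so large departments dominate it, while the bullet list calls for attrition rates. The sketch below computes the share of leavers per group before plotting. It assumes that `Attrition` holds `'Yes'`/`'No'` strings, which is an assumption about the dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd

def attrition_rate_by(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Share of employees who left, per group (assumes Attrition is 'Yes'/'No')."""
    left = df['Attrition'].eq('Yes')
    # Mean of a boolean Series = fraction of True values in each group
    return left.groupby(df[group_col]).mean().sort_values(ascending=False)

# Hypothetical sample data
sample = pd.DataFrame({
    'Department': ['Engineering', 'Engineering', 'Engineering', 'Sales', 'Sales', 'HR'],
    'Attrition': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No'],
})
rates = attrition_rate_by(sample, 'Department')
print(rates)

ax = rates.plot.bar(rot=0)
ax.set_ylabel('Attrition rate')
ax.set_ylim(0, 1)
plt.tight_layout()
plt.show()
```

To compare job roles instead, pass a job-role column (for example a hypothetical `'JobRole'`) in place of `'Department'`.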
Scikit-Learn:
Predictive Modeling:
Build a machine learning model (e.g., logistic regression, random forest) to predict employee attrition based on factors like tenure, salary, and job satisfaction.
Train and evaluate the model using techniques such as cross-validation.
Clustering:
Group employees into segments based on similar characteristics (e.g., demographics, performance) using clustering algorithms (e.g., K-Means, hierarchical clustering).
Example code:
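```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report

# Prepare data
X = df[['Age', 'JobLevel', 'Salary']]
y = df['Attrition']
# stratify keeps the leaver/stayer ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Create and train model; scaling the features helps logistic regression converge
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 5-fold cross-validation on the training set
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f'CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}')

# Make predictions and evaluate on the held-out test set
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.3f}')
# Attrition data is often imbalanced, so also check per-class precision and recall
print(classification_report(y_test, y_pred))
```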
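For the clustering step, here is a minimal K-Means sketch. The feature list and the number of clusters are assumptions to tune on real data, for example by comparing `sklearn.metrics.silhouette_score` across several values of `n_clusters`. The features are standardized first because K-Means uses Euclidean distance, and without scaling that distance would be dominated by salary:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def segment_employees(df: pd.DataFrame, features: list[str], n_clusters: int = 3,
                      random_state: int = 42) -> pd.Series:
    """Assign each employee a K-Means segment label based on numeric features."""
    # Standardize so large-valued features (e.g., Salary) do not dominate the distance
    scaled = StandardScaler().fit_transform(df[features])
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return pd.Series(kmeans.fit_predict(scaled), index=df.index, name='Segment')

# Hypothetical sample with two obvious groups (junior/lower salary vs. senior/higher salary)
sample = pd.DataFrame({
    'Age': [24, 25, 26, 45, 47, 50],
    'Salary': [40000, 42000, 41000, 120000, 125000, 118000],
    'PerformanceRating': [3, 3, 4, 4, 5, 4],
})
sample['Segment'] = segment_employees(
    sample, ['Age', 'Salary', 'PerformanceRating'], n_clusters=2)
# Profile each segment by its average feature values
print(sample.groupby('Segment').mean())
```

The per-segment averages are what HR would read to name the segments. The numeric labels themselves are arbitrary.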
Requests and Beautiful Soup:
External Data Integration:
Fetch economic indicators (e.g., unemployment rates, GDP growth) from APIs using Requests.
Scrape industry-specific job boards to analyze job market trends using Beautiful Soup.
Example code:
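```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.indeed.com/jobs?q=software+engineer&l=Bengaluru'
# Many sites reject requests' default User-Agent; a timeout avoids hanging forever
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')
# These class names reflect one version of the page markup and may no longer match
job_listings = soup.find_all('div', class_='jobsearch-SerpJobCard')
for job in job_listings:
    title_tag = job.find('h2')
    company_tag = job.find('span', class_='company')
    if title_tag is None or company_tag is None:  # skip cards missing expected elements
        continue
    print(f'Title: {title_tag.get_text(strip=True)}\nCompany: {company_tag.get_text(strip=True)}\n')
```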
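For the economic-indicator part of this step, a public JSON API is generally more stable than scraping HTML. The sketch below uses the World Bank Indicators API as an example source. The endpoint, the indicator code `SL.UEM.TOTL.ZS` (unemployment, % of labour force), and the `[metadata, observations]` response shape are all assumptions to verify against the API documentation. Parsing is kept separate from the request so it can be tested without network access:

```python
import requests

# Example source (assumption): World Bank Indicators API, v2
WB_URL = 'https://api.worldbank.org/v2/country/{country}/indicator/{indicator}'

def parse_indicator(payload) -> dict[str, float]:
    """Turn a World Bank JSON payload into {year: value}, skipping missing values.

    Assumes the response is a two-element list: [page metadata, list of observations].
    Error responses and empty series yield an empty dict.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}
    return {row['date']: row['value'] for row in payload[1] if row.get('value') is not None}

def fetch_indicator(country: str = 'IN', indicator: str = 'SL.UEM.TOTL.ZS') -> dict[str, float]:
    """Fetch one indicator series for a country (requires network access)."""
    response = requests.get(
        WB_URL.format(country=country, indicator=indicator),
        params={'format': 'json', 'per_page': 100},
        timeout=10,
    )
    response.raise_for_status()
    return parse_indicator(response.json())
```

A call such as `fetch_indicator('IN')` would return a `{year: value}` dictionary. That dictionary can be wrapped in `pd.Series(...)` and lined up against yearly attrition figures.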
Flask:
Building a Dashboard: Create a web application using Flask to display key HR metrics and visualizations in a user-friendly dashboard. Allow HR managers to interact with the data and generate custom reports.
Example code:
```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/')
def index():
    # Renders index.html from the app's templates/ folder, which must exist
    return render_template('index.html')

@app.route('/submit', methods=['POST'])
def submit():
    name = request.form['name']
    email = request.form['email']
    # Process the form data (e.g., save to a database)
    return 'Form submitted successfully!'

if __name__ == '__main__':
    # debug=True enables auto-reload and the interactive debugger; use in development only
    app.run(debug=True)
```
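The example above handles a form, but a dashboard more typically needs endpoints that serve the metrics themselves. Below is a minimal sketch of a JSON endpoint that a front-end chart could call. The route name and the in-memory `EMPLOYEES` data are hypothetical:

```python
import pandas as pd
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory data; in practice this would be the cleaned employee DataFrame
EMPLOYEES = pd.DataFrame({
    'Department': ['Engineering', 'Engineering', 'Sales', 'Sales', 'HR'],
    'Attrition': ['Yes', 'No', 'No', 'Yes', 'No'],
})

@app.route('/api/attrition')
def attrition_by_department():
    """Return the attrition rate per department as JSON for a dashboard chart."""
    rates = EMPLOYEES['Attrition'].eq('Yes').groupby(EMPLOYEES['Department']).mean()
    # Convert NumPy floats to plain floats so they serialize cleanly
    return jsonify({dept: round(float(rate), 3) for dept, rate in rates.items()})
```

Saved as a hypothetical `dashboard.py`, the app can be started with `flask --app dashboard run`. A request to `GET /api/attrition` then returns one rate per department.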
OpenCV:
Facial Emotion Analysis:
Analyze employee facial expressions during meetings or training sessions to gauge engagement and satisfaction levels.
Use OpenCV to detect faces in video or image data, then classify each face's emotion (e.g., happy, sad, angry) with a separately trained model, which OpenCV's dnn module can load.
Example code:
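```python
import cv2

# Haar cascade face detector bundled with the opencv-python package
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# Optional alternative: OpenCV's res10 SSD face detector (Caffe files downloaded
# separately). This is also a face detector, not an emotion recognition model.
# face_net = cv2.dnn.readNetFromCaffe('deploy.prototxt.txt', 'res10_300x300_ssd_iter_140000.caffemodel')

# Capture video from webcam
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:  # no frame: camera unavailable or stream ended
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # ... (crop gray[y:y + h, x:x + w] and predict its emotion with a trained classifier)
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```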
Pytest:
Testing Data Pipelines: Write unit tests to ensure the accuracy and reliability of data cleaning, transformation, and modeling pipelines. Test the functionality of the web application and its components.
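Example code:
```python
import numpy as np
import pandas as pd
import pytest

def test_data_cleaning():
    # Create a sample DataFrame with missing values
    df = pd.DataFrame({'Age': [25, np.nan, 30]})
    df = df.ffill()  # replaces the deprecated fillna(method='ffill')
    assert df['Age'].isnull().sum() == 0
    assert df.loc[1, 'Age'] == 25  # gap filled from the previous row
```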
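Cleaning logic is easier to test once it lives in a function, and `pytest.mark.parametrize` runs the same test over several edge cases. The sketch below uses a hypothetical `clean_ages` helper that forward-fills, drops rows that remain missing, and casts to integers:

```python
import numpy as np
import pandas as pd
import pytest

def clean_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: forward-fill Age, drop rows still missing it, cast to int."""
    out = df.ffill().dropna(subset=['Age'])
    return out.astype({'Age': int})

@pytest.mark.parametrize(
    'ages, expected',
    [
        ([25, np.nan, 30], [25, 25, 30]),  # gap filled from the previous row
        ([np.nan, 40, np.nan], [40, 40]),  # leading NaN cannot be forward-filled, so it is dropped
        ([31, 32], [31, 32]),              # already-clean data is unchanged
    ],
)
def test_clean_ages(ages, expected):
    result = clean_ages(pd.DataFrame({'Age': ages}))
    assert result['Age'].tolist() == expected
    assert result['Age'].dtype.kind == 'i'  # integer dtype after cleaning
```

Running `pytest -q` reports each parameter set as a separate test case, so a failure points directly at the edge case that broke.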
By leveraging these Python libraries, the HR team can gain valuable insights into employee behavior, identify potential issues, and implement targeted strategies to improve retention and satisfaction.