Neural Network Web App to Identify Foods and Nutritional Profiles
Every year, Americans suffer poor health outcomes due to suboptimal dietary choices. This project aims to combat this with a computer vision application that scans images, recognizes and labels foods, provides their nutritional profiles, and suggests recipes based on the identified ingredients; it also includes a chatbot that generates text about the food from user input. The application is hosted on a Streamlit server, allowing users to upload images for analysis. Check out the video demo to the left!
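For context, here is a minimal sketch of the upload-and-classify flow on the Streamlit side; the generic pretrained ResNet backbone and its ImageNet labels are stand-ins for the food-specific classifier used in the actual app.

```python
# Minimal sketch of the Streamlit upload-and-classify flow.
# The pretrained ResNet and its ImageNet labels stand in for the
# food-specific classifier used in the real app.
import streamlit as st
import torch
from PIL import Image
from torchvision import models

st.title("Food Identifier")

uploaded = st.file_uploader("Upload a food photo", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded image")

    # Preprocess with the transforms bundled with the pretrained weights.
    weights = models.ResNet18_Weights.DEFAULT
    model = models.resnet18(weights=weights)
    model.eval()
    batch = weights.transforms()(image).unsqueeze(0)

    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
    top_prob, top_idx = probs.max(dim=1)
    label = weights.meta["categories"][top_idx.item()]
    st.write(f"Prediction: {label} (p={top_prob.item():.2f})")
```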

Tracking Pharmaceutical Grants with Boosted Decision Trees
Every year, doctors receive millions of dollars from big pharmaceutical companies. This can create unethical conflicts of interest, yet tracking these funds is a difficult data challenge. This project tackles the problem by linking grants to doctors across varied datasets. I crafted a classifier that predicts matches between records using Jaro-Winkler distances and word embeddings from Hugging Face models and fastText. To streamline deployment and inference, I used Hierarchical Navigable Small World (HNSW) graphs to query nearest neighbors in the embedding space, cutting out unnecessary comparisons and compute costs. The workflow covers data import, cleaning, and preprocessing; classifier creation and training (with simulated initial data); database configuration for storing the connections; and applying the classifier to test data for accurate matching.
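As an illustration of the candidate-generation step, here is a hedged sketch using the hnswlib library with synthetic embeddings; the vector dimension, index parameters, and data sizes are placeholders rather than the project's actual settings.

```python
# Sketch of HNSW-based candidate retrieval before pairwise matching.
# Embeddings here are synthetic; in the project they came from name/text models.
import numpy as np
import hnswlib

dim = 300                                                        # e.g., fastText embedding size
doctor_vecs = np.random.rand(10_000, dim).astype(np.float32)     # placeholder doctor embeddings
grant_vecs = np.random.rand(500, dim).astype(np.float32)         # placeholder grant embeddings

# Build the HNSW index over doctor embeddings.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=len(doctor_vecs), ef_construction=200, M=16)
index.add_items(doctor_vecs, np.arange(len(doctor_vecs)))
index.set_ef(50)  # query-time accuracy/speed trade-off

# For each grant record, only the k nearest doctors are passed to the
# match classifier, avoiding an all-pairs comparison.
labels, distances = index.knn_query(grant_vecs, k=10)
```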

Reinforcement Learning for Autonomous Agents in Games
This project implemented RL algorithms such as DQN, Distributional RL, and Actor-Critic to train independent agents to master a variety of tasks and games. The implementations rely on custom data structures, including a prioritized replay buffer that samples experiences with higher losses more often for more efficient learning, and use PyTorch convolutional neural networks to parameterize the agent's behavior policy.
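A condensed sketch of the prioritized replay idea is below; it uses simple proportional prioritization over a flat list rather than the sum-tree structure typically used at scale, and the hyperparameters are illustrative.

```python
# Condensed sketch of proportional prioritized experience replay (flat list, no sum-tree).
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha            # how strongly priority skews sampling
        self.buffer = []
        self.priorities = []
        self.pos = 0

    def push(self, transition):
        # New experiences get the current max priority so they are sampled at least once.
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = np.array(self.priorities) ** self.alpha
        probs = prios / prios.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.buffer) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Higher TD error -> higher priority on the next sampling pass.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + 1e-6
```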

Language Processing to Boost Charity Social Media Campaigns
I wanted to use machine learning for a positive community impact. This project applied Natural Language Processing (NLP) techniques to identify the top drivers of social media success for Bethlehem Haven, a local non-profit organization. We collected social media data, engineered it into a usable form, built Python models that analyzed it using techniques such as sentiment analysis and part-of-speech tagging, and ran statistical tests to confirm or reject candidate findings. We then translated the results into actionable advice to help Bethlehem Haven optimize its social media outreach.
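As a rough illustration of those two techniques, the snippet below runs VADER sentiment scoring and part-of-speech tagging over a made-up post with NLTK; the post text is invented, and NLTK resource names vary slightly between library versions.

```python
# Illustrative sentiment + part-of-speech pass over a made-up post (NLTK).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# One-time resource downloads; names differ slightly across NLTK versions,
# so failures for the alternates are ignored.
for resource in ["vader_lexicon", "punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"]:
    try:
        nltk.download(resource, quiet=True)
    except Exception:
        pass

post = "Thank you to all our amazing volunteers for a record-breaking donation drive!"

# Sentiment: VADER's compound score summarizes polarity in [-1, 1].
sentiment = SentimentIntensityAnalyzer().polarity_scores(post)

# Part-of-speech tags, usable as features (e.g., counts of adjectives per post).
tags = nltk.pos_tag(nltk.word_tokenize(post))

print(sentiment["compound"])
print(tags[:5])
```
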
Hacking4Humanity Hate Speech-Filtering Chatbot
This project was my submission for the Hacking4Humanity Hackathon. The theme was stopping online hate speech, so our application prototype paired a chatbot model with the News API to create a web app that could understand textual requests, retrieve top news articles, and block any hate speech through a specially designed filter. The application was built in Python, with a Flask server hosting the website demonstration.
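A simplified sketch of the fetch-and-filter flow is shown below; the Flask route, the tiny word blocklist, and the query parameters are placeholders rather than the hackathon code, and a real News API key would be required.

```python
# Sketch of the request -> fetch headlines -> filter flow (Flask + News API).
# The blocklist below is a tiny stand-in for the project's dedicated hate-speech filter.
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
NEWS_API_KEY = "YOUR_API_KEY"            # placeholder
BLOCKLIST = {"slur1", "slur2"}           # stand-in for the real filter

def is_clean(text):
    words = set(text.lower().split())
    return words.isdisjoint(BLOCKLIST)

@app.route("/news")
def news():
    topic = request.args.get("q", "charity")
    resp = requests.get(
        "https://newsapi.org/v2/everything",
        params={"q": topic, "pageSize": 10, "apiKey": NEWS_API_KEY},
        timeout=10,
    )
    articles = resp.json().get("articles", [])
    # Only pass along headlines that clear the hate-speech filter.
    safe = [a["title"] for a in articles if is_clean(a.get("title", ""))]
    return jsonify(safe)

if __name__ == "__main__":
    app.run(debug=True)
```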

Optimization Algorithm for Contextual Multi-Armed Bandit Notifications
This project implements the “Recovering Difference SoftMax Algorithm” proposed by Duolingo researchers Kevin Yancey and Burr Settles in their paper “A Sleeping, Recovering Bandit Algorithm for Optimizing Recurring Notifications”. The algorithm is a variant of a contextual multi-armed bandit that uses eligibility rules and a novelty bias to improve on existing solutions.
The project uses off-policy reinforcement learning algorithms to select the best notification for a given user, maximizing the likelihood of getting a user to interact with the app and complete a lesson within 2 hours of sending the notification.
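The sketch below gives a rough, simplified version of the selection step: softmax sampling over per-arm scores, with ineligible arms masked out and recently sent arms down-weighted until they “recover.” The scoring and recovery function are simplified relative to the paper's formulation, and all parameter values are illustrative.

```python
# Rough sketch of softmax arm selection with eligibility rules and a
# recovery/novelty adjustment (simplified relative to the paper).
import numpy as np

def select_notification(base_scores, days_since_sent, eligible,
                        temperature=1.0, recovery_rate=0.2):
    """base_scores: learned per-arm scores; days_since_sent: recency of each arm;
    eligible: boolean mask produced by the eligibility rules."""
    # Arms sent recently are penalized and "recover" over time (novelty bias).
    recovery = 1.0 - np.exp(-recovery_rate * np.asarray(days_since_sent, dtype=float))
    adjusted = np.asarray(base_scores, dtype=float) * recovery

    # Ineligible arms can never be selected.
    adjusted = np.where(eligible, adjusted, -np.inf)

    # Softmax over adjusted scores gives the sampling distribution, which also
    # serves as the propensity needed for off-policy evaluation.
    logits = adjusted / temperature
    logits -= logits[np.isfinite(logits)].max()
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs), probs

arm, propensities = select_notification(
    base_scores=[0.8, 0.5, 0.3],
    days_since_sent=[1, 7, 30],
    eligible=[True, True, False],
)
```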

Duquesne University Improvement with Multivariate Modelling
This project aimed to produce specific, actionable steps for Duquesne University to improve, using machine learning techniques on data from https://collegescorecard.ed.gov/. To accomplish this, regression methods were applied to predict a university's “Score”, which reflects graduates' earning potential, debt load, and likelihood of gaining employment. Using the best-performing regression model, XGBoost, specific recommendations were made to improve Duquesne's score: offering more programs, expanding its engineering disciplines, and spending more money on classroom instruction rather than construction projects.
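For reference, a minimal sketch of that regression workflow is below; the file and column names are illustrative rather than the actual College Scorecard field names, and the features are assumed to be numeric and already engineered.

```python
# Minimal sketch of the XGBoost regression workflow (illustrative column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

df = pd.read_csv("college_scorecard_subset.csv")   # assumed preprocessed, numeric extract
X = df.drop(columns=["score"])                     # engineered features
y = df["score"]                                    # composite earnings/debt/employment score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train)

print("R^2:", r2_score(y_test, model.predict(X_test)))

# Feature importances point to the levers a school could act on.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))
```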

Spotify Top Trends Data Mining & Visualization Project
The objective of this project is to analyze the top-streamed songs on Spotify, investigating trends, breakdowns of song types, and the metrics associated with top-performing songs, and then present the findings in a data-storytelling presentation. The analysis uses a dataset obtained from Kaggle: https://www.kaggle.com/datasets/nelgiriyewithana/top-spotify-songs-2023. The project involves data preparation, processing, and visualization in Jupyter notebooks with Python libraries such as Pandas, Seaborn, and Plotly, with the final results compiled into a PowerPoint deck for the presentation.
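A small slice of that exploratory workflow might look like the snippet below; the file name and column names follow the Kaggle dataset but should be treated as illustrative.

```python
# Small slice of the exploratory workflow (file and column names are illustrative).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

songs = pd.read_csv("spotify-2023.csv", encoding="latin-1")

# Coerce stream counts to numeric and look at release-year trends.
songs["streams"] = pd.to_numeric(songs["streams"], errors="coerce")
by_year = songs.groupby("released_year")["streams"].median().reset_index()

sns.lineplot(data=by_year, x="released_year", y="streams")
plt.title("Median streams by release year")
plt.tight_layout()
plt.show()
```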

Python Reporting & Visualization Automation Project
In healthcare, analyzing and reporting health metric data for multiple hospitals is a time-consuming and error-prone task. This project leverages Python automation to streamline this process, significantly reducing the time and effort required for data processing and visualization.
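As a hedged sketch of the idea (the directory layout, file names, and metric columns here are hypothetical), the automation loops over each hospital's metrics file and writes out a per-hospital chart, removing the manual copy-and-paste step.

```python
# Hedged sketch of the reporting automation: read each hospital's metrics file,
# plot the key indicators, and save a per-hospital report image.
# Directory names and metric columns are hypothetical.
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

Path("reports").mkdir(exist_ok=True)

for csv_path in Path("hospital_metrics").glob("*.csv"):
    df = pd.read_csv(csv_path, parse_dates=["month"])
    fig, ax = plt.subplots(figsize=(8, 4))
    df.plot(x="month", y=["readmission_rate", "avg_length_of_stay"], ax=ax)
    ax.set_title(f"Monthly metrics: {csv_path.stem}")
    fig.savefig(f"reports/{csv_path.stem}.png", bbox_inches="tight")
    plt.close(fig)
```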

Food Accessibility Data Mining & Dashboard
A data visualization project highlighting some key insights about food availability from the Food Access Research Atlas. The datasets used in the analysis can be found at this link: https://www.ers.usda.gov/data-products/food-access-research-atlas/documentation/.
The project began with basic data cleaning and importing the necessary files into SQL Server Management Studio for filtering and querying. After the files were imported and filtered, a series of queries was run to explore the data. The goal of the project was to surface insights that could guide a hypothetical ‘Resource Allocation Committee’ in investing in areas with low food availability. These areas, where supermarkets are located at significant distances from residences, are also known as food deserts.
After the queries yielded insights, they were connected to Tableau, where the data visualization dashboard was built. Several insights were surfaced, such as differences in need across geographic areas and ethnic groups, as well as trends in improving or declining availability.
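The queries themselves lived in SQL Server, but a Python/pyodbc version of one illustrative query is sketched below; the connection string, table name, and column names are placeholders rather than the actual Food Access Research Atlas schema.

```python
# Illustrative Python/pyodbc version of one SQL Server query; connection string,
# table, and column names are placeholders, not the real Atlas schema.
import pyodbc
import pandas as pd

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=FoodAccess;Trusted_Connection=yes;"
)

# Rank states by the share of census tracts flagged as low-access.
query = """
SELECT State,
       AVG(CAST(LowAccessFlag AS FLOAT)) AS low_access_share
FROM FoodAccessTracts
GROUP BY State
ORDER BY low_access_share DESC;
"""
results = pd.read_sql(query, conn)
print(results.head(10))
```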

UFC Fighter Web Scraping
The objective of this project is to scrape UFC fighter data from a website, clean and preprocess it, load it into a SQL Server database, perform data analysis and querying, and visualize the results in Tableau. The project uses Python, Selenium with the Edge WebDriver, SQL Server, and Tableau to accomplish these tasks.
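A sketch of the scraping step is below, using Selenium 4 with the Edge driver; the listing URL and CSS selectors are placeholders for the actual fighter pages.

```python
# Sketch of the scraping step (Selenium 4 with the Edge driver).
# The URL and selectors are placeholders for the real fighter-listing pages.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Edge()
driver.get("https://example.com/fighters")   # placeholder listing page

rows = []
for row in driver.find_elements(By.CSS_SELECTOR, "table tr"):
    cells = [c.text.strip() for c in row.find_elements(By.TAG_NAME, "td")]
    if cells:
        rows.append(cells)
driver.quit()

# Raw rows go into a DataFrame for cleaning before the SQL Server load.
fighters = pd.DataFrame(rows)
fighters.to_csv("fighters_raw.csv", index=False)
```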

Kaggle Competition: House Prices – Advanced Regression Techniques
A data science project completing the ‘House Prices – Advanced Regression Techniques’ competition on Kaggle. The notebook, developed in Google Colab, covers data preprocessing, exploratory data analysis and visualization, feature engineering, and model training and evaluation. The goal is to accurately predict house prices for a held-out test dataset; the data includes over 70 numerical and categorical features.
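A compact sketch of such a pipeline on the competition's train.csv (SalePrice target) is shown below; it uses a generic scikit-learn gradient boosting model and log-price target rather than the exact models tried in the notebook.

```python
# Compact sketch of a modeling pipeline for the Kaggle training file
# (train.csv with the SalePrice target). Model choice is illustrative.
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

train = pd.read_csv("train.csv")
X = train.drop(columns=["Id", "SalePrice"])
y = np.log1p(train["SalePrice"])   # competition scoring uses log-scale RMSE

# Impute numeric columns; impute + one-hot encode categorical columns.
preprocess = make_column_transformer(
    (SimpleImputer(strategy="median"),
     make_column_selector(dtype_include=np.number)),
    (make_pipeline(SimpleImputer(strategy="most_frequent"),
                   OneHotEncoder(handle_unknown="ignore")),
     make_column_selector(dtype_include=object)),
)
model = make_pipeline(preprocess, GradientBoostingRegressor(random_state=0))

scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("CV RMSE (log scale):", -scores.mean())
```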