OPEN TO OPPORTUNITIES · SAN JOSE, CA

Mayuresh Pramod Pandey

// Data Analyst · Data Engineer · Data Scientist

MS Data Analytics @ SJSU · 3+ years building scalable data pipelines, forecasting models, and analytics solutions across Azure, Snowflake, PySpark & Python.

3+

Years Experience

2B+

Records Processed

15–20%

Forecast Accuracy ↑

3

Publications

// module_02 · biography

About Me

Data professional passionate about turning raw data into decisions — forecasting models, ETL pipelines, and ML systems at scale.

Core Strengths

📊 Statistical Analysis · 🔄 ETL/ELT Pipelines · 🤖 ML Forecasting · ☁️ Cloud Native · 🎯 Outcome-Focused

"I bridge the gap between raw data and real decisions — at any scale, across any domain."

Mission Log

2024 – PRESENT

MS Data Analytics

SJSU · May 2026

MAY – AUG 2025

Data Analyst / DE / DS Intern

Schneider National · WI, USA

AUG 2021 – JUN 2024

Data Analyst · Engineer · Scientist

Mu Sigma · Bangalore, India

JUN 2021

BE Information Technology

University of Mumbai

Education

MS Data Analytics

San Jose State University

Big Data · ML · Deep Learning · GenAI · Data Warehouse · Distributed Systems

Certifications

🏅 AWS Certified Solutions Architect – Associate
📄 3 Research Publications (IEEE, Elsevier, AI Journal)

DOWNLOAD RESUME ↓

// module_03 · missions

My Projects

Engineering Mindset

📐 Architecture-First · 🔁 End-to-End Pipelines · 📊 Insight-Driven · ⚡ Performance Obsessed

"Every pipeline I build is designed to scale, every model to generalize."

🔷

Azure E-Commerce Data Pipeline

End-to-end pipeline on Azure using ADF, Data Lake Gen2, Databricks (PySpark). Medallion Architecture. Synapse Analytics + Tableau KPI dashboards.

PySpark · ADF · Databricks · Synapse · Tableau

View on GitHub

🎵

Spotify Analytics Pipeline & Dashboard

Apache Airflow + Snowflake + dbt pipeline integrating Spotify API historical & real-time data. Power BI dashboards for artist & streaming insights.

Airflow · Snowflake · dbt · Power BI · Spotify API

View on GitHub

⚡

EV Charging Station Analysis

Power BI dashboard on 78K+ US Energy records. Python EDA + DAX visualizations (geospatial, tree maps, decomposition). ML forecasting for EV expansion.

Power BI · Python · DAX · ML Forecasting

View on GitHub

🧠

LLM-Based RAG Pipeline

Modular RAG pipeline using LangChain, OpenAI & FAISS for legal document summarization. 35% token reduction via dynamic chunking. Claude, Gemini, Mistral benchmarked.

LangChain · FAISS · OpenAI · RAG · Python

View on GitHub
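
A minimal sketch of the dynamic chunking idea mentioned in the RAG project above. This is not the project's actual code: the function name `chunk_by_budget`, the whitespace-based token count, and the default budget are all assumptions made for illustration. A real pipeline would likely count tokens with the model's own tokenizer and hand the chunks to LangChain and FAISS.

```python
import re


def chunk_by_budget(text: str, max_tokens: int = 50) -> list[str]:
    """Pack whole sentences into chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace-separated words (an assumption).
    A single sentence longer than the budget is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk when this sentence would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five. Six seven eight nine."
    for chunk in chunk_by_budget(sample, max_tokens=5):
        print(chunk)
```

Cutting at sentence boundaries rather than at a fixed character count keeps each chunk readable on its own, which is one plausible way to send fewer, denser chunks to the model.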

🏦

Precision Banking Prediction

94% accuracy & ROC-AUC 0.94 — Random Forest on Bank Marketing dataset with SMOTE, EDA, hyperparameter tuning. Deployed via Streamlit.

Scikit-Learn · Streamlit · SMOTE · Random Forest

View on GitHub

📰

MyNewsMate AI News App

Hybrid NLP recommender (TF-IDF + BART + VADER) with Celery + Redis. 40% CTR boost, 30% engagement up, 25% bounce rate down. Django REST + React + AWS.

Django · React · AWS · NLP · Celery · Redis

View on GitHub
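
A small sketch of the TF-IDF half of a content-based recommender like the one described for MyNewsMate. It covers only TF-IDF similarity; the BART, VADER, Celery, and Redis parts are out of scope here. The function names and sample headlines are hypothetical, and the smoothed IDF formula is one common choice, not necessarily the one the app uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors using a smoothed IDF (an assumed variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(docs: list[str], liked_index: int, top_k: int = 2) -> list[int]:
    """Return indices of the top_k documents most similar to the liked one."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[liked_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != liked_index
    ]
    # Highest similarity first; ties broken by original position.
    ranked = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in ranked[:top_k]]


if __name__ == "__main__":
    headlines = [
        "stock markets rally on tech earnings",
        "tech earnings lift stock markets",
        "local team wins football final",
        "football final draws record crowd",
    ]
    print(recommend(headlines, liked_index=0, top_k=1))
```

In a hybrid setup, a similarity score like this would typically be blended with other signals, such as sentiment or recency, before ranking.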

// module_04 · systems online

My Skills

Full data stack — from raw ingestion to real-time insights. No percentages, just real experience.

Technical Arsenal

🐍 Python Expert · ☁️ Azure & AWS · ⚙️ PySpark & Spark · 🤖 ML & GenAI · 📊 BI Dashboards

"From raw ingestion to real-time insight — I own the full data stack."

Languages

Python · PySpark · SQL · R · JavaScript · C / C++

Data Engineering

Apache Spark · Airflow · dbt · Delta Lake · Hive · Snowflake · Databricks · Data Factory · Synapse

Databases

MySQL · PostgreSQL · MongoDB · Hadoop · Spark SQL · Neo4j

Cloud & DevOps

Azure · AWS · Docker · Kubernetes · Jenkins · Azure DevOps

Analytics & BI

Power BI · Tableau · Excel (Adv) · DAX · A/B Testing · EDA

ML & Forecasting

Prophet · ARIMA/ARIMAX · Scikit-Learn · TensorFlow · PyTorch · Kalman Filter · NeuralProphet · SHAP

AI & GenAI

LangChain · OpenAI API · FAISS · Hugging Face · Azure OpenAI · RAG Pipelines

Other Tools

Git / GitHub · Streamlit · FastAPI · Great Expectations · Sphinx

// module_05 · flight log

My Experience

Impact-driven roles across data engineering, analytics, and machine learning at scale.

Impact Metrics

📉 40% Data Prep Time ↓ · 📈 15–20% Forecast Accuracy ↑ · 🎯 30% Manual Work ↓ · 💰 8–12% Margin ↑

"Every metric I move is backed by a pipeline I built and a model I shipped."

Data Analyst / Data Engineer / Data Scientist Intern

Schneider National · WI, USA

MAY 2025 – AUG 2025

Python, PySpark, Spark SQL, Prophet, Kalman Filtering, Azure, Oracle, Snowflake, Hive

μΣ

Data Analyst · Data Engineer · Data Scientist

Mu Sigma · Bangalore, India

AUG 2021 – JUN 2024

PySpark, Hive, Azure Data Lake, Azure Databricks, Azure Blob, Azure Key Vaults, Azure DevOps

// module_06 · research transmissions

Publications & Certifications

Peer-reviewed research across ML, IoT, and AI systems — published at IEEE, Elsevier, and international AI journals.

Research Focus

🧠 Machine Learning · 📡 IoT Systems · 🌐 Social Networks · 🏅 AWS Certified

"Published at IEEE, Elsevier, and AI journals — research that bridges theory and production."

📄

Mental Health Prediction for Juveniles using Machine Learning

Elsevier Conference · 2021 · SSRN: 3867291

📡

Smart Emergency Vehicle Detection using IoT & Machine Learning

IEEE INAC-4 · 2019

🌐

Identity Resolution in Social Networks using Recommender Systems

Journal of AI & Systems · 2019 · ISSN: 2642-2859

🏅

AWS Certified Solutions Architect – Associate

Amazon Web Services · Active Certification

// module_07 · comms array

Get In Touch

Open to Data Analyst, Data Engineer, and Data Scientist roles, full-time or internship. I'd love to hear from you.

Availability

💼 Full-time Roles · 🎓 Internships · 🤝 Collaborations · 🌐 Remote Friendly

"Let's turn your data challenges into measurable business outcomes."

Send a Message

Signal Channels

LinkedIn

linkedin.com/in/mayureshpp

GitHub

github.com/mayu99

Email

mayurp.pandey@gmail.com

Phone

+1 (669) 340-6006

Location

San Jose, CA · Open to remote

Resume

Download / View Resume