- 4+ years of experience in Data Science with strong expertise in machine learning, statistical analysis, data mining, and relational databases.
- Hands-on experience developing and evaluating Machine Learning, Deep Learning, and NLP models using Python libraries such as scikit-learn, TensorFlow/Keras, NLTK, and SciPy.
- Proficient in data manipulation using NumPy, Pandas, and SQLAlchemy, and in data visualization using Matplotlib, Seaborn, Tableau, and Power BI.
- Strong foundation in statistical modeling, hypothesis testing, A/B testing, and inferential analysis to support data-driven business decisions.
- Experience with supervised and unsupervised learning techniques, including regression, classification, clustering, and ensemble models.
- Extensive exposure to cloud platforms (AWS and Azure) and databases including MySQL, SQL Server, Oracle, MongoDB, and Snowflake.
- Strong collaborator with experience working in Agile environments and using Git/GitHub for version control.
- Architected and deployed a Retrieval-Augmented Generation (RAG) system using LangChain and vector databases (Pinecone/Milvus), reducing internal document search time by 60% and improving response accuracy.
- Developed gradient-boosted ensemble models (XGBoost/LightGBM) to forecast quarterly customer lifetime value (CLV), improving targeted marketing efficiency by 15%.
- Standardized model deployment pipelines using Docker and Kubernetes on AWS SageMaker, reducing model time-to-production from 4 weeks to 5 days.
- Designed and executed multi-armed bandit experiments for real-time website personalization, achieving an 8% conversion-rate uplift over traditional A/B testing.
- Optimized distributed data processing pipelines using Apache Spark (PySpark), reducing ETL latency by 40% on datasets exceeding 10TB.
- Implemented explainability frameworks (SHAP, LIME) for credit-scoring models, ensuring compliance with U.S. fair-lending and ethical AI regulations.
- Leveraged AutoML and GenAI-assisted workflows to automate 40% of routine data cleaning and exploratory analysis tasks.
- Applied causal machine learning techniques to isolate the impact of external economic factors from internal marketing spend.
- Collaborated with engineering teams to support a Data Mesh architecture, improving data ownership and discoverability.
- Built executive-level dashboards in Tableau and Power BI, integrating real-time model metrics with business KPIs.
- Led the development of campaign waterfall and performance reports to support data-driven marketing decisions.
- Conducted data collection, cleaning, profiling, exploratory analysis, and visualization across multiple data sources.
- Delivered ad-hoc analytics and reporting to support marketing, product, and engineering teams.
- Built scalable proof-of-concept data pipelines to support production-ready analytics environments.
- Developed data ingestion pipelines using Azure Data Factory and Databricks, performing EDA and transformations before visualizing insights in Power BI.
- Designed, executed, and analyzed A/B tests to measure campaign effectiveness and optimize engagement strategies.
- Implemented robust data preprocessing workflows including missing-value imputation, scaling, encoding, and transformations using scikit-learn.
- Trained and evaluated machine learning models (Logistic Regression, Random Forest, SVM) for customer churn prediction.
- Performed univariate, bivariate, and multivariate analysis on customer demographics and risk indicators.
- Applied statistical hypothesis testing to assess significance and business impact of key risk factors.
- Optimized relational databases (MySQL, Oracle), improving query performance and data reliability.
- Designed data cleansing algorithms that improved overall data quality by 40%, leading to more accurate forecasting.