Senior Data Scientist with expertise in machine learning, natural language processing, and time series analysis. Passionate about developing AI solutions that make a positive impact.
Greenfield project collaborating with major healthcare provider on sleep apnea screening
- Built time series forecasting models for PAP therapy assessment using ARIMA, LSTM, and tree-based classifiers
- Produced codebase documentation and experimentation management pipeline with DVC and TrueFoundry
- Developed patient clustering system using DBSCAN with novel SHAP-based feature engineering
- Prototyped Contextual Bandit-based recommender system for therapy desistance intervention
LLM-based semi-automated content generation for exams
- Prototyped simple RAG system in LangChain
- Experimented with Enterprise LLM services from GPT-3.5 through GPT-4o
- Parsed PDFs from proprietary ETS documents into indexed Markdown with the marker package
- Developed early pipeline for generating content that either adhered to or violated guideline documents using Evaluation Driven Development (EDD)
Anti-fraud system for remote exam proctoring
- Developed Transformer-based Automatic Speaker Verification (ASV) system
- Implemented Hugging Face X-Vector embeddings for voice analysis
- Deployed using SageMaker Studio with continuous monitoring
- Ensured fairness across demographics using fairlearn
Large-scale hate speech detection platform
- Created Hugging Face-based text classification models
- Led development of novel hate speech datasets with community input
- Implemented robust Inter-Rater Reliability (IRR) analyses
- Sole developer of prototype hate speech NLP classifier based on the RoBERTa architecture
- Used to process more than 600,000,000 social media posts
- Co-author of multiple public-facing reports on violative content on social media
Led development of sentiment analysis dashboard
- Built Flask frontend with interactive Bokeh visualizations
- Integrated multiple sentiment APIs (Perspective, Sentropy)
- Developed custom PyTorch classifiers
- Created scalable data collection pipeline using Scrapy and BeautifulSoup
CLI tool for analyzing extremist networks on Steam platform
- Developed Python crawler for Steam social network analysis
- Implemented n-degree network traversal algorithm
- Generated data for ADL's public Steam Extremism report
- Senior Data Scientist @ ResMed (2023 - Present)
- Provider Experience Team member
- Senior Data Scientist @ Educational Testing Service (2022 - 2024)
- AI Platform Development Team member
- Data Science Lead @ Tata Consultancy Services - Apple Service Team (2022)
- Contract developer for internal tools at Apple Global Business Intelligence (GBI)
- Research Software Engineer @ Anti-Defamation League, Center for Technology & Society (2018 - 2022)
- First FTE Data Science Role at ADL
- Mozilla Open Leaders Mentor @ Mozilla, Volunteer (2017 - 2020)
- Project lead in Mozilla's first external round of the program, returned as mentor four times
- Graduate Research Fellow @ UC San Diego, ECE (2011 - 2013)
- Conducted rotations in cellular neuroscience microscopy, semantic web for neuroscience, and neuroimaging
- Languages & Tools: Python (15+ years), Git/GitHub (10+ years)
- Data Science: NumPy, Pandas, Scikit-learn, Jupyter (9+ years)
- Machine Learning: PyTorch (3+ years), Hugging Face (2+ years)
- Cloud & DevOps: AWS (S3, EC2, SageMaker), SQL, TrueFoundry
- Visualization: Bokeh, Plotly, Streamlit
- Emerging Tech: LLMs, RAG, Prompt Engineering
- Operational Excellence Award - ETS, for AI authenticity detection development
- Above and Beyond Work Award - ADL, for 2019 Hanau Shooting response
- NSF Graduate Research Fellowship (2010) - For STEM graduate studies
University of California, San Diego (2011-2014)
- Department of Electrical and Computer Engineering (Ph.D. Dropout)
University of California, Irvine (2006-2011)
- B.S. Computer Science
- B.S. General Engineering
- B.A. Philosophy