Projects

Selected publications, research projects, and independent ventures.

Medical Imaging & Clinical AI

Biomedical Imaging

PneumoXttention

IEEE ISPA 2021

PneumoXttention is a convolutional neural network ensemble designed to detect pneumonia in chest X-rays while explicitly reducing human diagnostic error. The model was evaluated against radiologist performance on the RSNA and NIH ChestX-ray datasets, demonstrating strong F1 scores and consistent sensitivity across patient subgroups. The work focuses on reliability and clinical validation rather than purely maximizing accuracy.

Skills
Medical computer vision · CNN ensembles · Model validation against clinicians · PyTorch · Performance metrics (F1, sensitivity, specificity)
Mentors
Dr. Colin-Whitby Strevens · Dr. Sussane Soin
IEEE Paper (coming soon) · Code (GitHub) (coming soon)
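
For readers less familiar with the metrics named for PneumoXttention, the sketch below shows how F1, sensitivity, and specificity follow from a confusion matrix. The function name `binary_metrics` and the toy labels are assumptions for illustration; they are not taken from the paper or its code.

```python
def binary_metrics(y_true, y_pred):
    """Return F1, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard every ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"f1": f1, "sensitivity": sensitivity, "specificity": specificity}


# Toy labels for illustration only; not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Sensitivity is the quantity most directly tied to missed diagnoses, since every false negative lowers it.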

Automated Coronary Calcium Scoring

IEEE MIT URTC 2022

Developed a semi-supervised U-Net model to estimate coronary artery calcium scores from non-gated CT scans, enabling cardiovascular risk assessment without specialized imaging protocols. Introduced targeted cropping techniques that reduced mean absolute error by 91% and improved F1 score by 32%, significantly outperforming baseline approaches.

Skills
Medical image segmentation · U-Net architectures · Semi-supervised learning · CT image preprocessing · Quantitative error analysis
Mentors
Weicheng Dai · Ms. Kirti Tamhane

CheX-Nomaly

arXiv Preprint

CheX-Nomaly introduces a Siamese U-Net framework with contrastive learning to localize thoracic abnormalities across 14+ disease categories. The model prioritizes cross-dataset generalization and robustness to spurious correlations, addressing perceptual diagnostic errors common in chest X-ray interpretation rather than optimizing single-dataset performance.

Skills
Contrastive learning · Siamese networks · Weakly supervised localization · Medical imaging generalization · Research evaluation & ablations
Mentors
Chenyu You · Ms. Swetha Bhattacharya
arXiv Preprint · Code (GitHub) (coming soon)

Mask R-CNN for Brain Tumor Segmentation

arXiv Preprint

Applied Mask R-CNN with image subtraction techniques to segment heterogeneous brain tumors from MRI scans. The approach improved tumor boundary delineation and achieved a Dice coefficient of 0.75, outperforming standard segmentation baselines on complex tumor morphologies.

Skills
MRI analysis · Instance segmentation · Mask R-CNN · Image preprocessing & subtraction · Evaluation metrics (Dice)
Mentors
Dr. Sussane Soin · Ms. Sunita Gehlot
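
The Dice coefficient reported for the brain tumor segmentation work measures overlap between a predicted mask and a reference mask. The sketch below is a minimal, dependency-free illustration; the function name `dice_coefficient` and the small masks are hypothetical and are not taken from the project.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks given as nested lists."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in true if t)
    # Two empty masks agree perfectly, so treat that case as a score of 1.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 3x4 masks: 1 marks tumor pixels, 0 marks background.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
true = [[0, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 1, 0]]
score = dice_coefficient(pred, true)  # 4 shared pixels, 4 + 6 labelled pixels
```

Here the prediction captures four of the six reference pixels with no false positives, giving a score of 0.8.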

Minimization of False Negatives / Positives

arXiv Preprint

Proposed a post-pretraining input-adjustment method to reduce false negatives and false positives in binary classification tasks. The method consistently improved performance across multiple datasets, demonstrating that targeted input perturbations can correct decision boundary biases without retraining large models.

Skills
Classification theory · Error analysis · Model calibration · Algorithmic robustness · Cross-dataset validation
Mentors
Ms. Sujatha Raghu

Time-Series & Sustainability

Time-Series & Signals

Water Consumption Analysis

Independent Project

Designed unsupervised machine learning pipelines to disaggregate household water consumption into appliance-level usage. Used clustering and distance-based algorithms (K-Means, Dynamic Time Warping) to infer usage patterns from raw meter data, enabling conservation insights and cost-saving recommendations without smart-meter hardware.

Skills
Time-series analysis · Unsupervised learning · DTW · Sustainability analytics · Feature engineering
Mentors
Ms. Swetha Bhattacharya · Weicheng Dai
Technical write-up (coming soon) · Visualizations (coming soon)
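
Dynamic Time Warping, one of the distance measures named for the water consumption project, compares two usage traces while tolerating shifts in timing. The sketch below is the textbook dynamic-programming formulation with an absolute-difference cost; the function name and the flow traces are hypothetical and do not come from the project's data.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical flow traces (litres per interval), for illustration only.
shower = [0, 8, 8, 8, 8, 0]
shower_delayed = [0, 0, 8, 8, 8, 8, 0]
tap = [0, 2, 0, 0, 2, 0]

same_shape = dtw_distance(shower, shower_delayed)
different_shape = dtw_distance(shower, tap)
```

The delayed shower has the same shape as the original, so its distance is zero, while the tap trace stays far away. This shift tolerance is why a shape-based distance can suit appliance signatures better than a point-by-point comparison.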

Knowledge Systems & NLP

Knowledge Systems

News-to-arXiv Pipeline

Independent Project | 2025–Present

Built an automated Python pipeline that maps 500+ real-world news queries into arXiv search tasks. The system uses LLMs and heuristic ranking to retrieve the 10 most relevant academic papers per query from a corpus of 50k+ articles, enabling scalable discovery of emerging research topics from noisy media signals.

Skills
NLP pipelines · LLM orchestration · Information retrieval · Embeddings & ranking · System design
GitHub (coming soon) · Live demo (coming soon)
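
The retrieval step of the news-to-arXiv pipeline can be pictured as ranking candidate papers against a query and keeping the best k. The sketch below uses plain cosine similarity over embedding vectors as a stand-in; the pipeline's actual LLM and heuristic ranking is not described here, so the functions, paper ids, and vectors are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, papers, k=10):
    """Return the ids of the k papers most similar to the query.

    `papers` maps a paper id to its embedding vector (hypothetical format).
    """
    scored = [(cosine(query_vec, vec), pid) for pid, vec in papers.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pid for _, pid in scored[:k]]


# Toy 3-dimensional embeddings; real embeddings would be much larger.
papers = {
    "paper-a": [0.9, 0.1, 0.0],
    "paper-b": [0.0, 1.0, 0.0],
    "paper-c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]
ranked = top_k(query, papers, k=2)
```

With these toy vectors the ranking is `paper-a` followed by `paper-c`. At a scale of 50k+ articles, an approximate nearest-neighbor index would likely replace this full scan.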

Enterprise ML & Production Systems

Production ML

Credit Card Fraud Detection

MIT CSAIL · Prof. Amar Gupta

Developed LSTM and Transformer-based fraud detection models trained on over 1M transaction records. Enhanced dataset diversity using synthetic data generation, improving F1 score by ~12% and significantly increasing precision on rare fraud cases.

Skills
Sequential modeling · LSTMs & Transformers · Fraud detection · Synthetic data generation · Large-scale ML training
Mentors
Prof. Amar Gupta
Project summary (coming soon)

O-Health Symptom Extractor

MIT CSAIL · Prof. Amar Gupta

Processed 50k+ patient records using unsupervised learning to cluster symptoms and infer medical specializations. Achieved over 60% accuracy in specialization prediction, demonstrating the viability of clustering-based classification in noisy clinical text.

Skills
Clinical NLP · Unsupervised learning · Clustering & feature extraction · Healthcare data analysis
Mentors
Prof. Amar Gupta
Internal report (coming soon)
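
One common way to score clustering-based classification, such as the specialization prediction described for the O-Health project, is to assign each cluster the majority label of its members and then measure accuracy. The sketch below illustrates that idea only; how the reported accuracy was actually computed is not stated, and the function name, cluster ids, and specializations are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote_accuracy(cluster_ids, true_labels):
    """Map each cluster to its most frequent label and return (mapping, accuracy)."""
    if not true_labels:
        return {}, 0.0

    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)

    # Each cluster predicts the label that occurs most often among its members.
    mapping = {c: Counter(labels).most_common(1)[0][0] for c, labels in members.items()}
    correct = sum(1 for c, y in zip(cluster_ids, true_labels) if mapping[c] == y)
    return mapping, correct / len(true_labels)


# Toy assignments for illustration only; not patient data.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
true_labels = ["cardiology", "cardiology", "neurology",
               "neurology", "neurology", "dermatology",
               "dermatology", "dermatology"]
mapping, accuracy = majority_vote_accuracy(cluster_ids, true_labels)
```

In this toy case six of eight records match their cluster's majority label, so accuracy is 0.75. Because the mapping is fit on the same labels it is scored against, a held-out split would give a fairer estimate.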

Kognitos – Engineering Internship

Summer 2025

Built automated LLM + regex pipelines to improve error messaging in enterprise workflows. The system reduced customer wait times by 89% and achieved 99% structural match accuracy across production logs, directly impacting customer experience at scale.

Skills
LLM engineering · Regex pipelines · Production ML systems · Enterprise automation · Metrics-driven iteration
Case study (coming soon)

Patient Embeddings & Representation Learning

Patient Embeddings

Cardiovascular Disease Embeddings

MIT CSAIL · Prof. Manolis Kellis

Developed personalized latent space embeddings for 6,000+ patient records to model cardiovascular disease progression. Used representation learning and dimensionality reduction to uncover phenotypic similarity clusters and longitudinal disease patterns.

Skills
Representation learning · Latent space modeling · Healthcare analytics · Dimensionality reduction · Population-scale data analysis
Mentors
Prof. Manolis Kellis
Research summary (coming soon)

Additional Projects

Time-Series & Signals
Research / Product

Water Consumption Disaggregation

Appliance-level inference from household meter time series using shape-based clustering.

Methods: K-Means, DTW
Domain: Utilities
Impact: 5–10% water reduction
Knowledge Systems
Founder / Product

Synapses.news Platform

End-to-end system embedding, clustering, and visualizing latent structure in news & research.

Methods: SentenceTransformers, HDBSCAN
Domain: Scientific discovery
Impact: Live platform
Knowledge Systems
Research

O-Health Symptom Extractor

Unsupervised clustering of 50k+ patient records to infer medical specializations.

Methods: Clustering, NLP
Domain: Healthcare
Impact: >60% accuracy
Production ML
Professional Experience

Kognitos LLM Error Repair Pipelines

LLM-driven exception tracing and semantic error repair in production systems.

Methods: LLMs, heuristics
Domain: Enterprise ML
Impact: +89% clarity
Production ML
Professional Experience

Django Feature-Flag Microservice

Scalable backend integrating LaunchDarkly for 500+ production tags.

Methods: Django, Docker
Domain: Software systems
Impact: Latency ↓35%