Projects

Selected publications, research projects, and independent ventures.

Medical Imaging & Clinical AI

Biomedical Imaging

PneumoXttention

IEEE ISPA 2021

PneumoXttention is a convolutional neural network ensemble designed to detect pneumonia in chest X-rays and to reduce human diagnostic error. The model was evaluated against radiologist performance on the RSNA and NIH ChestX-ray datasets, demonstrating strong F1 scores and consistent sensitivity across patient subgroups. The work focuses on reliability and clinical validation rather than on maximizing accuracy alone.
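As a reference for the metrics named above, here is a minimal sketch of how sensitivity, specificity, precision, and F1 follow from binary confusion counts. The function name `binary_metrics` and the example counts are illustrative assumptions, not code or results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    # Sensitivity (recall): fraction of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: fraction of positive predictions that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Made-up counts for illustration only; these are not results from the project.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

Reporting sensitivity per patient subgroup amounts to calling the same function on the confusion counts of each subgroup separately.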

Skills

Medical computer vision · CNN ensembles · Model validation against clinicians · PyTorch · Performance metrics (F1, sensitivity, specificity)

Mentors

Dr. Colin-Whitby Strevens · Dr. Sussane Soin

IEEE Paper (coming soon) · Code (GitHub) (coming soon)

Biomedical Imaging

Automated Coronary Calcium Scoring

IEEE MIT URTC 2022

Developed a semi-supervised U-Net model to estimate coronary artery calcium scores from non-gated CT scans, enabling cardiovascular risk assessment without specialized imaging protocols. Introduced targeted cropping techniques that reduced mean absolute error by 91% and improved F1 score by 32%, significantly outperforming baseline approaches.

Skills

Medical image segmentation · U-Net architectures · Semi-supervised learning · CT image preprocessing · Quantitative error analysis

Mentors

Weicheng Dai · Ms. Kirti Tamhane

IEEE Paper→

Biomedical Imaging

CheX-Nomaly

arXiv Preprint

CheX-Nomaly introduces a Siamese U-Net framework with contrastive learning to localize thoracic abnormalities across 14+ disease categories. The model prioritizes cross-dataset generalization and robustness to spurious correlations, addressing perceptual diagnostic errors common in chest X-ray interpretation rather than optimizing single-dataset performance.

Skills

Contrastive learning · Siamese networks · Weakly supervised localization · Medical imaging generalization · Research evaluation & ablations

Mentors

Chenyu You · Ms. Swetha Bhattacharya

arXiv Preprint→ · Code (GitHub) (coming soon)

Biomedical Imaging

Mask R-CNN for Brain Tumor Segmentation

arXiv Preprint

Applied Mask R-CNN with image subtraction techniques to segment heterogeneous brain tumors from MRI scans. The approach improved tumor boundary delineation and achieved a Dice coefficient of 0.75, outperforming standard segmentation baselines on complex tumor morphologies.
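For readers unfamiliar with the metric, the Dice coefficient between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for small binary masks stored as nested lists. The function name `dice_coefficient` and the toy masks are assumptions for illustration, not the project's evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as nested lists."""
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks, made up for illustration.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)
print(round(score, 4))
```

Here two of the three foreground pixels in each mask overlap, so the score is 4/6.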

Skills

MRI analysis · Instance segmentation · Mask R-CNN · Image preprocessing & subtraction · Evaluation metrics (Dice)

Mentors

Dr. Sussane Soin · Ms. Sunita Gehlot

arXiv Preprint→

Biomedical Imaging

Minimization of False Negatives / Positives

arXiv Preprint

Proposed a post-pretraining input-adjustment method to reduce false negatives and false positives in binary classification tasks. The method consistently improved performance across multiple datasets, demonstrating that targeted input perturbations can correct decision boundary biases without retraining large models.

Skills

Classification theory · Error analysis · Model calibration · Algorithmic robustness · Cross-dataset validation

Mentors

Ms. Sujatha Raghu

arXiv Preprint→

Time-Series & Sustainability

Time-Series & Signals

Water Consumption Analysis

Independent Project

Designed unsupervised machine learning pipelines to disaggregate household water consumption into appliance-level usage. Used clustering and distance-based algorithms (K-Means, Dynamic Time Warping) to infer usage patterns from raw meter data, enabling conservation insights and cost-saving recommendations without smart-meter hardware.
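Dynamic Time Warping compares two usage traces by shape even when events are shifted or stretched in time, which is why it suits meter data. Below is a minimal sketch of the classic dynamic-programming formulation with absolute difference as the local cost. The function name `dtw_distance` and the sample traces are hypothetical, and the project's actual pipeline may differ (for example, by using a windowed or library-based variant).

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Two made-up flow traces with the same shape, one delayed by a sample.
trace_a = [0, 1, 2, 1, 0]
trace_b = [0, 0, 1, 2, 1, 0]
print(dtw_distance(trace_a, trace_b))
```

Because the warping path can repeat the leading zero, the delayed trace aligns perfectly with the original, whereas a pointwise Euclidean comparison would not even be defined for the unequal lengths.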

Skills

Time-series analysis · Unsupervised learning · DTW · Sustainability analytics · Feature engineering

Mentors

Ms. Swetha Bhattacharya · Weicheng Dai

Technical write-up (coming soon) · Visualizations (coming soon)

Knowledge Systems & NLP

Knowledge Systems

News-to-arXiv Pipeline

Independent Project | 2025–Present

Built an automated Python pipeline that maps 500+ real-world news queries into arXiv search tasks. The system uses LLMs and heuristic ranking to retrieve top-10 relevant academic papers from a corpus of 50k+ articles, enabling scalable discovery of emerging research topics from noisy media signals.
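The ranking step is not specified in detail here. One common way to pick a top-k list from a corpus is to score each paper's embedding against the query embedding with cosine similarity, sketched below. The names `cosine` and `top_k`, the two-dimensional vectors, and the paper IDs are all hypothetical, and the real pipeline's LLM and heuristic components are not represented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_vec, doc_vecs, k=10):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings, made up for illustration.
papers = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.0, 1.0],
    "paper_c": [1.0, 1.0],
}
ranked = top_k([1.0, 0.1], papers, k=2)
print(ranked)
```

For a 50k-document corpus, a brute-force scan like this is still feasible, though a vectorized or indexed search would usually be preferred.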

Skills

NLP pipelines · LLM orchestration · Information retrieval · Embeddings & ranking · System design

GitHub (coming soon) · Live demo (coming soon)

Enterprise ML & Production Systems

Production ML

Credit Card Fraud Detection

MIT CSAIL · Prof. Amar Gupta

Developed LSTM and Transformer-based fraud detection models trained on over 1M transaction records. Enhanced dataset diversity using synthetic data generation, improving F1 score by ~12% and significantly increasing precision on rare fraud cases.

Skills

Sequential modeling · LSTMs & Transformers · Fraud detection · Synthetic data generation · Large-scale ML training

Mentors

Prof. Amar Gupta

Project summary (coming soon)

Production ML

O-Health Symptom Extractor

MIT CSAIL · Prof. Amar Gupta

Processed 50k+ patient records using unsupervised learning to cluster symptoms and infer medical specializations. Achieved over 60% accuracy in specialization prediction, demonstrating the viability of clustering-based classification in noisy clinical text.

Skills

Clinical NLP · Unsupervised learning · Clustering & feature extraction · Healthcare data analysis

Mentors

Prof. Amar Gupta

Internal report (coming soon)

Production ML

Kognitos – Engineering Internship

Summer 2025

Built automated LLM + regex pipelines to improve error messaging in enterprise workflows. The system reduced customer wait times by 89% and achieved 99% structural match accuracy across production logs, directly impacting customer experience at scale.

Skills

LLM engineering · Regex pipelines · Production ML systems · Enterprise automation · Metrics-driven iteration

Case study (coming soon)

Patient Embeddings & Representation Learning

Patient Embeddings

Cardiovascular Disease Embeddings

MIT CSAIL · Prof. Manolis Kellis

Developed personalized latent space embeddings for 6,000+ patient records to model cardiovascular disease progression. Used representation learning and dimensionality reduction to uncover phenotypic similarity clusters and longitudinal disease patterns.

Skills

Representation learning · Latent space modeling · Healthcare analytics · Dimensionality reduction · Population-scale data analysis

Mentors

Prof. Manolis Kellis

Research summary (coming soon)

Additional Projects

Time-Series & Signals

Research / Product

Water Consumption Disaggregation

Appliance-level inference from smart-meter time series using shape-based clustering.

Methods: K-Means, DTW

Domain: Utilities

Impact: 5–10% water reduction

Explore Project→

Knowledge Systems

Founder / Product

Synapses.news Platform

End-to-end system embedding, clustering, and visualizing latent structure in news & research.

Methods: SentenceTransformers, HDBSCAN

Domain: Scientific discovery

Impact: Live platform

Explore Project→

Knowledge Systems

Research

O-Health Symptom Extractor

Unsupervised clustering of 50k+ patient records to infer medical specializations.

Methods: Clustering, NLP

Domain: Healthcare

Impact: >60% accuracy

Explore Project→

Production ML

Professional Experience

Kognitos LLM Error Repair Pipelines

LLM-driven exception tracing and semantic error repair in production systems.

Methods: LLMs, heuristics

Domain: Enterprise ML

Impact: +89% clarity

Explore Project→

Production ML

Professional Experience

Django Feature-Flag Microservice

Scalable backend integrating LaunchDarkly for 500+ production flags.

Methods: Django, Docker

Domain: Software systems

Impact: Latency ↓35%

Explore Project→