Workbench

Projects worth bookmarking.

A blend of production deployments, community contributions, and joyful experiments. Each card includes the stack, context, and where to dive deeper.

Maya

Dec 2024

As part of the Cohere For AI community, trained a novel vision-language model on a multilingual instruction dataset curated by the team. The paper is published on arXiv.

Vision-Language Models · PyTorch · Multilingual AI

🤗 HuggingFace · arXiv

HuggingFace 🤗 Contributions

2024 – Present

Contributed notebooks and documentation for knowledge distillation in computer vision and PII detection for LLM gateways as part of community initiatives.

Computer Vision · Knowledge Distillation · LLMs · PII Detection

Website · 🤗 HuggingFace

Morph Chess

2025

A distributed chess system that runs multiple chess agents on Morph Cloud instances, with real-time visualization of the games.

Distributed Systems · Cloud Computing · Real-time Visualization

GitHub

Topic Auto-label

Nov 2024

Released a pip package that uses LLMs to automatically label text, image, and video data by topic. Supports local LLMs via Ollama and uses Pydantic for structured output.

Python · LLMs · Ollama · Pydantic

GitHub

Manifest Climate

2023

Led a University of Waterloo Data Science Club team to build a labeling tool, create custom datasets, and fine-tune DistilBERT for climate disclosures, unlocking 16 new signals and cutting LLM API costs by ~99.9%.

DistilBERT · Data Labeling · Climate Tech · NLP

Text2SQL

2023

Fine-tuned a Llama-based model on synthetic data to answer natural language queries about an SQLite database by generating SQL and interpreting the results. Achieved 86% accuracy on held-out tasks.

LLM Fine-tuning · SQLite · Natural Language Processing
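For a feel of the generate-then-execute loop described above, here is a minimal sketch using only Python's built-in sqlite3 module. It is not the project's code: `generate_sql` is a hypothetical stand-in that returns canned SQL where the fine-tuned model would be called, and the `orders` table is invented for the demo.

```python
import sqlite3


def generate_sql(question: str) -> str:
    """Hypothetical stand-in for the fine-tuned model (returns canned SQL)."""
    canned = {
        "How many orders are there?": "SELECT COUNT(*) FROM orders",
    }
    return canned[question]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database with an invented demo table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders (total) VALUES (?)",
        [(10.0,), (25.5,), (7.25,)],
    )
    return conn


def answer(conn: sqlite3.Connection, question: str):
    """Turn a question into SQL, run it, and return both the SQL and the rows."""
    sql = generate_sql(question)
    rows = conn.execute(sql).fetchall()
    return sql, rows
```

In a real pipeline the rows would go back to the model for interpretation, and the generated SQL would need to be restricted (for example to read-only statements) before execution.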

DotaLLM

2024

Trained a YOLO model for enemy detection and used the detections to prompt Cohere’s Command-R+ for movement and combat actions.

YOLO · Object Detection · Cohere · Game AI

GitHub
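The detections-to-prompt step can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the `Detection` fields, the confidence threshold, and the prompt wording are all made up here, and the actual call to Command-R+ is left out.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical shape, not tied to any YOLO library)."""
    label: str
    confidence: float
    x: float  # box center, normalized to 0..1
    y: float


def build_prompt(detections, min_confidence=0.5):
    """Format confident detections as a text prompt for an LLM."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    if kept:
        lines = [
            f"- {d.label} at ({d.x:.2f}, {d.y:.2f}), confidence {d.confidence:.2f}"
            for d in kept
        ]
    else:
        lines = ["No enemies detected."]
    return (
        "Enemies currently visible on screen:\n"
        + "\n".join(lines)
        + "\nChoose one action: MOVE <direction>, ATTACK <target>, or RETREAT."
    )
```

Constraining the reply to a small action vocabulary, as the last prompt line does, makes the model's output easier to parse into game inputs.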

Dreambella

2023

A DreamBooth fine-tune of Stable Diffusion on a very important subject: my dog, Bella.

Stable Diffusion · DreamBooth · Fine-tuning

GitHub

Titanic Challenge in Production

2020

Created synthetic data with simulated drift for an introductory lesson covering TensorFlow Extended, drift monitoring, and CTGAN-based tabular generation.

TensorFlow Extended · CTGAN · Data Drift · Educational Content

Watch Video
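To illustrate what "simulated drift" means here, the sketch below shifts the mean of a single numeric feature between a reference batch and a later batch, then scores the gap. It is a simplified stand-in using only the standard library: it uses neither TensorFlow Extended nor CTGAN, and the age-like feature, its distribution, and the score are assumptions chosen for the example.

```python
import random
import statistics


def make_batch(rng, n, age_shift=0.0):
    """Draw n age-like values; age_shift moves the mean to simulate drift."""
    return [rng.gauss(30.0 + age_shift, 12.0) for _ in range(n)]


def mean_shift_score(reference, current):
    """Absolute difference in means, in units of the reference standard deviation."""
    gap = abs(statistics.mean(current) - statistics.mean(reference))
    return gap / statistics.stdev(reference)


rng = random.Random(0)  # fixed seed so the demo is repeatable
reference = make_batch(rng, 2000)
no_drift = make_batch(rng, 2000)
drifted = make_batch(rng, 2000, age_shift=10.0)

print("no drift:", round(mean_shift_score(reference, no_drift), 3))
print("drifted: ", round(mean_shift_score(reference, drifted), 3))
```

A monitoring job would compare such a score against a threshold per feature and raise an alert when it is exceeded.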