Workbench

Projects worth bookmarking.

A blend of production deployments, community contributions, and joyful experiments. Each card includes the stack, context, and where to dive deeper.

Maya

Dec 2024

As part of the Cohere For AI community, trained a novel vision-language model on a multilingual instruction dataset curated by the team. Published a paper on arXiv.

Vision-Language Models, PyTorch, Multilingual AI

HuggingFace 🤗 Contributions

2024 – Present

Contributed notebooks and documentation on two topics, knowledge distillation for computer vision and PII detection for LLM gateways, as part of community initiatives.

Computer Vision, Knowledge Distillation, LLMs, PII Detection

Morph Chess

2025

A distributed chess system that runs multiple chess agents on Morph Cloud instances with real-time visualization.

Distributed Systems, Cloud Computing, Real-time Visualization

Topic Auto-label

Nov 2024

Released a pip package that uses LLMs to automatically label text, image, and video data by topic. Supports local LLMs via Ollama and uses Pydantic for structured output.

Python, LLMs, Ollama, Pydantic
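
To give a feel for what structured output buys in a labeling pipeline, here is a minimal sketch of validating a model reply before trusting it. It is not the package's code: the package relies on Pydantic, while this stand-in uses only the standard library, and `TopicLabel`, `parse_label`, `ALLOWED_TOPICS`, and the sample reply are all hypothetical names invented for illustration. No LLM is called.

```python
import json
from dataclasses import dataclass

# Hypothetical label set; a real project would define its own topics.
ALLOWED_TOPICS = {"sports", "politics", "technology"}


@dataclass(frozen=True)
class TopicLabel:
    topic: str
    confidence: float


def parse_label(raw: str) -> TopicLabel:
    """Parse a raw JSON reply and reject anything outside the schema."""
    data = json.loads(raw)
    topic = str(data["topic"]).strip().lower()
    confidence = float(data["confidence"])
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"unknown topic: {topic!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return TopicLabel(topic=topic, confidence=confidence)


# Hard-coded stand-in for a model reply (assumption, not real output).
reply = '{"topic": "Technology", "confidence": 0.92}'
label = parse_label(reply)
```

Rejecting malformed replies at the boundary means downstream code only ever sees labels from the known set, which is the main appeal of schema-validated output.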

Manifest Climate

2023

Led a University of Waterloo Data Science Club team to build a labeling tool, create custom datasets, and fine-tune DistilBERT for climate disclosures. The work unlocked 16 new signals and cut LLM API costs by ~99.9%.

DistilBERT, Data Labeling, Climate Tech, NLP

Text2SQL

2023

Fine-tuned a Llama-based model on synthetic data to answer natural language queries about an SQLite database by generating SQL and interpreting the results. Achieved 86% accuracy on held-out tasks.

LLM Fine-tuning, SQLite, Natural Language Processing
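
The generate-then-execute loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `fake_generate_sql` is a hypothetical stub that returns canned SQL in place of the fine-tuned model, and the `employees` table, its rows, and the `run_readonly_query` guard are invented for the example.

```python
import sqlite3


def fake_generate_sql(question: str) -> str:
    """Hypothetical stand-in for the fine-tuned model; returns canned SQL."""
    return "SELECT name FROM employees WHERE salary > 50000 ORDER BY name"


def run_readonly_query(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute generated SQL, rejecting anything but a single SELECT."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()


# Build a tiny in-memory database to query (example data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ada", 72000), ("Bob", 41000), ("Cy", 65000)],
)

sql = fake_generate_sql("Who earns more than 50k?")
rows = run_readonly_query(conn, sql)
```

The guard is deliberately crude. The point is only that model-generated SQL is worth checking before it touches a database, and that the resulting rows are what would be passed back to the model for interpretation.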

DotaLLM

2024

Trained a YOLO model for enemy detection and used the detections to prompt Cohere’s Command-R+ for movement and combat actions.

YOLO, Object Detection, Cohere, Game AI

Dreambella

2023

A Dreambooth fine-tune of Stable Diffusion on a very important subject: my dog, Bella.

Stable Diffusion, Dreambooth, Fine-tuning

Titanic Challenge in Production

2020

Created synthetic data with simulated drift for an introductory lesson covering TensorFlow Extended, drift monitoring, and CTGAN-based tabular generation.

TensorFlow Extended, CTGAN, Data Drift, Educational Content
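
For readers new to the idea, simulated drift and a basic drift check might look like the sketch below. It swaps in much simpler tools than the lesson used: a Gaussian sampler from the standard library instead of CTGAN, and a hand-rolled mean-shift score instead of TensorFlow Extended's monitoring. The `age` feature, the shift from 30 to 38, and `THRESHOLD` are all assumptions chosen for illustration.

```python
import random
import statistics


def sample_ages(n: int, mean: float, rng: random.Random) -> list[float]:
    """Draw n synthetic 'age' values around the given mean (std dev 10)."""
    return [rng.gauss(mean, 10.0) for _ in range(n)]


def mean_shift_score(reference: list[float], current: list[float]) -> float:
    """Absolute difference in means, in units of the reference std dev."""
    ref_std = statistics.stdev(reference)
    return abs(statistics.mean(current) - statistics.mean(reference)) / ref_std


rng = random.Random(0)  # fixed seed so the example is reproducible
reference = sample_ages(2000, mean=30.0, rng=rng)  # training-time data
same = sample_ages(2000, mean=30.0, rng=rng)       # batch with no drift
drifted = sample_ages(2000, mean=38.0, rng=rng)    # batch with shifted mean

THRESHOLD = 0.3  # hypothetical alert threshold
no_drift_flag = mean_shift_score(reference, same) > THRESHOLD
drift_flag = mean_shift_score(reference, drifted) > THRESHOLD
```

A production monitor would compare full distributions per feature rather than a single mean, but the shape of the check is the same: a reference sample, a new batch, a score, and a threshold.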