maya chen · ml research engineer
Building safer, more interpretable neural networks at the frontier of AI research.
01 / how it works
Artificial neural networks (ANNs) are the decision-making framework behind large language models like GPT and Claude. They're loosely inspired by the human brain — millions of simple units called neurons arranged in layers, passing signals to each other.
An ANN has three sections: an input layer that receives raw data, one or more hidden layers that transform it, and an output layer that produces a prediction. The hidden layers are where the model learns patterns — adjusting billions of internal weights through training.
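The three-section structure above can be sketched in a few lines of numpy. The layer sizes, activations, and random weights here are illustrative, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 4 input features -> 8 hidden units -> 1 output
W1 = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1))   # hidden -> output weights
b2 = np.zeros(1)

def forward(x):
    h = relu(x @ W1 + b1)        # hidden layer transforms the raw data
    return sigmoid(h @ W2 + b2)  # output layer produces a prediction

x = rng.normal(size=4)           # one example with 4 raw input features
p = forward(x)                   # a single value squashed into (0, 1)
```

Training adjusts `W1`, `b1`, `W2`, `b2` until the outputs match the data; in a large language model the same idea scales to billions of weights.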
Training a network means showing it many examples and letting it adjust until its predictions match reality. Ask it: what is the probability of sharks in the water at the Florida Gulf Coast in summer? The model learns to produce a probability estimate.
That number doesn't come from a hardcoded rule — it emerges from the network being trained on thousands of historical sightings, water temperatures, beach reports, and seasonal patterns. The model finds the relationships humans would miss.
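The adjustment loop itself is simple. Here is a minimal sketch of gradient-descent training on synthetic data; the features and labels are made-up stand-ins for real sightings and conditions, and the model is the smallest possible one (a single logistic unit):

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 synthetic examples with 3 features each, labeled by a hidden rule
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)   # weights start knowing nothing
b = 0.0
lr = 0.5          # learning rate: how far each adjustment moves

for _ in range(500):
    p = sigmoid(X @ w + b)            # current predictions
    grad_w = X.T @ (p - y) / len(y)   # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                  # nudge weights toward reality
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

After the loop, the weights encode the relationship in the data without any hardcoded rule; a real network repeats the same update over far more weights and examples.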
Different architectures specialize in different kinds of data.
CNN
Convolutional Neural Network
Built for images. Scans inputs with filters that detect edges, textures, then shapes — building understanding layer by layer.
RNN
Recurrent Neural Network
Built for sequences. Processes inputs one step at a time while carrying memory of what came before.
TRANSFORMER
Attention-based
The architecture behind modern LLMs. Looks at every position simultaneously through attention.
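The attention step that distinguishes transformers can be sketched as scaled dot-product attention; this is the standard textbook formulation, with illustrative shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    weights = softmax(scores)        # each row is a distribution over positions
    return weights @ V               # weighted mix of value vectors

seq_len, d_k = 5, 8
rng = np.random.default_rng(2)
Q = rng.normal(size=(seq_len, d_k))  # queries, one per sequence position
K = rng.normal(size=(seq_len, d_k))  # keys
V = rng.normal(size=(seq_len, d_k))  # values
out = attention(Q, K, V)             # shape (seq_len, d_k)
```

Because `scores` compares all positions against all positions in one matrix product, every token attends to the whole sequence simultaneously, unlike an RNN's one-step-at-a-time pass.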
These architectures quietly run underneath much of modern life. CNNs power facial recognition on your phone, identify tumors in medical scans, and let autonomous vehicles read road signs. RNNs and transformers handle language and time — translation, transcription, the text you read in chat assistants.
02 / selected work
How gradient descent navigates non-convex surfaces, and why some initializations escape local minima while others get stuck.
A study of induction heads in small transformers, mapping the circuits responsible for in-context learning.
How concept clusters form during pretraining, and what they reveal about a model's internal taxonomy.
Lightweight tooling for visualizing loss curves, gradient norms, and activation statistics in real time.
03 / writing and talks
04 / contact
Open to research collaborations, advising roles, and speaking engagements.