maya chen · ml research engineer
Building safer, more interpretable neural networks at the frontier of AI research.
01 / how it works
Artificial neural networks (ANNs) are the decision-making framework behind large language models like GPT and Claude. They're loosely inspired by the human brain — millions of simple units called neurons arranged in layers, passing signals to each other.
An ANN has three sections: an input layer that receives raw data, one or more hidden layers that transform it, and an output layer that produces a prediction. The hidden layers are where the model learns patterns — adjusting billions of internal weights through training.
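The three-section structure above can be sketched in a few lines of numpy. The layer sizes, activations, and random weights here are illustrative, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 4 input features -> 8 hidden units -> 1 output
W1 = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1))   # hidden -> output weights
b2 = np.zeros(1)

def forward(x):
    h = relu(x @ W1 + b1)        # hidden layer transforms the raw data
    return sigmoid(h @ W2 + b2)  # output layer produces a prediction

x = rng.normal(size=4)           # one example with 4 raw input features
p = forward(x)                   # a single value squashed into (0, 1)
```

Training adjusts `W1`, `b1`, `W2`, `b2` until the outputs match the data; in a large language model the same idea scales to billions of weights.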
Training a network means showing it many examples and letting it adjust until its predictions match reality. Ask it: what is the probability of sharks in the water at the Florida Gulf Coast in summer? The model learns to produce a probability estimate.
That number doesn't come from a hardcoded rule — it emerges from the network being trained on thousands of historical sightings, water temperatures, beach reports, and seasonal patterns. The model finds the relationships humans would miss.
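The adjustment loop itself is simple. Here is a minimal sketch of gradient-descent training on synthetic data; the features and labels are made-up stand-ins for real sightings and conditions, and the model is the smallest possible one (a single logistic unit):

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 synthetic examples with 3 features each, labeled by a hidden rule
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)   # weights start knowing nothing
b = 0.0
lr = 0.5          # learning rate: how far each adjustment moves

for _ in range(500):
    p = sigmoid(X @ w + b)            # current predictions
    grad_w = X.T @ (p - y) / len(y)   # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                  # nudge weights toward reality
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

After the loop, the weights encode the relationship in the data without any hardcoded rule; a real network repeats the same update over far more weights and examples.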
Different architectures specialize in different kinds of data.
CNN
Convolutional Neural Network
Built for images. Scans inputs with filters that detect edges, textures, then shapes — building understanding layer by layer.
RNN
Recurrent Neural Network
Built for sequences. Processes inputs one step at a time while carrying memory of what came before.
TRANSFORMER
Attention-based
The architecture behind modern LLMs. Looks at every position simultaneously through attention.
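The attention step that distinguishes transformers can be sketched as scaled dot-product attention; this is the standard textbook formulation, with illustrative shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    weights = softmax(scores)        # each row is a distribution over positions
    return weights @ V               # weighted mix of value vectors

seq_len, d_k = 5, 8
rng = np.random.default_rng(2)
Q = rng.normal(size=(seq_len, d_k))  # queries, one per sequence position
K = rng.normal(size=(seq_len, d_k))  # keys
V = rng.normal(size=(seq_len, d_k))  # values
out = attention(Q, K, V)             # shape (seq_len, d_k)
```

Because `scores` compares all positions against all positions in one matrix product, every token attends to the whole sequence simultaneously, unlike an RNN's one-step-at-a-time pass.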
These architectures quietly run underneath much of modern life. CNNs power facial recognition on your phone, identify tumors in medical scans, and let autonomous vehicles read road signs. RNNs and transformers handle language and time — translation, transcription, the text you read in chat assistants.
02 / selected work
How gradient descent navigates non-convex surfaces, and why some initializations escape local minima while others get stuck.
A study of induction heads in small transformers, mapping the circuits responsible for in-context learning.
How concept clusters form during pretraining, and what they reveal about a model's internal taxonomy.
Lightweight tooling for visualizing loss curves, gradient norms, and activation statistics in real time.
03 / writing and talks
04 / contact
Open to research collaborations, advising roles, and speaking engagements.