01_The Philosophy: "Vibes Based Engineering"
"I don't do hyperparameter sweeps. I just pick them myself... based on vibes. I've read the research papers. I know what I'm doing."
- Kache
The Anti-Academic Approach
Modern engineering—especially at big tech companies (McDonald's)—is paralyzed by "process." They run grid searches (testing 1000 settings) to find the "perfect" number. They hold meetings. They wait.
The Kache Method is about High Agency and Internalized Knowledge.
- Intuition over Brute Force: If you truly understand the math (you read the papers), you don't need a computer to guess the settings for you. You know the learning rate should be 3e-4. You feel it.
- The "Fuck It" Loop: Don't wait for permission. Don't wait for the perfect plan. Start the training run. If it fails, you learn why instantly.
- Speed is Quality: A fast iteration loop (trying 10 things in a day) beats a slow "perfect" loop (trying 1 thing in a week).
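To put a number on the trade-off above, here is a minimal sketch that counts what a grid search actually costs. The search space, the picked values, and the 2-hours-per-run figure are all illustrative assumptions, not Kache's real settings.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative assumptions.
GRID = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size": [64, 128, 256, 512, 1024],
    "gamma": [0.9, 0.95, 0.97, 0.99, 0.995],
    "clip_range": [0.1, 0.2],
    "entropy_coef": [0.0, 0.001, 0.01, 0.1],
}

# The "vibes" alternative: one hand-picked configuration (also an assumption).
PICKED = {
    "learning_rate": 3e-4,
    "batch_size": 256,
    "gamma": 0.99,
    "clip_range": 0.2,
    "entropy_coef": 0.01,
}


def grid_configs(grid):
    """Yield every combination of settings in the grid as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def total_cost_hours(num_runs, hours_per_run):
    """Compute time if each training run takes hours_per_run."""
    return num_runs * hours_per_run


num_grid_runs = sum(1 for _ in grid_configs(GRID))
print("grid search runs:", num_grid_runs)
print("grid search hours:", total_cost_hours(num_grid_runs, 2.0))
print("hand-picked hours:", total_cost_hours(1, 2.0))
```

The grid grows multiplicatively with every setting you add, which is why the sweep gets expensive so fast. The hand-picked run is one point inside that same grid; the bet is that reading the papers lets you land near a good point on the first try.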
"My Shit is Stable"
When he says this, he refers to the stability of his algorithms. In Reinforcement Learning (RL), training often collapses (the robot spins in circles and learns nothing). He has tuned his code so aggressively that he can throw anything at it—randomization, quantization, noise—and it still learns. He has removed the fragility.
02_The AI Stack: Sim2Real & Vision
"Rendered pixels... 100,000 triangles per room... procedurally generated worlds. Is it going to generalize? Of course it's going to generalize."
The Core Problem: Generalization
You want to train a robot to walk or fly. You can't do it in the real world because it takes too long and the robot breaks when it falls. You must use a Simulation (like a video game).
The Gap: The real world is messy. Simulations are clean. When you move the brain from the game to reality, it usually fails.
The Solution: Domain Randomization
Kache's approach is brute-force chaos.
- Procedural Generation: He doesn't train the AI in one room. He writes code to generate infinite rooms with random walls, floors, and obstacles.
- Visual Randomization: He changes the lighting, textures, and colors every few seconds.
- The Result: The AI stops looking at "color of the floor" to figure out where it is (because the color keeps changing). It is forced to learn the geometry of the room. This makes it robust enough for the real world.
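The idea above can be sketched in a few lines. This is a toy room generator, not Kache's code: the `Room` fields, the size ranges, and the obstacle counts are all made-up assumptions. The point is the split between geometry (what the AI should learn) and appearance (what keeps changing).

```python
import random
from dataclasses import dataclass


@dataclass
class Room:
    # Geometry: what the policy should actually learn.
    width_m: float
    depth_m: float
    obstacles: list  # (x, y, radius) tuples in metres
    # Appearance: re-rolled constantly so it carries no useful signal.
    floor_rgb: tuple
    wall_rgb: tuple
    light_intensity: float


def random_rgb(rng):
    """A random colour with each channel in [0, 1)."""
    return tuple(rng.random() for _ in range(3))


def generate_room(rng):
    """Procedurally generate one room (all ranges are illustrative)."""
    width = rng.uniform(2.0, 8.0)
    depth = rng.uniform(2.0, 8.0)
    obstacles = [
        (rng.uniform(0.0, width), rng.uniform(0.0, depth), rng.uniform(0.1, 0.5))
        for _ in range(rng.randint(0, 6))
    ]
    return Room(width, depth, obstacles,
                random_rgb(rng), random_rgb(rng), rng.uniform(0.2, 1.5))


def rerandomize_visuals(room, rng):
    """Change colours and lighting only; the geometry stays untouched."""
    room.floor_rgb = random_rgb(rng)
    room.wall_rgb = random_rgb(rng)
    room.light_intensity = rng.uniform(0.2, 1.5)


rng = random.Random(0)  # seeded so the sketch is reproducible
room = generate_room(rng)
rerandomize_visuals(room, rng)
print(room)
```

In a real pipeline `generate_room` would build meshes inside the simulator and `rerandomize_visuals` would be called on a timer during each episode.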
The Architecture: CNN + RL
He mentions a "Convolutional Neural Network" (CNN). Here is the pipeline he is likely building:
INPUT [Camera Image] ---> CNN (Visual Cortex) ---> [Features]
INPUT [Sensor Data] ---> MLP (Body State) ---> [State Vector]
[Features + State] ---> PPO Policy (Brain) ---> OUTPUT [Motor Commands]
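To see how the two branches meet, it helps to trace the tensor sizes. The sketch below is shape arithmetic only, under assumptions the source never states: an 84x84 input image, a three-layer conv stack borrowed from a small image encoder common in the deep RL literature, and a 64-wide output from the state MLP.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output width/height of a conv layer: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed conv stack as (kernel, stride, out_channels). Not from the source.
CNN_LAYERS = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]


def cnn_feature_dim(image_size, layers=CNN_LAYERS):
    """Length of the flattened [Features] vector for a square input image."""
    size = image_size
    channels = 0
    for kernel, stride, out_channels in layers:
        size = conv_out(size, kernel, stride)
        channels = out_channels
    return channels * size * size


def policy_input_dim(image_size, state_embed_dim):
    """[Features + State]: the two branches are concatenated end to end."""
    return cnn_feature_dim(image_size) + state_embed_dim


print("CNN features:", cnn_feature_dim(84))
print("policy input:", policy_input_dim(84, 64))
```

The CNN shrinks the image at every layer (84 to 20 to 9 to 7 pixels per side here), so the policy never sees raw pixels, only a compact feature vector glued to the body state.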
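Quantization: He mentions "doing a little quantization." This means converting the AI's math from 32-bit floats (heavy, slow) to 8-bit integers (light, fast). The AI loses some numerical precision, but its weights take a quarter of the memory and it can run several times faster on a small robot chip with 8-bit integer support. The exact speedup depends on the hardware.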
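Here is roughly what that conversion looks like. This is a minimal sketch of symmetric per-tensor int8 quantization, one common scheme among several; the source does not say which scheme Kache uses, and the weight values are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.1234, -0.5, 0.25, 0.9, -0.031]  # illustrative values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", quantized)
print("scale:", scale)
print("worst rounding error:", max_error)
print("bytes as float32:", 4 * len(weights), "| bytes as int8:", len(weights))
```

The rounding error per weight is at most half the scale factor, which is the "little bit dumber" part. The payoff is the 4x smaller storage and the ability to use fast integer math where the chip supports it.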
03_The Silicon: Hardware Speedrunning
"Design a stereo camera PCB... KiCad... 24 hour turnaround... It's a point and click adventure game."
The Workflow: KiCad -> JLC
Designing electronics used to be a dark art. Kache treats it like a video game.
- Schematic (The Logic): Draw how the chips connect (CPU pin 1 -> Sensor pin 4).
- Layout (The Art): Arrange the physical chips on the board and draw the copper connections (traces) between them. This is the "Point and Click Adventure."
- Fabrication (The Speed): He sends the files to a rapid prototyping house (likely JLCPCB in China). They build the board overnight and ship it via air.
The Sensors
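Stereo Camera: Two cameras side-by-side. The computer measures how far each feature shifts between the two images (the disparity, caused by parallax) to calculate how far away things are: nearby objects shift a lot, distant objects barely move. It works on the same principle as human eyes.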
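The distance calculation itself is one line. For a rectified stereo pair, depth = focal length x baseline / disparity. The sketch below uses made-up numbers (an 800-pixel focal length and a 5 cm baseline), not the specs of any particular camera module.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified stereo pair.

    focal_length_px: focal length in pixels
    baseline_m: distance between the two cameras in metres
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means infinitely far)")
    return focal_length_px * baseline_m / disparity_px


# Illustrative numbers only.
for disparity in (80, 40, 20, 10):
    depth = depth_from_disparity(800, 0.05, disparity)
    print(f"disparity {disparity:>3} px -> depth {depth:.2f} m")
```

Depth is inversely proportional to disparity, so far-away objects get squeezed into the last few pixels of shift. That is why stereo depth gets noisy at range and why a wider baseline helps.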
Time of Flight (ToF): A sensor that emits invisible infrared laser light and measures the time it takes to bounce back from each zone of the scene. This gives you a direct "Depth Map" (a 3D image of the world) even in total darkness. It is not perfect: readings get noisy in bright sunlight and on very dark or shiny surfaces.
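The math behind time of flight is just distance = speed of light x round-trip time / 2. A minimal sketch follows; the function names are ours, and real sensors do this timing in hardware rather than in your code.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance_m(round_trip_s):
    """Distance to the target given the light's round-trip time in seconds."""
    # Divide by two because the light travels out and back.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def round_trip_s(distance_m):
    """Round-trip time in seconds for a target at distance_m metres."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_S


for distance in (0.1, 1.0, 4.0):
    nanoseconds = round_trip_s(distance) * 1e9
    print(f"{distance:.1f} m -> {nanoseconds:.2f} ns round trip")
```

A target one metre away returns the light in under 7 nanoseconds, which is why this needs dedicated timing silicon and not a microcontroller loop.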
04_Build It Yourself: The BOM
You want to build what he's talking about? A robot that sees in 3D and learns via AI? Here is your shopping list.
The Brain (Compute)
| Component | Recommendation | Why? |
| --- | --- | --- |
| AI Computer | NVIDIA Jetson Orin Nano (or Raspberry Pi 5 for small, CPU-only models) | You want GPU cores to run the neural networks in real time. The Jetson is the gold standard for robotics. The Pi 5 has no CUDA GPU, so it only suits lightweight models. |
| Microcontroller | ESP32 or STM32 | For low-level motor control. The Jetson thinks; the microcontroller moves the muscles. |
The Eyes (Sensors)
| Component | Recommendation | Details |
| --- | --- | --- |
| Stereo Camera | IMX219 Dual Camera Module | Two Sony sensors on one board. Connects directly to Jetson/Pi. |
| ToF Sensor | VL53L5CX | An 8x8 zone laser ranging sensor. Gives you a matrix of distances. Cheap and fast. |
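To make "a matrix of distances" concrete, here is a small sketch that takes 64 readings and finds the closest obstacle. It does not talk to the sensor: the readings are fake, and the row-major ordering and millimetre units are assumptions to check against your driver.

```python
def to_grid(flat_mm, size=8):
    """Reshape a flat list of size*size distances into rows (row-major assumed)."""
    if len(flat_mm) != size * size:
        raise ValueError(f"expected {size * size} readings, got {len(flat_mm)}")
    return [flat_mm[row * size:(row + 1) * size] for row in range(size)]


def nearest_zone(grid):
    """Return (row, col, distance_mm) of the closest zone in the grid."""
    distance, row, col = min(
        (value, r, c)
        for r, values in enumerate(grid)
        for c, value in enumerate(values)
    )
    return row, col, distance


# Fake frame: a wall 2 m away with one object 35 cm away in zone (3, 5).
readings = [2000] * 64
readings[3 * 8 + 5] = 350

grid = to_grid(readings)
print("nearest obstacle (row, col, mm):", nearest_zone(grid))
```

An 8x8 grid is far too coarse to recognize objects, but it is plenty for "something is close on my left, turn right."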
The Software Stack
Don't write everything from scratch. Stand on the shoulders of giants (then optimize based on vibes).
- Training: Isaac Gym (NVIDIA) or MuJoCo. These are the physics simulators where you generate the "worlds."
- PCB Design: KiCad 8.0. Free, open source, professional grade.
- AI Framework: PyTorch. Don't use TensorFlow. Kache implies PyTorch when he talks about research papers.
05_The Glossary
HYPERPARAMETERS
The "settings" for the AI brain, chosen before it starts learning (e.g., Learning Rate, Batch Size). Choosing these badly can mean the AI learns slowly, unstably, or not at all.
INFERENCE
When the AI is actually *working* (predicting), not learning. "Running inference on the edge" means the robot is thinking for itself, not connected to a server.
QUANTIZATION
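Compressing the AI. Turning 32-bit floating point numbers (e.g., 0.1234567) into 8-bit integers (e.g., 12) that share a scale factor. It reduces precision slightly but cuts memory to a quarter and usually makes inference much faster on chips with 8-bit integer support.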
SIM2REAL
Simulation to Reality. One of the hardest problems in robotics. Making code that works in a video game work in the physical world.
PCB (Printed Circuit Board)
The green (or black) board that holds chips. "Spinning a board" means designing and ordering a new one.
TRACES
The flat copper paths on the surface or inner layers of a circuit board that connect the components. Kache calls dragging these around a "point and click adventure."
EPOCH / STEP
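A step is one tick of the simulation: the AI observes, picks an action, and the world advances. "3 billion steps" means the AI has taken 3 billion actions in the simulation, spread across many attempts (episodes). An epoch is one pass over a batch of collected experience during a training update.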
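To get a feel for the scale of "3 billion steps," here is some back-of-the-envelope arithmetic. It assumes a step means one environment timestep and that many simulated robots run in parallel. The 4096 environments, 32-step rollouts, and 100,000 steps per second are illustrative guesses, not figures from the source.

```python
def steps_per_update(num_envs, rollout_length):
    """Steps collected before each training update across all parallel envs."""
    return num_envs * rollout_length


def updates_needed(total_steps, num_envs, rollout_length):
    """Number of training updates to reach total_steps (rounded up)."""
    per_update = steps_per_update(num_envs, rollout_length)
    return -(-total_steps // per_update)  # integer ceiling division


def wall_clock_hours(total_steps, steps_per_second):
    """Training time in hours at a given simulation throughput."""
    return total_steps / steps_per_second / 3600


TOTAL_STEPS = 3_000_000_000

print("steps per update:", steps_per_update(4096, 32))
print("updates needed:", updates_needed(TOTAL_STEPS, 4096, 32))
print("hours at 100k steps/s:", round(wall_clock_hours(TOTAL_STEPS, 100_000), 2))
```

At those assumed rates, 3 billion steps is an overnight run rather than a month-long project, which is what makes the fast iteration loop possible in the first place.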
CONVOLUTIONAL (CNN)
A type of AI architecture designed specifically for processing grids of data, like images (pixels).