<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Autonomy Loop]]></title><description><![CDATA[Covering the intersection of AI and Robotics.  Sharing learnings as I attempt to build a useful AI-enabled robot.]]></description><link>https://www.theautonomyloop.com</link><image><url>https://substackcdn.com/image/fetch/$s_!eMGY!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0697183a-69ef-4851-931b-fc5107e5c85a_400x400.png</url><title>The Autonomy Loop</title><link>https://www.theautonomyloop.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 20:23:14 GMT</lastBuildDate><atom:link href="https://www.theautonomyloop.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ajay Kapal]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[autonomyloop@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[autonomyloop@substack.com]]></itunes:email><itunes:name><![CDATA[Ajay Kapal]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ajay Kapal]]></itunes:author><googleplay:owner><![CDATA[autonomyloop@substack.com]]></googleplay:owner><googleplay:email><![CDATA[autonomyloop@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ajay Kapal]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Making Computer Vision Stick]]></title><description><![CDATA[Reconstructing the first 3 lectures of CS231n]]></description><link>https://www.theautonomyloop.com/p/making-computer-vision-stick</link><guid isPermaLink="false">https://www.theautonomyloop.com/p/making-computer-vision-stick</guid><dc:creator><![CDATA[Ajay Kapal]]></dc:creator><pubDate>Fri, 05 Dec 2025 13:31:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zbjH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zbjH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zbjH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 424w, https://substackcdn.com/image/fetch/$s_!zbjH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 848w, https://substackcdn.com/image/fetch/$s_!zbjH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 1272w, https://substackcdn.com/image/fetch/$s_!zbjH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zbjH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif" width="500" height="375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114164,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/180406704?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zbjH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 424w, https://substackcdn.com/image/fetch/$s_!zbjH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 848w, https://substackcdn.com/image/fetch/$s_!zbjH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 1272w, https://substackcdn.com/image/fetch/$s_!zbjH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38b324c0-bb31-4648-a5cc-f1b2ae44fc0a_500x375.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Don&#8217;t try this at home.</figcaption></figure></div><p>Richard Feynman says memorizing a bunch of facts, or dissecting the terms of an equation doesn&#8217;t mean you understand something. The real test is whether you can explain it in your own words and minimize the jargon. In that spirit, I&#8217;ll share what I&#8217;ve learned in the first 3 lectures of CS231n.</p><p>Warning! This is a long post and there&#8217;s math involved. If that&#8217;s not your jam, then skip to the end to check out some cool links. </p><h4>Computer Vision - what&#8217;s it good for?</h4><p>The goal for computer vision is to enable a machine to make sense of visual information. &#8216;Make sense&#8217; means:</p><ol><li><p>classifying different elements of an image or video</p></li><li><p>knowing their names</p></li><li><p>relationships between the elements, like a person walking their dog</p></li><li><p>understanding intent and context so we know if an image is funny, if someone is doing something dangerous, navigating through an obstacle course</p></li></ol><p>Computer vision does #1 and #2 well, struggles with #3, and has a ways to go for #4.</p><p>The class has focused on item #1 for the first 3 lectures. That makes sense - you need to classify images properly before you can do anything else. </p><h4>Classification - overview</h4><p>How does classification work? Generally speaking:</p><ol><li><p>Start with a set of correctly classified images, called the &#8216;training set&#8217;</p></li><li><p>Use the training set to train a model</p></li><li><p>Run an image through the model which assigns a class to it. We call this a prediction because we don&#8217;t know if the model is right.</p></li><li><p>Determine how wrong the prediction is by using a &#8216;loss function&#8217;</p></li><li><p>Use the output of the loss function to adjust the model so it is less wrong next time.</p></li></ol><h4>Naive ways to classify an image</h4><p>The first two approaches we covered are Nearest Neighbor (NN) and an extension of it, called k Nearest Neighbor (kNN). These use only the first 3 steps outlined in the previous section. Also, training step (step 2) is super easy because there&#8217;s not much training to speak of: the model is &#8216;trained&#8217; by having copy of every test image. So let&#8217;s talk about how they work.</p><h5>Naive approach 1: Nearest Neighbor</h5><p>This classification approach is pretty simple. Each of the test image&#8217;s pixels is compared to the corresponding pixel in a training image by calculating the average manhattan (L1) or euclidean (L2) distance between the pixels. Repeat for all images in the training set. The test image is classified to the closest training image - the one with the lowest distance score.</p><p>Problems with this approach include:</p><ul><li><p>Two wildly different images can be incorrectly assigned to the same class if they are mostly the same color (e.g. lighthouse near the sea and a boat in the say).</p></li><li><p>Does not handle cases where the same images has different lighting/orientation/zoom conditions.</p></li><li><p>It&#8217;s super fast to train but storing all training images takes a lot of memory. Even worse, predictions take a lot of time because the each test image is compared to all images in the training set. Yuck.</p></li></ul><p>Accuracy on the CIFAR data set is 27% so nothing to write home about.</p><h5>Naive Approach #2: k Nearest Neighbor</h5><p>This approach builds on the previous one by using the n closest scores and tallying their classes as a sort of popularity contest; pick the class with the most votes and assign it to the image you&#8217;re trying to classify.</p><p>Accuracy is incrementally better than NN, with ~28% accuracy on CIFAR for k = 5.</p><p><strong>Jargon alert: hyperparameters, cross validation and more</strong></p><p>The first time I encountered <strong>hyperparameters</strong>, I pictured a bunch of parameters that ran around yelling and waving their arms. The reality is much more boring (peaceful?). Parameters are the things that are set during training. Hyperparameters on the other hand, are set before training starts. The &#8216;k&#8217; in kNN is an example of a hyperparameter, as is the choice of function to calculate distance between images (L1, L2, etc).</p><p>How do you pick a value for a hyperparameter like k? You test out a few different values and pick the one that gives the best results. There&#8217;s a problem though. Your test set is like a beautiful engagement ring - rare and precious. You want to save it for your one true love, and in our case that means saving the test set for the final validation phase, after the optimal k value has been chosen. Otherwise, you&#8217;ll choose a hyperparameter that works well for the test set, but bombs out on new images. </p><p>How do we handle this? Well, we can&#8217;t touch the test set, so that leaves the training set. We split the training set into two parts (called <strong>folds) </strong>consisting of a slightly smaller training set, and a small <strong>validation set</strong> that we treat as a fake test set (remember what a test set is for?) to tune the hyperparameters. Using our trusty CIFAR training set as an example, we might shrink the training set from 50,000 images down to 49,000 and use the remaining 1000 images for validation. </p><p>There&#8217;s another problem if your training set (and associated validation set) are small because it becomes super important to choose a representative validation set. Fortunately there&#8217;s a way to use all of the training set as a validation set. You do this in a piecemeal fashion: first by evaluating against the validation set, then eval against a new validation set picked from a different part of the training set, and repeat until all of the training set has been pressed into service as a validation set. Lastly, average  the interim eval scores to get the final score. This process is called <strong>cross-validation.</strong> </p><p>Here&#8217;s an example: split the training set into 5 equal folds. Use the 1st fold for validation, and the remaining 4 as the training folds. Eval the training folds against the validation fold to get an interim eval score. Repeat with the 2nd fold as validation and the other 4 folds as training, until you&#8217;ve used every fold for validation. Take the average at the end. Here&#8217;s an illustration:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pcnL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pcnL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 424w, https://substackcdn.com/image/fetch/$s_!pcnL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 848w, https://substackcdn.com/image/fetch/$s_!pcnL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 1272w, https://substackcdn.com/image/fetch/$s_!pcnL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pcnL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png" width="894" height="410" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:410,&quot;width&quot;:894,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51291,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/180406704?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08abbd5-a682-461a-aaf3-ae67ea0bbba9_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pcnL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 424w, https://substackcdn.com/image/fetch/$s_!pcnL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 848w, https://substackcdn.com/image/fetch/$s_!pcnL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 1272w, https://substackcdn.com/image/fetch/$s_!pcnL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc44ad6-02e6-426a-993d-8ce0b1c46495_894x410.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Visualizing cross validation</figcaption></figure></div><p>These steps are repeated for each value of hyperparameter you want to test. As a result, cross-validation is computationally expensive. Use it only when necessary - typically when each fold is a few hundred images.</p><p>Let&#8217;s move on to fancier classifiers.</p><h4>Linear Classifiers</h4><p>Okay, now we move to a classifier that uses all 5 elements from our overview. This approach extends beyond Linear Classifiers to neural networks so its worth breaking it down to make sure we understand it really well.</p><p>What is a linear classifier? It&#8217;s a classifier that applies a linear function to an image to predict its output class. </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}f(x_i, W, b)= Wx_i+b \\end{equation} \n&quot;,&quot;id&quot;:&quot;XUAAPCIBJI&quot;}" data-component-name="LatexBlockToDOM"></div><p>W and b are <em>parameters</em> that are applied to a training image x&#7522; to give a prediction. What does that mean really? Let&#8217;s start with terms. I&#8217;ve added in values [in brackets] from the <a href="https://www.cs.toronto.edu/~kriz/cifar.html">CIFAR-10 dataset</a>, a collection 50k categorized images, to make things concrete:</p><pre><code>1. There are <strong>N</strong> images in the training set. <strong>[N = 50,000]</strong>

2. An image can be classified in one of <strong>K</strong> categories. <strong>[K = 10]</strong>

3. <strong>x&#7522;</strong> is the i&#8217;th image in the training set, consisting of <strong>D</strong> pixels. It is represented as a [D x 1] column vector. <strong>[D = 32x32x3 = 3072]</strong>

4. Not referred to above is <strong>y&#7522;</strong> which is the label associated with <strong>x&#7522;</strong>. 

5. <strong>W</strong> is a weight matrix, a matrix of size [K x D].

6. <strong>b</strong> is a bias vector, a column vector of size [K x 1].

7. <strong>f</strong>() is the score function.</code></pre><p>Here&#8217;s a nice example from the CS231n notes, modified to use different colors that are less likely to be mistaken as RGB pixel values.  To simplify things, the image is now 4 monochrome pixels (D=4) and there are only 3 categories (K=3).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SGNY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SGNY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 424w, https://substackcdn.com/image/fetch/$s_!SGNY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 848w, https://substackcdn.com/image/fetch/$s_!SGNY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 1272w, https://substackcdn.com/image/fetch/$s_!SGNY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SGNY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png" width="960" height="287" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:287,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51670,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/180406704?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f903e00-d938-406f-b475-8ec6b70f6547_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SGNY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 424w, https://substackcdn.com/image/fetch/$s_!SGNY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 848w, https://substackcdn.com/image/fetch/$s_!SGNY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 1272w, https://substackcdn.com/image/fetch/$s_!SGNY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a98c549-0592-4d44-bfe0-b577d5fe7fd2_960x287.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 1: Mapping an image to a class score (<a href="https://cs231n.github.io/linear-classify/">from CS231n notes</a>)</figcaption></figure></div><p>You can think of each row of W as a classifier for one of the classes (e.g. the top yellow row of W is the &#8216;cat classifier&#8217;). </p><p>To see why we need the bias vector b, think of each pixel of an image as a dimension (so 4 dimensions in our example).  So each image can be represented as a point on a 4 dimensional grid. A linear classifier classifies images by drawing lines/planes/hyperplanes (depending on the # of dimensions) to break up the 4D space into classes. Changing a row of W rotates this line/hyperplane around, but without the bias vector, each line/plane/hyperplane would have to go thru the origin. So the bias vector lets us move our dividers around.</p><h5>I&#8217;m not biased</h5><p>We know now why a bias vector is useful but it&#8217;s kind of annoying to track. Fortunately we can incorporate it into W by smushing b onto the end of W and appending a &#8216;1&#8217; to our image data:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OcoW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OcoW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 424w, https://substackcdn.com/image/fetch/$s_!OcoW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 848w, https://substackcdn.com/image/fetch/$s_!OcoW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 1272w, https://substackcdn.com/image/fetch/$s_!OcoW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OcoW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png" width="785" height="267" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/558abab1-4804-4262-b616-d2c295bf863e_785x267.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:267,&quot;width&quot;:785,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26782,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/180406704?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aaf1277-0ccb-4d0a-b448-b01cb2f74153_820x455.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OcoW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 424w, https://substackcdn.com/image/fetch/$s_!OcoW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 848w, https://substackcdn.com/image/fetch/$s_!OcoW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 1272w, https://substackcdn.com/image/fetch/$s_!OcoW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F558abab1-4804-4262-b616-d2c295bf863e_785x267.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 2: The &#8216;bias trick&#8217;: appending ones to all image vectors means we only have to learn a single matrix of weights! (<a href="https://cs231n.github.io/linear-classify/">from CS231n notes</a>)</figcaption></figure></div><p></p><p>Now we have only a single matrix to worry about. Going back to our CIFAR-10 example, D is now 3072+1 = 3073, and W is now 10 x 3073.</p><h5>A losing proposition</h5><p>Look back at figure 1 (I&#8217;ll wait). Notice something? Our genius classifier thinks our cat is a dog. Before selling our Nvidia shares in a panic, lets try adjusting the weights (W) to see if that helps. But what to adjust them to? That&#8217;s where <strong>loss functions</strong> come in.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zvGS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zvGS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 424w, https://substackcdn.com/image/fetch/$s_!zvGS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 848w, https://substackcdn.com/image/fetch/$s_!zvGS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 1272w, https://substackcdn.com/image/fetch/$s_!zvGS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zvGS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png" width="2618" height="1527" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1527,&quot;width&quot;:2618,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7952973,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/180406704?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24fa7ee0-900e-4567-abe6-b8f4a668d5c5_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zvGS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 424w, https://substackcdn.com/image/fetch/$s_!zvGS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 848w, https://substackcdn.com/image/fetch/$s_!zvGS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 1272w, https://substackcdn.com/image/fetch/$s_!zvGS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F282427fc-8be4-48fb-89db-13f964b23334_2618x1527.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Wally classifies images while Louise sharpens her red pencil&#8230;.</figcaption></figure></div><p>How does a loss function work? Let&#8217;s have a student Wally represents our weights (W). Wally is taking an exam to classify images (X). He writes down which class each image belongs to, and he also writes how confident he is about his answer. Louise is Wally&#8217;s teacher, and represents our loss function (f). She uses the following marking scheme to grade his exam:</p><ul><li><p>Right answer, high confidence that answer is correct: 4 marks</p></li><li><p>Right answer, low confidence that answer is correct: 3 marks</p></li><li><p>Wrong answer, low confidence that answer is correct: 2 marks</p></li><li><p>Wrong answer, high confidence that answer is correct: 1 mark </p></li></ul><p>Well, almost. I&#8217;ve left something out - the marking scheme is actually <strong>reversed</strong>. So a right answer/high confidence gets a 1, and the wrong answer/high confidence gets a 4. You see, Louise has developed a bitter streak after years of teaching, and prefers grading how <strong>unhappy</strong> she is with each answer. Poor Wally gets his exam back and is excited to see a high score on his exam&#8230;until he realizes lower scores are better. </p><p>Like Louise, a loss function measures our <em><strong>unhappiness</strong></em> with predictions on the training set. Low scores are better!</p><p>What does a loss function look like? Let&#8217;s take a look at a couple of them.</p><h6>Multiclass Support Vector Machine (SVM) Loss</h6><p>SVM is our first loss function. It wants a classifier that picks the right class for every image and be confident in its choice. What does that mean?</p><p>Start with the i&#8217;th training image <strong>x&#7522;</strong> along with its correct class in <strong>y&#7522;</strong>. Then the score function <strong>f(x&#7522;, W)</strong> takes <strong>x&#7522;</strong>&#8217;s pixels to calculate the class score for every class. The scores are in a vector <strong>s, </strong>where <strong>s&#11388;</strong> means the <em>j</em>&#8217;th element of <strong>s</strong> (score for the <em>j</em>&#8217;th class). Assigning a large score to a class means the classifier is confident its found the correct class for the image. Assigning a smaller score to a class means the classifier is confident the class isn&#8217;t the right one.</p><p>So how does our SVN grade our scoring function? It wants the score for the right class for an image to be bigger than the score for a wrong class. How much bigger depends on a hyperparameter called &#120665;. This can be safely set to 1, and we&#8217;ll see why when I cover the concept of regularization in the next section.</p><p>If the SVN gets what it wants, there is no loss, and so the loss value stays the same. Otherwise, the margin and the difference between scores is added to the total loss (remember, a lower loss is better). Here&#8217;s the equation for our SVM loss function:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_i = \\sum\\limits_{j\\neq y_i}max(0, s_j -s_{y_i} + \\triangle)&quot;,&quot;id&quot;:&quot;DUYBXVVWWE&quot;}" data-component-name="LatexBlockToDOM"></div><p>The term inside the sum is known as a <strong>hinge loss</strong>.</p><h6>Too many solutions</h6><p>Recall our linear classifier:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}f(x_i, W, b)= Wx_i+b \\end{equation} \n&quot;,&quot;id&quot;:&quot;EGHFOXTRQS&quot;}" data-component-name="LatexBlockToDOM"></div><p>Also recall that we tweak the weights in W to improve its accuracy (I know I haven&#8217;t explained how yet&#8230;we&#8217;re getting there!). And lets say we do such a great job setting W that SVN gives a loss of zero. Perfect! Until we realize there might be other Ws that give the same result. </p><p>Continuing with the perfect SVN, scaling W by a number gives us the same zero loss. To see how, remember that the zero loss for an image is because each wrong class score is bigger than the right class score by &#120665;. Multiplying W by 2 for example will scale all of the class scores by 2.  Double the wrong class score minus double the right class score exceeds the margin &#120665; by even more than before, giving a zero loss.</p><p>So our example has lots of Ws that work. What&#8217;s the problem? The problem is that the influence of the largest weights in W increases as you multiply W by larger numbers. Since the class score is calculated by multiplying the weights by the image pixels (dimensions), you risk giving a few dimensions an outsize influence on the score.</p><p>But you might say - okay, who cares? We still get a zero loss, right? That&#8217;s true, but remember, we are working with training images. When you feed in new images, the scaled up weights don&#8217;t generalize well and perform poorly with new data. W <strong>overfits</strong> the training data. Now that we have a handle on the problem, lets look at a solution.</p><h6>Regularization</h6><p>Regularization doesn&#8217;t make an ideal song lyric<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> - it just doesn&#8217;t roll of the tongue very easily. But that&#8217;s cool, because it solves the overfitting problem by reducing the influence of large weights in W. No tall poppies here!</p><p>We cut W&#8217;s tall poppies down to size by applying a <strong>regularization penalty</strong> to each of its weights. The penalty is usually calculated by using the squared L2 norm, and its size is controlled by yet another hyperparameter (<strong>&#411;</strong>) called the <strong>regularization strength</strong>. Here&#8217;s what that looks like.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R(W) = \\lambda \\sum_k\\sum_l {W^2_{k,l}}&quot;,&quot;id&quot;:&quot;GMOZLCDKJT&quot;}" data-component-name="LatexBlockToDOM"></div><p>We add this term to our loss function L which now has two terms: <strong>data loss</strong> (the original term) and the new term, called <strong>regularization loss.</strong> The loss function looks like this:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\n\\begin{split}\nL_i &amp;= \\text{data loss + regularization loss}\n\\\\\n\\\\\n&amp;= \\sum\\limits_{j\\neq y_i}max(0, s_j -s_{y_i} + \\triangle) + \\lambda R(W)\n\\\\\n\\\\\n&amp;=  \\sum\\limits_{j\\neq y_i}max(0, s_j -s_{y_i} + \\triangle) + \\lambda\\sum_k\\sum_l {W^2_{k,l}}\n\\end{split}\n\\end{equation}&quot;,&quot;id&quot;:&quot;YCIXZKYYKA&quot;}" data-component-name="LatexBlockToDOM"></div><p>This calculates the loss for one image. For the full loss we have to consider the losses for all images in the training set. We do that by calculating the average data loss for all  images, resulting in the <strong>complete multiclass SVM loss</strong>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\n\\begin{split}\nL &amp;= \\dfrac{1}{N} \\sum_iL_i + \\lambda\\sum_k\\sum_l {W^2_{k,l}}\n\n\\\\\n\n&amp;= \\dfrac{1}{N} \\sum_i\\sum\\limits_{j\\neq y_i}max(0, s_j -s_{y_i} + \\triangle) + \\lambda\\sum_k\\sum_l {W^2_{k,l}}\n\\end{split}\n\\end{equation}&quot;,&quot;id&quot;:&quot;IDKAQFYNEE&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p>One more thing. Earlier, I had said it was okay to set &#916; to 1.0. That&#8217;s because the hyperparameters &#411; and &#916; control the same tradeoff: how much influence each term has on the loss score. To see why, notice that shrinking &#916; shrinks the data loss score. Similarly, shrinking &#411; reduces the regularization loss score. So you can reduce either of these to reduce the complete loss L. Recognizing this, we set &#916; to 1.0 and adjust the regularization strength &#411; to control how much we allow the weights in W to grow.</p><h6>Softmax Loss</h6><p>SVM is fine, but it spits out loss scores that are hard to interpret. Enter the <strong>softmax function</strong>, which converts the scores into probabilities:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;f_j(z) = \\dfrac{e^z_{j}}{\\sum_k e^z_k}&quot;,&quot;id&quot;:&quot;HCIAYRSDMF&quot;}" data-component-name="LatexBlockToDOM"></div><p>This looks intimidating but all it does is take a bunch of loss scores like 239.2 or -12.9 and squishes them to between 0 and 1 so that if you add them all up, you get 1.</p><p>Here&#8217;s what it looks like in our loss function where <strong>f&#11388;</strong> is the j-th element in vector <strong>f</strong> of our class scores:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\n\\begin{split}\nL_i &amp;= -log\\bigg(\\dfrac{e^{f_{y_{i}}}}{\\sum_j{e^{f_j}}}\\bigg)\n\n\\\\\n\\text{full loss:}\n\\\\\n\nL &amp;= \\dfrac{1}{N}\\sum_i{L_i} + R(W)\n\\end{split}\n\\end{equation}&quot;,&quot;id&quot;:&quot;XBCPDGZDMA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Notice in the first equation that the softmax function is wrapped inside a -log() function. Remember that SVM used a threshold to calculate the loss, which was called the <strong>hinge loss</strong>. You can&#8217;t do that, because softmax turned the scores into a bunch of probabilities between 0 and 1, right? So we have to these scores differently, by using a <strong>cross-entropy loss</strong>, and that&#8217;s where the -log() comes from. This post is already super long, and there&#8217;s one more topic to cover, so I&#8217;ll explain cross-entropy loss next time.</p><p><strong>Practical matter:</strong>  Calculating the exponentials in our loss function can cause an overflow. To work around this, subtract the max value in <strong>f</strong> from each element, shifting their values down so the largest number is 0.</p><p>Before we move to our final topic for today, let&#8217;s check in on our hapless student Wally. After several therapy sessions, he&#8217;s come to terms with failing his classification exam. He&#8217;s ready to learn from his exam to do better next time. But he&#8217;s not sure how. Our classifier is in the same boat as Wally: the loss function indicates how unhappy we should be with our classifier&#8217;s results. How can we use what the loss function tells us to improve our classifier?</p><h5>Optimization</h5><p>To review, the core of our linear classifier is W, made up of parameters called weights. L is a loss function that gives a <strong>loss score</strong> for the quality of the weights. We want to tweak the linear classifier to reduce the loss score (remember lower loss scores are better). That means adjusting the weights in W, since that&#8217;s the only part we can change.</p><p>So we want to find the weights that give the lowest loss score. There are many possible weights, and it would take forever to try them all so we need to try something else. </p><p>We could make small random changes to the weights, but that&#8217;s not efficient either. Instead, think of each weight in W as a dimension (10x3073 is 10,370 dimensions) in a <strong>loss landscape</strong>, and you can take a step in any direction in this landscape by making a small incremental change to some or all of the weights. For each step, you want to step in the direction that reduces the loss score compared to your current location. But how?Just follow the gradient.</p><h6>Follow the yellow brick gradient</h6><p>For our purposes, a gradient points in the direction of steepest descent in our loss landscape. You calculate this by taking partial derivatives of the loss function along each dimension resulting in a vector of partial derivatives for each dimension. This vector points to the direction of steepest ascent. But you want to go down, not up so turn around 180&#176; by negating the direction vector. Then take a step in that direction by multiplying it by your step size. Then add the result to W (&#8216;parameter update&#8217;). </p><p>There are two ways to calculate the gradient :</p><ol><li><p>Incrementally. We step forward in one dimension at a time and calculate how much the loss score changes along that dimension. That is easy to calculate, but it&#8217;s kind of lame because you have to calculate L for every dimension (10,370 times) for each parameter update. </p></li><li><p>Analytically. It&#8217;s much faster if you have the equation for the gradient, since you can calculate the gradient for all the dimensions at once. But it&#8217;s a pain to derive, especially if it&#8217;s been a minute since you&#8217;ve taken calculus. </p></li></ol><p>The overall process is called <strong>gradient descent</strong> and it looks like this:</p><ul><li><p>Calculate the gradient L for W (analytically or iteratively).</p></li><li><p>Update the paramter W to step in the direction L is dropping fastest.</p></li><li><p>Repeat until improvements to loss score are minimal/gradients aren&#8217;t changing/weights aren&#8217;t changing.</p></li></ul><p><strong>Practical points: </strong>It&#8217;s easy to make mistakes with the analytic approach, so check it with the iterative approach too.  Choosing a <strong>step size</strong> involves trial and error; too small and progress takes forever; too large, and you may miss the optimal solution.</p><p>Phew!  That&#8217;s it for this week - congratulations if you&#8217;ve read (or skimmed) this far!</p><h3>What I&#8217;ve Been Reading/Watching/Listening to</h3><p><a href="https://publicservicesalliance.org/wp-content/uploads/2025/11/Evans-AI-Eats-the-World-112025.pdf">The State of AI, November 2025</a>: <a href="https://www.ben-evans.com/presentations">Benedict Evans</a> released the latest version of his semi-annual monster slide deck covering the latest macro trends. I&#8217;ve enjoyed his thoughtful analyses for more than a decade. This deck is a must read for anyone trying to understand where AI is, and where it might go. </p><p>Benedict confirms what I believe - that AI is a technology shift as big as the PC/internet/mobile revolutions I&#8217;ve experienced. Is it bigger than fire? People like warmth, so that&#8217;s a pretty high bar.  </p><p><a href="https://youtu.be/d95J8yzvjbQ?si=uYNHV3V7oARmWAtl">The Thinking Game</a>. An engaging documentary about how Demis Hassabis and his team at Google DeepMind used AI to solve protein folding, a problem that scientists had been trying to solve for 50 years with little success. I wish they had gone into more technical details, but I enjoyed the biographical bits and it gives the layperson an idea of how AI is impacts science.</p><p><a href="https://youtu.be/Rni7Fz7208c?si=9wAiMBaDgOG24qEl">Elon Musk interviewed by Nikhil Kamath</a>. I wasn&#8217;t sure what to expect, but it was a good interview. Lots of personal bits, but the relevant takeaway is that he believes robotics + AI are a big deal and will change society in profound ways (you won&#8217;t need to work, unless you want to. money goes away). I obviously agree with the first part , but I&#8217;m doubtful about the second part. Any big technology shift makes some jobs obsolete (newspaper delivery), while creating new ones (Youtube influencer). </p><p><a href="https://shopify.engineering/tangle">Tangle</a>. Shopify open sources a platform to streamline and speed up ML experimentation. Teams visually build data or ML pipelines that run locally (Docker/podman) or in the cloud (Huggingface). Workflows are specified declaratively (vs. code) and let you track package dependencies which makes it easier to reproduce experiments. </p><p><a href="https://arxiv.org/pdf/2408.03539">Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes</a>. This research paper evaluates how effective RL is for locomotion, navigation, object manipulation, and interaction. There&#8217;s a lot to like here. The robotic capabilities covered are relevant to our prototype. It&#8217;s recently published, so it&#8217;s an up to date snapshot. And last but not least, the authors measure an approach&#8217;s effectiveness by considering real world deployment which is obviously a relevant yardstick for us. I&#8217;ll be referring back to this as design and development progresses.</p><p><a href="https://nips.cc/virtual/2025/">NeurIPS 2025</a> - The big annual AI &amp; ML research conference took place this week, and there are a few workshops that look relevant for the prototype. I&#8217;m capturing them here for future reference.</p><ul><li><p> <a href="https://e-sars.github.io/">Embodied and Safe-Assured Robotic Systems</a> covers topics like &#8216;robust robotic perception in dynamic/uncertain environments&#8217;.</p></li><li><p><a href="https://mozhgan91.github.io/vlm4rwd-neurips25-ws/">Vision Language Models: Challenges of Real World Deployment</a> covers topics like making VLMs run efficiently on embedded systems, and effective training for tasks in visual scenes.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.theautonomyloop.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Autonomy Loop!  Subscribe for free to keep up with the latest developments as I build a smart robotic helper.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><br>Sub for free updates.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Apparently &#8216;regularization&#8217; is used in <a href="http://lyrics.com/lyrics/regularization">two songs</a>. Human ingenuity never ceases to amaze.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Taking over the world]]></title><description><![CDATA[But first you need a map]]></description><link>https://www.theautonomyloop.com/p/taking-over-the-world</link><guid isPermaLink="false">https://www.theautonomyloop.com/p/taking-over-the-world</guid><dc:creator><![CDATA[Ajay Kapal]]></dc:creator><pubDate>Tue, 25 Nov 2025 14:50:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vDxg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vDxg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vDxg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 424w, https://substackcdn.com/image/fetch/$s_!vDxg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 848w, https://substackcdn.com/image/fetch/$s_!vDxg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 1272w, https://substackcdn.com/image/fetch/$s_!vDxg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vDxg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png" width="312" height="308.5061590145577" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:883,&quot;width&quot;:893,&quot;resizeWidth&quot;:312,&quot;bytes&quot;:399666,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/179192903?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vDxg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 424w, https://substackcdn.com/image/fetch/$s_!vDxg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 848w, https://substackcdn.com/image/fetch/$s_!vDxg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 1272w, https://substackcdn.com/image/fetch/$s_!vDxg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3476ad-6dfd-4be5-88cf-1ba96af004b0_893x883.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">World domination with Pinky and The Brain</figcaption></figure></div><p>One of my favorite shows was <a href="https://en.wikipedia.org/wiki/Pinky_and_the_Brain">Pinky and The Brain</a> when I was growing up. In each episode, Brain hatches a plan to achieve his goal of world domination, a plan that inevitably fails, often due to his arrogance. Our robot has different (peaceful!) goals, and as far as I can tell, it isn&#8217;t arrogant. It will however have a hard time succeeding if it can&#8217;t navigate its surroundings. Before we cover navigation, let&#8217;s zoom out a little.  Here are the capabilities our robot needs, along with the supporting software and hardware components:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0DY7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0DY7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 424w, https://substackcdn.com/image/fetch/$s_!0DY7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 848w, https://substackcdn.com/image/fetch/$s_!0DY7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 1272w, https://substackcdn.com/image/fetch/$s_!0DY7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0DY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png" width="957" height="725" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:725,&quot;width&quot;:957,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/179192903?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0DY7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 424w, https://substackcdn.com/image/fetch/$s_!0DY7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 848w, https://substackcdn.com/image/fetch/$s_!0DY7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 1272w, https://substackcdn.com/image/fetch/$s_!0DY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea1c7d8-bfc8-4047-b7bc-4e15ea924f78_957x725.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Robot Function Diagram</figcaption></figure></div><p>This diagram summarizes the core capabilities that our robot needs to accomplish its task. It also illustrates that capabilities are built on top of software that works with the hardware to sense and act. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.theautonomyloop.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Autonomy Loop! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3>Navigation</h3><p>Now lets talk about Navigation, highlighted in the yellow parts of the diagram above. Navigation means building a map of our robot&#8217;s environment, and determining the robots position on the map. Our robot needs to navigate a changing environment (e.g. lighting conditions or encountering a new obstacle). Our robot should also be able to map all reachable areas and shouldn&#8217;t miss any spots. Fortunately, there is an approach that ticks all the boxes:  SLAM.</p><h5>What is SLAM?</h5><p>SLAM (Simultaneous Location and Mapping) is a technique that uses sensors like <a href="https://en.wikipedia.org/wiki/Lidar">LiDAR</a> (lasers that fire light pulses) or video cameras to build a map and determine the robots position in real time. &#8216;Real time&#8217; means the map is updated quickly enough for the robot to recognize changes in the environment (e.g. opening a door) and adapt accordingly.</p><p>How does it work?  The general approach is summarized below:</p><ol><li><p><em>Record what the sensor &#8216;sees&#8217;.</em>  This involves processing the raw data generated by the sensor and For example, LiDAR generates coordinates for objects in the environment by emitting lots of laser pulses and measuring how long it takes for each pulse to return to the robot. An (x,y,z) coordinate representing the location of the environment object is generated.  There are thousands of these coordinates which constitute a <em>point cloud</em>.</p></li><li><p><em>Update the position when the robot moves.</em> As the robot moves, the sensor measures the environment. The new sensor data is compared to previous sensor data (the existing map) to determine the robot&#8217;s position and orientation.</p></li><li><p>Errors in the robot&#8217;s location can accumulate and need to be managed. This can be done by keeping track of previously visited landmarks and using them to recalibrate.</p></li></ol><p>There are many different variants of SLAM that use different algorithms (filters/graphs/deep learning), sensors (LiDAR, sonar, video, inertial), and how much visual intelligence is used to understand the scene. </p><p>The most likely candidate for our robot is vSLAM, a version of SLAM that uses video cameras. Distance estimation is not as accurate as LiDAR, but it&#8217;s much less expensive, and offers much more visual data to build a semantic understanding of the environment. </p><p>This hopefully gives you a taste of SLAM. It&#8217;s a very deep topic, and one we&#8217;ll be revisiting as our robot progresses.</p><h4><strong>Progress</strong></h4><p>It&#8217;s been a short week as we were traveling last Friday. Most of this week has been spent focusing on the Deep Learning for Computer Vision course CS231n.  The first 4 lectures covered the following:</p><ul><li><p><em>linear and kNN classifiers</em> - to predict which class an image belongs to</p></li><li><p><em>loss functions -</em> to quantify the quality of the model weights</p></li><li><p><em>regularization</em> - to prevent a model from overfitting to the training data so it makes better predictions for new data that isn&#8217;t necessarily like the training data) </p></li><li><p><em>optimization</em> -  to determine model weights that minimize the loss function using gradient descent</p></li><li><p><em>neural networks and backpropogation - </em>on to the exciting stuff!  Multi layer neural networks are non-linear classifiers that enable more complex boundaries to more accurately separate off classes of data points from other classes.</p><ul><li><p>Analytically calculating the optimal weights (that minimize the loss function) for these networks is hard and resource intensive (huge matrices) so lets not do that.  Instead:</p><ul><li><p>Represent the network as a computational graph, where each node is a neuron </p></li><li><p>Use the chain rule to calculate how changes in each neuron&#8217;s inputs (<em>upstream gradients</em>) affect it&#8217;s outputs (<em>downstream gradients</em>). </p></li><li><p><em>back propagation</em>: Apply the chain rule recursively along the graph to compute gradients of all inputs/parameters/intermediates. Compute the gradient of the loss function wrt the inputs.</p></li></ul></li></ul></li></ul><p>It&#8217;s not all slides and lectures. I&#8217;ve been implementing linear and kNN classifiers in NumPy and enjoyed implementing a fully vectorized version of a loss function that resulted in a 100x speed up on my M4 mac mini!</p><h4>Links</h4><ul><li><p>These papers highlight differences between artificial neural nets and biology:</p><ul><li><p><a href="https://openaccess.thecvf.com/content_ICCV_2019/papers/Xie_Exploring_Randomly_Wired_Neural_Networks_for_Image_Recognition_ICCV_2019_paper.pdf">Exploring randomly wired neural networks for image recognition</a></p></li><li><p><a href="https://neurophysics.ucsd.edu/courses/physics_171/annurev.neuro.28.061604.135703.pdf">Dendritic Computation</a> </p></li><li><p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11754790/">Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning</a></p></li></ul></li><li><p><a href="https://www.cell.com/neuron/fulltext/S0896-6273(21)00501-8?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0896627321005018%3Fshowall%3Dtrue">Single cortical neurons as deep artificial neural networks</a></p></li></ul><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.theautonomyloop.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Autonomy Loop! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Robots vs. babies...and the winner is....]]></title><description><![CDATA[Or, what it takes to be a useful robot]]></description><link>https://www.theautonomyloop.com/p/robots-vs-babiesand-the-winner-is</link><guid isPermaLink="false">https://www.theautonomyloop.com/p/robots-vs-babiesand-the-winner-is</guid><dc:creator><![CDATA[Ajay Kapal]]></dc:creator><pubDate>Fri, 14 Nov 2025 13:03:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!00EL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!00EL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!00EL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 424w, https://substackcdn.com/image/fetch/$s_!00EL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 848w, https://substackcdn.com/image/fetch/$s_!00EL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!00EL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!00EL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg" width="450" height="363" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:363,&quot;width&quot;:450,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!00EL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 424w, https://substackcdn.com/image/fetch/$s_!00EL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 848w, https://substackcdn.com/image/fetch/$s_!00EL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!00EL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f10272-972a-4c6d-a79a-e0fd6408b194_450x363.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">BBC&#8217;s 1938 adaptation of Karel &#268;apek&#8217;s Rossum&#8217;s Universal Robots.<strong> &#169; </strong>BBC</figcaption></figure></div><p>For a robot to be useful, it needs to <em>effectively </em>perform a task with minimal <em>supervision</em>. For example, a robotic vacuum cleaner isn&#8217;t <em>effective</em> if it isn&#8217;t strong enough to suck the dirt from your carpet, or it misses spots, or isn&#8217;t charged when you need it. It needs too much <em>supervision</em> when it needs to be picked up before it falls down the stairs, or it gets stuck while vacuuming a shag rug (this also goes to effectiveness).</p><p>Putting aside task-specific requirements, for a robot to be effective under minimal supervision, it has to be able to:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.theautonomyloop.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Autonomy Loop! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ol><li><p>navigate its environment</p></li><li><p>know where (and where not) to execute the task</p></li><li><p>adapt to changes in its environment</p></li><li><p>learn from experience and feedback</p></li><li><p>maintain the resources it needs to operate</p></li></ol><p>These requirements each merit a future post. For now, I&#8217;ll share some context about  the challenge they represent. How? By starting with an insult of course.</p><h4>Robots aren&#8217;t babies</h4><p>People generally don&#8217;t like it when you tell them they &#8216;act like a baby&#8217;. Shocking, right? But when it comes to robots, it&#8217;s a compliment. You see, robots are no match for a baby when it comes to learning about the physical world. <a href="https://www.youtube.com/watch?v=yUmDRxV0krg&amp;t=1137s">Yann LeCun recently referred to</a> this chart that tracks how infants develop their physical intuition:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rO4r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rO4r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 424w, https://substackcdn.com/image/fetch/$s_!rO4r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 848w, https://substackcdn.com/image/fetch/$s_!rO4r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 1272w, https://substackcdn.com/image/fetch/$s_!rO4r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rO4r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png" width="484" height="325.07462686567163" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1206,&quot;resizeWidth&quot;:484,&quot;bytes&quot;:204944,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/178423097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rO4r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 424w, https://substackcdn.com/image/fetch/$s_!rO4r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 848w, https://substackcdn.com/image/fetch/$s_!rO4r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 1272w, https://substackcdn.com/image/fetch/$s_!rO4r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937b91c0-9fef-4bf9-98e0-a5b9cdd871ce_1206x810.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9442261">From IntPhys 2019: A Benchmark for Visual Intuitive Physics Understanding</a></figcaption></figure></div><p>Each of these building blocks forms our &#8216;common sense&#8217; about our physical world. And we figure it out without any experience to start with. It&#8217;s remarkable. A baby&#8217;s perception develops as she explores and interacts with her environment. She experiences the world through her senses, or putting it in ML-terms, she is trained on the dataset generated from her senses. So how large is her dataset?</p><p>Each eye transmits data to her visual cortex at a rate of a about 1 MB/s<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. Assuming she is awake for 10h each day (babies like to sleep), her eyes generate 28TB of visual data in the first year. In 4 years her training data set is larger than the one OpenAI trained GPT-3<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> with! It takes a lot of data before she expects a cup will fall the table when pushed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RTTa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RTTa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RTTa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RTTa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RTTa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RTTa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg" width="380" height="319.8828125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1024,&quot;resizeWidth&quot;:380,&quot;bytes&quot;:213012,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/178423097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d4f372-b7ea-44d6-b639-040b6de43740_1024x1024.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!RTTa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RTTa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RTTa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RTTa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ab0f7d6-4193-41d0-b8ed-84af085f7f82_1024x862.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">&#8220;Back to using sippy cups until you understand gravity!&#8221;</figcaption></figure></div><p>The type of data used for training also matters. GPT-3 is trained on a huge amount of text, and that helps when doing things like solving math problems. But robots have to navigate a 3D world with varying and unpredictable conditions like changing weather, lighting, new objects. Training a model on a large corpus of text isn&#8217;t going to cut it - you need video.</p><p>And there are other considerations like planning, remembering, and safety. So a lot of fun challenges ahead!  </p><h4>Progress</h4><p>This week, I&#8217;ve set up a <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit/">Jetson Orin Nano</a> with an <a href="https://www.waveshare.com/wiki/IMX219-83_Stereo_Camera">IMX-219 stereo camera</a>. I&#8217;m planning to build the prototype around this hardware as it hits a sweet spot on price and capability. It took some tinkering but everything seems to be working now.  </p><p>I&#8217;ve been auditing Deep Learning for Computer Vision (Stanford CS231N), and have set up Jupyter Notebook and have started on the first assignment. It&#8217;s really fun and I&#8217;m looking forward to going through the material in the coming weeks.</p><p>I have also been auditing Deep Learning I (Stanford CS230) to reinforce what I learn in CS231n and fill in gaps.</p><h4>Cool stuff I&#8217;ve been reading/listening to/watching</h4><p><a href="https://tensor-logic.org/">Tensor Logic: The Language of AI</a> - an interesting paper by Pedro Domingos about unifying logic programming and tensor algebra to make one AI language to rule them all. Or at least you should be able to better inspect and guard against things like hallucinations. Looking forward to kicking the tires when the language is released.</p><p><a href="https://www.youtube.com/watch?v=eXbrt_2Fvgk">Cheeky Pint interview w/ Kyle Vogt</a> - Kyle cofounded Twitch and Cruise. He&#8217;s now the CEO of The Bot Company which specializes in home robotics. He mentioned that robots have prioritized repeatability because of technology and application area (manufacturing). Modern robots prioritize adaptability in a changing home environment, and do so through neural nets. </p><p>I have very much been thinking along the same lines because of past experience: I worked with image recognition, but was discouraged by the accuracy and complexity of approaches at the time (SIFT feature detection, bag of words, Harris corner detection, etc) that didn&#8217;t seem to operate like our visual system. AlexNet winning the ImageNet competition in 2012 was a watershed moment showing that data combined with convolutional neural nets were a better approach to image classification. The old approaches are now obsolete. <em>I see the same thing happening with robotics.</em></p><p><a href="https://neurips.cc/virtual/2024/invited-talk/101127">From Seeing to Doing</a> - Fei-Fei Li&#8217;s keynote at NeurIPS was almost a year ago but feels very relevant today. She gives an overview of Visual Intelligence, a framework  composed of:</p><ul><li><p>Understanding - semantic labeling of the contents based on visible pixels which requires object recognition, segmentation, video classification, etc.</p></li><li><p>Reasoning - inferring or reasoning with information beyond visible pixels which requires understanding objects and their relationships.</p></li><li><p>Generation - creating new pixels or altering existing ones</p></li></ul><p>She raises the &#8216;flat earth&#8217; problem, where models are trained on data made up of 2D artifacts like images and text that doesn&#8217;t correspond to our 3D world.</p><p>She then extends the idea of Visual Intelligence to Spatial Intelligence in 3D.</p><p>She calls out the need (and work being done) to create an analogue of ImageNet that lets researchers and professionals benchmark approaches to achieve Spatial intelligence. She makes a very clear point that robotics research is focused on toy problems with simple environments and tasks, without standard benchmarks and that this doesn&#8217;t address challenges in the real world that is complex, dynamic, varied, and interactive.</p><p>Lots of good coverage on datasets to address some of these shortcomings, along with the research her group is doing. I&#8217;ve made a note to myself to follow up on some of these topics as they relate to the prototype. I&#8217;ll share what I learn. See you next week!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.theautonomyloop.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Autonomy Loop. </p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://www.newscientist.com/article/dn9633-calculating-the-speed-of-sight/">Reilly, Michael, &#8220;Calculating The Speed of Sight&#8221;, New Scientist</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://arxiv.org/pdf/2005.14165">OpenAI&#8217;s paper &#8216;Language Models are Few-Shot Learners&#8217; describing GPT-3 </a></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Welcome to The Autonomy Loop!]]></title><description><![CDATA[The greatest newsletter about building AI-enabled robots, maybe]]></description><link>https://www.theautonomyloop.com/p/welcome-to-the-autonomy-loop</link><guid isPermaLink="false">https://www.theautonomyloop.com/p/welcome-to-the-autonomy-loop</guid><dc:creator><![CDATA[Ajay Kapal]]></dc:creator><pubDate>Tue, 11 Nov 2025 14:15:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HW7C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HW7C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HW7C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 424w, https://substackcdn.com/image/fetch/$s_!HW7C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 848w, https://substackcdn.com/image/fetch/$s_!HW7C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!HW7C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HW7C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png" width="374" height="399.48817802503476" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1536,&quot;width&quot;:1438,&quot;resizeWidth&quot;:374,&quot;bytes&quot;:3802789,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.theautonomyloop.com/i/178211168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76709825-78f9-4362-a6c0-59fef9268393_1438x1907.bmp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HW7C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 424w, https://substackcdn.com/image/fetch/$s_!HW7C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 848w, https://substackcdn.com/image/fetch/$s_!HW7C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!HW7C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5e20e3d-82f1-4953-b87f-e83d86d14917_1438x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">&#8220;Technological Breakthrough&#8221;, by Robert Tinney</figcaption></figure></div><h4>Feels good to be writing and building </h4><p>For a decade, I was a product manager at Netflix, Google, and Meta. PMs write a lot, and I was no exception. Fortunately, I like writing. I&#8217;ve liked it ever since starting a weekly newsletter at Netflix that covered the streaming industry. It became popular, and I enjoyed sharing what I learned.</p><p>Before that, I spent over a decade as an engineer working on everything from iOS apps to networking equipment. There is a thrill you get from turning an idea into reality and I suspect many engineers feel the same way. </p><p>Writing and building are like mixing peanut butter and chocolate (feature request for Substack: let subscribers send Reese&#8217;s instead of money). I get to share what I learn as I come up to speed on an exciting technical frontier full of opportunity. It&#8217;s also motivating to report each week on progress towards building a smart, useful robot.</p><h4>About the name</h4><p>Why is this blog called The Autonomy Loop anyways? Well, a smart, useful robot has to adapt to a continuously changing environment by itself. An autonomy loop is a fancy name for a mechanism that lets a robot do that by observing its environment, deciding what to do, acting on the decision, and learning from the result - all without needing a babysitter. I like the name because it captures what this newsletter is about: the intersection of AI and Robotics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_2pG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_2pG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 424w, https://substackcdn.com/image/fetch/$s_!_2pG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 848w, https://substackcdn.com/image/fetch/$s_!_2pG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 1272w, https://substackcdn.com/image/fetch/$s_!_2pG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_2pG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png" width="462" height="425.34769833496574" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:940,&quot;width&quot;:1021,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:1460164,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://autonomyloop.substack.com/i/178211168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_2pG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 424w, https://substackcdn.com/image/fetch/$s_!_2pG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 848w, https://substackcdn.com/image/fetch/$s_!_2pG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 1272w, https://substackcdn.com/image/fetch/$s_!_2pG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8e19e9-0ef2-445e-b60c-662eef87c9ce_1021x940.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Robby, I think you&#8217;re taking this autonomy thing too far&#8230;</figcaption></figure></div><h4>What you can expect</h4><p>When I was a PM, part of my job was writing about strategy, go-to-market, product capabilities, etc. I&#8217;ll write about some of these things from time to time, but they aren&#8217;t the main focus. At The Autonomy Loop, you can expect a weekly newsletter that covers:</p><ul><li><p>my progress as I attempt to build an AI-enabled robot prototype that solves a real problem (for me and possibly others)</p></li><li><p>posts about helpful resources I find, including research papers, online courses, and technical articles</p></li><li><p>posts that go deeper into a technical concept</p></li><li><p>share-outs of articles, podcasts, and books</p></li></ul><p>It&#8217;s probably worth clarifying a couple of things: </p><ul><li><p>I am not an expert. Early posts will focus on coming up to speed, but the intent is to shift as quickly as I can to more practical topics. </p></li><li><p>Building an AI-enabled robot has many interesting technical aspects but I don&#8217;t want it to become a research project. The goal is to build a prototype that solves a specific problem.</p></li></ul><p>If all of this sounds like your cup of coffee, then this blog is for you. Welcome!</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.theautonomyloop.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"><strong>Subscribe to The Autonomous Loop</strong></p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item></channel></rss>